help with for loop----python 2.7.2

teddybubu

Mar 22, 2014

#1

I am trying to get all the element data from the rss below.
The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code....

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)

for item in soup.find_all('item'):
    #for item in soup:
    title = soup.find('title').text
    link = soup.find('link').text
    item = soup.find('item').text
    print item
    print title
    print link

Ian Kelly

Mar 22, 2014

#2

I am trying to get all the element data from the rss below.
The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code....
[SNIP]

for item in soup.find_all('item'):
    #for item in soup:
    title = soup.find('title').text
    link = soup.find('link').text
    item = soup.find('item').text

The three find method calls in the for loop are searching from the
document root (the "soup" variable), not from the item you're
currently iterating at. Try changing these to calls of item.find. And
note that calling one of the results "item" will replace the loop
variable. That won't affect the iteration, but it's bad practice to
refer to two different things by the same local name.
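
To make the difference concrete, here is a minimal sketch of "search from the root" versus "search from the current item". It deliberately swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup so that it runs without any third-party package or network access, and the sample feed and the names RSS, from_root and from_item are invented for illustration. It is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; the real feed in the thread is fetched over HTTP.
RSS = """<rss><channel>
<title>Feed title</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(RSS)

# Searching from the root on every pass returns the same first match each time.
from_root = [root.find(".//title").text for _ in root.iter("item")]

# Searching from the current item returns that item's own child element.
from_item = [entry.find("title").text for entry in root.iter("item")]

print(from_root)
print(from_item)
```

The loop variable is called entry here rather than item so that no name is reused for two different things. With BeautifulSoup the corresponding change is the one described above: call item.find(...) instead of soup.find(...).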

tad na

Mar 23, 2014

#3

OK, second problem.

I can print the date. Not sure how to do this one.
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
data = soup.find_all("item")

x=0
for item in soup.find_all('item'):
    title = item.find('title').text
    link = item.find('link').text
    date = item.find('pubDate')
    # print date
    print('+++++++++++++++++')
    print data[x].title.text
    print data[x].link.text
    print data[x].guid.text
    print data[x].pubDate
    x = x + 1

tad na

Mar 23, 2014

#4

I meant to say I CANNOT print the date.

Mark Lawrence

Mar 23, 2014

#5

On 23/03/2014 17:30, tad na wrote:

Would you please use the mailing list
https://mail.python.org/mailman/listinfo/python-list or read and action
this https://wiki.python.org/moin/GoogleGroupsPython to prevent us
seeing double line spacing and single line paragraphs, thanks.

Ian Kelly

Mar 23, 2014

#6

OK . second problem
I can print the date. not sure how to do this one..

Why not? What happens when you try?

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
data = soup.find_all("item")

x=0
for item in soup.find_all('item'):
    title = item.find('title').text
    link = item.find('link').text
    date = item.find('pubDate')
    # print date
    print('+++++++++++++++++')
    print data[x].title.text
    print data[x].link.text
    print data[x].guid.text
    print data[x].pubDate
    x = x + 1

data[x] should be the same object as item, no? If you want to keep track of
the current iteration index, a cleaner way to do that is by using enumerate:

for x, item in enumerate(soup.find_all('item')):

As far as printing the pubDate goes, why not start by getting its text
property as you do with the other tags? From there you can either print the
string out directly or parse it into a datetime object.
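
As a rough sketch of both suggestions together (enumerate for the index, and parsing the date text into a datetime), assuming the pubDate text is in the RFC 822 style that RSS feeds normally use: the date strings and the names pub_dates and parsed below are made up for illustration, and email.utils.parsedate_to_datetime needs Python 3.3 or later, so this is not a drop-in for the 2.7 code above.

```python
from email.utils import parsedate_to_datetime

# Hypothetical pubDate strings in the RFC 822 style that RSS feeds use.
pub_dates = [
    "Sat, 22 Mar 2014 10:15:00 +0000",
    "Sun, 23 Mar 2014 17:30:00 +0000",
]

parsed = []
# enumerate yields (index, value) pairs, so no manual counter is needed.
for index, text in enumerate(pub_dates):
    when = parsedate_to_datetime(text)  # timezone-aware datetime
    parsed.append((index, when))
    print(index, when.isoformat())
```

In the loop from the thread, text would come from the date tag's text property once the tag is actually found.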

tad na

Mar 23, 2014

#7

On 23/03/2014 17:30, tad na wrote:
Would you please use the mailing list
https://mail.python.org/mailman/listinfo/python-list or read and action
this https://wiki.python.org/moin/GoogleGroupsPython to prevent us
seeing double line spacing and single line paragraphs, thanks.

Mark, I'm not sure what I did wrong. The double line spacing in the code is mine;
it helps me keep things separate.

Mark Lawrence

Mar 23, 2014

#8

I've snipped the bulk of the message, but imagine what the above looks
like when it's been back and forth through gg a few times. It's
effectively unreadable.

tad na

Mar 23, 2014

#9

This is the error I get with
1. print data[x].pubDate.text
AttributeError: 'NoneType' object has no attribute 'text'
2. print data[x].pubDate
It results in "None"

Ian Kelly

Mar 23, 2014

#10

This is the error I get with
1. print data[x].pubDate.text
AttributeError: 'NoneType' object has no attribute 'text'
2. print data[x].pubDate
It results in "None"

So the problem is that it's not even finding the pubDate tag in the first
place. Some sites on the Web suggest that Beautiful Soup normalizes all
tag names to lowercase; try looking for the pubdate tag instead.
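
One plausible reason for that, assuming the feed is being handled by an HTML parser rather than an XML one (the BeautifulSoup call in the thread does not name a parser): HTML tag names are case-insensitive, so HTML parsers fold them to lowercase. The sketch below shows the effect with the standard library's html.parser on its own, without BeautifulSoup; the sample markup and the TagCollector class are invented for illustration, and the module name is the Python 3 one.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag exactly as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The parser passes the tag name already converted to lowercase.
        self.tags.append(tag)


collector = TagCollector()
collector.feed(
    "<item><title>Example</title>"
    "<pubDate>Sun, 23 Mar 2014 17:30:00 +0000</pubDate></item>"
)
collector.close()
print(collector.tags)  # ['item', 'title', 'pubdate']
```

If that is what is happening here, the tag exists in the parsed tree only under the name pubdate, which would explain why looking up pubDate gives None.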
