Help with for loop (Python 2.7.2)

teddybubu

I am trying to get all the element data from the rss below.
The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code....


try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)

for item in soup.find_all('item'):
#for item in soup:
    title = soup.find('title').text
    link = soup.find('link').text
    item = soup.find('item').text
    print item
    print title
    print link
 
Ian Kelly

I am trying to get all the element data from the rss below.
The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code....
[SNIP]

for item in soup.find_all('item'):
#for item in soup:
    title = soup.find('title').text
    link = soup.find('link').text
    item = soup.find('item').text

The three find method calls in the for loop are searching from the
document root (the "soup" variable), not from the item you're
currently iterating at. Try changing these to calls of item.find. And
note that calling one of the results "item" will replace the loop
variable. That won't affect the iteration, but it's bad practice to
refer to two different things by the same local name.
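
To illustrate the difference between searching from the document root and searching from the current item, here is a minimal sketch. It deliberately swaps in the standard library's xml.etree.ElementTree for Beautiful Soup so that it runs without extra installs, and the feed text (SAMPLE_RSS) is invented for the example, since the real feed's contents are not shown in this thread. The name "entry" is used for the loop variable so that nothing else reuses it.

```python
import xml.etree.ElementTree as ET

# Invented sample feed (an assumption, not the real bl.ocks.org data).
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Channel title</title>
<link>http://example.com/</link>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE_RSS)

# Searching from the root: every pass finds the same first <title>
# in the document, regardless of which item the loop is on.
from_root = []
for entry in root.iter('item'):
    from_root.append(root.find('.//title').text)

# Searching from the current item: each pass finds that item's own tags.
from_item = []
for entry in root.iter('item'):
    from_item.append((entry.find('title').text, entry.find('link').text))

print(from_root)
print(from_item)
```

In the Beautiful Soup code from the original post, the analogous change would be item.find('title') in place of soup.find('title'), as suggested above.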
 
tad na

I am trying to get all the element data from the rss below.

The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code....
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from bs4 import BeautifulSoup
soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
for item in soup.find_all('item'):
#for item in soup:
    title = soup.find('title').text
    link = soup.find('link').text
    item = soup.find('item').text
    print item
    print title
    print link
OK . second problem :)
I can print the date. not sure how to do this one..
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
data = soup.find_all("item")

x=0
for item in soup.find_all('item'):
    title = item.find('title').text
    link = item.find('link').text
    date = item.find('pubDate')
    # print date
    print('+++++++++++++++++')
    print data[x].title.text
    print data[x].link.text
    print data[x].guid.text
    print data[x].pubDate
    x = x + 1
 
tad na

Correction: I meant to say that I CANNOT print the date.
 
Ian Kelly

OK . second problem :)
I can print the date. not sure how to do this one..

Why not? What happens when you try?
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
data = soup.find_all("item")

x=0
for item in soup.find_all('item'):
    title = item.find('title').text
    link = item.find('link').text
    date = item.find('pubDate')
    # print date
    print('+++++++++++++++++')
    print data[x].title.text
    print data[x].link.text
    print data[x].guid.text
    print data[x].pubDate
    x = x + 1

data[x] should be the same object as item, no? If you want to keep track of
the current iteration index, a cleaner way to do that is by using enumerate:

for x, item in enumerate(soup.find_all('item')):

As far as printing the pubDate goes, why not start by getting its text
property as you do with the other tags? From there you can either print the
string out directly or parse it into a datetime object.
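
A minimal sketch of both suggestions, enumerate for the index and parsing the date text into a datetime object, might look like the following. It rests on several assumptions: it uses the standard library's xml.etree.ElementTree rather than Beautiful Soup so that it runs on its own, the feed text (SAMPLE_RSS) and its dates are invented, and it uses email.utils.parsedate, which reads the RFC 2822 style dates that RSS feeds normally carry but ignores the time zone offset.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate

# Invented sample feed (an assumption, not the real bl.ocks.org data).
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>First</title><pubDate>Tue, 03 Jun 2014 10:00:00 +0000</pubDate></item>
<item><title>Second</title><pubDate>Wed, 04 Jun 2014 11:30:00 +0000</pubDate></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE_RSS)

rows = []
# enumerate supplies the index, so no manual counter is needed.
for index, entry in enumerate(root.iter('item')):
    title = entry.find('title').text
    date_text = entry.find('pubDate').text
    # parsedate returns a time tuple; its first six fields are
    # year, month, day, hour, minute, second (offset is dropped).
    published = datetime(*parsedate(date_text)[:6])
    rows.append((index, title, published))
    print("%d %s %s" % (index, title, published.isoformat()))
```

If the offset matters, email.utils.parsedate_tz returns it as an extra field; that is left out here to keep the sketch short.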
 
Mark Lawrence

I've snipped the bulk of the message, but imagine what the above looks
like when it's been back and forth through gg a few times: it's
effectively unreadable.
 
tad na

OK . second problem :)
I can print the date.  not sure how to do this one..
Why not? What happens when you try?
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
data = soup.find_all("item")
x=0
for item in soup.find_all('item'):
    title = item.find('title').text
    link = item.find('link').text
    date = item.find('pubDate')
   # print date
    print('+++++++++++++++++')
    print data[x].title.text
    print data[x].link.text
    print data[x].guid.text
    print data[x].pubDate
    x = x + 1
data[x] should be the same object as item, no? If you want to keep track of the current iteration index, a cleaner way to do that is by using enumerate:
    for x, item in enumerate(soup.find_all('item')):
As far as printing the pubDate goes, why not start by getting its text property as you do with the other tags? From there you can either print the string out directly or parse it into a datetime object.

This is the error I get with
1. print data[x].pubDate.text
AttributeError: 'NoneType' object has no attribute 'text'
2. print data[x].pubDate
It results in "None"
 
Ian Kelly

This is the error I get with
1. print data[x].pubDate.text
AttributeError: 'NoneType' object has no attribute 'text'
2. print data[x].pubDate
It results in "None"

So the problem is that it's not even finding the pubDate tag in the first
place. Some sites on the Web suggest that Beautiful Soup normalizes all
tag names to lowercase; try looking for the pubdate tag instead.
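
The lowercasing can be seen without Beautiful Soup at all. The sketch below feeds a fragment to the standard library's HTMLParser, which Beautiful Soup's "html.parser" tree builder is based on, and records the tag names it reports. Whether this is the parser in use on the original poster's machine is an assumption (the post does not say which parser was selected), and the fragment and the TagNameCollector class are made up for the example.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class TagNameCollector(HTMLParser):
    """Hypothetical helper: records each start tag name as the parser reports it."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)


collector = TagNameCollector()
# Invented fragment with a mixed-case tag name, as in an RSS item.
collector.feed("<item><pubDate>Tue, 03 Jun 2014 10:00:00 +0000</pubDate></item>")
collector.close()

tag_names = collector.names
print(tag_names)
```

The mixed-case pubDate comes back as pubdate, which would explain why a lookup for pubDate returns None when an HTML parser is used, and why item.find('pubdate') is worth trying.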
 
