Beautiful Soup iterator question....

cjl

P:

I am screen-scraping a table. The table has an unknown number of rows,
but each row has exactly 8 cells. I would like to extract the data
from the cells, but the first three cells in each row have their data
nested inside other tags.

So I have the following code:

for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.contents[0]

This code prints out all the data, but of course the first three cells
still contain their unwanted tags.

I would like to do something like this:

for cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 in row.findAll("td"):

Then treat each cell differently.

I can't figure this out. Can anyone point me in the right direction?

-CJL
 
Steve Holden

Did you try something like this (untested)?

cell1, cell2, cell3, cell4, cell5, \
cell6, cell7, cell8 = row.findAll("td")

No need for the "for" if you want to handle each cell differently; you
won't be iterating over them. And, as you saw, the "for" version doesn't
work unless row.findAll(...) returns a sequence of eight-item containers.
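
A minimal sketch of that difference, using plain lists as stand-ins for what row.findAll("td") returns. The cell values and the names cells, rows, first_cells and unpack_failed are made up for illustration and are not from the original post:

```python
# Plain list standing in for the result of row.findAll("td") (assumption).
cells = ["a", "b", "c", "d", "e", "f", "g", "h"]

# Direct unpacking works because the sequence has exactly eight items.
cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 = cells

# Unpacking in a "for" header needs each *item* to be an eight-item
# sequence, for example when looping over a list of rows.
rows = [cells, [s.upper() for s in cells]]
first_cells = []
for c1, c2, c3, c4, c5, c6, c7, c8 in rows:
    first_cells.append(c1)

# Unpacking into the wrong number of names raises ValueError.
try:
    x, y, z = cells
except ValueError:
    unpack_failed = True
else:
    unpack_failed = False
```

So the plain assignment fits a single row of eight cells, while the "for" form only fits a sequence whose items are themselves eight-item sequences.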

regards
Steve
 
Paul McGuire

One defensive approach for rows that might have too few or too many
cells is to pad the list so it is always long enough, and then slice
off exactly the number of elements you need.

cell1, cell2, cell3, cell4, cell5, \
cell6, cell7, cell8 = (row.findAll("td") + [None]*8)[:8]
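
The same pad-and-slice idea can be wrapped in a small helper. This is only a sketch: pad_cells, width and fill are hypothetical names, and plain lists stand in for the result of row.findAll("td"):

```python
def pad_cells(cells, width=8, fill=None):
    """Return exactly `width` items: short rows are padded with `fill`,
    long rows are truncated. (Hypothetical helper, not part of Beautiful Soup.)"""
    return (list(cells) + [fill] * width)[:width]


# A short row is padded out to eight items.
short_row = pad_cells(["a", "b", "c"])

# A long row is cut down to the first eight items.
long_row = pad_cells(list(range(12)))

# Either result can be unpacked into eight names without a ValueError.
cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 = short_row
```

Padded positions come back as None (or whatever fill is), so code that processes each cell has to check for that before touching the cell's contents.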

-- Paul
 
