complex list slices

William Meyer

Hi,

I have a list of rows, each of which contains a list of cells (from an HTML
table), and I want to create an array of logical row groups (i.e. group rows by
the rowspan). I am only concerned with checking the rowspan of specific columns,
so that makes it easier, but I am having trouble implementing it in Python. In
Perl/C I could use a for loop and modify the control variable as I walked the
array:

my (@rowgroups);
for (my ($i) = 0; $i <= $#rows; $i++) {
    my ($rowspan) = $rows[$i][0]->attr("rowspan") || 1;
    $rowspan--; # 0 indexed

    # store each group as an array reference holding a slice of @rows
    push @rowgroups, [ @rows[$i .. $i+$rowspan] ];

    $i += $rowspan;
}

but in python I can only come up with this:

rowgroups = []
rowspan = 0
for i in rows:
    if rowspan > 0:
        rowspan -= 1
        continue

    rowspan = rows[j][0]["rowspan"] or 1
    rowgroups.append(rows[ rows.index(i) : rows.index(i) + rowspan ])

    rowspan -= 1

I really don't like this solution. Any ideas?
 
shandy.b

A couple questions:

1- what is j?
2- what does the rows[x][y] object look like? I assume it's a dict
that has a "rowspan" key. Can rows[x][y]["rowspan"] sometimes be 0?

Perhaps you're looking for something like this:
rowgroups = []
rowspan = 0
for i in range( len(rows) ):
    if rowspan <= 0:
        rowspan = rows[j][0]["rowspan"]
        if rowspan == 0:
            rowspan = 1

        rowgroups.append(rows[ i : i + rowspan ])

    rowspan -= 1

-sjbrown
 
johnzenger

Python lets you iterate through a list using an integer index, too,
although if you do so we will make fun of you. You can accomplish it
with a while loop, as in:

i = 0
while i < len(rows):
    if rows[i] == "This code looks like BASIC without the WEND, doesn't it?":
        rowgroups.append("Pretty much.")
    i += 1  # or i += rowspan, whatever.

Do not try to do this with a for loop. In Python, "for i in xrange(5)"
is more like a "foreach $i (0, 1, 2, 3, 4)" in Perl, so changing i in
the loop will not change the value of i on the next loop iteration.
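
To make the point about loop variables concrete, here is a minimal sketch. It uses Python 3 syntax, so range stands in for the xrange mentioned above, and the names visited and skipped are made up for illustration.

```python
# Reassigning the loop variable inside a for loop only rebinds the name.
# The iterator behind the loop hands out the next value regardless.
visited = []
for i in range(5):
    visited.append(i)
    i += 2  # has no effect on the next iteration

# A while loop owns its index, so skipping ahead works as expected.
skipped = []
i = 0
while i < 5:
    skipped.append(i)
    i += 2

print(visited)
print(skipped)
```

The for loop still visits every value, while the while loop visits every second one, which is the behaviour needed to jump over the rows covered by a rowspan.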
 
William Meyer

shandy.b wrote:
A couple questions:

1- what is j?
2- what does the rows[x][y] object look like? I assume it's a dict
that has a "rowspan" key. Can rows[x][y]["rowspan"] sometimes be 0?

Perhaps you're looking for something like this:
rowgroups = []
rowspan = 0
for i in range( len(rows) ):
    if rowspan <= 0:
        rowspan = rows[j][0]["rowspan"]
        if rowspan == 0:
            rowspan = 1

        rowgroups.append(rows[ i : i + rowspan ])

    rowspan -= 1

-sjbrown

Oops, typo: rows[j] should be i. In my original code that line should read:
rowspan = i[0]["rowspan"]

rows is a list of lists which contain td tag objects from Beautiful Soup, and
rowspan is a tag attribute. Your solution is clearer than mine, but the approach
is the same. Is there finer-grained control of loops in Python? (Or, in a
perfect world, a way to group rows this way with list comprehensions?)

Rowspan can be None, meaning 1, and it can be zero. Zero means the row extends
from the current cell until the end of the table. That isn't used much, and not
at all in the table I am scraping.
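
One way to keep those rowspan rules in a single place is a small helper, sketched here in Python 3. The function name effective_rowspan and the rows_remaining parameter are hypothetical. The int() conversion assumes the attribute may arrive as a string such as "2", which is worth checking against what the parser actually returns. Clamping to the rows that are left is an extra safety measure, not something from the thread.

```python
def effective_rowspan(raw, rows_remaining):
    """Turn a raw rowspan attribute value into a usable row count.

    raw            -- the attribute value: None, an int, or a numeric string
    rows_remaining -- rows from the current one to the end of the table,
                      counting the current row
    """
    if raw is None:
        return 1                      # missing attribute means a span of 1
    span = int(raw)
    if span == 0:
        return rows_remaining         # 0 means "extend to the end of the table"
    return min(span, rows_remaining)  # assumption: never run past the last row
```

The grouping loop can then call the helper once per group and advance its index by the returned count.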
 
William Meyer

johnzenger wrote:
Python lets you iterate through a list using an integer index, too,
although if you do so we will make fun of you. You can accomplish it
with a while loop, as in:

i = 0
while i < len(rows):
    if rows[i] == "This code looks like BASIC without the WEND, doesn't it?":



Ahh, that would work. Yeah, it's really ugly too. I will probably just use
shandy.b's suggestion squirreled away in a method. I shouldn't even care; it's
just one extra conditional per row.
 
johnzenger

Although I don't know if this is faster or more efficient than your
current solution, it does look cooler:

def grouprows(inrows):
    rows = []
    rows[:] = inrows  # makes a copy because we're going to be deleting
    while len(rows) > 0:
        rowspan = rows[0]["rowspan"]
        yield rows[0:rowspan]  # "returns" this value, but control flow unaffected
        del rows[0:rowspan]  # remove what we just returned from the list, and loop

grouper = grouprows(copyrows)
print [x for x in grouper]

This is basically just a simple function that rips chunks off the front
of a list and returns them. Because the function uses "yield" rather
than "return," it becomes a generator, which can be treated by Python
as an iterator.
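
For anyone who wants to try the generator without an HTML parser, here is a self-contained sketch in Python 3 syntax. Plain dicts stand in for the tag objects, and the "name" key and the sample list are invented for the example. It also assumes every rowspan is a positive integer, since a value of 0 would make the loop yield empty slices forever.

```python
def grouprows(inrows):
    rows = list(inrows)           # copy, because chunks are deleted below
    while rows:
        rowspan = rows[0]["rowspan"]
        yield rows[:rowspan]      # hand back the next group, then resume here
        del rows[:rowspan]        # drop the rows that were just yielded


# Hypothetical data: each dict plays the role of the first cell of a row.
sample = [
    {"name": "a", "rowspan": 2},
    {"name": "b", "rowspan": 1},
    {"name": "c", "rowspan": 1},
    {"name": "d", "rowspan": 3},
    {"name": "e", "rowspan": 1},
    {"name": "f", "rowspan": 1},
]

groups = [[row["name"] for row in group] for group in grouprows(sample)]
print(groups)
```

Only the first row of each group decides the group size; the rowspan values of the rows it swallows are never read.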
 
William Meyer

johnzenger wrote:
Although I don't know if this is faster or more efficient than your
current solution, it does look cooler:

def grouprows(inrows):
    rows = []
    rows[:] = inrows  # makes a copy because we're going to be deleting
    while len(rows) > 0:
        rowspan = rows[0]["rowspan"]
        yield rows[0:rowspan]  # "returns" this value, but control flow unaffected
        del rows[0:rowspan]  # remove what we just returned from the list, and loop

grouper = grouprows(copyrows)
print [x for x in grouper]

Wow, I think this is much better than my solution. And you can easily call it
for subgroups:

grouper = grouprows(rows)
for x in grouper:
    grouperTwo = grouprows(x)
    for y in grouperTwo:
        pass  # process each subgroup here

Do I need to copy the list in the iterator (if I am not planning on using rows
again)? The reference count for the list members will get bumped on yield, right?
 
johnzenger

You don't need to copy the list; but if you don't, your original list
will be emptied.
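
A small sketch of what "emptied" means in practice, again in Python 3 syntax. The name grouprows_inplace and the table list are hypothetical; the function is the same generator with the copy step left out.

```python
def grouprows_inplace(rows):
    # No copy is made: the caller's list is consumed chunk by chunk.
    while rows:
        rowspan = rows[0]["rowspan"]
        yield rows[:rowspan]   # the slice is a new list, so the group survives
        del rows[:rowspan]     # this mutates the caller's list


table = [{"rowspan": 2}, {"rowspan": 1}, {"rowspan": 1}]
groups = list(grouprows_inplace(table))

print(len(groups))
print(table)
```

The yielded groups stay intact because slicing builds a new list that references the same row objects, while the list that was passed in ends up empty.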

len(rows) is recalculated each time the while loop begins. Now that I
think of it, "rows != []" is faster than "len(rows) > 0."

By the way, you can also do this using (gasp) a control index:

def grouprows(rows):
    index = 0
    while len(rows) > index:
        rowspan = rows[index]["rowspan"]
        yield rows[index:rowspan + index]
        index += rowspan

....which kind of brings us back to where we started.
 
Felipe Almeida Lessa

On Tue, 2006-02-28 at 09:10 -0800, (e-mail address removed) wrote:
Although I don't know if this is faster or more efficient than your
current solution, it does look cooler: [snip]
print [x for x in grouper]

This is not cool. Do

print list(grouper)
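
A quick sketch of why the two spellings are interchangeable, in Python 3 syntax. The countdown generator is a stand-in invented for the example. It also shows a related point that is easy to trip over: a generator can only be consumed once.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


via_list = list(countdown(3))
via_comprehension = [x for x in countdown(3)]  # same result, more typing

gen = countdown(3)
first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted here

print(via_list, via_comprehension, first_pass, second_pass)
```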

--
"One who excels at employing military force subjugates other peoples'
armies without engaging in battle, captures other peoples' fortified
cities without attacking them, and destroys other peoples' states without
prolonged fighting. He must fight under Heaven with the paramount aim of
'preservation'. Thus his weapons will not become dulled, and the gains
can be preserved. This is the strategy for planning offensives."

-- Sun Tzu, in "The Art of War"
 
Fredrik Lundh

(e-mail address removed):
Len(rows) recalculates each time the while loop begins. Now that I
think of it, "rows != []" is faster than "len(rows) > 0."

the difference is very small, and "len(rows)" is faster than "rows != []"
(the latter creates a new list for each test).

and as usual, using the correct Python spelling ("rows") is a lot faster
(about three times in 2.4, according to timeit).

</F>
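
Anyone who wants to repeat that measurement on a current interpreter could start from a sketch like the following, in Python 3 syntax. The sample list and the iteration count are arbitrary choices, and the numbers will differ between machines and Python versions, so no expected output is shown.

```python
import timeit

rows = [{"rowspan": 1}] * 10  # hypothetical non-empty sample list

# The three emptiness tests discussed above, each wrapped in an if statement.
tests = {
    "len(rows) > 0": "if len(rows) > 0: pass",
    "rows != []": "if rows != []: pass",
    "rows": "if rows: pass",
}

timings = {
    label: timeit.timeit(stmt, globals={"rows": rows}, number=100_000)
    for label, stmt in tests.items()
}

# Print the fastest spelling first.
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{label:15s} {seconds:.4f}s")
```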
 
