split an iteration

Robin Becker · Mar 31, 2005

This function from texlib in oedipus.sf.net is a real cpu hog and I determined
to see if it could be optimized.

def add_active_node(self, active_nodes, node):
"""Add a node to the active node list.
The node is added so that the list of active nodes is always
sorted by line number, and so that the set of (position, line,
fitness_class) tuples has no repeated values.
"""

index = 0

# Find the first index at which the active node's line number
# is equal to or greater than the line for 'node'. This gives
# us the insertion point.
while (index < len(active_nodes) and
active_nodes[index].line < node.line):
index = index + 1

insert_index = index

# Check if there's a node with the same line number and
# position and fitness. This lets us ensure that the list of
# active nodes always has unique (line, position, fitness)
# values.
while (index < len(active_nodes) and
active_nodes[index].line == node.line):
if (active_nodes[index].fitness_class == node.fitness_class and
active_nodes[index].position == node.position):
# A match, so just return without adding the node
return

index = index + 1

active_nodes.insert(insert_index, node)

I determined that len was being called a large number of times so did a first
rewrite as (minus comments)

def add_active_node(self, active_nodes, node):
index = 0
nan = len(active_nodes)
node_line = node.line
node_fitness_class = node.fitness_class
node_position = node.position

while index < nan and active_nodes[index].line<node_line:
index += 1

insert_index = index

while index<nan and active_nodes[index].line==node_line:
if (active_nodes[index].fitness_class==node_fitness_class and
active_nodes[index].position==node_position):
return

index += 1

active_nodes.insert(insert_index, node)

which gives a good speedup even so I decided to eliminate the index += 1 in the
first while loop which results in

def add_active_node(self, active_nodes, node):
nan = len(active_nodes)
node_line = node.line
node_fitness_class = node.fitness_class
node_position = node.position

insert_index = nan
for index, a in enumerate(active_nodes):
if a.line>=node_line:
insert_index = index
break
index = insert_index

while index<nan and active_nodes[index].line==node_line:
if (active_nodes[index].fitness_class==node_fitness_class and
active_nodes[index].position==node_position):
return

index += 1

active_nodes.insert(insert_index, node)

and this change also results in a significant speedup and is I believe is still
equivalent. I'm not sure exactly why this is faster than the while loop, but it is.

However, it seems harder to get the same speedup for the last while loop; that
loop is probably not such a problem so it's not terribly important.

Is there a fast way to get enumerate to operate over a slice of an iterable?

Peter Otten · Mar 31, 2005

Robin said:
Is there a fast way to get enumerate to operate over a slice of an
iterable?

I think you don't need that here:

e = enumerate(active_nodes)
for insert_index, a in e:
# ...

for index, a in e:
# ...

Peter

Raymond Hettinger · Mar 31, 2005

[Robin Becker]

This function from texlib in oedipus.sf.net is a real cpu hog and I determined
to see if it could be optimized.

def add_active_node(self, active_nodes, node):
"""Add a node to the active node list.
The node is added so that the list of active nodes is always
sorted by line number, and so that the set of (position, line,
fitness_class) tuples has no repeated values.
"""

If you can change the data structure to be an actual list of tuples, then the
bisect module can be used directly:

insert_index = bisect.bisect_left(active_nodes, node)
if active_nodes[insert_index] == node:
return # avoid creating a duplicate entry
active_nodes.insert(insert_index, node)

If the data structure cannot be changed to tuples, then try adding a custom
compare operation to the node class:

def __cmp__(self, other):
return cmp((self.line, self.position, self.fitness_class),
(other.line, other.position, other.fitness_class))

insert_index = nan
for index, a in enumerate(active_nodes):
if a.line>=node_line:
insert_index = index
break
index = insert_index

This loop can be squeezed a bit more using itertools.imap() and
operator.attrgetter() for the attribute lookup:

for index, aline in enumerate(imap(attrgetter('line'), active_nodes):
if aline > node_line:
. . .

Is there a fast way to get enumerate to operate over a slice of an iterable?

enumerate(s) is the same as izip(count(), s).
So, to start from position i, write:

for index, value in izip(count(i), s[i:]):
. . .

That being said, your best bet is to eliminate the initial linear search which
is likely consuming most of the clock cycles.

Also, I noticed that the code does not reference self. Accordingly, it is a
good candidate for being a staticmethod or standalone function.

Raymond Hettinger

Robin Becker · Mar 31, 2005

Peter said:
Robin Becker wrote:

I think you don't need that here:

e = enumerate(active_nodes)
for insert_index, a in e:
# ...

for index, a in e:
# ...

Peter

I tried your solution, but I think we miss the split element

eg for

>>> e = enumerate([0,1,2,3,4,5])
>>> for i,a in e:

Click to expand...

Click to expand...

.... if a==3: break
........ print i,a
....
4 4
5 5
I think the second loop needs to start at 3 ie the split needs to be
start, limit semantics

It would be nice to be able to fix it with a move back method.

Robin Becker · Mar 31, 2005

Raymond said:
[Robin Becker]

This function from texlib in oedipus.sf.net is a real cpu hog and I determined
to see if it could be optimized.

def add_active_node(self, active_nodes, node):
"""Add a node to the active node list.
The node is added so that the list of active nodes is always
sorted by line number, and so that the set of (position, line,
fitness_class) tuples has no repeated values.
"""

Click to expand...

If you can change the data structure to be an actual list of tuples, then the
bisect module can be used directly:

This is a way forward and I think is doable. The original Knuth algo
used only global integer arrays. The actual insert point depends on the
line attribute only which would be the first tuple element. So
apparently we always insert non-identical line breaks at the beginning
of their line group; I think we can do the insert check and then a bit
of checking to find the actual insert point using bisect_right handwave
handwave.....

insert_index = bisect.bisect_left(active_nodes, node)
if active_nodes[insert_index] == node:
return # avoid creating a duplicate entry
active_nodes.insert(insert_index, node)

If the data structure cannot be changed to tuples, then try adding a custom
compare operation to the node class:

def __cmp__(self, other):
return cmp((self.line, self.position, self.fitness_class),
(other.line, other.position, other.fitness_class))

insert_index = nan
for index, a in enumerate(active_nodes):
if a.line>=node_line:
insert_index = index
break
index = insert_index

Click to expand...

This loop can be squeezed a bit more using itertools.imap() and
operator.attrgetter() for the attribute lookup:

for index, aline in enumerate(imap(attrgetter('line'), active_nodes):
if aline > node_line:
. . .

Is there a fast way to get enumerate to operate over a slice of an iterable?

Click to expand...

enumerate(s) is the same as izip(count(), s).
So, to start from position i, write:

for index, value in izip(count(i), s[i:]):
. . .

That being said, your best bet is to eliminate the initial linear search which
is likely consuming most of the clock cycles.

Also, I noticed that the code does not reference self. Accordingly, it is a
good candidate for being a staticmethod or standalone function.

Raymond Hettinger

Peter Otten · Apr 1, 2005

Robin said:
eg for

e = enumerate([0,1,2,3,4,5])
for i,a in e:

Click to expand...

Click to expand...

... if a==3: break
...... print i,a
...
4 4
5 5
I think the second loop needs to start at 3 ie the split needs to be
start, limit semantics

It would be nice to be able to fix it with a move back method.

I have to reread your previous post, it seems. Meanwhile:
.... if a == 3:
.... for i, a in itertools.chain([(i, a)], e):
.... print i, a
.... break
....
3 3
4 4
5 5

Nesting the loops is necessary to handle empty lists and lists with no
matching item correctly. Alternatively, you could set a 'found' flag.

Another option:
.... print i, a
....
3 3
4 4
5 5

The extra function call might have a negative impact on performance, though.

Peter

palindrome iteration	53	Aug 27, 2010
Single put routine overlapping words during iteration	4	Jan 2, 2023
Trying to build a SARIMAX model to forecast the S&P500 trend	0	Nov 5, 2023
Z-Index/Drop-down menu issues	2	Jun 4, 2023
When draggable=false & draggable=true dont work ?	1	Jan 14, 2023
set and dict iteration	42	Aug 16, 2012
How to stop an animated image from sticking to top of page?	2	Feb 17, 2022
Implementing Many Stacks in the Same Program	1	Aug 10, 2021

split an iteration

Robin Becker

Peter Otten

Raymond Hettinger

Robin Becker

Robin Becker

Peter Otten

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads