Robin Becker
I'm trying to get my head round a timing puzzle I found whilst optimizing A.
Kuchling's version of the Knuth line-splitting algorithm from the oedipus
project. The puzzle, in highly abstract form, is as follows (here
active_nodes is a list of objects kept in sorted order; there are no
special comparison methods on the objects):
for i in xrange(m):
    .......
    for A in active_nodes[:]:
        ......
        if cond:
            active_nodes.remove(A)
        ......
    .....
is slow: my test takes 2.7 seconds, and profiling reveals we're calling
active_nodes.remove thousands of times and that it accounts for much of the
run time. There appear to be no other usages of active_nodes inside the
for A loop.
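For reference, here's my mental model of what remove costs (a sketch of
the semantics, not the actual CPython source):

    def remove(lst, value):
        # linear scan comparing each element with == until a match
        i = lst.index(value)
        # then an O(n - i) shift of every element after position i
        del lst[i]

so calling it once per condemned node inside the scan makes the whole
pass quadratic.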
On a hunch I changed the above to
removed = []
aremoved = removed.append
for i in xrange(m):
    .......
    for iA, A in enumerate(active_nodes):
        ......
        if cond:
            aremoved(iA)
        ......
    if removed:
        for iA in reversed(removed):
            del active_nodes[iA]
        del removed[:]
    ......
This latter is 5 times faster, but I can't really see why. My only guess is
that appending an index to the removed list is cheap compared with a delete,
and that the reverse-order delete moves less data about (deleting from the
back first also leaves the remaining indices valid, since nothing in front of
the deleted slot shifts). I think in the original Knuth the deletion would
have been from a real linked list and therefore O(1), whereas here each
deletion shifts about n/2 elements on average, i.e. it is O(n).
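For comparison, here is a minimal sketch of the constant-time unlinking I
imagine the original relies on (a hypothetical Node class, not Kuchling's
actual code):

    class Node(object):
        __slots__ = ('value', 'prev', 'next')
        def __init__(self, value):
            self.value = value
            self.prev = self.next = None

    def unlink(node):
        # splice the node out of the chain: O(1), nothing is shifted
        if node.prev is not None:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None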
On the other hand, we have to keep this list in sorted order. What data
structure should I be using? I should add that I tried giving the objects a
__cmp__ method to assist in doing the sorted insert, but that interfered
with the simple active_nodes.remove, presumably because remove locates items
by == and __cmp__ changes what counts as equal.
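One idea I'm toying with (a sketch that assumes each node carries a numeric
sort key, here the hypothetical attribute demerits): store (key, node)
pairs, so that bisect can do the sorted insert with plain tuple comparison,
the nodes need no __cmp__ at all, and pruning becomes a single rebuilding
pass instead of per-item deletes:

    import bisect

    active_nodes = []   # (key, node) pairs, kept sorted by key

    def add_node(node):
        # tuples compare element-wise, so ordering comes from the key;
        # on a tie Python 2 falls back to its default object ordering
        bisect.insort(active_nodes, (node.demerits, node))

    def prune(cond):
        # one O(n) pass rebuilding the list in place,
        # instead of an O(n) delete per condemned node
        active_nodes[:] = [p for p in active_nodes if not cond(p[1])]

With the slice assignment each pruned node costs O(1) amortized, and insort
keeps the order at O(log n) search plus an O(n) insertion shift.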