Robin Becker
I'm trying to get my head round a timing puzzle I found whilst optimizing A.
Kuchling's version of the Knuth line-splitting algorithm from the oedipus
project. The puzzle, in highly abstract form, is as follows (here
active_nodes is a list of objects kept in sorted order; there are no
special comparison methods on the objects):
for i in xrange(m):
    .......
    for A in active_nodes[:]:
        ......
        if cond:
            active_nodes.remove(A)
        ......
    .....
is slow: my test takes 2.7 seconds, and profiling reveals we're calling
active_nodes.remove thousands of times and that it accounts for much of the
run time. There appear to be no other usages of active_nodes inside the
for A loop.
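For reference, here's my mental model of what remove costs (a sketch of
the semantics, not the actual CPython source):

    def remove(lst, value):
        # linear scan comparing each element with == until a match
        i = lst.index(value)
        # then an O(n - i) shift of every element after position i
        del lst[i]

so calling it once per condemned node inside the scan makes the whole
pass quadratic.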
On a hunch I changed the above to
removed = []
aremoved = removed.append
for i in xrange(m):
    .......
    for iA, A in enumerate(active_nodes):
        ......
        if cond:
            aremoved(iA)
        ......
    if removed:
        for iA in reversed(removed):
            del active_nodes[iA]
        del removed[:]
    ......
This latter is 5 times faster, but I can't really see why. My only guess is
that appending an index to the removed list is cheap compared with a delete,
and that the reverse-order delete moves less data about (deleting from the
back first also leaves the remaining indices valid, since nothing in front of
the deleted slot shifts). I think in the original Knuth the deletion would
have been from a real linked list and therefore O(1), whereas here each
deletion shifts about n/2 elements on average, i.e. it is O(n).
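For comparison, here is a minimal sketch of the constant-time unlinking I
imagine the original relies on (a hypothetical Node class, not Kuchling's
actual code):

    class Node(object):
        __slots__ = ('value', 'prev', 'next')
        def __init__(self, value):
            self.value = value
            self.prev = self.next = None

    def unlink(node):
        # splice the node out of the chain: O(1), nothing is shifted
        if node.prev is not None:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None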
On the other hand, we have to keep this list in sorted order. What data
structure should I be using? I should add that I tried giving the objects a
__cmp__ method to assist in doing the sorted insert, but that interfered
with the simple active_nodes.remove, presumably because remove locates items
by == and __cmp__ changes what counts as equal.
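One idea I'm toying with (a sketch that assumes each node carries a numeric
sort key, here the hypothetical attribute demerits): store (key, node)
pairs, so that bisect can do the sorted insert with plain tuple comparison,
the nodes need no __cmp__ at all, and pruning becomes a single rebuilding
pass instead of per-item deletes:

    import bisect

    active_nodes = []   # (key, node) pairs, kept sorted by key

    def add_node(node):
        # tuples compare element-wise, so ordering comes from the key;
        # on a tie Python 2 falls back to its default object ordering
        bisect.insort(active_nodes, (node.demerits, node))

    def prune(cond):
        # one O(n) pass rebuilding the list in place,
        # instead of an O(n) delete per condemned node
        active_nodes[:] = [p for p in active_nodes if not cond(p[1])]

With the slice assignment each pruned node costs O(1) amortized, and insort
keeps the order at O(log n) search plus an O(n) insertion shift.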