Timing Difference: insert vs. append & reverse

John Keeling

Dear all,
I tried the test program below. My interest is to examine timing
differences between insert vs. append & reverse for a list. My results
on my XP Python 2.3.4 are as follows:
time_reverse 0.889999389648
time_insert 15.7750005722
Over multiple runs ... the time taken to insert at the head of a list,
vs. the time taken to append to a list and then reverse it is
typically 16 or 17 times longer.
I would have expected the insert operation to be faster than the
combined append & reverse operations. Is this behaviour surprising, or
is there a good reason why there is this large performance difference?
Thanks,
John

from time import time

def test():
    time_reverse = 0.0
    time_insert = 0.0

    for lindx in xrange(100):
        tmp1 = []
        time_reverse -= time()
        for indx in xrange(10000):
            tmp1.append(indx)
        tmp1.reverse()
        time_reverse += time()

        tmp2 = []
        time_insert -= time()
        for indx in xrange(10000):
            tmp2.insert(0, indx)
        time_insert += time()
        assert tmp1 == tmp2

    print "time_reverse ", time_reverse
    print "time_insert ", time_insert

test()
 
Duncan Booth

John Keeling wrote:
I tried the test program below. My interest is to examine timing
differences between insert vs. append & reverse for a list. My results
on my XP Python 2.3.4 are as follows:
time_reverse 0.889999389648
time_insert 15.7750005722
Over multiple runs ... the time taken to insert at the head of a list,
vs. the time taken to append to a list and then reverse it is
typically 16 or 17 times longer.
I would have expected the insert operation to be faster than the
combined append & reverse operations. Is this behaviour surprising, or
is there a good reason why there is this large performance difference?

A list is implemented by an array of pointers to the objects it contains.

Every time you call 'insert(0, indx)', all of the pointers already in the
list have to be moved up one position before the new one can be inserted
at the beginning.

When you call 'append(indx)' the pointers only have to be copied if there
isn't enough space in the currently allocated block for the new element. If
there is space then there is no need to copy the existing elements, just
put the new element on the end and update the length field. Whenever a new
block does have to be allocated that particular append will be no faster
than an insert, but some extra space will be allocated just in case you do
wish to extend the list further.

If you expected insert to be faster, perhaps you thought that Python used a
linked-list implementation. It doesn't do this, because in practice (for
most applications) a list based implementation gives better performance.
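Duncan's explanation can be checked directly in modern Python with `timeit`; the following is a minimal sketch (function names are mine, and absolute numbers will vary by machine and interpreter version):

```python
import timeit

N = 10000

def build_by_insert():
    # Insert each element at the head: every insert memmoves all
    # existing pointers up one slot, so the loop is O(N**2) overall.
    items = []
    for i in range(N):
        items.insert(0, i)
    return items

def build_by_append_reverse():
    # Append is amortized O(1) and a single reverse is O(N),
    # so the whole loop is O(N).
    items = []
    for i in range(N):
        items.append(i)
    items.reverse()
    return items

t_insert = timeit.timeit(build_by_insert, number=5)
t_append = timeit.timeit(build_by_append_reverse, number=5)
print("insert(0, ...):   %.4fs" % t_insert)
print("append + reverse: %.4fs" % t_append)
```

On a current CPython the ratio is typically even larger than the 16-17x John measured on 2.3.4, since append has only gotten cheaper.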
 
Istvan Albert

John said:
I tried the test program below. My interest is to examine timing
differences between insert vs. append & reverse for a list.

But what you are measuring is the difference between
one particular insert (where the index is 0) and the
combination of two other builtin operators that (in this
particular case) have the same effect.

While your observation makes a good point, in the end
we should be pleased, not surprised, when we find ways to
speed up these specific cases.

Istvan.
 
Duncan Smith

John Keeling said:
Dear all,
I tried the test program below. My interest is to examine timing
differences between insert vs. append & reverse for a list. My results
on my XP Python 2.3.4 are as follows:
time_reverse 0.889999389648
time_insert 15.7750005722
Over multiple runs ... the time taken to insert at the head of a list,
vs. the time taken to append to a list and then reverse it is
typically 16 or 17 times longer.

Shouldn't that really be insert vs. reverse & append & reverse? :)

Duncan
 
John Keeling

Duncan,
A list is implemented by an array of pointers to the objects it contains.
That explains it.
If you expected insert to be faster, perhaps you thought that Python used a
linked-list implementation. It doesn't do this,

Yes, that is how I was thinking. It is interesting to see how
functions may perform very differently from how we may initially
expect. I've also noted an approximate two time difference between
"del list[index] and list.pop(index)". I know that they do different
things (the return value from pop), however in many places my code
uses pop and ignores the return value when it would be faster to use
del. I am making changes based on these observations now.
Thanks,
John
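John's del-vs-pop observation can also be reproduced with `timeit` in modern Python; a rough sketch (the exact ratio he saw is version-dependent, so no particular multiplier should be assumed):

```python
import timeit

# Both statements remove the last element; pop additionally performs
# an attribute lookup and a function call, and here its return value
# is simply discarded.
t_del = timeit.timeit("del items[-1]",
                      setup="items = list(range(200000))",
                      number=100000)
t_pop = timeit.timeit("items.pop()",
                      setup="items = list(range(200000))",
                      number=100000)
print("del items[-1]: %.4fs" % t_del)
print("items.pop():   %.4fs" % t_pop)
```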
 
Terry Reedy

John Keeling said:
expect. I've also noted an approximate two time difference between
"del list[index] and list.pop(index)".

Here is one way to get some insight into such things:

...     del lisp[index]

  0 SET_LINENO 1
  3 SET_LINENO 2
  6 LOAD_FAST 0 (lisp)
  9 LOAD_FAST 1 (index)
 12 DELETE_SUBSCR
 13 LOAD_CONST 0 (None)
 16 RETURN_VALUE

...     return lisp.pop(index)

  0 SET_LINENO 1
  3 SET_LINENO 2
  6 LOAD_FAST 0 (lisp)
  9 LOAD_ATTR 1 (pop)
 12 LOAD_FAST 1 (index)
 15 CALL_FUNCTION 1
 18 RETURN_VALUE
 19 LOAD_CONST 0 (None)
 22 RETURN_VALUE

You can be pretty sure that the specific opcode DELETE_SUBSCR (ipt) is
faster than the generic CALL_FUNCTION, which will eventually call (I
presume) the same C code. (The extra attribute lookup and load take some
extra time too, but the call should be the main culprit).

Terry J. Reedy
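Terry's comparison can be reproduced on a modern interpreter with the `dis` module (the function names below are mine, and opcode names have changed since 2.3: `SET_LINENO` is gone, and the call opcodes are named differently):

```python
import dis

def del_item(items, index):
    del items[index]

def pop_item(items, index):
    return items.pop(index)

# del compiles to the dedicated DELETE_SUBSCR opcode, while pop
# requires an attribute/method load followed by a general call.
dis.dis(del_item)
dis.dis(pop_item)
```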
 
John Keeling

It is so cool to be able to get the disassembly
so easily.... thanks v. much Terry.
Cheers,
John
 
Bryan Olson

Duncan said:
A list is implemented by an array of pointers to the objects it contains.

Every time you call 'insert(0, indx)', all of the pointers already in the
list have to be moved up once position before the new one can be inserted
at the beginning.

When you call 'append(indx)' the pointers only have to be copied if there
isn't enough space in the currently allocated block for the new element. If
there is space then there is no need to copy the existing elements, just
put the new element on the end and update the length field. Whenever a new
block does have to be allocated that particular append will be no faster
than an insert, but some extra space will be allocated just in case you do
wish to extend the list further.

If you expected insert to be faster, perhaps you thought that Python used a
linked-list implementation. It doesn't do this, because in practice (for
most applications) a [array] based implementation gives better performance.

True, but an array implementation can easily support amortized
constant-time insert/delete at *either* end (without breaking
constant-time indexing). The same trick of keeping extra space
at the tail end can work at the head; it requires keeping one
additional offset to show where the occupied cells start.
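Bryan's scheme can be sketched in pure Python: a backing array with spare slots at both ends, plus one offset marking where the occupied cells begin. This is a toy illustration of the idea, not how CPython's list is implemented:

```python
class OffsetList:
    """Toy array-backed list with spare capacity at both ends."""

    def __init__(self):
        self._buf = [None] * 8   # backing array with slack
        self._head = 4           # index of the first occupied cell
        self._size = 0

    def _grow(self):
        # Reallocate at double size with the occupied cells centered,
        # leaving spare slots at both the head and the tail.
        new = [None] * (2 * len(self._buf))
        head = len(self._buf) // 2
        new[head:head + self._size] = self._buf[self._head:self._head + self._size]
        self._buf, self._head = new, head

    def append(self, item):
        if self._head + self._size == len(self._buf):
            self._grow()
        self._buf[self._head + self._size] = item
        self._size += 1

    def appendleft(self, item):
        # Amortized O(1): no shifting unless the head slack is exhausted.
        if self._head == 0:
            self._grow()
        self._head -= 1
        self._buf[self._head] = item
        self._size += 1

    def __getitem__(self, i):
        # Indexing stays O(1): just add the head offset.
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._buf[self._head + i]

    def __len__(self):
        return self._size
```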
 
Terry Reedy

John Keeling said:
It is so cool to be able to get the disassembly
so easily.... thanks v. much Terry.

Since byte code is not (typically) written by hand, the symbolic names can
be long enough to be mostly self-explanatory, rather than short and cryptic.
If any are not, for you, the Lib Manual dis module section has a subsection
with all the codes explained.

TJR
 
Duncan Booth

Bryan Olson wrote:

If you expected insert to be faster, perhaps you thought that Python
used a linked-list implementation. It doesn't do this, because in
practice (for most applications) a [array] based implementation gives
better performance.

True, but an array implementation can easily support amortized
constant-time insert/delete at *either* end (without breaking
constant-time indexing). The same trick of keeping extra space
at the tail end can work at the head; it requires keeping one
additional offset to show where the occupied cells start.

If the OP had said he expected insert and append to be the same speed I
might buy that, but he expected insert to be faster than append.
 
John Keeling

Clarification...

Duncan Booth said:
If the OP had said he expected insert and append to be the same speed I
might buy that, but he expected insert to be faster than append.

Actually, I never said that ... I said "I would have expected the
insert operation to be faster than the combined append & reverse
operations." The object of my test code was to fill out a list in
the reverse order to which I had the list items available.
I would have expected:
tmp2 =[]
for indx in xrange(10000):
tmp2.insert(0, indx)

to be faster than

tmp1 =[]
for indx in xrange(10000):
tmp1.append(indx)
tmp1.reverse()

because the insert case does not require the reverse operation. This
would be the case if the insert operated at (approximately) the same
speed as the append, rather than the insert (at position 0) being
16-17 times slower than the append. I guess I somewhat expected it to
be as Bryan Olson stated:"an array implementation can easily support
amortized constant-time insert/delete at *either* end (without
breaking constant-time indexing)."

John
 
Bryan Olson

Duncan said:
Bryan Olson: [Duncan Booth had written:]
If you expected insert to be faster, perhaps you thought that Python
used a linked-list implementation. It doesn't do this, because in
practice (for most applications) a [array] based implementation gives
better performance.

True, but an array implementation can easily support amortized
constant-time insert/delete at *either* end (without breaking
constant-time indexing). The same trick of keeping extra space
at the tail end can work at the head; it requires keeping one
additional offset to show where the occupied cells start.

If the OP had said he expected insert and append to be the same speed I
might buy that but [...]

Hmmm ... let me clarify what I'm selling: We can make lists
perform efficiently as double-ended queues without breaking
anything or perceptibly hurting anyone's efficiency. Until this
thread, I hadn't realized that Python's lists are much slower
than Perl's in important cases:

http://perlmonks.thepen.com/17890.html


Would this be a PEP-size change, or just a small fix?
 
Tim Peters

[Bryan Olson]
Hmmm ... let me clarify what I'm selling: We can make lists
perform efficiently as double-ended queues without breaking
anything or perceptibly hurting anyone's efficiency. Until this
thread, I hadn't realized that Python's lists are much slower
than Perl's in important cases:

http://perlmonks.thepen.com/17890.html

Would this be a PEP-size change, or just a small fix?

"Futile" is closer -- it's been debated to death repeatedly. Python
2.4 adds a deque type instead, with O(1) insert/remove at both ends,
regardless of access pattern. That's O(1) per operation (not just
amortized O(1) -- the deque type never needs to move entries).
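The type Tim refers to shipped as `collections.deque`; a quick sketch of the both-ends operations on a current Python:

```python
from collections import deque

d = deque()
for i in range(5):
    d.appendleft(i)      # O(1) insert at the head, no pointer shifting

assert list(d) == [4, 3, 2, 1, 0]

d.append(99)             # O(1) at the tail as well
assert d.popleft() == 4
assert d.pop() == 99

# The trade-off versus a list: indexing near the middle of a deque
# is O(n), whereas list indexing is O(1).
```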
 
Bryan Olson

Tim said:
> [Bryan Olson]
>>[...] I hadn't realized that Python's lists are much slower
>>than Perl's in important cases:
>>
>> http://perlmonks.thepen.com/17890.html
>>
>>Would this be a PEP-size change, or just a small fix?
>
> "Futile" is closer -- it's been debated to death repeatedly. Python
> 2.4 adds a deque type instead, with O(1) insert/remove at both ends,
> regardless of access pattern. That's O(1) per operation (not just
> amortized O(1) -- the deque type never needs to move entries).

But it breaks (efficient) indexing. Since we can have it all
like Perl, why not?

Googling up "dequeue" together with "O(1)" in comp.lang.python
(and also the spelling "deque", which returns more results), I
see some discussion, but very little reason against the
enhancement. One post by Tim Peters suggests difficulty of
implementation as a major reason.
 
Tim Peters

[Bryan Olson, on complicating the list object again, and 2.4's deque type]
But it breaks (efficient) indexing. Since we can have it all
like Perl, why not?

As I said, the best Perl can do is O(1) amortized. 2.4's deque does
better than that, and that can be important in classes like Python's
Queue.Queue where threads typically block waiting for pops and pushes
(but has no use at all for random access).

If you follow the link you gave last time and read to the bottom
following the other links, you'll find that Perl lists had
quadratic-time behavior under the steady-state queue pattern for a
long time. That was eventually fixed -- or so they say. Small
details are both tricky and vital. 2.4's deque implementation is
obviously immune to "bad" patterns, steady-state queue or otherwise.

Most immediately damning, adding another member to the list struct (to
keep track of the "low bound") would increase the size of every list
object by 8 bytes, on 32-bit boxes. Python lists are easy to spell,
use and access, and some Python apps use millions of small lists.
They wouldn't appreciate the RAM hit for a mostly-useless feature.
Most Perl programmers seem to be so confused by Perl lists that they
only use them when they have to, to shift function arguments in and
out <0.6 wink>. That's a use case for lists Python doesn't have at
all.

You can pursue it if you want to, but with the 2.4 deque type I have
no interest in messing more with the list type.
 
Bryan Olson

Tim said:
> [Bryan Olson, on complicating the list object again, and 2.4's deque type]
>
>>But it breaks (efficient) indexing. Since we can have it all
>>like Perl, why not?
>
>
> As I said, the best Perl can do is O(1) amortized.

In fact, as both of us said.
> If you follow the link you gave last time and read to the bottom
> following the other links, you'll find that Perl lists had
> quadratic-time behavior under the steady-state queue pattern for a
> long time. That was eventually fixed -- or so they say. Small
> details are both tricky and vital.

Agreed. But those Perl wizards, misguided as they may be, are
pretty sharp.
> 2.4's deque implementation is
> obviously immune to "bad" patterns, steady-state queue or otherwise.
>
> Most immediately damning, adding another member to the list struct (to
> keep track of the "low bound") would increase the size of every list
> object by 8 bytes, on 32-bit boxes. Python lists are easy to spell,
> use and access, and some Python apps use millions of small lists.
> They wouldn't appreciate the RAM hit for a mostly-useless feature.
> Most Perl programmers seem to be so confused by Perl lists that they
> only use them when they have to, to shift function arguments in and
> out <0.6 wink>.

We must know different Perl programmers.
> That's a use case for lists Python doesn't have at
> all.
>
> You can pursue it if you want to, but with the 2.4 deque type I have
> no interest in messing more with the list type.

I'm talking about facilities and their implementations, not
people. True, when I pointed out that Perl nails this one, I
was kinda' thinking the comparison might motivate Pythoners to
pursue the same enhancement. It was certainly *not* meant to
deride anyone who contributed to implementing Python.

Is Tim the superior Pythoner? Duh. Does Python rock? Sure.
Is saving four-or-eight bytes more or less valuable than
providing efficient insert at both ends? With all due respect
to Python and the people who implemented it, Perl kicks on this
one.
 
Tim Peters

[Bryan Olson, on deques & Python's list type]
....
Agreed. But those Perl wizards, misguided as they may be, are
pretty sharp.
Yup.

....

[Tim]
You can pursue it if you want to, but with the 2.4 deque type I have
no interest in messing more with the list type.
[Bryan]
I'm talking about facilities and their implementations, not people.

Sure! I'm one of the handful of people who might actually "do
something" about this kind of issue, and I was telling you that I
won't. Your chances of seeing what you suggest are highly correlated
with finding someone who will "do something" <wink>. I don't know
whether Raymond Hettinger is interested in pursuing this further, but
if he isn't either (that's my guess), then the only realistic chance
is if you do the work yourself.
True, when I pointed out that Perl nails this one,

Which part I disagree with, for reasons already given.
I was kinda' thinking the comparison might motivate Pythoners to
pursue the same enhancement.

And the desire for efficient "both ends" operation led to 2.4's deque
type, which isn't "the same" enhancement because it didn't end up in
the base list type, but is a better enhancement for people who truly
need both-ends performance.
It was certainly *not* meant to deride anyone who contributed to implementing
Python.

I didn't read it that way.
Is Tim the superior Pythoner? Duh. Does Python rock? Sure.
Is saving four-or-eight bytes more or less valuable than
providing efficient insert at both ends?

In the basic list type, yes, it's more valuable in Python to save the
8 bytes. The speed of "left end" insert/remove is insignificant for
most Python apps, and is quite fast anyway for small lists. It's a
major concern for *some* Python apps, and the deque type serves those
better than fudging the list type could. The majority who don't care
don't pay for it.
With all due respect to Python and the people who implemented it, Perl kicks
on this one.

I agree that Perl has a good approach here. I think Python's is better, though.
 
Bryan Olson

Tim said:
> I'm one of the handful of people who might actually "do
> something" about this kind of issue, and I was telling you that I
> won't. Your chances of seeing what you suggest are highly correlated
> with finding someone who will "do something" <wink>. I don't know
> whether Raymond Hettinger is interested in pursuing this further, but
> if he isn't either (that's my guess), then the only realistic chance
> is if you do the work yourself.

Looking at the source, I'm worried. Append and pop[-1] are not
really amortized O(1); at best they're commonly-average O(1).
Alternating appends and pops at certain border values will call
realloc for every operation. The pop-reallocs free the extra
memory; if the allocator uses that memory for other requests,
the following append will have to copy the entire list.

> In the basic list type, yes, it's more valuable in Python to save the
> 8 bytes. The speed of "left end" insert/remove is insignificant for
> most Python apps, and is quite fast anyway for small lists. It's a
> major concern for *some* Python apps, and the deque type serves those
> better than fudging the list type could.

The leave-it-to-realloc method seems to be an effort to save one
word (either a pointer or a size) per list. With two more
words, I think we could make operations on both ends amortized
O(1). The only lists for which this would be a substantial
portion are empty lists. Currently, empty lists require four
words (type_pointer, refcount, size=0, item_pointer=NULL) plus
malloc's bookkeeping. Any non-empty list additionally allocates
space for at least 8 pointers, plus malloc's bookkeeping.
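Bryan's word-counting can be checked on a modern interpreter with `sys.getsizeof`; the layout has changed since 2.3/2.4, so the absolute numbers differ, but the fixed per-list overhead he describes is still visible:

```python
import sys

# Even an empty list carries a type pointer, reference count,
# gc header, length field, and allocated-slots count; the exact
# byte counts vary by Python version and platform word size.
empty = sys.getsizeof([])
small = sys.getsizeof([1, 2, 3])
print("empty list:", empty, "bytes")
print("[1, 2, 3]: ", small, "bytes")
```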
 
Tim Peters

[Bryan Olson]
Looking at the source, I'm worried. Append and pop[-1] are not
really amortized O(1); at best they're commonly-average O(1).
Alternating appends and pops at certain border values will call
realloc for every operation. The pop-reallocs free the extra
memory; if the allocator uses that memory for other requests,
the following append will have to copy the entire list.
....

The leave-it-to-realloc method seems to be an effort to save one
word (either a pointer or a size) per list.

What are you looking at? I've been talking about Python 2.4 (as
evidenced by my saying 2.4 over and over <wink>). Its list
implementation is not realloc-happy, and has both allocated-size and
used-size members. The earlier leave-it-to-realloc gimmick was indeed
an extreme effort to save bytes. The 2.4 implementation is simpler,
clearer, faster, and better-behaved in several respects.
With two more words, I think we could make operations on both ends
amortized O(1).

As before, I don't care about this. The 2.4 deque is O(1) on both
ends straightforwardly and more efficiently.
The only lists for which this would be a substantial
portion are empty lists. Currently, empty lists require four
words (type_pointer, refcount, size=0, item_pointer=NULL)

Plus 3 for the gc header (every list object gets one, although that's
not apparent from staring at listobject.h), plus 1 (not in 2.3 but in
2.4) to store the # of allocated slots. That's a total of 8.
plus malloc's bookkeeping.

The list struct is allocated via pymalloc, which allocates in 8-byte
chunks with trivial overhead beyond that (just a few percent). So not
even 1 bit can be added to the 2.4 list struct without losing another
8 bytes, under *most* C compilers for 32-bit machines. The Microsoft
C compilers are exceptions (they stick 4 bytes of padding in the gc
header, so there are 4 wasted bytes now (2.4) in the list struct under
MSVC).
Any non-empty list additionally allocates space for at least 8 pointers, ...

In 2.4 that's been reduced to 4.
 
