list.pop(0) vs. collections.deque


Steve Howell

The v2.6.4 version of the tutorial says this:

'''
It is also possible to use a list as a queue, where the first element
added is the first element retrieved (“first-in, first-out”); however,
lists are not efficient for this purpose. While appends and pops from
the end of list are fast, doing inserts or pops from the beginning of
a list is slow (because all of the other elements have to be shifted
by one).
'''

Is that really true in CPython? It seems like you could advance the
pointer instead of shifting all the elements. It would create some
nuances with respect to reclaiming the memory, but it seems like an
easy way to make lists perform better under a pretty reasonable use
case.

Does anybody know off the top of their head if the "have-to-be-shifted-
by-one" warning is actually valid?


http://docs.python.org/tutorial/datastructures.html
 

Chris Rebert

The v2.6.4 version of the tutorial says this:

'''
It is also possible to use a list as a queue, where the first element
added is the first element retrieved (“first-in, first-out”); however,
lists are not efficient for this purpose. While appends and pops from
the end of list are fast, doing inserts or pops from the beginning of
a list is slow (because all of the other elements have to be shifted
by one).
'''

Is that really true in CPython?  It seems like you could advance the
pointer instead of shifting all the elements.  It would create some
nuances with respect to reclaiming the memory, but it seems like an
easy way to make lists perform better under a pretty reasonable use
case.

Does anybody know off the top of their head if the "have-to-be-shifted-
by-one" warning is actually valid?

Judging by the "Sorted dictionary" thread responses: Yes.

Cheers,
Chris
 

Steve Howell

Judging by the "Sorted dictionary" thread responses: Yes.

I think you are referring to this comment:

'''
Insertion and deletion are still O(n) as all items to the right of the
inserted/deleted one have to be shifted by one place.
'''

http://groups.google.com/group/comp.lang.python/browse_thread/thread/d3699724d94d5b5a

I can certainly see why most reasonable implementations of a list
would have insertion/deletion in the middle of the list be O(N), but I
don't think that limitation has to apply for the special cases of the
beginning and end of the list.
 

Steve Howell

Christian Heimes said:
Why do you think the documentation has such obvious errors?

I wasn't making any assumptions, hence the question mark. The Python
docs are very good, but even the best of projects make advances that
aren't reflected in the docs.

CPython's list type uses an array of pointers to store its members. The
type is optimized for the most common list operations in Python:
iteration and appending. Python code rarely changes the head or middle
of a list. The deque type is an optimized data structure for popping
and inserting data at both ends.

I disagree that Python code rarely pops elements off the top of a
list. There are perfectly valid use cases for wanting a list over a
deque without having to pay O(N) for pop(0). Maybe we are just
quibbling over the meaning of "rarely."

When you list.pop() or list.insert() the pointers in the internal array
must be shifted. Appending is much faster because the internal array is
overallocated, meaning it contains free slots at the tail of the data
structure. A linked list of pointers requires more memory, and iteration
is slower since it can't utilize the CPU's cache as well as an array.

I am not proposing a linked list of pointers. I am wondering about
something like this:

p = &p[1];
(and then reclaim p[0] as free memory, I already said I understood
that was the tricky bit)

The pointer arithmetic for accessing each element would still work in
O(1), right?
 

Arnaud Delobelle

Steve Howell said:
I think you are referring to this comment:

'''
Insertion and deletion are still O(n) as all items to the right of the
inserted/deleted one have to be shifted by one place.
'''


I can certainly see why most reasonable implementations of a list
would have insertion/deletion in the middle of the list be O(N), but I
don't think that limitation has to apply for the special cases of the
beginning and end of the list.

I made the comment you quoted. In CPython, it is O(n) to delete/insert
an element at the start of the list - I know it because I looked at the
implementation a while ago. This is why collections.deque exists I
guess. I don't know how they are implemented but insertion/deletion at
either end are O(1) and so is random access. So they are the data
structure that you want.

If you want evidence for lists, rather than my word, try:
>>> import timeit
>>> timeit.Timer('while t:del t[0]', 't=[0]*100000').timeit(1)
1.8452401161193848
>>> timeit.Timer('while t:del t[-1]', 't=[0]*100000').timeit(1)
0.017747163772583008
>>> timeit.Timer(
...     'while t:del t[0]',
...     'from collections import deque; t=deque([0]*100000)'
... ).timeit(1)
0.022077083587646484
>>> timeit.Timer(
...     'while t:del t[-1]',
...     'from collections import deque; t=deque([0]*100000)'
... ).timeit(1)
0.027813911437988281
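
For completeness, deque also gives you the queue operations directly; a
small usage sketch, using nothing beyond the documented append/popleft
methods:

from collections import deque

queue = deque()
queue.append("first")                 # enqueue on the right
queue.append("second")
assert queue.popleft() == "first"     # O(1) dequeue from the left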
 

Steve Howell

I think you are referring to this comment:
'''
Insertion and deletion are still O(n) as all items to the right of the
inserted/deleted one have to be shifted by one place.
'''
I can certainly see why most reasonable implementations of a list
would have insertion/deletion in the middle of the list be O(N), but I
don't think that limitation has to apply for the special cases of the
beginning and end of the list.

I made the comment you quoted.  In CPython, it is O(n) to delete/insert
an element at the start of the list - I know it because I looked at the
implementation a while ago.  This is why collections.deque exists I
guess.  I don't know how they are implemented but insertion/deletion at
either end are O(1) and so is random access.  So they are the data
structure that you want.

If you want evidence for lists, rather than my word, try:
>>> import timeit
>>> timeit.Timer('while t:del t[0]', 't=[0]*100000').timeit(1)
1.8452401161193848
>>> timeit.Timer('while t:del t[-1]', 't=[0]*100000').timeit(1)
0.017747163772583008
>>> timeit.Timer(
...     'while t:del t[0]',
...     'from collections import deque; t=deque([0]*100000)'
... ).timeit(1)
0.022077083587646484
>>> timeit.Timer(
...     'while t:del t[-1]',
...     'from collections import deque; t=deque([0]*100000)'
... ).timeit(1)
0.027813911437988281

Ok, thanks, good to know. I assume it's the colorfully named
list_ass_slice that would have to special case ilow == 0 and you'd
need a memory manager that would let you realloc from ilow:ihigh to
ilow+n:high. Am I reading that much of the code correctly?
 

Terry Reedy

The v2.6.4 version of the tutorial says this:
Is that really true in CPython? It seems like you could advance the
pointer instead of shifting all the elements. It would create some
nuances with respect to reclaiming the memory, but it seems like an
easy way to make lists perform better under a pretty reasonable use
case.

Something like this was once proposed (i.e., leave space at both ends of a
list) but was rejected as over-complicating the list implementation for
*relatively* rare use cases. I believe deque was written subsequently to
address such use cases.
 

Steve Howell

I was speaking from my own point of view. I've written several tens of
thousands of lines of Python code in the last seven years, mostly
related to data manipulation, web applications and operating system
interaction but also GUI stuff and scientific code. I can't recall any
performance critical or prominent code that modifies the head of a list
a lot.

That maybe would be an argument for just striking the paragraph from
the tutorial. If it's rare that people pop the head off the list in
code that is performance critical or prominent, why bother to even
mention it in the tutorial?

Of course there are use cases where you may want to use list.pop(). Unless
you need a FILO structure you can always replace a LILO with a FIFO --
instead of list.insert(0, value) and list.pop(0) use list.append(value)
and list.pop(). It's not possible to optimize a data structure for all
use cases.

I am not proposing a linked list of pointers.  I am wondering about
something like this:

p = &p[1];

(and then reclaim p[0] as free memory, I already said I understood
that was the tricky bit)

The pointer arithmetic for accessing each element would still work in
O(1), right?

You mean it's an impossible trick unless you come up with your own
memory management system. Realloc(), the standard function to change the
size of a chunk of allocated memory in C, doesn't support your desired
operation.

Impossible is a strong word. You could be lazy about giving the
memory back, and just wait till the whole list is garbage collected.
I don't think there's anything in Python's contract that says memory
has to be freed the exact moment you stop using it, especially since
we're talking about doing an O(N) memmove just to free up one
pointer's worth of memory.

I know the Python programmer could simply iterate through the list
rather than popping off unused elements, but that just means that you
not only tie up the memory for the pointers just as long, but you also
prevent the objects themselves from being garbage collected.

In my case I'm not really concerned about giving the memory back right
away, it's more about keeping my code simple. Once I'm done with an
element, I do just want to pop it off and keep all the simplicity of
the lists, but I am just concerned enough about speed that for a 1000
element list, I don't want to be doing 1000 memmoves for an average of
500 pointers, which effectively moves about a million bytes around for
no reason.

I suppose the solution is to either give up the sugar of lists, or try
to wrap something like deque or list-with-offset into a sequence.
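
Wrapping a deque might look roughly like this (just a sketch: the class
name and the handful of methods shown are made up, and a real drop-in
replacement would need the rest of the sequence protocol):

from collections import deque

class QueueList(object):
    """Hypothetical list-like wrapper that keeps end pops O(1)."""

    def __init__(self, iterable=()):
        self._items = deque(iterable)

    def append(self, value):
        self._items.append(value)

    def pop(self, index=-1):
        # Only the two ends are cheap; that is the whole point of deque.
        if index == 0:
            return self._items.popleft()
        if index == -1:
            return self._items.pop()
        raise NotImplementedError("this sketch only supports popping from the ends")

    def __getitem__(self, index):
        return self._items[index]   # O(1) at the ends, O(n) in the middle

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

The call sites keep their list-ish spelling, and the O(1) end operations
come from the deque underneath.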
 

Steve Howell

Something like this was once proposed (i.e., leave space at both ends of a
list) but was rejected as over-complicating the list implementation for
*relatively* rare use cases. I believe deque was written subsequently to
address such use cases.

Bummer. I guess I get to do my own over-complicating of code, being
in that unfortunate minority.
 

Dave Angel

Steve said:
Why do you think the documentation has such obvious errors?

I wasn't making any assumptions, hence the question mark. The Python
docs are very good, but even the best of projects make advances that
aren't reflected in the docs.

CPython's list type uses an array of pointers to store its members. The
type is optimized for the most common list operations in Python:
iteration and appending. Python code rarely changes the head or middle
of a list. The deque type is an optimized data structure for popping
and inserting data at both ends.

I disagree that Python code rarely pops elements off the top of a
list. There are perfectly valid use cases for wanting a list over a
deque without having to pay O(N) for pop(0). Maybe we are just
quibbling over the meaning of "rarely."

When you list.pop() or list.insert() the pointers in the internal array
must be shifted. Appending is much faster because the internal array is
overallocated, meaning it contains free slots at the tail of the data
structure. A linked list of pointers requires more memory, and iteration
is slower since it can't utilize the CPU's cache as well as an array.

I am not proposing a linked list of pointers. I am wondering about
something like this:

p = &p[1];
(and then reclaim p[0] as free memory, I already said I understood
that was the tricky bit)

The pointer arithmetic for accessing each element would still work in
O(1), right?

I think it was Dijkstr (sp?) who said you can accomplish anything with
just one more level of indirection. Clearly each attribute (variable)
that has a binding to a given list must point to a fixed piece of
memory, that cannot safely be moved, because there's no efficient way to
find all the attributes. That fixed piece is the list object, and I
expect it's 16 or 20 bytes, on a 32bit implementation. It must in turn
point to the actual malloc'ed block that contains pointers to all the
elements of the list. So that block will be 4*n where n is the number
of reserved cells, at least as large as len(). This is the block where
copying takes place when you insert or delete from the beginning.

The list object must contain a pointer to the beginning of this block,
or it wouldn't be able to free() it later. So you'd be suggesting a
second pointer that actually points to the current 0th pointer. And a
pop would simply increment this second pointer.

Such an approach might be worthwhile if you expect lots of pops and
pushes, with a minimal net change. But of course each time you did a
pop, you'd have to decide whether it was time to normalize/compact the
block.

As you say, reclaiming the 0th element of this block to the memory pool
would be tricky. Doubly so, because 1) the C memory allocator has no
such notion as resizing the beginning of a block, and 2) it would have
nothing useful to do with the 4 bytes freed. The minimum allocated
block in Python is probably 16 bytes of actual address space. I'd guess
that's 4 bytes for malloc's overhead, and 8 bytes for the minimum object
header, and 4 bytes for data. To check me, try:

>>> a = 5.3
>>> b = 49.6
>>> id(a), id(b)
(11074136, 11074152)

Anyway, this could be done, where once the two pointers get some
distance apart, you do a realloc, and copy everything. But of course
you'd want to build some hysteresis into it, to avoid thrashing.

There wouldn't be much of a performance hit, but it would increase every
list size by 4 bytes minimum. So I doubt it would be a list replacement.

This'd be an interesting project to do as an addon module.

DaveA
 

Dave Angel

Arnaud said:
I made the comment you quoted. In CPython, it is O(n) to delete/insert
an element at the start of the list - I know it because I looked at the
implementation a while ago. This is why collections.deque exists I
guess. I don't know how they are implemented but insertion/deletion at
either end are O(1) and so is random access. So they are the data
structure that you want.

Not according to the 2.6 docs.

Indexed access is O(1) at both ends but slows to O(n) in the middle. For
fast random access, use lists instead.

That sounds to me like a doubly-linked list implementation.

<snip>
 

Steve Howell

I wasn't making any assumptions, hence the question mark.  The Python
docs are very good, but even the best of projects make advances that
aren't reflected in the docs.
I disagree that Python code rarely pops elements off the top of a
list.  There are perfectly valid use cases for wanting a list over a
deque without having to pay O(N) for pop(0).  Maybe we are just
quibbling over the meaning of "rarely."
I am not proposing a linked list of pointers.  I am wondering about
something like this:
p = &p[1];
(and then reclaim p[0] as free memory, I already said I understood
that was the tricky bit)
The pointer arithmetic for accessing each element would still work in
O(1), right?

I think it was Dijkstr (sp?) who said you can accomplish anything with
just one more level of indirection.  Clearly each attribute (variable)
that has a binding to a given list must point to a fixed piece of
memory, that cannot safely be moved, because there's no efficient way to
find all the attributes.   That fixed piece is the list object, and I
expect it's 16 or 20 bytes, on a 32bit implementation.  It must in turn
point to the actual malloc'ed block that contains pointers to all the
elements of the list.  So that block will be 4*n where n is the number
of reserved cells, at least as large as len().  This is the block where
copying takes place when you insert or delete from the beginning.

The indirection is already in Python: as far as I can tell, everything
is dereferenced off of an ob_item pointer:

http://svn.python.org/view/python/trunk/Objects/listobject.c?view=markup
The list object must contain a pointer to the beginning of this block,
or it wouldn't be able to free() it later.  So you'd be suggesting a
second pointer that actually points to the current 0th pointer.  And a
pop would simply increment this second pointer.

Yes, ob_item would point to the 0th element pointer and pop would
simply increment it.

The additional bookkeeping would be the original pointer.
Such an approach might be worthwhile if you expect lots of pops and
pushes, with a minimal net change.  But of course each time you did a
pop, you'd have to decide whether it was time to normalize/compact  the
block.

Yes, and that of course is the tricky bit.
As you say, reclaiming the 0th element of this block to the memory pool
would be tricky.  

I should clarify that a bit. Reclaiming the 0th element *cheaply* is
tricky, unless you want to rewrite the memory manager. But I also
think you can, of course, defer reclaiming the element.
Doubly so, because  1) the C memory allocator has no
such notion as resizing the beginning of a block, and 2) it would have
nothing useful to do with the 4 bytes freed.  The minimum allocated
block in Python is probably 16 bytes of actual address space.  I'd guess
that's 4 bytes for malloc's overhead, and 8 bytes for the minimum object
header, and 4 bytes for data.  To check me, try:

 >>> a = 5.3
 >>> b = 49.6
 >>> id(a), id(b)
(11074136, 11074152)

Anyway, this could be done, where once the two pointers get some
distance apart, you do a realloc, and copy everything.  But of course
you'd want to build some hysteresis into it, to avoid thrashing.

There wouldn't be any additional thrashing beyond what happens now.
You'd simply avoid the first N-1 memmoves of up to kN bytes at the
cost of not reclaiming those k(N-1) bytes right away. I suppose it's
a more "bursty" penalty you'd be paying, but the peak of the "bursty"
curve is no wider than the "constant" curve, it's just N times
narrower!
There wouldn't be much of a performance hit, but it would increase every
list size by 4 bytes minimum.  So I doubt it would be a list replacement.

I don't think that's the reason people would oppose this, although you
are right about the penalty.  Memory's cheap, you'd need about a
quarter million lists just to fill up a meg.

CPU cycles, on the other hand, are expensive, as users' demand for
responsive programs seems to do a better job of keeping up with
Moore's Law.

I'd also argue that the memory you keep around to avoid unnecessary
memmoves() will almost always be dwarfed by the memory used by the
list elements themselves.
 

Steve Howell

How else are you going to teach new Python developers that they should
favor append() and pop() in place of insert() and pop(0)? :)


Your proposal would increase the size of every list object by
sizeof(ptr) or ssize_t (32 or 64 bits) in order to store where the
first element is. It would also unnecessarily complicate the code and
possibly slow down a lot of list operations.

64 bits per list is negligible.

Adding a check for (ilow == 0) is even more negligible.

You are not "unnecessarily" complicating code for something as widely
used as Python lists if it achieves the desired benefit at minimal
cost. You are complicating the code, but not "unnecessarily."

The simplicity of Python is gained with some performance drawbacks. You
have to learn to use Python algorithms. You can't simply reimplement a
fast C algorithm and expect it to be fast in Python, too.

I actually do expect Python to solve performance problems for me that
are more easily solved in core than in Python itself. So I guess
that's where we differ.
 

Steven D'Aprano

I know the Python programmer could simply iterate through the list
rather than popping off unused elements, but that just means that you
not only tie up the memory for the pointers just as long, but you also
prevent the objects themselves from being garbage collected.

In my case I'm not really concerned about giving the memory back right
away, it's more about keeping my code simple. Once I'm done with an
element, I do just want to pop it off and keep all the simplicity of the
lists, but I am just concerned enough about speed that for a 1000 element
list, I don't want to be doing 1000 memmoves for an average of 500
pointers, which effectively moves about a million bytes around for no
reason.

I suppose the solution is to either give up the sugar of lists, or try
to wrap something like deque or list-with-offset into a sequence.


I don't understand what the actual problem you're trying to solve is.
Despite your disclaimer about not being concerned about reclaiming the
memory, it sounds like you're trying to micro-optimize memory usage. That
is, you give me the impression that you prefer this:

while alist:
    x = alist.pop(0)
    process(x)

over this:

for x in alist:
    process(x)
# allow alist to be garbage collected when it goes out of scope


That strikes me as a pointlessly trivial optimization, even if deleting
at the start of the list was efficient.

But whatever your reason, if you want to insert and delete efficiently
from both ends of the sequence, use a deque. If you are only doing a
small number of insertions/deletions at the beginning, and so don't care
about inefficiency, use a list.

If you only want to insert/delete from one end, use a list. Instead of:

alist.insert(0, x)
alist.pop(0)

use:

alist.append(x)
alist.pop()

That's fast and efficient. In some cases it doesn't matter which order
the list is, but if it does matter, the worst case is that you need to
call alist.reverse() occasionally, which is quite fast. Or iterate over
the list in reverse order, which is even faster.

So what am I missing?
 

Roy Smith

Christian Heimes said:
Why do you think the documentation has such obvious errors?

CPython's list type uses an array of pointers to store its members. The
type is optimized for the most common list operations in Python:
iteration and appending. Python code rarely changes the head or middle
of a list. The deque type is an optimized data structure for popping
and inserting data at both ends.

When you list.pop() or list.insert() the pointers in the internal array
must be shifted.

Well, at least in theory you could make pop(0) run in O(1). All you need
to do is have each list store an offset. Initially it's 0, and pop(0)
would just increment the offset. Then, all references to alist[i] would
turn into array[i+offset].

Of course, that's a lot of complexity to optimize a relatively rare use
case. You're probably better off just using a deque :)
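
In pure Python, the idea would look something like this (a hypothetical
sketch, not how CPython's listobject.c actually works):

class OffsetList(object):
    """Sketch of the offset idea: a plain list plus a start index."""

    def __init__(self, iterable=()):
        self._array = list(iterable)
        self._offset = 0                      # physical index of the logical first element

    def __len__(self):
        return len(self._array) - self._offset

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("list index out of range")
        return self._array[self._offset + index]

    def pop(self, index=-1):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("pop index out of range")
        if index == 0:
            value = self._array[self._offset]
            self._array[self._offset] = None  # drop the reference so it can be collected
            self._offset += 1
            # Compact once the dead region dominates, so the slack is reclaimed.
            if self._offset * 2 > len(self._array):
                self._array = self._array[self._offset:]
                self._offset = 0
            return value
        return self._array.pop(self._offset + index)

The C-level equivalent is what the thread keeps circling back to: keep
ob_item pointing at the current head, remember the original allocation so
it can still be freed, and compact once the gap gets big enough.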
 

Roy Smith

Steve Howell said:
In my case I'm not really concerned about giving the memory back right
away, it's more about keeping my code simple. Once I'm done with an
element, I do just want to pop it off and keep all the simplicity of
the lists, but I am just concerned enough about speed that for a 1000
element list, I don't want to be doing 1000 memmoves for an average of
500 pointers, which effectively moves about a million bytes around for
no reason.

If you really want to pop all the elements from a long list, reverse the
list and pop them off the end. Popping every element off the beginning of
the list is O(n^2), as you pointed out. Reversing the list is O(n), and
each pop after that is O(1), so the overall complexity is O(n).
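
For example (a small, self-contained sketch; summing stands in for
whatever you actually do with each element):

items = list(range(1000))    # stand-in for a long list built elsewhere
items.reverse()              # O(n), done once
total = 0
while items:
    total += items.pop()     # each pop from the end is O(1)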
 

Steve Howell

I don't understand what the actual problem you're trying to solve is.
Despite your disclaimer about not being concerned about reclaiming the
memory, it sounds like you're trying to micro-optimize memory usage.

My discussion of memory probably distracted you from the fact that I'm
not proposing a micro-optimization of memory; I am proposing a mega-
optimization of CPU.

This innocent program here literally moves about a million bytes of
memory around for no good reason:

lst = []
for i in range(2000):
    lst.append(i)
while lst:
    print lst.pop(0)

Why? Because list.pop(0) is implemented in O(N) instead of O(1).

Why wouldn't you get a competent C programmer to simply make
list_ass_slice smart enough to make list.pop(0) O(1)?

The brilliant computer scientist, Christian Heimes, provides the
answers, and I am paraphrasing here, of course:

1) You can save 64 bits for every list by not storing an extra
pointer!
2) You can simplify the CPython implementation!
3) You can avoid the oh-so-expensive check "if ilow == 0" for all
operations that don't need this optimization!

Sounds like two micro-optimizations to me (and a cop-out to boot).

That is, you give me the impression that you prefer this:

while alist:
    x = alist.pop(0)
    process(x)

over this:

for x in alist:
    process(x)
# allow alist to be garbage collected when it goes out of scope

No, to be more precise, I prefer this implementation of a recursive
parser (using lists) to one that would have to use deque's because of
the lameness of Python's list implementation:

https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py
 

Steve Howell

If you really want to pop all the elements from a long list, reverse the
list and pop them off the end.  Popping every element off the beginning of
the list is O(n^2), as you pointed out.  Reversing the list is O(n), and
each pop after that is O(1), so the overall complexity is O(n).

I really want to use list *normally* with all its perfectly good
semantics and reasonable implementation, except for its blind spot
with respect to popping the first element off the list. The whole
reason I use CPython vs. C in the first place is that CPython
programmers can generally program basic data structures better than I
can. But list.pop(0) is the exception. And, with the possible
exception of dicts, lists are the most fundamental data structures
that Python has.

I know Python's number one concern will never be speed, but if Python
makes an O(1) operation into an unnecessarily O(N) operation for no
good reasons other than "it's too complicated," or it "adds another
pointer to the structure," or "it adds another conditional check to
list_ass_slice for operations that aren't popping off the top," I
think it's reasonable to challenge the design philosophy.
 

Aahz

I really want to use list *normally* with all its perfectly good
semantics and reasonable implementation, except for its blind spot
with respect to popping the first element off the list. The whole
reason I use CPython vs. C in the first place is that CPython
programmers can generally program basic data structures better than I
can. But list.pop(0) is the exception. And, with the possible
exception of dicts, lists are the most fundamental data structures
that Python has.

I know Python's number one concern will never be speed, but if Python
makes an O(1) operation into an unnecessarily O(N) operation for no
good reasons other than "it's too complicated," or it "adds another
pointer to the structure," or "it adds another conditional check to
list_ass_slice for operations that aren't popping off the top," I
think it's reasonable to challenge the design philosophy.

"Rough consensus and running code."

You have a good point, but nobody will ever give your idea serious
attention until there's a patch and benchmarks.
 

Steve Howell

"Rough consensus and running code."

You have a good point, but nobody will ever give your idea serious
attention until there's a patch and benchmarks.

Here is a benchmark of O(N*N) vs. O(N) for two C programs. One does
memmove; the other simply advances the pointer.

showell@showell-laptop:~$ time ./slow

real 0m1.446s
user 0m1.444s
sys 0m0.004s
showell@showell-laptop:~$ time ./fast

real 0m0.003s
user 0m0.004s
sys 0m0.000s
showell@showell-laptop:~$ diff slow.c fast.c
23,24c23
<     lst.size -= 1;
<     memmove(lst.p, lst.p + 1, lst.size);
---
>     lst.p += 1;
showell@showell-laptop:~$ cat slow.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

struct ob_item {
    int whatever;
};

struct list {
    struct ob_item *p;
    int size;
};

struct list make_list(int n)
{
    struct list lst;
    lst.p = malloc(n);
    lst.size = n;
    return lst;
}

void list_pop_left(struct list lst) {
    lst.size -= 1;
    memmove(lst.p, lst.p + 1, lst.size);
}

int main() {
    struct list lst;
    int i;
    int n = 40000;
    int t;

    lst = make_list(n);
    for (i = 0; i < n; ++i) {
        list_pop_left(lst);
    }
}
 
