list.pop(0) vs. collections.deque


Alf P. Steinbach

* Steve Howell:
I really want to use list *normally* with all its perfectly good
semantics and reasonable implementation, except for its blind spot
with respect to popping the first element off the list. The whole
reason I use CPython vs. C in the first place is that CPython
programmers can generally program basic data structures better than I
can. But list.pop(0) is the exception. And, with the possible
exception of dicts, lists are the most fundamental data structures
that Python has.

Having optimized list.pop() for first element, then you would have a blind spot
with respect to insertion at the front... Which could then be optimized for the
cases where there is some buffer space at the front, left over from a pop. I
think it would just be harder to understand and harder to explain. And I think
that due to that, as usual people would build their own elaborate
"explanations", with erroneous conclusions, and would then use it in inefficient
ways (like, popping from the middle or inserting at the front).

I think the "harder to use correctly" is the strongest argument against the
optimization, but there is also a non-obvious *memory overhead*...

Having popped just one element at the front, in order for the implementation to
not /accumulate/ unused memory, that is, in order to avoid an ongoing /memory
leak/, extending the buffer to accommodate e.g. an append() can no longer be done
as a (on relevant systems) constant time reallocation (letting the OS do its
virtual memory paging magic). Instead, a new buffer would have to be allocated
and all data copied over. One could still have amortized constant time for
appends by using a doubling buffer (which is the strategy used in C++
'std::vector'), but at the cost of wasting some memory, a percentage...
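
For illustration, here is a minimal Python sketch of the doubling-buffer strategy being described (the class name and growth factor are illustrative only):

    class GrowableArray:
        """Toy append-only array with geometric (factor-of-2) growth."""
        def __init__(self):
            self._buf = [None] * 4   # allocated buffer
            self._size = 0           # slots actually in use

        def append(self, x):
            if self._size == len(self._buf):
                # Everything is copied only when the buffer is full; since
                # capacity doubles, N appends copy at most ~2N elements in
                # total, i.e. amortized constant time per append.
                new_buf = [None] * (2 * len(self._buf))
                new_buf[:self._size] = self._buf
                self._buf = new_buf
            self._buf[self._size] = x
            self._size += 1

The wasted memory referred to above is the tail of _buf beyond _size: at worst roughly half the allocation, just after a resize.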

The implementation as a pure array is easy to understand and use correctly, and
doesn't waste memory.

But it would IMHO have been better if it wasn't called "list", which brings in
the wrong associations for someone used to other languages. The name does
matter. E.g. I don't think this discussion about a pop optimization would have
been there except for the name, which makes that optimization sound reasonable.
Perhaps some standard alternative name could be devised. Like, "array" would
have been nice, except that that's already grabbed by the array module.

I know Python's number one concern will never be speed, but if Python
makes an O(1) operation into an unnecessarily O(N) operation for no
good reasons other than "it's too complicated, " or it "adds another
pointer to the structure," or "it adds another conditional check to
list_ass_slice for operations that aren't popping off the top," I
think it's reasonable to challenge the design philosophy.

See above. In addition to being more difficult /to use/ correctly, that is,
being much easier to misunderstand, it incurs a memory overhead -- not
directly from the pop optimization, but from the requirements of buffer extension.
Essentially, as discussed above, it would then have to use a doubling buffer.


Cheers & hth.,

- Alf
 

Steve Howell

"Rough consensus and running code."

You have a good point, but nobody will ever give your idea serious
attention until there's a patch and benchmarks.

Another benchmark is that deques are slower than lists for accessing
elements.

showell@showell-laptop:~$ python foo.py
0.0215361118317 <- list
0.0429010391235 <- deque


import time
from collections import deque

n = 40000
lst = []
for i in range(n):
    lst.append(i)

t = time.time()
for i in range(n):
    lst[i]          # index every element of the list
print time.time() - t

lst = deque(lst)
t = time.time()
for i in range(n):
    lst[i]          # index every element of the deque
print time.time() - t

So substituting deque for list suffers not just in convenience, but
also in performance.
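
For completeness, a rough timeit sketch of the operation the thread is actually about, where the comparison goes the other way (illustrative only; absolute numbers are machine-dependent):

    import timeit

    setup = "from collections import deque; n = 40000"

    # Draining from the front: O(n^2) total for the list, O(n) for the deque.
    list_drain = "lst = list(range(n))\nwhile lst: lst.pop(0)"
    deque_drain = "d = deque(range(n))\nwhile d: d.popleft()"

    print timeit.timeit(list_drain, setup, number=10)
    print timeit.timeit(deque_drain, setup, number=10)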
 

Terry Reedy

I really want to use list *normally* with all its perfectly good
semantics and reasonable implementation, except for its blind spot
with respect to popping the first element off the list.

It was not designed for that. .pop() was added to lists about 10 years
ago because I asked for it (with no parameter, pop off end only) and
wrote what would now be a PEP -- and because Tim Peters later supported
the idea. Adding the optional parameter was something of an afterthought
(never publicly discussed as far as I know) for occasional use for
'short' lists where O(n) is tolerable. You have half persuaded me that
the parameter addition was a mistake. Perhaps it is too attractive a
nuisance for some people ;=).
The whole
reason I use CPython vs. C in the first place is that CPython
programmers can generally program basic data structures better than I
can.

They have given us three options other than .pop(0).

1. listiterator
2. queue.Queue
3. collections.deque

Why are you so stubborn about not using a data structure intended for
your use case instead of one that was not?

There is also
4. a two-list design for queues: iterator through one (or pop() from a
reversed version thereof) while appending to another; switch when the
first is empty. I plan to write it up with tests some day this year.
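
Presumably something along these lines (a sketch of the two-list idea only, not Terry's actual write-up):

    class TwoListQueue:
        """FIFO from two lists: append to the back list, pop() from a
        reversed front list, and swap when the front runs dry."""
        def __init__(self):
            self._front = []   # holds items in reversed order
            self._back = []

        def put(self, item):
            self._back.append(item)   # O(1) append

        def get(self):                # raises IndexError when empty
            if not self._front:
                self._back.reverse()  # each item is reversed only once
                self._front = self._back
                self._back = []
            return self._front.pop()  # O(1) pop from the end

Each element is moved at most once between the lists, so both operations are amortized O(1).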
I know Python's number one concern will never be speed, but if Python
makes an O(1) operation into an unnecessarily O(N) operation for no
good reasons other than "it's too complicated, " or it "adds another
pointer to the structure," or "it adds another conditional check to
list_ass_slice for operations that aren't popping off the top," I
think it's reasonable to challenge the design philosophy.

Challenge yes, mock no.

Part of writing good basic data structures is not adding needless
complication from featuritis and not penalizing 99.99% of access to
satisfy a .01% need better satisfied another way.

Terry Jan Reedy
 

Steve Howell

Challenge yes, mock no.

Part of writing good basic data structures is not adding needless
complication from featuritis and not penalizing 99.99% of access to
satisfy a .01% need better satisfied another way.

I would like to challenge your assertion that advancing ob_item
instead of doing memmove during list_ass_slice would impact the
performance of list accesses in any way. It would only slow down
operations that add/insert items into the list, and then only by a
single conditional statement, and those add/insert operations are
already O(N) to begin with.
 

Alf P. Steinbach

* Steve Howell:
I would like to challenge your assertion that advancing ob_item
instead of doing memmove during list_ass_slice would impact the
performance of list accesses in any way. It would only slow down
operations that add/insert items into the list, and then only by a
single conditional statement, and those add/insert operations are
already O(N) to begin with.

I'm sorry, no, the last part is incorrect.

Appending to a 'list' can currently be constant time, if OS reallocation is
constant time (as the string '+' optimization relies on).

With the pop optimization it can no longer be constant time without risking an
accumulation of unused memory, a memory leak, although it can be amortized
constant time, at the cost of wasting some percentage of memory.


Cheers & hth.,

- Alf
 

Steve Howell

* Steve Howell:



I'm sorry, no, the last part is incorrect.

Appending to a 'list' can currently be constant time, if OS reallocation is
constant time (as the string '+' optimization relies on).

That's true, but it's also irrelevant, as the pop optimization would
happen in a branch of code that never gets executed during list
appending:

if (d < 0) { /* Delete -d items */
    /* ADD CODE HERE TO AVOID memmove
       when ilow == 0 (i.e. ihigh + d == 0)
    */
    memmove(&item[ihigh+d], &item[ihigh],
            (Py_SIZE(a) - ihigh)*sizeof(PyObject *));
    list_resize(a, Py_SIZE(a) + d);
    item = a->ob_item;
}

With the pop optimization it can no longer be constant time without risking an
accumulation of unused memory, a memory leak, although it can be amortized
constant time, at the cost of wasting some percentage of memory.

list_resize already overallocates memory to allow room for growth.
Whenever you did an append to the list that would force a realloc, you
could first check to see if there is unused stuff at the front and do
the memmove then and reclaim the unfreed memory. So instead of paying
for a memmove on every pop, you only pay for it when the list goes to
size 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, etc.
 

Terry Reedy

I would like to challenge your assertion that advancing ob_item
instead of doing memmove during list_ass_slice would impact the
performance of list accesses in any way. It would only slow down
operations that add/insert items into the list, and then only by a
single conditional statement, and those add/insert operations are
already O(N) to begin with.

If you try writing a full patch, as I believe someone did, or at least a
prototype thereof, when the idea was discussed, you might have a better
idea of what the tradeoffs are and why it was rejected.

For instance, when you append to a full list, it is resized. I believe
it is now doubled, but the policy has varied over the years. If you turn
list from essentially a stack to a deque, you complicate the resizing
policy and have to consider the addition of a shift policy. Do you split
the over-allocated fore and aft or increase the overallocation from
double to, say, triple? If the former, then for normal usage that never
uses the fore part, the over-allocation has been effectively reduced
from 2x to 1.5x (which is about what it once was, I believe). This means
more frequent resizings and copyings, which means slower operation for
most use cases. Adding extra usually wasted space is also a nuisance.

Terry Jan Reedy
 

Steve Howell

It was not designed for that. .pop() was added to lists about 10 years
ago because I asked for it (with no parameter, pop off end only) and
wrote what would now be a PEP -- and because Tim Peters later supported
the idea. Adding the optional parameter was something of an afterthought
(never publicly discussed as far as I know) for occasional use for
'short' lists where O(n) is tolerable. You have half persuaded me that
the parameter addition was a mistake. Perhaps it is too attractive a
nuisance for some people ;=).

pop(0) is a useful idiom in parsers. You can see examples in
ElementTree and lib2to3.

Even without pop(0), people would still write code like this, found in
pstats.py:

arg = args[0]
args = args[1:]

It is sometimes overkill (and even inappropriate) to use a queue when
really you just want a list. Iterators are great, but they also have
slightly different semantics than the list itself.

There is nothing wrong with a language specification that allows users
to do insert, delete, and pop on a list. Once you freeze the language
specification, then you can turn your attention to improving the
implementation.
 

Alf P. Steinbach

* Steve Howell:
* Steve Howell:

I'm sorry, no, the last part is incorrect.

Appending to a 'list' can currently be constant time, if OS reallocation is
constant time (as the string '+' optimization relies on).

That's true, but it's also irrelevant, as the pop optimization would
happen in a branch of code that never gets executed during list
appending:

if (d < 0) { /* Delete -d items */
    /* ADD CODE HERE TO AVOID memmove
       when ilow == 0 (i.e. ihigh + d == 0)
    */
    memmove(&item[ihigh+d], &item[ihigh],
            (Py_SIZE(a) - ihigh)*sizeof(PyObject *));
    list_resize(a, Py_SIZE(a) + d);
    item = a->ob_item;
}

With the pop optimization it can no longer be constant time without risking an
accumulation of unused memory, a memory leak, although it can be amortized
constant time, at the cost of wasting some percentage of memory.

list_resize already overallocates memory to allow room for growth.
Whenever you did an append to the list that would force a realloc, you
could first check to see if there is unused stuff at the front and do
the memmove then and reclaim the unfreed memory. So instead of paying
for a memmove on every pop [at front], you only pay for it when the
list goes to size 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, etc.

Oh. If 'list' already uses a doubling buffer then the only overhead from the
optimization would, AFAICS, be a single add in every indexing. Which might be
acceptable (at least it sounds very reasonable in the context of Python).

Re terminology: I write "doubling buffer" to mean increase of buffer size by a
factor. It's often 2, but might be e.g. 1.5, whatever. The point of using a
constant factor is to avoid quadratic time for loops doing appending, i.e. the
constant factor size increase yields amortized constant time per append.

The sizes that you quote above, on the other hand, look like rather arbitrary
buffer size increases where the size to increase by is limited to a certain
small range. With copying or moving of everything at each buffer size increase
that would not yield amortized constant time. It would yield linear time
per append, and quadratic time for a loop doing appends.

But, anyway, if 'list' already uses a doubling buffer then the only overhead
from the pop optimization would, AFAICS, be a single add in every indexing.

On the third & gripping hand, however, a proof-of-concept actual implementation
(patch) would be needed to ensure that one doesn't overlook any showstopper or
serious problem, and to provide timings. And the special case would have to be
documented as a special case. Hm, it would be nice if the Python docs offered
complexity (time) guarantees in general...


Cheers,

- Alf
 

Steve Howell

If you try writing a full patch, as I believe someone did, or at least a
prototype thereof, when the idea was discussed, you might have a better
idea of what the tradeoffs are and why it was rejected.

For instance, when you append to a full list, it is resized. I believe
it is now doubled, but the policy has varied over the years. If you turn
list from essentially a stack to a deque, you complicate the resizing
policy and have to consider the addition of a shift policy. Do you split
the over-allocated fore and aft or increase the overallocation from
double to, say, triple? If the former, then for normal usage that never
uses the fore part, the over-allocation has been effectively reduced
from 2x to 1.5x (which is about what it once was, I believe). This means
more frequent resizings and copyings, which means slower operation for
most use cases. Adding extra usually wasted space is also a nuisance.

It looks like most of the complication would be in list_resize.

I'm gonna oversimplify a bit, but tell me if this is the gist of it.

I would have ob_item continue to always refer to the first element of the
list, and then I'd have to introduce another variable, ob_start_memory, to
refer to the start of our allocated memory, to be used whenever you do a
realloc/free/malloc. I'd have a notion of fore_wastage, which would
either be a variable I maintain or something that I just calculate as
needed from the difference of ob_item and ob_start_memory.

In deciding whether I want to give memory back to the memory manager,
I just need to adjust my calculations to account for fore and aft
wastage to see if it's time to do a shrink, and before shrinking, I do
the memmove.

On growth, I would just always do a memmove right away if there is
fore_wastage, and then do the normal procedure for aft wastage.

For the most common scenario of append, append, append, the only
penalty is having to skip over fore_wastage logic by checking for
fore_wastage == 0 or ob_item == ob_start_memory.

For the scenario of several appends followed by several pops, I get
the big win of only doing log2(N) memmoves instead of N as I shrink
the list down to zero.

If I start alternating between pops and appends, it's basically a
wash...instead of doing the memmove on the pop, I do it on the next
append.

If I were to pop the top element and then prepend a new element, to be
truly efficient, I'd want to use the reserved space right away, but for
simplicity, I would probably not complicate list_ass_slice and just do
the normal resize() dance, which would give me memmove in one
direction followed immediately by a memmove in the other direction
when I come back to list_ass_slice. (But it would all still be a
wash, since I would have to do the same number of memmoves in the
current implementation.)

A lot of the essential complexity here seems to come from the fact
that realloc() isn't a very robust abstraction. It seems to be
expensive to tell it you want to shrink, and it also does not have an
interface to tell it to give you a little growing room. On the other
hand, the code within list_resize() actually provides a nice framework
for amortizing memmoves exponentially.
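
A pure-Python model of the scheme may make it concrete (illustrative only; the real change would be in C, and this model reclaims fore wastage on every append rather than only when a realloc is due):

    class OffsetList:
        """Toy model: popping the front advances an offset instead of
        doing a memmove; fore wastage is reclaimed on growth."""
        def __init__(self, items=()):
            self._mem = list(items)   # models the malloc'd block
            self._start = 0           # models ob_item - ob_start_memory

        def pop_front(self):
            item = self._mem[self._start]
            self._mem[self._start] = None   # drop the reference
            self._start += 1                # O(1), nothing is shifted
            return item

        def append(self, item):
            if self._start:
                # "Do the memmove on growth": slide live data back to
                # the front and reclaim the fore wastage.
                self._mem = self._mem[self._start:]
                self._start = 0
            self._mem.append(item)

        def __len__(self):
            return len(self._mem) - self._start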
 

Steven D'Aprano

This innocent program here literally moves about a million bytes of
memory around for no good reason:

lst = []
for i in range(2000):
    lst.append(i)
while lst:
    print lst.pop(0)

Why? Because list.pop(0) is implemented in O(N) instead of O(1).

Why wouldn't you get a competent C programmer to simply make
list_ass_slice smart enough to make list.pop(0) O(1)?

Because there are always trade-offs, and the competent C programmers who
coded the implementation for lists choose different tradeoffs to the ones
you would prefer.

Seems to me that the simple solution to your problem is for you to
implement your own data structure that makes whatever trade-offs you
like. If it is good enough and popular enough, it might even replace the
existing list implementation.


No, to be more precise, I prefer this implementation of a recursive
parser (using lists) to one that would have to use deque's because of
the lameness of Python's list implementation:

https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py


That's a lot of code. Am I supposed to study the whole module, or can you
give me a hint as to what you're referring to? The lack of docstrings and
comments doesn't fill me with enthusiasm for reading it.

Nevertheless, on the basis of a quick scan, I suppose that you're
probably talking about the nested function called recurse:

def recurse(prefix_lines):
    while prefix_lines:
        prefix, line = prefix_lines[0]
        if line == '':
            prefix_lines.pop(0)
            append('')
        else:
            block_size = get_block(prefix_lines)
            if block_size == 1:
                prefix_lines.pop(0)
                if line == pass_syntax:
                    pass
                elif line.startswith(flush_left_syntax):
                    append(line[len(flush_left_syntax):])
                elif line.startswith(flush_left_empty_line):
                    append('')
                else:
                    append(prefix + leaf_method(line))
            else:
                block = prefix_lines[:block_size]
                prefix_lines = prefix_lines[block_size:]
                branch_method(output, block, recurse)
    return


Since you're not even looking at the results of the pop, why don't you
just call del prefix_lines[0]? It probably won't perform any better, but
it is more idiomatic.

An alternative would be to do exactly what you want lists to do: track
the start of the list. Untested:


def recurse(prefix_lines):
    start = 0
    end = len(prefix_lines)
    while start < end:
        prefix, line = prefix_lines[start]
        if line == '':
            start += 1
            append('')
        else:
            block_size = get_block(prefix_lines)
            if block_size == 1:
                start += 1
                if line == pass_syntax:
                    pass
                elif line.startswith(flush_left_syntax):
                    append(line[len(flush_left_syntax):])
                elif line.startswith(flush_left_empty_line):
                    append('')
                else:
                    append(prefix + leaf_method(line))
            else:
                block = prefix_lines[:block_size]
                start = block_size
                branch_method(output, block, recurse)
    return


No more O(N) deletions. Problem solved.
 

Arnaud Delobelle

Dave Angel said:
Not according to the 2.6 docs.

Indexed access is O(1) at both ends but slows to O(n) in the
middle. For fast random access, use lists instead.

Yes you are correct. This will teach me (again!) to check my facts.
That sounds to me like a doubly-linked list implementation.

I've just looked and it is a doubly-linked list of 'blocks' of size
BLOCKLEN, which is 62 on the source I have (I guess it's 62 because then
the whole block structure is 64 exactly, 62 + 1 for each link). So a
small list will have near constant random access, in a way.
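
A quick illustrative check of that block structure (timings are machine-dependent):

    import timeit

    setup = "from collections import deque; d = deque(range(100000))"

    # End access stays inside the first or last block: O(1).
    print timeit.timeit("d[0]; d[-1]", setup, number=10000)
    # Middle access walks the chain of fixed-size blocks: O(n).
    print timeit.timeit("d[50000]", setup, number=10000)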
 

Roy Smith

Steve Howell said:
This innocent program here literally moves about a million bytes of
memory around for no good reason:

lst = []
for i in range(2000):
lst.append(i)
while lst:
print lst.pop(0)

Why? Because list.pop(0) is implemented in O(N) instead of O(1).

I think you're being a little pedantic here. Yes, it is true that pop(0)
is O(n), and that if you put an O(n) operation in a loop, you get O(n^2)
run time.

The problem is that while it is well-known that putting something that's
O(n) in a loop gets you O(n^2), it's not well known that pop(0) for a
Python list is O(n). This is where you and I apparently start to differ in
what we think about this.

You are arguing that this is a bug in the implementation of list. While I
suppose there's some validity to that argument, I disagree. What I would
argue (and have done so several times over the years, with little success)
is that this is a bug in the documentation!

I'm looking at http://tinyurl.com/cdbwog. It shows all the operations of a
list. What it does not show is their cost. For pop(), it has a note:

"The pop() method is only supported by the list and array types. The
optional argument i defaults to -1, so that by default the last item is
removed and returned".

There's nothing there that gives any hint that pop(0) is any more expensive
than pop(-1). That is "secret knowledge", which you only get by following
the newsgroup discussions or looking at the implementation. You shouldn't
have to do either. There's lots of possible ways list could be
implemented. Without knowing the details, I'm left to guess about
important stuff like the cost of operations.

Every one of these operations should list the cost. Even if it's something
as vague as, "While not guaranteed by the language spec, in the current
implementation of CPython ...".
 

Roy Smith

"Alf P. Steinbach said:
But it would IMHO have been better if it wasn't called "list", which brings
in the wrong associations for someone used to other languages.

+1.

When I first started using Python (back in the 1.4 days), I assumed a list
was a singly-linked list. Which, of course, leads to the assumption that
pop(0) is O(1), and lots of other wrong thinking(*).

Oddly enough, I was going to write in the above paragraph, "like a C++ STL
list", until I happened to glance at the STL docs and refreshed my memory
that an STL list is doubly-linked. Which just goes to show that making
assumptions based on names is a bad idea.

So, we're right back to my statement earlier in this thread that the docs
are deficient in that they describe behavior with no hint about cost.
Given that, it should be no surprise that users make incorrect assumptions
about cost.

(*) I suspect somebody is going to point out that list.pop was added in
some version later than 1.4, but that's a detail.
 

Steven D'Aprano

+1.

When I first started using Python (back in the 1.4 days), I assumed a
list was a singly-linked list.

Why would you do that? I can think of at least eight different
implementations of the abstract list data structure:

constant-size array
variable-size array
variable-size array with amortised O(1) appends
singly-linked list
singly-linked list with CDR coding
doubly-linked list
skip list
indexable skip list

One can reasonably disregard constant-sized arrays as a possibility,
given that Python lists aren't fixed size (pity the poor Pascal and
Fortran coders who are stuck with static arrays!), but the rest are all
reasonable possibilities. Why assume one specific implementation in the
absence of documentation promising certain performance characteristics?

Oddly enough, I was going to write in the above paragraph, "like a C++
STL list", until I happened to glance at the STL docs and refreshed my
memory that an STL list is doubly-linked. Which just goes to show that
making assumptions based on names is a bad idea.

Exactly :)


So, we're right back to my statement earlier in this thread that the
docs are deficient in that they describe behavior with no hint about
cost. Given that, it should be no surprise that users make incorrect
assumptions about cost.

There are quite a few problems with having the documentation specify cost:

(1) Who is going to do it? Any volunteers?

(2) Big-oh notation can be misleading, especially for naive users, or
those whose intuition for what's fast has been shaped by other languages.
Big-oh doesn't tell you whether something is fast or slow, only how it
scales -- and sometimes not even then.

(3) Having documented a particular performance, that discourages
implementation changes. Any would-be patch or new implementation not only
has to ensure that the functional behaviour doesn't change, but that
the performance doesn't either.

In practice the Python developers are unlikely to make an implementation
change which leads to radically worse performance, particularly for
critical types like list and dict. But in other cases, they might choose
to change big-oh behaviour, and not wish to be tied down by documentation
of the cost of operations.

(4) How much detail is necessary? What about degenerate cases? E.g. dict
lookup in CPython is typically O(1) amortised, but if all the keys hash
to the same value, it falls to O(N) (a short sketch of this case follows
after this list).

(5) Should the language guarantee such degenerate behaviour? Who decides
which costs are guaranteed and which are not?

(6) Such performance guarantees should be implementation specific, not
language specific. CPython is only one implementation of the language out
of many.
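
To make point (4) concrete, a minimal sketch of the degenerate case (the class is contrived purely for demonstration):

    class BadKey(object):
        """All instances hash alike, so every key lands in the same
        hash bucket and lookup degrades to a linear probe sequence."""
        def __init__(self, n):
            self.n = n
        def __hash__(self):
            return 42                 # deliberate universal collision
        def __eq__(self, other):
            return self.n == other.n  # ties broken by comparison

    d = dict((BadKey(i), i) for i in range(1000))
    print d[BadKey(999)]   # correct result, but O(N) probes, not O(1)

Building the dict is itself O(N^2) comparisons here, for the same reason.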
 

Alf P. Steinbach

* Steven D'Aprano:
Why would you do that? I can think of at least eight different
implementations of the abstract list data structure:

constant-size array
variable-size array
variable-size array with amortised O(1) appends
singly-linked list
singly-linked list with CDR coding
doubly-linked list
skip list
indexable skip list

One can reasonably disregard constant-sized arrays as a possibility,
given that Python lists aren't fixed size (pity the poor Pascal and
Fortran coders who are stuck with static arrays!), but the rest are all
reasonable possibilities.

A linked list implementation would yield O(n) indexing. A great many loops in
e.g. Python library code that now have linear time would then get quadratic time,
O(n^2). Those libraries would then be effectively unusable without extensive
rewriting: one version for ordinary Python and one for 'list-as-list' Pythons...

Thus, the linked list implementations are IMO *not* reasonable.

And the reason is precisely the implied complexity guarantees, especially on
indexing -- which could reasonably be O(log n), but not worse than that.

There are quite a few problems with having the documentation specify cost:

(1) Who is going to do it? Any volunteers?

This problem must have been addressed each time the documentation for some
version of Python was written or updated.

(2) Big-oh notation can be misleading, especially for naive users, or
those whose intuition for what's fast has been shaped by other languages.
Big-oh doesn't tell you whether something is fast or slow, only how it
scales -- and sometimes not even then.

It's how things scale that is of interest. :)

Big-oh tells you an upper asymptotic limit.

That's sufficient for e.g. the C++ standard -- which, by the way, constitutes
a concrete example of the practicality of specifying complexity.

(3) Having documented a particular performance, that discourages
implementation changes. Any would-be patch or new implementation not only
has to consider that the functional behaviour doesn't change, but that
the performance doesn't either.

In practice the Python developers are unlikely to make an implementation
change which leads to radically worse performance, particularly for
critical types like list and dict. But in other cases, they might choose
to change big-oh behaviour, and not wish to be tied down by documentation
of the cost of operations.

Say that there was an O(log n) documented worst complexity for 'list' indexing.
Above you have described it as "reasonable" to break that, having O(n)
complexity... But in light of my comments on that, and especially thinking a bit
about maintenance of two or more! versions of various libraries, don't you
agree that it would be Just Bad(TM)?

(4) How much detail is necessary? What about degenerate cases? E.g. dict
lookup in CPython is typically O(1) amortised, but if all the keys hash
to the same value, it falls to O(N).

From N1745, the Technical Report 1 on C++ library extensions (will be part of
the C++0x standard), table 21 specifying general requirements of unordered
associative containers:

expression: b.find(k)
return type: iterator
assertion: Returns an iterator pointing to an element with key equivalent
to k, or b.end() if no such element exists.
complexity: Average case O(1), worst case O(b.size()).

(5) Should the language guarantee such degenerate behaviour? Who decides
which costs are guaranteed and which are not?

I think the C++ standard (the latest draft of C++0x is freely available as PDF
from the commitee pages) provides good guidance in this regard. :)

(6) Such performance guarantees should be implementation specific, not
language specific. CPython is only one implementation of the language out
of many.

Disagree Very Strongly. An implementation may offer stricter guarantees. But
what matters regarding e.g. avoiding having to maintain two or three or umpteen
versions of a library is the set of language-level complexity guarantees.


Cheers,

- Alf
 

Steve Howell

deques are optimized for accessing, inserting and removing data from
both ends. For anything else it's slower than the list type. The fact
was explained in this very thread yesterday.

And the benchmark confirmed it. The slowness is fairly negligible,
though.
 

Steve Howell

 Steve Howell said:
This innocent program here literally moves about a million bytes of
memory around for no good reason:
    lst = []
    for i in range(2000):
        lst.append(i)
    while lst:
        print lst.pop(0)
Why?  Because list.pop(0) is implemented in O(N) instead of O(1).

I think you're being a little pedantic here.  Yes, it is true that pop(0)
is O(n), and that if you put an O(n) operation in a loop, you get O(n^2)
run time.

The problem is that while it is well-known that putting something that's
O(n) in a loop gets you O(n^2), it's not well known that pop(0) for a
Python list is O(n).  This is where you and I apparently start to differ in
what we think about this.

The source for Python is open. It pretty clearly shows that you move
N bytes when you pop from the top of the list.

Less clear is how linear the performance of memmove is. My benchmarks
on the C program show that, at least on my computer, the results do
not seem to contradict the "roughly linear" assumption.
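
The same check can be made from Python itself (a sketch; if pop(0) is linear, doubling n should roughly quadruple each drain time):

    import time

    for n in (10000, 20000, 40000):
        lst = list(range(n))
        t = time.time()
        while lst:
            lst.pop(0)   # O(n) memmove each time -> O(n^2) per drain
        print n, time.time() - t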
You are arguing that this is a bug in the implementation of list.  While I
suppose there's some validity to that argument, I disagree.  What I would
argue (and have done so several times over the years, with little success)
is that this is a bug in the documentation!

I'm looking at http://tinyurl.com/cdbwog.  It shows all the operations of a
list.  What it does not show is their cost.  For pop(), it has a note:

"The pop() method is only supported by the list and array types. The
optional argument i defaults to -1, so that by default the last item is
removed and returned".

There's nothing there that gives any hint that pop(0) is any more expensive
than pop(-1).  That is "secret knowledge", which you only get by following
the newsgroup discussions or looking at the implementation.  You shouldn't
have to do either.  There's lots of possible ways list could be
implemented.  Without knowing the details, I'm left to guess about
important stuff like the cost of operations.

Every one of these operations should list the cost.  Even if it's something
as vague as, "While not guaranteed by the language spec, in the current
implementation of CPython ...".

I agree with that.
 

Steve Howell

An alternative would be to do exactly what you want lists to do: track
the start of the list. Untested:

    def recurse(prefix_lines):
        start = 0
        end = len(prefix_lines)
        while start < end:
            prefix, line = prefix_lines[start]
            if line == '':
                start += 1
                append('')
            else:
                block_size = get_block(prefix_lines)
                if block_size == 1:
                    start += 1
                    if line == pass_syntax:
                        pass
                    elif line.startswith(flush_left_syntax):
                        append(line[len(flush_left_syntax):])
                    elif line.startswith(flush_left_empty_line):
                        append('')
                    else:
                        append(prefix + leaf_method(line))
                else:
                    block = prefix_lines[:block_size]
                    start = block_size
                    branch_method(output, block, recurse)
        return

No more O(N) deletions. Problem solved.

A minor modification of your solution does work, but it also slightly
complicates the implementation. Keeping track of the start variable
requires bookkeeping not just in recurse(), but also in methods that
it calls. This is a pretty small program, so it's acceptable to pass
around an offset variable to anybody else who might want to be
consuming the list.

############ Generic indentation stuff follows

-def get_indented_block(prefix_lines):
-    prefix, line = prefix_lines[0]
+def get_indented_block(prefix_lines, start):
+    prefix, line = prefix_lines[start]
     len_prefix = len(prefix)
     i = 1
-    while i < len(prefix_lines):
-        new_prefix, line = prefix_lines[i]
+    while i + start < len(prefix_lines):
+        new_prefix, line = prefix_lines[start+i]
         if line and len(new_prefix) <= len_prefix:
             break
         i += 1
-    while i-1 > 0 and prefix_lines[i-1][1] == '':
+    while i-1 > 0 and prefix_lines[start+i-1][1] == '':
         i -= 1
     return i

@@ -190,15 +190,16 @@
     ):
     append = output.append
     def recurse(prefix_lines):
-        while prefix_lines:
-            prefix, line = prefix_lines[0]
+        start = 0
+        while start < len(prefix_lines):
+            prefix, line = prefix_lines[start]
             if line == '':
-                prefix_lines.pop(0)
+                start += 1
                 append('')
             else:
-                block_size = get_block(prefix_lines)
+                block_size = get_block(prefix_lines, start)
                 if block_size == 1:
-                    prefix_lines.pop(0)
+                    start += 1
                     if line == pass_syntax:
                         pass
                     elif line.startswith(flush_left_syntax):
@@ -208,8 +209,8 @@
                     else:
                         append(prefix + leaf_method(line))
                 else:
-                    block = prefix_lines[:block_size]
-                    prefix_lines = prefix_lines[block_size:]
+                    block = prefix_lines[start:start+block_size]
+                    start += block_size
                     branch_method(output, block, recurse)
         return
     prefix_lines = map(indentation_method, lines)
 

Steve Howell

This innocent program here literally moves about a million bytes of
memory around for no good reason:
    lst = []
    for i in range(2000):
        lst.append(i)
    while lst:
        print lst.pop(0)
Why?  Because list.pop(0) is implemented in O(N) instead of O(1).
Why wouldn't you get a competent C programmer to simply make list_ass_slice
smart enough to make list.pop(0) O(1)?

Because there are always trade-offs, and the competent C programmers who
coded the implementation for lists choose different tradeoffs to the ones
you would prefer.

Seems to me that the simple solution to your problem is for you to
implement your own data structure that makes whatever trade-offs you
like. If it is good enough and popular enough, it might even replace the
existing list implementation.

The data structure that would make the tradeoffs I want would be
implemented within CPython itself. I give a sketch of the changes
elsewhere in this thread.

Terry Reedy said:

'''
If you try writing a full patch, as I believe someone did, or at least a
prototype thereof, when the idea was discussed, you might have a better
idea of what the tradeoffs are and why it was rejected.
'''

I have to run, but tomorrow I will try to dig through python-dev
archives and find the patch. If anybody has hints on where to look
for it (anybody remember the author, for example?), it would be much
appreciated.

If the patch looks simple, I will try to pitch the idea that its time
has come. Now that the specification of the language itself is
frozen, I think there might be more room for improving
implementations. Also, I might be able to make the argument that
tradeoffs of memory vs. CPU vs. code complexity have different forces
in the 2010s.

Thanks for your reply.
 
