A
Alf P. Steinbach
* Steve Howell:
Having optimized list.pop() for first element, then you would have a blind spot
with respect to insertion at the front... Which could then be optimized for the
cases where there is some buffer space at the front, left over from a pop. I
think it would just be harder to understand and harder to explain. And I think
that due to that, as usual people would build their own elaborate
"explanations", with erroneous conclusions, and would then use it in inefficient
ways (like, popping from the middle or inserting at the front).
I think the "harder to use correctly" is the strongest argument against the
optimization, but there is also a non-obvious *memory overhead*...
Having popped just one element at the front, in order for the implementation to
not /accumulate/ unused memory, that is, in order to avoid an ongoing /memory
leak/, extending the buffer to accomodate e.g. an append() can no longer be done
as a (on relevant systems) constant time reallocation (letting the OS do its
virtual memory paging magic). Instead, a new buffer would have to be allocated
and all data copied over. One could still have amortized constant time for
appends by using a doubling buffer (which is the strategy used in C++
'std::vector'), but at the cost of wasting some memory, a percentage...
The implementation as a pure array is easy to understand and use correctly, and
doesn't waste memory.
But it would IMHO have been better if it wasn't called "list", which brings in
the wrong associations for someone used to other languages. The name does
matter. E.g. I don't think this discussion about a pop optimization would have
been there except for the name, which makes that optimization sound reasonable.
Perhaps some standard alternative name could be devised. Like, "array" would
have been nice, except that that's already grabbed by the array module.
See above. In addition to being more difficult /to use/ correctly, that is,
being much easier to misunderstand, it incurs a memory overhead -- not the one
directly from the pop optimization, but by the requirements of buffer extension.
Essentially, as discussed above, it would then have to use a doubling buffer.
Cheers & hth.,
- Alf
I really want to use list *normally* with all its perfectly good
semantics and reasonable implementation, except for its blind spot
with respect to popping the first element off the list. The whole
reason I use CPython vs. C in the first place is that CPython
programmers can generally program basic data structures better than I
can. But list.pop(0) is the exception. And, with the possible
exception of dicts, lists are the most fundamental data structures
that Python has.
Having optimized list.pop() for first element, then you would have a blind spot
with respect to insertion at the front... Which could then be optimized for the
cases where there is some buffer space at the front, left over from a pop. I
think it would just be harder to understand and harder to explain. And I think
that due to that, as usual people would build their own elaborate
"explanations", with erroneous conclusions, and would then use it in inefficient
ways (like, popping from the middle or inserting at the front).
I think the "harder to use correctly" is the strongest argument against the
optimization, but there is also a non-obvious *memory overhead*...
Having popped just one element at the front, in order for the implementation to
not /accumulate/ unused memory, that is, in order to avoid an ongoing /memory
leak/, extending the buffer to accomodate e.g. an append() can no longer be done
as a (on relevant systems) constant time reallocation (letting the OS do its
virtual memory paging magic). Instead, a new buffer would have to be allocated
and all data copied over. One could still have amortized constant time for
appends by using a doubling buffer (which is the strategy used in C++
'std::vector'), but at the cost of wasting some memory, a percentage...
The implementation as a pure array is easy to understand and use correctly, and
doesn't waste memory.
But it would IMHO have been better if it wasn't called "list", which brings in
the wrong associations for someone used to other languages. The name does
matter. E.g. I don't think this discussion about a pop optimization would have
been there except for the name, which makes that optimization sound reasonable.
Perhaps some standard alternative name could be devised. Like, "array" would
have been nice, except that that's already grabbed by the array module.
I know Python's number one concern will never be speed, but if Python
makes an O(1) operation into an unnecessarily O(N) operation for no
good reasons other than "it's too complicated, " or it "adds another
pointer to the structure," or "it adds another conditional check to
list_ass_slice for operations that aren't popping off the top," I
think it's reasonable to challenge the design philosophy.
See above. In addition to being more difficult /to use/ correctly, that is,
being much easier to misunderstand, it incurs a memory overhead -- not the one
directly from the pop optimization, but by the requirements of buffer extension.
Essentially, as discussed above, it would then have to use a doubling buffer.
Cheers & hth.,
- Alf