Steve said:
Why do you think the documentation has such obvious errors?
I wasn't making any assumptions, hence the question mark. The Python
docs are very good, but even the best of projects make advances that
aren't reflected in the docs.
CPython's list type uses an array of pointers to store its members. The
type is optimized for the most common list operations in Python:
iteration and appending. Python code rarely changes the head or middle
of a list. The dequeue type is an optimized data structure for popping
and inserting data at both ends.
I disagree that Python code rarely pops elements off the top of a
list. There are perfectly valid use cases for wanting a list over a
dequeue without having to pay O(N) for pop(0). Maybe we are just
quibbling over the meaning of "rarely."
When you list.pop() or list.insert() the pointers in the internal array
must be shifted. appending is much faster because the internal array is
overallocated meaning it contains free slots at the tail of the data
structure. A linked list of pointers requires more memory and iteration
is slower since since it can't utilize the CPU's cache as good as an array.
I am not proposing a linked list of pointers. I am wondering about
something like this:
p =p[1];
(and then reclaim p[0] as free memory, I already said I understood
that was the tricky bit)
The pointer arithmetic for accessing each element would still work in O
(1), right?
I think it was Dijkstr (sp?) who said you can accomplish anything with
just one more level of indirection. Clearly each attribute (variable)
that has a binding to a given list must point to a fixed piece of
memory, that cannot safely be moved, because there's no efficient way to
find all the attributes. That fixed piece is the list object, and I
expect it's 16 or 20 bytes, on a 32bit implementation. It must in turn
point to the actual malloc'ed block that contains pointers to all the
elements of the list. So that block will be 4*n where n is the number
of reserved cells, at least as large as len(). This is the block where
copying takes place when you insert or delete from the beginning.
The list object must contain a pointer to the beginning of this block,
or it wouldn't be able to free() it later. So you'd be suggesting a
second pointer that actually points to the current 0th pointer. And a
pop would simply increment this second pointer.
Such an approach might be worthwhile if you expect lots of pops and
pushes, with a minimal net change. But of course each time you did a
pop, you'd have to decide whether it was time to normalize/compact the
block.
As you say, reclaiming the 0th element of this block to the memory pool
would be tricky. Doubly so, because 1) the C memory allocator has no
such notion as resizing the beginning of a block. and 2) it would have
nothing useful to do with the 4 bytes freed. The minimum allocated
block in Python is probably 16 bytes of actual address space. I'd guess
that's 4 bytes for malloc's overhead, and 8 bytes for the minimum object
header, and 4 bytes for data. To check me, try:
(11074136, 11074152)
Anyway, this could be done, where once the two pointers get some
distance apart, you do a realloc, and copy everything. But of course
you'd want to build some hysteresis into it, to avoid thrashing.
There wouldn't be much of a performance hit, but it would increase every
list size by 4 bytes minimum. So I doubt it would be a list replacement.
This'd be an interesting project.to do as an addon module.
DaveA