Hi,
I'm writing code which deals with lists of millions to tens of
millions of
elements. It seems to me that python gobbles up memory faster than it
should in these cases.
For example, I start the python interpreter and then execute the
following code:
f = []
for i in range(1e7):
... f.append(0.1)
In another window on my Linux system I then run top. The python
process is now using 155 MB of memory. Why is python using such a
large amount of memory, nearly nearly 16 bytes for a float in this
case?
What system are you using and what python version?
The above should generate 10**7 references to a single float value of 0.1,
so it's not floats, at least python 2.3.2 for windows nt.
The references the list should be just 32-bit pointer, I think, with some
boilerplate for the list. However, range allocates a list of individual numbers
once you get past the 100 or so that are shared for optimizing small lists. That's
a list of 1e7 references to ~1e7 integers, and the integers will be objects, not
just 32-bit values, so that is what takes up the first chunk of memory, though it
will be released for reallocation (not back to windows) at the end of the loop.
You started f empty, and as you add to its length, it has to grow, which is done
by allocating ever larger spaces, by some percentage (not just a single item space,
of course, or appending would be deadly slow). As the list grows, the space for the
out-grown list versions will get released, but still (I think) for reallocation,
not back to windows. So you see a lot of space, but most of it is waiting for
garbage collection or sitting in pools as freed re-usable-by-python space.
For comparison, you might run a fresh program with just
f = [0.1]*10**7
and see how much space that uses.
Strategy has changed over time re memory allocation, so ymmv according to platform and version.
For storing a homogeneous sequence of numeric items like floats or ints etc., check
into the array module, where each element just takes its C size effectively.
I'm not sure about allocation/reallocation during growth of an array. Perhaps it
releases directly back to windows. I would have hoped that there would be a way
to allocate an array with a single memory allocation for a specified initial
capacity (as opposed to size), as you can do IIRC with C++ STL vectors, but I couldn't
find a capacity-setting mechanism. Did I miss it?
And the initializer has to be a string or list. Why not any sequence that can deliver
the right type elements? Then at least you could make an initializer object that could
act like a list without allocating a big initialization list just to chuck it.
That's not a good substitute for capacity setting though ;-/
Regards,
Bengt Richter