memory usage


Dave

Hi,
I'm writing code which deals with lists of millions to tens of
millions of
elements. It seems to me that python gobbles up memory faster than it
should in these cases.

For example, I start the python interpreter and then execute the
following code:
.... f.append(0.1)

In another window on my Linux system I then run top. The Python
process is now using 155 MB of memory. Why is Python using such a
large amount of memory, nearly 16 bytes for a float in this case?
 

Rainer Deyke

Dave said:
In another window on my Linux system I then run top. The Python
process is now using 155 MB of memory. Why is Python using such a
large amount of memory, nearly 16 bytes for a float in this case?

2 lists.
10000000 floats.
10000000 integers.
20000002 reference counts.
20000002 type identifiers.
Whatever extra memory the memory manager uses to keep track of 20000002
separately allocated small blocks of memory.
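Rainer's tally can be made concrete with `sys.getsizeof` (a sketch in modern Python 3; that function did not exist in the Python 2.3 of this thread, and exact sizes vary by build and platform):

```python
import sys

# Every float and int is a full heap object carrying a reference count
# and a type pointer, not a bare machine word (sizes shown are typical
# for a 64-bit CPython 3 build).
print(sys.getsizeof(0.1))       # float object: value plus per-object header
print(sys.getsizeof(12345))     # int object: similar per-object header

# The list itself holds only pointers to those objects, plus its own
# header and any over-allocated slack.
lst = [0.1] * 1000
print(sys.getsizeof(lst))       # roughly one pointer per slot + header
```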
 

Bengt Richter

Dave said:
Hi,
I'm writing code which deals with lists of millions to tens of millions
of elements. It seems to me that Python gobbles up memory faster than it
should in these cases.

For example, I start the python interpreter and then execute the
following code:
f = []
for i in range(1e7):
... f.append(0.1)

In another window on my Linux system I then run top. The Python
process is now using 155 MB of memory. Why is Python using such a
large amount of memory, nearly 16 bytes for a float in this case?
What system are you using, and what Python version?
The above should generate 10**7 references to a single float value of 0.1,
so it's not the floats, at least on Python 2.3.2 for Windows NT.
Each reference in the list should be just a 32-bit pointer, I think, plus
some boilerplate for the list itself. However, range() allocates a list of
individual numbers once you get past the 100 or so small ints that are
shared as an optimization. That's a list of 1e7 references to ~1e7
integers, and the integers will be objects, not just 32-bit values, so
that is what takes up the first chunk of memory, though it will be
released for reallocation (not back to the OS) at the end of the loop.
You started f empty, and as you add to its length, it has to grow, which
is done by allocating ever larger spaces, growing by some percentage (not
just a single item's space, of course, or appending would be deadly slow).
As the list grows, the space for the outgrown list versions will get
released, but still (I think) for reallocation, not back to the OS. So you
see a lot of space, but most of it is waiting for garbage collection or
sitting in pools as freed, re-usable-by-Python space.
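The growth-by-percentage behaviour described above is easy to watch with `sys.getsizeof` (a modern Python 3 sketch; the exact step sizes are a CPython implementation detail):

```python
import sys

# Append 100 items and record each point where the list actually
# reallocated: the reported size jumps in chunks, not per element,
# so most appends cost nothing extra.
f = []
last = sys.getsizeof(f)
jumps = []
for n in range(1, 101):
    f.append(0.1)
    size = sys.getsizeof(f)
    if size != last:
        jumps.append((n, size))   # (length at reallocation, new size)
        last = size
print(jumps)   # far fewer entries than 100 appends
```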

For comparison, you might run a fresh program with just
f = [0.1]*10**7
and see how much space that uses.

Memory-allocation strategy has changed over time, so YMMV according to platform and version.

For storing a homogeneous sequence of numeric items like floats or ints etc., check
into the array module, where each element just takes its C size effectively.
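For example, a minimal comparison with the stdlib array module (Python 3 syntax):

```python
import sys
from array import array

# array('d', ...) packs raw C doubles, 8 bytes each, with no
# per-element object header or pointer.
a = array('d', [0.1] * 1000)
print(a.itemsize)                       # 8 bytes per element
print(a.buffer_info()[1] * a.itemsize)  # payload: 8000 bytes of doubles

# The equivalent list needs a pointer per slot before even counting
# the float objects those pointers refer to.
print(sys.getsizeof([0.1] * 1000))
```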

I'm not sure about allocation/reallocation during growth of an array.
Perhaps it releases directly back to the OS. I would have hoped that there
would be a way to allocate an array with a single memory allocation for a
specified initial capacity (as opposed to size), as you can do IIRC with
C++ STL vectors, but I couldn't find a capacity-setting mechanism. Did I
miss it?

And the initializer has to be a string or list. Why not any sequence that can deliver
the right type elements? Then at least you could make an initializer object that could
act like a list without allocating a big initialization list just to chuck it.

That's not a good substitute for capacity setting though ;-/
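There is indeed no capacity-setting call in the array module; one workaround (a sketch, not a true capacity reserve) is to build the array at full length in a single allocation and then fill it by index, so no growth reallocations happen:

```python
from array import array

n = 10**5
# One allocation of n doubles up front, instead of n appends.
a = array('d', [0.0]) * n
for i in range(n):
    a[i] = 0.1
print(len(a), a.itemsize)
```

Appending beyond n still reallocates, so this only helps when the final size is known in advance.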

Regards,
Bengt Richter
 

Terry Reedy

Dave said:
Hi,
I'm writing code which deals with lists of millions to tens of millions of
elements. It seems to me that python gobbles up memory faster than it
should in these cases.

Then you might want to use Numerical Python arrays, which wrap C arrays of
native data (i.e., 8 bytes per float).
For example, I start the python interpreter and then execute the
following code:
f = []
for i in range(1e7):

You will be happier with xrange(), which was added for this sort of usage.
... f.append(0.1)

In another window on my Linux system I then run top. The Python
process is now using 155 MB of memory. Why is Python using such a
large amount of memory, nearly 16 bytes for a float in this case?

8 bytes of data, 8 bytes of overhead, I believe.
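(A present-day footnote: in Python 3 the lazy behaviour of xrange() became the default, and range() is a small constant-size object no matter how long the sequence it describes. A quick sketch:)

```python
import sys

# range() in Python 3 is lazy, like Python 2's xrange(): its memory
# footprint does not depend on the number of values it yields.
print(sys.getsizeof(range(10)))      # small...
print(sys.getsizeof(range(10**9)))   # ...and the same for a billion values
```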

Terry J. Reedy
 

Robert Toop

Dave said:
Hi,
I'm writing code which deals with lists of millions to tens of millions
of elements. It seems to me that Python gobbles up memory faster than it
should in these cases.

For example, I start the python interpreter and then execute the
following code:
f = []
for i in range(1e7):
... f.append(0.1)

In another window on my Linux system I then run top. The Python
process is now using 155 MB of memory. Why is Python using such a
large amount of memory, nearly 16 bytes for a float in this case?

8 bytes for each float, and I'm guessing 8 bytes for 2 pointers to form
the list. One needs 2 pointers in order to facilitate nearby
forward/backward reference, and to be able to remove an element without
having to search from the start. An extensible list cannot be a
pointer-free contiguous array unless all elements are copied each time
you append to it.

Access to a linked list like this requires not only extra memory,
but extra time, to say nothing of large RAM to avoid virtual memory paging.
You'd be better off with static-dimensioned (hence contiguous) arrays.
Of course, that might be impractical for your app.

If your lists require dynamic extension and sparse random access,
you'd be better off with an indexed tree (database-like) storage scheme.
I'm too new to python to know if such packages exist, but I'll bet they do.
 
