memory use with regard to large pickle files

Catherine Moroney

I'm writing a python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and hdf
files.

The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?

I'm running Python 2.5 on a Linux box (Fedora release 7).

Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?

Catherine
 

Martin v. Löwis

The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?

I would say so, yes. Since your program uses 5 GiB of memory, you
are presumably running a 64-bit system.

On such a system, each pointer takes 8 bytes. In addition,
each object takes at least 16 bytes; if it's variable-sized,
it takes at least 24 bytes, plus the actual data in the object.
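
A rough sketch of how to see these per-object costs interactively
(exact numbers vary with the interpreter version and with 32-bit
vs. 64-bit builds; sys.getsizeof needs Python 2.6 or later):

    import sys

    # A small integer: only a few bytes of payload, but the object header
    # (reference count, type pointer) brings it to roughly 24 bytes on a
    # 64-bit CPython 2.x build.
    print("int:  %d bytes" % sys.getsizeof(42))

    # Variable-sized objects carry an extra length field on top of that.
    print("str:  %d bytes" % sys.getsizeof("spam"))
    print("list: %d bytes" % sys.getsizeof([1, 2, 3]))

    # Even an empty dict pre-allocates a small hash table.
    print("dict: %d bytes" % sys.getsizeof({}))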

In a pickle, on the other hand, a pointer takes no space, unless it's a
shared pointer (i.e. a backwards reference), which takes
as many digits as are needed to encode the "object number"
in the pickle. Each primitive object carries only a single byte of
overhead (as opposed to 24), which gives quite drastic space
reductions. Non-primitive objects take more, of course, as
they need to encode the class they are instances of.
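
To make the contrast concrete, a small sketch (figures will vary by
platform and Python version, and it again uses sys.getsizeof) that
compares the pickled size of a dictionary of integers with a crude
estimate of its in-memory footprint:

    import sys
    import pickle

    d = dict((i, i * 2) for i in range(100000))

    # Serialized form: a flat byte stream with no per-object headers.
    pickled = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
    print("pickled size:        %d bytes" % len(pickled))

    # Crude in-memory estimate: the dict's hash table plus the shallow
    # size of every key and value object it holds.
    in_memory = sys.getsizeof(d)
    for k, v in d.items():
        in_memory += sys.getsizeof(k) + sys.getsizeof(v)
    print("in-memory estimate:  %d bytes" % in_memory)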

Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?

In Python 2.6, there is the sys.getsizeof function. For
earlier versions, the asizeof package gives similar results.
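
A short sketch of both (sys.getsizeof reports only the shallow size of
the object itself; asizeof, today distributed as part of the third-party
Pympler package, follows references and reports the whole object graph):

    import sys
    from pympler import asizeof  # third-party: pip install pympler

    data = {"a": list(range(1000)), "b": "x" * 1000}

    # Shallow size: the dict object alone, not the lists, strings and
    # integers it points to.
    print("shallow size: %d bytes" % sys.getsizeof(data))

    # Deep size: the dict plus everything reachable from it, which is
    # usually the number you actually care about when hunting memory.
    print("deep size:    %d bytes" % asizeof.asizeof(data))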

Regards,
Martin
 

Lawrence D'Oliveiro

I'm writing a python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and hdf
files.

Job for a database?
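
If the data really is one big key/value mapping, keeping it in an
on-disk database means a lookup touches only the rows it needs instead
of unpickling 900 MB at once. A minimal sketch using the standard-library
sqlite3 module (the table name, column names, and keys here are made up
for illustration):

    import sqlite3

    conn = sqlite3.connect("results.db")
    conn.execute("CREATE TABLE IF NOT EXISTS results"
                 " (key TEXT PRIMARY KEY, value TEXT)")

    def put(key, value):
        # Each record lives in its own row; nothing else is loaded.
        conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?)",
                     (key, value))

    def get(key):
        row = conn.execute("SELECT value FROM results WHERE key = ?",
                           (key,)).fetchone()
        return row[0] if row else None

    put("granule_001", "serialized record goes here")
    print(get("granule_001"))
    conn.commit()
    conn.close()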
 

Aaron Brady

Catherine said:
I'm writing a python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and hdf
files.

The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?

I'm running Python 2.5 on a Linux box (Fedora release 7).

Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?

Catherine

There's always the 'shelve' module.
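
shelve stores a dict-like object in a file and pickles each value
separately, so entries are read and written one key at a time instead
of as one huge dictionary. A minimal sketch (the filename and keys are
placeholders):

    import shelve

    # Writing: each assignment pickles just that one value into the file.
    db = shelve.open("results.shelf")
    db["record_1"] = {"lat": 34.2, "lon": -118.2}
    db["record_2"] = {"lat": 35.0, "lon": -117.9}
    db.close()

    # Reading: values are unpickled on demand, one key at a time.
    db = shelve.open("results.shelf")
    print(db["record_1"])
    db.close()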
 
