Tracking memory usage and object lifetime

Berteun Damman

Hello,

I have written a Python script that loads a graph (the mathematical
kind, with vertices and edges) into memory, performs some
transformations on it, and then finds shortest paths in it, typically
several tens of thousands of them. This works fine.

Then I wrote a test for this so I could time it, run it several times,
look at the best time, et cetera. But it turns out that the first run
of the test is always the fastest. If I track Python's memory usage in
top, I see it start at around 80 MB and slowly grow to 500 MB. This
might be causing the slowdown (about a factor of 5 for large graphs).

When I run a test, I disable garbage collection during the test run
(as is advised), but just before starting a test I instruct the
garbage collector to collect. Running the test without disabling the
garbage collector doesn't make any difference, though.
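
A minimal sketch of that timing setup, assuming a hypothetical find_paths(graph) function as the real workload: collect once before each run, keep the cyclic collector disabled while the clock runs, and restore its previous state afterwards. (The standard timeit module already disables garbage collection during timing by default.)

```python
import gc
import time

def time_runs(func, *args, repeat=3):
    """Time func(*args) `repeat` times and return the wall-clock durations."""
    timings = []
    for _ in range(repeat):
        gc.collect()                  # start each run from a freshly collected heap
        was_enabled = gc.isenabled()
        gc.disable()                  # no automatic cycle collection while timing
        try:
            start = time.perf_counter()
            func(*args)
            timings.append(time.perf_counter() - start)
        finally:
            if was_enabled:
                gc.enable()           # restore the collector's previous state
    return timings

# Usage with a cheap stand-in workload; with the real script this would be
# something like time_runs(find_paths, graph), where find_paths is hypothetical.
timings = time_runs(sorted, list(range(100_000, 0, -1)), repeat=5)
best = min(timings)
```

Looking at the whole timings list rather than only its minimum makes a run-over-run slowdown like the one described above visible.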

Where possible I explicitly 'del' some of the larger data structures
once I no longer need them. Furthermore, I don't really see why any
references to these larger objects would be left. (I may be mistaken,
of course.)

I understand this is a somewhat vague problem, but does anyone have an
idea why the memory usage keeps growing? And is there a tool that
helps me keep track of the objects currently alive and the amount of
memory they occupy?
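
In current Python 3 (3.4 and later), the standard tracemalloc module addresses the second question directly. A minimal sketch, assuming the workload can be wrapped in a function: take a snapshot of traced allocations before and after the call and list the source lines whose memory grew the most. The grow function is a hypothetical stand-in for a leaking test run.

```python
import tracemalloc

def allocation_growth(func, *args, limit=5):
    """Run func(*args) and return (result, top `limit` per-line allocation changes)."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func(*args)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() sorts by the absolute size change, largest first
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]

# Usage: a stand-in workload that keeps its allocations alive in a global list.
_retained = []

def grow(n):
    _retained.extend(bytearray(100) for _ in range(n))
    return len(_retained)

result, top = allocation_growth(grow, 10_000)
for stat in top:
    print(stat)   # file:line, total size and count, with the change since `before`
```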

The best I can do now is run the whole script several times (from a
shell script), but this also forces Python to reparse the graph input
and redo some other work it only needs to do once. It also makes it
harder to examine values and results.

Berteun
 

Bjoern Schliessmann

Berteun said:
> When I run a test, I disable the garbage collection during the
> test run (as is advised), but just before starting a test I
> instruct the garbage collector to collect. Running the test
> without disabling the garbage collector doesn't show any difference
> though.

Did you check the return value of gc.collect? Also, try using
other "insight" facilities provided by the gc module.
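
A sketch of the kind of information the gc module exposes, using only documented calls: the return value of gc.collect(), the gc.garbage list, the number of objects currently tracked, and the per-generation counters. gc_report is a hypothetical helper name; calling it before and after each test run is one way to see whether the tracked-object count keeps climbing.

```python
import gc

def gc_report():
    """Collect once and summarise what the cyclic garbage collector sees."""
    found = gc.collect()                     # number of unreachable objects found
    return {
        "unreachable_found": found,
        "uncollectable": len(gc.garbage),    # unreachable but not freed; normally empty
        "tracked_objects": len(gc.get_objects()),
        "generation_counts": gc.get_count(), # current per-generation collection counters
    }

print(gc_report())
```
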

> Where possible I explicitly 'del' some of the larger data
> structures that have been created after I don't need them anymore.

You cannot "del" structures, you only "del" names. Objects are
deleted when they are not bound to any names when and if the
garbage collector "wants" to delete them.
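
A small illustration of the name-versus-object distinction, assuming CPython: del only unbinds a name, and a weakref probe (which does not keep its target alive) shows when the object itself goes away. In CPython, an object that is not part of a reference cycle is freed as soon as its last reference disappears; the cyclic garbage collector is only needed for cycles. The Graph class is a hypothetical stand-in.

```python
import weakref

class Graph:
    """Hypothetical stand-in for one of the large data structures."""
    def __init__(self):
        self.edges = {}

g = Graph()
alias = g                        # a second name bound to the same object
probe = weakref.ref(g)           # weak reference: does not keep the object alive

del g                            # removes the name 'g', not the object
still_alive = probe() is alias   # True: 'alias' still refers to it
del alias                        # the last strong reference is gone
freed_now = probe() is None      # True in CPython: freed immediately by refcounting
```
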

> I furthermore don't really see why there would be references to
> these larger objects left. (I can be mistaken of course).

Be sure to check for cyclic references, they can be a problem for
the GC.
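
A minimal sketch of why cycles matter, using a hypothetical Vertex class: two vertices that refer to each other never drop to a zero reference count, so they survive until the cyclic collector runs. It also shows that an explicit gc.collect() still works while automatic collection is switched off with gc.disable().

```python
import gc
import weakref

class Vertex:
    """Hypothetical vertex that stores references to its neighbours."""
    def __init__(self, name):
        self.name = name
        self.neighbours = []

def make_pair():
    a, b = Vertex("a"), Vertex("b")
    a.neighbours.append(b)
    b.neighbours.append(a)        # a -> b -> a: a reference cycle
    return weakref.ref(a)         # only a weak reference escapes the function

gc.disable()
try:
    probe = make_pair()
    alive_before = probe() is not None  # True: refcounting alone cannot free a cycle
    found = gc.collect()                # explicit collection works even while disabled
    alive_after = probe() is not None   # False: the collector freed the cycle
finally:
    gc.enable()
```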

Regards,


Björn
 

Berteun Damman

> Did you check the return value of gc.collect? Also, try using
> other "insight" facilities provided by the gc module.

gc.collect reports that it cannot find any unreachable objects.
Meanwhile, the number of objects the garbage collector has to keep
track of keeps increasing.
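
When the number of tracked objects keeps growing, grouping them by type often points at the culprit. A sketch using gc.get_objects() and collections.Counter; the Node class and leaky_run are hypothetical. Note that gc.get_objects() only returns objects the collector tracks: ints, strings, and (in CPython) dicts holding only such atomic values do not appear, while lists, such as the heaps mentioned below, always do.

```python
import gc
from collections import Counter

def live_type_counts():
    """Count GC-tracked objects by type name after a full collection."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def type_growth(func, *args):
    """Run func(*args) and return the types whose live count increased."""
    before = live_type_counts()
    func(*args)
    after = live_type_counts()
    return after - before         # Counter subtraction keeps only positive counts

class Node:
    """Hypothetical object type that a test run might leak."""

_kept = []

def leaky_run(n):
    _kept.extend(Node() for _ in range(n))   # simulated leak via a global list

growth = type_growth(leaky_run, 500)
print(growth.most_common(5))
```
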

> You cannot "del" structures, you only "del" names. Objects are
> deleted when they are not bound to any names when and if the
> garbage collector "wants" to delete them.

I understand, but just before I del the name, I ask for the referrers
to the object the name refers to, and there is only one. Since it is a
local variable, I think this is logical.
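
A sketch of the referrer check described here, using gc.get_referrers(). Two caveats from the gc documentation: it only finds referrers that the collector tracks, and it is intended for debugging, not for program logic. referrer_types, table, and registry are hypothetical names.

```python
import gc

def referrer_types(obj):
    """Return the sorted type names of GC-tracked objects that refer to obj."""
    return sorted(type(r).__name__ for r in gc.get_referrers(obj))

# Usage: a dict of heaps that is also held by a container elsewhere.
table = {"source": [(0, "target")]}
registry = [table]                # an extra reference that would keep 'table' alive
print(referrer_types(table))      # 'list' appears because of registry
```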

This object is a dictionary with strings as keys and heaps as values.
Each heap consists of tuples. Every string is referenced more than
once (which is logical), but the heaps are referenced only once. So I
would expect them to be destroyed when I destroy the dictionary.
Furthermore, I assume that calling gc.collect() forces the garbage
collector to collect, even if it wouldn't "want" to otherwise?
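
A sketch that checks the first expectation under assumed sizes: a hypothetical dict of string keys mapping to heapq heaps of tuples is built while automatic collection is disabled, and tracemalloc.get_traced_memory() shows most of its memory being released the moment the last reference is deleted, because such a structure contains no cycles. On the second point, gc.collect() with no arguments runs a full collection regardless of the automatic thresholds and even after gc.disable().

```python
import gc
import heapq
import tracemalloc

def build_table(n_keys, heap_size):
    """Hypothetical stand-in: string keys -> heaps of (cost, vertex) tuples."""
    table = {}
    for k in range(n_keys):
        heap = []
        for i in range(heap_size):
            heapq.heappush(heap, (heap_size - i, "v%d" % i))
        table["key%d" % k] = heap
    return table

gc.disable()                  # show that the cycle collector is not needed here
tracemalloc.start()
try:
    table = build_table(200, 200)
    with_table, _ = tracemalloc.get_traced_memory()
    del table                 # last reference: dict, heaps and tuples freed now
    without_table, _ = tracemalloc.get_traced_memory()
finally:
    tracemalloc.stop()
    gc.enable()

released = with_table - without_table   # bytes given back by the del alone
print(released)
```
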

> Be sure to check for cyclic references, they can be a problem for
> the GC.

I don't see how these could occur. It's basically a list (possibly of
lists) of ints/strings. No list contains itself. I'll see whether I
can make a stripped-down version that exhibits the same memory growth.

Berteun
 

Istvan Albert

> that have been created after I don't need them anymore. I furthermore
> don't really see why there would be references to these larger objects
> left. (I can be mistaken of course).

This could be tricky because you have a graph that (probably) allows
you to walk its nodes, so even a single remaining reference to any one
of the nodes could keep the entire graph "alive".
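
A sketch of that effect with a hypothetical Node class and an undirected chain of 1,000 nodes: after the list holding the graph is deleted, one stray reference to a single node still keeps every node reachable, even through a full gc.collect(). Only when that last reference goes does the collector free the whole graph.

```python
import gc
import weakref

class Node:
    """Hypothetical graph vertex that links to its neighbours."""
    def __init__(self, label):
        self.label = label
        self.neighbours = []

def build_chain(n):
    nodes = [Node(i) for i in range(n)]
    for left, right in zip(nodes, nodes[1:]):
        left.neighbours.append(right)     # undirected edge: each side
        right.neighbours.append(left)     # refers to the other
    return nodes

nodes = build_chain(1000)
probes = [weakref.ref(node) for node in nodes]
survivor = nodes[500]           # one stray reference, e.g. a cached start vertex
del nodes
gc.collect()
alive_with_survivor = sum(p() is not None for p in probes)   # all 1000

del survivor
gc.collect()
alive_without = sum(p() is not None for p in probes)         # 0
```
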

> The best I now can do is run the whole script several times (from a
> shell script) -- but this also forces Python to reparse the graph
> input again, and do some other stuff it only has to do once.

You could pickle and save the graph once the initial processing is
done. That way, subsequent runs will load substantially faster.
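
A minimal sketch of that suggestion with the standard pickle module. The function names and the file path are placeholders, and the graph is assumed to consist of picklable objects (plain dicts, lists, tuples, or instances of module-level classes).

```python
import pickle

def save_graph(graph, path):
    """Store the preprocessed graph once, after the expensive parsing step."""
    with open(path, "wb") as f:
        pickle.dump(graph, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_graph(path):
    """Reload the preprocessed graph in later runs instead of re-parsing it."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage in the driver script:
#   save_graph(graph, "graph.pickle")      # first run
#   graph = load_graph("graph.pickle")     # subsequent runs
```

Only unpickle files you created yourself; loading a pickle can execute arbitrary code from untrusted input.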

i.
 
