mark.engelberg
I am having trouble identifying the source of a memory leak in a
Windows Python program. The basic gist is as follows:
1. Generate a directed graph (approx. 1000 nodes).
2. Write the graph to a file.
3. Use os.system to invoke another program (Graphviz) that processes
the graph file and generates a GIF image of the graph.
4. Use another os.system command to delete the intermediate file, which
is no longer needed.
5. Append to a global list variable some summary information about the
graph (a few floats, a few ints, and a couple of short strings).
6. Repeat the above steps several thousand times.
7. Sort the global summary list.
8. Write the summary information out to a file.
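For concreteness, the per-graph loop (steps 2-6) looks roughly like the sketch below. The file names, the summarize helper, and the render command are hypothetical stand-ins for the real code, and os.remove stands in for the os.system delete in step 4, since no shell is needed just to unlink a file:

```python
import os
import tempfile

def summarize(graph_id, node_count):
    # Hypothetical summary record: a few numbers and a couple of
    # short strings, as in step 5.
    return (float(node_count), graph_id, "graph%d" % graph_id)

def run_batch(n_graphs, summary, render_cmd=None):
    for i in range(n_graphs):  # xrange in Python 2
        dot_path = os.path.join(tempfile.gettempdir(), "graph%d.dot" % i)
        with open(dot_path, "w") as f:   # closed on block exit
            f.write("digraph g%d { a -> b; }\n" % i)
        if render_cmd is not None:
            # e.g. "dot -Tgif %s -o graph%d.gif" for Graphviz
            os.system(render_cmd % (dot_path, i))
        os.remove(dot_path)  # stand-in for the os.system delete
        summary.append(summarize(i, 1000))
    summary.sort()  # step 7
```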
When running this program, Python consumes all memory after about a
thousand graphs.
I have confirmed the following:
1. I am closing all files that I open.
2. Although within the directed graph, nodes refer back and forth to
each other, the main object which manages the directed graph is not
part of anything cyclical. Its reference count does in fact go to
zero when the graph goes out of scope.
3. After running gc.collect(), I have confirmed that the nodes are not
in the garbage list. I interpret this to mean that although there may
be cyclical references among them, the collector was able to determine
that they were unreachable as a group, and successfully collected them.
4. I am generating all these graphs in a for loop (using xrange), not
through a recursive process that would cause the stack to grow.
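One way to go a step further than the gc.garbage check: snapshot the live-object counts before and after a single iteration and diff them; whatever type keeps growing between iterations is the accumulator. A minimal sketch (the simulated leak is just for illustration):

```python
import gc
from collections import Counter

def object_census():
    # Tally every object the collector tracks, by type name.
    # Snapshot before and after one iteration of the loop and
    # diff the two to see which type is accumulating.
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

before = object_census()
hoard = [[] for _ in range(1000)]  # simulate a leak of 1000 lists
after = object_census()
print((after - before).most_common(3))  # 'list' should dominate here
```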
So really the only thing that should be growing here is the global list
that maintains summary information, and since I'm only adding little
bits of data, it's hard to imagine how this could be exhausting 2GB of
memory after just a thousand iterations.
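That intuition can be sanity-checked with sys.getsizeof, which puts a rough upper bound on the list's footprint (the record layout here is a made-up stand-in for the real one):

```python
import sys

# Hypothetical stand-in for one summary record: a few floats,
# a few ints, and a couple of short strings.
summary = [(1.0, 2.0, 3, 4, "short", "strings") for _ in range(1000)]

# Rough bound: the list's own pointer storage plus each record
# tuple. (Shared/interned element objects are ignored.)
total = sys.getsizeof(summary) + sum(sys.getsizeof(t) for t in summary)
print("approx bytes:", total)  # tens of kilobytes, nowhere near 2 GB
```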
Can anyone help me think of other possible sources of a leak? Are
there any classic "python-gotchas" I'm missing here, such as something
leaking from calling os.system so many times? Any other suggestions as
to what strategies I could employ to track down the leak?
Thanks,
Mark