The amount of data I read in is actually small.
So the problem is probably elsewhere... Sorry, since you were talking
about huge dataset, the good old "read-whole-file-in-memory" antipattern
seemed an obvious guess.
If you see my algorithm above it deals with 2000 nodes and each node
has ot of attributes.
When I close the program my computer becomes stable and performs as
usual. I check the performance in Performance monitor and using "top"
and the total memory is being used and on top of that around half a gig
swap memory is also being used.
Please give some helpful pointers to overcome such memory errors.
A real memory leak would cause the memory usage to keep increasing as
long as your program is running. If this is not the case, it's not a
"memory error", but a design/program error. FWIW, apps like Zope can end
up using a whole lot of memory, but there's no known memory-leak problem
AFAIK. And believe me, a Zope app can end up managing a *really huge
lot* of objects (>= many thousands).
I revisited my code to find nothing so obvious which would let this
leak happen. How to kill cross references in the program.
Using weakref and/or gc might help.
FWIW, the default memory management in Python is based on
reference-counting. As long as anything keeps a reference to an object,
this object will stay alive. If you have lot of cross-references and
2000+ big objects, you may effectively end up eating all the ram and
more. The gc module can detect and manage some cyclic references (obj A
has a ref on obj B which has a ref on obj A). The weakref module uses
'proxy' references that let reference-counting do it's job (I guess the
doc will be much more explicit than me).
Another possible improvement could be to use the flyweight design
pattern to share memory for some attributes :
- a general (while somewhat Java-oriented) explanation:
http://www.exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm
- two Python exemples (the second being based on the first)
http://www.suttoncourtenay.org.uk/duncan/accu/pythonpatterns.html#flyweight
http://push.cx/2006/python-flyweights
HTH