bryanjugglercryptographer
MrsEntity said: Based on heapy, a db based solution would be serious overkill.
I've embraced overkill and my life is better for it. Don't confuse overkill with cost. Overkill is your friend.
The facts of the case: You need to save some derived strings for each of 2M input lines. Even half the input runs over the 2GB RAM in your (virtual) machine. You're using Ubuntu 12.04 in Virtualbox on Win7/64, Python 2.7/64.
That screams "sqlite3". It's overkill, in a good way. It's already there for the importing.
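A minimal sketch of the sqlite3 route, assuming each input line yields a couple of derived strings. The table name, column names, and the `store_derived` helper are made up for illustration, and the "derived" values here (upper-cased and reversed text) are stand-ins for whatever the real processing produces.

```python
import sqlite3

def store_derived(lines, db_path=":memory:"):
    """Stream derived strings into sqlite3 rather than holding them all in RAM.

    Hypothetical helper: pass a file path as db_path to keep the data on disk.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS derived ("
        "line_no INTEGER PRIMARY KEY, upper_text TEXT, reversed_text TEXT)"
    )
    # A generator feeds executemany one row at a time, so the full
    # set of derived strings never has to exist in memory at once.
    rows = ((i, line.upper(), line[::-1]) for i, line in enumerate(lines))
    conn.executemany("INSERT INTO derived VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

# Tiny stand-in for the 2M input lines.
conn = store_derived(["alpha", "beta", "gamma"])
count = conn.execute("SELECT COUNT(*) FROM derived").fetchone()[0]
row = conn.execute(
    "SELECT upper_text, reversed_text FROM derived WHERE line_no = ?", (1,)
).fetchone()
```

With a real input you would pass the open file object as `lines` and a file path as `db_path`, so both the reading and the storing stay incremental.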
Other approaches? You could try to keep everything in RAM, but use less. Tim Chase pointed out the memory-efficiency of named tuples. You could save some more by switching to Win7/32, Python 2.7/32; VirtualBox makes trying such alternatives quick and easy.
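To get a feel for the named-tuple saving, here is a rough comparison against a dict holding the same fields. The `Record` type and its field names are invented for the example. Note that `sys.getsizeof` is shallow: it counts the container only, not the strings it refers to, and the exact numbers depend on the Python version and on 32-bit versus 64-bit builds.

```python
import sys
from collections import namedtuple

# Hypothetical record layout: three derived strings per input line.
Record = namedtuple("Record", ["key", "value", "source"])

rec = Record(key="k1", value="v1", source="line 1")
as_dict = {"key": "k1", "value": "v1", "source": "line 1"}

# Shallow container sizes only; the strings themselves cost the same either way.
tuple_size = sys.getsizeof(rec)
dict_size = sys.getsizeof(as_dict)

print("namedtuple container bytes: %d" % tuple_size)
print("dict container bytes:       %d" % dict_size)
```

The per-record difference looks small, but it gets multiplied by 2M records.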
Or you could add memory. Compared to good old 32-bit, 64-bit operation uses significantly more memory per object (every pointer doubles in size) and can address vastly more of it. There's a bit of a mismatch in a 64-bit system with just 2GB of RAM. I know, sounds weird, "just" two billion bytes of RAM. I'll rephrase: just ten dollars' worth of RAM. Less if you buy it where I do.
I don't know why the memory profiling tools are misleading you. I can think of plausible explanations, but they'd just be guesses. There's nothing all that surprising in running out of RAM, given what you've explained. A couple K per line is easy to burn.
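The back-of-envelope arithmetic, taking "a couple K" to mean 2 KiB per line (that figure is an assumption for illustration, not a measurement):

```python
# Assumed figures: 2M input lines, roughly 2 KiB of Python objects per line.
num_lines = 2 * 10**6
bytes_per_line = 2 * 1024

total_bytes = num_lines * bytes_per_line
ram_bytes = 2 * 2**30  # the 2GB given to the virtual machine

print("estimated total: %.2f GiB" % (total_bytes / float(2**30)))
print("exceeds RAM: %s" % (total_bytes > ram_bytes))
```

At that rate the full input needs close to twice what the VM has.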
-Bryan