Artur Siekielski
Hi.
I'm using CPython 2.7 on Linux. To run parallel computations on a
large list of objects I want to use multiple processes (via the
multiprocessing module). In the first step I fill the list with
objects, and then I fork() my worker processes that do the job.
This should be optimal in terms of memory usage because Linux
implements copy-on-write for forked processes, so there should be only
one physical copy of the list of objects (the worker processes don't
change the objects on the list). The problem is that after a short
time the child processes use more and more memory (they don't create
new objects - they only read objects from the list and write the
computation results to the database).
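
A minimal sketch of the setup described above, under some assumptions:
it calls os.fork() directly instead of going through multiprocessing
to stay short, it starts a single worker, and the names build_items
and run_child_sum as well as the tuple payload are hypothetical. The
list is built before the fork, and the child only reads it and sends
its result back through a pipe (standing in for the database write).

```python
import os

def build_items(n):
    # Hypothetical payload; the real objects would be application data.
    return [(i, i * i) for i in range(n)]

def run_child_sum(items):
    """Fork one worker that only reads `items` and reports via a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the list is inherited from the parent, nothing is pickled.
        os.close(read_fd)
        status = 1
        try:
            total = sum(value for _, value in items)
            os.write(write_fd, str(total).encode("ascii"))
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    # Parent: collect the result and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return int(data)

items = build_items(1000)      # filled before forking
child_total = run_child_sum(items)
```

Even though the child never assigns to the list, iterating over it
touches every element, which is where the memory growth comes from.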
After investigating, I concluded that the cause must be the reference
counter being incremented whenever an object is fetched from the
list. This changes only one integer, but the OS must copy the whole
memory page into the child process. I reimplemented the function for
getting an element (from the file listobject.c), omitting the
Py_INCREF call, and that solved my problem with growing memory usage.
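
The effect can be observed from Python itself. The following is only
an illustrative sketch: the class Item is a hypothetical stand-in for
the real objects, and sys.getrefcount() is used to show that a plain
read from the list changes the counter stored inside the object.

```python
import sys

class Item(object):
    """Hypothetical stand-in for the objects stored in the list."""
    pass

items = [Item() for _ in range(3)]

before = sys.getrefcount(items[0])
ref = items[0]                      # a read-only access from our point of view
after = sys.getrefcount(items[0])   # ...but the object's header was written to

increase = after - before           # one extra reference is held by `ref`
```

The absolute values reported by sys.getrefcount() include temporary
references and differ between interpreter versions; only the
difference matters here.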
The question is: are there any better ways to have a truly read-only
list (in terms of the memory representation of the objects)? My
solution is of course not safe. I thought about weakrefs, but it seems
they cannot be used here because getting a real reference from a
weakref increments the reference counter. Maybe another option would
be to store the reference counters not in the objects but in a
separate array, to minimize the number of memory pages they occupy...
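
The weakref observation can be checked with a short sketch (again
with a hypothetical Item class): creating the weak reference leaves
the counter alone, but calling it to get the object back returns an
ordinary strong reference, so the counter inside the object is
written to anyway.

```python
import sys
import weakref

class Item(object):
    """Hypothetical stand-in for the objects stored in the list."""
    pass

obj = Item()
base = sys.getrefcount(obj)

wref = weakref.ref(obj)
with_weakref = sys.getrefcount(obj)   # unchanged by the weak reference

strong = wref()                       # dereferencing yields a strong reference
after_deref = sys.getrefcount(obj)    # incremented by that reference
```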