run function in separate process

malkarouri

Hi everyone,

I have written a function that runs functions in separate processes. I
hope you can help me improve it, and I would like to submit it to
the Python cookbook if its quality is good enough.

I was writing a numerical program (using numpy) which uses huge
amounts of memory, the memory increasing with time. The program
structure was essentially:

for radius in radii:
result = do_work(params)

where do_work actually uses a large number of temporary arrays. The
variable params is large as well and is the result of computations
before the loop.

After playing with gc for some time, trying to convince it to
release the memory, I gave up. I will be happy, by the way, if
somebody points me to a web page/reference that explains how to call a
function and then reclaim all of its memory back in Python.

Meanwhile, the best that I could do is fork a process, compute the
results, and return them to the parent process. I implemented this
in the following function, which is kind of working for me now, but I
am sure it can be much improved. There should be a better way to
return the result than a temporary file, for example. I actually
thought of posting this after noticing that the pypy project had what
I thought was a similar thing in their testing, but they probably
dealt with it differently in the autotest driver [1]; I am not sure.

Here is the function:

def run_in_separate_process(f, *args, **kwds):
    from os import tmpnam, fork, waitpid, remove
    from sys import exit
    from pickle import load, dump
    from contextlib import closing
    fname = tmpnam()
    pid = fork()
    if pid > 0:  # parent
        waitpid(pid, 0)  # should have checked for correct finishing
        with closing(file(fname)) as f:
            result = load(f)
        remove(fname)
        return result
    else:  # child
        result = f(*args, **kwds)
        with closing(file(fname, 'w')) as f:
            dump(result, f)
        exit(0)


To be used as:

for radius in radii:
    result = run_in_separate_process(do_work, params)
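In modern Python the same temp-file round-trip can be sketched with tempfile.mkstemp, which avoids the filename race in os.tmpnam (removed in Python 3). This is an illustrative rewrite under those assumptions, not the original code:

```python
import os
import pickle
import tempfile

def run_in_separate_process(func, *args, **kwds):
    # mkstemp avoids the filename race that os.tmpnam suffered from
    fd, fname = tempfile.mkstemp()
    os.close(fd)
    pid = os.fork()
    if pid > 0:  # parent: wait for the child, then read its pickled result
        os.waitpid(pid, 0)
        with open(fname, 'rb') as fh:
            result = pickle.load(fh)
        os.remove(fname)
        return result
    else:  # child: compute, write the result, and exit without cleanup
        result = func(*args, **kwds)
        with open(fname, 'wb') as fh:
            pickle.dump(result, fh)
        os._exit(0)
```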

[1] http://codespeak.net/pipermail/pypy-dev/2006q3/003273.html



Regards,

Muhammad Alkarouri
 

kyosohma

[...]

I found some posts on similar topics that look like they may give you
some ideas:

http://mail.python.org/pipermail/python-list/2004-October/285400.html
http://www.artima.com/forums/flat.jsp?forum=106&thread=174099
http://www.nabble.com/memory-manage-in-python-fu-t3386442.html
http://www.thescripts.com/forum/thread620226.html

Mike
 

Alex Martelli

somebody points me to a web page/reference that says how to call a
function then reclaim the whole memory back in python.

Meanwhile, the best that I could do is fork a process, compute the
results, and return them back to the parent process. This I

That's my favorite way to ensure that all resources get reclaimed: let
the operating system do the job.
implemented in the following function, which is kinda working for me
now, but I am sure it can be much improved. There should be a better
way to return the result that a temporary file, for example. I

You can use a pipe. I.e. (untested code):

def run_in_separate_process(f, *a, **k):
    import os, sys, cPickle
    pread, pwrite = os.pipe()
    pid = os.fork()
    if pid > 0:
        os.close(pwrite)
        with os.fdopen(pread, 'rb') as f:
            return cPickle.load(f)
    else:
        os.close(pread)
        result = f(*a, **k)
        with os.fdopen(pwrite, 'wb') as f:
            cPickle.dump(f, -1)
        sys.exit()

Using cPickle instead of pickle, and a negative protocol (on files
pedantically opened as binary), meaning the latest and greatest
available pickling protocol rather than the default 0, should improve
performance.
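The gap between protocol 0 and the binary protocols is easy to observe; a quick sketch in modern Python (where pickle absorbed cPickle):

```python
import pickle

data = list(range(1000))
ascii_bytes = pickle.dumps(data, protocol=0)    # old text protocol
binary_bytes = pickle.dumps(data, protocol=-1)  # highest available protocol
# the binary protocols encode small ints far more compactly
print(len(ascii_bytes), len(binary_bytes))
```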


Alex
 

malkarouri

Thanks, Mike, for your answer. I will use the occasion to add some
comments on the links and on my approach.

I am programming in Python 2.5, mainly to avoid the earlier bug where
memory arenas were never freed. The program works on both Mac OS X
(Intel) and Linux, so I prefer portable approaches.

On Apr 11, 3:34 pm, (e-mail address removed) wrote:
[...]
I found a post on a similar topic that looks like it may give you some
ideas:

http://mail.python.org/pipermail/python-list/2004-October/285400.html

I see the comment about using mmap as valuable. I tried that using
numpy.memmap but I wasn't successful; I don't remember why at the
moment. The other tricks are problem-dependent, and my case is not
like them (I believe).

Good ideas. I hope that Python will grow a replaceable gc one day. I
think that pypy already has a choice at the moment.

Bingo! This thread actually reaches more or less the same conclusion.
In fact, Alex Martelli describes the exact pattern in
http://mail.python.org/pipermail/python-list/2007-March/431910.html

I probably got the idea from a previous thread by him or somebody
else. It should be much earlier than March, though, as my program was
working since last year.

So, let's say the function I have written is an implementation of
Alex's architectural pattern. That probably makes it easier to get
into the cookbook :)

Regards,

Muhammad
 

malkarouri

On Apr 11, 3:58 pm, (e-mail address removed) (Alex Martelli) wrote:
[...]
That's my favorite way to ensure that all resources get reclaimed: let
the operating system do the job.

Thanks a lot, Alex, for confirming the basic idea. I will be playing
with your function later today, and will give more feedback.
I think I avoided the pipe on the mistaken belief that pipes cannot be
binary. I know, I should have tested. And I avoided pickle at the time
because I had a structure that was unpicklable (grown by me using a
mixture of Python, C, ctypes and Pyrex at the time). The structure is
improved now, and I will go for the more standard approach.

Regards,

Muhammad
 

malkarouri

On Apr 11, 4:36 pm, (e-mail address removed) wrote:
[...]
.. And I avoided pickle at the time
because I had a structure that was unpicklable (grown by me using a
mixture of python, C, ctypes and pyrex at the time). The structure is
improved now, and I will go for the more standard approach..

Sorry, I was speaking about an older version of my code. The code is
already using pickle, and yes, cPickle is better.

Still trying the code. So far, after modifying the line:

cPickle.dump(f, -1)

to:

cPickle.dump(result, f, -1)

it is working.

Regards,

Muhammad
 

malkarouri

After playing with Alex's implementation, and adding some support for
exceptions, this is what I came up with. I hope I am not getting too
clever for my needs:

import os, cPickle

def run_in_separate_process_2(f, *args, **kwds):
    pread, pwrite = os.pipe()
    pid = os.fork()
    if pid > 0:
        os.close(pwrite)
        with os.fdopen(pread, 'rb') as f:
            status, result = cPickle.load(f)
        os.waitpid(pid, 0)
        if status == 0:
            return result
        else:
            raise result
    else:
        os.close(pread)
        try:
            result = f(*args, **kwds)
            status = 0
        except Exception, exc:
            result = exc
            status = 1
        with os.fdopen(pwrite, 'wb') as f:
            try:
                cPickle.dump((status, result), f,
                             cPickle.HIGHEST_PROTOCOL)
            except cPickle.PicklingError, exc:
                cPickle.dump((2, exc), f, cPickle.HIGHEST_PROTOCOL)
            f.close()
        os._exit(0)



Basically, the function is called in the child process, and a status
code is returned in addition to the result. The status is 0 if the
function returns normally, 1 if it raises an exception, and 2 if the
result is unpicklable. Some cases are deliberately not handled: a
SystemExit or a KeyboardInterrupt in the child shows up as an EOFError
during unpickling in the parent. Some cases are inadvertently not
handled; these are called bugs. And the original exception traceback
is lost. Any comments?
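One way to keep the trace is to capture it as text in the child with the standard traceback module; a sketch in modern Python (pickle in place of cPickle; the function name and the RuntimeError wrapping are illustrative choices, not part of the code above):

```python
import os
import pickle
import traceback

def run_in_separate_process_3(func, *args, **kwds):
    pread, pwrite = os.pipe()
    pid = os.fork()
    if pid > 0:  # parent
        os.close(pwrite)
        with os.fdopen(pread, 'rb') as fh:
            status, payload = pickle.load(fh)
        os.waitpid(pid, 0)
        if status == 0:
            return payload
        exc, trace_text = payload
        # chain the child's exception, carrying its formatted traceback
        raise RuntimeError(trace_text) from exc
    else:  # child
        os.close(pread)
        try:
            status, payload = 0, func(*args, **kwds)
        except Exception as exc:
            # capture the traceback as text before it is lost
            status, payload = 1, (exc, traceback.format_exc())
        with os.fdopen(pwrite, 'wb') as fh:
            pickle.dump((status, payload), fh, pickle.HIGHEST_PROTOCOL)
        os._exit(0)
```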

Regards,

Muhammad Alkarouri
 
