Is this a bug in multiprocessing or in my script?


erikcw

Hi,

I'm trying to get multiprocessing working consistently with my
script. I keep getting random tracebacks with no helpful
information. Sometimes it works, sometimes it doesn't.

Traceback (most recent call last):
  File "scraper.py", line 144, in <module>
    print pool.map(scrape, range(10))
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
TypeError: expected string or buffer

It's not always the same traceback, but they are always short like
this. I'm running Python 2.6.2 on Ubuntu 9.04.

Any idea how I can debug this?

Thanks!
Erik
 

sturlamolden

It's not always the same traceback, but they are always short like
this. I'm running Python 2.6.2 on Ubuntu 9.04.

Any idea how I can debug this?

In my experience, multiprocessing is fragile. Scripts tend to fail for
no obvious reason, cause processes to be orphaned and linger, leak
system-wide resources, etc. For example, multiprocessing uses os._exit
to stop a spawned process, even though it inevitably results in
resource leaks on Linux (it should use sys.exit). Gaël Varoquaux and I
noticed this when we implemented shared memory ndarrays for numpy; we
consistently got memory leaks with System V IPC for no obvious reason.
Even after Jesse Noller was informed of the problem (about half a year
ago), the bug still lingers. It is easy to edit multiprocessing's
forking.py file on your own, but bugs like this are a pain in the ass,
and I suspect multiprocessing has many of them. Of course, unless you
show us your whole script, identifying the source of your bug will be
impossible. But it may very well be in multiprocessing too. The
quality of this module is not impressive. I am beginning to think that
multiprocessing should never have made it into the Python standard
library. The GIL cannot be that bad! If you can't stand the GIL, get a
Unix (or Mac, Linux, Cygwin) and use os.fork. Or simply switch to a
non-GIL Python: IronPython or Jython.

Allow me to show you something better. With os.fork we can write code
like this:

class parallel(object):

    def __enter__(self):
        # call os.fork

    def __exit__(self, exc_type, exc_value, traceback):
        # call sys.exit in the child processes and
        # os.waitpid in the parent

    def __call__(self, iterable):
        # return different sub-sequences depending on
        # child or parent status


with parallel() as p:
    # parallel block starts here

    for item in p(iterable):
        # whatever

    # parallel block ends here

This makes parallel code a lot cleaner than anything you can do with
multiprocessing, allowing you to use constructs similar to OpenMP.
Further, if you make 'parallel' a dummy context manager, you can
develop and test the algorithms serially. The only drawback is that
you have to use Cygwin to get os.fork on Windows, and forking will be
less efficient (no copy-on-write optimization). Well, this is just one
example of why Windows sucks from the perspective of the programmer.
But it also shows that you can do much better by not using
multiprocessing at all.
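
Concretely, a minimal sketch of how this context manager could be
fleshed out (my own illustration; the nprocs argument and the
round-robin split are details I am assuming, not anything fixed):

import os
import sys

class parallel(object):
    """Fork-based parallel block: rank 0 is the parent, ranks
    1..nprocs-1 are children created in __enter__."""

    def __init__(self, nprocs=2):
        self.nprocs = nprocs
        self.rank = 0
        self.pids = []

    def __enter__(self):
        # Fork nprocs-1 children; each child remembers its rank and
        # stops forking, while the parent collects the child pids.
        for i in range(1, self.nprocs):
            pid = os.fork()
            if pid == 0:
                self.rank = i
                self.pids = []
                break
            self.pids.append(pid)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.rank == 0:
            # Parent: wait for all children before leaving the block.
            for pid in self.pids:
                os.waitpid(pid, 0)
        else:
            # Child: leave the interpreter here instead of running the
            # rest of the script (sys.exit, so clean-up code still runs).
            sys.exit(0)

    def __call__(self, iterable):
        # Round-robin split: each rank gets every nprocs-th item.
        return list(iterable)[self.rank::self.nprocs]

with parallel(nprocs=4) as p:
    for item in p(range(100)):
        pass  # each process handles its own quarter of the items

It does no error handling or result gathering, but it shows the shape
of the thing.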

The only case I can think of where multiprocessing would be useful
is I/O-bound code on Windows. But here you will almost always resort
to C extension modules. For I/O bound code, Python tends to give you a
200x speed penalty over C. If you are resorting to C anyway, you can
just use OpenMP in C for your parallel processing. We can thus forget
about multiprocessing here as well, given that we have access to the C
code. If we don't, it is still very likely that the C code releases
the GIL, and we can get away with using Python threads instead of
multiprocessing.
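
For instance (again my own sketch, with placeholder URLs), plain
threads are usually enough for I/O-bound work, because urllib2
releases the GIL while it waits on the network:

import threading
import urllib2

def fetch(url, results, i):
    # Each thread blocks on network I/O, not on the interpreter lock.
    results[i] = urllib2.urlopen(url).read()

urls = ["http://example.com/page%d" % i for i in range(10)]  # placeholders
results = [None] * len(urls)
threads = [threading.Thread(target=fetch, args=(u, results, i))
           for i, u in enumerate(urls)]
for t in threads:
    t.start()
for t in threads:
    t.join()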

IMHO, if you are using multiprocessing, you are very likely to have a
design problem.

Regards,
Sturla
 

Jesse Noller

For example, multiprocessing uses os._exit to stop a spawned process,
even though it inevitably results in resource leaks on Linux (it
should use sys.exit). [...] Even after Jesse Noller was informed of
the problem (about half a year ago), the bug still lingers.

Sturla;

That bug was fixed unless I'm missing something. Also, patches and
continued bug reports are welcome.

jesse
 

ryles

Traceback (most recent call last):
  File "scraper.py", line 144, in <module>
    print pool.map(scrape, range(10))
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
TypeError: expected string or buffer

This is almost certainly due to your scrape call raising an exception.
In the parent process, multiprocessing will detect if one of its
workers has terminated with an exception and then re-raise it.
However, only the exception and not the original traceback is made
available, which makes debugging more difficult for you. Here's a
simple example which demonstrates this behavior:

>>> from multiprocessing import Pool
>>> def evil_on_8(x):
...     if x == 8: raise ValueError("I DONT LIKE THE NUMBER 8")
...     return x + 1
...
>>> pool = Pool(processes=4)
>>> pool.map(evil_on_8, range(5))
[1, 2, 3, 4, 5]
>>> pool.map(evil_on_8, range(10)) # 8 will cause evilness.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/bb/real/3ps/lib/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/bb/real/3ps/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
ValueError: I DONT LIKE THE NUMBER 8
>>>

My recommendation is that you wrap your scrape code inside a
try/except and log any exception. I usually do this with
logging.exception(), or if logging is not in use, the traceback
module. After that you can simply re-raise it.
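
A sketch of what I mean (do_scrape here is just a placeholder for
whatever your real scrape function does):

import logging

def scrape(n):
    try:
        return do_scrape(n)   # placeholder for the real scraping work
    except Exception:
        # Log the full traceback here in the worker, where it still
        # exists; the parent will only ever see the exception object.
        logging.exception("scrape(%r) failed in a worker process", n)
        raise   # re-raise so pool.map still reports the failure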
 

Piet van Oostrum

s> It is still in SVN. Change every call to os._exit to sys.exit
s> please. :)

Calling os.exit in a child process may be dangerous. It can cause
unflushed buffers to be flushed twice: once in the parent and once in
the child.
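
A small demonstration of what I mean (my own toy example; run it with
stdout redirected to a file so it is block-buffered rather than
line-buffered):

import os
import sys

sys.stdout.write("written before fork\n")   # may still sit in the stdio buffer

pid = os.fork()
if pid == 0:
    sys.exit(0)          # the child flushes its inherited copy of the buffer
else:
    os.waitpid(pid, 0)
# the parent flushes the same buffered text again when it exits normally,
# so the redirected output can contain "written before fork" twice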
 

sturlamolden

Calling os.exit in a child process may be dangerous. It can cause
unflushed buffers to be flushed twice: once in the parent and once in
the child.

I assume you mean sys.exit. If this is the case, multiprocessing needs
a mechanism to choose between os._exit and sys.exit for child
processes. Calling os._exit might also be dangerous because it could
prevent necessary clean-up code from executing (e.g. in C
extensions). I had a case where shared memory on Linux (System V IPC)
leaked due to os._exit. The deallocator for my extension type never
got to execute in child processes. The deallocator was needed to
release the shared segment when its reference count dropped to 0.
Changing to sys.exit solved the problem. On Windows there was no leak,
because the kernel did the reference counting.
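
To illustrate the difference (a toy sketch, with an atexit handler
standing in for the extension type's deallocator):

import atexit
import os
import sys

def release_shared_segment():
    # stand-in for the clean-up my C extension's deallocator performs
    print "cleanup ran in pid %d" % os.getpid()

atexit.register(release_shared_segment)

pid = os.fork()
if pid == 0:
    # os._exit terminates at once: no atexit handlers, no deallocators.
    # With sys.exit(0) instead, the handler above runs in the child too.
    os._exit(0)
else:
    os.waitpid(pid, 0)
# the parent still runs release_shared_segment() when it exits normally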
 

sturlamolden

http://bugs.python.org/issue6653

In the future please use the bug tracker to file and track bugs with,
so things are not as lossy.

Ok, sorry :)

Also see Piet's comment here. He has a valid case against sys.exit in
some cases. Thus it appears that both ways of shutting down child
processes might be dangerous: If we don't want buffers to flush we
have to use os._exit. If we want clean-up code to execute we have to
use sys.exit. If we want both we are screwed. :(
 

Jesse Noller

Ok, sorry :)

Also see Piet's comment here. He has a valid case against sys.exit in
some cases. Thus it appears that both ways of shutting down child
processes might be dangerous: If we don't want buffers to flush we
have to use os._exit. If we want clean-up code to execute we have to
use sys.exit. If we want both we are screwed. :(

Comments around this bug should go in the bug report - again, so we
don't lose them. I do not personally subscribe to this group, so
it's very easy to miss things.

jesse
 
