M
Matt Chaput
Hi,
I'm having a problem with the multiprocessing package.
I'm trying to use a simple pattern where a supervisor object starts a
bunch of worker processes, instantiating them with two queues (a job
queue for tasks to complete and an results queue for the results). The
supervisor puts all the jobs in the "job" queue, then join()s the
workers, and then pulls all the completed results off the "results" queue.
(I don't think I can just use something like Pool.imap_unordered for
this because the workers need to be objects with state.)
Here's a simplified example:
http://pastie.org/850512
The problem is that seemingly randomly, but almost always, the worker
processes will deadlock at some point and stop working before they
complete. This will leave the whole program stalled forever. This seems
more likely the more work each worker does (to the point where adding
the time.sleep(0.01) as seen in the example code above guarantees it).
The problem seems to occur on both Windows and Mac OS X.
I've tried many random variations of the code (e.g. using JoinableQueue,
calling cancel_join_thread() on one or both queues even though I have no
idea what it does, etc.) but keep having the problem.
Am I just using multiprocessing wrong? Is this a bug? Any advice?
Thanks,
Matt
I'm having a problem with the multiprocessing package.
I'm trying to use a simple pattern where a supervisor object starts a
bunch of worker processes, instantiating them with two queues (a job
queue for tasks to complete and an results queue for the results). The
supervisor puts all the jobs in the "job" queue, then join()s the
workers, and then pulls all the completed results off the "results" queue.
(I don't think I can just use something like Pool.imap_unordered for
this because the workers need to be objects with state.)
Here's a simplified example:
http://pastie.org/850512
The problem is that seemingly randomly, but almost always, the worker
processes will deadlock at some point and stop working before they
complete. This will leave the whole program stalled forever. This seems
more likely the more work each worker does (to the point where adding
the time.sleep(0.01) as seen in the example code above guarantees it).
The problem seems to occur on both Windows and Mac OS X.
I've tried many random variations of the code (e.g. using JoinableQueue,
calling cancel_join_thread() on one or both queues even though I have no
idea what it does, etc.) but keep having the problem.
Am I just using multiprocessing wrong? Is this a bug? Any advice?
Thanks,
Matt