Sergey Fedorov
Hi All,
I have a web service that needs to handle a large number of work requests.
Each job makes blocking IO calls (DB queries, external web services) to
fetch data, so part of its time is spent waiting. Once the data arrives,
the job runs a computational step (numpy/pandas on time-series
dataframes).
The service runs on a multicore machine, so I want to use multiple
processes for parallelism (especially given Python's GIL). Because of the
significant amount of IO, I also want multiple threads inside each process
so that no CPU stalls on IO delays.
The ideal setup would be a pool of processes, each with its own thread pool
(each worker needs to keep some state, such as DB connections). I already
have my own thread pool implementation, which uses load-balancing and
fair-scheduling techniques specific to my problem domain.
I'm curious whether there is a process pool implementation that I missed
and can reuse. As it turns out, the one in the multiprocessing module
doesn't support a custom Process class (if it did, I could subclass it and
add the functionality I need); see
http://stackoverflow.com/questions/740844/python-multiprocessing-pool-of-custom-processes
Is there an alternative module that I can reuse?
If not, what's the best way to notify the caller that a task has finished
executing (as multiprocessing.Pool's apply() does)? Which primitives are
best suited for that purpose, in case I have to write my own process pool?
Any reference to a good blog post or educational resource would be highly
appreciated!
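
One possible shape for such a notification mechanism, as a sketch only: each submitted task gets a handle backed by a `threading.Event`, and a collector thread in the parent process reads `(task_id, value, error)` messages from a result queue and fulfils the matching handle. The standard library already provides this behaviour through `multiprocessing.Pool.apply_async` and `concurrent.futures.Future`; the classes `AsyncResult` and `ResultRouter` below and the message format are hypothetical.

```python
import itertools
import threading


class AsyncResult:
    """Handle given to the caller; get() blocks until the task is done."""

    def __init__(self):
        self._done = threading.Event()
        self._value = None
        self._error = None

    def set(self, value=None, error=None):
        self._value, self._error = value, error
        self._done.set()  # wakes up every thread waiting in get()

    def get(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("task did not finish in time")
        if self._error is not None:
            raise self._error
        return self._value


class ResultRouter:
    """Matches messages from a result queue to the pending handles."""

    def __init__(self, result_queue):
        self._queue = result_queue  # any object with a blocking get()
        self._pending = {}
        self._lock = threading.Lock()
        self._ids = itertools.count()
        threading.Thread(target=self._drain, daemon=True).start()

    def register(self):
        """Create a handle and the task id a worker must echo back."""
        handle = AsyncResult()
        with self._lock:
            task_id = next(self._ids)
            self._pending[task_id] = handle
        return task_id, handle

    def _drain(self):
        while True:
            message = self._queue.get()
            if message is None:  # sentinel: stop the collector thread
                break
            task_id, value, error = message
            with self._lock:
                handle = self._pending.pop(task_id)
            handle.set(value, error)
```

In a real pool the result queue would be a `multiprocessing.Queue` that worker processes write to, and any exception sent back would have to be picklable. A plain `queue.Queue` exposes the same `get()` and `put()` calls, which makes the router easy to try out with threads first.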
If you believe my approach is not optimal and have a better or easier
solution (I hope I've described the problem well enough), please share your
thoughts.
Thanks in advance!