multiprocessing module in async db query

Sheng

This looks like a tornado problem, but trust me, it is almost all
about how the multiprocessing module works.

I borrowed the idea from http://gist.github.com/312676 to implement an
async db query web service using tornado.

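    import multiprocessing
    import tornado.web

    def async_func(sql_command, arg1, arg2, arg3):
        '''
        do the actual query job
        '''
        ...
        # data is the query result by executing sql_command
        return data

    # created after async_func is defined, so forked workers can look it up
    p = multiprocessing.Pool(4)

    class QueryHandler(tornado.web.RequestHandler):
        ...
        @tornado.web.asynchronous
        def get(self):
            ...
            p.apply_async(async_func, [sql_command, arg1, arg2, arg3],
                          callback=self.callback_func)

        def callback_func(self, data):
            # called once, with the complete return value of async_func
            self.write(data)
            self.finish()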

So the workflow is like this:

get() --> hand the query request to a worker process, which runs
async_func() --> when async_func() returns, callback_func receives its
return value as the input argument and sends the query result to the
client.

The problem is that the result of sql_command might be too big to hold
in memory all at once; in our case it is all stored in the variable
"data". Can I return from the async function early, say immediately
after the query comes back with the first result set, and then stream
the results to the browser? In other words, can async_func somehow
notify callback_func to get ready to receive data before async_func
actually returns?
 
Philip Semanchuk

Hi Sheng,
Have you looked at multiprocessing.Queue objects?


HTH
Philip
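
A minimal sketch of what this suggestion could look like, assuming the worker is started as its own multiprocessing.Process rather than through the Pool from the original post (a plain multiprocessing.Queue cannot be passed as an argument to Pool workers). The names fetch_chunks and stream_chunks, the fake integer rows, and the ctx parameter are hypothetical; a real worker would loop over a database cursor instead.

```python
import multiprocessing

SENTINEL = None  # put on the queue by the worker to mark the end of the stream


def fetch_chunks(out_queue, n_rows, chunk_size):
    """Hypothetical worker: stands in for a loop over a database cursor."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        out_queue.put(list(range(start, stop)))  # one chunk of fake rows
    out_queue.put(SENTINEL)


def stream_chunks(n_rows, chunk_size, ctx=None):
    """Yield chunks in the parent while the worker is still producing them."""
    ctx = ctx or multiprocessing.get_context()  # default start method if none given
    out_queue = ctx.Queue(maxsize=4)  # bounded, so the worker cannot run far ahead
    worker = ctx.Process(target=fetch_chunks,
                         args=(out_queue, n_rows, chunk_size))
    worker.start()
    finished = False
    try:
        while True:
            chunk = out_queue.get()
            if chunk is SENTINEL:
                finished = True
                return
            yield chunk
    finally:
        if not finished:
            worker.terminate()  # the consumer stopped early; do not leave the worker blocked
        worker.join()
```

Iterating with `for chunk in stream_chunks(10, 4): ...` hands the parent one chunk at a time, so only a few chunks are held in memory at once. On platforms that start processes with spawn, that call has to sit under an `if __name__ == "__main__":` guard. In the tornado handler, each chunk would presumably be written and flushed to the client before the next one is taken off the queue.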
 
John Nagle

Make sure that, having made a request of the database, you
quickly read all the results. Until you finish the transaction,
the database has locks set, and other transactions may stall.
"Streaming" out to a network connection while still reading from
the database is undesirable.

If you're doing really big SELECTs, consider using LIMIT and
OFFSET in SQL to break them up into smaller bites. Especially
if the user is paging through the results.

John Nagle
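
The LIMIT/OFFSET idea can be sketched as follows, using an in-memory sqlite3 database so the example runs on its own. The items table, its columns, and the helper names make_demo_db and fetch_pages are made up for illustration; the original post does not say which database or schema is involved.

```python
import sqlite3


def make_demo_db(n_rows):
    """Build a small in-memory table to page through (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO items (name) VALUES (?)",
                     [("item%d" % i,) for i in range(n_rows)])
    conn.commit()
    return conn


def fetch_pages(conn, page_size):
    """Yield one list of rows per page; each page is its own short query."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()  # read the whole page right away, as advised above
        if not rows:
            return
        yield rows
        offset += page_size


if __name__ == "__main__":
    for page in fetch_pages(make_demo_db(5), 2):
        print(page)
```

The ORDER BY matters here: without a stable ordering, consecutive pages are not guaranteed to line up. Each page is fully read before it is yielded, so no query stays open while the rows are being sent to the client.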
 
Sheng

Hi Philip,

multiprocessing.Queue is used to transfer data between processes. How
could it help solve my problem? Thanks!

Sheng

 
Philip Semanchuk

I misunderstood -- I thought transferring data between processes *was* your problem. If both of your functions are in the same process, I don't understand how multiprocessing figures into it at all.

If you want a function to start returning results before that function completes, and you want those results to be processed by other code *in the same process*, then you'll have to use threads. A Queue object for threads exists in the standard library too. You might find that useful.

HTH
Philip
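
A minimal sketch of the thread-based variant, assuming a producer thread that stands in for the query loop and a consumer in the same process. The function names and the fake ("row", i) tuples are hypothetical; queue.Queue and threading.Thread are the standard library pieces referred to above.

```python
import queue
import threading

_DONE = object()  # unique marker the producer puts on the queue when it is finished


def produce_rows(out_queue, n_rows):
    """Hypothetical producer: stands in for code reading rows from a query."""
    for i in range(n_rows):
        out_queue.put(("row", i))  # blocks while the queue is full
    out_queue.put(_DONE)


def consume_rows(n_rows):
    """Process rows as they arrive, before the producer has finished."""
    out_queue = queue.Queue(maxsize=2)  # small bound keeps memory use flat
    producer = threading.Thread(target=produce_rows, args=(out_queue, n_rows))
    producer.start()
    received = []
    while True:
        item = out_queue.get()
        if item is _DONE:
            break
        received.append(item)  # a web handler would write the row out here instead
    producer.join()
    return received


if __name__ == "__main__":
    print(consume_rows(3))
```

The bounded queue is the design choice that addresses the memory concern: the producer can never be more than two rows ahead of the consumer, so the full result is never held in memory at once.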

 
