Speeding up network access: threading?

Jens Müller

Hello,

what would be best practice for speeding up a large number of HTTP GET
requests done via urllib? Until now they are made in sequence, each request
taking up to one second. The results must be merged into a list; the
original order need not be kept.

I think speed could be improved by parallelizing. One could use multiple
threads.
Are there any Python best practices, or even existing modules, for creating
and handling a task queue with a fixed number of concurrent threads?

Thanks and regards!
 
exarkun

Using multiple threads is one approach. There are a few thread pool
implementations lying about; one is part of Twisted,
<http://twistedmatrix.com/documents/current/api/twisted.python.threadpool.ThreadPool.html>.

Another approach is to use non-blocking or asynchronous I/O to make
multiple requests without using multiple threads. Twisted can help you
out with this, too. There are two async HTTP client APIs available. The
older one:

http://twistedmatrix.com/documents/current/api/twisted.web.client.getPage.html
http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html

And the newer one, introduced in 9.0:

http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html
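
As a rough illustration of the single-threaded, asynchronous approach, here is a minimal sketch. It does not use Twisted: it swaps in the standard-library asyncio module from modern Python 3, which did not exist when this thread was written. The fetch() coroutine is a hypothetical stand-in that only simulates a network wait, and the URLs and the limit parameter are assumptions made for the example.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for a real non-blocking HTTP GET.
    await asyncio.sleep(0.01)  # simulated network wait
    return "body of " + url


async def fetch_all(urls, limit=4):
    # The semaphore caps how many requests are in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather() runs the coroutines concurrently in a single thread
    # and returns the results as a list.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = ["http://example.invalid/page%d" % i for i in range(10)]
pages = asyncio.run(fetch_all(urls))
```

All ten simulated requests overlap their waiting time, so the total run time is governed by the concurrency limit rather than by the number of URLs.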

Jean-Paul
 
Terry Reedy

I believe code of this type has been published here in various threads.
The fairly obvious thing to do is use a queue.Queue for tasks and
another for results and a pool of threads that read, fetch, and write.
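
A minimal sketch of that layout, assuming Python 3 module names. The fetch() helper is a hypothetical stand-in for the real urllib call, and fetch_all(), num_threads, and the None sentinel are choices made for this example rather than anything prescribed in the thread.

```python
import queue
import threading


def fetch(url):
    # Hypothetical stand-in; real code might call
    # urllib.request.urlopen(url).read() here.
    return "body of " + url


def worker(tasks, results):
    # Read a URL, fetch it, write the outcome; repeat until the sentinel.
    while True:
        url = tasks.get()
        if url is None:  # sentinel: no more work for this thread
            break
        try:
            results.put((url, fetch(url)))
        except Exception as exc:
            # Report failures as results so the consumer never waits forever.
            results.put((url, exc))


def fetch_all(urls, num_threads=4):
    urls = list(urls)
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    # Exactly one result arrives per URL, in completion order.
    merged = [results.get() for _ in urls]
    for t in threads:
        t.join()
    return merged
```

Calling fetch_all(list_of_urls, num_threads=8) returns a list of (url, body) pairs in whatever order the workers finished, which matches the requirement that the original order need not be kept.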
 
Jens Müller

Hello,

Terry said:
The fairly obvious thing to do is use a queue.Queue for tasks and another
for results and a pool of threads that read, fetch, and write.

Thanks, indeed.

Is a list thread-safe or do I need to lock when adding the results of my
worker threads to a list? The order of the elements in the list does not
matter.

Jens
 
MRAB

Jens said:
Hello,

Thanks, indeed.

Is a list thread-safe or do I need to lock when adding the results of my
worker threads to a list? The order of the elements in the list does not
matter.

Terry said "queue", not "list". Use the Queue class (it's thread-safe)
in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
it's called the "queue" module).
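
A short sketch of an import that works under either module name. The variable names are only for illustration.

```python
try:
    import queue            # Python 3.x module name
except ImportError:
    import Queue as queue   # Python 2.x module name

results = queue.Queue()
results.put("first")
results.put("second")

# Queue is FIFO, so items come back in the order they were put.
items = [results.get(), results.get()]
```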
 
Antoine Pitrou

On Tue, 05 Jan 2010 15:04:56 +0100, Jens Müller wrote:
Is a list thread-safe or do I need to lock when adding the results of my
worker threads to a list? The order of the elements in the list does not
matter.

The built-in list type is thread-safe, but it doesn't provide the waiting
features that queue.Queue provides.
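
A small sketch of both points, assuming CPython and Python 3 module names. The thread count, item count, and timings are arbitrary values picked for the example.

```python
import queue
import threading

# Part 1: concurrent list.append calls do not lose items (in CPython).
shared = []


def append_many(n):
    for i in range(n):
        shared.append(i)


threads = [threading.Thread(target=append_many, args=(1000,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = len(shared)

# Part 2: a Queue lets the consumer wait until something arrives,
# which a plain list cannot do without polling.
q = queue.Queue()
timer = threading.Timer(0.05, q.put, args=("done",))
timer.start()
item = q.get(timeout=5)  # blocks until the timer thread puts the item
timer.join()
```

With a list, the main thread would have to join all workers (or poll) before reading results; with a Queue it can simply block on get().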

Regards

Antoine.
 
Jens Müller

Hi and sorry for double posting - had mailer problems,

MRAB said:
Terry said "queue", not "list". Use the Queue class (it's thread-safe)
in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
it's called the "queue" module).

Yes yes, I know. I use a queue to implement the thread pool's task queue;
that works all right.

But each worker thread calculates a result and needs to make it available
to the application in the main thread again. Therefore, it appends its
result to a common list. This seems to work as well, but I was thinking of
possible conflicts that could arise when two threads append their results
to that same result list at the same moment.

Regards,
Jens
 
Steve Holden

Jens said:
Hi and sorry for double posting - had mailer problems,

Yes yes, I know. I use a queue to implement the thread pool's task queue;
that works all right.

But each worker thread calculates a result and needs to make it
available to the application in the main thread again. Therefore, it
appends its result to a common list. This seems to work as well, but I was
thinking of possible conflicts that could arise when two threads append
their results to that same result list at the same moment.

If you don't need to take anything off the list ever, just create a
separate thread that reads items from an output Queue and appends them
to the list.

If you *do* take them off, then use a Queue.
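
A minimal sketch of the first option, assuming Python 3 module names. The worker here just squares a number as a stand-in for a fetched result, and the None sentinel used to stop the collector is a choice made for this example.

```python
import queue
import threading


def collector(output, collected):
    # The only thread that ever touches the list.
    while True:
        item = output.get()
        if item is None:  # sentinel: all workers are finished
            break
        collected.append(item)


def worker(n, output):
    output.put(n * n)  # stand-in for a real fetched result


output = queue.Queue()
collected = []

collector_thread = threading.Thread(target=collector,
                                    args=(output, collected))
collector_thread.start()

workers = [threading.Thread(target=worker, args=(n, output))
           for n in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()

output.put(None)         # tell the collector to stop
collector_thread.join()  # after this, collected is complete
```

Because only the collector appends, no locking around the list is needed regardless of how the list type behaves under concurrent access.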

regards
Steve
 