Speeding up network access: threading?

Jens Müller · Jan 4, 2010

Hello,

What would be best practice for speeding up a large number of HTTP GET
requests made via urllib? Until now they have been made in sequence, each
request taking up to one second. The results must be merged into a list, but
the original order need not be preserved.

I think speed could be improved by parallelizing, for example with multiple
threads.
Are there any Python best practices, or even existing modules, for creating
and handling a task queue with a fixed number of concurrent threads?

Thanks and regards!

exarkun · Jan 4, 2010

Using multiple threads is one approach. There are a few thread pool
implementations lying about; one is part of Twisted,
<http://twistedmatrix.com/documents/current/api/twisted.python.threadpool.ThreadPool.html>.
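For readers who want a fixed-size thread pool without Twisted, here is a minimal sketch using `concurrent.futures.ThreadPoolExecutor` from the standard library (added in Python 3.2, so later than this thread). It is not the Twisted pool linked above. The `fetch` function is a hypothetical stand-in that only simulates a slow request; in real use it would presumably call something like `urllib.request.urlopen(url).read()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(url):
    # Hypothetical stand-in for a real HTTP GET; it only simulates latency.
    time.sleep(0.05)
    return "body of " + url


def fetch_all(urls, max_workers=4):
    """Fetch all URLs with at most max_workers concurrent threads."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, url) for url in urls]
        # as_completed yields futures as they finish, so the result
        # order may differ from the order of the input URLs.
        for future in as_completed(futures):
            results.append(future.result())
    return results


# Assumed example URLs; the .invalid domain is never contacted here.
urls = ["http://example.invalid/%d" % i for i in range(8)]
pages = fetch_all(urls)
```

Only the main thread appends to `results`, so no locking is needed in this arrangement.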

Another approach is to use non-blocking or asynchronous I/O to make
multiple requests without using multiple threads. Twisted can help you
out with this, too. There are two async HTTP client APIs available. The
older one:

http://twistedmatrix.com/documents/current/api/twisted.web.client.getPage.html
http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html

And the newer one, introduced in 9.0:

http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html
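To illustrate the general idea of many requests in flight on a single thread, here is a rough sketch using the standard library's `asyncio` rather than the Twisted APIs linked above, so it does not show how `getPage` or `Agent` are called. The `fetch` coroutine is a hypothetical placeholder that simulates network latency, and the `limit` parameter is an assumption added to cap concurrency.

```python
import asyncio


async def fetch(url):
    # Hypothetical placeholder: simulates a slow request without blocking
    # the event loop. A real version would need an async HTTP client.
    await asyncio.sleep(0.05)
    return "body of " + url


async def fetch_all(urls, limit=4):
    """Run fetches concurrently on one thread, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather returns the results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


# Assumed example URLs; nothing is actually fetched.
urls = ["http://example.invalid/%d" % i for i in range(8)]
pages = asyncio.run(fetch_all(urls))
```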

Jean-Paul

Terry Reedy · Jan 4, 2010

I believe code of this type has been published here in various threads.
The fairly obvious thing to do is to use a queue.Queue for tasks and
another for results, and a pool of threads that read, fetch, and write.
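The arrangement described here (one queue for tasks, one for results, and a fixed pool of worker threads) could be sketched roughly as follows. This is only an illustration using Python 3 module names. `fetch` is a hypothetical placeholder for the real urllib call, and the `None` sentinel used to stop the workers is one common convention rather than something prescribed in this thread.

```python
import queue
import threading


def fetch(url):
    # Hypothetical placeholder for the real HTTP GET via urllib.
    return "body of " + url


def worker(tasks, results):
    # Read a task, fetch it, write the result; repeat until told to stop.
    while True:
        url = tasks.get()
        if url is None:  # sentinel: no more work for this thread
            break
        try:
            results.put((url, fetch(url)))
        except Exception as exc:
            # Report failures as results so that the main thread still
            # receives exactly one item per task.
            results.put((url, exc))


def run_pool(urls, num_threads=4):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    # Each task yields exactly one result, so get() is called once per URL.
    merged = [results.get() for _ in urls]
    for thread in threads:
        thread.join()
    return merged


# Assumed example URLs; nothing is actually fetched.
urls = ["http://example.invalid/%d" % i for i in range(10)]
merged = run_pool(urls)
```

The merged list is built in the main thread only, in completion order rather than input order.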

Jens Müller · Jan 5, 2010

Hello,

The fairly obvious thing to do is to use a queue.Queue for tasks and another
for results, and a pool of threads that read, fetch, and write.

Thanks, indeed.

Is a list thread-safe, or do I need to lock when adding the results of my
worker threads to a list? The order of the elements in the list does not
matter.

Jens

MRAB · Jan 5, 2010

Jens said:
Hello,

Thanks, indeed.

Is a list thread-safe, or do I need to lock when adding the results of my
worker threads to a list? The order of the elements in the list does not
matter.

Terry said "queue", not "list". Use the Queue class (it's thread-safe)
in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
it's called the "queue" module).

Antoine Pitrou · Jan 5, 2010

On Tue, 05 Jan 2010 15:04:56 +0100, Jens Müller wrote:

Is a list thread-safe, or do I need to lock when adding the results of my
worker threads to a list? The order of the elements in the list does not
matter.

The built-in list type is thread-safe, but it doesn't provide the waiting
features that queue.Queue provides.
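A small sketch of the difference, assuming CPython, where a single `list.append` call is atomic. The list collects results safely, but the reader has to join the threads before it knows the list is complete. A `queue.Queue` lets the reader block until an item arrives. The names `append_many` and `shared` are made up for this illustration.

```python
import queue
import threading


def append_many(shared, count):
    # Each individual append is atomic in CPython, so no lock is used here.
    for i in range(count):
        shared.append(i)


shared = []
threads = [threading.Thread(target=append_many, args=(shared, 1000))
           for _ in range(4)]
for thread in threads:
    thread.start()
# A list offers no way to wait for new items, so the main thread has to
# join the workers before it can rely on the contents.
for thread in threads:
    thread.join()
total = len(shared)  # no appends are lost: 4 threads * 1000 items

# With a Queue, the consumer can simply block until something is put.
q = queue.Queue()
producer = threading.Thread(target=q.put, args=("done",))
producer.start()
item = q.get(timeout=5)  # waits up to 5 seconds for an item
producer.join()
```

Compound operations such as "check length, then pop" are not atomic as a whole, so they would still need a lock or a queue.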

Regards

Antoine.

Jens Müller · Jan 5, 2010

Hi and sorry for double posting - had mailer problems,

Terry said "queue", not "list". Use the Queue class (it's thread-safe)
in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
it's called the "queue" module).

Yes, yes, I know. I use a queue to implement the thread pool's task queue,
and that works all right.

But each worker thread calculates a result and needs to make it available
to the application in the main thread again. Therefore, it appends its
result to a common list. This seems to work as well, but I was thinking of
possible conflicts that could occur when two threads append their results
to that same result list at the same moment.

Regards,
Jens

Steve Holden · Jan 5, 2010

Jens said:
Hi and sorry for double posting - had mailer problems,

Yes, yes, I know. I use a queue to implement the thread pool's task queue,
and that works all right.

But each worker thread calculates a result and needs to make it
available to the application in the main thread again. Therefore, it
appends its result to a common list. This seems to work as well, but I
was thinking of possible conflicts that could occur when two threads
append their results to that same result list at the same moment.

If you don't need to take anything off the list ever, just create a
separate thread that reads items from an output Queue and appends them
to the list.

If you *do* take them off, then use a Queue.
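The first suggestion (a separate thread that drains an output Queue into the list) might look roughly like this. It is a sketch under assumptions: the `_DONE` sentinel, the `collect` function, and the squaring `worker` are all invented for the example and stand in for the real fetch logic.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel telling the collector to stop


def collect(output, sink):
    # The collector is the only thread that ever touches the list.
    while True:
        item = output.get()
        if item is _DONE:
            break
        sink.append(item)


def worker(n, output):
    # Stand-in for real work; puts one result on the output queue.
    output.put(n * n)


output = queue.Queue()
results = []
collector = threading.Thread(target=collect, args=(output, results))
collector.start()

workers = [threading.Thread(target=worker, args=(n, output))
           for n in range(10)]
for thread in workers:
    thread.start()
for thread in workers:
    thread.join()

# All workers have finished, so the sentinel is the last item queued.
output.put(_DONE)
collector.join()
```

Because only the collector appends, the question of concurrent appends does not arise in this layout.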

regards
Steve
