Queues - Is Infinity all right?

Anand Pillai

The standard Python Queue module allows you to create queues that
have no size limit, by passing a size argument <= 0.

q = Queue(0)

In a multithreaded application, these queues could be useful
when you have many threads using the queue for data access
and synchronization.

Is there a performance hit when one uses an 'infinite' queue
when compared to a queue of fixed size?

Of course, this should depend on the complexity of the Queue data
structure: how the time of an access (get/put) varies with the number
of items in it.

I would like some light shed on this. Is it constant, O(n), or of
a higher order?

Thanks

-Anand
 
Peter Hansen

Anand said:
The standard Python Queue module allows you to create queues that
have no size limit, by passing a size argument <= 0.

q = Queue(0)

In a multithreaded application, these queues could be useful
when you have many threads using the queue for data access
and synchronization.

Is there a performance hit when one uses an 'infinite' queue
when compared to a queue of fixed size?

Of course, this should depend on the complexity of the Queue data
structure: how the time of an access (get/put) varies with the number
of items in it.

I would like some light shed on this. Is it constant, O(n), or of
a higher order?

Checking the source, Queue is implemented as a sophisticated wrapper
around a standard Python list []. That means adding an item is
amortized O(1), while removing one is O(n).
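
(To make that concrete, here's a rough timing sketch of popping from
the front of a plain list, the kind of operation behind get(); exact
numbers will of course vary by machine:)

import time

def time_pops(n):
    # Build a list of n items, then pop them all from the front,
    # the way a list-backed queue's get() would.
    items = list(range(n))
    start = time.time()
    while items:
        items.pop(0)  # O(n) per pop: every remaining pointer shifts
    return time.time() - start

for n in (1000, 10000, 100000):
    print("pop(0) x %d: %.3fs" % (n, time_pops(n)))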

That said, I'd consider using a fixed queue only when I was greatly
concerned about whether my consumer thread was going to be able to
keep up with my producer thread(s), and was therefore concerned about
overflowing memory with unconsumed input.

In other words, I'd worry more about the robustness of the app
than about optimizing it before it worked...

-Peter
 
Paul Rubin

Peter Hansen said:
Checking the source, Queue is implemented as a sophisticated wrapper
around a standard Python list []. That means adding an item is
amortized O(1), while removing one is O(n).

Why is removing one O(n)? That should be easy to fix, if true.
 
Jeremy Fincher

Paul Rubin said:
Peter Hansen said:
Checking the source, Queue is implemented as a sophisticated wrapper
around a standard Python list []. That means adding an item is
amortized O(1), while removing one is O(n).

Why is removing one O(n)? That should be easy to fix, if true.

The queue implementation he's referring to works somewhat like this:

class queue(list):
    def enqueue(self, item):
        self.append(item)
    def dequeue(self):
        return self.pop(0)

It's the self.pop(0) that's the transgressor here. All the elements
after it must be shifted down one index, so it's O(n) in the size of
the queue.

Sure, it's a simple fix, and there are several ways to make O(1)
queues, but that implementation, aside from being simple, also has a
*very* low constant. When I tested several competing O(1) queues
against this implementation, none were faster until the number of
items in the queue increased beyond several hundred, sometimes even
into the (low) thousands.

As a side note, the fastest O(1) queue algorithm in my testing was the
following:

class queue(object):
    __slots__ = ('back', 'front')
    def __init__(self, iterable):
        self.front = []
        self.back = []
        for item in iterable:
            self.back.append(item)
    def enqueue(self, item):
        # O(1): new items always land on the back stack.
        self.back.append(item)
    def dequeue(self):
        # Amortized O(1): the back stack is reversed into the front
        # stack only when the front runs dry.
        if not self.front:
            self.back.reverse()
            self.front = self.back
            self.back = []
        return self.front.pop()

(Although its efficiency obviously depends on how enqueues and
dequeues are intermixed; while still amortized O(1), it's not highly
efficient when the queue's average length is 1.)
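
For example, plugging the class above into a quick session:

q = queue([1, 2, 3])
q.enqueue(4)
print(q.dequeue())  # -> 1: FIFO order is preserved across the two stacks
print(q.dequeue())  # -> 2: served from the reversed back stack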

Jeremy
 
logistix at cathoderaymission.net

Paul Rubin said:
Peter Hansen said:
Checking the source, Queue is implemented as a sophisticated wrapper
around a standard Python list []. That means adding an item is
amortized O(1), while removing one is O(n).

Why is removing one O(n)? That should be easy to fix, if true.

A list is an array of pointers to PyObjects, not a linked list. So
removing element x from a list of size y requires something like:

for z in range(x, y - 1):
    l[z] = l[z + 1]

But since you're only copying the pointers and not the underlying
objects, it's a pretty quick O(n).
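
(For what it's worth, the shift can be avoided entirely with a
double-ended structure; collections.deque, new in Python 2.4, pops
from either end in O(1):)

from collections import deque

q = deque()
q.append('item')  # enqueue at the back: O(1)
q.popleft()       # dequeue from the front: O(1), no pointer shifting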
 
Peter Hansen

Ype Kingma wrote:

...

I've been bitten a few times by unbounded queues when the consumer
thread could not keep up: the queue grows to memory limits and
no robustness is left.

So I changed my default choice for queue implementations to bounded
queues. They are much more predictable under unexpected loads,
as they eventually force the producer to wait for the consumer.

The next question is of course what maximum queue size to use.
You can get good answers from queuing theory, but 10 to 20
works well in many cases.

Reading your reply, and my original response, I am second-guessing
my above-quoted philosophy. Certainly, as I said above, robustness
should be the concern, and not performance. And since this is
primarily an issue of robustness, a predefined limit, set well above
what is expected to be required, would seem a good way to help
ensure it.

-Peter
 
Anand Pillai

This was exactly my problem. The producer thread was pushing a lot
of data into the queue, which the consumer could not keep up with.

Since I am using Python threads, it often happens that the queue
grows to a huge size before the consumer thread gets a context
switch, causing memory problems.

I have not gone through the code of Queue.py, but I think that
the queue.get() method is an O(n) operation, considering the
ratio of 'put's to the 'get's.

A related question here is... How do you avoid producer thread
deadlocking when the consumer queue is full?

If the question is not clear: my queue is of size 'n', and it is
already full with 'n' pieces of data inside it. All of my 'n' producer
threads are trying to push data into it at the *same* time, and there
is no consumer thread popping the data off (in my program, the
consumers are the producers themselves; a bad design, I agree...).

This is clearly a deadlock situation, since no thread can ever make
progress. I solved it by creating a new consumer thread to pop off
some data when deadlock occurs. But sometimes the deadlock occurs so
often that many extra threads end up being created this way.

The other approach is to separate out the consumer and producer
thread roles so that there is always one consumer thread somewhere
popping data off the queue, which is what I have done now.

Anyway, how do you make sure that deadlock situations do not occur?
Can you write some Event() based mechanism to make sure that the
threads always access the queue at different times? Since Python
offers no control over thread priority and no mechanism to suspend a
thread, how else could this be done?

That brings me to a favorite topic: implementing a thread wrapper
over the existing Python threading API to allow suspending threads
and modifying thread priority. Has this ever been considered as
a PEP before?

Thank you all

-Anand
 
ykingma

Peter Hansen wrote:


....
That said, I'd consider using a fixed queue only when I was greatly
concerned about whether my consumer thread was going to be able to
keep up with my producer thread(s), and was therefore concerned about
overflowing memory with unconsumed input.

In other words, I'd worry more about the robustness of the app
than about optimizing it before it worked...

I've been bitten a few times by unbounded queues when the consumer
thread could not keep up: the queue grows to memory limits and
no robustness is left.

So I changed my default choice for queue implementations to bounded
queues. They are much more predictable under unexpected loads,
as they eventually force the producer to wait for the consumer.

The next question is of course what maximum queue size to use.
You can get good answers from queuing theory, but 10 to 20
works well in many cases.
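
For example, with a bound of 10 the producer simply blocks whenever
it gets 10 items ahead, so memory stays bounded (a minimal sketch;
the numbers are arbitrary):

import threading, time
from Queue import Queue

q = Queue(10)  # bounded: put() blocks when 10 items are waiting

def consumer():
    while True:
        item = q.get()    # blocks when the queue is empty
        time.sleep(0.01)  # simulate slow processing

t = threading.Thread(target=consumer)
t.setDaemon(True)  # let the program exit despite the endless loop
t.start()

for i in range(100):
    q.put(i)  # blocks whenever the consumer falls 10 items behind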

Kind regards,
Ype
 
Peter Hansen

Anand said:
This was exactly my problem. The producer thread was pushing a lot
of data into the queue, which the consumer could not keep up with.

Since I am using Python threads, it often happens that the queue
grows to a huge size before the consumer thread gets a context
switch, causing memory problems.

I have not gone through the code of Queue.py, but I think that
the queue.get() method is an O(n) operation, considering the
ratio of 'put's to the 'get's.

I think you might be misinterpreting things. Remember that the list
operations behind put() and get() are implemented in C code, so
relative to anything you would actually do with the data that you
get(), and to anything you actually do in order to produce the data
that you put(), the actual put() and get() operations are effectively
both instantaneous (and thus in effect O(1)) unless you are talking
about literally thousands of items in the Queue.

And you seem to be suggesting it is because get() is O(n) that you
are even getting to this overload point, but the time for the get()
won't become noticeable until you've already had a problem for
quite some time.

-Peter
 
Ype Kingma

Anand said:
This was exactly my problem. The producer thread was pushing a lot
of data into the queue, which the consumer could not keep up with.

Since I am using Python threads, it often happens that the queue
grows to a huge size before the consumer thread gets a context
switch, causing memory problems.

I have not gone through the code of Queue.py, but I think that
the queue.get() method is an O(n) operation, considering the
ratio of 'put's to the 'get's.

As Peter already remarked, the source of your problem is not
the get() operation.
A related question here is... How do you avoid producer thread
deadlocking when the consumer queue is full?

By using a separate thread for the consumer(s), as you indicate
below.
If the question is not clear: my queue is of size 'n', and it is
already full with 'n' pieces of data inside it. All of my 'n' producer
threads are trying to push data into it at the *same* time, and there
is no consumer thread popping the data off (in my program, the
consumers are the producers themselves; a bad design, I agree...).

That's probably the source of your problem.
This is clearly a deadlock situation, since no thread can ever make
progress. I solved it by creating a new consumer thread to pop off
some data when deadlock occurs. But sometimes the deadlock occurs so
often that many extra threads end up being created this way.

The other approach is to separate out the consumer and producer
thread roles so that there is always one consumer thread somewhere
popping data off the queue, which is what I have done now.

Anyway, how do you make sure that deadlock situations do not occur?

When there is a cycle between the producers and the consumers, i.e.
a path back from the consumers to the producers, you may need one
queue (e.g. the backwards queue) of infinite size to prevent deadlock.
However, you should make sure that no thread accesses a queue
in the wrong direction, because that might still cause deadlock.

If you have a fixed ratio between the amounts of things on the
various queues, you might consider not using queues at all for those
fixed ratios.
Can you write some Event() based mechanism to make sure that the
threads always access the queue at different times? Since Python offers

A queue already provides synchronized access; you don't need
anything else.
no control over thread priority and no mechanism to suspend a
thread, how else could this be done?

You can make reasonably sure that work is going on everywhere by keeping the
queues small enough. This forces more thread switching, though.
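
And if you cannot guarantee that a consumer is always running, note
that put() also accepts a timeout in recent Python versions, so a
producer can detect a stuck queue instead of blocking forever. A
sketch, with the fallback left up to you:

from Queue import Queue, Full

q = Queue(10)

def produce(item):
    try:
        q.put(item, True, 5.0)  # block for at most 5 seconds
    except Full:
        # No consumer drained the queue in time: log, drop the item,
        # or start a consumer here instead of deadlocking.
        pass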
That brings me to a favorite topic: implementing a thread wrapper
over the existing Python threading API to allow suspending threads
and modifying thread priority. Has this ever been considered as
a PEP before?

Thread suspension is (very) difficult to implement correctly.

Thread priorities are good, e.g. when you have background work going
on and you want some short pieces of interactive work done at normal
speed. Also, if you have a series of threads and queues, it can
be helpful to give the first thread a higher priority when it also has
to wait for external events.

I don't know whether there was a PEP for any of this.

Kind regards,
Ype
 
