Condition.wait(0.5) doesn't respect it's timeout

S

stephane.bisinger

Hi all,
I have a problem with Condition.wait(), it doesn't return after the
given timeout. The thing is that if I try to create a simple program,
it works as expected, but in the actual code, the timeout is not
respected (albeit the notify()s work as expected).
You can find the code I am talking about here:
http://github.com/Kjir/amsn2/blob/6...6/amsn2/gui/front_ends/curses/contact_list.py

If you clone the repository, then you can run the program like this:
$ python amsn2.py -f curses 2>> run.log
and in another term watch for prints with a tail -f run.log (You need
an MSN account). You'll notice that after the initial downloading of
the contact list, there won't be any logs (unless some of your
contacts changes status, which will trigger a notify)

Has anyone the slightest idea on what I may be doing wrong? Or am I
just lucky enough to have stumbled across a bug? Maybe pollution from
another module in other parts of the code? (Like gobject...)

Anyway just for completeness here is a sample program that works for
me:

from threading import Thread
from threading import Condition

def some_func():
while True:
cond.acquire()
while True:
cond.wait(0.5)
print "Hello, world!"

cond = Condition()
thread = Thread(target=some_func)
thread.start()

thanks,

Stéphane Bisinger
 
G

Gabriel Genellina

I have a problem with Condition.wait(), it doesn't return after the
given timeout. The thing is that if I try to create a simple program,
it works as expected, but in the actual code, the timeout is not
respected (albeit the notify()s work as expected). [...]

Has anyone the slightest idea on what I may be doing wrong? Or am I
just lucky enough to have stumbled across a bug?

Looks like a real bug :(
Maybe pollution from
another module in other parts of the code? (Like gobject...)

Anyway just for completeness here is a sample program that works for
me:

from threading import Thread
from threading import Condition

def some_func():
while True:
cond.acquire()
while True:
cond.wait(0.5)
print "Hello, world!"

cond = Condition()
thread = Thread(target=some_func)
thread.start()

If another thread has acquired the lock, cond.wait() doesn't return. Add
these lines at the end of your test and see:

sleep(2)
print "Main thread - cond.acquire()"
cond.acquire()
sleep(2)
print "Main thread - cond.release()"
cond.release()
sleep(2)
sys.exit()

The timeout is detected, but the wait method doesn't return, it's stuck at
the last line (trying to restore a saved RLock state).
I don't understand the logic behind that.
Please file a bug report at http://bugs.python.org/
 
P

Piet van Oostrum

SB> Hi all,
SB> I have a problem with Condition.wait(), it doesn't return after the
SB> given timeout. The thing is that if I try to create a simple program,
SB> it works as expected, but in the actual code, the timeout is not
SB> respected (albeit the notify()s work as expected).
SB> You can find the code I am talking about here:
SB> http://github.com/Kjir/amsn2/blob/6...6/amsn2/gui/front_ends/curses/contact_list.py
SB> If you clone the repository, then you can run the program like this:
SB> $ python amsn2.py -f curses 2>> run.log
SB> and in another term watch for prints with a tail -f run.log (You need
SB> an MSN account). You'll notice that after the initial downloading of
SB> the contact list, there won't be any logs (unless some of your
SB> contacts changes status, which will trigger a notify)
SB> Has anyone the slightest idea on what I may be doing wrong? Or am I
SB> just lucky enough to have stumbled across a bug? Maybe pollution from
SB> another module in other parts of the code? (Like gobject...)

I haven't run it (too much hassle to setup) but I noticed one strange
thing in your code:

,----
| def groupUpdated(self, gView):
| # Acquire the lock to do modifications
| self._mod_lock.acquire()
|
| if not self._groups.has_key(gView.uid):
| return
`----

In case the return is taken, the lock will not be released thereby
blocking the rest. It could be that the timeout is taken, but before the
wait can continue it has to acquire the lock again.

It is best to put all your code after the acquire in a try: finally:
like

,----
| self._mod_lock.acquire()
| try:
| do something
| finally:
| self._mod_lock.release()
`----

or even better use the with statement:

,----
| with self._mod_lock:
| do something
`----
It will do the releasing automatically

And then something else:
,----
| def __thread_run(self):
| while True:
| import sys
| print >> sys.stderr, "at loop start"
| self._mod_lock.acquire()
| t = time.time()
| # We don't want to work before at least half a second has passed
| while time.time() - t < 0.5 or not self._modified:
| print >> sys.stderr, "Going to sleep\n"
| self._mod_lock.wait(timeout=1)
| print >> sys.stderr, "Ok time to see if we must repaint"
| self.__repaint()
| t = time.time()
| self._mod_lock.release()
| print >> sys.stderr, "at loop end"
| self._mod_lock.acquire()
`----

Your loop ends with self._mod_lock.acquire() but in the next round there
is another self._mod_lock.acquire() at the beginning of the loop. So the
acquire at the end isn't necessary.
 
S

stephane.bisinger

I haven't run it (too much hassle to setup) but I noticed one strange
thing in your code:

,----
| def groupUpdated(self, gView):
|         # Acquire the lock to do modifications
|         self._mod_lock.acquire()
|  
|         if not self._groups.has_key(gView.uid):
|             return
`----

In case the return is taken, the lock will not be released thereby
blocking the rest. It could be that the timeout is taken, but before the
wait can continue it has to acquire the lock again.

That is absolutely correct, but I guess with an RLock there shouldn't
be a deadlock... Anyway I'm fixing that!
It is best to put all your code after the acquire in a try: finally:
like

,----
| self._mod_lock.acquire()
| try:
|         do something
| finally:
|         self._mod_lock.release()
`----

or even better use the with statement:

,----
| with self._mod_lock:
|      do something
`----
It will do the releasing automatically

Thanks for the advice!
And then something else:
,----
| def __thread_run(self):
|        while True:
|            import sys
|            print >> sys.stderr, "at loop start"
|            self._mod_lock.acquire()
|            t = time.time()
|            # We don't want to work before at least half a second has passed
|            while time.time() - t < 0.5 or not self._modified:
|                print >> sys.stderr, "Going to sleep\n"
|                self._mod_lock.wait(timeout=1)
|                print >> sys.stderr, "Ok time to see if we must repaint"
|            self.__repaint()
|            t = time.time()
|            self._mod_lock.release()
|            print >> sys.stderr, "at loop end"
|            self._mod_lock.acquire()
`----

Your loop ends with self._mod_lock.acquire() but in the next round there
is another self._mod_lock.acquire() at the beginning of the loop. So the
acquire at the end isn't necessary.

I have no idea how that came to be, probably pasted one line where I
shouldn't have, it's gone ;)

Anyway I cleaned up that mess and here is the resulting code, with the
same issues:
http://github.com/Kjir/amsn2/commit/bda6829d0c7d50a1cbf1188cdfa3789c4b7967c5
http://github.com/Kjir/amsn2/blob/b...5/amsn2/gui/front_ends/curses/contact_list.py

I'll also file a bug report because I am more and more convinced this
is a bug, if anything else at least in the documentation...
 
A

Aahz

I have a problem with Condition.wait(), it doesn't return after the
given timeout. The thing is that if I try to create a simple program,
it works as expected, but in the actual code, the timeout is not
respected (albeit the notify()s work as expected).

Whether or not there's a bug, you likely will simplify your code if you
switch to using a Queue().
 
S

stephane.bisinger

Whether or not there's a bug, you likely will simplify your code if you
switch to using a Queue().

I'm sorry, but I can't see how the queue would help me on this, since
I only have 2 threads and the documentation esplicitly states that it
helps with multiple threads. I have no items to put in the queue, my
only concern is not to update the screen while I am changing the state
of the contact list... Maybe there's something I didn't think of, but
I'd need more details on your part if that is the case, because I
really can't see the advantage of Queue in this situation...

Thanks,
Stéphane
 
A

Aahz

I'm sorry, but I can't see how the queue would help me on this, since
I only have 2 threads and the documentation esplicitly states that it
helps with multiple threads. I have no items to put in the queue, my
only concern is not to update the screen while I am changing the state
of the contact list... Maybe there's something I didn't think of, but
I'd need more details on your part if that is the case, because I
really can't see the advantage of Queue in this situation...

Essentially, you use the Queue instead of the Condition. When you want
to explicitly give up control in a thread, you get() on the Queue until
you get an object (with the optional timeout). When the other thread is
done processing, it puts an object on the Queue (optionally doing a
get_nowait() at some point if it wants to make sure the Queue is cleaned
up).

The critical advantage of using Queue is that you don't have to do the
acquire()/release() dance.

It's not so much that there's an advantage to using Queue in this one
specific case as the fact that you can use Queue for almost everything
you'd use other kinds of locking mechanisms, so you can reduce your
mental model for dealing with threading.
 
S

stephane.bisinger

Essentially, you use the Queue instead of the Condition.  When you want
to explicitly give up control in a thread, you get() on the Queue until
you get an object (with the optional timeout).  When the other thread is
done processing, it puts an object on the Queue (optionally doing a
get_nowait() at some point if it wants to make sure the Queue is cleaned
up).

Yep but as I said I have nothing to pass around
The critical advantage of using Queue is that you don't have to do the
acquire()/release() dance.

And neither do I have to using the "with" statement ;)
It's not so much that there's an advantage to using Queue in this one
specific case as the fact that you can use Queue for almost everything
you'd use other kinds of locking mechanisms, so you can reduce your
mental model for dealing with threading.

That surely helps not making synchronization errors, but I like fully
understanding the concept even if it means being bitten by it! But
thanks for your tips, they will surely be helpful in some other
situation...
 
A

Aahz

Yep but as I said I have nothing to pass around

You create it -- it's essentially a plain sentinel in this case.
And neither do I have to using the "with" statement ;)

Fair enough -- I was stuck on Python 2.3 in my last job and I still
haven't caught up.
 
P

Piet van Oostrum

Gabriel Genellina said:
GG> If another thread has acquired the lock, cond.wait() doesn't return. Add
GG> these lines at the end of your test and see:
GG> sleep(2)
GG> print "Main thread - cond.acquire()"
GG> cond.acquire()
GG> sleep(2)
GG> print "Main thread - cond.release()"
GG> cond.release()
GG> sleep(2)
GG> sys.exit()
GG> The timeout is detected, but the wait method doesn't return, it's stuck at
GG> the last line (trying to restore a saved RLock state).
GG> I don't understand the logic behind that.

That is the semantics of a Condition. The cond.wait() must be done in a
critical section, i.e. when the thread has acquired the lock. The wait()
temporarily releases the lock so that another thread can try to make the
way free for the waiting thread and give a notify(). When the wait()
continues it must first re-acquire the lock to make sure it has
exclusive access to the critical section. If another thread still has
the lock it has to wait for the lock being released. This is also the
case when the wait continues because the timeout has expired.
GG> Please file a bug report at http://bugs.python.org/

It is not a bug. I see the Hello World temporarily suspended and then
continuing. That's how it should be.
 
P

Piet van Oostrum

SB> That is absolutely correct, but I guess with an RLock there shouldn't
SB> be a deadlock... Anyway I'm fixing that!

I suppose the groupUpdated() method is called from a different thread
than the self._mod_lock.wait(timeout=1) because it contains a notify().
Otherwise your code would have been flawed. From the code you can see
that this is the case as the thread with wait in it is created here (in
aMSNContactListWidget.__init__) and it contains no calls to
groupUpdated. Therefore the RLock thing isn't applicable. It was,
however, applicable in the superfluous acquire in the loop, so it saved
you there.

Maybe you can put print statements at the beginning and end of each
"with" block to see if there is an unreleased lock that blocks the wait.

Some more remarks:

1. __repaint acquires the lock, but it is called from __thread_run which
has already acquired it. Because the lock is recursive it doesn't harm
but it is not necessary.

2. Importing in a thread is discouraged. I think it is cleaner to put
the import sys in the top of the module.
3. I think I have found a bug in the Condition implementation, but I
can't imagine you being hit by it. It would not be a reproducible one.
 
P

Piet van Oostrum

sb> I'll also file a bug report because I am more and more convinced this
sb> is a bug, if anything else at least in the documentation...

I looked at it more closely, and I found that the Condition.wait is
stuck on obtaining the GIL while the main thread executes the gobject
main loop. Therefore it also blocks on notifications, not only on the
timeout.

It appears that GTK and Python threads are incompatible UNLESS you call
gtk.gdk.threads_init() before gtk.main(). In your case you can do it
after the import gtk in contact_list.py. Then it should work.

By the way, I wonder why you need a timeout in your wait. I think the
notifications should be sufficient to keep the gui updated.
 
S

stephane.bisinger

2. Importing in a thread is discouraged. I think it is cleaner to put
   the import sys in the top of the module.

Yes I know I shouldn't import inside of threads, but that is something
that is going away, it's just for debugging this issue.

I looked at it more closely, and I found that the Condition.wait is
stuck on obtaining the GIL while the main thread executes the gobject
main loop. Therefore it also blocks on notifications, not only on the
timeout.

Ah-ha! My suspicions were right, then, I remembered something about
gobject being incompatible with python threads... but I couldn't find
anything in the docs, not even those of gobject, so I assumed my
memory was failing me...
It appears that GTK and Python threads are incompatible UNLESS you call
gtk.gdk.threads_init() before gtk.main(). In your case you can do it
after the import gtk in contact_list.py. Then it should work.

I'll try to do that
By the way, I wonder why you need a timeout in your wait. I think the
notifications should be sufficient to keep the gui updated.

The reason is simple: when first downloading the contactss list, I
receive a swarm of *Updated() calls, so if I redraw every time I get a
very bad visual effect and I waste a lot of CPU redrawing something
that will change in a very very short time. Hence the time
constraint... I couldn't come up with something smarter, so...

Anyway I wanted to really thank you for your time and your precious
advice, I really appreciate it!
 
S

stephane.bisinger

It appears that GTK and Python threads are incompatible UNLESS you call
gtk.gdk.threads_init() before gtk.main(). In your case you can do it
after the import gtk in contact_list.py. Then it should work.

Ok here's the deal: I'm not ending up in the gtk frontend. My code is
for the ncurses frontend, so there is no "import gtk" and "gtk.main()"
that gets executed in that case. But it is definitely around the
gobject thing, I'll try to search for this!
 
S

stephane.bisinger

Ok here's the deal: I'm not ending up in the gtk frontend. My code is
for the ncurses frontend, so there is no "import gtk" and "gtk.main()"
that gets executed in that case. But it is definitely around the
gobject thing, I'll try to search for this!

gobject.threads_init() is what was missing, now everything works as
expected! Thanks!
 
A

Aahz

The reason is simple: when first downloading the contactss list, I
receive a swarm of *Updated() calls, so if I redraw every time I get a
very bad visual effect and I waste a lot of CPU redrawing something
that will change in a very very short time. Hence the time
constraint... I couldn't come up with something smarter, so...

The redraw thread should keep track of the last time it did a redraw;
each time it receives an update event, it should check to see whether it
has been more than a specified period of time since the last redraw.
 
S

stephane.bisinger

The redraw thread should keep track of the last time it did a redraw;
each time it receives an update event, it should check to see whether it
has been more than a specified period of time since the last redraw.

That's what I do ;)
 
P

Piet van Oostrum

sb> The reason is simple: when first downloading the contactss list, I
sb> receive a swarm of *Updated() calls, so if I redraw every time I get a
sb> very bad visual effect and I waste a lot of CPU redrawing something
sb> that will change in a very very short time. Hence the time
sb> constraint... I couldn't come up with something smarter, so...

while time.time() - t < 0.5 or not self._modified:
print >> sys.stderr, "Going to sleep\n"
self._mod_lock.wait(timeout=1)
print >> sys.stderr, "Ok time to see if we must repaint"
self.__repaint()
t = time.time()

But the timeout thing will continue after that, so you have a continuous
polling loop which wastes some CPU time. I think what you want can be
more easily achieved by doing a sleep(0.5) before the outer while loop,
i.e. as the first statement in __thread_run.
 
S

stephane.bisinger

                while time.time() - t < 0.5 or not self._modified:
                    print >> sys.stderr, "Going to sleep\n"
                    self._mod_lock.wait(timeout=1)
                    print >> sys.stderr, "Ok time to see if we must repaint"
                self.__repaint()
                t = time.time()

But the timeout thing will continue after that, so you have a continuous
polling loop which wastes some CPU time. I think what you want can be
more easily achieved by doing a sleep(0.5) before the outer while loop,
i.e. as the first statement in __thread_run.

Two boolean checks every half second is not a great waste; but
thinking about your solution I've come to agree that it is smarter and
more effective. Thank you again!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,962
Messages
2,570,134
Members
46,692
Latest member
JenniferTi

Latest Threads

Top