simple Thread question

adeger

Having trouble with my first forays into threads. Basically, the
threads don't seem to be working in parallel (or you might say are
blocking). I've boiled my problems to the following short code block
and ensuing output. Seems like the output should be all interleaved
and of course it's not. Running Python 2.2 from ActiveState on
Windows XP (also doesn't work on Windows 2000).

Thanks in advance!
adeger

#====================================================

import threading

class TestThr(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self, name):
        import time
        for i in range(1,11):
            print 'thread ', name, ' instance ', str(i)
            time.sleep(1)

threads = []
for inst in ('a', 'b', 'c'):
    thread = TestThr()
    thread.run(inst)
    threads.append(thread)

# output below
thread a instance 1
thread a instance 2
thread a instance 3
thread a instance 4
thread a instance 5
thread a instance 6
thread a instance 7
thread a instance 8
thread a instance 9
thread a instance 10
thread b instance 1
thread b instance 2
thread b instance 3
thread b instance 4
thread b instance 5
thread b instance 6
thread b instance 7
thread b instance 8
thread b instance 9
thread b instance 10
thread c instance 1
thread c instance 2
thread c instance 3
thread c instance 4
thread c instance 5
thread c instance 6
thread c instance 7
thread c instance 8
thread c instance 9
thread c instance 10
 
Christopher T King

adeger said:
Having trouble with my first forays into threads. Basically, the
threads don't seem to be working in parallel (or you might say are
blocking). I've boiled my problems to the following short code block
and ensuing output. Seems like the output should be all interleaved
and of course it's not. Running Python 2.2 from ActiveState on
Windows XP (also doesn't work on Windows 2000).

The Python interpreter isn't too thread-friendly. Because it's not
re-entrant, it has to make use of a Global Interpreter Lock in order to
keep internal structures from getting mangled. This lock only allows one
thread to access the interpreter at a time, and switches threads every
hundred or so bytecodes. The likely cause of your problem is that your
loops don't reach this switching threshold -- try using xrange(100) or
higher.

The GIL is released during blocking I/O (or other) operations, and C
extensions can release the lock if they plan on doing lots of non-Python
processing. Because of the former property, another thing you can try is
inserting a time.sleep(.1) inside of each loop -- being a blocking I/O
operation, this should cause the GIL to be released, and your threads will
switch each time through the loop.
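
To illustrate the point about blocking calls releasing the GIL, here is a small sketch. It is written for Python 3, which is an assumption, since this thread is about Python 2.2. The `nap` helper and the 0.2-second duration are made up for the example. If the sleeps overlap, three of them should take roughly as long as one.

```python
import threading
import time


def nap(seconds):
    # time.sleep() releases the GIL while waiting, so other threads can run.
    time.sleep(seconds)


workers = [threading.Thread(target=nap, args=(0.2,)) for _ in range(3)]

began = time.monotonic()
for w in workers:
    w.start()          # start() is what actually launches each thread
for w in workers:
    w.join()           # wait for all three to finish
elapsed = time.monotonic() - began

# Run one after another, the sleeps would need at least 0.6 s in total.
print("elapsed: %.2f s" % elapsed)
```

The exact figure depends on the machine, so no expected output is given here.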

Aside from the performance loss on parallel-processing machines, there is
usually no reason to worry about the GIL in Python code: so long as you
make proper use of the thread synchronization routines, everything will
work as you intend it to.
 
Peter Hansen

Christopher said:
The Python interpreter isn't too thread-friendly. Because it's not
re-entrant, it has to make use of a Global Interpreter Lock in order to
keep internal structures from getting mangled. ....

Notwithstanding the rest of your answer, Christopher, I have to
say that in my opinion, the Python interpreter is *very*
thread-friendly. Obviously this just depends on differing
ideas of what it means to be "friendly" to threads, but I find
Python to be the most reliable and easiest to use environment
for multi-threaded applications that I've ever used. I know
what you meant by this, but if nothing else the criticism doesn't
serve to promote Python very well...

The world would probably be a much better place if most of its
multi-threaded applications were rewritten in Python.

(And it would be better still if some of them were then
re-designed to use Twisted, but that's another story. ;-)

-Peter
 
Dave Brueck

Christopher said:
The Python interpreter isn't too thread-friendly. Because it's not
re-entrant, it has to make use of a Global Interpreter Lock in order to
keep internal structures from getting mangled. This lock only allows one
thread to access the interpreter at a time, and switches threads every
hundred or so bytecodes. The likely cause of your problem is that your
loops don't reach this switching threshold -- try using xrange(100) or
higher.

Er... you're jumping the gun a bit - no need to scare the OP away from
threads with all these details about the GIL when the problem was simply
that the threads were never started.

[snip]
Because of the former property, another thing you can try is
inserting a time.sleep(.1) inside of each loop

Uh, did you read his code before responding? (hint: he's already doing
that) :)

-Dave
 
Christopher T King

Peter said:
Obviously this just depends on differing ideas of what it means to be
"friendly" to threads, but I find Python to be the most reliable and
easiest to use environment for multi-threaded applications that I've
ever used.

Poor choice of words on my part... I didn't mean API wise (though I'm not
a huge fan of threading), but rather implementation-wise (having the GIL
and all).
 
JCM

You should override the run method, but call thread.start() to kick
the execution off in a separate thread. If you call thread.run(),
you're just running your code in the same thread.
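
A minimal sketch of the difference, written for Python 3 (the original post used Python 2.2). The `Recorder` class and its `ran_in` attribute are hypothetical names invented for this illustration. Each instance records which thread actually executed its `run()` method.

```python
import threading


class Recorder(threading.Thread):
    """Hypothetical Thread subclass that remembers where run() executed."""

    def __init__(self):
        threading.Thread.__init__(self)
        self.ran_in = None

    def run(self):
        # current_thread() is whichever thread is executing this code.
        self.ran_in = threading.current_thread()


caller = threading.current_thread()

direct = Recorder()
direct.run()            # plain method call: executes in the calling thread

started = Recorder()
started.start()         # creates a new thread, which then calls run()
started.join()          # wait for it to finish

print(direct.ran_in is caller)    # True
print(started.ran_in is started)  # True
```

Note that `run()` takes no extra arguments here. Anything the thread needs, such as the name in the original post, can be passed to `__init__` and stored on the instance.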

adeger said:
Having trouble with my first forays into threads. Basically, the
threads don't seem to be working in parallel (or you might say are
blocking). I've boiled my problems to the following short code block
and ensuing output. Seems like the output should be all interleaved
and of course it's not. Running Python 2.2 from ActiveState on
Windows XP (also doesn't work on Windows 2000).
Thanks in advance!
adeger

import threading
class TestThr(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self, name):
        import time
        for i in range(1,11):
            print 'thread ', name, ' instance ', str(i)
            time.sleep(1)
threads = []
for inst in ('a', 'b', 'c'):
    thread = TestThr()
    thread.run(inst)
    threads.append(thread)
 
Christopher T King

Dave said:
Er... you're jumping the gun a bit - no need to scare the OP away from
threads with all these details about the GIL when the problem was simply
that the threads were never started.

[snip]
Because of the former property, another thing you can try is
inserting a time.sleep(.1) inside of each loop

Uh, did you read his code before responding? (hint: he's already doing
that) :)

Wow, I really screwed that up :p I really should stop posting to this list
in a hurry. My eyes glanced at the short loop, the thread-starting code
(didn't notice the .run() instead of .start()), and the output, and
concluded "GIL funkiness" (something I've been hit with before).

FWIW, replacing .run() with .start() (and dropping name from .run()) and
dropping the sleep() produces the same symptoms the OP was showing.
 
Peter Hansen

Christopher said:
Poor choice of words on my part... I didn't mean API wise (though I'm not
a huge fan of threading), but rather implementation-wise (having the GIL
and all).

I realize that's what you meant, and I stick with my opinion that
it is *exactly that* which makes Python a much better environment
for all the threaded applications I've written than the other
environments I've tried in the past.

-Peter
 
adeger

JCM said:
You should override the run method, but call thread.start() to kick
the execution off in a separate thread. If you call thread.run(),
you're just running your code in the same thread.

Thanks JCM and everyone else for your help! This wasn't that clear to
me reading the docs (I really DID read them) and your answer saved me
a bunch of personal (and program execution) time!
 
David Pokorny

adeger said:

Thanks JCM and everyone else for your help! This wasn't that clear to
me reading the docs (I really DID read them) and your answer saved me
a bunch of personal (and program execution) time!

It's worth pointing out that while print bytecodes are atomic, print
statements are not.
So unless a) you use your own lock to control output via "print"
or b) you pass only 1 argument to print, you run the risk of creating output
like:

....
thread a instance 33
thread a instance 34
thread a instance 35
thread a instance 36
thread a instance 37
thread a instance 38
thread a thread b instance instance 39
1
thread thread a instance b instance 2
40
thread a thread b instance 3
thread b instance 4
thread b instance 5
thread b instance 6
thread b instance 7
....


To create the desired bad behavior, comment out the three statements that
refer to plock.
---
import threading, time

plock = threading.Lock()  # serializes access to stdout

class TestThr(threading.Thread):
    def __init__(self, name):
        # Initialize the Thread first; later Python versions reject
        # setting self.name before Thread.__init__ has run.
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        for i in range(1, 100):
            plock.acquire()
            print 'thread', self.name, 'instance', str(i)
            plock.release()
            #time.sleep(.1)

def test():
    for tname in ('a', 'b', 'c'):
        thread = TestThr(tname)
        thread.start()

test()
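
For anyone reading this on Python 3, here is a rough sketch of the same idea. It is an illustration under stated assumptions, not a port of the code above. The `emit` function and the `out_lock` name are made up, and an `io.StringIO` buffer stands in for `sys.stdout` so the result can be checked afterwards. Each line is written in several pieces, much like a multi-argument print, and the lock is held for the whole line.

```python
import io
import threading

out = io.StringIO()          # stand-in for sys.stdout so the result can be inspected
out_lock = threading.Lock()


def emit(name, count):
    for i in range(1, count + 1):
        # Holding the lock for the whole line keeps its pieces together.
        with out_lock:
            for piece in ("thread ", name, " instance ", str(i), "\n"):
                out.write(piece)


workers = [threading.Thread(target=emit, args=(n, 50)) for n in "abc"]
for w in workers:
    w.start()
for w in workers:
    w.join()

lines = out.getvalue().splitlines()
# A line is intact if it still reads "thread <name> instance <number>".
intact = all(
    len(parts) == 4 and parts[0] == "thread" and parts[2] == "instance"
    for parts in (line.split() for line in lines)
)
print(len(lines), intact)  # 150 True
```

The `with` statement releases the lock even if a write raises, which the bare acquire()/release() pair does not.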
 
Beeyah

def run(self, name):
    import time
    for i in range(1,11):
        print 'thread ', name, ' instance ', str(i)
        time.sleep(1)

You're importing the time module for every TestThr() object you
create, that's no good. You only need to import the module once, at
the beginning of the file like you've done with threading.
 
Peter Hansen

Beeyah said:
You're importing the time module for every TestThr() object you
create, that's no good. You only need to import the module once, at
the beginning of the file like you've done with threading.

While it's true that that is all that's needed, it is not
required, and doesn't really provide any performance
improvements. I often do the above sort of thing where I
use a given module in only one place, especially when using
threads where it feels subtly cleaner to defer loading of
some modules until the thread actually starts.

Modules are really only ever imported once, and after that
the "import" statement amounts to retrieving a reference
by looking it up in the sys.modules dictionary. In the
case of a builtin module like "time", there isn't even
any significant overhead involved in the first import,
as there might be with a .py module (where the timestamp
encoded in the .pyc file is compared with the timestamp
of the .py file, and the .py file is recompiled if
necessary, and then a new module object is created and
the bytecode is executed in its namespace, etc).
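
The sys.modules lookup can be seen directly with a small sketch like the one below. It is written for Python 3, which is an assumption, and the `load_again` helper is a made-up name. It checks that an import inside a function hands back the very same module object that was loaded at the top of the file.

```python
import sys
import time as first


def load_again():
    # A repeated import inside a function is a lookup in sys.modules,
    # not a second load of the module.
    import time as again
    return again


second = load_again()
print(second is first)                 # True
print(sys.modules["time"] is first)    # True
```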

-Peter
 
Aahz

Peter said:
While it's true that that is all that's needed, it is not required,
and doesn't really provide any performance improvements. I often do
the above sort of thing where I use a given module in only one place,
especially when using threads where it feels subtly cleaner to defer
loading of some modules until the thread actually starts.

Except that the import lock makes this a Bad Idea, IMO.
--
Aahz (e-mail address removed) <*> http://www.pythoncraft.com/

"To me vi is Zen. To use vi is to practice zen. Every command is a
koan. Profound to the user, unintelligible to the uninitiated. You
discover truth everytime you use it." (e-mail address removed)
 
Peter Hansen

Aahz said:
Except that the import lock makes this a Bad Idea, IMO.

Does the import lock apply to the first import, or even
to all later ones where it can just find it in sys.modules?

-Peter
 
Aahz

Peter said:
Does the import lock apply to the first import, or even
to all later ones where it can just find it in sys.modules?

It applies to all -- after all, sys.modules gets updated before the
module code is run...
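
The ordering mentioned here (the sys.modules entry exists while the module body is still running) can be checked with a sketch like the following. It is written for Python 3, which is an assumption. The module name `import_order_probe` and the `REGISTERED_DURING_IMPORT` variable are invented for the demonstration. The sketch shows only the ordering, not the locking itself, and import locking in current CPython works differently from the single import lock of the Python 2.2 era.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name, chosen only for this demonstration.
MODULE_NAME = "import_order_probe"

source = (
    "import sys\n"
    "# Evaluated while this module's body is still executing.\n"
    "REGISTERED_DURING_IMPORT = __name__ in sys.modules\n"
)

with tempfile.TemporaryDirectory() as folder:
    # Write a throwaway module and make its directory importable.
    with open(os.path.join(folder, MODULE_NAME + ".py"), "w") as handle:
        handle.write(source)
    sys.path.insert(0, folder)
    importlib.invalidate_caches()   # make sure the new file is noticed
    try:
        probe = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(folder)

print(probe.REGISTERED_DURING_IMPORT)  # True
```

A second thread that looked in sys.modules at that moment could find a module whose body has not finished executing, which is the kind of situation an import lock guards against.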
 