The Future of Python Threading


Justin T.

Hello,

While I don't pretend to be an authority on the subject, a few days of
research has led me to believe that a discussion needs to be started
(or continued) on the state and direction of multi-threading Python.

Python is not multi-threading friendly. Any code that deals with the
Python interpreter must hold the global interpreter lock (GIL). This
has the effect of serializing (to a certain extent) all Python-specific
operations. That is, any thread that is written purely in Python
will not release the GIL except at particular (and possibly
non-optimal) times. Currently that's the rather arbitrary quantum of
100 bytecode instructions. Since the OS can only schedule a Python
thread when that thread is able to take the lock, Python threads do not
benefit from a good scheduler in the same manner that real OS threads
do, even though Python threads are supposed to be a thin wrapper around
real OS threads[1].
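The effect is easy to see from the interpreter itself. A minimal sketch (note that in modern CPython, 3.2 and later, the old 100-bytecode check interval has been replaced by a time-based switch interval, visible via `sys.getswitchinterval`):

```python
import sys
import threading
import time

def countdown(n, results, idx):
    # A pure-Python, CPU-bound loop: it holds the GIL except at the
    # interpreter's periodic check points, so two such threads cannot
    # actually run in parallel on CPython.
    while n:
        n -= 1
    results[idx] = True

results = [False, False]
start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(2_000_000, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The "quantum" in CPython 3.x is a time slice, not 100 bytecodes:
print(sys.getswitchinterval())   # typically 0.005 (seconds)
print(all(results))              # True; wall time is roughly the serial time
```

Both threads finish correctly, but the wall-clock time is about what one thread alone would take, because the GIL serializes the bytecode execution.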

The detrimental effects of the GIL have been discussed several times
and nobody has ever done anything about it. This is because the GIL
isn't really that bad right now. The GIL isn't held that much, and
pthreads spawned by Python-C interactions (i.e., those that reside in
extensions) can do all their processing concurrently as long as they
aren't dealing with Python data. What this means is that Python
multithreading isn't really broken as long as Python is thought of as
a convenient way of manipulating C. After all, 100 bytecode
instructions go by pretty quickly, so the GIL isn't really THAT
invasive.

Python, however, is much more than a convenient way of
manipulating C. Python provides a simple language which can be
implemented in any way, so long as the promised behaviors are
preserved. We should take advantage of that.

The truth is that the future (and present reality) of almost every
form of computing is multi-core, and there is currently no effective
way of dealing with concurrency. We still worry about setting up
threads, synchronization of message queues, synchronization of shared
memory regions, dealing with asynchronous behaviors, and, most
importantly, how threaded an application should be. All of this is
possible to do manually in C, but it's hardly optimal. For instance, at
compile time you have no idea whether your library is going to be
running on a machine with 1 processor or 100. Knowing that makes a huge
difference in architecture: 200 threads might run fine on the 100-core
machine while they thrash the single processor to death.
Thread pools help, but they need to be set up and initialized, and
there are very few good thread pool implementations that are meant for
generic use.
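With a modern standard library, the "how many threads?" question can at least be answered at run time rather than when the code is written. A minimal sketch using `concurrent.futures` (the worker here is just a stand-in for real I/O-bound work):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Size the pool from the machine we are actually running on, instead of
# hard-coding a thread count when the code is written.
workers = os.cpu_count() or 1

def work(x):
    # Placeholder for an I/O-bound task (network request, disk read, ...).
    return x * x

with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(work, range(10)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```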

It is my feeling that there is no better way of dealing with dynamic
threading than to use a dynamic language. Stackless Python has proven
that clever manipulation of the stack can dramatically improve
concurrent performance in a single thread. Stackless revolves around
tasklets, which are a nearly universal concept.

For those who don't follow experimental Python implementations,
Stackless essentially provides an integrated scheduler for "green
threads" (tasklets): extremely lightweight snippets of code that
can be run concurrently. It even provides a nice way of passing
messages between tasklets.
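The tasklet idea can be sketched in plain Python, with generators standing in for tasklets and a deque as a round-robin scheduler. This is a toy, not Stackless itself, but it shows the shape of cooperative scheduling:

```python
from collections import deque

def scheduler(tasklets):
    # Round-robin cooperative scheduler: each tasklet is a generator
    # that yields to hand control back, in the spirit of Stackless.
    ready = deque(tasklets)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))
            ready.append(task)       # still alive: back of the queue
        except StopIteration:
            pass                     # tasklet finished
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"          # cooperative yield point

print(scheduler([worker("a", 2), worker("b", 2)]))
# ['a:0', 'b:0', 'a:1', 'b:1']
```

The interleaved output is the point: the two workers take turns at every yield, all within a single OS thread.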

When you think about it, lots of object-oriented code can be organized
as tasklets. After all, encapsulation provides an environment where the
side effects of running functions can be minimized, and such code is
thus somewhat easily parallelized (with respect to other objects).
Functional programming is, of course, ideal, but it's hardly the trendy
thing these days. Maybe that will change when people realize how much
easier functional code is to test and parallelize.

What these seemingly unrelated thoughts come down to is a perfect
opportunity for Python to become THE next-generation language. It is
already far more advanced than almost every other language out there.
By integrating Stackless into an architecture where tasklets can be
divided over several parallel threads, Python will be able to
capitalize on performance gains that will have people using it just
for its performance, rather than that being the excuse not to use it.

The nice thing is that this requires a fairly doable amount of work.
First, Stackless should be integrated into the core. Then there should
be an effort to remove Python threading's reliance on the GIL.
After that, advanced features like moving tasklets among threads
should be explored. I can imagine a world where a single Python web
application is able to redistribute its millions of requests among
thousands of threads without the developer ever having planned for the
application to scale. An efficient and natively multi-threaded
implementation of Python will be invaluable as cores continue to
multiply like rabbits.

There has been much discussion on this in the past [2]. Those
discussions, I feel, were premature. Now that Stackless is mature (and
continuation-free!), Py3k is in full swing, and parallel programming
has been fully recognized as THE next big problem for computer science,
the time is ripe for discussing how we will approach multi-threading
in the future.

Justin

[1] I haven't actually looked at the GIL code. It's possible that it
creates a bunch of wait queues for each nice level that a Python
thread is running at and just wakes up the higher-priority threads
first, thus maintaining the nice values determined by the scheduler,
or something. I highly doubt it. I bet every Python thread gets an
equal chance of getting that lock despite whatever patterns the
scheduler may have noticed.

[2]
http://groups.google.com/group/comp...a6a5d976?q=stackless,+thread+paradigm&lnk=ol&
http://www.stackless.com/pipermail/stackless/2003-June/000742.html
.... More that I lost, just search this group.
 

Steve Holden

Justin said:
> Hello,
>
> While I don't pretend to be an authority on the subject, a few days of
> research has led me to believe that a discussion needs to be started
> (or continued) on the state and direction of multi-threading Python. [...]
>
> What these seemingly unrelated thoughts come down to is a perfect
> opportunity for Python to become THE next-generation language. It is
> already far more advanced than almost every other language out there.
> By integrating Stackless into an architecture where tasklets can be
> divided over several parallel threads, Python will be able to
> capitalize on performance gains that will have people using it just
> for its performance, rather than that being the excuse not to use it.
Aah, the path to world domination. You know you don't *have* to use
Python for *everything*.
> The nice thing is that this requires a fairly doable amount of work.
> First, Stackless should be integrated into the core. Then there should
> be an effort to remove Python threading's reliance on the GIL.
> After that, advanced features like moving tasklets among threads
> should be explored. I can imagine a world where a single Python web
> application is able to redistribute its millions of requests among
> thousands of threads without the developer ever having planned for the
> application to scale. An efficient and natively multi-threaded
> implementation of Python will be invaluable as cores continue to
> multiply like rabbits.
Be my guest, if it's so simple.
> There has been much discussion on this in the past [2]. Those
> discussions, I feel, were premature. Now that Stackless is mature (and
> continuation-free!), Py3k is in full swing, and parallel programming
> has been fully recognized as THE next big problem for computer science,
> the time is ripe for discussing how we will approach multi-threading
> in the future.
I doubt that a thread on c.l.py is going to change much. It's the
python-dev and py3k lists where you'll need to take up the cudgels,
because I can almost guarantee nobody is going to take the GIL out of
2.6 or 2.7.
> [1] I haven't actually looked at the GIL code. It's possible that it
> creates a bunch of wait queues for each nice level that a Python
> thread is running at and just wakes up the higher-priority threads
> first, thus maintaining the nice values determined by the scheduler,
> or something. I highly doubt it. I bet every Python thread gets an
> equal chance of getting that lock despite whatever patterns the
> scheduler may have noticed.
It's possible that a little imp tosses a coin and decides which thread
gets to run next. The point of open source is that it's easy to dispel
ignorance and make your own changes if you are competent to do so. Talk
is cheap, code is costly. Your bet is worth nothing. Is it even possible
to run threads of the same process at different priority levels on all
platforms?
Anyone who's been around Python for a while is familiar with the issues.
Have you actually asked Chris Tismer whether he's happy to see Stackless
go into the core? He was far from certain that would be a good idea at
PyCon earlier this year. He probably has his reasons ...

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------
 

Paul Boddie

Justin said:
> While I don't pretend to be an authority on the subject, a few days of
> research has led me to believe that a discussion needs to be started
> (or continued) on the state and direction of multi-threading Python.
>
> Python is not multi-threading friendly.

Yes it is: Jython and IronPython support free threading; CPython,
however, does not. Meanwhile, take a look at this page for different
flavours of alternative solutions:

http://wiki.python.org/moin/ParallelProcessing

Paul
 

Bjoern Schliessmann

Justin said:
> The detrimental effects of the GIL have been discussed several
> times and nobody has ever done anything about it.

Also it has been discussed that dropping the GIL concept requires
very fine-grained locking mechanisms inside the interpreter to keep
data serialised. The overhead of managing those fine-grained locks
properly is not at all negligible, and I think it's more than that of
managing the GIL.
> The truth is that the future (and present reality) of almost every
> form of computing is multi-core,

Is it? 8)

The question is: if it really were, how much useful performance
gain would you get?

Regards,


Björn
 

kyosohma

> Be my guest, if it's so simple.




I've been watching this Intel TBB thing and it sounds like it might be
useful for threading. This fellow claims that he's going to play with
SWIG and may use it to create bindings to TBB for Python, although
he's starting with Perl:

http://softwareblogs.intel.com/2007/07/26/threading-building-blocks-and-c/

Dunno what the license ramifications are though.

Mike
 

Aahz

Steve Holden said:
> I doubt that a thread on c.l.py is going to change much. It's the
> python-dev and py3k lists where you'll need to take up the cudgels,
> because I can almost guarantee nobody is going to take the GIL out of
> 2.6 or 2.7.

Actually, python-ideas is the right place for this. Threads have been
hashed and rehashed over and over and over and over and over again
through the years; python-dev and python-3000 should not get cluttered
because another Quixote rides up.
 

Cameron Laird

Bjoern Schliessmann said:
> Also it has been discussed that dropping the GIL concept requires
> very fine-grained locking mechanisms inside the interpreter to keep
> data serialised. The overhead of managing those locks properly is not
> at all negligible, and I think it's more than managing the GIL.
>
> Is it? 8)
.
.
.
I reinforce some of these points slightly:
A. An effective change to the GIL impacts not just
Python in the sense of the Reference Manual, but
also all its extensions. That has the potential
to be a big, big cost; and
B. At least part of the attention given multi-core
processors--or, perhaps more accurately, the
claims that multi-cores deserve threading
rewrites--smells like vendor manipulation.
 

Ben Finney

Justin T. said:
> The truth is that the future (and present reality) of almost every
> form of computing is multi-core, and there is currently no effective
> way of dealing with concurrency.

Your post seems to take threading as the *only* way to write code for
multi-core systems, which certainly isn't so.

Last I checked, multiple processes can run concurrently on multi-core
systems. That's a well-established way of structuring a program.
> We still worry about setting up threads, synchronization of message
> queues, synchronization of shared memory regions, dealing with
> asynchronous behaviors, and, most importantly, how threaded an
> application should be.

All of which is avoided by designing the program to operate as
discrete processes communicating via well-defined IPC mechanisms.
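In today's Python this approach is concrete: the `multiprocessing` module (added to the standard library after this discussion, in Python 2.6) runs workers as separate processes, each with its own interpreter and its own GIL. A minimal sketch:

```python
import multiprocessing as mp

def square(x):
    # Runs in a child process with its own interpreter and its own GIL,
    # so CPU-bound work really can use multiple cores at once.
    return x * x

if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```

The `Pool` handles the process setup and result collection, so the calling code looks much like an ordinary `map`.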
 

king kikapu

Ben Finney said:
> All of which is avoided by designing the program to operate as
> discrete processes communicating via well-defined IPC mechanisms.

Hi Ben,

I would like to learn more about this. Have you got any links to give
me so I can have a look?
 

Nick Craig-Wood

Bjoern Schliessmann said:
> Also it has been discussed that dropping the GIL concept requires
> very fine-grained locking mechanisms inside the interpreter to keep
> data serialised. The overhead of managing those locks properly is not
> at all negligible, and I think it's more than managing the GIL.

That is certainly true. However, the point is that running on 2
CPUs at once at 95% efficiency is much better than running on only 1
at 99%...
> Is it? 8)

Intel, AMD and Sun would have you believe that yes!
> The question is: if it really were, how much useful performance
> gain would you get?

The Linux kernel has been through these growing pains already... SMP
support was initially done with the Big Kernel Lock (BKL), which is
exactly equivalent to the GIL.

The Linux kernel has since moved to finer and finer grained locking.

I think one important point to learn from the Linux experience is to
make the locking a compile-time choice. You can build a uniprocessor
Linux kernel with all the locking taken out, or you can build a
multi-processor kernel with fine-grained locking.

I'd like to see a python build as it is at the moment and a python-mt
build which has the GIL broken down into a lock on each object.
python-mt would certainly be slower for non-threaded tasks, but it
would certainly be quicker for threaded tasks on multi-CPU
computers.

The user could then choose which Python to run.

This would of course make C extensions more complicated...
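What a per-object lock costs can be sketched in pure Python. A real python-mt build would take the lock in C, but the shape is the same: every mutation pays a lock round-trip, which is exactly the overhead the fine-grained-locking objection is about.

```python
import threading

class LockedCounter:
    # Stand-in for an object in a hypothetical "python-mt" build:
    # every mutation takes the object's own lock instead of the GIL.
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def incr(self):
        with self._lock:      # one lock round-trip per operation
            self.value += 1

c = LockedCounter()
threads = [threading.Thread(target=lambda: [c.incr() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.value)  # 4000: correct under contention, at a per-operation price
```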
 

Ben Sizer

Ben Finney said:
> Your post seems to take threading as the *only* way to write code for
> multi-core systems, which certainly isn't so.
>
> Last I checked, multiple processes can run concurrently on multi-core
> systems. That's a well-established way of structuring a program.

It is, however, almost always more complex and slower-performing.

Plus, it's underdocumented. Most academic study of concurrent
programming, while referring to the separately executing units as
'processes', almost always assumes a shared memory space and the
associated primitives that go along with that.
> All of which is avoided by designing the program to operate as
> discrete processes communicating via well-defined IPC mechanisms.

Hardly. Sure, you don't have to worry about contention over objects
in memory, but it's still completely asynchronous, there will
still be a large degree of waiting for the other processes to respond,
and you have to develop the protocols to communicate. Apart from
convenient serialisation, Python doesn't exactly make IPC easy, unlike
Java's RMI, for example.
 

brad

Justin said:
> Hello,
>
> While I don't pretend to be an authority on the subject, a few days of
> research has led me to believe that a discussion needs to be started
> (or continued) on the state and direction of multi-threading Python.

This is all anecdotal... threads in Python work great for me. I like
Ruby's green threads too, but I find Python threads to be more robust.
We've written a TCP scanner (threaded connects) and can process twenty
ports on two /16s (roughly 131K hosts) in about twenty minutes. I'm
happy with that. However, that's all I've ever really used threads for,
so I'm probably less of an expert than you are :) I guess it comes down
to what you're doing with them.

Brad
 

brad

brad said:
> This is all anecdotal... threads in Python work great for me. I like
> Ruby's green threads too,

I forgot to mention that Ruby is moving to a GIL over green threads in v2.0
 

Aahz

> This would of course make C extensions more complicated...

It's even worse than that. One of the goals for Python is to make it
easy to call into random libraries, and there are still plenty around
that aren't thread-safe (not even talking about thread-hot).
 

Chris Mellon

Ben Sizer said:
> It is, however, almost always more complex and slower-performing.
>
> Plus, it's underdocumented. Most academic study of concurrent
> programming, while referring to the separately executing units as
> 'processes', almost always assumes a shared memory space and the
> associated primitives that go along with that.

This is simply not true. Firstly, there's a well-defined difference
between a 'process' and a 'thread', and that is that processes have
private memory spaces. Nobody says "process" when they mean threads of
execution within a shared memory space, and if they do, they're wrong.

And no, "most" academic study isn't limited to shared memory spaces.
In fact, almost every improvement in concurrency has been moving
*away* from simple shared memory - the closest thing to it is
transactional memory, which is like shared memory but with
transactional semantics instead of simple sharing. Message passing and
"shared-nothing" concurrency are very popular and extremely effective,
both in performance and reliability.

There's nothing underdocumented about IPC. It's been around as a
technique for decades. Message passing is as old as the hills.
> Hardly. Sure, you don't have to worry about contention over objects
> in memory, but it's still completely asynchronous, there will
> still be a large degree of waiting for the other processes to respond,
> and you have to develop the protocols to communicate. Apart from
> convenient serialisation, Python doesn't exactly make IPC easy, unlike
> Java's RMI, for example.

There's nothing that Python does to make IPC hard, either. There's
nothing in the standard library yet, but you may be interested in Pyro
(http://pyro.sf.net) or Parallel Python
(http://www.parallelpython.com/). It's not Erlang, but it's not hard
either. At least, it's not any harder than using threads and locks.
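A shared-nothing, message-passing worker along these lines is only a few lines with queues. This sketch uses `multiprocessing`, which entered the standard library in 2.6; Pyro and Parallel Python layer richer protocols on the same idea:

```python
import multiprocessing as mp

def worker(inbox, outbox):
    # Shared-nothing: all state is private to this process; the only
    # coordination is the messages flowing through the two queues.
    while True:
        msg = inbox.get()
        if msg is None:            # sentinel: shut down cleanly
            break
        outbox.put(msg.upper())

if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(inbox, outbox))
    proc.start()
    for word in ["spam", "eggs"]:
        inbox.put(word)
    inbox.put(None)
    print([outbox.get() for _ in range(2)])  # ['SPAM', 'EGGS']
    proc.join()
```

Because the worker touches nothing but its queues, there is no contention to lock against: the protocol is the sentinel plus the message format, and that's the whole contract.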
 

Justin T.

> While I don't pretend to be an authority on the subject, a few days of
> research has led me to believe that a discussion needs to be started
> (or continued) on the state and direction of multi-threading Python.
> [snip - threading in Python doesn't exploit hardware-level parallelism
> well; we should incorporate Stackless and remove the GIL to fix this]
>
> I think you have a misunderstanding of what greenlets are. Greenlets are
> essentially a non-preemptive user-space threading mechanism. They do not
> allow hardware-level parallelism to be exploited.

I'm not an expert, but I understand that much. What greenlets do is
force the programmer to think about concurrent programming. It doesn't
force them to think about real threads, which is good, because a
computer should take care of that for you. Greenlets are nice because
they can run concurrently, but they don't have to. This means you can
safely divide them up among many threads. You could not safely do this
with just any old Python program.
> There has been much discussion on this in the past [2]. Those
> discussions, I feel, were premature. Now that Stackless is mature (and
> continuation-free!), Py3k is in full swing, and parallel programming
> has been fully recognized as THE next big problem for computer science,
> the time is ripe for discussing how we will approach multi-threading
> in the future.
>
> Many of the discussions rehash the same issues as previous ones. Many
> of them are started based on false assumptions or are discussions between
> people who don't have a firm grasp of the relevant issues.

That's true, but there are actually a lot of good ideas in there as
well.
> I don't intend to suggest that no improvements can be made in this area of
> Python interpreter development, but it is a complex issue and cheerleading
> will only advance the cause so far. At some point, someone needs to write
> some code. Stackless is great, but it's not the code that will solve this
> problem.
Why not? It doesn't solve it on its own, but it's a pretty good start
towards something that could.
 

Justin T.

Steve Holden said:
>> While I don't pretend to be an authority on the subject, a few days of
>> research has led me to believe that a discussion needs to be started
>> (or continued) on the state and direction of multi-threading Python. [...]
>> What these seemingly unrelated thoughts come down to is a perfect
>> opportunity for Python to become THE next-generation language. It is
>> already far more advanced than almost every other language out there.
>> By integrating Stackless into an architecture where tasklets can be
>> divided over several parallel threads, Python will be able to
>> capitalize on performance gains that will have people using it just
>> for its performance, rather than that being the excuse not to use it.
>
> Aah, the path to world domination. You know you don't *have* to use
> Python for *everything*.
True, but Python seems to be the *best* place to tackle this problem,
at least to me. It has a large pool of developers, a large standard
library, it's evolving, and it's a language I like :). Languages that
seamlessly support multi-threaded programming are coming, as are
extensions that make it easier on every existing platform. Python has
the opportunity to lead that change.
> Be my guest, if it's so simple.
I knew somebody was going to say that! I'm pretty busy, but I'll see
if I can find some time to look into it.
> I doubt that a thread on c.l.py is going to change much. It's the
> python-dev and py3k lists where you'll need to take up the cudgels,
> because I can almost guarantee nobody is going to take the GIL out of
> 2.6 or 2.7.

I was hoping to get a constructive conversation on what the structure
of a multi-threaded python would look like. It would appear that this
was not the place for that.
> Is it even possible
> to run threads of the same process at different priority levels on all
> platforms?
No, it's not, and even fewer allow the scheduler to change the
priority dynamically. Linux, however, is one that does.
 

Justin T.


Uh oh, my ulterior motives have been discovered!

I'm aware of Erlang, but I don't think it's there yet. For one thing,
it's not pretty enough. It also doesn't have the community support
that a mainstream language needs. I'm not saying it'll never be
adequate, but I think that making Python into an Erlang competitor,
while maintaining backwards compatibility with the huge amount of
already-written Python software, will make Python a very formidable
choice as languages adopt more and more multi-core support. Python is
in a unique position, as it's actually a flexible enough language to
adapt to a multi-threaded environment without resorting to terrible
hacks.

Justin
 

Bjoern Schliessmann

Nick Craig-Wood wrote:
> [GIL]
> That is certainly true. However, the point is that running
> on 2 CPUs at once at 95% efficiency is much better than running on
> only 1 at 99%...

How do you define this percent efficiency?
> Intel, AMD and Sun would have you believe that yes!

Strange, in my programs, I don't need any "real" concurrency (they
are network servers and scripts). Or do you mean "the future of
computing hardware is multi-core"? That indeed may be true.
> The Linux kernel has been through these growing pains already...
> SMP support was initially done with the Big Kernel Lock (BKL),
> which is exactly equivalent to the GIL.

So, how much performance gain would you get? Again, managing
fine-grained locking can be much more work than one simple lock.
> The Linux kernel has since moved to finer and finer grained
> locking.

How do you compare a byte code interpreter to a monolithic OS
kernel?
> I'd like to see a python build as it is at the moment and a
> python-mt build which has the GIL broken down into a lock on each
> object. python-mt would certainly be slower for non-threaded
> tasks, but it would certainly be quicker for threaded tasks on
> multi-CPU computers.

From where do you take this certainty? For example, if the program
in question involves mostly I/O access, there will be virtually no
gain. Multithreading is not Performance.
> The user could then choose which Python to run.
>
> This would of course make C extensions more complicated...

Also, C extensions can release the GIL for long-running
computations.

Regards,


Björn
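That last point can be seen from pure Python, too: blocking calls in the standard library release the GIL, just as a well-behaved C extension does with `Py_BEGIN_ALLOW_THREADS` around long-running work. Here `time.sleep` stands in for such a call, so the four threads overlap instead of running serially:

```python
import threading
import time

results = []

def blocking_call():
    # time.sleep releases the GIL while it waits, so other threads
    # run in the meantime; a C extension doing real work can do the
    # same around code that never touches Python objects.
    time.sleep(0.2)
    results.append(threading.current_thread().name)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(len(results))   # 4
print(elapsed < 0.6)  # True: ~0.2s overlapped, not 0.8s serialized
```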
 
