Justin T.
Hello,
While I don't pretend to be an authority on the subject, a few days of
research have led me to believe that a discussion needs to be started
(or continued) on the state and direction of multi-threading in Python.
Python is not multi-threading friendly. Any code that deals with the
Python interpreter must hold the global interpreter lock (GIL). This
has the effect of serializing (to a certain extent) all Python-specific
operations. That is, any thread that is written purely in Python
will not release the GIL except at particular (and possibly non-
optimal) times. Currently that's the rather arbitrary quantum of 100
bytecode instructions. Since the OS's ability to schedule a Python
thread depends on when it is possible to run that thread (according to
the lock), Python threads do not benefit from a good scheduler in the
same manner that real OS threads do, even though Python threads are
supposed to be a thin wrapper around real OS threads [1].
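The effect is easy to demonstrate. The sketch below (a minimal illustration, not a benchmark; note that newer CPython versions replaced the 100-bytecode quantum with a time-based switch interval, visible via sys.getswitchinterval) shows that two CPU-bound pure-Python threads take about as long as running the same work sequentially, because only one of them can hold the GIL at a time:

```python
import sys
import threading
import time

# Newer CPythons expose the (time-based) successor to the
# 100-bytecode check quantum discussed above.
print("switch interval:", sys.getswitchinterval())

def count(n):
    # A CPU-bound loop in pure Python: it holds the GIL except at the
    # interpreter's periodic check points, so two such threads cannot
    # actually execute simultaneously.
    while n:
        n -= 1

start = time.time()
t1 = threading.Thread(target=count, args=(2_000_000,))
t2 = threading.Thread(target=count, args=(2_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - start

start = time.time()
count(2_000_000)
count(2_000_000)
sequential = time.time() - start

# On a multi-core machine one might hope threaded ~= sequential / 2;
# because of the GIL it is typically about the same, or worse.
print("threaded: %.2fs  sequential: %.2fs" % (threaded, sequential))
```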
The detrimental effects of the GIL have been discussed several times,
and nobody has ever done anything about it. That is because the GIL
isn't really that bad right now. The GIL isn't held that much, and
pthreads spawned by Python-C interactions (i.e., those that reside in
extensions) can do all their processing concurrently as long as they
aren't dealing with Python data. What this means is that Python
multithreading isn't really broken as long as Python is thought of as
a convenient way of manipulating C. After all, 100 bytecode
instructions go by pretty quickly, so the GIL isn't really THAT
invasive.
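This concurrency in C-level code is visible even from pure Python. As a minimal illustration, time.sleep blocks inside a C call that releases the GIL, so many threads can wait at the same time, which is exactly the behavior described above for extension code that doesn't touch Python data:

```python
import threading
import time

def wait(sec):
    # time.sleep releases the GIL while blocking in C, so these
    # threads genuinely overlap.
    time.sleep(sec)

start = time.time()
threads = [threading.Thread(target=wait, args=(0.5,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Ten half-second waits finish in roughly half a second, not five
# seconds, because the GIL is released during each sleep.
print("elapsed: %.2fs" % elapsed)
```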
Python, however, is much more than a convenient method of
manipulating C. Python provides a simple language which can be
implemented in any way, so long as the promised behaviors remain. We
should take advantage of that.
The truth is that the future (and present reality) of almost every
form of computing is multi-core, and there is currently no effective
way of dealing with concurrency. We still worry about setting up
threads, synchronizing message queues, synchronizing shared memory
regions, dealing with asynchronous behaviors, and, most importantly,
how threaded an application should be. All of this is possible to do
manually in C, but it's hardly optimal. For instance, at compile time
you have no idea whether your library is going to be running on a
machine with 1 processor or 100. Knowing that makes a huge difference
in architecture: 200 threads might run fine on the 100-core machine,
whereas they might thrash the single processor to death. Thread pools
help, but they need to be set up and initialized, and there are very
few good thread pool implementations that are meant for generic use.
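To make the point concrete, here is a minimal generic thread pool of the kind described, where the worker count is chosen at runtime from the actual core count rather than fixed when the code is written. The class and its sizing heuristic are illustrative assumptions, not a recommended design:

```python
import os
import queue
import threading

class ThreadPool:
    # Worker count adapts to the machine at runtime, addressing the
    # "1 processor or 100?" problem raised above.
    def __init__(self, workers=None):
        self.tasks = queue.Queue()
        n = workers or os.cpu_count() or 1
        for _ in range(n):
            t = threading.Thread(target=self._worker, daemon=True)
            t.start()

    def _worker(self):
        while True:
            func, args, out = self.tasks.get()
            try:
                out.append(func(*args))
            finally:
                self.tasks.task_done()

    def submit(self, func, *args):
        out = []
        self.tasks.put((func, args, out))
        return out  # holds the result once the task has run

    def join(self):
        self.tasks.join()

pool = ThreadPool()
results = [pool.submit(pow, n, 2) for n in range(5)]
pool.join()
print([r[0] for r in results])  # squares of 0..4
```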
It is my feeling that there is no better way of dealing with dynamic
threading than to use a dynamic language. Stackless Python has proven
that clever manipulation of the stack can dramatically improve
concurrent performance in a single thread. Stackless revolves around
tasklets, which are a nearly universal concept.
For those who don't follow experimental Python implementations:
Stackless essentially provides an integrated scheduler for "green
threads" (tasklets), i.e., extremely lightweight snippets of code that
can be run concurrently. It even provides a nice way of messaging
between the tasklets.
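The tasklet-plus-channel model can be sketched in plain Python using generators. This is only an approximation of Stackless (whose tasklet and channel types are implemented in C, with real stack switching), but it shows how cheap cooperatively scheduled green threads with messaging can be:

```python
from collections import deque

class Channel:
    # A bare-bones message queue standing in for a Stackless channel.
    def __init__(self):
        self.values = deque()

class Scheduler:
    # Round-robin scheduler: each generator is a "tasklet" that yields
    # to hand control back.
    def __init__(self):
        self.ready = deque()

    def spawn(self, gen):
        self.ready.append(gen)

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)           # run until the tasklet yields
            except StopIteration:
                continue             # tasklet finished
            self.ready.append(task)  # reschedule cooperatively

sched = Scheduler()
ch = Channel()
log = []

def producer():
    for i in range(3):
        ch.values.append(i)  # send a message
        yield                # give other tasklets a turn

def consumer():
    received = 0
    while received < 3:
        while not ch.values:
            yield            # "block" cooperatively until data arrives
        log.append(ch.values.popleft())
        received += 1

sched.spawn(producer())
sched.spawn(consumer())
sched.run()
print(log)  # [0, 1, 2]
```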
When you think about it, lots of object-oriented code can be organized
as tasklets. After all, encapsulation provides an environment where
the side effects of running functions can be minimized, and such code
is thus somewhat easily parallelized (with respect to other objects).
Functional programming is, of course, ideal, but it's hardly the
trendy thing these days. Maybe that will change when people realize
how much easier it is to test and parallelize.
What these seemingly unrelated thoughts come down to is a perfect
opportunity for Python to become THE next-generation language. It is
already far more advanced than almost every other language out there.
By integrating Stackless into an architecture where tasklets can be
divided over several parallel threads, Python would be able to
capitalize on performance gains that would have people using it just
for its performance, rather than that being the excuse not to use it.
The nice thing is that this requires a fairly doable amount of work.
First, Stackless should be integrated into the core. Then there
should be an effort to remove Python threading's reliance on the GIL.
After that, advanced features like moving tasklets amongst threads
should be explored. I can imagine a world where a single Python web
application is able to redistribute its millions of requests amongst
thousands of threads without the developer ever having had to plan
for the application to scale. An efficient and natively multi-threaded
implementation of Python will be invaluable as cores continue to
multiply like rabbits.
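The "tasklets divided over several threads" step can be sketched very simply: split a batch of tasklets round-robin across a number of OS threads chosen at runtime from the core count. Everything here (the function names, treating a tasklet as a plain callable) is an illustrative assumption, not the proposed implementation:

```python
import os
import threading

def run_tasklets(tasklets):
    # In this sketch a "tasklet" is just a zero-argument callable,
    # run to completion by its host thread.
    for t in tasklets:
        t()

def dispatch(tasklets, workers=None):
    # Thread count picked at runtime from the actual core count.
    n = workers or os.cpu_count() or 1
    buckets = [tasklets[i::n] for i in range(n)]  # round-robin split
    threads = [threading.Thread(target=run_tasklets, args=(b,))
               for b in buckets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

results = []
lock = threading.Lock()

def make_tasklet(i):
    def tasklet():
        with lock:  # shared state still needs protecting
            results.append(i * i)
    return tasklet

dispatch([make_tasklet(i) for i in range(8)])
print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The interesting (and hard) part that this sketch omits is migration: moving a partially-run tasklet from a busy thread to an idle one, which is exactly what the paragraph above proposes exploring.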
There has been much discussion on this in the past [2]. Those
discussions, I feel, were premature. Now that Stackless is mature (and
continuation-free!), Py3k is in full swing, and parallel programming
has been fully recognized as THE next big problem for computer
science, the time is ripe for discussing how we will approach
multi-threading in the future.
Justin
[1] I haven't actually looked at the GIL code. It's possible that it
creates a wait queue for each nice level that a Python thread runs at
and wakes the higher-priority threads first, thus preserving the nice
values determined by the scheduler, or something. I highly doubt it. I
bet every Python thread gets an equal chance of acquiring that lock,
despite whatever patterns the scheduler may have noticed.
[2]
http://groups.google.com/group/comp...a6a5d976?q=stackless,+thread+paradigm&lnk=ol&
http://www.stackless.com/pipermail/stackless/2003-June/000742.html
.... More that I've lost; just search this group.