You can do this right now with a small amount of work to make
updatePartition a callable which works in parallel, and without the
need for extra syntax. For example, with the pprocess module, you'd
use boilerplate like this:
import pprocess
queue = pprocess.Queue(limit=ncores)
updatePartition = queue.manage(pprocess.MakeParallel(updatePartition))
(See http://www.boddie.org.uk/python/pprocess/tutorial.html#Map for
details.)
At this point, you could use a normal "for" loop, and you could then
"sync" for results by reading from the queue. I'm sure it's a similar
story with the multiprocessing/processing module.
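Put together, the whole pattern might look like this sketch, where
updatePartition and partitions are stand-ins for the real work:

import pprocess

def updatePartition(partition):
    return sum(partition)  # stand-in for the real per-partition work

partitions = [range(0, 1000), range(1000, 2000)]  # hypothetical inputs
ncores = 2

queue = pprocess.Queue(limit=ncores)
updatePartition = queue.manage(pprocess.MakeParallel(updatePartition))

# A normal "for" loop submits the work...
for partition in partitions:
    updatePartition(partition)

# ...and reading from the queue "syncs" on the results.
for result in queue:
    print(result)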
Yes, that's the idea.
In what sense are we not ready? Perhaps the abstractions could be
better, but it's definitely possible to run Python code on multiple
cores today and get decent core utilisation.
Yes, that's what pprocess.pmap is for, and I imagine that other
solutions offer similar facilities.
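For reference, a minimal sketch of that form, assuming a simple
one-argument function and a limit of four parallel processes:

import pprocess

def compute(value):
    return value * value  # stand-in for real per-item work

# Like the builtin map, but the calls are evaluated in parallel,
# up to "limit" at a time.
results = pprocess.pmap(compute, range(100), limit=4)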
Note that your last statement is false: true parallel processing is
possible today. See the Wiki for a list of solutions:
http://wiki.python.org/moin/ParallelProcessing
In addition, Jython and IronPython don't have a global interpreter
lock, so you have the option of using threads with those
implementations, too.
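On those implementations the standard threading module can then give
real parallel speed-ups; a minimal sketch (the code itself is ordinary
Python - only the free-threaded behaviour is implementation-specific):

import threading

def worker(chunk, results, index):
    # Each thread computes a partial result over its own slice.
    results[index] = sum(x * x for x in chunk)

data = list(range(1000000))
nthreads = 4
size = len(data) // nthreads
results = [None] * nthreads
threads = [threading.Thread(target=worker,
                            args=(data[i * size:(i + 1) * size], results, i))
           for i in range(nthreads)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(results)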
Paul
Hi Paul and others,
Thanks for your responses to my original questions.
Paul, thanks for explaining about the pprocess module, which appears
very useful. I presume that it uses multiple operating-system
processes rather than threads, which would imply that it is suited to
coarse-grained rather than fine-grained parallel programming, because
of the overhead of starting new processes and of sharing objects.
(How is that done, by the way?) It probably has advantages and
disadvantages compared with thread-based parallelism.
My suggestion is primarily about using multiple threads and sharing
memory - something akin to the OpenMP directives that one of you has
mentioned. Doing this efficiently would involve removing the Global
Interpreter Lock, or switching to Jython or IronPython, as you
mentioned.
However I *do* actually want to add syntax to the language. I think
that 'par' makes sense as an official Python construct - the Occam
programming language has had one for twenty-five years.
The reason for this is ease of use. I would like to make it easy for
amateur programmers to exploit the natural parallelism in their
algorithms. For instance, somebody who wishes to calculate a property
of each member of a list of chemical structures using the Python
Daylight interface could, with my suggestion, potentially get a
massive speed-up just by changing 'for' to 'par' or 'map' to 'pmap'.
(Or by calling map with a parallel keyword argument set, as
suggested.) At present they would have to chop up their work manually
and run it as multiple processes to achieve the same effect - fine for
expert programmers, but not reasonable for people working in other
domains who wish to use Python as a utility because of its fantastic
productivity and ease of use.
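To make the contrast concrete, here is roughly what that looks like
today with the multiprocessing module versus the proposed one-word
change (calcProperty and structures are hypothetical names, not the
actual Daylight API):

from multiprocessing import Pool

def calcProperty(structure):
    return len(structure)  # stand-in for a real Daylight calculation

if __name__ == "__main__":
    structures = ["CCO", "c1ccccc1", "CC(=O)O"]  # hypothetical inputs

    # Today: explicit process management.
    pool = Pool()
    properties = pool.map(calcProperty, structures)
    pool.close()
    pool.join()

    # Proposed: the same result as a one-word change.
    # properties = pmap(calcProperty, structures)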
Let me clarify what I think par, pmap, pfilter and preduce would mean
and how they would be implemented. A par loop is like a for loop,
except that the programmer is asserting that the order in which the
iterations are performed doesn't matter and that they might be
performed in parallel. The Python system then has the option to
allocate a number of threads to the task and share the iterations out
accordingly between the threads. (It might be that the programmer
should be allowed to explicitly define the number of threads to use,
or can delegate that decision to the system.) Parallel pmap and pfilter would
be implemented in much the same way, although the resultant list might
have to be reassembled from the partial results returned from each
thread. As people have pointed out, parallel reduce is a trickier
case because it requires the binary operation to be associative; when
it is, the result can be computed in parallel using a tree-based
evaluation strategy.
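A sequential sketch of that tree-based strategy (treeReduce is a
hypothetical name; a real preduce would evaluate the two halves
concurrently):

import operator

def treeReduce(op, values):
    # Split the list in half, reduce each half, then combine the two
    # partial results. This is only equivalent to a left-to-right
    # reduce when "op" is associative; a parallel preduce could hand
    # the two recursive calls to separate threads.
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return op(treeReduce(op, values[:mid]), treeReduce(op, values[mid:]))

print(treeReduce(operator.add, [1, 2, 3, 4, 5]))  # prints 15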
I have used all of OpenMP, MPI, and Occam in the past. OpenMP adds
parallelism to programs by the use of special comment strings, MPI by
explicit calls to library routines, and Occam by explicit syntactical
structures. Each has its advantages. I like the simplicity of OpenMP,
the cross-language portability of MPI, and the fact that concurrency
is built into the Occam language. What I am proposing here is a
hybrid of the OpenMP and Occam approaches - a change to the language
which is very natural and yet easy for programmers to understand.
Concurrency is generally regarded as the hardest concept for
programmers to grasp.
Jeremy