Erlang-style processes for Python


Kay Schluehr

Every once in a while Erlang-style [1] message-passing concurrency [2]
is discussed for Python. This implies not only Stackless tasklets [3]
but also some process isolation semantics that let the runtime easily
distribute tasklets (or logical 'processes') across physical
processes. Syntactically, a tasklet might grow out of a generator by
reusing the yield keyword for sending messages:

yield_expr : 'yield' ([testlist] | testlist 'to' testlist)

where the second form is specific to tasklets (one could also use a
new keyword like "emit" if this becomes confusing - the semantics is
quite different), plus a new keyword for assigning the "mailbox", e.g.:

required_stmt: 'required' ':' suite

So tasklets could be identified at the lexical level (just like
generators today) and compiled accordingly. I just wonder about the
sharing semantics. Would copy-on-read / copy-on-write and new opcodes
be needed? Or would sharing not be dropped at all, so that when the
runtime moves a tasklet into another OS-level thread / process it gets
pickled and separated only on demand? I think it would be cleaner to
separate things completely, but what are the costs?
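
To make the proposal a bit more concrete, here is a rough approximation
with today's Python and no new syntax: tasklets are plain generators,
yield ("send", target, msg) stands in for the proposed
yield msg to target, and msg = yield ("recv",) stands in for reading the
"required" mailbox. The tiny scheduler and all names below are purely
illustrative, not a sketch of the actual runtime machinery:

from collections import deque

def run(tasklets):
    """Drive the named generator 'tasklets' until all of them finish."""
    mailboxes = {name: deque() for name in tasklets}
    waiting = {}                           # tasklets parked on an empty mailbox
    ready = deque((name, task, None) for name, task in tasklets.items())
    while ready:
        name, task, value = ready.popleft()
        try:
            op = task.send(value)          # resume the generator
        except StopIteration:
            continue
        if op[0] == "send":
            _, target, msg = op
            mailboxes[target].append(msg)
            if target in waiting:          # wake a parked receiver
                ready.append((target, waiting.pop(target),
                              mailboxes[target].popleft()))
            ready.append((name, task, None))
        elif op[0] == "recv":
            if mailboxes[name]:
                ready.append((name, task, mailboxes[name].popleft()))
            else:
                waiting[name] = task

def pinger():
    yield ("send", "pong", "hello")        # would read: yield "hello" to pong
    reply = yield ("recv",)                # would read the 'required' mailbox
    print("pinger got", reply)

def ponger():
    msg = yield ("recv",)
    yield ("send", "ping", msg.upper())

run({"ping": pinger(), "pong": ponger()})  # prints: pinger got HELLO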

What do you think?

[1] http://en.wikipedia.org/wiki/Erlang_programming_language
[2] http://en.wikipedia.org/wiki/Actor_model
[3] http://www.stackless.com/
 

Jacob Lee

Funny enough, I'm working on a project right now that is designed for
exactly that: PARLEY, http://osl.cs.uiuc.edu/parley . (An announcement
should show up in clp-announce as soon as the moderators release it). My
essential thesis is that syntactic sugar should not be necessary -- that a
nice library would be sufficient. I do admit that Erlang's pattern
matching would be nice, although you can get pretty far by using uniform
message formats that can easily be dispatched on -- the tuple
(tag, sender, args, kwargs)
in the case of PARLEY, which maps nicely to instance methods of a
dispatcher class.
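
The following is not PARLEY's actual API, just a sketch of how the
described (tag, sender, args, kwargs) format can be dispatched onto
instance methods of a handler class; the names Dispatcher, handle and
handle_<tag> are made up for the example:

class Dispatcher:
    def handle(self, message):
        tag, sender, args, kwargs = message
        # look up a method named handle_<tag>, fall back to a default
        method = getattr(self, "handle_" + tag, self.handle_unknown)
        return method(sender, *args, **kwargs)

    def handle_unknown(self, sender, *args, **kwargs):
        raise ValueError("no handler for message from %r" % (sender,))

class Counter(Dispatcher):
    def __init__(self):
        self.value = 0

    def handle_incr(self, sender, amount=1):
        self.value += amount

    def handle_get(self, sender):
        return self.value

c = Counter()
c.handle(("incr", None, (), {"amount": 3}))
print(c.handle(("get", None, (), {})))      # -> 3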

The question of sharing among multiple physical processes is interesting.
Implicit distribution of actors may not even be necessary if it is easy
enough for two hosts to coordinate with each other. In terms of the
general question of assigning actors to tasklets, threads, and processes,
there are added complications in terms of the physical limitations of
Python and Stackless Python:
- because of the GIL, actors in the same process do not gain the
advantage of true parallel computation
- all tasklet I/O has to be non-blocking
- tasklets are cooperative, while threads are preemptive (see the small
Stackless sketch after this list)
- communication across processes is slower, has to be serialized, etc.
- using both threads and tasklets in a single process is tricky
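
A minimal illustration of that cooperative scheduling point, assuming
Stackless Python is installed (this will not run on plain CPython): the
two tasklets hand control back and forth over a channel instead of being
preempted.

import stackless

channel = stackless.channel()

def producer():
    for n in range(3):
        channel.send(n)                 # blocks until a receiver is ready

def consumer():
    for _ in range(3):
        print("got", channel.receive())  # switches back to the producer

stackless.tasklet(producer)()
stackless.tasklet(consumer)()
stackless.run()                         # cooperative scheduler; prints got 0..2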

PARLEY currently only works within a single process, though one can choose
to use either tasklets or threads. My next goal is to figure out I/O, at
which point I get to tackle the fun question of distribution.

So far, I've not run into any cases where I've wanted to change the
interpreter, though I'd be interested in hearing ideas in this direction
(especially with PyPy as such a tantalizing platform!).
 

Kay Schluehr

> Funny enough, I'm working on a project right now that is designed for
> exactly that: PARLEY, http://osl.cs.uiuc.edu/parley. (An announcement
> should show up in clp-announce as soon as the moderators release it). My
> essential thesis is that syntactic sugar should not be necessary -- that a
> nice library would be sufficient.

Syntactic sugar is helpful when you want to control compiler actions. Of
course you can also do this by means of __special__ attributes, but I
guess this becomes clutter when you work with certain exposed sections
of the code.

> I do admit that Erlang's pattern
> matching would be nice, although you can get pretty far by using uniform
> message formats that can easily be dispatched on -- the tuple
> (tag, sender, args, kwargs)
> in the case of PARLEY, which maps nicely to instance methods of a
> dispatcher class.

Yes, I think so too. It is more interesting to think about what might
qualify as a message. Destructuring it is not hard in any way, and I
also have a few concerns about naive pattern matching:

http://www.fiber-space.de/EasyExtend/doc/gallery/gallery.html#4._Chainlets_and_the_switch-statement

> The question of sharing among multiple physical processes is interesting.
> Implicit distribution of actors may not even be necessary if it is easy
> enough for two hosts to coordinate with each other. In terms of the
> general question of assigning actors to tasklets, threads, and processes,
> there are added complications in terms of the physical limitations of
> Python and Stackless Python:
> - because of the GIL, actors in the same process do not gain the
> advantage of true parallel computation
> - all tasklet I/O has to be non-blocking
> - tasklets are cooperative, while threads are preemptive
> - communication across processes is slower, has to be serialized, etc.
> - using both threads and tasklets in a single process is tricky

Actors don't need locking primitives since their data is locked by
virtue of the actor definition. That's also why I'm in favour of a
runtime / compiler based solution. Within the shiny world of actors
and actresses the GIL has no place. So a thread that runs only actors
does not need to be blocked or block other threads - at least not for
data-locking purposes. It is used much like an OS-level process with
better sharing capabilities (for mailbox addresses and messages). Such
threads should not take part in the acquire/release GIL game. They
might also not be started explicitly using the usual threading API.
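
A rough sketch of that "each actor owns its data" property at the
OS-process level, using the later multiprocessing module (not anything
Kay proposes, just the closest standard-library analogue): the only
sharing is the message queue, so there are no user-visible locks and no
GIL contention between the two interpreters.

import multiprocessing as mp

def counter_actor(mailbox, replies):
    total = 0                          # private state, never shared directly
    while True:
        msg = mailbox.get()
        if msg == "stop":
            replies.put(total)
            break
        total += msg

if __name__ == "__main__":
    mailbox, replies = mp.Queue(), mp.Queue()
    proc = mp.Process(target=counter_actor, args=(mailbox, replies))
    proc.start()
    for n in (1, 2, 3):
        mailbox.put(n)                 # message passing instead of shared state
    mailbox.put("stop")
    print(replies.get())               # -> 6
    proc.join()
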
> PARLEY currently only works within a single process, though one can choose
> to use either tasklets or threads. My next goal is to figure out I/O, at
> which point I get to tackle the fun question of distribution.
>
> So far, I've not run into any cases where I've wanted to change the
> interpreter, though I'd be interested in hearing ideas in this direction
> (especially with PyPy as such a tantalizing platform!).

I guess you mean tantalizing in both of its meanings ;)

Good luck and inform us when you find interesting results.

Kay
 

Jacob Lee

> Have you seen Candygram?
>
> http://candygram.sourceforge.net/
>
> jon N

I did look at Candygram. I wasn't so keen on the method of dispatch (a
dictionary of lambdas that is passed to the receive function). It also
only works with threads and doesn't communicate across processes.
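
Just to make that concrete, this is not Candygram's real API, only a toy
sketch of the "dictionary of lambdas handed to a receive function" style
of dispatch being contrasted with method-based dispatch:

def receive(mailbox, handlers):
    tag, payload = mailbox.pop(0)      # take the next (tag, payload) message
    return handlers[tag](payload)

mailbox = [("incr", 3), ("stop", None)]
handlers = {
    "incr": lambda n: print("increment by", n),
    "stop": lambda _: print("shutting down"),
}
receive(mailbox, handlers)             # -> increment by 3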

I definitely used Candygram as a reference point when determining what
features to hoist from Erlang.
 

Jacob Lee

[snip]
>> I do admit that Erlang's pattern
>> matching would be nice, although you can get pretty far by using uniform
>> message formats that can easily be dispatched on -- the tuple
>> (tag, sender, args, kwargs)
>> in the case of PARLEY, which maps nicely to instance methods of a
>> dispatcher class.
>
> Yes, I think so too. It is more interesting to think about what might
> qualify as a message. Destructuring it is not hard in any way, and I
> also have a few concerns about naive pattern matching:
>
> http://www.fiber-space.de/EasyExtend/doc/gallery/gallery.html#4._Chainlets_and_the_switch-statement

Interesting. Scala's pattern matching also looks nice. They have a
construct called a "case class" which is sort of like an algebraic data
type in that == compares the actual internal structure of the objects...
come to think of it, it reminds me of the proposal for named tuples that
floated around one of the python lists recently.
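
As a small aside, that named-tuple idea eventually landed as
collections.namedtuple in Python 2.6; with it, messages get structural
equality much like Scala's case classes compare their contents.

from collections import namedtuple

Msg = namedtuple("Msg", "tag sender payload")

a = Msg("incr", "worker-1", 3)
b = Msg("incr", "worker-1", 3)
print(a == b)                 # True: compared field by field, not by identity
tag, sender, payload = a      # ordinary tuple destructuring still works
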
[snip]
> Actors don't need locking primitives since their data is locked by
> virtue of the actor definition. That's also why I'm in favour of a
> runtime / compiler based solution. Within the shiny world of actors
> and actresses the GIL has no place. So a thread that runs only actors
> does not need to be blocked or block other threads - at least not for
> data-locking purposes. It is used much like an OS-level process with
> better sharing capabilities (for mailbox addresses and messages). Such
> threads should not take part in the acquire/release GIL game. They
> might also not be started explicitly using the usual threading API.

There are also a lot of places where Python implicitly shares data, though.
Global variables are one -- if you disallow those, then each actor has
to have its own copy of all imported modules. I think the GC is also not
at all threadsafe. I'm not familiar enough with how the interpreter works
to judge whether disallowing shared memory would make any of the existing
obstacles to removing the GIL easier to deal with.
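
A tiny demonstration of the implicit sharing Jacob mentions, with names
chosen only for the example: two threads that never exchange a message
still mutate the same module-level global.

import threading

counter = 0                    # module-level state, visible to every thread

def bump():
    global counter
    for _ in range(100000):
        counter += 1           # unsynchronized read-modify-write on shared state

threads = [threading.Thread(target=bump) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                 # may be less than 200000: the updates can race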

Certainly, if it's doable, it would be a big win to tackle these problems.

> I guess you mean tantalizing in both of its meanings ;)
>
> Good luck and inform us when you find interesting results.
>
> Kay

Thanks!
 

Michael

Jacob said:
> Funny enough, I'm working on a project right now that is designed for
> exactly that: PARLEY, http://osl.cs.uiuc.edu/parley .

Have you seen Kamaelia? Some people have noted that Kamaelia seems to have a
number of similarities to Erlang's model, which seems to come from common
background knowledge. (Kamaelia's model is based on a blending of what I
know from a very basic recasting of CSP, Occam, unix pipelines and async
hardware verification.)

Home:
http://kamaelia.sourceforge.net/Home

Intros:
http://kamaelia.sourceforge.net/Introduction
http://kamaelia.sourceforge.net/t/TN-LinuxFormat-Kamaelia.pdf
http://www.bbc.co.uk/rd/pubs/whp/whp113.shtml
* http://kamaelia.sourceforge.net/t/TN-LightTechnicalIntroToKamaelia.pdf
http://kamaelia.sourceforge.net/Docs/NotationForVisualisingAxon

The one *'d is perhaps the best at the moment.

Detail:
http://kamaelia.sourceforge.net/Cookbook
http://kamaelia.sourceforge.net/Components


Michael.
 
