multiprocessing module (PEP 371)

sturlamolden

I sometimes read python-dev, but never contribute. So I'll post my
rant here instead.

I completely support adding this module to the standard lib. Get it in
as soon as possible, regardless of PEP deadlines or whatever.

I don't see pyprocessing as a drop-in replacement for the threading
module. Multi-threading and multi-processing code tend to be
different, unless something like mutable objects in shared memory is
used as well (cf. Python Shared Objects). If this limitation can
educate Python programmers to use queues instead of locks and mutable
objects, even multi-threaded Python programs may actually benefit.
Some API differences between threading and multiprocessing do not
matter. Programmers should not consider processes as a drop-in
replacement for threads.
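
A minimal sketch of the queue-based style described above. The names `worker`, `run_squares` and `SENTINEL` are made up for illustration, and the module is spelled `multiprocessing` as in the PEP rather than `processing`. The same worker function runs under a thread or a process because it only touches the two queues it is handed:

```python
import multiprocessing
import queue
import threading

SENTINEL = None  # hypothetical "no more work" marker


def worker(inbox, outbox):
    """Square each number taken from inbox until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            break
        outbox.put((item, item * item))


def run_squares(numbers, use_processes=False):
    """Run one worker and collect its results; no locks, no shared mutable state.

    `numbers` must be a list, since it is walked twice.
    """
    if use_processes:
        inbox, outbox = multiprocessing.Queue(), multiprocessing.Queue()
        runner = multiprocessing.Process(target=worker, args=(inbox, outbox))
    else:
        inbox, outbox = queue.Queue(), queue.Queue()
        runner = threading.Thread(target=worker, args=(inbox, outbox))
    runner.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(SENTINEL)
    # Drain the results before joining so the worker never blocks on a full pipe.
    results = dict(outbox.get() for _ in numbers)
    runner.join()
    return results
```

Calling `run_squares([1, 2, 3])` uses a thread; passing `use_processes=True` swaps in a process without touching `worker`. On platforms without fork that call should sit under an `if __name__ == "__main__":` guard.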

One limitation not discussed on python-dev is the lack of fork on
Win32. This makes the pyprocessing module particularly inefficient at
creating processes on this platform, as it depends on serializing
(pickling and de-pickling) a lot of Python objects. Even a non-COW fork
would be preferred. I would strongly suggest that something be done to add
support for os.fork to Python on Windows. Either create a full COW
fork using ZwCreateProcess (ntdll.dll does support COW forking, but the
Win32 API does not expose it), or do the same as Cygwin is doing to
fork a process without COW. Although a non-COW fork a la Cygwin is not
as efficient as a fork on Linux/FreeBSD/Unix, it is still better than
what pyprocessing is doing.
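
To get a feel for the cost, here is a rough sketch of the copy-by-serialization step that a platform without fork has to pay for anything handed to a child process. `ship_by_pickle` and the sample `state` are made up for illustration; this is not pyprocessing's actual start-up code, and everything runs in a single process:

```python
import pickle


def ship_by_pickle(state):
    """Copy state the way a fork-less platform must: serialize, then rebuild."""
    payload = pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)
    clone = pickle.loads(payload)  # the step the receiving process would perform
    return clone, len(payload)


# Hypothetical parent-side state that a child process would need.
state = {"weights": list(range(100_000)), "label": "model"}
clone, nbytes = ship_by_pickle(state)
print(f"copied {nbytes} bytes; equal={clone == state}; same object={clone is state}")
```

With a COW fork the child would see the same data without any of it being serialized or copied up front.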
 
Paul Boddie

sturlamolden said:
Even a non-COW fork
would be preferred. I would strongly suggest that something be done to add
support for os.fork to Python on Windows. Either create a full COW
fork using ZwCreateProcess (ntdll.dll does support COW forking, but the
Win32 API does not expose it), or do the same as Cygwin is doing to
fork a process without COW. Although a non-COW fork a la Cygwin is not
as efficient as a fork on Linux/FreeBSD/Unix, it is still better than
what pyprocessing is doing.

You seem to know more about this matter than the average person, I
would wager, so it might be an idea if you more than "strongly
suggest" something. ;-) I've looked at this situation briefly, I've
seen the different Cygwin-based techniques, and I've even gone as far
as to investigate whether it's possible to write the necessary code using
the mingw32 stuff, although I don't think it actually worked when I
tested the executable on Windows. COW (copy-on-write, for those still
thinking that we're talking about dairy products) would be pretty
desirable if it's feasible, though.

Having said all this, I don't care about Windows myself, and my own
contribution to the collection of available libraries in this domain
has never been targeted at standard library adoption (nor thread API
compatibility) and thus has no need to run on Windows without Cygwin.

Paul
 
sturlamolden

Paul Boddie said:
tested the executable on Windows. COW (copy-on-write, for those still
thinking that we're talking about dairy products) would be pretty
desirable if it's feasible, though.

There is a well known C++ implementation of COW fork on Windows, which
I have slightly modified and ported to C. But as the new WDK (Windows
Driver Kit) headers are full of syntax errors, the compiler chokes on
it. :( I am seriously considering re-implementing the whole COW fork
in pure Python using ctypes.

If you pass NULL as section handle to ZwCreateProcess (or
NtCreateProcess) you do get a rudimentary COW fork. But the new
process image has no context and no active threads. The NT kernel is
designed to support several subsystems. Both the OS/2 and SUA
subsystems provide a functional COW fork, but the Win32 subsystem does
not expose the functionality. I honestly don't understand why, but
maybe it is backwards compatibility that prevents it (its legacy
goes back to DOS, in which forking was impossible due to
single-tasking).

But anyway ... what I am trying to say is that pyprocessing is
somewhat inefficient (and limited) on Windows due to lack of a fork
(COW or not).
 
Christian Heimes

sturlamolden said:
There is a well known C++ implementation of COW fork on Windows, which
I have slightly modified and ported to C. But as the new WDK (Windows
Driver Kit) headers are full of syntax errors, the compiler chokes on
it. :( I am seriously considering re-implementing the whole COW fork
in pure Python using ctypes.

Can you provide a C implementation that compiles under VS 2008? Python
2.6 and 3.0 are using my new VS 2008 build system and we have dropped
support for 9x, ME and NT4. If you can provide us with an implementation
we *might* consider using it.

Christian
 
pataphor

sturlamolden said:
I don't see pyprocessing as a drop-in replacement for the threading
module. Multi-threading and multi-processing code tend to be
different, unless something like mutable objects in shared memory is
used as well (cf. Python Shared Objects). If this limitation can
educate Python programmers to use queues instead of locks and mutable
objects, even multi-threaded Python programs may actually benefit.
Some API differences between threading and multiprocessing do not
matter. Programmers should not consider processes as a drop-in
replacement for threads.

This is probably not very central to the main intention of your post,
but I see a terminology problem coming up here. It is possible for
Python objects to share a reference to some other object. This has
nothing to do with threads or processes, although it can be used as a
*mechanism* for threads and processes to share data. Another mechanism
would be some copying and synchronization scheme, which is what posh
seems to do. Or maybe not, I haven't used posh yet, I just read some
docs (and I already hate the "if process.fork():" idiom, what are they
trying to do, reintroduce C-style assignment and switching?).

By the way I haven't done much thread and process programming, but the
things I *have* done often combine threads and processes, like starting
a console-oriented program in a background process, redirecting the IO
and communicating with it using an event loop in a thread. It gets more
complicated when a GUI thread is also involved, for example when
retrofitting a GUI interface to an existing terminal-based chess or go
playing program.
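
Roughly, the setup described above could look like the sketch below. The child here is only a stand-in that echoes each line back in upper case (a real engine would be a separate executable), and `start_console_program` and `ask` are hypothetical helper names. A reader thread pumps the child's output into a queue so the caller never blocks on the pipe directly:

```python
import queue
import subprocess
import sys
import threading

# Stand-in for a console program: echoes every input line back upper-cased.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n"
)


def start_console_program():
    """Start the child with redirected IO and a thread that collects its output."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the parent side
    )
    replies = queue.Queue()

    def pump():
        for line in proc.stdout:
            replies.put(line.rstrip("\n"))
        replies.put(None)  # the child closed its stdout

    threading.Thread(target=pump, daemon=True).start()
    return proc, replies


def ask(proc, replies, command, timeout=10.0):
    """Send one command line and wait for one reply line."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    return replies.get(timeout=timeout)
```

A GUI thread would poll `replies` with `get_nowait()` from its own timer instead of blocking in `ask`.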

P.
 
sturlamolden

pataphor said:
This is probably not very central to the main intention of your post,
but I see a terminology problem coming up here. It is possible for
Python objects to share a reference to some other object. This has
nothing to do with threads or processes, although it can be used as a
*mechanism* for threads and processes to share data.

It is complicated in the case of processes, because the object must be
kept in shared memory. The complicating factor is the base
address of the memory mapping, which is not guaranteed to be the same
in the virtual address space of different processes.
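
One common workaround is to store offsets relative to the start of the mapping rather than absolute pointers, so the data stays valid wherever each process happens to map it. A minimal sketch, using an anonymous mmap as a stand-in for a real shared segment; the node layout and the helper names are hypothetical:

```python
import mmap
import struct

# One list node: a signed value plus the offset of the next node (0 = end).
NODE = struct.Struct("<iI")


def build_list(buf, values):
    """Write values into buf as a linked list whose links are offsets."""
    # Offset 0 is reserved as the "null" link, so nodes start at NODE.size.
    offset = NODE.size
    for i, value in enumerate(values):
        is_last = i == len(values) - 1
        next_offset = 0 if is_last else offset + NODE.size
        NODE.pack_into(buf, offset, value, next_offset)
        offset += NODE.size
    return NODE.size if values else 0  # offset of the head node


def read_list(buf, head):
    """Follow the offset links from head and return the stored values."""
    values = []
    offset = head
    while offset != 0:
        value, offset = NODE.unpack_from(buf, offset)
        values.append(value)
    return values


buf = mmap.mmap(-1, 4096)  # anonymous mapping standing in for shared memory
head = build_list(buf, [10, 20, 30])
print(read_list(buf, head))
```

Nothing in the buffer depends on where it is mapped, which is exactly what ordinary Python objects, full of raw pointers, cannot offer.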
 
John Nagle

sturlamolden said:
It is complicated in the case of processes, because the object must be
kept in shared memory. The complicating factor is the base
address of the memory mapping, which is not guaranteed to be the same
in the virtual address space of different processes.

Introducing shared memory in Python would be a terrible idea,
for many reasons, including the need for interprocess garbage
collection and locking. Don't go there. Use message passing instead.

John Nagle
 
