examples of realistic multiprocessing usage?


TomF

I'm trying to multiprocess my python code to take advantage of multiple
cores. I've read the module docs for threading and multiprocessing,
and I've done some web searches. All the examples I've found are too
simple: the processes take simple inputs and compute a simple value.
My problem involves lots of processes, complex data structures, and
potentially lots of results. It doesn't map cleanly into a Queue,
Pool, Manager or Listener/Client example from the python docs.

Instead of explaining my problem and asking for design suggestions,
I'll ask: is there a compendium of realistic Python multiprocessing
examples somewhere? Or an open source project to look at?

Thanks,
-Tom
 

Philip Semanchuk

> I'm trying to multiprocess my python code to take advantage of multiple cores. I've read the module docs for threading and multiprocessing, and I've done some web searches. All the examples I've found are too simple: the processes take simple inputs and compute a simple value. My problem involves lots of processes, complex data structures, and potentially lots of results. It doesn't map cleanly into a Queue, Pool, Manager or Listener/Client example from the python docs.
>
> Instead of explaining my problem and asking for design suggestions, I'll ask: is there a compendium of realistic Python multiprocessing examples somewhere? Or an open source project to look at?


A colleague pointed me to this project the other day.

http://gluino.com/


I grepped through the code to see that it's using multiprocessing.Listener. I didn't go any further than that because our project is BSD licensed and the license for Gluino is unclear. Until I find out whether or not it's under an equally permissive license, I can't borrow ideas and/or code from it.

Hope it's of some help to you, though.

Cheers
Philip
 

Adam Skutt

> Instead of explaining my problem and asking for design suggestions,
> I'll ask: is there a compendium of realistic Python multiprocessing
> examples somewhere?  Or an open source project to look at?

There are tons, but without even a knowledge domain, it's difficult to
recommend much of anything. Multiprocessing for I/O (e.g., web
serving) tends to look different and be structured differently from
multiprocessing for CPU-intensive tasking (e.g., digital signal
processing), and both look different from things with specific
requirements w.r.t. latency (e.g., video game servers, hard real-time
applications) or other requirements.

Even the level at which you parallel process can change things
dramatically. Consider the simple case of matrix multiplication. I
can make the multiplication itself parallel; or assuming I have
multiple sets of matrices I want to multiply (common), I can
execute each multiplication in parallel. Both solutions look
different, and a solution that uses both levels of parallelism
frequently will look different still.
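A minimal sketch of the two levels described above, assuming plain Python lists of lists rather than NumPy; all function names are hypothetical. For matrices this small, process start-up costs far more than the arithmetic saves, so it only illustrates the structure:

```python
import multiprocessing as mp

# Prefer the "fork" start method where it exists (POSIX) so the sketch also runs
# from an interactive session; with the default elsewhere, guard the calling code
# with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def row_times_matrix(args):
    """One row of `a` times all of `b`: the unit of fine-grained work."""
    row, b = args
    return [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]


def matmul(a, b):
    """Serial product of two list-of-lists matrices."""
    return [row_times_matrix((row, b)) for row in a]


def matmul_rows_parallel(a, b, processes=4):
    """Level 1: parallelize inside a single multiplication, one task per row."""
    with CTX.Pool(processes) as pool:
        return pool.map(row_times_matrix, [(row, b) for row in a])


def matmul_pair(pair):
    a, b = pair
    return matmul(a, b)


def matmul_many_parallel(pairs, processes=4):
    """Level 2: run many independent multiplications side by side."""
    with CTX.Pool(processes) as pool:
        return pool.map(matmul_pair, pairs)
```

In the row-level version every task refers to `b`, so `b` has to travel to the workers along with the rows; in the coarse-grained version each task is a whole, independent multiplication. That difference in communication is one reason the two designs end up looking different.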

Adam
 

Dan Stromberg

> I'm trying to multiprocess my python code to take advantage of multiple
> cores.  I've read the module docs for threading and multiprocessing, and
> I've done some web searches.  All the examples I've found are too simple:
> the processes take simple inputs and compute a simple value.  My problem
> involves lots of processes, complex data structures, and potentially lots of
> results.  It doesn't map cleanly into a Queue, Pool, Manager or
> Listener/Client example from the python docs.
>
> Instead of explaining my problem and asking for design suggestions, I'll
> ask: is there a compendium of realistic Python multiprocessing examples
> somewhere?  Or an open source project to look at?

I'm unaware of a big archive of projects that use multiprocessing, but
maybe one of the free code search engines could help with that.

It sounds like you're planning to use mutable shared state, which is
generally best avoided if at all possible, in concurrent programming -
because mutable shared state tends to slow down things quite a bit.

But if you must have mutable shared state that's more complex than a
basic scalar or homogeneous array, I believe the multiprocessing
module would have you use a "server process manager".
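For reference, a minimal sketch of a server process manager (`multiprocessing.Manager()`), assuming the shared state is a dict that several worker processes write into; `worker` and `run` are hypothetical names:

```python
import multiprocessing as mp

# Prefer "fork" where available (POSIX) so the sketch also runs interactively;
# with other start methods, guard the calling code with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def worker(shared, key, values):
    # `shared` is a proxy: this assignment is sent to the manager process.
    shared[key] = sum(values)


def run(jobs):
    """jobs maps key -> list of numbers; each key is summed in its own process."""
    with CTX.Manager() as manager:
        shared = manager.dict()
        procs = [CTX.Process(target=worker, args=(shared, key, values))
                 for key, values in jobs.items()]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy into an ordinary dict before the manager process shuts down.
        return shared.copy()
```

The trade-off: the manager runs in its own process and every `shared[...]` access is a round trip to it. That is convenient for arbitrary picklable objects but much slower than having workers return results to the parent. Also, mutating a nested object through a proxy (for example `shared[key].append(x)` on a list value) changes only a local copy unless the value is reassigned.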
 

TomF

> I'm unaware of a big archive of projects that use multiprocessing, but
> maybe one of the free code search engines could help with that.
>
> It sounds like you're planning to use mutable shared state, which is
> generally best avoided if at all possible, in concurrent programming -
> because mutable shared state tends to slow down things quite a bit.

I'm trying to avoid mutable shared state since I've read the cautions
against it. I think it's possible for each worker to compute changes
and return them back to the parent (and have the parent coordinate all
changes) without too much overhead. So far it looks like
multiprocessing.Pool.apply_async is the best match to what I want.
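A minimal sketch of that design with `Pool.apply_async`, assuming each worker returns a list of `(key, delta)` changes and only the parent ever modifies the state; `compute_changes` is a hypothetical stand-in for the real work:

```python
import multiprocessing as mp

# Prefer "fork" where available (POSIX) so the sketch also runs interactively;
# with other start methods, guard the calling code with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def compute_changes(key, value):
    """Hypothetical job: reads only its inputs and returns (key, delta) pairs."""
    return [(key, value * 2), ("total", value)]


def run(items, processes=4):
    state = {"total": 0}  # owned and modified only by the parent

    def apply_changes(changes):
        # Runs in the parent's result-handler thread, one callback at a time.
        for key, delta in changes:
            state[key] = state.get(key, 0) + delta

    with CTX.Pool(processes) as pool:
        pending = [pool.apply_async(compute_changes, (key, value), callback=apply_changes)
                   for key, value in items.items()]
        for result in pending:
            result.get()  # waits, and re-raises any exception raised in a worker
    return state
```

The callback runs in the parent, so changes are incorporated as each job completes rather than after all of them; the docs advise keeping callbacks short because they block the thread that handles results.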

One difficulty is that there is a queue of work to be done and a queue
of results to be incorporated back into the parent; there is no
one-to-one correspondence between the two. It's not obvious to me how
to coordinate the queues in a natural way to avoid deadlock or
starvation.

> But if you must have mutable shared state that's more complex than a
> basic scalar or homogeneous array, I believe the multiprocessing
> module would have you use a "server process manager".

I've looked into Manager but I don't really understand the trade-offs.
-Tom
 

Adam Skutt

> One difficulty is that there is a queue of work to be done and a queue
> of results to be incorporated back into the parent; there is no
> one-to-one correspondence between the two.  It's not obvious to me how
> to coordinate the queues in a natural way to avoid deadlock or
> starvation.

Depends on what you are doing. If you can enqueue all the jobs before
waiting for your results, then two queues are adequate. The first
queue is jobs to be accomplished, the second queue is the results.
The items you put on the result queue have both the result and some
sort of id so the results can be ordered after the fact. Your parent
thread of execution (thread hereafter) then:

1. Adds jobs to the queue
2. Blocks until all the results are returned. Given that you
suggested that there isn't a 1:1 correspondence between jobs and
results, have the queue support a message saying, 'Job X is done'.
You're finished when all jobs send such a message.
3. Sorts the results into the desired order.
4. Acts on them.
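The four steps above might look roughly like this, assuming a hypothetical job that turns a string into zero or more results and a hypothetical `DONE` marker for "Job X is done":

```python
import multiprocessing as mp

# Prefer "fork" where available (POSIX) so the sketch also runs interactively;
# with other start methods, guard the calling code with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()

DONE = "DONE"  # hypothetical marker: "this job has sent all of its results"


def worker(jobs, results):
    for job_id, text in iter(jobs.get, None):  # None means "no more jobs"
        # A job may produce zero, one or many results, each tagged with its id...
        for seq, word in enumerate(text.split()):
            results.put((job_id, seq, word.upper()))
        # ...so it ends by announcing that it is finished.
        results.put((job_id, DONE, None))


def run(texts, processes=3):
    jobs, results = CTX.Queue(), CTX.Queue()
    procs = [CTX.Process(target=worker, args=(jobs, results)) for _ in range(processes)]
    for p in procs:
        p.start()

    # 1. Add all the jobs, then one stop marker per worker.
    for job in enumerate(texts):
        jobs.put(job)
    for _ in procs:
        jobs.put(None)

    # 2. Block until every job has reported DONE, keeping results as they arrive.
    collected, unfinished = [], set(range(len(texts)))
    while unfinished:
        job_id, seq, value = results.get()
        if seq == DONE:
            unfinished.discard(job_id)
        else:
            collected.append((job_id, seq, value))
    for p in procs:
        p.join()  # safe: the results queue has been fully drained

    # 3. Sort into the desired order using the ids carried by each result.
    collected.sort()
    # 4. Act on them (here: simply return the values in order).
    return [value for _, _, value in collected]
```

Because each job is handled by a single worker, its results always reach the queue before its `DONE` message, so seeing every `DONE` means every result has arrived.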

If you cannot enqueue all the jobs before waiting for the results, I
suggest turning the problem into a pipeline, such that the thread
submitting the jobs and the thread acting on the results are
different: submitter -> job processor -> results processor.
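A minimal sketch of such a pipeline, assuming the work is squaring numbers and the results processor only keeps a running total (all names hypothetical). Each stage talks to the next through its own queue, and `None` serves as an end-of-stream marker:

```python
import multiprocessing as mp

# Prefer "fork" where available (POSIX) so the sketch also runs interactively;
# with other start methods, guard the calling code with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def job_processor(jobs, results):
    for n in iter(jobs.get, None):
        results.put(n * n)  # hypothetical work: square the number
    results.put(None)       # tell the results stage this processor is finished


def results_processor(results, n_processors, summary):
    # Acts on each result as soon as it arrives; never waits for "all jobs".
    total, finished = 0, 0
    while finished < n_processors:
        item = results.get()
        if item is None:
            finished += 1
        else:
            total += item
    summary.put(total)


def run_pipeline(numbers, n_processors=3):
    jobs, results, summary = CTX.Queue(), CTX.Queue(), CTX.Queue()
    stage2 = [CTX.Process(target=job_processor, args=(jobs, results))
              for _ in range(n_processors)]
    stage3 = CTX.Process(target=results_processor, args=(results, n_processors, summary))
    for p in stage2 + [stage3]:
        p.start()
    # Stage 1, the submitter: feed the jobs, then one stop marker per processor.
    for n in numbers:
        jobs.put(n)
    for _ in stage2:
        jobs.put(None)
    total = summary.get()  # read before join() so no queue is left undrained
    for p in stage2 + [stage3]:
        p.join()
    return total
```

The submitter never reads results and the results processor never submits jobs, so there is no pair of queues for a single thread to juggle.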

Again though, the devil is in the details and without more details,
it's hard to suggest an explicit approach. The simplest way to avoid
contention between two queues is to just remove it entirely (by
converting the processing to a single pipeline like I suggested). If
that is not possible, then I suggest moving to pipes (or some other
form of I/O based IPC) and asynchronous I/O. But I'd only do that if
I really couldn't write a pipeline.
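If it does come to pipes, Python 3.3 and later provide `multiprocessing.connection.wait()`, which blocks until any of several connections is readable. A minimal sketch, assuming each hypothetical `producer` streams a few messages and then closes its end of the pipe:

```python
import multiprocessing as mp
from multiprocessing.connection import wait

# Prefer "fork" where available (POSIX) so the sketch also runs interactively;
# with other start methods, guard the calling code with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def producer(conn, name, count):
    # Hypothetical worker: streams `count` messages, then closes its end.
    for i in range(count):
        conn.send((name, i))
    conn.close()


def gather(counts):
    readers, procs = [], []
    for name, count in counts.items():
        r, w = CTX.Pipe(duplex=False)  # r can only receive, w can only send
        p = CTX.Process(target=producer, args=(w, name, count))
        p.start()
        w.close()  # parent keeps only the read end, so EOF becomes detectable
        readers.append(r)
        procs.append(p)
    received = []
    while readers:
        # Block until *any* pipe is readable instead of blocking on one queue.
        for r in wait(readers):
            try:
                received.append(r.recv())
            except EOFError:  # that producer has finished and closed its end
                readers.remove(r)
    for p in procs:
        p.join()
    return received
```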

Adam
 

TomF

> Depends on what you are doing. If you can enqueue all the jobs before
> waiting for your results, then two queues are adequate. The first
> queue is jobs to be accomplished, the second queue is the results.
> The items you put on the result queue have both the result and some
> sort of id so the results can be ordered after the fact. Your parent
> thread of execution (thread hereafter) then:
>
> 1. Adds jobs to the queue
> 2. Blocks until all the results are returned. Given that you
> suggested that there isn't a 1:1 correspondence between jobs and
> results, have the queue support a message saying, 'Job X is done'.
> You're finished when all jobs send such a message.
> 3. Sorts the results into the desired order.
> 4. Acts on them.
>
> If you cannot enqueue all the jobs before waiting for the results, I
> suggest turning the problem into a pipeline, such that the thread
> submitting the jobs and the thread acting on the results are
> different: submitter -> job processor -> results processor.
> Adam

Thanks for your reply. I can enqueue all the jobs before waiting for
the results, it's just that I want the parent to process the results as
they come back. I don't want the parent to block until all results are
returned. I was hoping the Pool module had a test for whether all
processes were done, but I guess it isn't hard to keep track of that
myself.
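One way to handle results as they come back without a separate "all done" test is `Pool.imap_unordered`, which yields each result as soon as a worker finishes it and stops once every job has been returned. A minimal sketch with a hypothetical `work` function:

```python
import multiprocessing as mp

# Prefer "fork" where available (POSIX) so the sketch also runs interactively;
# with other start methods, guard the calling code with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def work(n):
    # Hypothetical job: returns its input alongside the result so the parent
    # knows which job finished.
    return n, n * n


def process_as_completed(jobs, processes=4):
    seen = []
    with CTX.Pool(processes) as pool:
        # Results arrive in completion order, not submission order; the loop
        # ends by itself once every job has been returned.
        for n, square in pool.imap_unordered(work, jobs):
            seen.append((n, square))  # incorporate each result immediately
    return seen
```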

-Tom
 

Adam Skutt

> Thanks for your reply.  I can enqueue all the jobs before waiting for
> the results, it's just that I want the parent to process the results as
> they come back.  I don't want the parent to block until all results are
> returned.  I was hoping the Pool module had a test for whether all
> processes were done, but I guess it isn't hard to keep track of that
> myself.

Regardless of whether it does or doesn't, you don't really want to be
blocking in two places anyway, so the "FINISHED" event in the queue is
the superior solution.

It's certainly possible to build a work pool w/ a queue such that you
block on both for entries added to the queue and job completion, but
I'm pretty sure it's something you'd have to write yourself.
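A minimal sketch of such a hand-written pool, assuming jobs can create follow-up jobs (a hypothetical rule here: job `n` spawns `n-1` and `n-2` while they stay positive). The parent blocks only on the event queue and counts outstanding jobs; it is done when every submitted job has reported `FINISHED`:

```python
import multiprocessing as mp

# Prefer "fork" where available (POSIX) so the sketch also runs interactively;
# with other start methods, guard the calling code with `if __name__ == "__main__":`.
CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def worker(tasks, events):
    for task in iter(tasks.get, None):
        events.put(("RESULT", task))
        # Hypothetical rule: a job may discover follow-up jobs.
        for child in (task - 1, task - 2):
            if child > 0:
                events.put(("NEW", child))
        # Sent last, so the parent always sees this job's NEW events first.
        events.put(("FINISHED", task))


def run(initial, processes=3):
    tasks, events = CTX.Queue(), CTX.Queue()
    procs = [CTX.Process(target=worker, args=(tasks, events)) for _ in range(processes)]
    for p in procs:
        p.start()
    outstanding = 0
    for t in initial:
        tasks.put(t)
        outstanding += 1
    results = []
    # The parent blocks in exactly one place: the event queue.
    while outstanding:
        kind, value = events.get()
        if kind == "NEW":           # a worker discovered more work
            tasks.put(value)
            outstanding += 1
        elif kind == "RESULT":      # process results as they arrive
            results.append(value)
        else:                       # "FINISHED"
            outstanding -= 1
    for _ in procs:
        tasks.put(None)             # no jobs left anywhere: stop the workers
    for p in procs:
        p.join()
    return results
```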

Adam
 

Adam Tauno Williams

> You have been brain washed by the Intellectual Properties congsy.
> Of course you can read through code to borrow ideas from it.

I wouldn't; and there is no brain-washing.

It is very unwise to look at GPL'd code if you are working on a non-GPL
project; the GPL is specifically and intentionally viral. The
distinction between reading-through-code-and-borrowing-ideas and
copying-code is thin and best left to lawyers.

Aside: Comments to the contrary often stand-on-their-head to make such
cases. For example:

"You do have a choice under the GPL license: you can stop using the
stolen code and write your own, or you can decide you'd rather release
under the GPL. But the choice is yours. If you say, I choose neither,
then the court can impose an injunction to stop you from further
distribution, but it won't order your code released under the GPL. ...
Of course, you could avoid all such troubles in the first place by not
stealing GPL code to begin with"
<http://www.groklaw.net/article.php?story=20031214210634851>

Seriously? What that basically means is you can't use GPL'd code in a
non-GPL'd product/project. Saying if you do it is OK, but you'll be
required to replace the code or change your license is
standing-on-one's-head. Risking a forced reimplementation of a core
component of an existing application is 'just nuts'.
 

Albert van der Horst

> I grepped through the code to see that it's using
> multiprocessing.Listener. I didn't go any further than that because our
> project is BSD licensed and the license for Gluino is unclear. Until I
> find out whether or not it's under an equally permissive license, I can't
> borrow ideas and/or code from it.

You have been brain washed by the Intellectual Properties congsy.
Of course you can read through code to borrow ideas from it.
 

Albert van der Horst

> I wouldn't; and there is no brain-washing.
>
> It is very unwise to look at GPL'd code if you are working on a non-GPL
> project; the GPL is specifically and intentionally viral. The
> distinction between reading-through-code-and-borrowing-ideas and
> copying-code is thin and best left to lawyers.

This is what some people want you to believe. Arm twisting by
GPL-ers when you borrow their ideas? That is really unheard of.
GPL-ers are not keen on getting the most monetary award by
setting lawyers on you and go to court only reluctantly to
enforce the license.

> Aside: Comments to the contrary often stand-on-their-head to make such
> cases. For example:
>
> "You do have a choice under the GPL license: you can stop using the
> stolen code and write your own, or you can decide you'd rather release
> under the GPL. But the choice is yours. If you say, I choose neither,
> then the court can impose an injunction to stop you from further
> distribution, but it won't order your code released under the GPL. ...
> Of course, you could avoid all such troubles in the first place by not
> stealing GPL code to begin with"
> <http://www.groklaw.net/article.php?story=20031214210634851>

Stealing code means just that, verbatim copies. When you read this
carefully, you can see that reimplementing the stolen code is
an option. Exactly what you say is legally impossible.

It doesn't say:
"you can stop using the stolen code, and now you're forever
banned from writing your own, since you have seen our code"

> Seriously? What that basically means is you can't use GPL'd code in a
> non-GPL'd product/project. Saying if you do it is OK, but you'll be
> required to replace the code or change your license is
> standing-on-one's-head. Risking a forced reimplementation of a core
> component of an existing application is 'just nuts'.

GPL-code is protected under *copyright law*, not patents or some
such. That means that reimplementing ideas is okay.
That is one of the things GPL tries to protect.
Also we recognize the fact that the wheel is reinvented all the time
and that there are a limited number of solutions to a problem.
You could easily have come up with the same idea as me.

Then you overlook another possibility. I have a lot of GPL-ed code
on my site. If you want to use some of it commercially, you
could contact me and negotiate a non-GPL license. You might be
surprised how easy I am on you, as long as you recognize where
the code comes from. If you want to use it BSD-licensed, I
would be even more lenient (unless strategic issues are at
stake).

So pardon me, but not even looking at code you might learn from
is pretty hysterical.

Groetjes Albert
 

Adam Skutt

> This is what some people want you to believe. Arm twisting by
> GPL-ers when you borrow their ideas? That is really unheard of.

Doesn't matter, you're still legally liable if your work is found to
be derivative and lacking a fair use defense. It's not borrowing
"ideas" that's problematic, it's proving that's all you did. For
those of us with legal departments, we have no choice: if they don't
believe we can prove our case, we're not using the code, period. The
risk simply isn't worth it.

> GPL-ers are not keen on getting the most monetary award by
> setting lawyers on you and go to court only reluctantly to
> enforce the license.

And? Monetary award is hardly the only issue.

> Stealing code means just that, verbatim copies. When you read this
> carefully, you can see that reimplementing the stolen code is
> an option. Exactly what you say is legally impossible.

No, in the United States it means anything that constitutes a
derivative work, since derivative works of GPL-licensed works must be
released under the GPL. Merely copying ideas does not make one a
derivative work, but one also must be prepared to show that's all that
happened. As such, it would have to be a substantially different
implementation, generally with some sort of added or useful value.
Proving that can be difficult and may very well depend on what court
you land in.

> So pardon me, but not even looking at code you might learn from
> is pretty hysterical.

Not at all. Separating ideas from implementation can be difficult,
and convincing a judge of that vastly more so. It's a legitimate
concern, and people who intend to ship proprietary software should
definitely resort to GPL-licensed software last when looking for
inspiration.

Adam
 

Adam Tauno Williams

> Doesn't matter, you're still legally liable if your work is found to
> be derivative and lacking a fair use defense. It's not borrowing
> "ideas" that's problematic, it's proving that's all you did. For
> those of us with legal departments, we have no choice: if they don't
> believe we can prove our case, we're not using the code, period. The
> risk simply isn't worth it.

+1, exactly. "Reimplementation" as a defense in GPL disputes is very often
treated as *trivial*. Changing function names, variable names and
indenting style is not "reimplementation". Reimplementation can be very
difficult, time consuming, and error-prone.

Anyway, legally define: "reimplementation". Have fun.

> Not at all. Separating ideas from implementation can be difficult,

Honestly, IMNSHO, it borders on *impossible*. Even statistical
analysis of written prose or off-hand speech will reveal how
pathologically derivative humans are in their use of language. And as
that language is forced to become more structured, as in programming or
technical documentation, it is even more so.
 

Dan Stromberg

> Doesn't matter, you're still legally liable if your work is found to
> be derivative and lacking a fair use defense.  It's not borrowing
> "ideas" that's problematic, it's proving that's all you did.  For
> those of us with legal departments, we have no choice: if they don't
> believe we can prove our case, we're not using the code, period.  The
> risk simply isn't worth it.

Many legal departments have an overblown sense of risk, I'm afraid.
And I suppose that's somewhat natural, as it's mostly the legal people
who are putting their necks on the line over such issues - though I
wouldn't be surprised to see a disciplinary action or even firing of a
techie over same.

I worked at DATAllegro when it was acquired by Microsoft. The
DATAllegro product had significant portions that were opensource code;
Microsoft, of course, decided that they needed to "quarantine"
(meaning "eliminate", in a weird, half-way sense) the opensource
portions.

Why did Microsoft do this? Why knowingly go through with the purchase
of a product that had large opensource parts? Why was what they did
considered "enough" as part of a complex due diligence process, to
satisfy even Microsoft's copyright-extensionist lawyers?

When I say "copyright extensionist", I mean:
1) Their legal department once told me that a small python module
could not just be rewritten under a different license, legally,
because a small module could not be made different enough to avoid
issues.
2) Their onboarding process literally said "don't look at example code
in programming books - it entails a legal risk for the company."

What made them think DATAllegro's purchase price was still worth it,
despite this perspective on copyright?

I don't know; I have no first-hand knowledge of that process, though
ironically I did help quarantine the "offending" code. But obviously
Microsoft management, their board and their lawyers felt it was worth
the risk at the price. I know it had something to do with contracting
out to a 3rd party company to assess the risk and ascertain what
portions "required" excising.

Here's one such company: http://www.blackducksoftware.com/black-duck-suite
A former coworker (not of Microsoft) suggested they were the only
company in this business. I believe Black Duck has software that
automatically detects opensource code in a body of work.

IOW, it's quite possible to demonstrate that something isn't a
derivative work, enough so to make even Microsoft's lawyers happy,
given adequate funding for the purpose.

So yeah, sometimes a programmer peeking at opensource code might be
more of a risk (== expense) than a closed-source company is willing to
take, but so might studying a book intended to help you learn
programming. And how many programmers haven't studied a programming
book at some time in their life?

My intuition tells me (I'm not going into details - that feels too
dangerous to me personally) that part of the issue Microsoft was
trying to prevent, wasn't so much a matter of copyright safety, as
trying to avoid being called hypocritical; they've made a lot of noise
about how dangerous opensource is. If they then turn around and
distribute opensource code artifacts as part of a Microsoft product,
then they'll probably eventually get beaten up in the tech press yet
again over the new matter.
 

Philip Semanchuk

> Many legal departments have an overblown sense of risk, I'm afraid.

I carefully avoid GPLed code on our BSD-licensed project not because I fear anyone's legal department, but out of respect for the author(s) of the GPL-ed code. The way I see it, the author of GPL-ed code gives away something valuable and asks for just one thing in return: respect the license. It strikes me as very selfish to deny them the one thing they ask for.

JMHO,
Philip
 

Dan Stromberg

> I carefully avoid GPLed code on our BSD-licensed project not because I fear anyone's legal department, but out of respect for the author(s) of the GPL-ed code. The way I see it, the author of GPL-ed code gives away something valuable and asks for just one thing in return: respect the license. It strikes me as very selfish to deny them the one thing they ask for.

That's very considerate, and yet, I think there are multiple senses of
the word "avoid" above.

If you're avoiding inspecting GPL'd code for ideas, I think if you ask
most authors of GPL'd code, they'd be more than happy to allow you to.
I've released GPL'd code quite a few times, and personally, I'm
flattered when others want to look it over.

If you're avoiding cutting and pasting from (or linking against) GPL'd
code into something that isn't GPL-licensed, then that's very
sensible.
 
