A question about subprocess

JD

Hi,

I want to send my jobs over a whole bunch of machines (using ssh). The
jobs will need to be run in the following pattern:

(Machine A)   (Machine B)   (Machine C)
Job A1        Job B1        Job C1
Job A2        Job B2        etc
Job A3        etc
etc

Jobs running on machines A, B, and C should run in parallel; however, on
each machine, jobs should run one after another.

How can I do it with the subprocess module?


Thanks,

JD
 
Dan Stromberg

You don't necessarily need the subprocess module to do this, though you
could use it.

I've done this sort of thing in the past with fork and exec.

To serialize the jobs on the machines, the easiest thing is to just send
the commands all at once to a given machine, like "command1; command2;
command3".

You can use waitpid or similar to check if a series of jobs has finished
on a particular machine.
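
For instance, a rough sketch of that approach using the subprocess module
(host names and job commands are placeholders; each Popen runs in parallel
while the remote shell serializes its own command string):

import subprocess

# Placeholder host names and job commands.
jobs_by_host = {
    "machine-a": ["command1", "command2", "command3"],
    "machine-b": ["command4", "command5"],
}

# Send each machine its whole series at once as "cmd1; cmd2; ...".
procs = {}
for host, commands in jobs_by_host.items():
    procs[host] = subprocess.Popen(["ssh", host, "; ".join(commands)])

# Wait for every machine's series of jobs to finish.
for host, proc in procs.items():
    status = proc.wait()
    print("%s exited with status %s" % (host, status))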

An example of something similar can be found at
http://stromberg.dnsalias.org/~strombrg/loop.html

(If you look at the code, be kind. I wrote it long ago :)

There's a benefit to saving the output from each machine into a single
file for that machine. If you think some machines will produce the same
output, and you don't want to see it over and over, you can analyze the
files with something like
http://stromberg.dnsalias.org/~strombrg/equivalence-classes.html .
 
Karthik Gurusamy

Hi,

I want to send my jobs over a whole bunch of machines (using ssh). The
jobs will need to be run in the following pattern:

(Machine A)   (Machine B)   (Machine C)
Job A1        Job B1        Job C1
Job A2        Job B2        etc
Job A3        etc
etc

Jobs running on machines A, B, and C should run in parallel; however, on
each machine, jobs should run one after another.

How can I do it with the subprocess module?

subprocess is not network-aware. What you can do is write a simple
Python script, say run_jobs.py, which takes a command-line argument
(say A, B, or C) and fires a sequence of subprocesses to execute a
series of jobs. This ensures the serialization condition, e.g. A2
starting only after A1 has completed.
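
A minimal sketch of what run_jobs.py could look like, assuming the job
commands for each machine label are simply hard-coded in the script (the
job names below are placeholders):

#!/usr/bin/env python
# run_jobs.py -- run one machine's jobs back to back.
import subprocess
import sys

JOBS = {
    "A": ["job_a1", "job_a2", "job_a3"],
    "B": ["job_b1", "job_b2"],
    "C": ["job_c1"],
}

label = sys.argv[1]                  # "A", "B" or "C"
for command in JOBS[label]:
    # call() blocks, so the next job starts only after this one exits.
    subprocess.call(command, shell=True)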

Now you can write a load-distributor kind of script which uses ssh to
log in to the various machines and run run_jobs.py with the appropriate
argument. (Here I assume all machines have access to run_jobs.py -- say
it resides on a shared, mounted file system.)

e.g. in outer script:

ssh machine-A run_jobs.py A
ssh machine-B run_jobs.py B
ssh machine-C run_jobs.py C
....

You may want to fire all these at once so that they all execute in
parallel.
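
For instance, a rough sketch of that outer script using subprocess
itself (the host names are placeholders):

import subprocess

# Fire one ssh per machine without waiting, so the machines run in parallel.
procs = [
    subprocess.Popen(["ssh", host, "run_jobs.py", label])
    for host, label in [("machine-A", "A"), ("machine-B", "B"), ("machine-C", "C")]
]

# Block until every machine has finished its whole series of jobs.
for proc in procs:
    proc.wait()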

Karthik
 
Carl Banks

You can't. Subprocess is a library to spawn new processes on the local
machine. If you want to handle external machines you need something like
parallel python: <http://www.parallelpython.com/>

Sure he can--you didn't read his post carefully. He wants to run jobs on
other machines using ssh, which is installed on the local machine.
subprocess can call ssh.


Carl Banks
 
Carl Banks

Hi,

I want to send my jobs over a whole bunch of machines (using ssh). The
jobs will need to be run in the following pattern:

(Machine A)   (Machine B)   (Machine C)
Job A1        Job B1        Job C1
Job A2        Job B2        etc
Job A3        etc
etc

Jobs running on machines A, B, and C should run in parallel; however, on
each machine, jobs should run one after another.

How can I do it with the subprocess module?


It's not too hard if the remote jobs exit gracefully, so that ssh knows
when the remote job is done.

If that's the case, the easiest thing to do might be to simply run a
different script for each machine, and use subprocess.call().

For example, in one script, do this:
subprocess.call("ssh A jobA1")
subprocess.call("ssh A jobA2")
subprocess.call("ssh A jobA3")
subprocess.call("ssh A jobA4")

In another, do this:
subprocess.call("ssh B jobB1")
subprocess.call("ssh B jobB2")
subprocess.call("ssh B jobB3")
subprocess.call("ssh B jobB4")

And so on. Then simply run them at the same time.


If you can't do that--and I recommend you do it that way if you can--then
you can either use threads, or detect when the processes complete
asynchronously using os.wait. I'm not sure if threads and subprocesses
work well together on all machines, to be honest; there could be some
signal issues.
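
If you do go the thread route, here's a minimal sketch, with placeholder
host names and job commands; one worker thread per machine runs its jobs
back to back:

import subprocess
import threading

# Placeholder hosts and job commands.
jobs_by_host = {
    "A": ["jobA1", "jobA2", "jobA3"],
    "B": ["jobB1", "jobB2"],
    "C": ["jobC1"],
}

def run_serially(host, jobs):
    # subprocess.call blocks, so jobs on this host run one after another.
    for job in jobs:
        subprocess.call(["ssh", host, job])

threads = [threading.Thread(target=run_serially, args=(host, jobs))
           for host, jobs in jobs_by_host.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()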

Briefly, here's how os.wait would work. You run parallel subprocesses
like this:

pid_A = subprocess.Popen("ssh A jobA1").pid
pid_B = subprocess.Popen("ssh B jobB1").pid
pid_C = subprocess.Popen("ssh C jobC1").pid


Then call os.wait:

pid,status = os.wait()


os.wait should return the process id of the first subprocess to exit.
You can then tell which machine it was by comparing pid to pid_A, pid_B,
or pid_C, and start the next job accordingly. What you would do is call
os.wait() in a loop, spawning the next job for whichever machine has
just finished, until all jobs are done. The bookkeeping necessary to do
all this is left as an exercise.
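
For what it's worth, a rough sketch of that bookkeeping, with placeholder
machine names and job commands (POSIX only, since it relies on os.wait):

import os
import subprocess

# One queue of pending jobs per machine (placeholders).
queues = {
    "A": ["jobA1", "jobA2", "jobA3"],
    "B": ["jobB1", "jobB2"],
    "C": ["jobC1"],
}

running = {}  # pid -> machine name

def spawn_next(machine):
    # Start the machine's next queued job, if any, and remember its pid.
    if queues[machine]:
        job = queues[machine].pop(0)
        proc = subprocess.Popen(["ssh", machine, job])
        running[proc.pid] = machine

# Kick off the first job on every machine in parallel.
for machine in queues:
    spawn_next(machine)

# Each time a child exits, start that machine's next job.
while running:
    pid, status = os.wait()
    machine = running.pop(pid)
    spawn_next(machine)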


Carl Banks
 
Lawrence D'Oliveiro

I want to send my jobs over a whole bunch of machines (using ssh). The
jobs will need to be run in the following pattern:

(Machine A)   (Machine B)   (Machine C)
Job A1        Job B1        Job C1
Job A2        Job B2        etc
Job A3        etc
etc

Jobs running on machines A, B, and C should run in parallel; however, on
each machine, jobs should run one after another.

How can I do it with the subprocess module?

You could do it with SSH. A command like

ssh machine_a run_job_a1.py

will not terminate until the execution of run_job_a1.py on the remote
machine has terminated. So you end up with a lot of "proxy" subprocesses,
if you like, on the master machine, each one waiting for a remote process
on some slave machine to terminate. As the controlling process notices the
termination of each proxy process, it looks to see which slave machine that
maps to, and sends another command to start the next job on that machine.
 
Lawrence D'Oliveiro

It's not too hard if the remote jobs exit gracefully, so that ssh knows
when the remote job is done.

It's hard to see how an _ungraceful_ exit could stop SSH from knowing when
the remote job is done.
 
Carl Banks

It's hard to see how an _ungraceful_ exit could stop SSH from knowing when
the remote job is done.

"Clean exit" might have been a better term. Various things can cause
ssh to exit before the job is done; other things can make the process
hang around after the job is finished. The OP needs to make sure the
job, and ssh, exit reliably enough for the given use before depending
on it. Otherwise, resorting to things like lockfiles and timeouts may
be necessary to keep things sequential.


Carl Banks
 
JD

Thanks very much for all the answers.

JD

 
