Hi,
I want to send my jobs to a whole bunch of machines (over ssh). The jobs
need to run in the following pattern:
(Machine A)   (Machine B)   (Machine C)
 Job A1        Job B1        Job C1
 Job A2        Job B2        etc
 Job A3        etc
 etc
Jobs running on machines A, B, and C should run in parallel; however, on
each machine the jobs should run one after another.
How can I do this with the subprocess module?
It's not too hard if the remote jobs exit gracefully, so that ssh knows
when the remote job is done.
If that's the case, the easiest thing to do might be to simply run a
different script for each machine, and use subprocess.call().
For example, in one script, do this (note that the command is passed as
a list of arguments; a single string would need shell=True):
subprocess.call(["ssh", "A", "jobA1"])
subprocess.call(["ssh", "A", "jobA2"])
subprocess.call(["ssh", "A", "jobA3"])
subprocess.call(["ssh", "A", "jobA4"])
In another, do this:
subprocess.call(["ssh", "B", "jobB1"])
subprocess.call(["ssh", "B", "jobB2"])
subprocess.call(["ssh", "B", "jobB3"])
subprocess.call(["ssh", "B", "jobB4"])
And so on. Then simply run them at the same time.
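One way to "run them at the same time" from a single launcher is to start
each per-machine script with subprocess.Popen and then wait for all of
them. A minimal, runnable sketch; the "scripts" here are trivial inline
Python commands standing in for your real per-machine scripts:

```python
import subprocess
import sys

# Stand-ins for the per-machine scripts; in practice each entry would be
# something like ["python", "run_machine_A.py"] (hypothetical names).
scripts = [
    [sys.executable, "-c", "print('machine A jobs done')"],
    [sys.executable, "-c", "print('machine B jobs done')"],
    [sys.executable, "-c", "print('machine C jobs done')"],
]

# Start all scripts at once; each script runs its own jobs sequentially.
procs = [subprocess.Popen(cmd) for cmd in scripts]

# Block until every machine's script has finished.
exit_codes = [p.wait() for p in procs]
```
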
If you can't do that--and I recommend you do it that way if you can--then
you can either use threads, or detect when the processes complete
asynchronously using os.wait. To be honest, I'm not sure whether threads
and subprocesses work well together on all platforms; there could be some
signal-handling issues.
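If threads do work for you, the idea is one thread per machine, each
thread running its machine's jobs one after another. A sketch, with the
ssh commands stood in by trivial local commands so it is runnable as-is:

```python
import subprocess
import sys
import threading

results = []

def run_machine(name, jobs):
    # Each worker thread runs its machine's jobs sequentially.
    for cmd in jobs:
        subprocess.call(cmd)
    results.append(name)  # list.append is atomic under the GIL

# Hypothetical job lists; the real commands would be
# ["ssh", "A", "jobA1"] and so on.
jobs_per_machine = {
    "A": [[sys.executable, "-c", "pass"], [sys.executable, "-c", "pass"]],
    "B": [[sys.executable, "-c", "pass"]],
}

threads = [threading.Thread(target=run_machine, args=(name, jobs))
           for name, jobs in jobs_per_machine.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()  # all machines finished once every thread has joined
```
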
Briefly, here's how os.wait would work. You run parallel subprocesses
like this:
pid_A = subprocess.Popen(["ssh", "A", "jobA1"]).pid
pid_B = subprocess.Popen(["ssh", "B", "jobB1"]).pid
pid_C = subprocess.Popen(["ssh", "C", "jobC1"]).pid
Then call os.wait:
pid, status = os.wait()
os.wait returns the process id and exit status of the first subprocess
to exit.
You can then tell which machine it was by comparing pid to pid_A, pid_B,
or pid_C, and start the next job accordingly. What you would do is to
call os.wait() in a loop, waiting and spawning the next job for a
particular machine, until all jobs are done. The bookkeeping necessary
to do all this is left as an exercise.
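One possible shape of that bookkeeping loop, with the ssh calls stood in
by trivial local commands so the sketch is runnable (note that os.wait is
POSIX-only):

```python
import os
import subprocess
import sys

# Stand-in job queues; in practice each command would be an ssh call
# such as ["ssh", "A", "jobA2"].
queues = {
    "A": [[sys.executable, "-c", "pass"], [sys.executable, "-c", "pass"]],
    "B": [[sys.executable, "-c", "pass"]],
}

# Start the first job on each machine, mapping each pid to its machine.
running = {}
for machine in queues:
    cmd = queues[machine].pop(0)
    running[subprocess.Popen(cmd).pid] = machine

finished = 0
while running:
    pid, status = os.wait()          # reap the first child to exit
    machine = running.pop(pid)
    finished += 1
    if queues[machine]:              # start that machine's next job
        cmd = queues[machine].pop(0)
        running[subprocess.Popen(cmd).pid] = machine
```
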
Carl Banks