managing stdout and stderr neatly while doing parallel processing

Andrei D.

Hello Python newsgroup,

While developing a big ssh wrapper for sending commands to multiple hosts
over the last few months, I discovered (almost accidentally, considering
I'm really just an "amateur hacker" :) how to run processes in parallel
using Python. It is powerful technology to say the least, applicable not
only to my project but to lots of other areas as well.

Anyway, what I wanted to ask was about managing the output of stderr as well
as stdout, using select.select and your common garden os.popen in this case.

This is the script that will define my problem (which is really in another
context altogether, but just to keep the explanation/background short and
sweet for now):

[0] user1/scripts/python> cat parallel7.py
#!/usr/bin/python

import os
import select
import string

def keyboard_interrupt():
    print "<<<<< Keyboard Interrupt ! >>>>>\n"
    os._exit(1)

def getCommand(count):
    return "echo %i: ; ls kjfdjfkd ; ls -l parallel7.py" % (count)

def main():
    readPipes=[]
    for count in range(1,6):
        readPipes.append(os.popen(getCommand(count)))
    while 1:
        try:
            # Could put a timeout here if we had something else to do
            readable,writable,errors=select.select(readPipes,[],[])
            for p in readable:
                print p.read()
                readPipes.remove(p)
                # os.wait() # Don't want zombies
            if len(readPipes)==0:
                break
        except KeyboardInterrupt: print keyboard_interrupt()

if __name__=="__main__":
    main()

So ... the basic problem is that the error message from 'ls kjfdjfkd' does
not come out in the 'right order' ... observe:

[0] user1/scripts/python> ./parallel7.py
ls: kjfdjfkd: No such file or directory
ls: kjfdjfkd: No such file or directory
ls: kjfdjfkd: No such file or directory
1:
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py

2:
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py

ls: kjfdjfkd: No such file or directory
4:
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py

5:
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py

ls: kjfdjfkd: No such file or directory
3:
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py

[0] user1/scripts/python>

In fact stdout in stages 1 to 5 isn't even necessarily thrown out in the
correct order either, but I'll tackle that separately at another time
(unless it's of direct relevance here?). I guess my question is really: how
do you handle the different elements i.e.

readable,writable,errors=select.select(readPipes,[],[])

in order to get an ordered output of errors as well, like you'd obviously
get doing a loop in the shell like so (even though this is of course a
sequential / not parallel operation):

[0] user1/scripts/python> for i in `seq 1 5` ; do echo $i ; ls sdfdskjsdj ;
ls -l parallel7.py ; done
1
ls: sdfdskjsdj: No such file or directory
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py
2
ls: sdfdskjsdj: No such file or directory
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py
3
ls: sdfdskjsdj: No such file or directory
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py
4
ls: sdfdskjsdj: No such file or directory
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py
5
ls: sdfdskjsdj: No such file or directory
-rwxr-xr-x 1 user1 user1 814 Aug 5 22:10 parallel7.py

Any ideas or comments on this or any related issues would be much
appreciated. Perhaps pexpect will help? I suspect it may well do ...

Thanks,

A.
 
Jeff Epler

Well, the problem is that stderr messages are still going to the same
place as before (e.g. the terminal) instead of being redirected by the
popen(). You could use "exec 2>&1;" (if my shell-fu doesn't fail me,
anyway) at the beginning of your popen command to merge them and read
both from the file returned by popen.
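
A minimal sketch of that suggestion, written for Python 3 (where os.popen
still exists) and assuming a POSIX /bin/sh. The helper names
run_stdout_only and run_merged are made up for illustration; they are not
part of the original script.

```python
import os

def run_stdout_only(command):
    # Plain os.popen(): only the child's stdout is connected to the pipe;
    # its stderr still goes wherever the parent's stderr goes.
    with os.popen(command) as pipe:
        return pipe.read()

def run_merged(command):
    # "exec 2>&1;" makes the shell point its stderr at its stdout before
    # running the rest of the command line, so both streams reach the pipe.
    with os.popen("exec 2>&1; " + command) as pipe:
        return pipe.read()

if __name__ == "__main__":
    cmd = "echo 1: ; ls kjfdjfkd"
    print("stdout only:", repr(run_stdout_only(cmd)))
    print("merged:     ", repr(run_merged(cmd)))
```

With the first helper the "No such file" message bypasses Python and lands
on the terminal; with the second it comes back in the returned string.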

However, even when you use the 2>&1 trick, stdio buffering can reverse
the order between stdout and stderr messages:
$ (exec 2>&1; python -c 'import sys; sys.stdout.write("hi there\n"); sys.stderr.write("this is the error, written second\n");') | cat
this is the error, written second
hi there

I'm not sure what tricks to use to fix this problem, and my knowledge of
how Unix works gets iffy here. Getting output to be char- or line-buffered
by stdio in the subprocess would seem to be the ticket, but I don't know how
to do this for programs that don't support it (python has 'python -u' to
do it). If you set things up to run in a pty, that should get you
terminal-like behavior, including the "expected" ordering of messages
from a single process, but here both my Unix and Python knowledge fail
me.
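
For what it's worth, here is a rough Python 3 sketch of the pty idea, using
the standard pty and subprocess modules on a Unix-like system. The function
name run_in_pty is hypothetical, and this is only one way to wire it up: the
child gets the slave side of a pseudo-terminal as both stdout and stderr, so
stdio in the child should behave as it does on a real terminal.

```python
import os
import pty
import shlex
import subprocess
import sys

def run_in_pty(command):
    """Run a shell command with stdout and stderr attached to a pty."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(command, shell=True,
                            stdin=subprocess.DEVNULL,
                            stdout=slave, stderr=slave)
    os.close(slave)  # the parent only needs the master side
    chunks = []
    while True:
        try:
            data = os.read(master, 4096)
        except OSError:
            # On Linux, reading the master raises EIO once the child
            # has exited and the slave side is closed.
            break
        if not data:
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    return b"".join(chunks).decode(errors="replace")

if __name__ == "__main__":
    code = ('import sys; sys.stdout.write("hi there\\n"); '
            'sys.stderr.write("this is the error, written second\\n")')
    print(run_in_pty(shlex.quote(sys.executable) + " -c " + shlex.quote(code)))
```

Note that the terminal layer normally turns "\n" into "\r\n", so the
returned text may need its line endings cleaned up.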

good luck,
Jeff
 
Grant Edwards

> Anyway, what I wanted to ask was about managing the output of stderr as well
> as stdout, using select.select and your common garden os.popen in this case.

Sorry, can't be done. os.popen() returns a pipe that is hooked to stdout.
stderr is still going to where it was before. The stderr stream isn't being
handled by your Python program at all.

You've got several options:

1) use os.popen3(), so that you get separate pipes for stdout and stderr.

2) use os.popen4(), so that you get a pipe with combined stdout+stderr.

3) use a pseudo-terminal (pty) so that your child process is running in a
   more "natural" environment, and you'll get both stderr and stdout that
   way too. Don't know if there's a handy Python "pty" module or not...

Some programs act differently when attached to ttys than they do when
attached to pipes. If this is a problem, 3) is what you'll need to do.
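
A sketch of option 2 in Python 3 terms. os.popen3() and os.popen4() were
Python 2 functions and are gone in Python 3; subprocess.Popen with
stderr=subprocess.STDOUT is the rough equivalent of popen4 (and
stderr=subprocess.PIPE of popen3). The function name run_parallel and the
idea of buffering each command's output until it finishes are assumptions
for illustration, not something taken from the original script.

```python
import os
import select
import subprocess

def run_parallel(commands):
    """Run shell commands concurrently; return merged outputs in order."""
    procs = [subprocess.Popen(cmd, shell=True,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)  # stderr joins stdout
             for cmd in commands]
    # Map each pipe's file descriptor to the index of its command.
    pending = {p.stdout.fileno(): i for i, p in enumerate(procs)}
    buffers = [[] for _ in procs]
    while pending:
        readable, _, _ = select.select(list(pending), [], [])
        for fd in readable:
            chunk = os.read(fd, 4096)
            if chunk:
                buffers[pending[fd]].append(chunk)
            else:
                del pending[fd]  # EOF: this command has closed its output
    for p in procs:
        p.stdout.close()
        p.wait()  # reap the children so no zombies are left behind
    return [b"".join(b).decode(errors="replace") for b in buffers]

if __name__ == "__main__":
    cmds = ["echo %d: ; ls kjfdjfkd ; echo done" % n for n in range(1, 6)]
    for text in run_parallel(cmds):
        print(text)
```

The commands still run in parallel, but because each one's output is
collected separately and printed only at the end, the results come out
grouped per command and in the order the commands were given.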