subprocess and non-blocking IO (again)

Marc Carter

I am trying to rewrite a Perl automation which started a "monitoring"
application on many machines, via RSH, and then multiplexed their
collective outputs to stdout.

In production there are lots of these subprocesses, but here is a
simplified example of what I have so far (python n00b alert!)
- SNIP ---------
import subprocess,select,sys

speakers=[]
lProc=[]

for machine in ['box1','box2','box3']:
    p = subprocess.Popen( ('echo '+machine+';sleep 2;echo goodbye;sleep 2;echo cruel;sleep 2;echo world'), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=None, universal_newlines=True )
    lProc.append( p )
    speakers.append( p.stdout )

while speakers:
    speaking = select.select( speakers, [], [], 1000 )[0]
    for speaker in speaking:
        speech = speaker.readlines()
        if speech:
            for sentence in speech:
                print sentence.rstrip('\n')
                sys.stdout.flush() # sanity check
        else: # EOF
            speakers.remove( speaker )
- SNIP ---------
The problem with the above is that the subprocess buffers all its output
when used like this and, hence, this automation is not informing me of
much :)

In Perl, "realtime" feedback was provided by setting the following:
$p->stdout->blocking(0);

How do I achieve this in Python ?

This topic seems to have come up more than once. I am hoping that
things have moved on from posts like this:
http://groups.google.com/group/comp...b471009ab2?q=blocking&rnum=4#434fa9b471009ab2
as I don't really want to have to write all that ugly
fork/dup/fcntl/exec code to achieve this when high-level libraries like
"subprocess" really should have corresponding methods.

If it makes anything simpler, I only *need* this on Linux/Unix (Windows
would be a nice extra though).

thanks for reading,
Marc
 
Donn Cave

Marc Carter said:
The problem with the above is that the subprocess buffers all its output
when used like this and, hence, this automation is not informing me of
much :)

You're using C stdio, through the Python fileobject. This is
sort of subprocess' fault, for returning a fileobject in the
first place, but in any case you can expect your input to be
buffered. You're asking for it, because that's what C stdio does.
When you call readlines(), you're further guaranteeing that you
won't go on to the next statement until the fork dies and its
pipe closes, because that's what readlines() does -- returns
_all_ lines of output.

If you want to use select(), don't use the fileobject
functions. Use os.read() to read data from the pipe's file
descriptor (p.stdout.fileno()). This is how you avoid the
buffering.
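
A minimal sketch of the select() plus os.read() approach described above. It is written for Python 3 rather than the Python 2 of the original post, and the function name `multiplex` and the command strings are illustrative, not from the thread. It assumes a Unix-like system where select() works on pipes.

```python
import os
import select
import subprocess


def multiplex(commands):
    """Run shell commands concurrently; yield (name, chunk) as output arrives."""
    # Map each pipe's file descriptor to (name, Popen) so output can be labelled.
    procs = {}
    for name, cmd in commands.items():
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,
                             stdin=subprocess.DEVNULL)
        procs[p.stdout.fileno()] = (name, p)

    while procs:
        # Block until at least one pipe has data or has reached EOF.
        ready, _, _ = select.select(list(procs), [], [])
        for fd in ready:
            name, p = procs[fd]
            # os.read returns whatever is available (up to 4096 bytes)
            # without waiting for more; an empty result means EOF.
            chunk = os.read(fd, 4096)
            if chunk:
                yield name, chunk
            else:
                del procs[fd]
                p.stdout.close()
                p.wait()


if __name__ == "__main__":
    cmds = {box: "echo %s; sleep 1; echo goodbye" % box
            for box in ("box1", "box2", "box3")}
    for name, chunk in multiplex(cmds):
        print(name, chunk.decode(errors="replace"), end="")
```

Because the reads bypass the file object, a chunk may contain a partial line or several lines; callers that need whole lines have to split and reassemble them.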

Marc Carter said:
This topic seems to have come up more than once. I am hoping that
things have moved on from posts like this:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/5472ce95eb430002/434fa9b471009ab2?q=blocking&rnum=4#434fa9b471009ab2
as I don't really want to have to write all that ugly
fork/dup/fcntl/exec code to achieve this when high-level libraries like
"subprocess" really should have corresponding methods.

subprocess doesn't have pty functionality. It's hard to say
for sure who said what in that page, after the incredible mess
Google has made of their USENET archives, but I believe that's
why you see dup2 there - the author is using a pty library,
evidently pexpect. As far as I know, things have not moved on
in this respect, not sure what kind of movement you expected
to see in the intervening month. I don't think you need ptys,
though, so I wouldn't worry about it.

Donn Cave, (e-mail address removed)
 
Nick Craig-Wood

Marc Carter said:
The problem with the above is that the subprocess buffers all its output
when used like this and, hence, this automation is not informing me of
much :)

The problem with the above is that you are calling speaker.readlines()
which waits for all the output.

If you replace that with speaker.readline() or speaker.read(1) you'll
see that subprocess hasn't given you a buffered pipe after all!

In fact you'll get partial reads of each line - you'll have to wait
for a newline before processing the result, eg

import subprocess,select,sys

speakers=[]
lProc=[]

for machine in ['box1','box2','box3']:
    p = subprocess.Popen( ('echo '+machine+';sleep 2;echo goodbye;sleep 2;echo cruel;sleep 2;echo world'), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=None, universal_newlines=True, shell=True)
    lProc.append( p )
    speakers.append( p.stdout )

while speakers:
    speaking = select.select( speakers, [], [], 1000 )[0]
    for speaker in speaking:
        speech = speaker.readline()
        if speech:
            for sentence in speech:
                print sentence.rstrip('\n')
                sys.stdout.flush() # sanity check
        else: # EOF
            speakers.remove( speaker )

gives

b
o
x
1

b
o
x
3

b
o
x
2

pause...

g
o
o
d
b
y
e

etc...

I'm not sure why readline only returns 1 character - the pipe returned
by subprocess really does seem to be only 1 character deep which seems
a little inefficient! Changing the bufsize argument to the Popen call
doesn't seem to affect it.
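
One possible explanation for the one-character-per-line output above: readline() returns a single string, and the loop `for sentence in speech` then iterates over that string character by character, so each character gets its own print. The sketch below (Python 3 syntax; `split_lines` is a hypothetical helper, not something from the thread) shows that effect, and one way to hold back partial reads until their newline arrives.

```python
line = "box1\n"  # what a single readline() call would return

# Iterating over a str yields its characters, not lines.
chars = [c.rstrip("\n") for c in line]
print(chars)


def split_lines(buffer, chunk):
    """Append chunk to buffer; return (complete_lines, leftover)."""
    buffer += chunk
    # Everything before the last newline is complete; the tail is not yet.
    *lines, leftover = buffer.split("\n")
    return lines, leftover


# Partial reads are held back until their newline shows up.
lines, rest = split_lines("", "good")
print(lines, repr(rest))
lines, rest = split_lines(rest, "bye\ncru")
print(lines, repr(rest))
```

Printing `speech` directly, instead of looping over it, would be the smallest change to the code above if whole lines are wanted.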
 
Marc Carter

Donn said:
If you want to use select(), don't use the fileobject
functions. Use os.read() to read data from the pipe's file
descriptor (p.stdout.fileno().) This is how you avoid the
buffering.
Thank you, this works perfectly. I figured it would be something simple.

Marc
 
Thomas Bellman

Marc Carter said:
The problem with the above is that the subprocess buffers all its output
when used like this and, hence, this automation is not informing me of
much :)

You may want to take a look at my asyncproc module. With it, you
can start subprocesses and let them run in the background without
blocking either the subprocess or your own process, while still
collecting their output.

You can download it from

http://www.lysator.liu.se/~bellman/download/asyncproc.py

I suspect that it doesn't work under MS Windows, but I don't use
that OS, and thus can't test it.
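
The thread does not show asyncproc's interface, so the following is not its API. It is only a rough sketch of the general idea of collecting subprocess output in the background, here using one reader thread per process and a shared queue from the Python 3 standard library. The names `start_collector` and `collect` are made up for illustration.

```python
import queue
import subprocess
import threading


def start_collector(name, cmd, out_queue):
    """Start cmd in a shell and pump its output lines into out_queue."""
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)

    def pump():
        # A blocking read is fine here: it only blocks this helper thread.
        for line in proc.stdout:
            out_queue.put((name, line.rstrip("\n")))
        proc.wait()
        out_queue.put((name, None))  # None marks the end of this process's output

    threading.Thread(target=pump, daemon=True).start()
    return proc


def collect(commands):
    """Yield (name, line) pairs from all commands as lines arrive."""
    q = queue.Queue()
    for name, cmd in commands.items():
        start_collector(name, cmd, q)
    remaining = len(commands)
    while remaining:
        name, line = q.get()
        if line is None:
            remaining -= 1
        else:
            yield name, line
```

The main thread only ever waits on the queue, so a slow or silent subprocess does not hold up output from the others.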
 
