Line-by-line processing when stdin is not a tty

RG

When stdin is not a tty, Python seems to buffer all the input through
EOF before processing any of it:

[ron@mickey:~]$ cat | python
print 123
print 456 <hit ctrl-D here>
123
456

Is there a way to get Python to process input line-by-line the way it
does when stdin is a TTY even when stdin is not a TTY?

Thanks,
rg
 
Tim Harig

> When stdin is not a tty, Python seems to buffer all the input through
> EOF before processing any of it:
>
> [ron@mickey:~]$ cat | python
> print 123
> print 456 <hit ctrl-D here>
> 123
> 456
>
> Is there a way to get Python to process input line-by-line the way it
> does when stdin is a TTY even when stdin is not a TTY?

It would be much better to know the overall purpose of what you are trying
to achieve. There may be better ways (e.g., sockets) depending on what you
are trying to do. Knowing your target platform would also be helpful.

For the Python interpreter itself, you can get interactive behavior by
invoking it with the -i option.

If you want to handle stdin a single line at a time from inside of your
program, you can access it using sys.stdin.readline().
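The readline() approach can be sketched in Python 3 as a loop that hands each line to a handler as soon as readline() returns it, instead of waiting for EOF. The names process_lines and collect are hypothetical, chosen only for illustration.

```python
import io


def process_lines(stream, handle):
    """Call handle(line) for each line as soon as it is read; stop at EOF."""
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns "" (EOF), so each line is handled as soon as it arrives.
    for line in iter(stream.readline, ""):
        handle(line.rstrip("\n"))


def collect(text):
    """Run process_lines over an in-memory stream and return what it saw."""
    seen = []
    process_lines(io.StringIO(text), seen.append)
    return seen
```

In a real filter you would pass sys.stdin as the stream and a handler that writes its result and flushes.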
 
Cameron Simpson

| When stdin is not a tty, Python seems to buffer all the input through
| EOF before processing any of it:
|
| [ron@mickey:~]$ cat | python
| print 123
| print 456 <hit ctrl-D here>
| 123
| 456
|
| Is there a way to get Python to process input line-by-line the way it
| does when stdin is a TTY even when stdin is not a TTY?

What you're seeing here is not python's behaviour but cat's behaviour.

Almost all programs do line buffering (flush the buffer at each newline)
when the file is a terminal (character device) and block buffering (flush
when a fixed-size buffer, typically 8192 bytes or some larger power of 2,
fills up) when the file is not a terminal. This is the default behaviour
of the stdio package.
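The convention boils down to an isatty() check on the stream. A small Python 3 sketch (helper names hypothetical) that only reports which mode would apply; Python's own open() follows the same rule and turns on line_buffering automatically only for interactive streams.

```python
import os


def stdio_default_mode(stream):
    """Mirror the stdio convention: line buffering on a terminal, block otherwise."""
    return "line" if stream.isatty() else "block"


def pipe_write_end_report():
    """Open the write end of a fresh pipe and report how it would be buffered."""
    read_fd, write_fd = os.pipe()
    try:
        with os.fdopen(write_fd, "w", closefd=False) as f:
            # A pipe is not a tty, so open() leaves line_buffering off.
            return stdio_default_mode(f), f.line_buffering
    finally:
        os.close(read_fd)
        os.close(write_fd)
```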

So "cat" is simply not feeding any data to python until it has a lot of
it; there is nothing python can do about that. We would need to know
more about your specific task to suggest workarounds.

Usually you either need an option on the upstream program to tell it to
line buffer explicitly, or you need to play silly games with pseudo
terminals to convince the upstream program it is attached to a terminal.
The latter is both ugly and generally inadvisable, because many programs
that change their buffering when attached to a terminal also change other
behaviour, such as issuing interactive prompts.
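For the pseudo-terminal route, Python's standard pty module can make a child believe its stdout is a terminal. This is a POSIX-only sketch. run_under_pty and child_sees_tty are hypothetical names, and the child is just another Python interpreter reporting isatty(). As noted above, a real program may change more than its buffering when it sees a tty.

```python
import os
import pty
import subprocess
import sys


def run_under_pty(argv):
    """Run argv with its stdout on a pseudo-terminal and return what it printed."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdout=slave_fd)
    os.close(slave_fd)  # the child holds its own copy of the slave end
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:  # Linux reports EIO once the slave side is closed
            break
        if not data:     # other systems may report plain EOF instead
            break
        chunks.append(data)
    os.close(master_fd)
    proc.wait()
    return b"".join(chunks).decode()


def child_sees_tty():
    """Ask a child Python whether its stdout looks like a terminal."""
    out = run_under_pty([sys.executable, "-c", "import sys; print(sys.stdout.isatty())"])
    return out.strip()
```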

Cheers,
 
Tim Harig

> once cat had an option -u doing exactly that but nowadays
> -u seems to be ignored
>
> http://www.opengroup.org/onlinepubs/009695399/utilities/cat.html

I have to wonder why cat knows or cares. Since we are referring to
a single directional pipe, there is no fear of creating any kind of
race condition. In general, I would expect that the shell opens the
pipe (pipe()), fork()s, closes its own 0 or 1 descriptor as appropriate
for each child, copies (dup()) one of the file descriptors to the
appropriate file descriptor for the child process, and exec()s to call
the new process. Neither process, in general, needs to know anything
other than how to write to and read from its given descriptor.
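The pipe()/fork()/dup()/exec() sequence described above can be sketched with Python 3's os module. This is POSIX-only, and run_pipeline is a hypothetical name. It returns the reader's exit status so the wiring can be checked.

```python
import os
import sys


def run_pipeline(writer_argv, reader_argv):
    """Run `writer | reader` roughly the way a shell does; return the reader's exit status."""
    read_fd, write_fd = os.pipe()              # pipe()
    sys.stdout.flush()                         # don't duplicate pending output in the children
    pids = []
    for argv, pipe_end, std_fd in ((writer_argv, write_fd, 1), (reader_argv, read_fd, 0)):
        pid = os.fork()                        # fork()
        if pid == 0:
            try:
                os.dup2(pipe_end, std_fd)      # dup() the pipe end onto stdout or stdin
                os.close(read_fd)              # drop the original descriptors
                os.close(write_fd)
                os.execvp(argv[0], argv)       # exec() the new program
            finally:
                os._exit(127)                  # only reached if exec() failed
        pids.append(pid)
    os.close(read_fd)                          # the parent keeps neither end, so the
    os.close(write_fd)                         # reader sees EOF when the writer exits
    statuses = [os.waitpid(pid, 0)[1] for pid in pids]
    return os.WEXITSTATUS(statuses[-1])
```

Neither child learns anything beyond its inherited descriptors. Any buffering decision is made inside each program.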
 
Cameron Simpson

| > On Mittwoch 11 August 2010, Cameron Simpson wrote:
| >> Usually you either
| >> need an option on the upstream program to tell it to line
| >> buffer explicitly
| >
| > once cat had an option -u doing exactly that but nowadays
| > -u seems to be ignored
| >
| > http://www.opengroup.org/onlinepubs/009695399/utilities/cat.html
|
| I have to wonder why cat knows or cares. Since we are referring to
| a single directional pipe, there is no fear of creating any kind of
| race condition. In general, I would expect that the shell opens the
| pipe (pipe()), fork()s, closes its own 0 or 1 descriptor as appropriate
| for each child, copies (dup()) one of the file descriptors to the
| appropriate file descriptor for the child process, and exec()s to call
| the new process. Neither process, in general, needs to know anything
| other than how to write to and read from its given descriptor.

The buffering is a performance choice. Every write requires a context
switch from userspace to kernel space, and availability of data in the
pipe will wake up a downstream process blocked trying to read.

It is far more efficient to do as few such copies as possible, so where
interaction (as you point out) is one way it's usually better to write
data in larger chunks. But when writing to a terminal, ostensibly for a
human to read, line buffering is generally better (for exactly the issue
the OP tripped over - humans expect stuff to happen as it occurs).
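The cost argument can be made concrete by counting how many write calls reach a pipe-like sink with and without an 8 KiB buffer in front of it. A Python 3 sketch: CountingSink and count_writes are hypothetical stand-ins, and each counted call represents one write system call.

```python
import io


class CountingSink(io.RawIOBase):
    """Stand-in for a pipe: counts how many write() calls reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1  # in the analogy, one call here is one write(2)
        return len(b)


def count_writes(lines, buffered):
    """Send lines to a sink, optionally through an 8 KiB buffer; return the raw write count."""
    sink = CountingSink()
    out = io.BufferedWriter(sink, buffer_size=8192) if buffered else sink
    for line in lines:
        out.write(line)
    out.flush()
    return sink.calls
```

With 1000 ten-byte lines, the unbuffered path makes 1000 calls while the buffered path makes only a handful.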
 
Tim Harig

| > On Mittwoch 11 August 2010, Cameron Simpson wrote:
| >> Usually you either
| >> need an option on the upstream program to tell it to line
| >> buffer explicitly
| >
| > once cat had an option -u doing exactly that but nowadays
| > -u seems to be ignored
| >
| > http://www.opengroup.org/onlinepubs/009695399/utilities/cat.html
|
| I have to wonder why cat knows or cares. Since we are referring to
| a single directional pipe, there is no fear of creating any kind of
| race condition. In general, I would expect that the shell opens the
| pipe (pipe()), fork()s, closes its own 0 or 1 descriptor as appropriate
| for each child, copies (dup()) one of the file descriptors to the
| appropriate file descriptor for the child process, and exec()s to call
| the new process. Neither process, in general, needs to know anything
| other than how to write to and read from its given descriptor.

> The buffering is a performance choice. Every write requires a context
> switch from userspace to kernel space, and availability of data in the
> pipe will wake up a downstream process blocked trying to read.
>
> It is far more efficient to do as few such copies as possible, so where
> interaction (as you point out) is one way it's usually better to write
> data in larger chunks. But when writing to a terminal, ostensibly for a
> human to read, line buffering is generally better (for exactly the issue
> the OP tripped over - humans expect stuff to happen as it occurs).

Right, I don't question the optimization. I question whether the
intelligence that performs that optimization should be placed within cat
or within the shell. It seems to me that the shell has a better idea of
how the command is being used and can therefore make a better decision
about whether or not buffering is appropriate.
 
Grant Edwards

>> When stdin is not a tty, Python seems to buffer all the input through
>> EOF before processing any of it:
>>
>> [ron@mickey:~]$ cat | python
>> print 123
>> print 456 <hit ctrl-D here>
>> 123
>> 456
>>
>> Is there a way to get Python to process input line-by-line the way it
>> does when stdin is a TTY even when stdin is not a TTY?
>
> It would be much better to know the overall purpose of what you are trying
> to achieve. There may be better ways (e.g., sockets) depending on what you
> are trying to do. Knowing your target platform would also be helpful.
>
> For the Python interpreter itself, you can get interactive behavior by
> invoking it with the -i option.

If you're talking about unbuffered stdin/stdout, the option is -u.

I don't really see how the -i option is relevant -- it causes the
interpreter to go into interactive mode after running the script.

> If you want to handle stdin a single line at a time from inside of your
> program, you can access it using sys.stdin.readline().

That doesn't have any effect on stdin buffering.
 
Peter Otten

Grant said:
>>> When stdin is not a tty, Python seems to buffer all the input through
>>> EOF before processing any of it:
>>>
>>> [ron@mickey:~]$ cat | python
>>> print 123
>>> print 456 <hit ctrl-D here>
>>> 123
>>> 456
>>>
>>> Is there a way to get Python to process input line-by-line the way it
>>> does when stdin is a TTY even when stdin is not a TTY?
>>
>> It would be much better to know the overall purpose of what you are trying
>> to achieve. There may be better ways (e.g., sockets) depending on what you
>> are trying to do. Knowing your target platform would also be helpful.
>>
>> For the Python interpreter itself, you can get interactive behavior by
>> invoking it with the -i option.
>
> If you're talking about unbuffered stdin/stdout, the option is -u.
>
> I don't really see how the -i option is relevant -- it causes the
> interpreter to go into interactive mode after running the script.

I'd say the following looks like what the OP was asking for:

$ cat | python -i -c'import sys; sys.ps1=""'
print sys.stdin.isatty()
False
print 1
1
print 2
2

(Whether it's useful is yet another question)

> That doesn't have any effect on stdin buffering.

"for line in stream"-style file iteration uses an internal buffer that is
not affected by the -u option; stream.readline() doesn't use this
optimization.
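Peter's distinction concerns Python 2 file iteration. The property that matters, readline() returning as soon as a complete line is available, can be checked in Python 3 with a real pipe whose write end is still open. If readline() waited for EOF, the call below would hang instead of returning. first_line_before_eof is a hypothetical name.

```python
import os


def first_line_before_eof():
    """Read one line from a pipe whose writer is still open (no EOF yet)."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"print 123\n")  # the writer stays open
    try:
        with os.fdopen(read_fd, "r") as reader:
            # readline() returns once a complete line is available;
            # it does not wait for the writer to close the pipe.
            return reader.readline()
    finally:
        os.close(write_fd)
```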

Peter
 
Grant Edwards

Grant Edwards wrote:

> "for line in stream"-style file iteration uses an internal buffer that is
> not affected by the -u option; stream.readline() doesn't use this
> optimization.

You're right. Why didn't I know that?

Using "for line in sys.stdin" does its own buffering.

In my tests, using sys.stdin.readline() worked as the OP desired either
with or without -u, either with or without cat. IOW, "cat" isn't
buffering output on my system (or if it is, it's line-buffering).
 
RG

Cameron Simpson said:
| When stdin is not a tty, Python seems to buffer all the input through
| EOF before processing any of it:
|
| [ron@mickey:~]$ cat | python
| print 123
| print 456 <hit ctrl-D here>
| 123
| 456
|
| Is there a way to get Python to process input line-by-line the way it
| does when stdin is a TTY even when stdin is not a TTY?

> What you're seeing here is not python's behaviour but cat's behaviour.
>
> Almost all programs do line buffering (flush the buffer at each newline)
> when the file is a terminal (character device) and block buffering (flush
> when a fixed-size buffer, typically 8192 bytes or some larger power of 2,
> fills up) when the file is not a terminal. This is the default behaviour
> of the stdio package.
>
> So "cat" is simply not feeding any data to python until it has a lot of it;

I don't think that's right:

[ron@mickey:~]$ cat | cat
123
123
321
321

Cat seems to flush its buffer after every newline. Also:

[ron@mickey:~]$ cat -u | python
print 123
print 456
123
456

> We would need to know more about your specific task to suggest workarounds.

I'm writing a system in a different language but want to use a Python
library. I know of lots of ways to do this (embed a Python interpreter,
fire up a python server) but by far the easiest to implement is to have
the main program spawn a Python interpreter and interact with it through
its stdin/stdout. In my code I explicitly force the output stream that
is being piped to Python's stdin to be flushed so I know it's not a
buffering problem on the input side.
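The setup described here, a parent program driving a Python child through its stdin/stdout, can be sketched in Python 3 for both ends. The child reads with readline() and runs with -u, so neither side waits for EOF or for a full buffer. CHILD, start_child and ask are hypothetical names, and the one-expression-in, one-repr-out protocol is an assumption for illustration. eval() on untrusted input is unsafe.

```python
import subprocess
import sys

# Hypothetical child program: evaluates one expression per line and prints
# its repr. (eval() on untrusted input is unsafe; this is only a sketch.)
CHILD = r"""
import sys
for line in iter(sys.stdin.readline, ''):
    print(repr(eval(line)))
"""


def start_child():
    # -u makes the child's stdout unbuffered, so each reply reaches the pipe
    # immediately instead of waiting for a block-sized buffer to fill.
    return subprocess.Popen([sys.executable, "-u", "-c", CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)


def ask(proc, expr):
    """Send one request line and block until the matching reply line arrives."""
    proc.stdin.write(expr + "\n")
    proc.stdin.flush()  # the parent must flush its side too
    return proc.stdout.readline().rstrip("\n")
```

Closing the child's stdin ends its readline() loop, and the child then exits normally.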

rg
 
RG

Peter Otten said:
> Grant said:
>>>> When stdin is not a tty, Python seems to buffer all the input through
>>>> EOF before processing any of it:
>>>>
>>>> [ron@mickey:~]$ cat | python
>>>> print 123
>>>> print 456 <hit ctrl-D here>
>>>> 123
>>>> 456
>>>>
>>>> Is there a way to get Python to process input line-by-line the way it
>>>> does when stdin is a TTY even when stdin is not a TTY?
>>>
>>> It would be much better to know the overall purpose of what you are trying
>>> to achieve. There may be better ways (e.g., sockets) depending on what you
>>> are trying to do. Knowing your target platform would also be helpful.
>>>
>>> For the Python interpreter itself, you can get interactive behavior by
>>> invoking it with the -i option.
>>
>> If you're talking about unbuffered stdin/stdout, the option is -u.
>>
>> I don't really see how the -i option is relevant -- it causes the
>> interpreter to go into interactive mode after running the script.
>
> I'd say the following looks like what the OP was asking for:
>
> $ cat | python -i -c'import sys; sys.ps1=""'
> print sys.stdin.isatty()
> False
> print 1
> 1
> print 2
> 2

That is indeed the behavior I'm looking for.

> (Whether it's useful is yet another question)

It's useful to me :) I'm trying to access a python library from a
program written in another language for which an equivalent library is
not available. The easiest way to do that is to spawn a Python
interpreter and interact with it through stdin/stdout.

Thanks!

rg
 
Tim Harig

> I'm writing a system in a different language but want to use a Python
> library. I know of lots of ways to do this (embed a Python interpreter,
> fire up a python server) but by far the easiest to implement is to have
> the main program spawn a Python interpreter and interact with it through
> its stdin/stdout.

Or, open python using a socket. That way you have total control over how
the information is transferred, as well as bi-directional transfer.
 
RG

Tim Harig said:
> Or, open python using a socket.

You mean a TCP/IP socket? Or a unix domain socket? The former has
security issues, and the latter seems like a lot of work. Or is there
an easy way to do it that I don't know about?

rg
 
Tim Harig

> You mean a TCP/IP socket? Or a unix domain socket? The former has
> security issues, and the latter seems like a lot of work. Or is there
> an easy way to do it that I don't know about?

I was referring to unix domain sockets, or more specifically stream
pipes. I guess it depends on what language you are using and what
libraries you have access to. Under C, working with stream pipes is no
harder than using pipe(). You can simply create the socket descriptors
using socketpair(). Keep one of the descriptors for your process and pass
the other to the python child process as both stdin and stdout.
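The socketpair() arrangement looks much the same from Python 3: one end stays with the parent, and the other becomes both stdin and stdout of the child. A POSIX sketch. CHILD and round_trip are hypothetical, and the upper-casing child stands in for whatever service the real child provides.

```python
import socket
import subprocess
import sys

# Hypothetical child program: upper-cases each line it receives.
CHILD = r"""
import sys
for line in iter(sys.stdin.readline, ''):
    print(line.strip().upper())
"""


def round_trip(lines):
    """Talk to a child Python over one socketpair() used as its stdin and stdout."""
    parent_end, child_end = socket.socketpair()  # AF_UNIX stream pair on POSIX
    proc = subprocess.Popen([sys.executable, "-u", "-c", CHILD],
                            stdin=child_end.fileno(), stdout=child_end.fileno())
    child_end.close()  # the child has its own duplicates of this descriptor
    replies = []
    with parent_end, parent_end.makefile("rw", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
            f.flush()
            replies.append(f.readline().rstrip("\n"))
        parent_end.shutdown(socket.SHUT_WR)  # EOF ends the child's readline loop
    proc.wait()
    return replies
```

The child still runs with -u and reads with readline(). A socket on stdin is not a tty either, so the same buffering rules apply as with a pipe.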
 
RG

Tim Harig said:
> I was referring to unix domain sockets, or more specifically stream
> pipes. I guess it depends on what language you are using and what
> libraries you have access to. Under C, working with stream pipes is no
> harder than using pipe(). You can simply create the socket descriptors
> using socketpair(). Keep one of the descriptors for your process and pass
> the other to the python child process as both stdin and stdout.

Ah. That is in fact exactly what I am doing, and that is how I first
encountered this problem.

rg
 
RG

RG said:
> Ah. That is in fact exactly what I am doing, and that is how I first
> encountered this problem.
>
> rg

And now I have encountered another problem:

-> print sys.stdin.encoding
<- None

But when I run from a terminal:

[ron@mickey:~]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin.encoding
'UTF-8'


I thought the value of sys.stdin.encoding was hard-coded into the Python
executable at compile time, but that's obviously wrong. So how does
Python get the value of sys.stdin.encoding?
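Python 2 sets the encoding of stdin and stdout from the locale only when they are terminals, which is why a pipe reports None. Since Python 2.6, the PYTHONIOENCODING environment variable overrides that choice. The sketch below is written for Python 3, which derives a default from the locale even for pipes. It checks the override in a child process whose stdin is a pipe. child_stdin_encoding and normalized are hypothetical helper names.

```python
import codecs
import os
import subprocess
import sys


def child_stdin_encoding(extra_env):
    """Return sys.stdin.encoding as reported by a child Python whose stdin is a pipe."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        stdin=subprocess.PIPE, capture_output=True, text=True, env=env, check=True)
    return result.stdout.strip()


def normalized(name):
    """Canonical codec name, so spellings like 'UTF-8' and 'utf8' compare equal."""
    return codecs.lookup(name).name
```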

rg
 
Cameron Simpson

| > The buffering is a performance choice. Every write requires a context
| > switch from userspace to kernel space, and availability of data in the
| > pipe will wake up a downstream process blocked trying to read.
| > It is far more efficient to do as few such copies as possible, [...]
|
| Right, I don't question the optimization. I question whether the
| intelligence that performs that optimization should be placed within cat or
| whether it should be placed within the shell. It seems to me that the
| shell has a better idea of how the command is being used and can therefore
| make a better decision about whether or not buffering is appropriate.

I would argue it's not much better placed, though it would be nice if
the control could be issued from there. But it can't.

Regarding the former, in this pipeline:

cat some files... | python filter program | something else

how shall the shell know if the python filter (to take the OP's case)
wants its input line buffered (rare) or block buffered (usually ok)?

What might be useful would be a way to attach an attribute to a pipe
or other file descriptor indicating the desired buffering behaviour
that writers to the file descriptor should adopt.

Of course, the ugly sides to that are how many buffering regimes should
it be possible to express and how and when should the upstream (writing)
program decide to check? In a pipeline the pipes are made _before_ any
of the programs commence because the programs need to be attached to the
pipes (this is done before the programs themselves are dispatched). So,
_after_ dispatch the python-wanting-line-buffering issues an ioctl on
the pipe saying "I want line buffering". However, the upstream program
may well already have commenced operation before that happens. It may
even have run to completion before that happens! So, shall all upstream
programs be required to poll? How often? On every write? Shall they
receive a signal? What if they don't catch it? If the downstream
program _requires_ line buffering then the whole situation is racey
and unreliable.

You can see that on reflection this isn't easy to resolve cleanly from
_outside_ the writing program.

To do it from inside requires all programs to sprout an option like
GNU cat's -u option.
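From inside a program, the equivalent of cat's -u is a switch that turns on line buffering. A Python 3.7+ sketch: the -u flag belongs to this hypothetical filter (it is not Python's own -u), and configure_output and demo are illustrative names. The flag is also honoured automatically when the stream is a tty.

```python
import argparse
import io
import sys


def configure_output(argv, stream=None):
    """Hypothetical -u switch: line-buffer the stream if asked, or if it is a tty."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-u", action="store_true",
                        help="flush output after every line")
    args = parser.parse_args(argv)
    stream = sys.stdout if stream is None else stream
    if args.u or stream.isatty():
        stream.reconfigure(line_buffering=True)  # TextIOWrapper method, Python 3.7+
    return stream


def demo(argv):
    """Write one line to an in-memory stream and report what reached the bytes layer."""
    stream = configure_output(argv, io.TextIOWrapper(io.BytesIO(), encoding="utf-8"))
    stream.write("123\n")
    return stream.buffer.getvalue()
```

Without -u the line is still sitting in the text layer's buffer; with -u it has already been passed down.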

Cheers,
--
Cameron Simpson DoD#743
http://www.cskk.ezoshosting.com/cs/

What progress we are making. In the Middle Ages they would have burned
me. Now they are content with burning my books. - Sigmund Freud
 
Nobody

> I have to wonder why cat knows or cares.

The issue relates to the standard C library. By convention[1], stdin and
stdout are line-buffered if the descriptor refers to a tty, and are
block-buffered otherwise. stderr is always unbuffered.

Any program which uses stdin and stdout without explicitly setting the
buffering or using fflush() will exhibit this behaviour.
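Python's own text files follow the same convention, so the effect of an explicit flush can be observed from Python 3. Output written to a pipe stays in the writer's buffer until something flushes it. flush_demo and readable_now are hypothetical helpers, and select() on a pipe assumes a POSIX system.

```python
import os
import select


def readable_now(fd):
    """True if a read() on fd would return data immediately."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


def flush_demo():
    """Show that block-buffered output to a pipe only appears after a flush."""
    read_fd, write_fd = os.pipe()
    writer = os.fdopen(write_fd, "w")          # not a tty, so block-buffered
    try:
        print("123", file=writer)              # sits in the writer's buffer
        before = readable_now(read_fd)
        print("456", file=writer, flush=True)  # the fflush() equivalent
        after = readable_now(read_fd)
        return before, after
    finally:
        writer.close()
        os.close(read_fd)
```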

[1] ANSI/ISO C is less specific; C99, 7.19.3p7:

As initially opened, the standard error stream is not fully
buffered; the standard input and standard output streams are
fully buffered if and only if the stream can be determined not
to refer to an interactive device.

POSIX says essentially the same thing:

<http://www.opengroup.org/onlinepubs/9699919799/functions/stdin.html>
 
RG

Nobody said:
> I have to wonder why cat knows or cares.
>
> The issue relates to the standard C library. By convention[1], stdin and
> stdout are line-buffered if the descriptor refers to a tty, and are
> block-buffered otherwise. stderr is always unbuffered.
>
> Any program which uses stdin and stdout without explicitly setting the
> buffering or using fflush() will exhibit this behaviour.
>
> [1] ANSI/ISO C is less specific; C99, 7.19.3p7:
>
> As initially opened, the standard error stream is not fully
> buffered; the standard input and standard output streams are
> fully buffered if and only if the stream can be determined not
> to refer to an interactive device.
>
> POSIX says essentially the same thing:
>
> <http://www.opengroup.org/onlinepubs/9699919799/functions/stdin.html>

This doesn't explain why "cat | cat" when run interactively outputs
line-by-line (which it does). STDIN to the first cat is a TTY, but the
second one isn't.

For that matter, you can also do this:

nc -l 1234 | cat

and then telnet localhost 1234 and type at it, and it still works
line-by-line.
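One possible explanation for this observation, offered as an assumption rather than something established in the thread: if a cat implementation copies with raw read()/write() system calls instead of stdio, it forwards each chunk as soon as read() returns it. The tty-versus-pipe buffering convention then never applies to it. A Python 3 sketch of such a copy loop, with hypothetical names:

```python
import os


def raw_copy(in_fd, out_fd, bufsize=65536):
    """cat-like copy with raw read()/write(): each chunk is forwarded as soon as it arrives."""
    while True:
        chunk = os.read(in_fd, bufsize)  # returns whatever is available right now
        if not chunk:                    # b"" means EOF
            return
        while chunk:                     # write() may accept only part of the chunk
            written = os.write(out_fd, chunk)
            chunk = chunk[written:]


def copy_through_pipes(data):
    """Feed data through raw_copy between two pipes and return what comes out."""
    in_r, in_w = os.pipe()
    out_r, out_w = os.pipe()
    os.write(in_w, data)
    os.close(in_w)
    raw_copy(in_r, out_w)
    os.close(in_r)
    os.close(out_w)
    with os.fdopen(out_r, "rb") as f:
        return f.read()
```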

rg
 
