Keith Thompson
Joe Wright said:
Keith Thompson wrote: [...]

When the user types ^D or ^Z, it's interpreted by (some layer of)
the OS as an end-of-file indication. If a C program is reading
from the corresponding input stream by repeatedly calling getchar(),
after the getchar() has processed the last character that preceded
the ^D or ^Z, the *end-of-file indicator* for the stream is set.
When getchar() is called on a stream whose end-of-file indicator
is set, it returns the value of EOF.

The end-of-file indicator is set differently for a file or for the
keyboard. And yes, the I/O system knows which. Sending a line to
getchar() ending with '\n' does not set the end-of-file (eof)
indicator. The eof gets set only when you attempt to read beyond the
last byte of a file or when, from the keyboard, ^D or ^Z follows '\n'.
A line ending with '\n' doesn't signal end-of-file for a disk file
either.
The input model, as far as getchar() is concerned, is pretty much
the same either for a disk file or for input from a keyboard.
Both are modeled as input text streams, and both are composed of
a sequence of lines, each terminated by '\n'. The end-of-file
condition is triggered differently: for a disk file, typically as
defined by the filesystem's idea of how many bytes the file contains,
and for keyboard input, typically by the user entering some special
key combination. But the behavior as seen by a C program calling
getchar() is very similar. (The C implementation, including the OS,
goes to considerable effort to make them look similar.)
The ^D or ^Z keypresses immediately following '\n', followed by '\n'
or Enter is the keyboard version of end-of-file. Otherwise 4 and 26
are just charters with that value.
s/charters/characters/
Note also that, under Unix, typing ^D twice in the middle of a line
also signals an end-of-file condition; getchar() can then return
EOF without having previously returned '\n'. And ^V^D causes
getchar() to return '\004'. (Both ^V and ^D can be reconfigured
to other values.) I'm less familiar with how Windows does this.
The ^Z written as a character to signify the end of a text file is an
artifact of CP/M carried over into early MSDOS. CP/M wrote files in
128-byte chunks. The filesystem couldn't tell you whether a file was 3
bytes or 123 bytes. You needed to read it and stop at the 1A.
Relatively modern systems retain this legacy. Many Microsoft text
files still end with 1A for no other obvious reason.
And, IIRC, a 1A byte in the middle of a Windows text file is treated
as an end-of-file marker (but only if you read it in text mode;
in binary mode it's just another byte).