Registered said:
I've read in a book:
<quote>
With a binary-mode stream, you can't detect the end-of-file by looking
for EOF, because a byte of data from a binary stream could have that
value, which would result in premature end of input. Instead, you can
use the library function feof(), which can be used for both binary- and
text-mode files:
int feof(FILE *fp);
</quote>
Isn't it true that testing for EOF is valid for both text- and
binary-mode files?
The book is right in the sense that it is possible for a
byte read from a stream (text or binary) to have the value
EOF, but only on "exotic" machines where bytes and ints have
the same size. That is, the book is right if it's trying to
be "fully general" -- but if it's writing about "mainstream"
C implementations it's wrong.
The Standard defines all input operations as if they used
the fgetc() function as many times as necessary (the actual
implementation might do something more intricate, but the end
result must be the same). The fgetc() function returns an int
value: either EOF to indicate failure, or an actual input byte
represented as unsigned char converted to int. If int is
wider than char, converting an unsigned char to an int yields
a non-negative value, and since the EOF macro expands to a
negative number there can be no confusion.
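Just to make the numbers concrete, here is a tiny stand-alone sketch of my own
(not from the book or the FAQ) that prints the relevant limits on whatever
implementation compiles it:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Every byte fgetc() can deliver lies in 0..UCHAR_MAX;
         * EOF is some negative int.                               */
        printf("EOF = %d, UCHAR_MAX = %u, INT_MAX = %d\n",
               EOF, (unsigned)UCHAR_MAX, INT_MAX);
        /* On a "mainstream" system UCHAR_MAX <= INT_MAX, so no
         * converted byte can collide with EOF -- which is also why
         * the result of fgetc() belongs in an int, not a char.    */
        return 0;
    }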
On those exotic architectures, though, things get sticky.
If sizeof(int) == 1, there must be unsigned char values that
are too large for int: for example, on a system with sixteen-bit
chars and sixteen-bit ints, INT_MAX will be 32767 but UCHAR_MAX
will be 65535. Since fgetc() must be able to read back any
character values fputc() might have written (subject to some
restrictions that don't matter here), on this system it must
be able to return 65536 distinguishable int values. Half of
those will necessarily be negative, and one of them will have
the same value as EOF. So on exotic architectures, it is
possible for fgetc() to return EOF when reading "real" data,
and the only way to tell whether the EOF is actual data or an
indication of input failure is to call both feof() and ferror().
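If you want to know which camp your own machine falls into, a few lines with
<limits.h> settle it; again, this is my sketch, not anything from the book:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        if ((unsigned long)UCHAR_MAX > (unsigned long)INT_MAX) {
            /* "Exotic": sizeof(int) == 1, so some unsigned char values
             * convert to negative ints and one of them can equal EOF.  */
            printf("exotic: a data byte could compare equal to EOF (%d)\n", EOF);
        } else {
            /* "Mainstream": every byte converts to a non-negative int,
             * so an EOF return can only mean end-of-input or error.    */
            printf("mainstream: EOF (%d) never matches real data\n", EOF);
        }
        return 0;
    }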
> Also, the FAQ recommends not to use feof():
> <quote>In virtually all cases, there's no need to use feof at all.
> </quote>
I'm not the FAQ author, but I'd read "in virtually all cases"
to mean "whenever int is wider than char," or "on virtually all
`mainstream' machines." It would be nice, IMHO, if the FAQ were
more explicit about this, but it's not a big failing.
The FAQ is right in implying that feof() is seldom needed,
because after receiving an EOF return value (on a "mainstream"
system) your immediate concern should be "End-of-input, or error?"
and it seems more natural to use ferror() for that question:
    int ch;
    while ( (ch = fgetc(stream)) != EOF ) {
        /* process the character just read */
    }
    /* "Why did we get EOF?" */
    if (ferror(stream)) {
        /* do something about the I/O error */
    }
    else {
        /* normal end-of-input */
    }
This code assumes that EOF can only appear as the result of
end-of-input or I/O error, so if there's no I/O error the stream
must have reached its end. Of course, the same reasoning would
hold for using feof(stream) and swapping the bodies of the two
if statements, but "ferror?" seems a more direct inquiry.
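For completeness, the feof() flavor that paragraph alludes to would simply turn
the question around (same stream, same assumptions as the loop above):

    /* "Why did we get EOF?" -- asked the other way around */
    if (feof(stream)) {
        /* normal end-of-input */
    }
    else {
        /* do something about the I/O error */
    }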
On "exotic" architectures the either/or reasoning breaks down
because there's a third possibility: an EOF return might be actual
input data. If you're writing with such a system in mind you need
to use both feof() and ferror() to distinguish the three outcomes,
and the loop might look something like
    int ch;
    while ( (ch = fgetc(stream)) , /* comma operator */
            (!feof(stream) && !ferror(stream)) ) {
        /* process the character just read */
    }
    /* "Was it error or end-of-input?" */
    if (ferror(stream)) {
        /* do something about the I/O error */
    }
    else {
        /* normal end-of-input */
    }
Of course, this loop can be rearranged in many other ways. One
likely change would be to call feof() and ferror() only when an EOF
shows up instead of every single time, by changing the while clause
to something like
    while ( (ch = fgetc(stream)) != EOF
            || (!feof(stream) && !ferror(stream)) )
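Put back into context (my arrangement, not the only possible one), the whole
loop would then read

    int ch;
    while ( (ch = fgetc(stream)) != EOF
            || (!feof(stream) && !ferror(stream)) ) {
        /* process the character just read */
    }
    /* ...followed by the same ferror()/feof() checks as before... */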
Since most I/O devices are pathetically slow compared to most CPUs,
this "optimization" probably doesn't save noticeable time -- but
it is in the tradition of C to worry about tiny efficiencies while
ignoring gross waste. ;-) (That same tradition, by the way, calls
for using getc() instead of fgetc() wherever possible.)
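As a footnote to that last parenthetical: getc() is a drop-in replacement in
all the loops above, with the one standard caveat that it may be implemented
as a macro that evaluates its stream argument more than once, so that argument
should be free of side effects:

    while ( (ch = getc(stream)) != EOF ) {
        /* process the character just read */
    }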