EOF for binary files

Registered User

I've read in a book:

<quote>
With a binary-mode stream, you can't detect the end-of-file by looking
for EOF, because a byte of data from a binary stream could have that
value, which would result in premature end of input. Instead, you can
use the library function feof(), which can be used for both binary- and
text-mode files:

int feof(FILE *fp);
</quote>

Isn't it true that testing for EOF is valid for both text- and
binary-mode files?

Also, the FAQ recommends not to use feof():
<quote>In virtually all cases, there's no need to use feof at all.
</quote>
 
Richard Heathfield

Registered User said:
> I've read in a book:
>
> <quote>
> With a binary-mode stream, you can't detect the end-of-file by looking
> for EOF, because a byte of data from a binary stream could have that
> value, which would result in premature end of input.

Ditch the book. It doesn't understand EOF.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
CBFalconer

Registered said:
> I've read in a book:
>
> <quote>
> With a binary-mode stream, you can't detect the end-of-file by
> looking for EOF, because a byte of data from a binary stream could
> have that value, which would result in premature end of input.
> Instead, you can use the library function feof(), which can be
> used for both binary- and text-mode files:
>
> int feof(FILE *fp);
> </quote>
>
> Isn't it true that testing for EOF is valid for both text- and
> binary-mode files?

Yes. The only possible exception occurs when (sizeof(int) == 1).
A stream is a stream of bytes, and the routines to read them return
ints formed from the (unsigned)char value involved. Thus the value
of EOF is always distinct.

> Also, the FAQ recommends not to use feof():
> <quote>In virtually all cases, there's no need to use feof at all.
> </quote>

feof is primarily useful to distinguish between i/o errors and
actual eof, either of which conditions will usually return EOF.

    if (EOF == (ch = getc(f))) {
        if (feof(f)) {
            /* actual file eof encountered */
        }
        else {
            /* use ferror etc. to determine the cause */
        }
    }
    else {
        /* use the value of ch, which is a valid unsigned char */
    }

Note that ch must have been declared as an int.
 
Richard Heathfield

Registered User said:
> Oh, thanks Richard!! That part of the book really got me confused.

The mistake the author makes is that he appears to believe EOF is a
character. It isn't. It's a message from your I/O library which, freely
translated, means "you asked me for more data, squire, but there ain't
none. The pot's empty. Sorry, I'd love to help and all that...".

 
Richard Tobin

Registered User said:
> With a binary-mode stream, you can't detect the end-of-file by looking
> for EOF, because a byte of data from a binary stream could have that
> value, which would result in premature end of input.

It would certainly be a mistake to compare a byte against EOF if the
byte is a char, because EOF is an int value and a char converted to
an int might have the same value as EOF. But getc() doesn't return
a char; it returns an unsigned char converted to an int, so there
is no possibility of a real byte appearing to be equal to EOF, because
EOF is guaranteed to be negative.

So you can perfectly well compare against EOF provided you don't
convert the value to a char first.
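
A minimal sketch of that difference, assuming a mainstream implementation where EOF is -1 and converting 0xFF to signed char yields -1 (the Standard guarantees neither; it only says EOF is negative). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Correct: keep getc()'s result in an int, so EOF stays distinct
   from every real byte value, including 0xFF. */
long count_bytes_int(FILE *fp)
{
    long n = 0;
    int ch;

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF)
        n++;
    return n;
}

/* Broken: the result is squeezed into a signed char before the test,
   so a 0xFF data byte typically becomes -1 and looks like EOF. */
long count_bytes_char(FILE *fp)
{
    long n = 0;
    signed char ch;

    assert(fp != NULL);
    while ((ch = (signed char)getc(fp)) != EOF)
        n++;
    return n;
}
```

Fed a binary stream holding the three bytes 0x41 0xFF 0x42, the int version counts all three, while the char version stops at the 0xFF byte on such a system. That is the "premature end of input" the book describes, but it comes from the conversion to char, not from binary mode.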

-- Richard
 
Eric Sosman

Registered said:
> I've read in a book:
>
> <quote>
> With a binary-mode stream, you can't detect the end-of-file by looking
> for EOF, because a byte of data from a binary stream could have that
> value, which would result in premature end of input. Instead, you can
> use the library function feof(), which can be used for both binary- and
> text-mode files:
>
> int feof(FILE *fp);
> </quote>
>
> Isn't it true that testing for EOF is valid for both text- and
> binary-mode files?

The book is right in the sense that it is possible for a
byte read from a stream (text or binary) to have the value
EOF, but only on "exotic" machines where bytes and ints have
the same size. That is, the book is right if it's trying to
be "fully general" -- but if it's writing about "mainstream"
C implementations it's wrong.

The Standard defines all input operations as if they used
the fgetc() function as many times as necessary (the actual
implementation might do something more intricate, but the end
result must be the same). The fgetc() function returns an int
value: either EOF to indicate failure, or an actual input byte
represented as unsigned char converted to int. If int is
wider than char, converting an unsigned char to an int yields
a non-negative value, and since the EOF macro expands to a
negative number there can be no confusion.

On those exotic architectures, though, things get sticky.
If sizeof(int) == 1, there must be unsigned char values that
are too large for int: for example, on a system with sixteen-bit
chars and sixteen-bit ints, INT_MAX will be 32767 but UCHAR_MAX
will be 65535. Since fgetc() must be able to read back any
character values fputc() might have written (subject to some
restrictions that don't matter here), on this system it must
be able to return 65536 distinguishable int values. Half of
those will necessarily be negative, and one of them will have
the same value as EOF. So on exotic architectures, it is
possible for fgetc() to return EOF when reading "real" data,
and the only way to tell whether the EOF is actual data or an
indication of input failure is to call both feof() and ferror().
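
Whether a given implementation is "mainstream" or "exotic" in this sense can be checked from <limits.h>. A rough sketch, with hypothetical function names, that tests the two conditions described above:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Returns 1 if some unsigned char value is too large for int,
   i.e. fgetc() could hand back EOF for real data. */
int eof_can_collide(void)
{
    return UCHAR_MAX > INT_MAX;
}

/* Returns 1 if every byte value, converted to int the way fgetc()
   reports it, differs from EOF. */
int all_bytes_distinct_from_eof(void)
{
    unsigned char b = 0;

    assert(EOF < 0);            /* the Standard requires a negative EOF */
    do {
        if ((int)b == EOF)
            return 0;
    } while (b++ != UCHAR_MAX); /* stops after testing UCHAR_MAX itself */
    return 1;
}
```

On a typical system with 8-bit chars and 32-bit ints the first function returns 0 and the second returns 1, so a plain comparison against EOF is enough there.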

> Also, the FAQ recommends not to use feof():
> <quote>In virtually all cases, there's no need to use feof at all.
> </quote>

I'm not the FAQ author, but I'd read "in virtually all cases"
to mean "whenever int is wider than char," or "on virtually all
`mainstream' machines." It would be nice, IMHO, if the FAQ were
more explicit about this, but it's not a big failing.

The FAQ is right in implying that feof() is seldom used,
because after receiving an EOF return value (on a "mainstream"
system) your immediate concern should be "End-of-input, or error?"
and it seems more natural to use ferror() for that question:

    int ch;
    while ( (ch = fgetc(stream)) != EOF ) {
        /* process the character just read */
    }
    /* "Why did we get EOF?" */
    if (ferror(stream)) {
        /* do something about the I/O error */
    }
    else {
        /* normal end-of-input */
    }

This code assumes that EOF can only appear as the result of
end-of-input or I/O error, so if there's no I/O error the stream
must have reached its end. Of course, the same reasoning would
hold for using feof(stream) and swapping the bodies of the two
if statements, but "ferror?" seems a more direct inquiry.

On "exotic" architectures the either/or reasoning breaks down
because there's a third possibility: an EOF return might be actual
input data. If you're writing with such a system in mind you need
to use both feof() and ferror() to distinguish the three outcomes,
and the loop might look something like

    int ch;
    while ( (ch = fgetc(stream)) , /* comma operator */
            (!feof(stream) && !ferror(stream)) ) {
        /* process the character just read */
    }
    /* "Was it error or end-of-input?" */
    if (ferror(stream)) {
        /* do something about the I/O error */
    }
    else {
        /* normal end-of-input */
    }

Of course, this can be written in many other rearrangements. One
likely change would be to call feof() and ferror() only when an EOF
shows up instead of every single time, by changing the while clause
to something like

    while ( (ch = fgetc(stream)) != EOF
            || (!feof(stream) && !ferror(stream)) )

Since most I/O devices are pathetically slow compared to most CPUs,
this "optimization" probably doesn't save noticeable time -- but
it is in the tradition of C to worry about tiny efficiencies while
ignoring gross waste. ;-) (That same tradition, by the way, calls
for using getc() instead of fgetc() wherever possible.)
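
One possible way to package the three outcomes (data, end-of-input, error) is a small helper that reports them explicitly. This is only a sketch; `read_status` and `read_byte` are invented names, and it assumes the stream's error and end-of-file indicators were clear before the call, since both are sticky.

```c
#include <assert.h>
#include <stdio.h>

enum read_status { READ_BYTE, READ_EOF, READ_ERROR };

/* Reads one byte into *out and says what happened. feof() and
   ferror() are consulted only when getc() returns EOF. */
enum read_status read_byte(FILE *stream, unsigned char *out)
{
    int ch;

    assert(stream != NULL && out != NULL);
    ch = getc(stream);
    if (ch == EOF) {
        if (ferror(stream))
            return READ_ERROR;
        if (feof(stream))
            return READ_EOF;
        /* Neither indicator is set: on an "exotic" system the EOF
           value was real data, so fall through and deliver it. */
    }
    *out = (unsigned char)ch;
    return READ_BYTE;
}
```

A caller would then loop with `while (read_byte(stream, &b) == READ_BYTE)` and inspect the final status afterwards. On mainstream systems the fall-through branch is never taken.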
 
Coos Haak

On 11 Nov 2006 14:34:44 GMT, Richard Tobin wrote:
> It would certainly be a mistake to compare a byte against EOF if the
> byte is a char, because EOF is an int value and a char converted to
> an int might have the same value as EOF. But getc() doesn't return
> a char; it returns an unsigned char converted to an int, so there
> is no possibility of a real byte appearing to be equal to EOF, because
> EOF is guaranteed to be negative.

getc returns an int, not a char, be it signed or unsigned.

    #include <stdio.h>
    int getc(FILE *FP);

And yes, if no EOF condition is reached, the int may be regarded as char.
EOF does not fit in a char so it well may be some negative number.

> So you can perfectly well compare against EOF provided you don't
> convert the value to a char first.

Yes.
 
Flash Gordon

Coos said:
> On 11 Nov 2006 14:34:44 GMT, Richard Tobin wrote:
>
> getc returns an int, not a char, be it signed or unsigned.

Richard said that.

> #include <stdio.h>
> int getc(FILE *FP);
> And yes, if no EOF condition is reached, the int may be regarded as char.

By *definition*, if EOF is not returned the value is that of an
*unsigned* char as, again, Richard said.

> EOF does not fit in a char so it well may be some negative number.

EOF is *defined* as being a negative number, so there is no "may well
be" about it.

Everything Richard said in that post is correct, not just that last
sentence.
 
Coos Haak

On Sat, 11 Nov 2006 17:21:14 +0000, Flash Gordon wrote:
My mistake, I overlooked this -------
Sorry for reading and replying too hastily ;-(
 
Keith Thompson

Registered User said:
> I've read in a book:
>
> <quote>
> With a binary-mode stream, you can't detect the end-of-file by looking
> for EOF, because a byte of data from a binary stream could have that
> value, which would result in premature end of input. Instead, you can
> use the library function feof(), which can be used for both binary- and
> text-mode files:
>
> int feof(FILE *fp);
> </quote>

Who is the author? If it's Schildt, we already know about him (and
warn people away from his books whenever possible). If it's someone
else, we may have another name for The List.
 
SM Ryan

# I've read in a book:
#
# <quote>
# With a binary-mode stream, you can't detect the end-of-file by looking
# for EOF, because a byte of data from a binary stream could have that
# value, which would result in premature end of input. Instead, you can
# use the library function feof(), which can be used for both binary- and
# text-mode files:

It's referring to getw(fp) which can return the same value as EOF
without actually being at the end of file.
 
Keith Thompson

SM Ryan said:
> It's referring to getw(fp) which can return the same value as EOF
> without actually being at the end of file.

What makes you think it's referring to getw()? There is no such function
in standard C.

<OT>
There is a non-standard function getw() that reads a word (defined as
an int) from a stream. It's not even POSIX; it's defined by SVID, and
one man page recommends using fread() instead. The text quoted from
the book doesn't even make sense in terms of getw(), since it talks
about a *byte* of data having the value EOF.
</OT>

It's far more likely that the author of the book just doesn't know
what he's talking about.
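
On the off-topic getw() point: the fread() route avoids the ambiguity because success or failure is reported by the return count rather than through the data value. A small sketch, where `read_int` is a made-up helper name and the file is assumed to have been written with fwrite() on the same implementation:

```c
#include <assert.h>
#include <stdio.h>

/* Reads one int with fread(). Returns 1 on success, 0 on end-of-file
   or error; the caller can use feof()/ferror() to tell which. */
int read_int(FILE *fp, int *out)
{
    assert(fp != NULL && out != NULL);
    return fread(out, sizeof *out, 1, fp) == 1;
}
```

An int with the same value as EOF stored in the file is then read back as ordinary data, and the end of the file shows up only as a 0 return.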
 
