EOF (novice)

Hallvard B Furuseth

James said:
Of course not. "We don't know nor should we care" clearly spells that
out. You claimed that EOF was *not* stored on the physical medium.

No, I claimed that _C's_ EOF is not. At least that was what I meant to
say. Anyway, maybe we have just been saying the same thing in different
ways.
 
Arthur J. O'Dwyer

Dan said:
I still don't get it. Each and every byte combination is still valid
in a binary file, therefore it *cannot* be used as eof marker.

A trivial example would be an MS-DOS-like hybrid system on which the
byte 0xA1 would indicate the end of each file (text or binary). [Not
a typo; I specifically changed it from 0x1A so that EOF could be
#defined to be 0xA1A1 on this hypothetical 16-bit system.]
"But then how does a program represent the literal byte 0xA1 on
the disk?" you ask. Simple -- escape codes. For example, the EOF
code could be 0xA1A1, and the escape code for the literal byte 0xA1
could be 0xA100 (big-endian). This would satisfy all the requirements
of the C standard on file systems (i.e., precious few), while being
technically possible.
Heck, you could even Huffman-encode every single file on the system
to save space, and use some rare codon to indicate EOF. That's getting
closer to what I think James means by "a compressed filesystem."
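
A minimal sketch of the escape-code idea above, assuming the hypothetical marker byte 0xA1, the pair 0xA1 0x00 for a literal 0xA1, and the pair 0xA1 0xA1 for end of file. The names `MARK`, `esc_encode` and `esc_decode` are made up for illustration; no real file system is claimed to work this way.

```c
#include <assert.h>
#include <stddef.h>

#define MARK 0xA1u   /* hypothetical marker byte from the post */

/* Encode n logical bytes into out and append the end-of-file code.
   Returns the physical length. out must hold at least 2*n + 2 bytes. */
size_t esc_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t k = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[k++] = in[i];
        if (in[i] == MARK)
            out[k++] = 0x00;   /* 0xA1 0x00 stands for a literal 0xA1 */
    }
    out[k++] = MARK;           /* 0xA1 0xA1 marks the end of the file */
    out[k++] = MARK;
    return k;
}

/* Decode up to the end-of-file code. Returns the logical length, or
   (size_t)-1 if the input is truncated or malformed.
   out must hold at least n bytes. */
size_t esc_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t k = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] != MARK) {
            out[k++] = in[i];
            continue;
        }
        if (i + 1 >= n)
            return (size_t)-1;         /* marker with nothing after it */
        if (in[i + 1] == MARK)
            return k;                  /* end-of-file code reached */
        if (in[i + 1] != 0x00)
            return (size_t)-1;         /* unknown escape */
        out[k++] = MARK;               /* escaped literal 0xA1 */
        i++;                           /* skip the 0x00 */
    }
    return (size_t)-1;                 /* no end-of-file code found */
}
```

Every byte value can still appear in the logical data; the cost is that each literal 0xA1 occupies two bytes on the medium.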

Dan said:
That's true and irrelevant in the case of text files. My point is that
your scheme simply does not work for binary files.

[In case Dan hasn't already thought of this: fseek() is not required
to run in constant time. Binary files don't have to be random-access
in their "natural state"; it just happens that all existing systems
do it that way.]
Dan said:
Furthermore, EOF is a C macro having no connection with whatever mechanism the
implementation uses to detect the end of a file. All we know about it
is that it expands to a negative integer value.

Correct, of course. But I just gave a possible implementation
on which the system's EOF marker, 0xA1A1, is exactly the same value
as the C compiler's EOF value. So James' scenario is not impossible,
merely implausible. Heck, for all I know it might be *common* on
some highly esoteric platforms! ;-)
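
For the novice reading along, here is a rough sketch of how a C program sees end of file, whatever the medium stores. The helper name `count_bytes` is made up for illustration. `getc` returns either the next byte as an `unsigned char` converted to `int`, or the negative value `EOF`, which is why `c` is an `int` and not a `char`.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes remaining in a stream.
   Returns -1 if a read error (rather than end of file) stopped the loop. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                      /* int, so EOF stays distinct from any byte */
    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return ferror(fp) ? -1L : n;
}
```

The program only ever compares against the macro; how the library decides that the file has ended is not visible to it.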

-Arthur
 
Dan Pop

Dan said:
I still don't get it. Each and every byte combination is still valid
in a binary file, therefore it *cannot* be used as eof marker.

Arthur said:
A trivial example would be an MS-DOS-like hybrid system on which the
byte 0xA1 would indicate the end of each file (text or binary). [Not
a typo; I specifically changed it from 0x1A so that EOF could be
#defined to be 0xA1A1 on this hypothetical 16-bit system.]
"But then how does a program represent the literal byte 0xA1 on
the disk?" you ask. Simple -- escape codes. For example, the EOF
code could be 0xA1A1, and the escape code for the literal byte 0xA1
could be 0xA100 (big-endian). This would satisfy all the requirements
of the C standard on file systems (i.e., precious few), while being
technically possible.

The semantics of fseek and ftell on binary streams render this scheme
painful to implement: the byte offsets used by the program or reported
to the program are not the real byte offsets inside the file. But this
is only the tip of the iceberg. Imagine that I want to overwrite a
sequence of ordinary bytes with a sequence of 0xA1 bytes. Not only would
the whole remainder of the file have to be rewritten on the disk, but
the physical size of the file would increase, creating problems if there
is no more room on the disk (from the user's POV the file has the same
size, but it suddenly no longer fits on the disk). I'm afraid no one
would want to use your implementation ;-)
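
A small sketch of the offset problem, assuming the hypothetical scheme in which each literal 0xA1 takes two bytes on the medium. The names `MARK`, `physical_size` and `physical_offset` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MARK 0xA1u   /* hypothetical marker byte from the quoted scheme */

/* Bytes needed on the medium for n logical bytes, not counting the
   end-of-file code: every literal 0xA1 costs one extra byte. */
size_t physical_size(const unsigned char *logical, size_t n)
{
    size_t size = n;
    assert(logical != NULL);
    for (size_t i = 0; i < n; i++)
        if (logical[i] == MARK)
            size++;
    return size;
}

/* Physical offset of logical position pos. It depends on the contents
   of every byte before pos, so it cannot be computed without a scan. */
size_t physical_offset(const unsigned char *logical, size_t n, size_t pos)
{
    assert(pos <= n);
    return physical_size(logical, pos);
}
```

Overwriting two ordinary bytes in a four-byte file with 0xA1 leaves the logical size at 4 but raises the physical size from 4 to 6, and shifts the physical offset of everything after them.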

Dan
 
glen herrmannsfeldt

Arthur said:
[In case Dan hasn't already thought of this: fseek() is not required
to run in constant time. Binary files don't have to be random-access
in their "natural state"; it just happens that all existing systems
do it that way.]

The file system used on some IBM mainframes does not make fseek() easy.

Files with fixed-length records are normally stored on disk in
fixed-length blocks, except for the last block. If an existing file is
appended to, it can have a short block that is not at the end, making
random access difficult. Though if the library routines keep track of
the block sizes the first time through, it would be easy to fseek() to
any previously seen position.

For files with variable length records (V or VB), the only way would be
to keep track of the block lengths in the file.
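
A rough sketch of that bookkeeping, assuming the library has recorded the length of each block it has read so far. The function `locate` and its parameters are hypothetical and do not correspond to any real mainframe library interface.

```c
#include <assert.h>
#include <stddef.h>

/* Map a byte position to (block index, offset within that block), using
   the lengths of the blocks seen so far.
   Returns 0 on success, or -1 if pos lies beyond the blocks seen so far,
   in which case the caller would have to read further to find out. */
int locate(const size_t *block_len, size_t nblocks, size_t pos,
           size_t *block, size_t *within)
{
    size_t start = 0;                  /* byte position where block b begins */
    assert(block_len != NULL && block != NULL && within != NULL);
    for (size_t b = 0; b < nblocks; b++) {
        if (pos < start + block_len[b]) {
            *block = b;
            *within = pos - start;
            return 0;
        }
        start += block_len[b];
    }
    return -1;
}
```

With recorded lengths of 80, 40 and 80 bytes (a short block in the middle, as after an append), position 100 falls 20 bytes into the second block, which a fixed-size division would get wrong.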

-- glen
 
