I'm curious as to which existing OSes do not accurately
report the lengths of binary files. Does anyone
have any examples?
A whole bunch of old mainframe and minicomputer OSes did this.
They allocated only whole sectors to files, and files were always
sized in whole-sector units. Text files used special encodings so
as to be able to hold "lines of text" that did not come out to an
even number of disk sectors. For instance, each line might be
prefixed by two counts: how much space the line occupied within
the text file, and how many of those bytes were to be treated as
file text. The extra bytes, if any, were ignored, which allowed one
to shorten a line without rewriting the file. Each line might also
be numbered, so that a line could be lengthened without rewriting
the entire file: mark the original line as deleted (zero valid
bytes) and place the new text into whatever existing space could
be found, or at the end of the file.
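To make the byte-count idea concrete, here is a minimal sketch in C.
The record layout is invented for illustration and is not the format
of any real OS: one byte for the space allocated to the line, one
byte for the number of valid text bytes, then the data. The names
decode_records and shorten_record are likewise hypothetical. Line
numbers and deletion markers are left out, so a record with zero
valid bytes simply decodes as an empty line here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout (not any real OS's format):
 *   byte 0: space allocated to this line's text, in bytes
 *   byte 1: how many of those bytes are valid text
 *   then `allocated` bytes of data; bytes past `valid` are ignored
 */

/* Decode the records in buf[0..len) into out as newline-terminated
 * lines.  Returns the number of bytes written (not counting the
 * terminating '\0'), or (size_t)-1 if the records are malformed or
 * out is too small. */
size_t decode_records(const unsigned char *buf, size_t len,
                      char *out, size_t outsize)
{
    size_t pos = 0, n = 0;

    while (pos < len) {
        if (len - pos < 2)
            return (size_t)-1;          /* truncated header */
        size_t allocated = buf[pos];
        size_t valid = buf[pos + 1];
        pos += 2;
        if (valid > allocated || allocated > len - pos)
            return (size_t)-1;          /* inconsistent counts */
        if (outsize - n < valid + 2)    /* text + '\n' + '\0' */
            return (size_t)-1;
        memcpy(out + n, buf + pos, valid);
        n += valid;
        out[n++] = '\n';                /* the newline is implied */
        pos += allocated;               /* skip the unused slack too */
    }
    if (n >= outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}

/* Shorten the record whose header starts at offset pos.  Only the
 * valid-byte count changes, so nothing after it has to move. */
int shorten_record(unsigned char *buf, size_t len,
                   size_t pos, size_t new_valid)
{
    if (pos > len || len - pos < 2 || new_valid > buf[pos + 1])
        return -1;
    buf[pos + 1] = (unsigned char)new_valid;
    return 0;
}
```

Note that no newline byte is stored anywhere in the "file": the
decoder manufactures one per record. Shortening "hello" to "hell"
touches a single count byte and leaves every later record where it
was.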
VMS's RMS took care of dealing with all the various file formats
for you; you just told it to open a "text" file and it would hide
the magic. Open the same file as "binary", however, and all
the magic encoding shows up. It was not until VMS version 5 that
"stream-LF" text files appeared; before then, *all* text files had
magic encoding. (The encoding for a "stream-LF" file is basically
the same as that used on Unix systems, i.e., no encoding at all,
just a sequence of bytes with "lines" indicated by newline bytes.)
One interesting consequence of byte-count-encoded (and optionally
numbered) lines is that there is no such thing as a final line that
does not end with a newline. That is:
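    /* fragment: needs <stdio.h> and <stdlib.h> */
    FILE *somefile = fopen("somefile.txt", "w");
    if (somefile == NULL) {
        perror("somefile.txt");
        exit(EXIT_FAILURE);
    }
    fprintf(somefile, "ab\nc\nd");  /* note: no newline after "d" */
    fclose(somefile);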
is faced with a problem: should the library write the three lines saying
"line 1: two bytes, ab; line 2: one byte, c; line 3: one byte, d"
-- which is "ab\nc\nd\n", which is not what you wrote -- or should
it write "line 1: two bytes, ab; line 2: one byte, c" -- which is
"ab\nc\n", which is *also* not what you wrote? The file format is
such that it is physically impossible to reproduce what you *did*
write. A file is a sequence of complete lines; there is no such
thing, in this file format, as an incomplete, not-newline-terminated,
line.
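The two possible outcomes can be simulated on any system. The sketch
below is only a model, not how any particular C library is written:
the function record_round_trip and the partial_policy enum are
hypothetical names. It treats the "file" as a sequence of complete
lines and shows what comes back under each choice.

```c
#include <stddef.h>
#include <string.h>

/* What to do with a final line that lacks a newline. */
enum partial_policy { KEEP_PARTIAL, DROP_PARTIAL };

/* Model writing `text` to a record-oriented file and reading it
 * back.  Each line becomes one record, and every record is read
 * back followed by '\n'.  Returns the length of the result, or
 * (size_t)-1 if out is too small. */
size_t record_round_trip(const char *text, enum partial_policy policy,
                         char *out, size_t outsize)
{
    size_t n = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t linelen = nl ? (size_t)(nl - line) : strlen(line);

        if (nl == NULL && policy == DROP_PARTIAL)
            break;                      /* partial last line is lost */
        if (outsize - n < linelen + 2)  /* text + '\n' + '\0' */
            return (size_t)-1;
        memcpy(out + n, line, linelen);
        n += linelen;
        out[n++] = '\n';                /* records always end a line */
        line += linelen + (nl ? 1 : 0);
    }
    if (n >= outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Feeding it "ab\nc\nd" gives back "ab\nc\nd\n" under KEEP_PARTIAL and
"ab\nc\n" under DROP_PARTIAL. Neither matches the input. Only input
that already ends in a newline survives the trip unchanged.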
The C standard allows either of the two behaviors above: whether
the last line of a text stream requires a terminating newline is
implementation-defined, and different C libraries did different things.
If you want "ab\nc\nd\n" to appear in the file, you must write that
final newline yourself; only then is line 3, consisting of the
letter "d", sure to make its way into the file. (Assuming no disk
errors or other similar problems, of course.)
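If you would rather not depend on every caller remembering this, one
option is a small wrapper that supplies the final newline when it is
missing. This is just a sketch; fputs_complete_lines is a made-up
name, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* Write `text` to `fp`, adding a final '\n' if the text is non-empty
 * and does not already end with one.  Returns 0 on success, EOF on
 * a write error. */
int fputs_complete_lines(const char *text, FILE *fp)
{
    size_t len = strlen(text);

    if (fputs(text, fp) == EOF)
        return EOF;
    if (len > 0 && text[len - 1] != '\n' && fputc('\n', fp) == EOF)
        return EOF;
    return 0;
}
```

With this, fputs_complete_lines("ab\nc\nd", somefile) puts three
complete lines into the stream on any conforming implementation. It
is still worth checking the return value of fclose, since buffered
data may not be written until then.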