On 03/25/2014 02:14 AM, kerravon wrote:
....
My question was intended to apply to all systems
in existence, because C's binary vs text fopen
mode applies to all systems in existence.
I then deliberately chose MSDOS as an example
because I:
1. Wanted a non-mainframe (which uses records
instead of line endings) environment. I am aware other
languages use records so work on the mainframe
environment, and I didn't want anyone to come
back and say "records work fine". I'm interested
in text files (CRLF etc), not records.
On those platforms, text files are stored using records, rather than
CRLF, so that's the wrong way to make the distinction you're trying to make.
2. Wanted a non-Unix environment so that no-one
would come back and say "on Unix there is no
difference between text and binary, so there is
nothing to discuss, end of story".
I hope that clears up the question. With the
answers so far I still don't understand how CRLF
are swallowed on input.
Calling any <stdio.h> function to read data from a text mode stream
requires that the function use whatever method is native for that
platform to identify when a line ends. It is then required to replace
whatever the native method is, with a single '\n' at the end of each
line. In practice, all other input routines are required to behave as if
they used fgetc() to get individual characters from the stream, so
conceptually, at least, you can think in terms of the actual logic that
implements this behavior as being part of fgetc(). On unix-like systems,
that's a trivial requirement, but in general it's more complicated,
which is why the standard says:
"Data read in from a text stream will necessarily compare equal to the
data that were earlier written out to that stream only if: the data
consist only of printing characters and the control characters
horizontal tab and new-line; no new-line character is immediately
preceded by space characters; and the last character is a new-line
character. Whether space characters that are written out immediately
before a new-line character appear when read in is
implementation-defined." (7.21.2p2)
For each of the provisions in that clause, there were known platforms
hosting an implementation of C at the time the standard was written,
where violating that provision could cause problems for the native
method for identifying line lengths, which could manifest themselves by
causing a difference between the data written out and the data
read back in.
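If you want to see what that guarantee means on a particular platform, a
small round-trip check is enough. What follows is only a sketch: the name
text_round_trips() and its return convention are mine, not anything from
the standard or from any library. It writes a string through a text mode
stream, reads it back through another text mode stream, and compares.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a standard function).
   Returns 1 if `data` reads back unchanged through text mode streams,
   0 if it comes back different, and -1 on an I/O error.
   The scratch file at `path` is removed before returning. */
int text_round_trips(const char *path, const char *data)
{
    assert(path != NULL && data != NULL);

    FILE *fp = fopen(path, "w");            /* "w": text mode */
    if (fp == NULL)
        return -1;
    int write_failed = (fputs(data, fp) == EOF);
    if (fclose(fp) == EOF || write_failed) {
        remove(path);
        return -1;
    }

    fp = fopen(path, "r");                  /* "r": text mode */
    if (fp == NULL) {
        remove(path);
        return -1;
    }

    size_t len = strlen(data);
    size_t i = 0;
    int same = 1;
    int c;
    while ((c = getc(fp)) != EOF) {
        /* getc() returns the character as an unsigned char value */
        if (i >= len || c != (unsigned char)data[i]) {
            same = 0;
            break;
        }
        i++;
    }
    if (same && i != len)
        same = 0;                           /* file was shorter than data */

    int read_failed = ferror(fp);
    fclose(fp);
    remove(path);
    return read_failed ? -1 : same;
}
```

Data that meets every condition in 7.21.2p2, such as "one\ttwo\nthree\n",
has to produce 1 on a conforming implementation. Data that breaks one of
them (spaces just before a new-line, no final new-line, a stray '\r') may
produce 1 or 0 depending on the platform, which is exactly the latitude
the standard grants.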
In the case of using CRLF to delimit lines, if the current character is
CR, fgetc() must get the next character (which might entail a wait for
more input, even if the stream is unbuffered). If the next character is
LF, move the current position in the stream to after the LF, and return
a single '\n'. Otherwise, fgetc() does whatever the implementor wants it
to do, since a CR not paired with LF violates the provisions I cited above.
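The logic just described can be sketched in a few lines. This is not how
any particular library implements fgetc(); text_getc() is a made-up name,
and it works on top of a stream opened in binary mode so that the
translation is visible. For a CR that is not followed by LF, this sketch
makes one arbitrary choice out of the many an implementor could make: it
returns the CR unchanged and pushes the following character back.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a standard function): read one character
   from a binary mode stream, collapsing a CR LF pair into a single
   '\n'. Returns EOF at end of file, like getc(). */
int text_getc(FILE *bin)
{
    assert(bin != NULL);

    int c = getc(bin);
    if (c != '\r')
        return c;               /* ordinary character, or EOF */

    /* We saw CR, so we have to look at the next character before we
       can decide what to return. */
    int next = getc(bin);
    if (next == '\n')
        return '\n';            /* CR LF pair becomes one new-line */

    /* Lone CR: behavior here is this sketch's choice only. Put the
       look-ahead character back (one character of pushback is
       guaranteed) and hand the CR to the caller as-is. */
    if (next != EOF)
        ungetc(next, bin);
    return '\r';
}
```

Reading the bytes "a\r\nb\rc\r\n" through text_getc() yields 'a', '\n',
'b', '\r', 'c', '\n', and then EOF. The need to call getc() a second time
after seeing CR is the look-ahead that can make the reader wait for more
input.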