Keith Thompson
Bartc said: Even considering only systems using CR, CR/LF, or LF, C's text mode can go
wrong: reading in a file created on a computer with a different newline
sequence, or writing a text file on this computer and reading it on one
using a different sequence.
So translate the file before trying to process it as text.
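As a rough sketch of what such a translation step might look like, assuming the file has already been read into memory in binary mode, the following rewrites CR/LF pairs and lone CRs as LF in place. The function name normalize_newlines is made up for illustration; in practice an existing conversion tool does the same job.

```c
#include <stddef.h>

/* Hypothetical helper: convert CR/LF pairs and lone CR characters to LF,
 * in place. Returns the new length, which is never larger than len. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r') {
            /* For a CR/LF pair, skip the LF; a lone CR also becomes LF. */
            if (in + 1 < len && buf[in + 1] == '\n')
                in++;
            buf[out++] = '\n';
        } else {
            buf[out++] = buf[in];
        }
    }
    return out;
}
```

The buffer is not null-terminated by this function; the caller works with the returned length.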
And then there are hybrid files which are mainly binary data, but also
contain embedded text that can include newline characters. Which means that
binary data that looks like CR/LF gets converted to LF (and the entire file
shrinks in size by one byte), or vice versa.
Such files are binary, and should be read in binary mode, which
always requires knowing the exact format of the file. If the format
specifies how the embedded text is represented, use that format.
If it doesn't, you're in trouble anyway.
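To illustrate, here is a minimal sketch that assumes a made-up record layout: a 2-byte little-endian length followed by that many bytes of text. Neither the layout nor the name read_text_record comes from any real format. The stream is assumed to have been opened in binary mode (for example with "rb"), so the embedded bytes, including any CR/LF, arrive exactly as stored.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record layout (an assumption for this example):
 *   2 bytes  little-endian length N
 *   N bytes  text, stored exactly as written
 * fp must be a binary-mode stream. Returns a malloc'ed, null-terminated
 * copy of the text and stores its length in *len_out, or returns NULL
 * on a short read or allocation failure. */
char *read_text_record(FILE *fp, size_t *len_out)
{
    unsigned char hdr[2];

    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return NULL;

    size_t len = (size_t)hdr[0] | ((size_t)hdr[1] << 8);

    char *text = malloc(len + 1);
    if (text == NULL)
        return NULL;

    if (fread(text, 1, len, fp) != len) {
        free(text);
        return NULL;
    }
    text[len] = '\0';
    *len_out = len;
    return text;
}
```

Whether the newlines inside the returned text then need translating depends on what the format says about them.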
I suspect people who advocate text mode tend to use machines with a
single-character newline, and simply don't see the problems it creates when
newline is multiple characters.
I'm quite aware of the problems; I deal with Unix-format
vs. Windows-format text files all the time (including ASCII, Latin-1,
Windows-1252, UTF-8, and both flavors of UTF-16). Some tools I use
can deal with the differences. For others, I use conversion tools.