Some Newb Problem with "int", please help.

Dan Pop

> True. That doesn't mean we should never try to do something reasonable
> with it, though.

How can you do *anything* reasonable once undefined behaviour has been
invoked (by opening the file in the wrong mode)?

> (I'm not saying one should /always/ catch weird
> newlines, but for programs that need maximum robustness and
> user-friendliness I do think it's one of the first things to do.)

As I already said in the text you've snipped, it is the user's job to
properly import the text file on the system where he wants to process it.
This way, there is no undefined behaviour and the program need not try to
do a job that cannot be done reliably.

Last time I checked, file transfer protocols had a text mode that did
the right thing when dealing with text files, including character set
conversions. Furthermore, utilities for converting one text file format
to another are widely available (even a green newbie should be able to
convert between Unix, Windows and MacOS formats). Such a program has
the advantage that it knows what to expect as input format, it doesn't
have to resort to wild guessing.
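
To illustrate the point about knowing the input format, here is a minimal sketch of such a converter's core, assuming the input is known to be DOS/Windows text in an ASCII-based encoding. The name dos_to_unix is hypothetical, not taken from any existing utility, and the data is assumed to have been read from a stream opened in binary mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src to dst, turning each CR LF pair into a
   single LF. A lone CR is copied unchanged, because the input is assumed
   to be DOS/Windows text and nothing else is guessed at.
   dst must have room for at least len bytes.
   Returns the number of bytes written to dst. */
size_t dos_to_unix(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] == 0x0D && i + 1 < len && src[i + 1] == 0x0A)
            continue;   /* drop the CR; the LF is copied on the next pass */
        dst[out++] = src[i];
    }
    return out;
}
```

Because the source format is fixed in advance, the function never has to decide what a stray CR means; it just passes it through.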

Dan
 
Richard Bos

Arthur J. O'Dwyer said:
> Nitpick: No, it doesn't. The Standard only requires that the
> implementation translate /newlines/ to '\n' before the user code sees
> them. If 0D.0A isn't a "newline" on the DS9000, then it doesn't have
> to translate it.

If the DS9000 uses MS-DOS or Windows, 0D0A _is_ a newline there. If it
doesn't, then it has no bearing on "decent DOS/Windows implementations".

> Replace "DS9000" by "brain-dead Windows implementation,
> perhaps a naive GCC port," and you get exactly what I said.

Such a naive GCC port, working on native files in an MS-DOS or Windows
environment, would IMO not be conforming.

> Certainly. But can you ensure that all your potential users have
> such a program, or can write one; and that they all know how and when
> to use it; and that they always remember to use it? </rhetorical>

No; but then it's _their_ problem, not yours :)

Richard
 
Arthur J. O'Dwyer

> If the DS9000 uses MS-DOS or Windows, 0D0A _is_ a newline there. If it
> doesn't, then it has no bearing on "decent DOS/Windows implementations".

IMO the sequence 0D.0A is /only/ a "newline" if the implementation
defines it to be. (In other words, it's a QoI issue, not a conformance
issue.) Of course, I haven't got any C&V to back that up.

> Such a naive GCC port, working on native files in an MS-DOS or Windows
> environment, would IMO not be conforming.

But processing the exact same files on a Linux system, it would be?
I don't think that's a very consistent interpretation, but I guess YMMV.
:)

> No; but then it's _their_ problem, not yours :)

Well, isn't part of user-friendliness to reduce the number of problems
your users are having? ;-)

-Arthur
 
Chris Torek

[On \r\n as found on DOS-based systems, vs \r only as found on old Mac
systems, vs \n only as found on Unix-based systems including new Mac
systems...]

> This is why text files must be properly *imported* on the system that
> has to process them. Once this is done, the code can safely assume that
> each line of text is \n terminated.

I agree, and I think at least one person provided another example
(VMS records) where no attempt to match any {\r, \n} clustering-sequence
would do the trick (because some VMS records are counted-byte-strings,
where you get an optional line number, a 16-bit byte-count, and then
the bytes making up the line, with no \r nor \n anywhere in sight).

An even more extreme example, which I hope will show that "import
text file" really *is* a Special Operation that *must* be done at
the point the text file is imported, occurs when transferring files
from some IBM mainframes. These machines use EBCDIC "code pages"
to determine which character code represents which character; but
all "text" code pages share a number of useful "text characters",
such as using 0xf0 through 0xf9 to store the characters '0' through
'9' respectively.
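
As a small illustration of why this needs a real conversion step, here is a sketch that translates only the digit range mentioned above, assuming an ASCII-based target where '0' through '9' are 0x30 through 0x39. The function name ebcdic_digits_to_ascii is hypothetical, and a real import would need a full table for the specific code page in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: translate a leading run of EBCDIC digit bytes
   (0xF0..0xF9) into ASCII digits (0x30..0x39). Translation stops at the
   first byte outside that range. dst must have room for len + 1 bytes
   and is NUL-terminated. Returns the number of digits translated. */
size_t ebcdic_digits_to_ascii(const unsigned char *src, size_t len, char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len && src[i] >= 0xF0 && src[i] <= 0xF9; i++)
        dst[i] = (char)(0x30 + (src[i] - 0xF0));
    dst[i] = '\0';
    return i;
}
```

None of the bytes 0xF0 through 0xF9 is even a 7-bit ASCII character, so without this kind of mapping the digits are unreadable on the target system.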

If an EBCDIC-coded file is not "imported" when moved to an ASCII-based
system, no amount of newline-fiddling will ever make it look like
anything other than gibberish. A VMS "record-oriented" file will
at least seem to have each line as a string of ASCII within it.
 
Alex Monjushko

> True. That doesn't mean we should never try to do something reasonable
> with it, though. (I'm not saying one should /always/ catch weird
> newlines, but for programs that need maximum robustness and
> user-friendliness I do think it's one of the first things to do.)

How can you do something reasonable after invoking undefined behavior?

If your file is not formatted to be a text file on the system you are on
then, by definition, it is not a text file and you should not be treating
it as such. On the other hand, if you are absolutely certain of the exact
format variations of the files that you are going to be working with, then
you can attempt to detect and correctly handle these formats in binary mode.
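
A minimal sketch of that binary-mode approach, assuming the only variations in play are LF, CR LF, and a lone CR in an ASCII-based encoding. The name read_any_line is hypothetical, and the stream is assumed to have been opened with "rb" so that the implementation does no newline translation of its own.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from a stream opened in binary mode.
   LF, CR LF, or a lone CR is accepted as the line terminator.
   At most size - 1 bytes are stored in buf, without the terminator, and
   buf is NUL-terminated; bytes that do not fit are dropped.
   Returns 1 if a line (possibly empty) was read, 0 at end of file. */
int read_any_line(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c, got = 0;

    assert(fp != NULL && buf != NULL);
    if (size == 0)
        return 0;
    while ((c = getc(fp)) != EOF) {
        got = 1;
        if (c == 0x0A)              /* LF ends the line */
            break;
        if (c == 0x0D) {            /* CR: swallow a following LF, if any */
            c = getc(fp);
            if (c != 0x0A && c != EOF)
                ungetc(c, fp);      /* lone CR: next byte starts a new line */
            break;
        }
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return got;
}
```

This only works because the set of formats is fixed up front; it would be of no help with the record-oriented or EBCDIC files mentioned elsewhere in the thread.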
 
