K&R2 2.3 : /n vs /r

Michael Mair

john said:
I always like to think that the '/' character in web URLs and the like is
a nice reminder that the internet is a UNIX invention.

Bill does have a point in the standards arena though; it's not what is
technically best nor what is most technically appropriate nor indeed what
has set the precedent that matters; it is politics and money that matters
(with the emphasis on the latter). This has been proven time and again with
the classic example being VHS versus Betamax.

*g* Should I bring up Video 2000 which certainly was better than VHS
and had some advantages over Betamax, too...
Nah, better not.

Cheers,
Michael
 
CBFalconer

Joona said:
Seeing as it's now become customary to use \ in all contents where a
divisor sign or an abbreviation for "or" is used, it's only logical
that / is used as an escape sequence marker in program source code.
Bill doesn't only control our computers, he controls our minds as
well.

This is ridiculous. \n has two distinct treatments. One is during
output to a terminal, when its function is to select the next line,
i.e. roll the figurative platen up one line. Nothing more. At the
same time \r returns the platen to the left with whatever
resounding crash is required. This may be very quiet on a display
with no hard copy. The \r was traditionally used to signify the
end of a single line of input, and the magic smoke in the terminal
usually converted it into a \n\r sequence.

However Unix, in its infinite wisdom, decided that all text lines
should be terminated by a single character when in internal
storage. They chose \n for this purpose, although there were more
logical characters available even then. So in this usage it is
used to mark end-of-line, sometimes named EOL. This is just a
convention, principally used in marking line ends on disk files.
Various systems use different conventions (including no or multiple
terminator characters), but they all have to convert to this one
simple one by the time the data hits the fan (which is the churning
CPU).

So make sure you separate the internal usage from the external
usage.

--
"I support the Red Sox and any team that beats the Yankees"
"Any baby snookums can be a Yankee fan, it takes real moral
fiber to be a Red Sox fan"
"I listened to Toronto come back from 3:0 in '42, I plan to
watch Boston come back from 3:0 in 04"
 
Jack Klein

No, you should not look at a specific implementation for that:
\n newline. On one implementation it may be 0x0a,
on another it may be 0x0d, on a third it may be 0x0d0a,
on another it may be something else.
\r return. On some implementations it is 0x0a,
on others it may be something else.
\t tab. Often it is 0x09 but other implementations may
use something else.

\b bell

No, '\b' is back space. '\a' is 'alert', frequently some sort of
audible ring or beep.
 
Neil Kurzman

Sometimes.

\n goes to the next line.
\r moves the cursor to the left.

HOWEVER....

Many terminals will take either and do both (CR LF); look in HyperTerminal, it
is an option.

Others require both, and in the right order: \n\r or \r\n.

So in the end it depends on what you are talking to.
 
CBFalconer

john said:
I always like to think that the '/' character in web URLs and the like is
a nice reminder that the internet is a UNIX invention.

Bill does have a point in the standards arena though; it's not what is
technically best nor what is most technically appropriate nor indeed what
has set the precedent that matters; it is politics and money that matters
(with the emphasis on the latter). This has been proven time and again with
the classic example being VHS versus Betamax.

I have a Sony Betamax recorder for sale.

 
Joona I Palaste

CBFalconer said:
This is ridiculous. \n has two distinct treatments. One is during
output to a terminal, when its function is to select the next line,
i.e. roll the figurative platen up one line. Nothing more. At the
same time \r returns the platen to the left with whatever
resounding crash is required. This may be very quiet on a display
with no hard copy. The \r was traditionally used to signify the
end of a single line of input, and the magic smoke in the terminal
usually converted it into a \n\r sequence.
However Unix, in its infinite wisdom, decided that all text lines
should be terminated by a single character when in internal
storage. They chose \n for this purpose, although there were more
logical characters available even then. So in this usage it is
used to mark end-of-line, sometimes named EOL. This is just a
convention, principally used in marking line ends on disk files.
Various systems use different conventions (including no or multiple
terminator characters), but they all have to convert to this one
simple one by the time the data hits the fan (which is the churning
CPU).
So make sure you separate the internal usage from the external
usage.

Could you please read my other reply in this thread?
 
Mabden

john blackburn said:
'\r' is simply a carriage return whereas '\n' is a carriage return followed
by line feed. Note that a carriage return is simply that, it moves the
cursor to the beginning of the current line and does not move on to the
next line.

It is here you have to be careful as UNIX text files are of a different
format from Windows text files and need conversion when moving from one o/s
to the other.

I wrote a conversion program for Unix files, as I use Windows, mainly.
This is compiled using MS VC++ compiler (but it's C! Honest).
==================
#include <stdio.h>

int main (void)
{
    char ch;

    // read in chars until EOF
    while ( (ch=getc(stdin)) != EOF )
        putc (ch, stdout);

    return (0);
}

==================
Note that I read in a character and write it back out. The compiler
emits a crlf when it reads in lf! Magic!
 
Keith Thompson

Mabden said:
I wrote a conversion program for Unix files, as I use Windows, mainly.
This is compiled using MS VC++ compiler (but it's C! Honest).
==================
#include <stdio.h>

int main (void)
{
    char ch;

    // read in chars until EOF
    while ( (ch=getc(stdin)) != EOF )
        putc (ch, stdout);

    return (0);
}

==================
Note that I read in a character and write it back out. The compiler
emits a crlf when it reads in lf! Magic!

getc returns an int, not a char; your code can't distinguish between
EOF and a valid character that maps to the same value.

Apart from that, your code probably works (on Windows) because stdio
recognizes a lone ASCII LF character on input as an end-of-line
marker, but writes a CR-LF pair on output. I'm not sure you can
depend on this behavior.

A more reliable way to convert between Windows and Unix formats is to
read and write the data. (I'm not sure how a lone LF in a Windows
text file should be interpreted.)
 
Mabden

Keith Thompson said:
getc returns an int, not a char; your code can't distinguish between
EOF and a valid character that maps to the same value.

Well, that would be a -1 on my system, so it would require a 255 in the
file. The text files I convert "would never have that character". ;-)

I have corrected the program, per your observation. Thank you.

Keith Thompson said:
Apart from that, your code probably works (on Windows) because stdio
recognizes a lone ASCII LF character on input as an end-of-line
marker, but writes a CR-LF pair on output. I'm not sure you can
depend on this behavior.

Depend on it how? The program is working fine (better now). If you mean
that it may change in a future compiler, I have comments in the code
that I excluded to explain what is going on.

Keith Thompson said:
A more reliable way to convert between Windows and Unix formats is to
read and write the data. (I'm not sure how a lone LF in a Windows
text file should be interpreted.)

You mean in a binary format? I'm not sure what you are saying here, as I
do read and write the data. My original program read in a lf and output
a crlf. This resulted in a crcrlf. That's when I realized it was
automagically adding the cr "for me".
 
Keith Thompson

Mabden said:
Depend on it how? The program is working fine (better now). If you mean
that it may change in a future compiler, I have comments in the code
that I excluded to explain what is going on.

The Windows text file format uses a CR-LF pair to mark the end of a
line. If this is based on some written standard, I don't know what
that standard says about a CR or LF character that's not part of a
CR-LF pair. Apparently the stdio implementation you're using treats a
lone LF as an end-of-line marker, and you're depending on this
behavior to "magically" read Unix-format text files that you've
presumably copied without conversion to a Windows system.

If there's a guarantee somewhere that Windows treats a lone LF
character as an end-of-line marker, that's ok. <OT>The fact that
Notepad doesn't do this leads me to suspect that there is no such
guarantee.</OT>

Mabden said:
You mean in a binary format? I'm not sure what you are saying here, as I
do read and write the data. My original program read in a lf and output
a crlf. This resulted in a crcrlf. That's when I realized it was
automagically adding the cr "for me".

Sorry, I left out some words and failed to proofread. I meant "to
read and write the data in binary mode". I believe it's safer to do
this and control the interpretation of the input data yourself, than
to depend on behavior that isn't guaranteed and could change
unpredictably.

Here's what I think is happening in your program. You read a
character at a time from a file opened in text mode. If the next two
bytes are a CR-LF pair (the Windows standard end-of-line marker),
getc() gives you '\n' (C's internal single-character representation of
an end-of-line marker). If the next byte is an LF character, getc()
gives you an LF character, which happens to be the same as C's '\n'
character. The result: your program happens to accept either CR-LF or
LF as an end-of-line marker. Given that getc() happens to work this
way, I *think* you can conclude that the rest of stdio will behave
consistently, for example that fgets() will treat a lone LF character
as an end-of-line marker, but I wouldn't bet large sums of money on
this conclusion.

This happens to work because of the relationship between the Windows
and Unix conventions for marking end-of-line and the value chosen by C
implementations for '\n' (it happens to be the same on both; it
needn't have been). Note that this doesn't work in the opposite
direction. If you copy a Windows text file to a Unix system without
conversion, then read it as a text file, you'll get an extra CR ('\r')
character at the end of each line.

Your program works. With enough research, you might even be able to
prove that it's guaranteed to work. The problem, in my view, is that
the chain of reasoning is too long for comfort.

The best way to convert Unix-format text files to Windows-format text
files is to treat them as binary files (or at least to treat the
"foreign" format as binary) and to do the conversion based on
knowledge of the actual format.
 
Flash Gordon

Mabden said:
Depend on it how? The program is working fine (better now). If you
mean that it may change in a future compiler, I have comments in the
code that I excluded to explain what is going on.


You mean in a binary format? I'm not sure what you are saying here, as
I do read and write the data. My original program read in a lf and
output a crlf. This resulted in a crcrlf. That's when I realized it
was automagically adding the cr "for me".

I think that the suggestion is to open input and output files as binary
files, then the only translations done will be those done by you. That
way the program will do what you've told it to regardless of what OS you
run it on.

The conversion of \n to CRLF that you see is because stdout is a text
stream and the implementation knows that text files on Windows use CRLF
for line terminations. However, since an LF on its own is not the
standard line termination for Windows, I don't think the implementation
has to report it to you as a \n; it could, for instance, treat it as an
error since it is invalid for a text stream.
 
Mabden

Keith Thompson said:
Your program works. With enough research, you might even be able to
prove that it's guaranteed to work. The problem, in my view, is that
the chain of reasoning is too long for comfort.

The best way to convert Unix-format text files to Windows-format text
files is to treat them as binary files (or at least to treat the
"foreign" format as binary) and to do the conversion based on
knowledge of the actual format.

I agree with you, that would be better.

For now, however, I think I'll keep my 3 line program and worry about
writing a better conversion routine when Windows command prompts stop
running .COM files properly. It's small, it works, and I only use it
once in a blue moon. The day it fails I'll spend the time to make a
better one. The criticality of performance is essentially zero.

Thanks for all the input.
 
Richard Bos

Mabden said:
Well, that would be a -1 on my system, so it would require a 255 in the
file. The text files I convert "would never have that character". ;-)

So you say. Perhaps you'll one day have to deal with a text file which
uses the character 'ÿ', and you'll scratch your head wondering what on
earth could have caused your program to terminate prematurely...

Mabden said:
I have corrected the program, per your observation. Thank you.

It is, after all, a very simple correction. It's doubly useful, btw: if
you ever find yourself using <ctype.h>, passing a negative character
value, _any_ negative character value except the one that happens to be
equal to EOF, causes undefined behaviour (and that one exception doesn't
behave as expected). Passing the original int you got from getc(), which
has the value of either an unsigned char or EOF, works perfectly.

Richard
 
