Don Knuth and the C language


jacob navia

On 03/05/2014 21:47, Keith Thompson wrote:
Even today, a Windows C program reading input in text mode treats
Control-Z (character 26) as an end-of-file indicator. (I just
tried it on Windows 7 with MSVC 2010 Express.)

Of course modern windows/MSDOS allows you to emit an EOF when using the
keyboard.

Unix uses ctrl-D for this purpose. When typing at the keyboard under
Unix you hit ctrl-d to generate an EOF. The Ctrl-z is the same in Windows.
 

Keith Thompson

Keith Thompson said:
Even today, a Windows C program reading input in text mode treats
Control-Z (character 26) as an end-of-file indicator. (I just
tried it on Windows 7 with MSVC 2010 Express.)

To clarify, I'm talking about reading from a text file with a Control-Z
character in the middle of it. A simple C program reading from such a
file via stdin does not read anything after the Control-Z.
 

Ben Bacarisse

jacob navia said:
On 03/05/2014 21:47, Keith Thompson wrote:

Of course modern windows/MSDOS allows you to emit an EOF when using
the keyboard.

It would be better if it allowed you to close the input. Emitting an
EOF means there must be something *in* the stream that marks the end,
and that's the very problem being discussed.

Unix uses ctrl-D for this purpose. When typing at the keyboard under
Unix you hit ctrl-d to generate an EOF. The Ctrl-z is the same in
Windows.

No, they are very different. A Unix terminal often has some character
(you can set it) that closes the stream. You can nominate which
character to use, you can turn the mechanism off if you want, and you
can use a quote character to send the nominated character to the stream.
Whatever the nominated character is, it never gets into the stream.
The C IO library never sees it (in fact, Unix read operations won't see
it either). In particular, there is no special meaning of Ctrl-D in an
input stream or file.
 

Malcolm McLean

No, they are very different. A Unix terminal often has some character
(you can set it) that closes the stream. You can nominate which
character to use, you can turn the mechanism off if you want, and you
can use a quote character to send the nominated character to the stream.
Whatever the nominated character is, it never gets into the stream.
The C IO library never sees it (in fact, Unix read operations won't see
it either). In particular, there is no special meaning of Ctrl-D in an
input stream or file.

Control-D, or 4, is EOT, "end of transmission". Control-C is "end of text".
 

Lew Pitcher

Control-D, or 4, is EOT, "end of transmission". Control-C is "end of
text".

In ASCII and derivative character sets, to be sure.
But, irrelevant to Ben's point.

a) In Unix, the keystroke that signals to the I/O system that the terminal
input device is at End-of-file is configurable; by convention, Unix users
set this value to ^D (ASCII EOT), but the input system does not restrict
the value to just ^D. Just as easily, the end user (or his programmatic
proxy) can set this "end-of-file" character to ^Z or ^X or even ^N.

b) In Unix, End-of-file is a *condition*, not a datum. Terminal devices are
treated specially by the underlying OS, in that the OS looks for a specific
input datum (such as ^D) in order to trigger the condition. The input datum
is discarded, and the OS reports "End-Of-File" to any programmatic input
requests. This differs from what CP/M (and versions of MSDOS) did: they
actually embedded a character datum in the data stream, and left it to the
input program to interpret that datum. ^Z to a CP/M text program was a byte
of 0x1A in the reading program's input stream, and it was up to the reading
program to interpret that value as "Oh, I've hit the end of valid data.
There may be more to read, but I really shouldn't."
 

Ben Bacarisse

Malcolm McLean said:
Control-D, or 4, is EOT, "end of transmission". Control-C is "end of
text".

I think you missed the point. Jacob might think the Ctrl-D (or some
other control character) has a particular meaning for Unix but it does
not. What ASCII chooses to call it is neither here nor there.
 

Keith Thompson

ralph said:
Not quite accurate. (I say "not quite" because one needs to be
specific on what *stream* they are talking about. <g>)

Neither UNIX, nor MS-DOS, nor Windows (and other modern operating
systems) has, or ever had, "something *in* the stream" to signal an
end-of-file. The exception is CP/M, which did provide a Ctrl-Z to
signal end-of-file.

However, *shells* often do have and honor such characters. However,
they are just as likely to ignore them as well. In some cases (as
noted) depending on how they are configured.

I don't think the behavior of a shell is relevant.

On Unix, if a program is reading from a keyboard, typing Ctrl-D usually
triggers an end-of-file condition, regardless of whether the program was
invoked from a shell or not. The shell is just another program.

And, as already noted, an equivalent character in a disk file is just
another character; the special treatment of Ctrl-D applies only when
reading from a terminal device (a "tty" in Unix parlance).

[...]
Except as noted - the MS CRT does "see" a Ctrl-Z in text mode.
However, there are multiple I/O routines to choose from in a Windows
environment so you often see different behavior in different utilities
in the *shell*. For example:
"type" will stop at a Ctrl-Z
"cat" will not, but will print a placeholder
and so on.

"cat" is not, as far as I know, a standard Windows program. I have it
on my system, but only as part of add-on POSIX support packages (Cygwin
and GOW), and it follows POSIX semantics.

A C program on Windows, reading from a disk file in text mode, will
trigger an end-of-file condition when it encounters a Ctrl-Z character.
I don't believe this has anything to do with the shell.

A sample program to test this behavior:

#include <stdio.h>
#include <assert.h>

int main(void) {
    FILE *f;
    int result;
    const char *const filename = "tmp.txt";
    int saw_A = 0;
    int saw_Ctrl_Z = 0;
    int saw_Z = 0;
    int c;

    f = fopen(filename, "w");
    assert(f != NULL);
    fprintf(f, "A\n");
    fprintf(f, "%c\n", 26); /* Ctrl-Z */
    fprintf(f, "Z\n");
    result = fclose(f);
    assert(result == 0);

    f = fopen(filename, "r");
    assert(f != NULL);
    while ((c = fgetc(f)) != EOF) {
        switch (c) {
        case 'A':
            saw_A = 1;
            break;
        case 'Z':
            saw_Z = 1;
            break;
        case 26:
            saw_Ctrl_Z = 1;
            break;
        }
    }
    result = fclose(f);
    assert(result == 0);

    remove(filename);

    printf("saw_A = %d\n", saw_A);
    printf("saw_Z = %d\n", saw_Z);
    printf("saw_Ctrl_Z = %d\n", saw_Ctrl_Z);
    return 0;
}

On Windows, compiled with MSVC 2010 Express, the output is:

saw_A = 1
saw_Z = 0
saw_Ctrl_Z = 0
 

glen herrmannsfeldt

I don't think the behavior of a shell is relevant.
On Unix, if a program is reading from a keyboard, typing Ctrl-D usually
triggers an end-of-file condition, regardless of whether the program was
invoked from a shell or not. The shell is just another program.

It is one that you don't think about so often, as long as it is
working, but, yes, I believe it is the tty (real or virtual)
device driver that does it.

The stty or tset commands change ioctl bits on the appropriate device.

Some other tty device characteristics also seem like they would belong
to the shell, such as my old favorite tostop, related to output from
background jobs.

-- glen
 

jacob navia

On 04/05/2014 02:11, Ben Bacarisse wrote:
I think you missed the point. Jacob might think the Ctrl-D (or some
other control character) has a particular meaning for Unix but it does
not. What ASCII chooses to call it is neither here nor there.

Ctrl-D (by default terminal settings) means:

Discard the ctrl D and set the input file as in EOF condition.

The same as in windows when using the keyboard.

The difference is that under Unix that is configurable and not under
Windows.

The Ctrl-Z is an EOF character when opening the file in TEXT mode.

I built a text file with embedded control-z characters on my mac and
copied it to my windows machine. Some editors would not read beyond the
ctrl-z because they opened the file in text mode.

Wedit, the editor of lcc-win, will read the whole file, ignoring the ctrl-z
directive and interpreting it as the character 26. Why?

Because Wedit opens the file in BINARY mode.
 

Ben Bacarisse

jacob navia said:
On 04/05/2014 02:11, Ben Bacarisse wrote:

Ctrl-D (by default terminal settings) means:

Discard the ctrl D and set the input file as in EOF condition.

Yes, no one disputes that -- it can mean something to the tty driver.
It has no meaning in input streams or in files.

The same as in windows when using the keyboard.

No. It's different in a very significant way.

I am sure you know exactly how it is different but you want to suggest
otherwise for some bizarre reason.
The difference is that under UNiX that is configurable and not under
windows.

And that it is not part of the input, so it cannot have any meaning in
files (or any input stream).

The Ctrl-Z is an EOF character when opening the file in TEXT mode.

Yes, on Windows. It's an actual character embedded in the input that
has an effect on the IO layer of programs reading the data as text.
Quite unlike Ctrl-D in Unix.
I built a text file with embedded control-z characters on my mac and
copied it to my windows machine. Some editors would not read beyond
the ctrl-z because they opened the file in text mode.

Wedit, the editor of lcc-win, will read the whole file, ignoring the
ctrl-z directive and interpreting it as the character 26. Why?

Because Wedit opens the file in BINARY mode.

Yes, of course. It would be staggering if it had the same meaning in
binary mode -- Windows would be unusable if that were the case.
 

Ben Bacarisse

ralph said:
Not quite accurate. (I say "not quite" because one needs to be
specific on what *stream* they are talking about. <g>)

Neither UNIX, nor MS-DOS, nor Windows (and other modern operating
systems) has, or ever had, "something *in* the stream" to signal an
end-of-file. The exception is CP/M, which did provide a Ctrl-Z to
signal end-of-file.

However, *shells* often do have and honor such characters. However,
they are just as likely to ignore them as well. In some cases (as
noted) depending on how they are configured.

No, the shell is not relevant here, neither in Unix nor in Windows.

In Microsoft's case the standard CRT has recognition for the Ctrl-Z by
some functions.

Yes. This is the key issue. An actual ASCII character, embedded in an
input stream, is taken to mark the end of that stream by some IO
functions. Obviously not in all cases -- Windows can open and read
arbitrary data or it would be quite useless -- but it has stuck with
honouring this inherited usage for text streams.

You say it is the C run-time library that is responsible, and you may
well be right. I certainly thought that was the case, but I don't know
the details well enough to say so with any real confidence. And of
course the exact consequences depend on what things use the C run-time.
EOF() does not reflect an encounter with any particular byte/char/int
or whatever. It is purely a return from some functions that check for
an end-of-file condition and used to signal that such a condition has
occurred.
Yes.


There you got it. <g>

I'm not sure why you took my previous remarks to mean that I did not get
it. Did it sound like I was suggesting that it was wired into Windows
at some level deeper than the C run-time? If so, you are right. It may have
been once, but I think modern Windows native file IO routines ignore
Ctrl-Z (I am no expert on Windows).

Except as noted - the MS CRT does "see" a Ctrl-Z in text mode.
However, there are multiple I/O routines to choose from in a Windows
environment

Sure. That should have been made clear. It's the behaviour of some IO
libraries (the C one is the topical one here) under Windows that is the
issue.

so you often see different behavior in different utilities
in the *shell*. For example:
"type" will stop at a Ctrl-Z
"cat" will not, but will print a placeholder
and so on.

I can't see how or why the shell has anything to do with it.
 

Richard Tobin

jacob navia said:
[In Unix] Ctrl-D (by default terminal settings) means:

Discard the ctrl D and set the input file as in EOF condition.

Only if there are no characters waiting to be passed to the process.
Otherwise it is discarded and any waiting characters are sent. Typing
"abc^D" does not result in an EOF condition.

What's more, the "EOF condition" only exists at the stdio level.
The underlying read() system call merely returns 0, and further
reads may return more data. And some stdio implementations (in
particular Linux) do not correctly implement the EOF condition.

-- Richard
 

Ben Bacarisse

jacob navia said:
[In Unix] Ctrl-D (by default terminal settings) means:

Discard the ctrl D and set the input file as in EOF condition.

Only if there are no characters waiting to be passed to the process.
Otherwise it is discarded and any waiting characters are sent. Typing
"abc^D" does not result in an EOF condition.

No but, often, it flushes the (pseudo-) tty input so the program gets
it, and a second one then causes the input to be closed. The rule is, I
think, that Ctrl-D (or whatever) closes the input only if there is no
pending input. Making it "push" any such input is an obvious extension
(but I said "often" because I don't know how widespread this behaviour
is).

<snip>
 

Keith Thompson

ralph said:
There actually is no such thing as an "EOF character". Although we all
tend to use that description. Most of the time this thinking does no
harm - but there are a few programming Gotchas if one assumes it is.

Agreed, since EOF is the name of a macro defined in <stdio.h>, whose
value is not a character value.

On the other hand, "EOF" can also be used simply as an abbreviation for
the phrase "End Of File", so it's common (and not entirely incorrect) to
refer to Ctrl-D or Ctrl-Z as an "EOF character". Such usage can be
particularly confusing in the context of C, as in this newsgroup.

The Linux documentation for "stty" wisely refers to "eof", not "EOF".
EOF is never a character; it is a special return value which expands to
an integer constant expression, with type int and a negative value,
that is returned by certain functions to indicate end-of-file was
reached by the last "read".

Microsoft declares it ...
#define EOF (-1)
But '-1' isn't required - it could be declared as anything - as long
as it is an int and negative. A "character" does not meet this
standard.
[I'll let Keith post chapter and verse. <g>]

Ok ... N1570 7.21.1p3:

The macros are
[...]
EOF
which expands to an integer constant expression, with type int and a
negative value, that is returned by several functions to indicate
end-of-file, that is, no more input from a stream;
[...]

I've never seen an implementation where EOF has a value other than -1,
and there are good reasons to use that specific value. For example, the
is*() and to*() functions in <ctype.h> accept either a value in the
range 0..UCHAR_MAX *or* the value of EOF; giving EOF a value adjacent to
that range makes the implementation slightly more straightforward.
That's because certain functions in the Microsoft CRT, in text mode,
do treat the Ctrl-Z character as signalling an end-of-file condition.

It is perhaps useful to note that one can/will get an EOF value
without the presence of a Ctrl-Z even in text mode.

Right. An end-of-file condition can be triggered either when a
particular character appears in the input (Ctrl-D or whatever it's
configured to when reading from a tty on Unix-like systems, Ctrl-Z from
the keyboard or in a file on Windows), *or* when there's no more input
to be read.
And that is the common work-around - if embedded Ctrl-Z's may be a
problem - use binary mode. No "re-configuration" is needed.

If you don't want Ctrl-Z's to be treated as an end-of-file marker, then
arguably you're not dealing with text files, so of course binary mode is
appropriate.
 

Keith Thompson

What's more, the "EOF condition" only exists at the stdio level.
The underlying read() system call merely returns 0, and further
reads may return more data. And some stdio implementations (in
particular Linux) do not correctly implement the EOF condition.

How is the implementation incorrect?
 

Keith Thompson

ralph said:
[...]
I don't think the behavior of a shell is relevant.

I can appreciate that. I was using the term "shell" very generically.
[...]

Well to be picky, it isn't "A C program on Windows", meaning 'any C',
but rather any of those compiled using the Microsoft stdio. Back in
the MS-DOS days with a zillion available compilers and self-written,
self-compiled 'standard' libraries - exceptions were often stumbled
across.

It could be a C program running on Windows, compiled with *any*
C compiler but using Microsoft's C runtime library.

In the Unix-like world (which I'm more familiar with), different
compilers typically use the C runtime library provided by the OS.
In the Windows world, as I understand it, the C runtime library
isn't as closely tied to the OS; some compilers might generate code
that uses the Microsoft CRT, others might provide their own.

[...]
All I meant to point out, and didn't well, is that recognition of
Ctrl-Z happens, and subsequent management occurs, somewhere between
the operating system's low-level I/O services and whatever application
you are using to access those services. A "shell" seemed a convenient
synonym? (Or apparently not? <g>)

Apparently not. On Unix-like systems, the word "shell" has a
very specific meaning, and it's not what you were referring to.
It's probably similar on Windows. As you wrote elsethread,
"run-time thingy" would have been more accurate.
<snipped>

And does so. So what's the point, other than to confirm what I said -
Ctrl-Z handling is facilitated by the Microsoft CRT?

The sample program wasn't necessarily a direct response to what you
wrote -- and not everything in a followup has to be a disagreement.

The point of the program was to demonstrate the behavior (that Ctrl-Z
in a file triggers an end-of-file condition) as clearly as possible,
since there has been some confusion between the response to a Ctrl-Z
(more precisely '\x1a') character in a file and the behavior of
Ctrl-Z in keyboard input.
 

jacob navia

On 04/05/2014 03:09, Keith Thompson wrote:
#include <stdio.h>
#include <assert.h>

int main(void) {
    FILE *f;
    int result;
    const char *const filename = "tmp.txt";
    int saw_A = 0;
    int saw_Ctrl_Z = 0;
    int saw_Z = 0;
    int c;

    f = fopen(filename, "w");
    assert(f != NULL);
    fprintf(f, "A\n");
    fprintf(f, "%c\n", 26); /* Ctrl-Z */
    fprintf(f, "Z\n");
    result = fclose(f);
    assert(result == 0);

    f = fopen(filename, "r");
    assert(f != NULL);
    while ((c = fgetc(f)) != EOF) {
        switch (c) {
        case 'A':
            saw_A = 1;
            break;
        case 'Z':
            saw_Z = 1;
            break;
        case 26:
            saw_Ctrl_Z = 1;
            break;
        }
    }
    result = fclose(f);
    assert(result == 0);

    remove(filename);

    printf("saw_A = %d\n", saw_A);
    printf("saw_Z = %d\n", saw_Z);
    printf("saw_Ctrl_Z = %d\n", saw_Ctrl_Z);
    return 0;
}

Hi kiki!

lcc64 ctrlz.c // Compile it with a good compiler
lcclnk64 ctrlz.obj // Link it with a good linker
ctrlz.exe // Execute
saw_A = 1
saw_Z = 1
saw_Ctrl_Z = 1

Conclusion:
You are using the wrong compiler kiki

Yours sincerely

jacob :)


P.S. I did not carry on the CP/M tradition. It was a sad decision but I
fear people that use lcc-win do not want CP/M backwards compatibility.

Of course all my efforts to build a reasonable C compiler will be
ignored. Off topic, shrewd businessman trying to sell his wares, etc.

Go ahead kiki
 

Kaz Kylheku

Ctrl-D (by default terminal settings) means:

Discard the ctrl D and set the input file as in EOF condition.

The same as in windows when using the keyboard.

The difference is that under UNiX that is configurable and not under
windows.

The TTY eof character, usually 4, is simply a command in
"canonical input mode" which means: "stop waiting for
characters and return immediately". The command itself
is consumed.

The eof effect comes as a consequence of this command
being issued at the start of an input line. No characters
have been accumulated, and so the read returns zero.
A zero length read resembles the end of a file.
 

Lew Pitcher

[snip]
"cat" is not, as far as I know, a standard Windows program. I have it
on my system, but only as part of add-on POSIX support packages (Cygwin
and GOW), and it follows POSIX semantics.

Interesting. In that case I'm no longer surprised it deliberately
treats the Ctrl-Z as an "unknown character" ('?').

In ASCII (and derivatives), 0x1a (aka ^Z) has been given the mnemonic "SUB",
with the explanation: "SUB is used in the place of a character that has
been found to be invalid or in error. SUB is intended to be introduced by
automatic means."

"Unknown character" would fit the intent of the SUB (0x1A ^Z) character.
 

BartC

jacob navia said:
On 04/05/2014 03:09, Keith Thompson wrote:
lcc64 ctrlz.c // Compile it with a good compiler
lcclnk64 ctrlz.obj // Link it with a good linker
ctrlz.exe // Execute
saw_A = 1
saw_Z = 1
saw_Ctrl_Z = 1

Conclusion:
You are using the wrong compiler kiki

Apart from MSVC which apparently gives 1,0,0, that is also the result with
gcc, PellesC, DMC, Clang and g++, all running on Windows.

(gcc under Linux gave 1,1,1.)

So lcc-win is the odd-one-out, in text mode.

(In binary mode, which I generally use, that gives 1,1,1 always.)
 
