Joe said:
Eric said:
Joe Wright wrote:
[...]
The descriptions of the ctype functions all take int values. I know
that char is converted to int in this case and that if char is
signed and negative, the result is probably a negative int.
... but they don't take "just any" int values; the
argument must be in a restricted range. 7.4, paragraph 1
(I don't have N869 so this is from ISO/IEC 9899:1999,
which is very nearly as good):
"In all cases the argument is an int, the value of
which shall be representable as an unsigned char or
shall equal the value of the macro EOF. If the
argument has any other value, the behavior is
undefined."
So what? Clearly -50 is not space or form feed, tab, etc. and the
expression (isspace(-50) == 0) is true.
isspace(-50) produces undefined behavior unless EOF==-50.
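To make the permitted range concrete, here is a minimal sketch
of the 7.4 paragraph 1 rule as a predicate. The helper name
ctype_arg_ok is hypothetical (nothing like it exists in the
Standard library); it only restates the rule quoted above.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if c is a value the <ctype.h> functions
   are required to handle, i.e. EOF or something representable as an
   unsigned char. */
static int ctype_arg_ok(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}
```

On an implementation where EOF is not -50, ctype_arg_ok(-50) is
zero, which is exactly the case where isspace(-50) is undefined.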
What is EOF for in this context?
EOF is a macro defined in <stdio.h>. Its expansion is
a negative integer constant (usually -1, although the Standard
does not require this). Various I/O functions return EOF to
indicate that something unusual (e.g., end-of-file or I/O
error) has happened.
The <ctype.h> functions accept EOF as an argument value
in addition to all the (non-negative) values of legitimate
characters, presumably because somebody once thought it would
be convenient to do things like
    int ch;
    /* skip leading spaces */
    while (isspace(ch = getchar()))
        ;
    if (ch == EOF)
        /* end-of-file or error */ ;
    else
        /* found a non-space character */ ;
If isspace() didn't accept EOF, you'd need to write
    int ch;
    /* skip leading spaces */
    while ((ch = getchar()) != EOF) {
        if (! isspace(ch))
            break;
    }
    if (ch == EOF)
        /* end-of-file or error */ ;
    else
        /* found a non-space character */ ;
Observe that this loop makes two tests per character instead
of the first form's single test. The original inventors of
<ctype.h> were, I guess, offended by the inefficiency of a
two-test loop and saw a way to define the functions so as to
eliminate half the testing. In hindsight, it looks like this
worship of The Little Tin God may have been misplaced -- but
the ANSI committee was asked to codify existing practice, and
they took the bitter with the sweet.
I'm not overly afraid of 'Undefined Behavior'.
You need not be "overly afraid," just "afraid enough."
isspace(c) is required to return 0 if c (now converted to int) is not
among the 'space' characters.
... and if c is among the permitted values.
Clearly EOF is not among the 'space' characters and so 0 must be the
result. Right?
Right.
No, you don't. EOF is a non-event (must return 0) and (c && 0xff) will
give you the index into a 256-byte array of answers to the questions.
I don't understand what you mean by "a non-event." You
are right that isspace(EOF) must return zero, but it does not
follow that isspace(negative_value_not_equal_to_EOF) must
return zero, or must even return at all.
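One way to see why "must even return at all" is in doubt is to
imagine a table-driven implementation. The following is only a
sketch, assuming EOF == -1 and the "C" locale; the names
space_table, init_space_table and my_isspace are hypothetical
and not taken from any real library.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical table: slot 0 is reserved for EOF (assumed to be -1),
   slots 1..UCHAR_MAX+1 hold the answers for codes 0..UCHAR_MAX. */
static unsigned char space_table[UCHAR_MAX + 2];

/* Mark the six white-space characters of the "C" locale. */
static void init_space_table(void)
{
    static const char spaces[] = " \t\n\v\f\r";
    size_t i;

    for (i = 0; spaces[i] != '\0'; i++)
        space_table[(unsigned char)spaces[i] + 1] = 1;
}

/* Sketch of a table-driven isspace. An argument of -1 reads slot 0 and
   yields zero. Any other negative argument, such as -50, would index
   before the start of the array: no bounds check, no defined result. */
static int my_isspace(int c)
{
    return space_table[c + 1];
}
```

Nothing obliges an implementation to work this way, but one that
does has no reason to do anything sensible with -50.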
Also, take another look at your `c && 0xff' (by which I
imagine you actually meant `c & 0xff'). Let's assume, as you
apparently have, a system with eight-bit characters and two's
complement arithmetic. Let's further assume EOF == -1, which
is the case for most implementations. Then `EOF & 0xff' gives
the value 255 -- but 255 is the code for some perfectly valid
character. If the current locale considers that character as
a space (or as an XXXX for the isXXXX() function), you have the
conflicting requirement that isXXXX(EOF) must return zero but
isXXXX(255) must return non-zero. If the function's first step
is to convert EOF to 255, the distinction can no longer be made.
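The collision is easy to check. This is a small sketch under the
same assumptions (eight-bit characters, two's complement,
EOF == -1); mask_index is a hypothetical name for the index the
masking scheme would compute.

```c
/* Hypothetical helper: the table index a "c & 0xff" scheme would use. */
static int mask_index(int c)
{
    return c & 0xff;
}

/* Nonzero if two arguments end up in the same table slot, so that the
   table cannot give them different answers. */
static int same_slot(int a, int b)
{
    return mask_index(a) == mask_index(b);
}
```

Under those assumptions same_slot(-1, 255) is nonzero, so one
table entry would have to serve both EOF and character 255.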
The Standard's requirement for non-negative values notwithstanding,
having checked the value against EOF and found that it is not EOF,
mask the value with 0xff and carry on. Surely.
That would work (on a two's complement eight-bit system).
It is possible that `(unsigned char)c' does exactly this
masking. However, the cast will work on all systems while
your mask will work on only some. Also, on systems where
char is already unsigned, the cast presumably compiles to
a no-op while your mask generates unnecessary code. All
in all, the cast wins on both portability and efficiency.
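For illustration, here is a minimal sketch of the cast in use on
a string of plain char. The helper count_leading_spaces is a
hypothetical name invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the white-space characters at the start
   of s. The cast converts each char, signed or not, to a value in
   0..UCHAR_MAX, which is always a permitted isspace argument. */
static size_t count_leading_spaces(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (isspace((unsigned char)s[n]))
        n++;
    return n;
}
```

The loop stops at the terminating '\0' because the null character
is not a space, and a char that happens to be negative on a
signed-char system is handled without undefined behavior.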
The Standard's mention of 'unsigned char' in this context is
unfortunate. We are talking about values of an int.
Again, I'm not sure what you mean. By "unfortunate" do you
mean "The Standard is wrong," or do you mean "It's too bad the
pre-Standard <ctype.h> worked this way so the Standard had to
adopt it?"
Note, too, that the int values in question are, specifically,
the value of EOF and the values of unsigned char.
I think it's a question of domains within a range. For 32-bit unsigned
integers, the range of values is 0..4,294,967,295. NULL defined as 0
is within the domain of pointers and EOF as -1 is outside the domain
of characters. Good choices.
For the third time, I fail to understand what you are trying
to say -- but this time, I can't even begin to puzzle it out.