Stephen Sprunk
I mentioned my argument for that conclusion earlier in this thread -
both you and Keith seem to have skipped over it without either
accepting it or explaining why you had rejected it. Here it is
again.
I'll admit that I didn't quite understand the relevance the first time;
you added some clarification this time (plus some of the other points
discussed have started to sink in), so now I think I get it.
... While, in general, conversion to a signed type of a value that is
too big to be represented by that type produces an implementation-
defined result or raises an implementation-defined signal, for this
particular conversion, I think that 7.21.2p3 implicitly prohibits the
signal, and requires that if 'c' is an unsigned char, then
(unsigned char)(int)c == c
If CHAR_MAX > INT_MAX, then 'char' must behave the same as 'unsigned
char'. Also, on such an implementation, there cannot be more valid
'int' values than there are 'char' values, and the inversion
requirement implies that there cannot be more char values than there
are valid 'int' values. This means that we must also have, if 'i' is
an int object containing a valid representation, that
(int)(char)i == i
This is indeed an interesting property of such systems, and one with
unexpectedly far-reaching implications.
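To make the two inversion requirements concrete, here's a minimal
self-check one could compile (purely illustrative; on an ordinary
implementation with UCHAR_MAX <= INT_MAX the first loop passes
trivially, and the CHAR_MAX > INT_MAX case is exactly where the
7.21.2p3 argument has to do the work):

#include <assert.h>
#include <limits.h>

int main(void)
{
    /* First inversion requirement: every unsigned char value must
       survive conversion to int and back. */
    for (unsigned char c = 0; ; c++) {
        assert((unsigned char)(int)c == c);
        if (c == UCHAR_MAX)
            break;
    }

#if CHAR_MAX > INT_MAX
    /* Second requirement, which only arises on the hypothetical kind
       of platform under discussion: every valid int value must
       survive conversion to char and back. */
    for (int i = INT_MIN; ; i++) {
        assert((int)(char)i == i);
        if (i == INT_MAX)
            break;
    }
#endif
    return 0;
}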
In particular, this applies when i==EOF, which is why comparing
fgetc() values with EOF is not sufficient to determine whether or not
the call was successful.
I'd wondered about that, since the usual excuse for fgetc() returning an
int is to allow for EOF, which is presented by most introductory texts
as being impossible to mistake for a valid character.
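For what it's worth, the usual defensive idiom still works even on such
an implementation; a sketch, using a hypothetical copy_bytes() helper:

#include <stdio.h>

/* On a platform where a real character can compare equal to EOF, the
   return value of fgetc() alone can't settle the question, but
   feof()/ferror() can. */
void copy_bytes(FILE *in)
{
    for (;;) {
        int c = fgetc(in);
        if (c == EOF && (feof(in) || ferror(in)))
            break;          /* genuine end of input (or an error) */
        putchar(c);         /* otherwise c is a valid character,
                               even if it compares equal to EOF */
    }
}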
Negative zero and positive zero have to
convert to the same unsigned char, which would make it impossible to
meet both inversion requirements, so it also follows that 'int' must
have a 2's complement representation on such a platform.
That only holds if plain char is unsigned, right?
It seems these seemingly-unrelated restrictions would not apply if plain
char were signed, which would be the (IMHO only) logical choice if
character literals were signed.
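To put numbers on the negative-zero point, purely hypothetically: with
CHAR_BIT == 16 and sizeof(int) == 1, an unsigned char has 65536
distinct values, but a ones' complement or sign-magnitude int has only
65535, since its two zero representations denote the same value. The
65536 character values then cannot all survive the round trip through
int, so the first inversion requirement fails; two's complement
supplies the full 65536 values and lets both requirements hold.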
You've already said that. What you haven't done so far is explain
why. I agree that there's a bit of conflict there, but 'insane' seems
extreme.
Perhaps "insane" was a bit strong, but I see no rational excuse for the
signedness of plain chars and character literals to differ; the two are
logically linked, and only C's definition of the latter as "int" even
allows such a bizarre case to exist in theory.
IMHO, that C++ implicitly requires the signedness of the two to match,
apparently without problems, is an argument in favor of adopting the
same rule in C. As long as the signedness matches, none of the problems
mentioned in this thread can come up and potentially break code that
was not written to account for this unlikely corner case.
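One easy way to see the difference in question, nothing exotic assumed:

#include <stdio.h>

int main(void)
{
    /* In C, 'a' has type int, so this typically prints sizeof(int),
       e.g. 4; compiled as C++, 'a' has type char and it prints 1. */
    printf("%zu\n", sizeof 'a');
    return 0;
}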
I'd forgotten that C++ had a different rule for the value of a
character literal than C does. The C rule is defined in terms of
conversion of a char object's value to type 'int', which obviously
would be inappropriate given that C++ gives character literals a type
of 'char'. Somehow I managed to miss that "obvious" conclusion, and I
didn't bother to check. Sorry.
I'm in no position to complain about that.
Every time I've brought up the odd behavior of implementations which
have UCHAR_MAX > INT_MAX, it's been argued that they either don't
exist or are so rare that we don't need to bother worrying about
them. Implementations where CHAR_MAX > INT_MAX must be even rarer
(since they are a subset of implementations where UCHAR_MAX >
INT_MAX), so I'm surprised (and a bit relieved) to see someone
actually arguing for the probable existence of such implementations.
I'd feel happier about it if someone could actually cite one, but I
don't remember anyone ever doing so.
I'm not arguing for the _probable_ existence of such systems as much as
admitting that I don't have enough experience with atypical systems to
have much idea what's really out there on the fringes, other than
various examples given here since I've been reading. The world had
pretty much standardized on twos-complement systems with flat, 32-bit
address spaces by the time I started using C; 64-bit systems were my
first real-world experience with having to think about variations in the
sizes of base types--and even then usually only pointers.
S