On 26-Jun-13 12:30, Keith Thompson wrote:
....
[Re: Character literals]
> IMHO, any code that relies on character literals being of type int is
> _already_ broken, and the current mis-definition does nothing but hide
> such bugs from the programmer, which is not worth preserving.
Given that the standard currently guarantees that they are of type
'int', I cannot agree with that assessment. However, that led me to
think about precisely what kind of code you're talking about. Obviously,
the value of (sizeof 'c') depends upon that assumption, but such
expressions would only occur in code deliberately contrived to
distinguish C from C++ according to the current rules.
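For example, this deliberately contrived test (just a sketch) prints
different values depending on which language compiled it:

    #include <stdio.h>
    int main(void)
    {
        /* In C, 'x' has type int, so this prints sizeof(int);
           in C++, 'x' has type char, so it prints 1. */
        printf("%zu\n", sizeof 'x');
        return 0;
    }
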
In C++, the type of a character literal can determine which operator
overload is selected, which is why C++ made this change. There was no
comparable reason in C, until the addition of _Generic() to C2011.
However, _Generic() is very new and not widely supported, and as a
result, not widely used. I doubt that you're talking about such code.
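For what it's worth, here is a sketch of how _Generic() exposes the
type of a character literal on a C2011 implementation:

    #include <stdio.h>
    int main(void)
    {
        /* Under the current C rules, 'a' has type int, so the
           first association is selected and "int" is printed. */
        puts(_Generic('a', int: "int", char: "char", default: "other"));
        return 0;
    }
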
The value of a character literal will be the same, whether it has type 'int'
or type 'char', so long as char is signed, or is unsigned with CHAR_MAX
<= INT_MAX. Only if CHAR_MAX > INT_MAX could it matter. Character
literals that currently have a negative value would instead have a
positive value greater than INT_MAX. Note that the requirement that
members of the basic execution character set have positive values
applies only after they have been stored in a char object; if char is
unsigned, it is not possible to violate that requirement, regardless of
the sign of the corresponding character literal. And, of course, that
requirement never applied to members of the extended execution character
set.
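To illustrate, on a typical implementation where char is 8 bits and
signed, a literal such as '\xff' already has a negative value:

    #include <stdio.h>
    int main(void)
    {
        /* With 8-bit signed char, '\xff' is an int with value -1.
           On a hypothetical implementation where CHAR_MAX > INT_MAX,
           a char-typed '\xff' would instead be a large positive
           value. */
        printf("%d\n", '\xff');
        return 0;
    }
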
You might expect the signedness of character literals to matter, but in
most contexts, expressions of type 'char' are subject to the integer
promotions, either as part of the default argument promotions, or as
part of the usual arithmetic conversions. On implementations where char
is signed, or is unsigned with CHAR_MAX <= INT_MAX, char promotes to
int, so it doesn't make a difference. Again, it is only on
implementations where CHAR_MAX > INT_MAX that values of type 'char' are
promoted to unsigned int, rather than int, with corresponding
implications for the
usual arithmetic conversions.
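A quick way to see the promotion in action, assuming an ordinary
implementation where CHAR_MAX <= INT_MAX:

    #include <stdio.h>
    int main(void)
    {
        char c = 'A';
        /* The usual arithmetic conversions promote c to int before
           the addition, so the result has the size of an int. */
        printf("%zu\n", sizeof(c + 0));
        return 0;
    }
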
Implementations where CHAR_MAX > INT_MAX are extremely rare, probably
non-existent, but if one does exist, it could be fully conforming to the
C standard. Such an implementation must have CHAR_MIN == 0, CHAR_BIT >=
16, and, unless an int has a lot of padding bits, sizeof(int) == 1.
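If you want to be warned should your code ever meet such a beast, a
compile-time check is straightforward:

    #include <limits.h>
    /* This fires only on an implementation where char does not
       promote to int; I know of no such implementation. */
    #if CHAR_MAX > INT_MAX
    #error "CHAR_MAX > INT_MAX: char promotes to unsigned int here"
    #endif
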
In general, when converting an unsigned type to a signed type "either
the result is implementation-defined or an implementation-defined signal
is raised". However, the behavior of fputc() is defined in terms of
conversion from int to unsigned char, and the behavior of fgetc() is
defined in terms of the reverse conversion. The requirement that data
written to a binary file using fputc() must be unchanged when read back
using fgetc() therefore implies that, on such an implementation, those
conversions must be inverses, and cannot raise a signal. This
guarantees, for example, that the definition
int letter = 'c';
will place the same value in letter, regardless of whether 'c' is an int
with a value <= INT_MAX, or a 'char' with a value > INT_MAX. If you
change 'letter' to be of type long, then you could detect a difference,
assuming that LONG_MAX > INT_MAX. Similar complications come up when
trying to write code which has different behavior if 'char' promotes to
'unsigned int'.
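To make that concrete, here is a sketch; the message could be printed
only under the hypothetical char-typed rule, and only on such an
exotic implementation:

    #include <limits.h>
    #include <stdio.h>
    int main(void)
    {
        long letter = 'c';
        /* Under the current rules, 'c' is an int, so letter can
           never exceed INT_MAX.  If 'c' were instead a char on an
           implementation with CHAR_MAX > INT_MAX (and with
           LONG_MAX > INT_MAX), this test could succeed. */
        if (letter > INT_MAX)
            puts("character literal wider than int");
        return 0;
    }
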
On such an implementation, the character written to a binary file by
fputc(EOF, stream), if successfully read back using fgetc(), will force
fgetc() to return EOF. Therefore, code using fgetc() must check feof()
and ferror(); it cannot rely upon comparison with EOF alone to
determine whether or not a character was successfully read.
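In other words, a fully portable read loop has to look something like
this (copy_chars is just an illustrative name):

    #include <stdio.h>

    void copy_chars(FILE *fp)
    {
        int c;
        for (;;) {
            c = fgetc(fp);
            /* EOF is conclusive only if feof() or ferror()
               confirms it; on an implementation where
               CHAR_MAX > INT_MAX, a real character can compare
               equal to EOF. */
            if (c == EOF && (feof(fp) || ferror(fp)))
                break;
            putchar(c);
        }
    }
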
Conclusion:
Code which depends upon the assumption that character literals have type
'int' (excluding cases involving sizeof or _Generic()) must do so in
one of two basic ways:
1. Assuming that character literals never have a value greater than INT_MAX.
2. Assuming that character literals are never promoted to 'unsigned int'.
Those assumptions could fail if character literals were changed to
'char', but only on implementations where CHAR_MAX > INT_MAX. Are you
sure you want to claim that all such code is "broken"?
It took me a long time to put together the analysis above, and the
relevant issues are subtle enough that I'm not sure the analysis is
entirely correct. In order to make such a change, the committee members
would have to make a similar analysis, and would have to discuss it
among themselves to determine whether the negative consequences of the
change are small enough to justify it. I agree with Keith - I am "not
sure whether the committee would be willing to take the time to
determine that".