The only example I can envision a problem with is a character literal
that today is negative. IIRC, the conversion to char is well-defined in
that case. However, if character literals were char, it'd have a large
positive value. Storing in a char would still be fine, but storing in
an int would require a possibly-problematic conversion.

Up to this point, you're saying almost exactly what I just said, just
with slightly different wording.

Is that the concern?
Almost. The conversion to 'int' would be guaranteed to produce exactly
the same value that the character literal would have had under the
current rules. In order to demonstrate the change, you have to convert
it to a signed type with a MAX value greater than INT_MAX. 'long int' is
likely to be such a type, but even intmax_t is not guaranteed to be such
a type.
If we also assume that int has no padding bits, that doesn't seem
completely unreasonable, actually. There are probably DSPs like that.
The problem could be solved by the implementation defining plain char to
be signed, which is the only sane choice if a character literal can be
negative (even as an int) in the first place.
You might consider it insane to have char be unsigned on such an
implementation, but such an implementation could be fully conforming. It
would violate some widely held expectations, but if it is fully
conforming, then those expectations were unjustified. Is there any
reason other than such expectations why you would consider such an
implementation insane?
If all character literals are non-negative and plain char is unsigned,
then there is no problem making them char on such a system. ...
For implementations where CHAR_MAX > INT_MAX, some character literals
must have a negative value, so that never applies.
... That is
what a C++ implementation would have to do, ...
Why? What provision of the C++ standard would force them to do that?
I'm still wrapping my head around your (excellent, BTW) analysis, but my
gut tells me that such code is indeed "broken". Perhaps if you could
come up with a concrete example of code that you think is non-broken but
would fail if character literals were char, rather than an abstract
argument?
To me, the single strongest argument against considering such code to be
broken is the fact that the C standard guarantees that character
literals have 'int' type. You haven't explained why you consider such
code broken. My best guess is that you think that choosing 'int' rather
than 'char' was so obviously and so seriously wrong, that programmers
have an obligation to write their code so that it will continue to work
if the committee ever gets around to correcting that mistake. I agree
with you that the C++ rules are more reasonable, but I don't think it's
likely that the C committee will ever change that feature of C, and it's
even less likely that it will do so any time soon. Therefore, that
doesn't seem like a reasonable argument to me - so I'd appreciate
knowing what your actual argument against such code is.
Summarizing what I said earlier, as far as I have been able to figure
out, the behavior of C code can change as a result of character literals
changing from 'int' to 'char' in only a few ways:
1. sizeof('character_literal'), which is a highly implausible construct;
its only plausible use that isn't redundant with #ifdef __cplusplus is
by someone who incorrectly expects it to be equivalent to sizeof(char);
and anyone who expected that should also have incorrectly expected it to
be a synonym for '1' - so why not just write '1' instead?
2. _Generic() is too new and too poorly supported for code using it to
be a significant problem at this time.
3. Obscure, and possibly mythical, implementations where CHAR_MAX > INT_MAX.
I consider the third item to be overwhelmingly the most significant of
the three issues, even though the unlikelihood of such implementations
makes it an insignificant issue in absolute terms. Ignoring the other
two issues (and assuming that LONG_MAX > INT_MAX), consider the
following code:
char c = 'C';
long literal = 'C';
long variable = c;
int offset = -13;
Under the current rules, on an implementation where CHAR_MAX <= INT_MAX:
c + offset and 'C' + offset both have the type 'int'. 'c', 'literal' and
'variable' are all guaranteed to be positive.
Under the current rules, on an implementation where CHAR_MAX > INT_MAX:
c+offset will have the type 'unsigned int', but 'C' + offset will have
the type 'int'. It is possible (though extremely implausible) that c >
INT_MAX. If it is, the same will be true of 'variable', but 'literal'
will be negative.
If character literals were changed to have the type 'char', on an
implementation where CHAR_MAX <= INT_MAX:
c+offset and 'C' + offset would both have the type 'int'. 'c',
'literal', and 'variable' would all be guaranteed to be positive.
If character literals were changed to have the type 'char', on an
implementation where CHAR_MAX > INT_MAX:
c + offset and 'C' + offset would both have the type 'unsigned int'. It
would be possible (though extremely implausible) that c > INT_MAX. If it
were, the same would be true for both 'literal' and 'variable'.
Therefore, the only implementations where code would have different
behavior if character literals were changed to 'char' are those where
CHAR_MAX > INT_MAX. And the only differences involve behavior that,
under the current rules, is different from the behavior for CHAR_MAX <=
INT_MAX. Therefore, the only code that will break if this rule is
changed is code that currently goes out of its way to correctly deal
with the possibility that CHAR_MAX > INT_MAX. I cannot see how you could
justify labeling code as 'broken' just because it correctly (in terms
of the current standard) deals with such an extremely obscure side issue.
On the other hand, the simplest way to deal with the possibility that
CHAR_MAX > INT_MAX is to insert casts:
if(c == (char)'C')
or
long literal = (char)'C';
Such code would not be affected by such a change. Only code that copes
with the possibility by other methods (such as #if CHAR_MAX > INT_MAX)
would be affected. I suppose you could call such code broken - but only
if you can justify insisting that programmers have an obligation to deal
with the possibility that the committee might change this rule.