VLA question


Keith Thompson

James Kuyper said:
Stephen Sprunk said:
On 27-Jun-13 07:28, James Kuyper wrote: [...]
The value of a character literal will be the same whether it has type
'int' or type 'char', so long as char is signed, or is unsigned with
CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX could it matter.
Character literals that currently have a negative value would instead
have a positive value greater than INT_MAX.

The only example I can envision a problem with is a character literal
that today is negative. IIRC, the conversion to char is well-defined in
that case. However, if character literals were char, it'd have a large
positive value. Storing in a char would still be fine, but storing in
an int would require a possibly-problematic conversion.

That doesn't seem right. A character constant that has a negative value
today (because plain char is a signed type) would still have a negative
value if character constants were of type char. It would just be a
negative value of type char.

Here's a demonstration:

#include <stdio.h>

int main(void) {
    printf("'\\x80' = %d\n", '\x80');
    return 0;
}

On a system with CHAR_BIT==8, with plain char being signed, I get:

'\x80' = -128

If '\x80' were of type char, its value would be (char)-128, which would
be promoted to int (because printf is variadic), giving the same result.

As I indicated above, the problem I described arises only on
implementations where CHAR_MAX > INT_MAX. If CHAR_BIT==8, then you can't
have been testing on such a system.

Sure, I was using a system with CHAR_BIT==8.

My point was that I think Stephen is mistaken in his statement that:

The only example I can envision a problem with is a character
literal that today is negative. IIRC, the conversion to char is
well-defined in that case. However, if character literals were
char, it'd have a large positive value.

I don't think that changing character constants from int to char
would cause the values of any such constants to change from negative
to positive, assuming the signedness of char isn't changed at the
same time.
 

Ben Bacarisse

Keith Thompson said:
Suppose CHAR_BIT==32, CHAR_MIN==-2**31, CHAR_MAX==2**31-1,
sizeof(int)==1, and int has one padding bit, so INT_MAX==2**30-1.

"The precision of an integer type is the number of bits it uses to
represent values, excluding any sign and padding bits."

In your example, the padding bit means that the precision of int is less
than that of signed char. But that's not allowed because

"[t]he rank of a signed integer type shall be greater than the rank of
any signed integer type with less precision"

and

"[t]he rank of long long int shall be greater than the rank of long
int, which shall be greater than the rank of int, which shall be
greater than the rank of short int, which shall be greater than the
rank of signed char".
 

Öö Tiib

Others have stated that C++ and C are different languages and I agree
with that. Though code written in the common subset is, technically,
C++, it is often non-idiomatic C++.
Many C-isms are considered bad style in C++.

Do not dump that common subset just because it is not idiomatic C++. I
agree with you about different languages and idioms, but that does not
matter when there is a situation that can be solved by using the common subset.

An example of a good property of the common subset of C and C++: it is
better, more modern C than C89 (by my taste and some others'; YMMV).

An example situation where that property helps: the code is required to be C,
and one of the (multiple) targets is required to be the Microsoft compiler.
The Microsoft C compiler is basically C89, but the Microsoft C++ compiler
compiles the "common subset" pretty well (a few warnings to silence).

See? The code is going to be far from idiomatic C++, but that does not
matter; it will be compiled with a C++ compiler regardless.
 

Öö Tiib

[...]
Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.

What's the correct cast to insert in

ptr->link = malloc(sizeof *ptr->link);

If we are preprocessing for a C++11 compiler, then something like:

ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);

If for C++03 or earlier, then it's a *lot* harder. Extensions may help.
For example, g++ had a 'typeof' extension that worked like 'decltype'.
Here's another:

#include <limits.h>
#if INT_MAX >= 1000000
int *array;
#else
long *array;
#endif
...

array = malloc(n * sizeof *array);

It is basically the same.

array = (decltype(array))malloc(n * sizeof *array);

I pretty much see that this cast is redundant and
annoying from the code's readability perspective.
Note that simply running the source through a C preprocessor and
then through the cast-adding tool produces a result that is not
portable. A portable rewrite of the final line would be

array =
#if INT_MAX >= 1000000
(int*)
#else
(long*)
#endif
malloc(n * sizeof *array);

Yes, Stroustrup proposed 'decltype' in 2002, but simple things like it
take decades, as always. Recent versions of gcc, msvc, clang, icc,
and even Borland's C++ Builder seem to recognize decltype, so it is
quite widely supported.
"Trivial" is relative, but this doesn't feel "trivial" to me.

There may be other, even trickier cases. When a tool fails to
understand a piece of code, that indicates that a human might
fail as well. So it may be better to simplify the code, not smartify the tools.
 

James Kuyper

Sure, I was using a system with CHAR_BIT==8.

My point was that I think Stephen is mistaken in his statement that:

The only example I can envision a problem with is a character
literal that today is negative. IIRC, the conversion to char is
well-defined in that case. However, if character literals were
char, it'd have a large positive value.

He was just paraphrasing what I said - if he was wrong, I was wrong.
I don't think that changing character constants from int to char
would cause the values of any such constants to change from negative
to positive, assuming the signedness of char isn't changed at the
same time.

6.4.4.4p10: "If an integer character constant contains a single
character or escape sequence, its value is the one that results when an
object with type char whose value is that of the single character or
escape sequence is converted to type int."
If a char object contains the representation of a value greater than
INT_MAX, when that value is converted to int, the result will be
negative. Therefore, under the current rules, the corresponding
character literals must have a negative value. If the rules were changed
to give them the type char, they would have the actual value of the
corresponding char objects, which would be greater than INT_MAX.
 

Eric Sosman

[...]
Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.

What's the correct cast to insert in

ptr->link = malloc(sizeof *ptr->link);

If we are preprocessing for a C++11 compiler, then something like:

ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);
[...]

Perhaps I've misunderstood (not for the first time, nor the
last): I thought the cast-adder would produce code that was valid
in the much-discussed "common subset" of C++ and C. Indeed, the
word "redundant" suggests that line, since only in C would the
cast be unnecessary. But then, the solution you offer is quite
clearly not part of any "common subset" ... So I'm afraid I just
haven't grasped your meaning.
 

Ian Collins

Eric said:
On 6/26/2013 6:17 PM, Öö Tiib wrote:
[...]
Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.

What's the correct cast to insert in

ptr->link = malloc(sizeof *ptr->link);

If we are preprocessing for a C++11 compiler, then something like:

ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);
[...]

Perhaps I've misunderstood (not for the first time, nor the
last): I thought the cast-adder would produce code that was valid
in the much-discussed "common subset" of C++ and C. Indeed, the
word "redundant" suggests that line, since only in C would the
cast be unnecessary. But then, the solution you offer is quite
clearly not part of any "common subset" ... So I'm afraid I just
haven't grasped your meaning.

#if !defined __cplusplus
# define decltype(X) void*
#endif

?
 

Eric Sosman

Eric said:
On 6/26/2013 6:17 PM, Öö Tiib wrote:
[...]
Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.

What's the correct cast to insert in

ptr->link = malloc(sizeof *ptr->link);

If we are preprocessing for a C++11 compiler, then something like:

ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);
[...]

Perhaps I've misunderstood (not for the first time, nor the
last): I thought the cast-adder would produce code that was valid
in the much-discussed "common subset" of C++ and C. Indeed, the
word "redundant" suggests that line, since only in C would the
cast be unnecessary. But then, the solution you offer is quite
clearly not part of any "common subset" ... So I'm afraid I just
haven't grasped your meaning.

#if !defined __cplusplus
# define decltype(X) void*
#endif

?

Well, maybe. Sort of makes a mockery of the "common subset"
notion, though. Take it a step further:

#ifdef __cplusplus
// Arbitrary C++ code here
#else
// Arbitrary C code here
#endif

... and now the "common subset" includes the entirety of both
languages -- the subset is the union!

(Long ago in this newsgroup -- I think ANSI C was still
a draft, with `noalias' -- somebody posted a "Hello, World!"
program. The unusual feature was that the program would
output "Hello, World!" when its source was fed to a C
implementation *or* to a Fortran implementation *or* to the
Unix shell. So: Is it useful to speak of the "common subset"
of C, Fortran, and sh?)
 

Öö Tiib

On 6/26/2013 6:17 PM, Öö Tiib wrote:
[...]
Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.

What's the correct cast to insert in

ptr->link = malloc(sizeof *ptr->link);

If we are preprocessing for a C++11 compiler, then something like:

ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);
[...]

Perhaps I've misunderstood (not for the first time, nor the
last): I thought the cast-adder would produce code that was valid
in the much-discussed "common subset" of C++ and C.

It can be. Conditional compilation and macros have to be used to deal with
things that are necessary for C++ but illegal in C, like 'extern "C"' or
'decltype' casts. I had something like this in mind:

ptr->link =
#if defined __cplusplus
(decltype(ptr->link))
#endif
malloc(sizeof *ptr->link);

However, now I see that Ian Collins suggested an even better way.
Indeed, the word "redundant" suggests that line, since only in C would
the cast be unnecessary. But then, the solution you offer is quite
clearly not part of any "common subset" ... So I'm afraid I just
haven't grasped your meaning.

"Redundant" because such casts lower readability of code (YMMV). When
something is needed that hurt readability then there are alternatives to
add them manually or to use a tool that adds them temporarily compiling
time. I often prefer latter.
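
To make Ian's macro suggestion concrete, here is a complete file that
compiles as both C and C++11. This is only a sketch; 'struct node' and
the variable names are invented for illustration:

#include <stdlib.h>

#ifndef __cplusplus
/* In C the cast collapses to a harmless (void *). */
#define decltype(X) void *
#endif

struct node {
    int value;
    struct node *link;
};

int main(void) {
    struct node *ptr = (decltype(ptr))malloc(sizeof *ptr);
    if (ptr == NULL)
        return 1;
    /* Under C++11, decltype supplies the exact pointer type. */
    ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);
    free(ptr->link);
    free(ptr);
    return 0;
}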
 

Keith Thompson

David Brown said:
I agree with this. Almost all the code I write is in the common subset
of C99 and C++. Certainly there is almost nothing that you can write
that is valid and well-written C99 that does not have identical
functionality in C++. The only feature of C99 that is arguably useful
(in my code) but invalid in C++ is designated integers.

Do you mean designated initializers?
 

Keith Thompson

James Kuyper said:
He was just paraphrasing what I said - if he was wrong, I was wrong.


6.4.4.4p10: "If an integer character constant contains a single
character or escape sequence, its value is the one that results when an
object with type char whose value is that of the single character or
escape sequence is converted to type int."
If a char object contains the representation of a value greater than
INT_MAX, when that value is converted to int, the result will be
negative. Therefore, under the current rules, the corresponding
character literals must have a negative value. If the rules were changed
to give them the type char, they would have the actual value of the
corresponding char objects, which would be greater than INT_MAX.

Got it, you're right.

Example:

CHAR_BIT == 16
sizeof(int) == 1
CHAR_MIN == 0
CHAR_MAX == 65535
INT_MIN == -32768
INT_MAX == +32767

'\xffff' is a character constant, which is of type int. Its value
is the result of converting (char)65535 to type int, which is likely
to be -1. If character constants were of type char, it would have
the positive value (char)65535 of type char.

Just to add to the frivolity, the result of the conversion
is implementation-defined. Throw one's-complement and
sign-and-magnitude into the mix, and things get fun.
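
Keith's hypothetical can't be run on common hardware, but the conversion
at its heart can be simulated with fixed-width types. A minimal sketch,
assuming a typical two's-complement implementation (and remembering that
the out-of-range conversion is implementation-defined, as noted above):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t c = 0xFFFF;     /* stands in for (char)65535 with CHAR_BIT==16 */
    int16_t i = (int16_t)c;  /* out-of-range: implementation-defined result */
    printf("%d\n", i);       /* prints -1 on typical two's-complement systems */
    return 0;
}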
 

Malcolm McLean

On Thursday, 27 June 2013 17:17:28 UTC+3, Eric Sosman wrote:

There may be other, even trickier cases. When a tool fails to
understand a piece of code, that indicates that a human might
fail as well. So it may be better to simplify the code, not smartify the tools.

If a tool doesn't always work, then it can be extremely irritating.
You might have thousands of C files to process. It fails on just one of them,
but that means you've got to get a programmer to fix up the code manually,
then document that, for that instance, the tool-chain fails. That adds a lot
of cost, and means that errors can much more easily slip in.
 

Öö Tiib

If a tool doesn't always work, then it can be extremely irritating.

I have not seen any tools that always work. Some are more robust, some
less, but godly robustness is missing. That is because there are always
defects in code, compilers, linkers, standard libraries, operating systems,
and the hardware on which all of that runs.
You might have thousands of C files to process. It fails on just one of them,
but that means you've got to get a programmer to fix up the code manually,
then document that, for that instance, the tool-chain fails. That adds a lot
of cost, and means that errors can much more easily slip in.

When we are talking about repositories of thousands of files, then we are
likely talking about efforts of thousands of man-days, and so presumably
about teams of tens of developers. A mere build may take several minutes.
Therefore the build (involving compilers, code generators, static
analyzers, running unit tests, etc.) is best done by a continuous
integration system (or farm) to save each developer the time of
building it.

A tool does not suddenly start to fail out of the blue. Either someone
modified the file on which it fails, or modified the tool, or modified
something on which one or the other depends. If integration is continuous,
then it is very clear who committed the breaking change. Just back out
that breaking change-set, notify the one who committed it, and let him
deal with it. If he can't, then he will find someone who can. We are
software developers, so dodging defects is our everyday bread and butter.
 

Stephen Sprunk

That doesn't seem right. A character constant that has a negative
value today (because plain char is a signed type) would still have a
negative value if character constants were of type char. It would
just be a negative value of type char.

With unsigned plain char, a character literal with a negative int value
today would have a large positive value if its type changed to char,
assuming the implementation didn't change to signed plain char at the
same time.

CHAR_MAX > INT_MAX with signed plain char requires int to have padding
bits and less range than char, which AIUI isn't allowed.

S
 

Malcolm McLean

I have not seen any tools that always work. Some are more robust, some
less, but godly robustness is missing.
The computer can break.
But most C compilers will always compile valid C code, most text editors will
always show the real contents of files, most compressors will always
archive correctly. The bugs are elsewhere. If you use the Unix philosophy
of "each tool does one thing" then those tools tend to be stable and
bug free. If you use the alternative philosophy of the "integrated
system" then you're constantly adding features, and often things
break. (However, integrated systems are often easier to use; it's not
all one way.)
 

Keith Thompson

Stephen Sprunk said:
With unsigned plain char, a character literal with a negative int value
today would have a large positive value if its type changed to char,
assuming the implementation didn't change to signed plain char at the
same time.

Right -- but that's only an issue when CHAR_BIT >= 16, which is the
context I missed in my previous response. As I also noted elsethread,
the conversion from char to int, where char is an unsigned type and the
value doesn't fit, is implementation-defined; the result is *probably*
negative, but it's not guaranteed.
CHAR_MAX > INT_MAX with signed plain char requires int to have padding
bits and less range than char, which AIUI isn't allowed.

I think that's right.
 

Öö Tiib

The computer can break.
But most C compilers will always compile valid C code, most text editors will
always show the real contents of files, most compressors will always
archive correctly. The bugs are elsewhere.

That is what I said. When things fail, let the author of the situation
fix it. In 99% of cases he messed something up; in 1% of cases he has
discovered a bug in the compiler or the like.
If you use the Unix philosophy of "each tool does one thing" then those
tools tend to be stable and bug free.

I like that philosophy. I described a simple tool that can preprocess the
source before the compiler (add casts to mallocs, maybe add extern "C" to
headers). As a result, the majority of good C code can be compiled with
both a C and a C++ compiler. There will be cases that still can't, but
then it is better to let a human adjust them instead of making the tool
smarter and more error-prone.
If you use the alternative philosophy of the "integrated
system" then you're constantly adding features, and often things
break. (However, integrated systems are often easier to use; it's not
all one way.)

That is, AFAIK, still the Unix philosophy. We pipe together those simple
tools to get more sophisticated results. If we just used each of those
simple tools alone by hand, then Unix would be annoying to use. The set
is more likely to fail, since there are more tools and details, but in
each case it is usually simple to trace the problem to some simple step
in that complex chain.
 

James Kuyper

The only example I can envision a problem with is a character literal
that today is negative. IIRC, the conversion to char is well-defined in
that case. However, if character literals were char, it'd have a large
positive value. Storing in a char would still be fine, but storing in

Up to this point, you're saying almost exactly what I just said, just
with slightly different wording.
an int would require a possibly-problematic conversion.

Is that the concern?

Almost. The conversion to 'int' would be guaranteed to produce exactly
the same value that the character literal would have had under the
current rules. In order to demonstrate the change, you have to convert
it to a signed type with a MAX value greater than INT_MAX. 'long int' is
likely to be such a type, but even intmax_t is not guaranteed to be such
a type.
If we also assume that int has no padding bits, that doesn't seem
completely unreasonable, actually. There are probably DSPs like that.

The problem could be solved by the implementation defining plain char to
be signed, which is the only sane choice if a character literal can be
negative (even as an int) in the first place.

You might consider it insane to have char be unsigned on such an
implementation, but such an implementation could be fully conforming. It
would violate some widely held expectations, but if it is fully
conforming, then those expectations were unjustified. Is there any
reason other than such expectations why you would consider such an
implementation insane?
If all character literals are non-negative and plain char is unsigned,
then there is no problem making them char on such a system. ...

For implementations where CHAR_MAX > INT_MAX, some character literals
must have a negative value, so that never applies.
... That is
what a C++ implementation would have to do, ...

Why? What provision of the C++ standard would force them to do that?
I'm still wrapping my head around your (excellent, BTW) analysis, but my
gut tells me that such code is indeed "broken". Perhaps if you could
come up with a concrete example of code that you think is non-broken but
would fail if character literals were char, rather than an abstract
argument?

To me, the single strongest argument against considering such code to be
broken is the fact that the C standard guarantees that character
literals have 'int' type. You haven't explained why you consider such
code broken. My best guess is that you think that choosing 'int' rather
than 'char' was so obviously and so seriously wrong, that programmers
have an obligation to write their code so that it will continue to work
if the committee ever gets around to correcting that mistake. I agree
with you that the C++ rules are more reasonable, but I don't think it's
likely that the C committee will ever change that feature of C, and it's
even less likely that it will do so any time soon. Therefore, that
doesn't seem like a reasonable argument to me - so I'd appreciate
knowing what your actual argument against such code is.

Summarizing what I said earlier, as far as I have been able to figure
out, the behavior of C code can change as a result of character literals
changing from 'int' to 'char' in only a few ways:
1. sizeof('character_literal'), which is a highly implausible construct;
its only plausible use that isn't redundant with #ifdef __cplusplus is
by someone who incorrectly expects it to be equivalent to sizeof(char);
and if someone did expect that, they should also have incorrectly
expected it to be a synonym for '1'; so why not write '1' instead?
2. _Generic() is too new and too poorly supported for code using it to
be a significant problem at this time. (Both of these are illustrated in
the sketch after this list.)
3. Obscure, and possibly mythical, implementations where CHAR_MAX > INT_MAX.
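
A minimal C11 sketch of the first two items (the output described in the
comments assumes a typical implementation where sizeof(int) > 1):

#include <stdio.h>

int main(void) {
    /* Item 1: 'a' has type int in C, so this prints sizeof(int),
       not the 1 that the C++ rule would give. */
    printf("sizeof 'a' = %zu\n", sizeof 'a');

    /* Item 2: _Generic (C11) observes the type directly; this selects
       the int branch today, and would select char if the rule changed. */
    printf("'a' has type %s\n",
           _Generic('a', char: "char", int: "int", default: "other"));
    return 0;
}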

I consider the third item to be overwhelmingly the most significant of
the three issues, even though the unlikelihood of such implementations
makes it an insignificant issue in absolute terms. Ignoring the other
two issues (and assuming that LONG_MAX > INT_MAX), consider the
following code:


char c = 'C';
long literal = 'C';
long variable = c;
int offset = -13;

Under the current rules, on an implementation where CHAR_MAX <= INT_MAX:
c+offset and 'C'+ offset both have the type 'int'. 'c', 'literal' and
'variable' are all guaranteed to be positive.

Under the current rules, on an implementation where CHAR_MAX > INT_MAX:
c+offset will have the type 'unsigned int', but 'C' + offset will have
the type 'int'. It is possible (though extremely implausible) that c >
INT_MAX. If it is, the same will be true of 'variable', but 'literal'
will be negative.

If character literals were changed to have the type 'char', on an
implementation where CHAR_MAX <= INT_MAX:
c+offset and 'C' + offset would both have the type 'int'. 'c',
'literal', and 'variable' would all be guaranteed to be positive.

If character literals were changed to have the type 'char', on an
implementation where CHAR_MAX > INT_MAX:
c + offset and 'C' + offset would both have the type 'unsigned int'. It
would be possible (though extremely implausible) that c > INT_MAX. If it
were, the same would be true for both 'literal' and 'variable'.

Therefore, the only implementations where code would have different
behavior if character literals were changed to 'char' are those where
CHAR_MAX > INT_MAX. And the only differences involve behavior that,
under the current rules, is different from the behavior for CHAR_MAX <=
INT_MAX. Therefore, the only code that will break if this rule is
changed is code that currently goes out of its way to correctly deal
with the possibility that CHAR_MAX > INT_MAX. I cannot see how you could
justify labeling code as 'broken' just because it correctly (in terms
of the current standard) deals with such an extremely obscure side issue.

On the other hand, the simplest way to deal with the possibility that
CHAR_MAX > INT_MAX is to insert casts:

if(c == (char)'C')

or
long literal = (char)'C';

Such code would not be affected by such a change. Only code that copes
with the possibility by other methods (such as #if CHAR_MAX > INT_MAX)
would be affected. I suppose you could call such code broken - but only
if you can justify insisting that programmers have an obligation to deal
with the possibility that the committee might change this rule.
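
For concreteness, the cast idiom in a complete program (a minimal sketch):

#include <stdio.h>

int main(void) {
    char c = 'C';
    /* The casts pin the constants to type char, so this code behaves
       the same whether 'C' has type int (current rules) or type char
       (the proposed change), even where CHAR_MAX > INT_MAX. */
    if (c == (char)'C')
        puts("match");
    long literal = (char)'C';
    printf("literal = %ld\n", literal);
    return 0;
}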
 

Stephen Sprunk

Up to this point, you're saying almost exactly what I just said,
just with slightly different wording.

When I get confused, I tend to dump my current state in hopes that
someone can point out an error that led to said confusion.
Almost. The conversion to 'int' would be guaranteed to produce
exactly the same value that the character literal would have had
under the current rules.

Why? I thought that, while converting a negative value to unsigned was
well-defined, converting an out-of-range unsigned value to signed was not.
You might consider it insane to have char be unsigned on such a
implementation, but such an implementation could be fully conforming.
It would violate some widely held expectations, but if it is fully
conforming, then those expectations were unjustified. Is there any
reason other than such expectations why you would consider such an
implementation insane?

I consider it insane to have an unsigned plain char when character
literals can be negative.
For implementations where CHAR_MAX > INT_MAX, some character
literals must have a negative value, so that never applies.

Granted, one can create arbitrary character literals, but doing so
ventures into "contrived" territory. I only mean to include real
characters, which I think means ones in the source or execution
character sets.
Why? What provision of the C++ standard would force them to do that?

In C++, character literals have type char, so if char is unsigned, then
by definition no character literal can be negative.
To me, the single strongest argument against considering such code to
be broken is the fact that the C standard guarantees that character
literals have 'int' type. You haven't explained why you consider
such code broken. My best guess is that you think that choosing 'int'
rather than 'char' was so obviously and so seriously wrong,

Well, I'm not sure how much of a "choice" that really was, rather than
an accident of C's evolution from an untyped language and everything
becoming an "int" by default.
that programmers have an obligation to write their code so that it
will continue to work if the committee ever gets around to
correcting that mistake.

I cannot recall having seen any code that would break if that mistake
were corrected, and I'm reasonably certain none of mine would because I
thought character literals _were_ of type char until many years after
first learning C--and I still code as if it were true because I want my
code to still work if compiled as C++.
...
3. Obscure, and possibly mythical, implementations where CHAR_MAX >
INT_MAX.

I consider the third item to be overwhelmingly the most significant
of the three issues, even though the unlikelihood of such
implementations makes it an insignificant issue in absolute terms.

We know there are systems where sizeof(int)==1; can we really assume
that plain char is signed on all such implementations, which is the only
way for them to _avoid_ CHAR_MAX > INT_MAX?
Therefore, the only implementations where code would have different
behavior if character literals were changed to 'char' are those
where CHAR_MAX > INT_MAX. And the only differences involve behavior
that, under the current rules, is different from the behavior for
CHAR_MAX <= INT_MAX. Therefore, the only code that will break if this
rule is changed is code that currently goes out of its way to
correctly deal with the possibility that CHAR_MAX > INT_MAX. I cannot
see how you could justify labeling code as 'broken', just because it
correctly (in terms of the current standard) deals with such an
extremely obscure side issue.

My gut says more code would break on systems where CHAR_MAX > INT_MAX
than would break if character literals were chars; few programmers would
think about accommodating the former or even realize it could exist,
whereas most either mistakenly think the latter is true or are actually
coding for the C-like subset of C++ where it _is_ true.

S
 

James Kuyper

Why? I thought that, while converting a negative value to unsigned was
well-defined, converting an out-of-range unsigned value to signed was not.

I mentioned my argument for that conclusion earlier in this thread -
both you and Keith seem to have skipped over it without either accepting
it or explaining why you had rejected it. Here it is again.

The standard defines the behavior of fputc() in terms of the conversion
of int to unsigned char (7.21.7.3p2). It defines the behavior of fgetc()
in terms of the conversion from unsigned char to int (7.21.7.1p2). All
other I/O is defined in terms of the behavior of those two functions -
the other I/O functions don't have to actually call those functions, but
they are required to behave as if they did. It also requires that "Data
read in from a binary stream shall compare equal to the data that were
earlier written out to that stream, under the same implementation."
(7.21.2p3). While, in general, conversion to signed type of a value that
is too big to be represented by that type produces an
implementation-defined result or raises an implementation-defined
signal, for this particular conversion, I think that 7.21.2p3 implicitly
prohibits the signal, and requires that if 'c' is an unsigned char, then

(unsigned char)(int)c == c

If CHAR_MAX > INT_MAX, then 'char' must behave the same as 'unsigned
char'. Also, on such an implementation, there cannot be more valid 'int'
values than there are 'char' values, and the inversion requirement
implies that there cannot be more char values than there are valid 'int'
values. This means that we must also have, if 'i' is an int object
containing a valid representation, that

(int)(char)i == i

In particular, this applies when i==EOF, which is why comparing fgetc()
values with EOF is not sufficient to determine whether or not the call
was successful. Negative zero and positive zero have to convert to the
same unsigned char, which would make it impossible to meet both
inversion requirements, so it also follows that 'int' must have a 2's
complement representation on such a platform.
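
On ordinary hardware, where UCHAR_MAX < INT_MAX, the round trip relied on
above can be confirmed directly; the interesting CHAR_MAX > INT_MAX case
can't be exercised on common platforms. A minimal sketch:

#include <stdio.h>
#include <limits.h>

int main(void) {
    FILE *f = tmpfile();
    if (f == NULL)
        return 1;
    for (int v = 0; v <= UCHAR_MAX; v++)
        fputc(v, f);              /* defined via int -> unsigned char */
    rewind(f);
    for (int v = 0; v <= UCHAR_MAX; v++) {
        int back = fgetc(f);      /* defined via unsigned char -> int */
        if (back != v)
            printf("mismatch: wrote %d, read %d\n", v, back);
    }
    fclose(f);
    return 0;
}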

[...]
I consider it insane to have an unsigned plain char when character
literals can be negative.

You've already said that. What you haven't done so far is explained why.
I agree that there's a bit of conflict there, but 'insane' seems extreme.

[...]
Granted, one can create arbitrary character literals, but doing so
ventures into "contrived" territory. I only mean to include real
characters, which I think means ones in the source or execution
character sets.

There's no requirement that any member, not even of the basic execution
character set, have an encoding that is <= INT_MAX. It's pretty
unlikely for members of the basic execution set, but it seems a very
likely thing for members of the extended character set that are
represented by UCNs for code points that are greater than INT_MAX. All
such characters must have a character literal that is negative if
CHAR_MAX > INT_MAX.
In C++, character literals have type char, so if char is unsigned, then
by definition no character literal can be negative.

I'd forgotten that C++ had a different rule for the value of a character
literal than C does. The C rule is defined in terms of conversion of a
char object's value to type 'int', which obviously would be
inappropriate given that C++ gives character literals a type of 'char'.
Somehow I managed to miss that "obvious" conclusion, and I didn't bother
to check. Sorry.

[...]
I cannot recall having seen any code that would break if that mistake
were corrected, and I'm reasonably certain none of mine would because I

The essence of what I've been saying is that it's fairly difficult to
write such code, except by relying upon sizeof() or _Generic(), and
almost impossible to do so accidentally.
We know there are systems where sizeof(int)==1; can we really assume
that plain char is signed on all such implementations, which is the only
way for them to _avoid_ CHAR_MAX > INT_MAX?

Every time I've brought up the odd behavior of implementations which
have UCHAR_MAX > INT_MAX, it's been argued that they either don't exist
or are so rare that we don't need to bother worrying about them.
Implementations where CHAR_MAX>INT_MAX must be even rarer (since they
are a subset of implementations where UCHAR_MAX > INT_MAX), so I'm
surprised (and a bit relieved) to see someone actually arguing for the
probable existence of such implementations. I'd feel happier about it if
someone could actually cite one, but I don't remember anyone ever doing so.
My gut says more code would break on systems where CHAR_MAX > INT_MAX
than would break if character literals were chars;

Well, that follows from what I said above. Almost all breakage that
would occur if character literals were changed to char would occur on
platforms where CHAR_MAX > INT_MAX, and would therefore count for both
categories. However, I'll go farther, and say that it's not only "more
code", but "a lot more code".
 
