Converting CHAR_MAX+1 to char?

  • Thread starter Johannes Schaub (litb)

Johannes Schaub (litb)

What happens in this case?

char c = CHAR_MAX + 1;

Not sure what happens. The spec only defines behavior for signed and
unsigned integral types it seems. Is the behavior undefined?
 

Balog Pal

Johannes Schaub (litb) said:
What happens in this case?

char c = CHAR_MAX + 1;

Not sure what happens. The spec only defines behavior for signed and
unsigned integral types it seems. Is the behavior undefined?

As I read the standard, unless CHAR_MAX == INT_MAX, the result is that c is
assigned an implementation-defined value.

The stuff on the right is promoted to int. Then addition happens. If that
overflows, it is UB. If not, the result is converted to char.
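To make the promotion step concrete, here is a minimal sketch, assuming a typical platform where char is narrower than int (so CHAR_MAX + 1 cannot overflow). The helper name `wide_value` is made up for illustration and does not come from the thread.

```cpp
#include <climits>
#include <type_traits>

// Assumption: CHAR_MAX < INT_MAX, as on common desktop platforms.
static_assert(CHAR_MAX < INT_MAX, "sketch assumes char is narrower than int");

// Both operands already have type int here, so the sum is an int as well.
static_assert(std::is_same<decltype(CHAR_MAX + 1), int>::value,
              "CHAR_MAX + 1 is computed as an int");

// Hypothetical helper: the value of the right-hand side before it is
// converted to char.
inline int wide_value() { return CHAR_MAX + 1; }
```

The conversion to char only happens afterwards, when the int result initializes `c`.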
 

Johannes Schaub (litb)

Balog said:
As I read the standard, unless CHAR_MAX == INT_MAX, the result is that c
is assigned an implementation-defined value.

But I can't find it in the spec. :( It doesn't say what happens if the value
doesn't fit. :( Can you point me to the parts where it says that about char?
 

Kai-Uwe Bux

Paavo said:
Plain char can be signed or unsigned, as defined by the implementation.
If it is signed, then I think 4.7/3 holds: "If the destination type is
signed, the value is unchanged if it can be represented in the
destination type (and bit-field width); otherwise, the value is
implementation-defined."

If char is signed and its size is less than int, then CHAR_MAX+1 can be
represented as an int, but obviously not as a char, thus assigning this
back to a char is implementation-defined. If char is signed and has the
same size as int, then CHAR_MAX+1 would result in integer overflow,
invoking undefined behavior.

I think it's just a tad more twisted. CHAR_MAX + 1 looks like a constant
expression, and then the provision from [5/5] should kick in, which says the
program is ill-formed. (However, the standard acknowledges in a note that
most existing implementations are non-conforming in this regard and ignore
integer overflow.)
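As a rough illustration of the overflow condition being discussed, the following sketch checks whether an int addition would overflow without performing it. It assumes a C++11 or later compiler; `add_overflows` is a hypothetical helper, not something from the standard or the thread.

```cpp
#include <climits>

// Hypothetical helper: true if a + b would overflow int. Only comparisons
// and in-range subtractions are evaluated, so the check itself cannot overflow.
constexpr bool add_overflows(int a, int b) {
    return (b > 0) ? (a > INT_MAX - b)
                   : (a < INT_MIN - b);
}

// Where char is narrower than int this holds, and CHAR_MAX + 1 is an
// ordinary constant expression.
static_assert(!add_overflows(CHAR_MAX, 1), "CHAR_MAX + 1 fits in int here");

// The problematic case: the same addition starting from INT_MAX.
static_assert(add_overflows(INT_MAX, 1), "INT_MAX + 1 would not fit in int");
```

On a platform with CHAR_MAX == INT_MAX, the first static_assert would fail, which is exactly the case where the constant expression itself is the problem.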

[...]


Best

Kai-Uwe Bux
 

Johannes Schaub (litb)

Paavo said:
Plain char can be signed or unsigned, as defined by the implementation.
If it is signed, then I think 4.7/3 holds: "If the destination type is
signed, the value is unchanged if it can be represented in the
destination type (and bit-field width); otherwise, the value is
implementation-defined."

But where does it say that char can be signed? I only find text where it
says it could hold negative values. It does not seem to say that it can be
included in the list of signed or unsigned integer types by an
implementation :(
 

Johannes Schaub (litb)

Kai-Uwe Bux said:
I think it's just a tad more twisted. CHAR_MAX + 1 looks like a constant
expression and then the provision from [5/5] should kick in, which says
the program is ill-formed. (However, the standard acknowledges in a note
that most existing implementations are non-conforming in this regard and
ignore integer overflow.)

The addition is done in the int or unsigned int domain so this does not
apply to the range of char.
 

James Kanze

What happens in this case?
char c = CHAR_MAX + 1;
Not sure what happens. The spec only defines behavior for
signed and unsigned integral types it seems. Is the behavior
undefined?

Technically, it's implementation-defined. In practice, it's a
pretty safe bet that you'll get the same thing as
char c = CHAR_MIN;
With, perhaps, a compiler warning.
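
A small sketch of that practical claim, assuming char is narrower than int. The function name `narrowed_char_max_plus_one` is invented for this example, and the explicit cast is only there to avoid the compiler warning mentioned above.

```cpp
#include <climits>

// Assumption: char is narrower than int, so the addition cannot overflow.
static_assert(CHAR_MAX < INT_MAX, "sketch assumes char is narrower than int");

// Hypothetical helper: computes CHAR_MAX + 1 as an int, then narrows it.
inline char narrowed_char_max_plus_one() {
    int wide = CHAR_MAX;
    wide += 1;                       // still representable as an int
    // Narrowing conversion: the result is implementation-defined under the
    // wording quoted in this thread when char is signed.
    return static_cast<char>(wide);
}
```

On mainstream two's-complement implementations one would expect `narrowed_char_max_plus_one() == CHAR_MIN` to hold, but the wording discussed in this thread does not guarantee it.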
 

Kai-Uwe Bux

Johannes said:
The addition is done in the int or unsigned int domain

Yes, and it can overflow in the signed case.
so this does not apply to the range of char.

"This" being?

I see, I was not specific. My remark only applies to the case where there is
an overflow (CHAR_MAX == INT_MAX). The overflow is not related to the char
type at all, and I don't think anybody claimed that. The additional point
is just that if an overflow happens in a constant expression, the program is
ill-formed.


Best

Kai-Uwe Bux
 

James Kanze

Balog Pal wrote:
As I read the standard, unless CHAR_MAX == INT_MAX, the result
is that c is assigned an implementation-defined value.
The stuff on the right is promoted to int. Then addition
happens. If that overflows, it is UB.

If that overflows, the program is ill-formed, because it is a
constant expression. (And you're right to point out this
possibility. I tend to forget it, because it isn't the case on
any of the machines I work on. But from what I understand, it's
a fairly frequent case on embedded platforms.)
If not, the result is converted to char.

Except that the results are guaranteed not to fit into a char,
so the results of the conversion are implementation defined.
The C standard goes further, and limits what those results can
be: either an implementation defined value, or an implementation
defined signal. (The C standard neglects to say what happens if
the conversion occurs at compile time. I would expect a
compiler error if the runtime implementation would raise a
signal, but I'm not sure that the standard actually allows
this.)

From a quality point of view, the only reasonable response would
be the signal. But for a number of historical reasons, that
would break so much code that it's not going to happen. In
practice, I think it's safe to say that you'll always get
CHAR_MIN.
 

James Kanze

Paavo Helde wrote:

[...]
But where does it say that char can be signed? I only find
text where it says it could hold negative values. It does not
seem to say that it can be included in the list of signed or
unsigned integer types by an implementation :(

Last sentence in §3.9.1/1: "a plain char object can take on
either the same values as a signed char or an unsigned char;
which one is implementation-defined."
 

Kai-Uwe Bux

James said:
Last sentence in §3.9.1/1: "a plain char object can take on
either the same values as a signed char or an unsigned char;
which one is implementation-defined."

I was thinking of the same, but I think the problem runs deeper than
defining the set of values. Plain char, signed char, and unsigned char are
three distinct types. Only signed char and unsigned char are listed as
signed and unsigned integral types. Plain char, like bool, is an integral
type that appears to be neither signed nor unsigned. This poses a problem
for interpreting integer conversions from and to plain char, because even if
plain char cannot hold negative values and behaves like unsigned char, there
is formally no guarantee that, e.g., arithmetic with plain char is mod 2^N
or that integer values are converted mod 2^N. For that, one would need
further-reaching provisions.
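
The "three distinct types" point can be checked directly. This is only a sketch, assuming a C++11 compiler for static_assert and <type_traits>; `plain_char_has_negative_values` is a made-up helper name.

```cpp
#include <limits>
#include <type_traits>

// Plain char is a distinct type from both signed char and unsigned char,
// whichever of the two value ranges it shares.
static_assert(!std::is_same<char, signed char>::value,
              "char is distinct from signed char");
static_assert(!std::is_same<char, unsigned char>::value,
              "char is distinct from unsigned char");

// Hypothetical helper: reports which value range plain char has on this
// implementation.
inline bool plain_char_has_negative_values() {
    return std::numeric_limits<char>::min() < 0;
}
```

This shows the type identity and the value range only. It says nothing about which conversion rule the standard text formally applies to plain char, which is the open question here.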


Best

Kai-Uwe Bux
 

Johannes Schaub (litb)

Kai-Uwe Bux said:
The addition is done in the int or unsigned int domain

Yes, and it can overflow in the signed case.
so this does not apply to the range of char.

"This" being?

I see, I was not specific. My remark only applies to the case where there
is an overflow (CHAR_MAX == INT_MAX). The overflow is not related to the
char type at all; and I don't think, anybody claimed that. The additional
point is just that if an overflow happens in a constant expression, the
program is ill-formed.

Ah, I see now. Somehow I thought that if CHAR_MAX == INT_MAX, it must promote
to unsigned int. But I'm wrong, of course (CHAR_MAX is actually already of
the promoted type, which I missed too). If they are equal, only UCHAR_MAX
must be of unsigned int type, I think. Good catch on this, mate!
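
For what it's worth, the "already of the promoted type" part can be sketched like this, assuming the usual case where int can represent every unsigned char value. On a platform where the sizes are equal, the first assertion would fail and the expectations below would no longer apply.

```cpp
#include <climits>
#include <type_traits>

// Assumption: int can represent every unsigned char value, which is true
// when char is narrower than int.
static_assert(UCHAR_MAX < INT_MAX, "sketch assumes char is narrower than int");

// Under that assumption both macros expand to expressions of type int,
// not of type char or unsigned char.
static_assert(std::is_same<decltype(CHAR_MAX), int>::value,
              "CHAR_MAX has type int");
static_assert(std::is_same<decltype(UCHAR_MAX), int>::value,
              "UCHAR_MAX has type int");
```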
 

Johannes Schaub (litb)

Kai-Uwe Bux said:
I was thinking of the same, but I think the problem runs deeper than
defining the set of values. Plain char, signed char, and unsigned char are
three distinct types. Only signed char and unsigned char are listed
as signed and unsigned integral types. Plain char, like bool, is an
integral type that appears to be neither signed nor unsigned. This poses a
problem for interpreting integer conversions from and to plain char,
because even if plain char cannot hold negative values and behaves like
unsigned char, there is formally no guarantee that, e.g., arithmetic with
plain char is mod 2^N or that integer values are converted mod 2^N. For
that, one would need further-reaching provisions.

Exactly, I was worried about that one. I haven't found a solution for it in
the spec :(
 

Bo Persson

Johannes said:
Exactly, I was worried about that one. I haven't found a solution for
it in the spec :(

If we look at the type traits (C++0x), is_signed and is_unsigned are
defined for all arithmetic types, which includes plain char. There the
expression char(-1) < char(0) defines its signedness.
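
A minimal sketch of that criterion, assuming a C++11 compiler with <type_traits>. The template `compares_as_signed` is a hypothetical helper written for this example, not a standard library facility.

```cpp
#include <type_traits>

// The criterion described above, spelled out as a hypothetical helper
// template (the name is not from the standard library).
template <typename T>
constexpr bool compares_as_signed() {
    return T(-1) < T(0);
}

// The library trait and the hand-written comparison agree for plain char.
static_assert(std::is_signed<char>::value == compares_as_signed<char>(),
              "is_signed<char> agrees with char(-1) < char(0)");

// Plain char is classified as exactly one of signed or unsigned.
static_assert(std::is_unsigned<char>::value != std::is_signed<char>::value,
              "plain char is classified as exactly one of the two");
```

Which way the first comparison comes out depends on the implementation's choice for plain char.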


Bo Persson
 

Johannes Schaub (litb)

Bo said:
If we look at the type traits (C++0x), is_signed and is_unsigned are
defined for all arithmetic types, which includes plain char. There the
expression char(-1) < char(0) defines its signedness.

Indeed, and bool is classified as unsigned by that. I think is_signed doesn't
refer to the core-language notion of "signed", though.

So if no one can find the bug in the spec, should someone file an issue report
on it?
 
