How could a char be signed?

Keith Thompson

Seebs said:
In a strange historical quirk, the default type of char is actually a
third type which happens to have the same values and representation as
one or the other of signed char or unsigned char, but is not the same
type as either. It is not always the case that a plain char is signed,
though, which makes it unlike all the other integer types.

For all the other integer types, if you omit the qualifier, you get
the signed version -- and you really do get the signed version of that
type, there is NO difference between "int" and "signed int".

For char, though, "char" may be either signed or unsigned, and whichever
it is, it remains a distinct type, even though it has the same range,
representation, and behavior. Which one is implementation-defined, so
it should be in the docs somewhere. (I have seen many compilers that
allow you to choose this.)

Another strange historical quirk is that it's implementation-defined
whether a plain int bit field is signed int or unsigned int.
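
A minimal sketch of both points (my own illustration, not from the
thread; the _Generic part assumes a C11 compiler, and the output is of
course implementation-specific):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Implementation-defined: CHAR_MIN is 0 if plain char is unsigned,
       negative (typically -128) if it is signed. */
    printf("plain char is %s on this implementation\n",
           CHAR_MIN < 0 ? "signed" : "unsigned");

    /* C11 _Generic shows that char, signed char and unsigned char are
       three distinct types, whichever way plain char behaves. */
    printf("'x' stored in a char selects: %s\n",
           _Generic((char)'x',
                    char:          "char",
                    signed char:   "signed char",
                    unsigned char: "unsigned char"));
    return 0;
}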
 
Niklas Holsti

Malcolm said:
char may not be used to represent arbitrary bit patterns. The
justification is that some imaginary piece of hardware somewhere might
have "trap representations" that trigger errors when certain values
are loaded into "character registers".

That seems to contradict the statement in other posts that "char" has
the "same behaviour" as either "signed char" or "unsigned char"
(depending on the compiler).

In the (draft) copy of the C99 standard that I have, the rule that
speaks of trap representations (6.2.6.1 para 5) explicitly excludes
"character types", of which "char" is one, so it seems to me that any
representable value can be stored in a "char" variable and loaded from a
"char" variable.

Can you refer to a rule in the standard that shows that "char" can have
problems with trap representations?
 
Keith Thompson

Niklas Holsti said:
That seems to contradict the statement in other posts that "char" has
the "same behaviour" as either "signed char" or "unsigned char"
(depending on the compiler).

In the (draft) copy of the C99 standard that I have, the rule that
speaks of trap representations (6.2.6.1 para 5) explicitly excludes
"character types", of which "char" is one, so it seems to me that any
representable value can be stored in a "char" variable and loaded from a
"char" variable.

Can you refer to a rule in the standard that shows that "char" can have
problems with trap representations?

Not quite, but the phrasing is interesting:

Certain object representations need not represent a value of
the object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such
a representation is produced by a side effect that modifies
all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.
Such a representation is called a _trap representation_.

6.2.6.2 says that unsigned char cannot have padding bits;
since all CHAR_BIT bits are value bits, it cannot have any trap
representations. But 6.2.6.1p5 doesn't actually say that signed
or plain char cannot have trap representations, just that reading
a signed char trap representation doesn't cause undefined behavior.
(But then what is the behavior?)

An implementation with CHAR_BIT==8 might conceivably reserve the bit
pattern that would represent -128 as a trap representation.

I think the intent was that plain, signed, and unsigned char
cannot have trap representations, but I see a small loophole in
the current wording.
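
This guarantee for unsigned char is why it is the type used to inspect
any object's representation. A minimal sketch (my own illustration,
assuming nothing beyond C99):

#include <stdio.h>
#include <limits.h>

/* Dump the object representation of anything via unsigned char, which
   6.2.6.2 guarantees has no padding bits and hence no trap
   representations. */
static void dump_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < n; i++)
        printf("%02x ", p[i]);
    putchar('\n');
}

int main(void)
{
    int x = -128;              /* any bit pattern is safe to read this way */
    dump_bytes(&x, sizeof x);
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    return 0;
}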
 
James Dow Allen

"Englishman" means an inhabitant of the South-Eastern chunk of those
little islands off the North-Western coast of Europe.

While I'm posting mainly to note certain exceptions (see below),
include me in the boring majority that tries to use words
as others do. Interestingly, c.l.c's own vociferous
Englishman joins John Kelly in the minority! :)

James Dow Allen
 
Keith Thompson

Kenneth Brody said:
Seebs said:
"Signed" chars may sound strange, but use the default (signed) type,
unless you have a specific need for unsigned.

In a strange historical quirk, the default type of char is actually a
third type which happens to have the same values and representation as
one or the other of signed char or unsigned char, but is not the same
type as either. It is not always the case that a plain char is signed,
though, which makes it unlike all the other integer types.
[...]
Another strange historical quirk is that it's implementation-defined
whether a plain int bit field is signed int or unsigned int.

Which makes sense, in a way, given that you can have 1-bit bit fields, which
most people probably think of as 0 or 1, and requiring signed would mean 0
or -1.

That would argue in favor of making it unsigned by default, or
perhaps requiring the signedness to be explicit. The only reason
I can think of to make the signedness implementation-defined is to
cater to existing implementations.

How often do you want a bit field of a particular size but don't
care whether it's signed or unsigned?
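
A small sketch of the difference (my own illustration; what the plain
field does is, as discussed, implementation-defined):

#include <stdio.h>

struct flags {
    int          plain : 1;   /* implementation-defined: signed or unsigned */
    signed int   s     : 1;   /* range is -1..0 */
    unsigned int u     : 1;   /* range is 0..1 */
};

int main(void)
{
    struct flags f;
    f.plain = -1;   /* reads back as -1 if the field is signed here, 1 if unsigned */
    f.s     = -1;   /* always -1 */
    f.u     = -1;   /* wraps to 1, as unsigned arithmetic always does */
    printf("plain=%d signed=%d unsigned=%d\n", f.plain, f.s, f.u);
    return 0;
}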
 
Seebs

I keep hearing about these mysterious traps but have never actually used a
CPU which traps anything. Which are they? What will a conforming C
implementation do about a trap firing?

Nearly all CPUs trap SOMETHING. Segfaults and division by zero errors
are common.

In general, anything you do in response to a trap is conforming. Commonly,
operating systems will shut down programs which do such things.
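
A trivial sketch of a trap firing in that everyday sense (my own
example; the exact signal and outcome depend on the CPU and OS):

#include <stdio.h>

int main(void)
{
    volatile int zero = 0;    /* volatile keeps the compiler from folding this away */
    int r = 1 / zero;         /* undefined behavior; on x86 the CPU raises #DE and
                                 a typical OS delivers SIGFPE and kills the program */
    printf("%d\n", r);        /* usually never reached */
    return 0;
}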

-s
 
Keith Thompson

Kenneth Brody said:
But, typically, a "trap representation" is something where merely reading
the value causes "something". Merely loading an invalid value into a
pointer typically isn't a problem.

I'm not sure what's typical, but the behavior on accessing a trap
representation is undefined, not necessarily a "trap".

For example, suppose you have a conforming C implementation with 32-bit
2's-complement int, INT_MIN == -2**31, INT_MAX == +2**31-1 (i.e., the
usual extra negative value).

You could change <limits.h> so INT_MIN is -2**31+1 and update the
documentation to state that the bit pattern that would normally
represent -2**31 is a trap representation. The implementation
would still be conforming, with no other change in behavior.
The implementation would simply have chosen not to document the
(otherwise ordinary) behavior of accessing something with the
value -2**31.

If there's a problem in the CPU where some operations on that value
yield incorrect results, that might even be a reasonable thing to do.
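
To make that concrete (my own sketch; the numbers assume the usual
32-bit two's-complement int):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* On a typical 32-bit two's-complement implementation: */
    printf("INT_MAX = %d\n", INT_MAX);   /* 2147483647 */
    printf("INT_MIN = %d\n", INT_MIN);   /* -2147483648: the "extra" negative value */

    /* The hypothetical implementation described above would instead
       document INT_MIN as -2147483647 and declare the bit pattern for
       -2147483648 a trap representation; no generated code would need
       to change. */
    printf("symmetric range? %s\n", INT_MIN == -INT_MAX ? "yes" : "no");
    return 0;
}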

[...]
True. What happens is up to the implementation, as the entire concept of a
"trap representation" is implementation-specific.

The concept of "trap representation" is standard. The (possibly
empty) set of trap representations for a given type is
implementation-defined. The behavior on accessing an object that
holds a trap representation is undefined.
 
Seebs

But, typically, a "trap representation" is something where merely reading
the value causes "something". Merely loading an invalid value into a
pointer typically isn't a problem.
However, on x86 processors running in "protected mode", the mere loading of
an invalid value into a segment register will cause a fault. So, in a mode
with "large data pointers", the following can cause a program crash:
void *foo(void)
{
    void *uninitialized;     /* indeterminate value */
    return uninitialized;    /* merely copying it can fault if it ends up
                                loaded into a segment register */
}

Good example.

.... I think it says a lot about the world that I've never had occasion to
know that the x86 has a trap representation.

Hmm. I wonder how many modern chips support signalling NaNs.

-s
 
Chris H

Joe Wright said:
[ snip all ]

I keep hearing about these mysterious traps but have never actually
used a CPU which traps anything. Which are they? What will a conforming
C implementation do about a trap firing?

Many CPUs have traps, mainly those with a supervisor and a user mode.
However, this is all architecture-specific and has nothing to do with the
compiler (of any language other than assembler).

And whilst I remember or at least as I remember it....

There are three types of char: signed char and unsigned char, which
along with short, int, long and long long are integer types.

Then there is plain char which is a character type. The signedness of
this is implementation defined.

Plain char should only be used for characters and not be used as an
integer type. Only explicitly signed char and unsigned char should be
used as integer types.
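
A small sketch of that convention (my own illustration): plain char for
text, unsigned char for raw bytes, so the results don't silently change
with plain char's signedness.

#include <stdio.h>

int main(void)
{
    /* Plain char for actual characters and strings ... */
    char greeting[] = "hello";

    /* ... unsigned char for raw bytes / small integer work. */
    unsigned char bytes[] = { 0x00, 0x7f, 0x80, 0xff };

    for (size_t i = 0; i < sizeof bytes; i++)
        printf("%d ", bytes[i]);    /* 0 127 128 255 everywhere; as plain char
                                       the last two could print as negatives */
    putchar('\n');
    puts(greeting);
    return 0;
}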

AFAIK char is the only integer type which has an undefined state if it
is not explicitly signed or unsigned.
 
Keith Thompson

Chris H said:
AFAIK char is the only integer type which has an undefined state if it
is not explicitly signed or unsigned.

Implementation-defined, not undefined. It's also the only integer type
that's distinct from its signed and unsigned variants. int and signed
int are different names for the same type; char, signed char, and
unsigned char are three different types (two of which have the
same characteristics).

Another odd case is that it's implementation-defined whether an "int"
bit field is signed int or unsigned int.
 
Keith Thompson

Kenneth Brody said:
Technically, "char" isn't an "integer type". ("signed char" and "unsigned
char" are, but plain old "char" is not.)
[chapter and verse snipped]

Interesting. I suspect that was an unintentional oversight.

(BTW, please don't send me e-mail copies of articles posted to Usenet.)
 
Tim Rentsch

Keith Thompson said:
Not quite, but the phrasing is interesting:

Certain object representations need not represent a value of
the object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such
a representation is produced by a side effect that modifies
all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.
Such a representation is called a _trap representation_.

6.2.6.2 says that unsigned char cannot have padding bits;
since all CHAR_BIT bits are value bits, it cannot have any trap
representations. But 6.2.6.1p5 doesn't actually say that signed
or plain char cannot have trap representations, just that reading
a signed char trap representation doesn't cause undefined behavior.
(But then what is the behavior?)

An implementation with CHAR_BIT==8 might conceivably reserve the bit
pattern that would represent -128 as a trap representation.

I think the intent was that plain, signed, and unsigned char
cannot have trap representations, but I see a small loophole in
the current wording.

There is no question that signed char (and also char if it is the
same as signed char) can have trap representations. 6.2.6.1p5
allows it generically; 6.2.6.2p2 allows it specifically. In
all other cases when a type cannot have trap representations,
the Standard contains an explicit statement to that effect,
and there is no such statement for signed char.

Given that a signed char can have trap representations, we can ask
what happens when we read such objects, using an lvalue of type
signed char. Personally I think this is undefined behavior, but
suppose it's defined -- what's the definition? The only definition
that makes any sort of sense is to load the trap representation
"value" as the value of the access.

Having gotten this trap representation "value", what can we do with
it? We should be able to store it into another object of type
signed char, that seems obvious. What else? It can't be combined
using any arithmetic operation (converting to 'int' would be
undefined behavior), or compared to another value (again undefined
behavior on conversion). In fact converting to any other type at
all is undefined behavior, since the "value" doesn't exist in any
meaningful sense except in the context of the original type.
I suppose an argument could be made that such a "value" of type
'signed char' could be converted to type 'char' if char has the
same representation as signed char. But there doesn't seem to
be anything else that could be done with it in a well-defined way.

On a somewhat more pragmatic note, for implementations (assuming
any actually exist) that have such trap representations, the
implementations probably define the results of comparisons, at
least against zero, in a sensible way. So standard string
operations would likely work, even if technically they would
constitute undefined behavior. Actually it isn't that far fetched
that some implementations would have trap representations for
signed char, since all it would take is a ones' complement
or signed-magnitude representation without a negative zero.

Summing up: such trap representations are possible; if they exist
the only thing to do with them is copy them around; pragmatically
they are likely to behave reasonably and not trap (but of course
you never know when your friendly looking processor is going to
transmogrify from Dr. x86 to Mr. DS9000).

One additional note -- of course, implementations are free to
define undefined behavior any way they like, and in that sense
'signed char' trap representations could be seen as having
well-defined behavior. But the Standard itself doesn't define the
behavior under such conditions (except perhaps for the noted
operations of copying and signed char/char conversions).
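
A small sketch of both remarks (my own illustration; whether the range
is symmetric, and what an implementation does with the leftover bit
pattern, are of course implementation-specific):

#include <stdio.h>
#include <string.h>
#include <limits.h>

int main(void)
{
    /* On two's complement, SCHAR_MIN is -(SCHAR_MAX)-1.  A ones' complement
       or sign-magnitude implementation has a symmetric range, and the
       leftover bit pattern is either negative zero or, if the
       implementation says so, a trap representation. */
    if (SCHAR_MIN == -SCHAR_MAX)
        puts("symmetric range: signed char could have a trap representation");
    else
        puts("two's complement range: every signed char bit pattern is a value");

    /* memcpy accesses the bytes as unsigned char, so copying is safe
       even if the source happened to hold such a representation. */
    signed char src = 0, dst;
    memcpy(&dst, &src, sizeof src);
    printf("copied value: %d\n", dst);
    return 0;
}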
 
Chris H

John Kelly said:
Words and language are not confined to Google, Wikipedia, or your
favorite dictionary.

However, English words are defined (confined by?) the OED!

I think the Americans default to Webster's.
 
