Flash Gordon
Eric said: Andrew said: Andrew Poelstra wrote:
[...]
I believe all of these are guaranteed:
char has no padding bits
char has no trap representations
Would you mind revealing where you find these guarantees?
If they are in the Standard, I have overlooked them.
The first has been mentioned in this group many times (although it
may pertain only to unsigned char), and the second seemed to me a
logical extension.
There are special guarantees for unsigned char, so that
it is possible to treat the representation of any object as
an array of unsigned char. This would not work if unsigned
char had trap representation or contained indeterminately-
valued padding bits.
This is covered in 6.2.6.2 para 1 of N1124 which describes padding bits
and explicitly states that unsigned char cannot have them.
With the range requirements for signed char, this means that signed char
can only have padding bits if CHAR_BIT is greater than 8, and char can
only have padding bits if it is signed and CHAR_BIT is greater than 8.
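To make the "array of unsigned char" point concrete, here is a minimal
sketch of reading an object's representation byte by byte. The helper
name dump_bytes and its parameters are my own invention for
illustration, not anything from the Standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the object representation of *obj into out[],
   one unsigned char per byte, and return the number of bytes copied. */
size_t dump_bytes(const void *obj, size_t size,
                  unsigned char *out, size_t out_cap)
{
    /* Any object may be inspected through an unsigned char pointer. */
    const unsigned char *p = obj;
    size_t i;

    assert(obj != NULL && out != NULL);

    if (size > out_cap)
        size = out_cap;             /* never write past the caller's buffer */

    for (i = 0; i < size; i++)
        out[i] = p[i];              /* unsigned char has no padding bits or traps */

    return size;
}
```

A call such as dump_bytes(&x, sizeof x, buf, sizeof buf) should be safe
for an object x of any type, which is exactly what the special
guarantees for unsigned char are there to allow.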
However, I am unaware of any similar guarantees for char,
either signed or plain. On an implementation where plain char
is unsigned one can deduce that it has no padding bits or traps
(argument: On such an implementation, plain char can represent
all the values unsigned char can, and since the latter "fills
the code space" the former must, too). But the argument doesn't
hold for signed char, or for plain char on an implementation
where CHAR_MIN<0.
Specifically, CHAR_MIN is allowed to be -127 on a 2s-complement system
with -128 being a trap. In addition, -0 is allowed to be a trap on
1s-complement and sign-magnitude implementations. Section 6.2.6.2
para 2 of N1124 describes this for all signed integer types, with
no exception mentioned for char or signed char.
So you can have a trap representation for char even on a system with
CHAR_BIT==8, although I am not aware of any such system.
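As a rough sketch of the "fills the code space" argument, the limits in
<limits.h> can be compared against UCHAR_MAX. If a signed type has as
many distinct values as unsigned char, every bit pattern must be a
value, so there is no room for padding bits or a trap. The function
names below are made up for illustration, and a zero result only says
that a trap, a negative zero or padding is possible, not which one.

```c
#include <limits.h>

/* Number of distinct values in [lo, hi], minus one, for lo < 0 <= hi.
   Computed in unsigned long so the arithmetic cannot overflow. */
static unsigned long span_minus_one(long lo, long hi)
{
    return (unsigned long)hi + (unsigned long)(-(lo + 1)) + 1UL;
}

/* Returns 1 if signed char has as many values as unsigned char,
   i.e. no bit pattern is left over for a trap or padding. */
int signed_char_fills_code_space(void)
{
    return span_minus_one(SCHAR_MIN, SCHAR_MAX) == (unsigned long)UCHAR_MAX;
}

/* Same question for plain char, whichever signedness it has. */
int plain_char_fills_code_space(void)
{
    if (CHAR_MIN == 0)                      /* plain char is unsigned */
        return CHAR_MAX == UCHAR_MAX;
    return span_minus_one(CHAR_MIN, CHAR_MAX) == (unsigned long)UCHAR_MAX;
}
```

On the usual 2s-complement system with SCHAR_MIN == -128 both functions
return 1. On an implementation with SCHAR_MIN == -127 the first returns
0, which is the case described above.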
Since fgetc "obtains that character as an unsigned char converted to
int" it is obviously possible for it to read the representation that for
char could be a trap. Since fgets and friends are defined in terms
of fgetc (section 7.19.3 para 11), the representation they store must
IMHO be that of the unsigned char, especially as there is one bit
pattern that could be a trap for signed char.
So, going back to the original question, which has been trimmed from
the quoted text: if you have some form of byte array that has been read
from a file by fgetc, then I believe the technically correct method
would be to use an unsigned char pointer to read the values. With a
char pointer you could read a trap representation. In any case, on
1s-complement or sign-magnitude implementations, reading with a char
pointer and then casting to unsigned char would change the bit pattern,
and this would IMHO be wrong.
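A minimal sketch of what I mean, assuming the data arrives via fgetc.
The helper names read_bytes and sum_bytes are hypothetical and only
there to show the bytes being stored and read back as unsigned char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read up to cap bytes from fp using fgetc and
   store them as unsigned char. Returns the number of bytes stored. */
size_t read_bytes(FILE *fp, unsigned char *buf, size_t cap)
{
    size_t n = 0;
    int c;                          /* int, so EOF stays distinguishable */

    assert(fp != NULL && buf != NULL);

    while (n < cap && (c = fgetc(fp)) != EOF)
        buf[n++] = (unsigned char)c;    /* value is already in unsigned char range */

    return n;
}

/* Hypothetical helper: walk the buffer through an unsigned char pointer,
   so no byte is ever interpreted as a (possibly trapping) signed value. */
unsigned long sum_bytes(const unsigned char *p, size_t n)
{
    unsigned long total = 0;

    while (n-- > 0)
        total += *p++;

    return total;
}
```

Declaring the buffer as unsigned char from the start means no
conversion from a signed value is ever involved when the bytes are read
back.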
If, on the other hand, you are passing a string literal a byte at a time
to isupper, toupper etc, then using a char pointer and casting to
unsigned char would IMHO be the correct thing.
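For the string case, something along these lines is what I have in
mind. The function names upcase and count_upper are just illustrative;
the point is the cast to unsigned char before the value reaches the
<ctype.h> functions, which require an argument representable as
unsigned char or equal to EOF.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: convert a string to upper case in place. */
void upcase(char *s)
{
    assert(s != NULL);

    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);  /* cast avoids negative arguments */
}

/* Hypothetical helper: count the upper-case letters in a string. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    for (; *s != '\0'; s++)
        if (isupper((unsigned char)*s))
            n++;

    return n;
}
```

Without the cast, a char with a negative value would be passed straight
to isupper or toupper, which is undefined behaviour unless the value
happens to equal EOF.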
All in all, I think it is a bit of a mess if char is signed when it
comes to the library functions. However, the standard committee probably
inherited a mess from the existing practice.