Implementations with CHAR_BIT=32

Skarmander

The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Just out of curiosity, does anyone know actual implementations that have
this?

S.
 
Ben Pfaff

Skarmander said:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.
Just out of curiosity, does anyone know actual implementations that have
this?

I have heard that some DSPs use this model, but not hosted
implementations.
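
To make the array point above concrete, here is a minimal sketch of my own
(an illustration, not from the post), assuming CHAR_BIT == 32 so that
UCHAR_MAX is 4294967295 and promotes to an unsigned type:

#include <limits.h>

/* A per-byte lookup table, an utterly ordinary idiom on CHAR_BIT == 8
   systems, suddenly asks for about 4 billion elements here. */
static unsigned long freq[UCHAR_MAX];      /* roughly 16 GB of storage:
                                              "probably not possible"
                                              in practice */
static unsigned long freq2[UCHAR_MAX + 1]; /* UCHAR_MAX + 1 wraps to 0 in
                                              unsigned arithmetic; a zero
                                              array size is a constraint
                                              violation */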
 
Skarmander

Ben said:
Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.
<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?

S.
 
Ben Pfaff

Skarmander said:
I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?

Something like this is often used to detect end-of-file or error:
if (getc(file) == EOF) {
    /* ...handle error or end of file... */
}
If "int" and "char" have the same range, then a return value of
EOF doesn't necessarily mean that an error or end-of-file was
encountered.
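
On such an implementation the usual remedy is to ask the stream itself.
A minimal sketch (the helper name is mine, purely for illustration):

#include <stdio.h>

/* Returns 1 with a character stored in *out, 0 at end of file,
   -1 on a read error.  When int and char have the same range, a
   getc() result equal to EOF proves nothing by itself, so we
   consult feof() and ferror(). */
static int read_byte(FILE *fp, int *out)
{
    int c = getc(fp);
    if (c == EOF) {
        if (ferror(fp))
            return -1;
        if (feof(fp))
            return 0;
        /* Neither flag is set: EOF was a genuine character value. */
    }
    *out = c;
    return 1;
}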
 
Michael Mair

Skarmander said:
<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?

- Functions from <ctype.h> have an int parameter which is either
representable by unsigned char or equals the value of the macro
EOF (which is a negative integral constant expression).

- fgetc()/getc() returns either the next character, read as an unsigned
char converted to an int, or EOF; fputc() takes an int and writes it
converted to unsigned char.

If the value range of unsigned char is not contained in int, that
conversion to int is implementation-defined rather than a guaranteed
wraparound. Even if we assume for a moment that it does wrap around just
as in the unsigned integer case, we still cannot discern whether a
returned EOF is intended to be EOF or (int)((unsigned char) EOF).

So character-based I/O and <ctype.h> give us some trouble.
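
A small sketch of the usual defensive idiom and of why it stops being
airtight here (my illustration; the function is invented for the example):

#include <ctype.h>

/* Count whitespace characters in a string.  The cast through unsigned
   char is the standard way to keep negative chars from being mistaken
   for EOF by <ctype.h>.  With CHAR_BIT == 8 that settles the matter;
   with CHAR_BIT == 32 and sizeof(int) == 1, the implicit conversion of
   that unsigned char value back to int is implementation-defined and
   can itself land on EOF. */
static int count_spaces(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++)
        if (isspace((unsigned char)*s))
            n++;
    return n;
}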


Cheers
Michael
 
Jordan Abel

Skarmander said:
<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set
(which fails), and that the character set is "small" for some
reasonable value of "small", which does not include "32 bits" (this
will probably still hold).

eh - 20 bits [actually 20.0875-ish] is still pretty large, and those
will most likely need to be stored in 32 bits.
Either that or the application really wants 8-bit bytes, but is
using UCHAR_MAX because it looks neater (which could be considered a
bug, not just an assumption).

I don't quite see the EOF problem, though. It's probably just my
lack of imagination, but could you give a code snippet that fails?

It would be possible if there actually were a 32-bit character set, or
if getchar() read four bytes at a time from the file.
 
Keith Thompson

Jordan Abel said:
It would be possible if there actually were a 32-bit character set, or
if getchar() read four bytes at a time from the file.

getchar() by definition reads one byte at a time from the file.
A byte may be larger than 8 bits.
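
For reference, a trivial probe of what "byte" means on a given
implementation; on the machines discussed in this thread it would report
CHAR_BIT as 16 or 32 and every sizeof as 1:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT      = %d\n", CHAR_BIT);
    printf("sizeof(char)  = %zu\n", sizeof(char));
    printf("sizeof(short) = %zu\n", sizeof(short));
    printf("sizeof(int)   = %zu\n", sizeof(int));
    printf("sizeof(long)  = %zu\n", sizeof(long));
    return 0;
}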
 
pete

Skarmander said:
<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though.
It's probably just my lack of
imagination, but could you give a code snippet that fails?

int putchar(int c);

putchar returns either ((int)(unsigned char)c) or EOF.

If sizeof(int) equals one and c is negative, then (unsigned char)c is
greater than INT_MAX, which means that ((int)(unsigned char)c) would be
implementation-defined and possibly negative, even upon success.
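
A sketch of what that does to error checking (the wrapper is my own,
invented for illustration):

#include <stdio.h>

/* On an ordinary implementation, putchar(c) == EOF reliably signals a
   write error.  If sizeof(int) == 1, a successful call can also return
   a value that compares equal to EOF, so the stream has to be asked. */
static int write_byte(int c)
{
    if (putchar(c) == EOF && ferror(stdout))
        return -1;  /* genuine write error */
    return 0;       /* success, or an EOF that was merely c's converted value */
}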
 
slebetman

Skarmander said:
<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?

S.

See http://www.homebrewcpu.com/projects.htm
and scroll down to his last project "LCC retargeting for D16/M homebrew
computer".
The D16/M is a hobbyist-designed, homebrew CPU that is fully 16-bit;
it cannot even handle 8-bit quantities. So the architecture has 16-bit
chars, 16-bit shorts, 16-bit ints, and 16-bit pointers.
And the compiler worked quite well...
If you are never going to process text files generated on other
systems, there's no reason for chars to be 8 bits.
 
Jack Klein

Skarmander said:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Just out of curiosity, does anyone know actual implementations that have
this?

S.

I used an Analog Devices 32-bit SHARC DSP a few years ago, I forget
the exact model, where CHAR_BIT was 32 and all the integer types (this
was before long long) were 32 bits.

I currently do a lot of work with a Texas Instruments DSP in the
TMS320F28xx family, where CHAR_BIT is 16 and the char, short, and int
types all share the same representation and size.

I imagine other DSPs from these and other manufacturers are similar,
although Freescale (formerly Motorola) has a 16-bit DSP that they say
supports CHAR_BIT 8; I haven't used it.
 
Tim Rentsch

Ben Pfaff said:
Skarmander said:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. [...]

Just a reminder that CHAR_BIT == 32 and sizeof(int) == 1 both being
true doesn't automatically imply either that INT_MAX == CHAR_MAX or
that INT_MIN == CHAR_MIN. In particular,

INT_MAX 2147483647 CHAR_MAX 1073741823
INT_MIN -2147483648 CHAR_MIN -1073741824

are allowed, or even

INT_MAX 2147483647 CHAR_MAX 127
INT_MIN -2147483648 CHAR_MIN -128

are allowed.

What I would expect in a hosted implementation with CHAR_BIT == 32
and sizeof(int) == 1 is

INT_MAX 2147483647 CHAR_MAX 2147483647
INT_MIN -2147483648 CHAR_MIN -2147483647

So EOF could be -2147483648 and there would be no conflict with any
character value. Of course, on such a system, outputting binary data
would most likely be done with unsigned char rather than char.
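
A minimal translation-time probe of the layout Tim describes (purely my
illustration, assuming EOF expands to something the preprocessor can
evaluate, as it does on common implementations; CHAR_MIN and CHAR_MAX
are required to be usable in #if):

#include <limits.h>
#include <stdio.h>

#if EOF < CHAR_MIN || EOF > CHAR_MAX
/* EOF (e.g. INT_MIN) cannot equal any value of type char. */
#else
/* EOF falls inside char's range; EOF tests need feof()/ferror() backup. */
#endif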
 
Richard Bos

Tim Rentsch said:
Ben Pfaff said:
Skarmander said:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. [...]

Just a reminder that CHAR_BIT == 32 and sizeof(int) == 1 both being
true doesn't automatically imply either that INT_MAX == CHAR_MAX or
that INT_MIN == CHAR_MIN. In particular,

INT_MAX 2147483647 CHAR_MAX 1073741823
INT_MIN -2147483648 CHAR_MIN -1073741824

are allowed, or even

INT_MAX 2147483647 CHAR_MAX 127
INT_MIN -2147483648 CHAR_MIN -128

are allowed.

Not for char, it isn't. Other types can have padding bits; char
(unsigned in any case, and AFAIK since a TC some time ago the other
kinds, too) can not.

Richard
 
Tim Rentsch

Richard Bos said:
Tim Rentsch said:
Ben Pfaff said:
Skarmander said:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. [...]

Just a reminder that CHAR_BIT == 32 and sizeof(int) == 1 both being
true doesn't automatically imply either that INT_MAX == CHAR_MAX or
that INT_MIN == CHAR_MIN. In particular,

INT_MAX 2147483647 CHAR_MAX 1073741823
INT_MIN -2147483648 CHAR_MIN -1073741824

are allowed, or even

INT_MAX 2147483647 CHAR_MAX 127
INT_MIN -2147483648 CHAR_MIN -128

are allowed.

Not for char, it isn't. Other types can have padding bits; char
(unsigned in any case, and AFAIK since a TC some time ago the other
kinds, too) can not.

I need to ask for a reference. It's true that unsigned char can't
have padding bits, but I can't find any evidence that signed char
can't have padding bits. Apparently it _was_ true in C89/C90 that the
committee didn't expect that signed chars would ever have padding
bits; however, it seems like this unstated assumption was cleared up
in a DR (DR 069, I believe). The language in 6.2.6.2 p1,p2 seems to
say fairly plainly that padding bits aren't allowed for unsigned char
but are allowed for all other integer types.

There was a posting from Doug Gwyn in comp.std.c on July 12 of this
year saying that signed char's could have padding bits. A search in
Google Groups

"signed char" "padding bits" numeration

should turn that up. The message was:



Also: even if signed chars have no padding bits, it's still
true that INT_MIN < SCHAR_MIN can hold, which lets INT_MIN
serve as a value for EOF.
 
