How does C cope if architecture doesn't address bytes?

James Harris

My K&R 2nd ed says in the Reference Manual appendix (A7.4.8) that sizeof
yields the number of BYTES required to store an object of the type of its
operand. What happens if C is running on a machine that addresses larger
words only? Shouldn't sizeof be defined to return the smallest number of
'storage units' required to store an object of the type of its operand?

As a general point, is there a guide to what aspects of C would fail if run
on a machine which is not 8-bit byte addressed? IIRC, some of the Crays
define a char, an int and a long int all as 64-bit, and some PICs use
12-bit words.

I have a sneaking suspicion that C is so well defined (or, mature) that,
if used properly, it would not run into too much bother on these machines.
Any idea of where it would go wrong, if anywhere?
 
Michael Mair

Hello James,

James said:
My K&R 2nd ed has in the Reference Manual appendix, A7.4.8 sizeof yields
the number of BYTES required to store an object of the type of its operand.
What happens if C is running on a machine that addresses larger words only?
Shouldn't sizeof be defined to return the smallest number of 'storage
units' required to store an object of the type of its operand?

As a general point, is there a guide to what aspects of C would fail if run
on a machine which is not 8-bit byte addressed? IIRC, some of the Crays
define a char, an int and a long int all as 64-bit, and some PICs use
12-bit words.

I have a sneaking suspicion that C is so well defined (or, mature) that,
if used properly, it would not run into too much bother on these machines.
Any idea of where it would go wrong, if anywhere?

Er, note one thing: bytes do not necessarily consist of 8 bits.
<limits.h> defines CHAR_BIT, which tells you the number of
bits per byte. CHAR_BIT is at least 8.

Bytes are the smallest addressable memory units your _implementation_
offers. It may well be that the hardware architecture only allows
addressing processor words of, say, X bits, but the compiler offers you
a data type consisting of Y bits, with X = m*Y, m > 1, and enables you
to use pointers to address variables of that data type at Y-bit
boundaries.

If you want to be sure how many bits you are working with, use
<limits.h> or, in C99, the (u)intN_t, N=8,16,32,64, integral data types
offered by <stdint.h>.


Cheers
Michael
 
Goran Larsson

James Harris said:
My K&R 2nd ed has in the Reference Manual appendix, A7.4.8 sizeof yields
the number of BYTES required to store an object of the type of its operand.
True.

What happens if C is running on a machine that addresses larger words only?

That is no problem.
Shouldn't sizeof be defined to return the smallest number of 'storage
units' required to store an object of the type of its operand?

It already does this. The 'storage units' in C are bytes.
As a general point, is there a guide to what aspects of C would fail if run
on a machine which is not 8-bit byte addressed?

Nothing fails. Not all bytes are eight bits and the byte used in the C
standard is certainly not restricted to be eight bits.

| 3.6
| 1 byte
| addressable unit of data storage large enough to hold any member
| of the basic character set of the execution environment
|
| 2 NOTE 1 It is possible to express the address of each individual
| byte of an object uniquely.
|
| 3 NOTE 2 A byte is composed of a contiguous sequence of bits, the
| number of which is implementation-defined. The least significant
| bit is called the low-order bit; the most significant bit is
| called the high-order bit.
IIRC, some of the Crays
define a char, an int and a long int all as 64-bit, and some PICs use
12-bit words.

Then the C implementation would use 64 bits or 12 bits for its byte. The
Cray implementation could, with a performance penalty, fake eight bit
bytes by using non-native addresses (add a three bit field to the
address) and use pack/unpack code for all byte accesses.
I have a sneaking suspicion that C is so well defined (or, mature) that,
if used properly, it would not run into too much bother on these machines.
Any idea of where it would go wrong, if anywhere?

The problems are much more likely to appear in applications written
by programmers who "know" that a byte is eight bits.
 
Trent Buck

Quoth Goran Larsson on or about 2004-11-13:
The problems are much more likely to appear in applications written
by programmers that "knows" that a byte is eight bits.

Is the 'char' type always exactly one byte wide?

-trent
 
Rich Gibbs

Trent Buck said the following, on 11/13/04 11:47:
Quoth Goran Larsson on or about 2004-11-13:

Is the 'char' type always exactly one byte wide?

Yes, sizeof(char) == 1. However, the size of a byte (in bits) is NOT
always 8.
 
Gordon Burditt

My K&R 2nd ed has in the Reference Manual appendix, A7.4.8 sizeof yields
the number of BYTES required to store an object of the type of its operand.

Forget anything you may have read outside the context of C about
how big a byte is. A byte is the same size as a char, and neither
is necessarily 8 bits. A char and a byte are *AT LEAST* 8 bits.
What happens if C is running on a machine that addresses larger words only?
Shouldn't sizeof be defined to return the smallest number of 'storage
units' required to store an object of the type of its operand?

sizeof(char) == 1, by definition. On a machine that addresses larger
words only, char probably IS one of those larger words.
As a general point, is there a guide to what aspects of C would fail if run
on a machine which is not 8-bit byte addressed?

C does not assume 8-bit bytes. Bytes are AT LEAST 8 bits, but
Standard C will not fail if a byte happens to be 9 or 12 or 23 bits.
Unportable programs can fail in all sorts of stupid ways. Some
machines, incidentally, use character sets that won't fit in 8 bits
anyway.
IIRC, some of the Crays
define a char, an int and a long int all as 64-bit, and some PICs use
12-bit words.

I have a sneaking suspicion that C is so well defined (or, mature) that,
if used properly, it would not run into too much bother on these machines.
Any idea of where it would go wrong, if anywhere?

An interesting case to consider is the GE-635, used at Dartmouth
College in 1969 and 1970 (at least that's when I used it) for the
Dartmouth Time-Sharing System. I believe the smallest addressable
unit was 36 bits (maybe 18). There was an instruction called Insert
Character, which used a special kind of fat pointer called a "tally
word", stored in memory (and frequently used in "auto-increment"
mode) and employed by character-based instructions. It contained
(a) a memory address, (b) a character counter, and (c) a flag
indicating whether characters were 6 or 9 bits.
There was a similar instruction for storing characters.

Now: you are the implementor. Do you use 9-bit characters (6-bit
characters are not an allowed choice) and fat pointers for char
pointers, or do you use 36-bit characters and normal pointers?
C didn't exist then, but this machine was rather low on memory as
were all machines of its era.

Even if the character-handling instructions referred to above did not
exist, it WOULD be possible to cobble together software-only support
for 9-bit characters using a fat pointer, and library routines that,
given a fat pointer, can extract the right character out of memory
and load it into a register. This would probably be so slow that
it would be considered unthinkable.

Gordon L. Burditt
 
Mark F. Haigh

James said:
My K&R 2nd ed has in the Reference Manual appendix, A7.4.8 sizeof yields
the number of BYTES required to store an object of the type of its operand.
What happens if C is running on a machine that addresses larger words only?
Shouldn't sizeof be defined to return the smallest number of 'storage
units' required to store an object of the type of its operand?

There are a couple of solutions that have been used (by compilers):

1. Fake 8 bit bytes, with a performance penalty. Couple loads and
stores with the appropriate mask and shift instructions. Do some
similar magic for pointers to char and void (which must have identical
representation).

2. Define CHAR_BIT to the architecture's smallest addressable unit.
For example, on some DSP implementations, char, short, and int are all
32 bits.

The choice between the two may be configurable via command-line switches
to the compiler.

Carefully written highly portable C programs can run on either #1 or #2,
with #2 being preferable. However, IIRC, POSIX defines CHAR_BIT to be
8, and therefore POSIX compliant programs that are not also highly
portable C programs are likely to have problems with #2.

C is especially prevalent in embedded systems. In many of these systems
it's the only "high"-level language available. Many of the readers of
this group have to regularly deal with option #2 on these systems, which
in part explains the general disdain for the "all the world's a
foo"-assuming code floating around.
As a general point, is there a guide to what aspects of C would fail if run
on a machine which is not 8-bit byte addressed? IIRC, some of the Crays
define a char, an int and a long int all as 64-bit, and some PICs use 12-bit
words.

C won't fail; rather programs making non-portable assumptions will fail.
Exactly how these programs will fail is something that's notoriously
hard to predict.
I have a sneaking suspicion that C is so well defined (or, mature) that,
if used properly, it would not run into too much bother on these machines.
Any idea of where it would go wrong, if anywhere?

Correct. C is very carefully defined so as to not exclude
implementations on strange and wonderful hardware. I/O code that
assumes bytes are octets (ie some networking code), and code that
assumes things about memory layout are the most obvious candidates for
problems.


Mark F. Haigh
 
Dan Pop

My K&R 2nd ed has in the Reference Manual appendix, A7.4.8 sizeof yields
the number of BYTES required to store an object of the type of its operand.
What happens if C is running on a machine that addresses larger words only?

It is the C implementation that defines what is a byte. It can either
make it the size of a word or make it a fraction of a word and perform
all the magic needed to simulate byte addressing. Both approaches have
been used in real world implementations.
Shouldn't sizeof be defined to return the smallest number of 'storage
units' required to store an object of the type of its operand?

It already is. It's just that C calls a storage unit "byte".
As a general point, is there a guide to what aspects of C would fail if run
on a machine which is not 8-bit byte addressed?

None. Addressing on the C abstract machine is not defined in terms of
8-bit bytes. It is defined in terms of bytes of CHAR_BIT bits.

It is C programs assuming CHAR_BIT == 8 that can fail in a multitude of
modes, too many to enumerate.
IIRC, some of the Crays
define a char, an int and a long int all as 64-bit, and some PICs use 12-bit
words.

I've never heard about implementations for Cray vector processors using
64-bit bytes, although this is definitely a possibility. I've heard
about two different ways of simulating octet addressing used on Cray
implementations.
I have a sneaking suspicion that C is so well defined (or, mature) that,
if used properly, it would not run into too much bother on these machines.

C itself has no problems on any kind of binary hardware (or hardware with
native support for any positive power of 2). A conforming implementation
for a BCD machine would be horribly inefficient, however.
Any idea of where it would go wrong, if anywhere?

Anywhere where programmers made assumptions not guaranteed by the
definition of the language. CHAR_BIT == 8 is only one of the many such
assumptions.

Dan
 
James Harris

Goran Larsson said:
Then the C implementation would use 64 bits or 12 bits for its byte. The
Cray implementation could, with a performance penalty, fake eight bit
bytes by using non-native addresses (add a three bit field to the
address) and use pack/unpack code for all byte accesses.

So, in this hypothetical case, there are two different implementation
options. Say the standard library encoded strings in the packed form;
it seems the compiler could, albeit with a performance penalty, convert
all string element references such as

my_string[i]

to read or write the appropriate bits. That makes sense, but are there
any operations other than string indexing that could fall foul of the
representation? For example,

#include <stdio.h>
#include <stdlib.h>   /* for malloc; <malloc.h> is non-standard */
#define CHARLIM 1000
int main(void) {
    int i;
    char *buf;
    buf = (char *) malloc(CHARLIM);
    /* check buf */
    for (i = 0; i < CHARLIM; i++) {
        buf[i] = 'x';
    }
    return 0;
}

wouldn't this, despite the compiler's cleverness in converting chars from
bytes to a packed form, allocate more memory than necessary? If there are
eight chars packed into a 'byte', wouldn't the compiler allocate eight
times as much memory as gets used? I think the problem here is assuming
that malloc returns pointers to 'bytes'. I think it returns pointers to
void - and voids have no size.

Basically, my question is, given that my assumptions are correct (about
which I am by no means certain), how /should/ I write the above code so it
is portable to any implementation?
 
Christian Bau

James Harris said:
Goran Larsson said:
Then the C implementation would use 64 bits or 12 bits for its byte. The
Cray implementation could, with a performance penalty, fake eight bit
bytes by using non-native addresses (add a three bit field to the
address) and use pack/unpack code for all byte accesses.

So, in this hypothetical case, there are two different implementation
options. Say the standard library encoded strings in the packed form;
it seems the compiler could, albeit with a performance penalty, convert
all string element references such as

my_string[i]

to read or write the appropriate bits. That makes sense, but are there
any operations other than string indexing that could fall foul of the
representation? For example,

#include <stdio.h>
#include <stdlib.h>   /* for malloc; <malloc.h> is non-standard */
#define CHARLIM 1000
int main(void) {
    int i;
    char *buf;
    buf = (char *) malloc(CHARLIM);
    /* check buf */
    for (i = 0; i < CHARLIM; i++) {
        buf[i] = 'x';
    }
    return 0;
}

wouldn't this, despite the compiler's cleverness in converting chars from
bytes to a packed form, allocate more memory than necessary? If there are
eight chars packed into a 'byte', wouldn't the compiler allocate eight
times as much memory as gets used? I think the problem here is assuming
that malloc returns pointers to 'bytes'. I think it returns pointers to
void - and voids have no size.

Basically, my question is, given that my assumptions are correct (about
which I am by no means certain), how /should/ I write the above code so it
is portable to any implementation?


It is portable. If we assume that in this implementation malloc will
always return whole native words, then it would return a pointer to 125
native words = 1000 bytes.

Now let's say you want to allocate an array of 100 doubles; one double
is a native word of eight bytes. You will have sizeof (double) = 8, 100
* sizeof (double) = 800, and malloc (800) returns a pointer to an array
of 100 native words.

One thing that might catch you out is that a cast from double* to char*
to intptr_t to double* will most likely result in garbage, because the
casts to and from intptr_t most likely don't change the representation,
but the cast from double* to char* does change the representation. A
cast from double* to intptr_t to double* would be fine, and so would a
cast from double* to char* to intptr_t to char* to double*, but not
casting from char* to intptr_t and then from intptr_t to double*.
 
Dan Pop

Christian Bau said:
One thing that might catch you out is that a cast from double* to char*
to intptr_t to double* will most likely result in garbage, because the
casts to and from intptr_t most likely don't change the representation,

This is not necessarily true on such an implementation, which may want
to make all addresses byte addresses when converted to intptr_t.
but the cast from double* to char* will change the representation.

So what? The only problematic scenario is that starting with a char *
that is not aligned on a word boundary. But you're starting with a
double *, so all the conversion chain is likely to work without any
problem.
Cast
from double* to intptr_t to double* would be fine, and so would be a
cast from double* to char* to intptr_t to char* to double*, but not
casting from char* to intptr_t and from intptr_t to double*.

Again, if the char * is word aligned, no information is lost in the
conversion.

Dan
 
James Harris

Now let's say you want to allocate an array of 100 doubles; one double
is a native word of eight bytes. You will have sizeof (double) = 8, 100
* sizeof (double) = 800, and malloc (800) returns a pointer to an array
of 100 native words.

This is making more sense to me now. I'm a bit confused by the use of 800
in the malloc call above. Would malloc (100 * sizeof(double)) be more
portable? To be portable, should all calls to malloc (except for chars)
specify sizeof()?

Incidentally, K&Rv2 in 7.8.5 Storage Management says that malloc returns a
pointer which has the proper alignment for the object in question. I can't
see that it can know for sure what the space is to be used for, so I assume
that malloc always returns a block of memory with the coarsest alignment
required by the implementation.
One thing that might catch you out is that a cast from double* to char*
to intptr_t to double* will most likely result in garbage, because the
casts to and from intptr_t most likely don't change the representation,
but the cast from double* to char* does change the representation. A
cast from double* to intptr_t to double* would be fine, and so would a
cast from double* to char* to intptr_t to char* to double*, but not
casting from char* to intptr_t and then from intptr_t to double*.

I presume you mean that a char pointer would be shifted left three bits (in
this example) and the lower three bits used to select the char within the
machine word. What would happen if those bits were not 000 and it was cast
to a double*? I suppose, more to the point, would the same thing happen in
any implementation when a char* is cast to a double* - possibly setting the
low bits to zero?
 
Dik T. Winter

> Incidentally, K&Rv2 in 7.8.5 Storage Management says that malloc returns a
> pointer which has the proper alignment for the object in question. I can't
> see that it can know for sure what the space is to be used for so assume
> that malloc always returns a block of memory with the coarsest alignment
> required by the implementation.

This is right.

Indeed, there is nothing in the standard that requires anything but a no-op
when casting from double* to intptr_t, and the other way around. On the
other hand, casting from double* to char*, and vice versa, requires a
conversion if needed.
> I presume you mean that a char pointer would be shifted left three bits (in
> this example) and the lower three bits used for the char within the machine
> word. What would happen if those bits were not 000 and it was cast to a
> double*? I suppose, more to the point, would the same thing happen in any
> implementation when a char* is cast to a double* - possibly setting the low
> bits to zero?

On that hypothetical machine, indeed, that would happen. But there is
nothing that forces it to happen on a non byte-addressable machine. The
point is that when you convert from double* to int (or long, or whatever)
there is nothing in the standard that states that the result must have
any meaning. If the machine is byte-addressable, it is likely that no
problem occurs.

I have used one machine where (in the context of double *d) the following
might be false:
d == (double *)(long)(char *)d
and two machines where the following might be false (in the context of
char *c):
c == (char *)(double *)c
 
Richard Bos

James Harris said:
This is making more sense to me now. I'm a bit confused by the use of 800
in the malloc call above. Would malloc (100 * sizeof(double)) be more
portable? To be portable should all calls to malloc except chars specify
sizeof()?

Very much so, yes. In fact, there's an even better way than using sizeof
(type). You nearly always assign the result of malloc() to a pointer
object. You can use sizeof *pointer instead of sizeof (type). For
example, instead of

struct foo *head;
... lots of code ...
head=malloc(number * sizeof (struct foo));

you can write

struct foo *head;
... lots of code ...
head=malloc(number * sizeof *head);

Now the malloc() call remains correct even if maintenance makes it
necessary for you to make head a struct bar *, instead - and you don't
even have to remember to change all malloc() calls! You can't even
accidentally overlook one.
This style is not, btw, more portable than that with sizeof (type), but
it is more solid and maintainable.
Incidentally, K&Rv2 in 7.8.5 Storage Management says than malloc returns a
pointer which has the proper alignment for the object in question. I can't
see that it can know for sure what the space is to be used

Well, the implementation is allowed to use whatever magic is available
to it. For example, TTBOMK it's legal for it to scan the statement in
which the malloc() call occurs, figure out the type of the pointer which
it is assigned to, and instead of writing CALL (malloc), @ACC, @PTR to
the object code, write CALL (mallocalign), @ACC, @SIZ, @PTR. However...
for so assume that malloc always returns a block of memory with the
coarsest alignment required by the implementation.

...most implementations seem to do this, probably because trying to
optimise alignment would be a lot of trouble for very little gain.

Richard
 
Dave Vandervies

Richard Bos said:
Well, the implementation is allowed to use whatever magic is available
to it. For example, TTBOMK it's legal for it to scan the statement in
which the malloc() call occurs, figure out the type of the pointer which
it is assigned to, and instead of writing CALL (malloc), @ACC, @PTR to
the object code, write CALL (mallocalign), @ACC, @SIZ, @PTR. However...

To handle malloc alignment in the compiler rather than the library in
the general case, the implementation magic would have to be more magical
than that.

Consider:
--------
void *debug_malloc(size_t size,char *file,int line)
{
void *ptr=malloc(size);
if(global_params.debug_file)
fprintf(global_params.debug_file,"Malloc %lu bytes at %s:%d: Got %p\n",(unsigned long)size,file,line,ptr);
return ptr;
}
--------
The alignment this call to malloc needs won't be known until run time,
and the compiler won't know when it's compiling code that calls this
function that it needs to know the alignment requirement.


I believe it's also legal to do something like this:
--------
int i;
short *s;
double *d;

s = malloc(sizeof *s * sizeof *d);
for (i = 0; i < sizeof *d; i++)
    s[i] = 42 * i;

/*Is direct conversion short * -> double * allowed to do The Wrong Thing?
I suspect it is, but forcing the conversion to go through void * is,
if I'm not mistaken, required to get it right.
*/
d = (void *)s;
for (i = 0; i < sizeof *s; i++)
    d[i] = 42.0 * i;
--------
so at the call to malloc the compiler would need to know the alignments
of all the types the pointer might be used to point at.
so at the call to malloc the compiler would need to know the alignments
of all the types the pointer might be used to point at.


dave
 
Richard Bos

To handle malloc alignment in the compiler rather than the library in
the general case, the implementation magic would have to be more magical
than that.

Hmmm... I was rather imagining an implementation handling malloc
alignment in the implementation, rather than in either part of it. If
compiler and library are so strictly separate that there is no
communication between them, you do lose a lot of these optimisation
tricks, that much is true.
void *debug_malloc(size_t size,char *file,int line)
{
void *ptr=malloc(size);
if(global_params.debug_file)
fprintf(global_params.debug_file,"Malloc %lu bytes at %s:%d: Got %p\n",(unsigned long)size,file,line,ptr);
return ptr;
}

Certainly; but it's also quite legal to apply optimisations when you
know you can, and leave them out when you can't determine their
validity.
I believe it's also legal to do something like this:
--------
int i;
short *s;
double *d;

s = malloc(sizeof *s * sizeof *d);
for (i = 0; i < sizeof *d; i++)
    s[i] = 42 * i;

/*Is direct conversion short * -> double * allowed to do The Wrong Thing?
I suspect it is, but forcing the conversion to go through void * is,
if I'm not mistaken, required to get it right.
*/
d = (void *)s;


I'm not sure. The Standard says

# The pointer returned if the allocation succeeds is suitably aligned so
# that it may be assigned to a pointer to any type of object and then
# used to access such an object or an array of such objects in the space
# allocated (until the space is explicitly deallocated).

It is possible to read this as saying that the pointer must be used for
a single type of object, IMO. (Barring unsigned char aliasing, of
course.)

Richard
 
Neo

Mark F. Haigh said:
There's a couple of solutions that have been used (by compilers):

1. Fake 8 bit bytes, with a performance penalty. Couple loads and stores
with the appropriate mask and shift instructions. Do some similar magic
for pointers to char and void (which must have identical representation).

2. Define CHAR_BIT to the architecture's smallest addressable unit. For
example, on some DSP implementations, char, short, and int are all 32
bits.

Right, on my TMS320C5402 DSP, char, short and int are all 16 bits.
C is not at all a problem here. It's still plain old C.

If you want to find out how many bits a byte consists of on your platform:

printf("sizeof(char) = %u\n", (unsigned)sizeof(char));
printf("Number of bits in a byte = %d\n", CHAR_BIT);

In my case the output is:
sizeof(char) = 1
Number of bits in a byte = 16

A byte is simply a byte; it can consist of "n" bits.
This is especially important when you are in embedded...
The choice between the two may be configurable via command-line switches
to the compiler.

Carefully written highly portable C programs can run on either #1 or #2,
with #2 being preferable. However, IIRC, POSIX defines CHAR_BIT to be 8,
and therefore POSIX compliant programs that are not also highly portable C
programs are likely to have problems with #2.

C is especially prevalent in embedded systems. In many of these systems
it's the only "high"-level language available. Many of the readers of
this group have to regularly deal with option #2 on these systems, which
in part explains the general disdain for the "all the world's a
foo"-assuming code floating around.


C won't fail; rather programs making non-portable assumptions will fail.
Exactly how these programs will fail is something that's notoriously hard
to predict.


Correct. C is very carefully defined so as to not exclude implementations
on strange and wonderful hardware. I/O code that assumes bytes are octets
(ie some networking code), and code that assumes things about memory
layout are the most obvious candidates for problems.

Exactly. I have to take care when interfacing with devices and an outside
world that assume an 8-bit byte interface. I have to convert my 16-bit data
to 8-bit data and vice versa. When reading chips I have to mask off the
high-order byte like this:
value = 0x00FF & *reg_offset;

Likewise, for sending data to other devices over the network, splitting is
done like this:
p_data->a_value[0] = HIBYTE(e1card[port_index].los_counter);
p_data->a_value[1] = LOBYTE(e1card[port_index].los_counter);
p_data->a_value[0] = HIBYTE(e1card[port_index].ais_counter);
p_data->a_value[1] = LOBYTE(e1card[port_index].ais_counter);
..
..
..
 
