Abstraction layer between C and CPU


Mike Wahler

Thomas Stegen said:
Mike said:
[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Suggestion for new FAQ, and request for answer:

Q: How do I clear the context?
A: __________________________

-Mike
 

Mike Wahler

Jonathan Burd said:
Thomas said:
Mike said:

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps using the term "octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

A 'C byte' must have at least eight bits.
To reply to the original context, C does not have
a "byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.

IOW a char must fit in a byte.
I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

I'll let you decide if this is explicit or not:

ISO/IEC 9899:1999 (E)

5.2.4.2.1 Sizes of integer types <limits.h>

1 The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives. Moreover, except
for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by
expressions that have the same type as would an expression that
is an object of the corresponding type converted according to the
integer promotions. Their implementation-defined values shall be
equal or greater in magnitude (absolute value) to those shown, with
the same sign.

-- number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8


-Mike
 

Vinko Vrsalovic

Mike Wahler wrote:

[...regarding char type...]
Correct. It's simply the 'smallest addressable unit of storage',
which is required to have a minimum size of eight bits, but can
be larger (and often is on certain architectures). From a C
perspective, 'byte' and 'character' are synonymous.

What about unsigned char then? Shouldn't it be one bit shorter than
char?

If it is one bit shorter, then I tend to think that char isn't the
'smallest addressable unit of storage', or, if it isn't one bit
shorter, what's the point of having that type?

I'm probably missing some fundamental concept, which I'm hoping you can
clarify for me.

V.
 

Mike Wahler

Vinko Vrsalovic said:
Mike Wahler wrote:

[...regarding char type...]
Correct. It's simply the 'smallest addressable unit of storage',
which is required to have a minimum size of eight bits, but can
be larger (and often is on certain architectures). From a C
perspective, 'byte' and 'character' are synonymous.

What about unsigned char then?

All the character types have a size of one (byte) by
definition.
Shouldn't it be one bit shorter than
char?
No.


If it is one bit shorter,

It's not.
then I tend to think that char isn't the
'smallest addressable unit of storage',

A byte is. An object of any of the (3) character types
must fit in a byte.
or, if it isn't one bit
shorter, what's the point of having that type?

For representing unsigned values. One 'side-effect'
is that 'unsigned char' has twice the range as 'signed char'.
(Note that 'plain' char type may be either signed or unsigned,
that's up to the implementation. But 'char', 'signed char',
and 'unsigned char' are treated as three distinct types.)
I'm probably missing some fundamental concept, which I'm hoping you can
clarify for me.

I hope I did.

-Mike
 

Keith Thompson

Mike Wahler said:
Not necessarily. Also note that the addresses of two separate
objects will not necessarily reflect their relationship in
source code. e.g.:

int i;
int j;

the address of 'j' need not be greater than address of 'i'
nor is their difference guaranteed to be sizeof(int).
(the only time this *is* guaranteed is when the
objects are adjacent elements (the subscript of one
is one more or less than the subscript of the other)
of the same array).

It's actually worse than that. Comparing the addresses of two
distinct objects (using <, <=, >, or >=) invokes undefined behavior
unless the objects are both part of some larger object. (Equality or
inequality comparison is ok.)

[...]
-- I've seen code where two or more arrays would be declared side by
side

C has rather 'free' formatting rules, e.g. more than one
declaration or statement can appear on a single line.
int array1[] = {1,2,3}; int array2[] = {4,5,6};

However I recommend against this practice.

I don't think the OP was asking about the arrangement in the source.

The following:

int a[10], b[10], c[20];

int a[10]; int b[10]; int c[20];

int a[10];
int b[10];
int c[20];

are exactly equivalent as far as the language is concerned. In all
three cases, the language guarantees nothing about the placement of
the three array objects in memory. They might or might not be
adjacent, and they could be in any order. (A compiler could choose a
different layout depending on what the source looks like, but it could
just as easily choose a different layout depending on the phase of the
moon.)

It is possible for a program to find out whether they're contiguous;
for example, if a+10 == b, then a and b are contiguous. But no sane
program should ever take advantage of this. Just treat them as three
distinct objects.
 

Keith Thompson

Mike Wahler said:
For representing unsigned values. One 'side-effect'
is that 'unsigned char' has twice the range as 'signed char'.

Not typically. On a typical 8-bit two's-complement implementation,
unsigned char has a range of 0..255, and signed char has a range of
-128..+127.
(Note that 'plain' char type may be either signed or unsigned,
that's up to the implementation. But 'char', 'signed char',
and 'unsigned char' are treated as three distinct types.)

Yes, and in fact plain char must have the same characteristics as
either signed char or unsigned char. (Your description left open the
possibility of plain char being an (un)signed type with a different
range than (un)signed char.)
 

Keith Thompson

Chris Croughton said:
On Fri, 21 Jan 2005 10:11:21 +0000, Thomas Stegen


Use "characters" or "chars" to refer to the C entities and "octets" to
refer to the 8 bit quantities, and shun the overloaded term "bytes".


It isn't, even those of us who have worked on machines with odd byte
lengths often now use it only about 8 bit quantities, because that
represents the vast majority of machines these days (most of the DSP
programmers I know refer to the basic -- and only -- memory units as
"words").

The C standard uses the term "byte" extensively, and it uses it
consistently in the way defined in the standard, never as a
specifically 8-bit quantity. In my opinion, there's no need to shun
the use of the word "byte" in this newsgroup. The meaning is clear in
this context.
 

Mike Wahler

Keith Thompson said:
Not typically. On a typical 8-bit two's-complement implementation,
unsigned char has a range of 0..255, and signed char has a range of
-128..+127.

Oops, I meant twice the range of positive values.
Yes, and in fact plain char must have the same characteristics as
either signed char or unsigned char. (Your description left open the
possibility of plain char being an (un)signed type with a different
range than (un)signed char.)

That's me, incomplete. :)

-Mike
 

Albert van der Horst

I have seen that code too: In FORTRAN in the seventies.
Not in C lately.
int a[10], b[10], c[20];
int i;
for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */
}


No, you can't rely on that, even if you only care about the "virtual"
addresses. Accessing an array element outside of its defined range
of indices is forbidden and leads to undefined behaviour. That code
may work on a certain platform when compiled with a certain compiler
but there's no guarantee that it works with any other compiler or on
a different platform.


An interesting compiler would be one that takes care of enforcing
array bounds by using hardware means (e.g. a b and c would each have
their own so-called descriptors in an Intel architecture.)
It could very well be that b[0] has
the same physical address as a[10] would have (so to speak),
but addressing it through the 'b' descriptor succeeds, but through the
'a' descriptor fails with a memory fault.
A compiler that takes care to use the correct descriptor would be
perfectly conforming, but would do a good job in killing
unwarranted assumptions or wrong headed conclusions from experiments.

It would be a nice compiler to have at universities.
 

Walter Roberson

:An interesting compiler would be one that takes care of enforcing
:array bounds by using hardware means (e.g. a b and c would each have
:their own so-called descriptors in an Intel architecture.)

Ah, like VAX.

It's been more than 20 years, so my memory is probably faulty, but I
seem to recall that DEC had a hard time porting [K&R] C to VAX while
still permitting type-casting. I think I heard that they effectively
ended up turning off all descriptors. Hmmm, when I heard about it then
I didn't think about that, but now I realize that it likely wasn't
really possible to turn off descriptors, since they were
hardware-level. So they perhaps ended up doing something like declaring
all of memory as one large array of bytes. Pretty much the same
solution as on Intel segmented architectures when the 286 or so came
out.
 

Chris Torek

:An interesting compiler would be one that takes care of enforcing
:array bounds by using hardware means (e.g. a b and c would each have
:their own so-called descriptors in an Intel architecture.)

Ah, like VAX.

It's been more than 20 years, so my memory is probably faulty, but I
seem to recall that DEC had a hard time porting [K&R] C to VAX while
still permitting type-casting. I think I heard that they effectively
ended up turning off all descriptors. ...

The VAX was a pretty conventional machine. VMS used lots of
descriptors but they were not implemented in hardware. You could
be thinking of the INDEX instruction, perhaps; but it merely did
a range check and multiply, and was tremendously outperformed by
doing a separate range-check-and-multiply on any 11/780 with hardware
multiply, because INDEX always used a microcode loop instead of
using the hardware. (Oops.)

The old Burroughs A-series machines would have been much trickier.
 
