A byte can be greater than 8 bits?

borophyll

As I read it, C99 states that a byte is an:

"addressable unit of data storage large enough to hold any member of
the basic character set of the execution environment" (3.6)

and that a byte must be at least 8 bits:

"The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives. Moreover, except for
CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions
that have the same type as would an expression that is an object of the
corresponding type converted according to the integer promotions. Their
implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign."

number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8 (5.2.4.2.1)

Does this mean that a byte can be larger than 8 bits (i.e. CHAR_BIT >
8)? I have gotten the impression that a byte, or unsigned char, was
always 8 bits, but perhaps I was wrong. If I am not, is there
somewhere in the standard that defines a byte as always being 8 bits?

Regards,
B.
 
Roberto Waltman

borophyll said:
As I read it, C99 states that a byte is an:
"addressable unit of data storage large enough to hold any member of
the basic character set of the execution environment" (3.6)

and that a byte must be at least 8 bits:
...
Does this mean that a byte can be larger than 8 bits (i.e. CHAR_BIT >
8)? I have gotten the impression that a byte, or unsigned char, was
always 8 bits, but perhaps I was wrong.

Correct. For example, the Texas Instruments C compiler for the DSP 2000
processor family defines CHAR_BIT == 16, and sizeof(char) ==
sizeof(short) == sizeof(int) == 1.
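
To see what a particular implementation does, a minimal sketch along these lines prints the byte width and the sizes of the basic integer types. The helper names bits_in and report_widths are made up for illustration and do not come from any compiler's documentation.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Storage bits occupied by an object of the given size in bytes. */
static size_t bits_in(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}

/* Print the byte width and the sizes of the basic integer types. */
static void report_widths(void)
{
    /* Both facts are guaranteed by the standard on every implementation. */
    assert(CHAR_BIT >= 8);
    assert(sizeof(char) == 1);

    printf("CHAR_BIT      = %d\n", (int)CHAR_BIT);
    printf("sizeof(char)  = %zu (%zu bits)\n", sizeof(char), bits_in(sizeof(char)));
    printf("sizeof(short) = %zu (%zu bits)\n", sizeof(short), bits_in(sizeof(short)));
    printf("sizeof(int)   = %zu (%zu bits)\n", sizeof(int), bits_in(sizeof(int)));
    printf("sizeof(long)  = %zu (%zu bits)\n", sizeof(long), bits_in(sizeof(long)));
}
```

Calling report_widths() from main is enough to try it. Note that sizeof counts bytes, not octets, so a result of 1 for int is consistent with a 16-bit int when CHAR_BIT is 16.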
 
Martin Wells

borophyll said:
Does this mean that a byte can be larger than 8 bits (i.e. CHAR_BIT >
8)?


Yes, a byte can have more than eight bits. Welcome to the world of
portable C programming :D Here are a few other things to look out for in
portable programming:

1: Null pointers aren't necessarily represented as all-bits-zero, so
think twice about using memset(array_of_pointers,0,sizeof
array_of_pointers).

2: Integer types other than unsigned char may contain padding bits, so
stay away from memcmp(arr1,arr2,sizeof arr1).

3: Number systems other than two's complement may be used, so be
careful about doing things like using -1 to represent all-bits-one.

4: Function pointers might not fit inside ANY of the other types (such
as void* or unsigned long), so don't store them there.

5: CHAR_MAX <= SHRT_MAX <= INT_MAX <= LONG_MAX
UCHAR_MAX <= USHRT_MAX <= UINT_MAX <= ULONG_MAX
It's possible to have UCHAR_MAX > LONG_MAX (for example, when char and
long are both 32 bits wide).

6: Learn about integer promotion. It's really not that complicated.
Basically, before an arithmetic operation is performed on a value of a
type narrower than int, the value is converted to its "promoted" type.
If int can represent every value of the original type (INT_MAX >=
UCHAR_MAX for unsigned char, INT_MAX >= USHRT_MAX for unsigned short),
the value is promoted to int; otherwise it is promoted to unsigned int.

7: Conversions from unsigned integer types to signed integer types
work as expected if the number is within range, otherwise the
behaviour is implementation-defined (i.e. the maker of the compiler
can do what it wants, but it must document what it does in its
manual).
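
As a rough illustration of points 1 and 2, the portable alternatives to memset and memcmp are plain assignment and value comparison. This is only a sketch; clear_pointers and ints_equal are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Set every element to a null pointer by assignment instead of memset.
   The compiler stores whatever bit pattern a null pointer really has. */
static void clear_pointers(int **ptrs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        ptrs[i] = NULL;
}

/* Compare two int arrays by value instead of memcmp, so that any
   padding bits in the representation are ignored. Returns 1 if equal. */
static int ints_equal(const int *a, const int *b, size_t count)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < count; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

On common desktop platforms the memset and memcmp versions happen to give the same results, which is exactly why the assumption is easy to make without noticing.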

That's all I can come up with off the top of my head.

Martin
 
Richard

I would be very interested for posters here to list the current systems
they use which would cause problems by breaking the rules. It might be a
good addition to the FAQ for people to go out and find real systems so as
to understand better why they need to be careful:

Martin Wells said:
Yes, a byte can have more than eight bits. Welcome to the world of
portable C programming :D Here are a few other things to look out for in
portable programming:

A) Systems where a byte is more than 8 bits.

Martin Wells said:
1: Null pointers aren't necessarily represented as all-bits-zero, so
think twice about using memset(array_of_pointers,0,sizeof
array_of_pointers).

B) Systems where a NULL pointer cannot be set by applying 0s.

Martin Wells said:
2: Integer types other than unsigned char may contain padding bits, so
stay away from memcmp(arr1,arr2,sizeof arr1).

C) Places where the padding bits are not consistently set between 2
arrays of the same type.

Martin Wells said:
3: Number systems other than two's complement may be used, so be
careful about doing things like using -1 to represent all-bits-one.

D) Systems where -1 is not all bits on.

Martin Wells said:
4: Function pointers might not fit inside ANY of the other types (such
as void* or unsigned long), so don't do that.

E) Systems where a function pointer cannot be stored in a void
pointer (regardless of style).
 
Philip Potter

Richard said:
I would be very interested for posters here to list the current systems
they use which would cause problems by breaking the rules. It might be a
good addition to the FAQ for people to go out and find real systems so as
to understand better why they need to be careful:

I partly agree and partly disagree here. I agree because it shows new
people that strictly following the standard /is/ important because these
common assumptions aren't necessarily true. But I disagree because it
may encourage the view that if a system cannot be found which differs
from the norm, then it doesn't matter that we're making assumptions.

Richard said:
A) Systems where a byte is more than 8 bits :

DSPs are a common example here.

Richard said:
B) Systems where a NULL pointer can not be set by applying 0s.

What do you mean by "applying 0s"? Null pointers can always be set by:
foo *x = (foo *)0;
(I'm not sure if the cast is necessary) even if the null pointer
representation is not all-bits-zero.

I personally don't know of any non-all-bits-zero null pointer systems,
but I've never seen reason to worry about it. I know /my/ code will work
everywhere regardless. As Kernighan and Ritchie wisely point out, ``if
you don't know how they are done on various machines, that innocence may
help to protect you.''

Richard said:
C) Places where the padding bits are not consistently set between 2
arrays of the same types.

That is user-program dependent. Any system with padding bits can have
them set by the user program:
int main(void) {
    unsigned int arr1[2], arr2[2];
    arr1[0] = arr1[1] = (unsigned int) -1;
    memset(arr2,2*sizeof(unsigned int),1,(unsigned char)-1);
    /* At this point, arr1's values equal arr2's values but
       the padding bits may differ. */
}

Richard said:
D) Where -1 is not an all bits on.

This is complicated again. For unsigned types, -1 is always all-bits-one.
I can't think why you'd want to set a signed type to all-bits-one.

Richard said:
E) Where a function pointer can not be stored in a VOID
pointer. (Regardless of style)

I can't think why you'd want to do this.
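
On the cast question raised in this post: the constant 0 is a null pointer constant and converts to any object pointer type without a cast, whatever bit pattern the null pointer uses internally. A minimal sketch, with struct foo as a hypothetical stand-in for the foo type used in the post:

```c
#include <assert.h>
#include <stddef.h>

struct foo { int value; };   /* hypothetical stand-in for "foo" */

/* Returns a null pointer; the constant 0 converts without a cast. */
static struct foo *make_null(void)
{
    struct foo *x = 0;       /* equivalent to: struct foo *x = NULL; */
    assert(x == NULL);       /* holds regardless of the representation */
    return x;
}
```

The conversion happens at compile time on the constant, which is why it keeps working where writing zero bytes into the pointer's storage might not.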
 
Richard

Philip Potter said:
I partly agree and partly disagree here. I agree because it shows
new people that strictly following the standard /is/ important because
these common assumptions aren't necessarily true. But I disagree
because it may encourage the view that if a system cannot be found
which differs from the norm, then it doesn't matter that we're making
assumptions.

No. I think it's important to show that these systems really
exist. Otherwise a lot of people will think it's all a lot of hot
air. It does no harm to demonstrate WHERE the standard benefits the
programmer. Not some airy-fairy "there might be a system with a 13 bit
char" for example.

Philip Potter said:
DSPs are a common example here.

Philip Potter said:
What do you mean by "applying 0s"? Null pointers can always be set by:
foo *x = (foo *)0;

See above where it clearly says using memset.

Philip Potter said:
(I'm not sure if the cast is necessary) even if the null pointer
representation is not all-bits-zero.

I personally don't know of any non-all-bits-zero null pointer systems,
but I've never seen reason to worry about it. I know /my/ code will
work everywhere regardless. As Kernighan and Ritchie wisely point out,
``if you don't know how they are done on various machines, that
innocence may help to protect you.''

Yes, but this has nothing to do with the question. The question being
posed is to demonstrate where this might be an issue if you do NOT stick
to the rules.

Richard said:
C) Places where the padding bits are not consistently set between 2
arrays of the same types.

Philip Potter said:
That is user-program dependent. Any system with padding bits can have
them set by the user program:
int main(void) {
    unsigned int arr1[2], arr2[2];
    arr1[0] = arr1[1] = (unsigned int) -1;
    memset(arr2,2*sizeof(unsigned int),1,(unsigned char)-1);
    /* At this point, arr1's values equal arr2's values but
       the padding bits may differ. */
}

This is a forced issue and has nothing to do with the question about
compiler or platform specifics. We are talking about two arrays of the
same objects being compared without using memset. In the real world, not
the hypothetical world.

Philip Potter said:
This is complicated again. For unsigned types, -1 is always
all-bits-one. I can't think why you'd want to set a signed type to
all-bits-one.

This is not the question. The question is on which real platforms -1
is not represented as all bits on. This comes up all the time.

Philip Potter said:
I can't think why you'd want to do this.

This is not the question. The question is where it would not work.
 
Philip Potter

Richard said:
No. I think it's important to show that these systems really
exist. Otherwise a lot of people will think it's all a lot of hot
air. It does no harm to demonstrate WHERE the standard benefits the
programmer. Not some airy-fairy "there might be a system with a 13 bit
char" for example.

But again, if you don't know how things are done on systems, you will be
protected from it. I know that NULL is not guaranteed to be
all-bits-zero, and I don't know where it is and where it isn't. I'm not
even sure whether or not NULL is all-bits-zero on x86, and I'm happy to
stay that way.

It's good to have a couple of examples of why certain common assumptions
shouldn't be relied on, but compiling a comprehensive list is asking for
people to rely on that list instead.

Richard said:
See above where it clearly says using memset.

Yes, but this has nothing to do with the question. The question being
posed is to demonstrate where this might be an issue if you do NOT stick
to the rules.

This is why writing standard C is the /easy/ way: because you don't need
to compile such compatibility lists.

Martin Wells said:
2: Integer types other than unsigned char may contain padding bits, so
stay away from memcmp(arr1,arr2,sizeof arr1).

Richard said:
C) Places where the padding bits are not consistently set between 2
arrays of the same types.

Philip Potter said:
That is user-program dependent. Any system with padding bits can have
them set by the user program:
int main(void) {
    unsigned int arr1[2], arr2[2];
    arr1[0] = arr1[1] = (unsigned int) -1;
    memset(arr2,2*sizeof(unsigned int),1,(unsigned char)-1);
    /* At this point, arr1's values equal arr2's values but
       the padding bits may differ. */
}

Richard said:
This is a forced issue and has nothing to do with the question about
compiler or platform specifics. We are talking about two arrays of the
same objects being compared without using memset. In the real world, not
the hypothetical world.

Why does your requirement exclude use of memset? Someone who uses
memcmp() will also use memset(). Someone who uses memset() may not do so
for every array he uses.

Richard said:
This is not the question. The question is on which real platforms -1
is not represented as all bits on. This comes up all the time.

Give me an example of code which relies on -1 being all-bits-one.

Richard said:
This is not the question. The question is where it would not work.

Why is that question relevant if no one tries to do it? I don't see you
asking for a list of platforms where NULL is 0xdeadbeef, because no code
depends on that.
 
jacob navia

Philip said:
Give me an example of code which relies on -1 being all-bits-one.

#include <stdio.h>
int maskNLowerBits(unsigned d,int n)
{
    return d & (-1 << n);
}

int main(void)
{
    unsigned i;
    unsigned u = (unsigned)-1;

    for (i=0; i<32;i++)
        printf("Masking lower %d bits of %x is: %x\n",
               i,u,maskNLowerBits(u,i));
}

Output:
Masking lower 0 bits of ffffffff is: ffffffff
Masking lower 1 bits of ffffffff is: fffffffe
Masking lower 2 bits of ffffffff is: fffffffc
Masking lower 3 bits of ffffffff is: fffffff8
Masking lower 4 bits of ffffffff is: fffffff0
Masking lower 5 bits of ffffffff is: ffffffe0
Masking lower 6 bits of ffffffff is: ffffffc0
Masking lower 7 bits of ffffffff is: ffffff80
Masking lower 8 bits of ffffffff is: ffffff00
Masking lower 9 bits of ffffffff is: fffffe00
Masking lower 10 bits of ffffffff is: fffffc00
Masking lower 11 bits of ffffffff is: fffff800
Masking lower 12 bits of ffffffff is: fffff000
Masking lower 13 bits of ffffffff is: ffffe000
Masking lower 14 bits of ffffffff is: ffffc000
Masking lower 15 bits of ffffffff is: ffff8000
Masking lower 16 bits of ffffffff is: ffff0000
Masking lower 17 bits of ffffffff is: fffe0000
Masking lower 18 bits of ffffffff is: fffc0000
Masking lower 19 bits of ffffffff is: fff80000
Masking lower 20 bits of ffffffff is: fff00000
Masking lower 21 bits of ffffffff is: ffe00000
Masking lower 22 bits of ffffffff is: ffc00000
Masking lower 23 bits of ffffffff is: ff800000
Masking lower 24 bits of ffffffff is: ff000000
Masking lower 25 bits of ffffffff is: fe000000
Masking lower 26 bits of ffffffff is: fc000000
Masking lower 27 bits of ffffffff is: f8000000
Masking lower 28 bits of ffffffff is: f0000000
Masking lower 29 bits of ffffffff is: e0000000
Masking lower 30 bits of ffffffff is: c0000000
Masking lower 31 bits of ffffffff is: 80000000

OK?

And you haven't given ANY example as Richard asked!
 
Eric Sosman

jacob said:
#include <stdio.h>
int maskNLowerBits(unsigned d,int n)
{
    return d & (-1 << n);
}

Maybe Philip should have specified "code that doesn't
invoke undefined behavior." After the fix s/1/1u/ the
corrected code no longer relies on -1 being all-bits-one.
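
Applying that substitution to the quoted function might look like the sketch below. The unsigned return type, the renamed function, and the unsigned_width helper are assumptions added here so that the shift count can be checked; they are not part of the original post.

```c
#include <assert.h>
#include <limits.h>

/* Number of value bits in unsigned int, found by shifting UINT_MAX to zero. */
static unsigned unsigned_width(void)
{
    unsigned bits = 0;
    for (unsigned v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    assert(bits >= 16);       /* UINT_MAX is at least 65535 */
    return bits;
}

/* Clear the n lowest value bits of d using an unsigned all-ones mask. */
static unsigned mask_n_lower_bits(unsigned d, unsigned n)
{
    if (n >= unsigned_width())
        return 0;             /* shifting by the full width is undefined */
    return d & (-1u << n);    /* -1u is UINT_MAX, so no signed shift occurs */
}
```

Because -1u has the value UINT_MAX by definition of unsigned arithmetic, the mask is the same whether the machine uses two's complement, ones' complement, or sign-magnitude for signed integers.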
 
jacob navia

Eric said:
Maybe Philip should have specified "code that doesn't
invoke undefined behavior." After the fix s/1/1u/ the
corrected code no longer relies on -1 being all-bits-one.

??
Is it legal to make -1u??
Unary minus applied to unsigned operand?
 
jacob navia

Eric said:
Yes: unary minus takes an "arithmetic type" (6.5.3.3).

Anyway, that still relies on all the bits that do not drop
off the left side being 1. And that did not change with this minor
cosmetic change!
 
jacob navia

Philip Potter wrote:

[snip]

The fact that Philip could not give a single example
probably means that there aren't any machines where
all those terrible conditions apply!
 
Richard

Philip Potter said:
But again, if you don't know how things are done on systems, you will
be protected from it. I know that NULL is not guaranteed to be
all-bits-zero, and I don't know where it is and where it isn't. I'm
not even sure whether or not NULL is all-bits-zero on x86, and I'm
happy to stay that way.

It's good to have a couple of examples of why certain common
assumptions shouldn't be relied on, but compiling a comprehensive list
is asking for people to rely on that list instead.


No. It is there to make the point that it really is true. I never asked
for a comprehensive list. Just some examples.

"Look, if you do it the proper way you won't get in a jam like you would
if you did it that way and your code moves to platform X".
 
Ben Bacarisse

jacob navia said:
??
Is it legal to make -1u??
Unary minus applied to unsigned operand?

If you suspect it is not, why not say why?

It seems to me to be perfectly well defined. The '1u' has promoted
type 'unsigned int', the '-' negates this. The result (mathematical
-1) is converted back to the promoted type (unsigned int) by
repeatedly adding (or subtracting) one more than the maximum value
representable in that type (UINT_MAX + 1).
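
The wraparound rule described here can be checked directly. A small sketch, with negate_unsigned and check_wraparound as hypothetical names chosen for the illustration:

```c
#include <assert.h>
#include <limits.h>

/* Negate an unsigned value; the result is reduced modulo UINT_MAX + 1. */
static unsigned negate_unsigned(unsigned x)
{
    return -x;
}

/* Each check follows from modular arithmetic, not from the
   representation of signed integers. */
static void check_wraparound(void)
{
    assert(negate_unsigned(0u) == 0u);
    assert(negate_unsigned(1u) == UINT_MAX);        /* -1 + (UINT_MAX + 1) */
    assert(negate_unsigned(2u) == UINT_MAX - 1u);   /* -2 + (UINT_MAX + 1) */
    assert(negate_unsigned(5u) + 5u == 0u);         /* x + (-x) wraps to 0 */
}
```

None of these assertions can fail on a conforming implementation, since unsigned arithmetic is defined to wrap in exactly this way.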
 
jacob navia

Ben said:
If you suspect it is not, why not say why?

It seems to me to be perfectly well defined. The '1u' has promoted
type 'unsigned int', the '-' negates this. The result (mathematical
-1) is converted back to the promoted type (unsigned int) by
repeatedly adding (or subtracting) one more than the maximum value
representable in that type (UINT_MAX + 1).

Anyway...

AND the rest of the code supposes that this is all ones.
 
Charlie Gordon

jacob navia said:
#include <stdio.h>
int maskNLowerBits(unsigned d,int n)
{
    return d & (-1 << n);
}

Jacob, you know this invokes undefined behaviour.

You should have written ``-1U << n'' and that would be portable across all
representations.

Personally, I would prefer it if Standard C mandated 2s-complement and
defined the behaviour of shifting signed integer types left and right.
Maybe we should define a new language, Real-C, where these awkward
historical details would be removed...

jacob navia said:
int main(void)
{
    unsigned i;
    unsigned u = (unsigned)-1;

The cast is useless.
If the compiler complains, use ``-1U''.

jacob navia said:
    for (i=0; i<32;i++)
        printf("Masking lower %d bits of %x is: %x\n",
               i,u,maskNLowerBits(u,i));
}

Output:
Masking lower 0 bits of ffffffff is: ffffffff
Masking lower 1 bits of ffffffff is: fffffffe
....
Masking lower 30 bits of ffffffff is: c0000000
Masking lower 31 bits of ffffffff is: 80000000

OK?

Mostly irrelevant.

jacob navia said:
And you haven't given ANY example as Richard asked!

I get the same answers: some DSPs do, once a friend told me of a computer
that did weird stuff...

Some DSPs do have non-8-bit bytes, and some old Crays too.
Using those for anything non-specific is bound to cause surprises!

For the rest (non-two's-complement, padding bits in integers...) we can
safely ignore the issues.
 
Charlie Gordon

Richard said:
C) Places where the padding bits are not consistently set between 2
arrays of the same types.

Philip Potter said:
That is user-program dependent. Any system with padding bits can have them
set by the user program:
int main(void) {
    unsigned int arr1[2], arr2[2];
    arr1[0] = arr1[1] = (unsigned int) -1;
    memset(arr2,2*sizeof(unsigned int),1,(unsigned char)-1);
    /* At this point, arr1's values equal arr2's values but
       the padding bits may differ. */
}

No, at this point nasal demons flew out of my nose, because it matters
more to pass memset the correct number of arguments in the correct order,
which the compiler would have pointed out had you included <string.h>.
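
For reference, memset takes the destination, the fill value, and the byte count, in that order. A corrected form of the call, assuming the intent was to fill arr2 with all-ones bytes, might look like this sketch (fill_all_ones is a hypothetical helper name):

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Fill every byte of arr with all-ones; returns arr for convenience. */
static unsigned *fill_all_ones(unsigned *arr, size_t count)
{
    assert(arr != NULL);
    /* memset(destination, byte value, number of bytes) */
    memset(arr, UCHAR_MAX, count * sizeof *arr);
    return arr;
}
```

This sets any padding bits as well as the value bits. On an implementation with padding bits the resulting pattern is not guaranteed to be a valid unsigned int value, which is part of the point the original example was making.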
 
Ben Bacarisse

jacob navia said:
Anyway...

AND the rest of the code supposes that this is all ones.

It is guaranteed to be all ones in the value bits -- which is just
about as much as one can hope for. On an implementation with padding
bits in unsigned types of size > 1, one could do tricks involving
arrays of unsigned char to get at padding bits, but "masking for bits"
usually means masking for "value bits".

What problem are you seeing? Have I missed something here? Eric's
remark simply means that -1u does not depend on -1 being all-bits-one.
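
The distinction between value bits and padding bits can be made concrete. The sketch below counts the value bits of unsigned int and subtracts them from the storage size; value_bits and padding_bits are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Count the value bits of unsigned int by shifting UINT_MAX down to zero. */
static size_t value_bits(void)
{
    size_t bits = 0;
    for (unsigned v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    return bits;
}

/* Padding bits are the storage bits left over after the value bits
   (unsigned int has no sign bit). */
static size_t padding_bits(void)
{
    size_t storage = sizeof(unsigned) * CHAR_BIT;
    size_t value = value_bits();
    assert(value <= storage);
    return storage - value;
}
```

On most current hardware padding_bits() returns 0, so all-ones in the value bits and all-ones in the whole object coincide; the two only differ on implementations that actually have padding.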
 
Philip Potter

jacob said:
Anyway...

AND the rest of the code supposes that this is all ones.

...which is not necessarily the case because of padding bits. But relying
on -1u being all ones is not the same as relying on -1 being all ones.
 
