to calculate bitsize of a byte

pete

John F wrote:
Try this:

unsigned char x = 0;
unsigned int y = INT_MAX; /*assuming less than MAX_INT bits*/

while( y )
{
x |= 1;
x <<= 1
}

How is that loop ever going to end
when nothing changes the value of y?
 
Keith Thompson

Jordan Abel said:
Incidentally, UCHAR_MAX cannot be the same as INT_MAX, because int has a
sign bit, and unsigned char may not have padding bits.

Suppose CHAR_BIT==16, UCHAR_MAX==65535, sizeof(int)==2 (32 bits), and
int has 15 padding bits. Exotic, but possible.
 
John F

pete said:
How is that loop ever going to end
when nothing changes the value of y?

thanks.... that's my pitfall of writing and not trying to lint it...

should decrement:

while( y-- ) ... of course...

thanks for correction.

regards
John
 
pete

Jordan Abel wrote:
But where is this "integral promotion" coming from. I thought that a
value in an expression could only be promoted to the type of another
value involved in the same expression
- not some random type (int) found elsewhere.

new.c prints out "4" on my system.

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
char a = 0;

printf("\nsizeof (a + a) is %u\n", (unsigned)sizeof (a + a));
return 0;
}

/* END new.c */
 
pete

John said:
thanks.... that's my pitfall of writing and not trying to lint it...

should decrement:

while( y-- ) ... of course...

Is your intention to iterate that loop
INT_MAX number of times?

I think this is more better:

unsigned char x = -1;
unsigned int y = 0;

while (x >>= 1) {
++y;
}
 
Peter Nilsson

Jordan said:
Why is anything being promoted to a type that is not the type of any
subexpression that is involved in the expression?

You mean why does integral promotion exist? That's been covered
numerous times before in clc. But basically, int is supposed to be
the most efficient integer type in a C implementation.
But where is this "integral promotion" coming from. I thought that a
value in an expression could only be promoted to the type of another
value involved in the same expression

That's arithmetic conversion.
- not some random type (int) found elsewhere.

It's not random. Like I said, it has been covered before. [I'm just too
lazy to find the references for you, but try comp.std.c. And it never
_ever_ hurts to search for posts by Chris Torek. :-]
No. I think you haven't yet seen the difference between integral
promotion and arithmetic conversion.

Eh - It seems illogical, so I checked the standard [actually an
anonymous c89 draft, but close enough to the standard, doubt it's been
changed]

Integral promotion in C99 added extended integer types into the mix.
If an int can represent all values of the original type, the value is
converted to an int; otherwise it is converted to an unsigned int.
These are called the integral promotions.

Incidentally, UCHAR_MAX cannot be the same as INT_MAX, because int
has a sign bit, and unsigned char may not have padding bits.

If CHAR_BIT >= 15, there's nothing preventing int from being 2 bytes
and having CHAR_BIT-1 padding bits. [Thus having 1 sign bit and CHAR_BIT
value bits, sufficient to store any value of an unsigned char.]
 
John F

pete said:
Is your intention to iterate that loop
INT_MAX number of times?

How long does it take on a modern machine? Negligible for a single
execution. (Rationale see below.)
I think this is more better:

unsigned char x = -1;

Is propagation to 0xFFFF...FF guaranteed? Thinking of other implementations
of signed values I'm not so sure. (sign bit, instead of two's
complement... with sign-bit not being affected by >> e.g. if this is still a
legal implementation [1:12 am here... saturated with coffee and still too
tired to dig that standard up])
unsigned int y = 0;

while (x >>= 1) {
++y;
}

Would work on some machines. IIRC some preserve the sign on a shift right (I
hope this is still right...) ... thus this would loop forever.

John
 
pete

John said:
:

Is propagation to 0xFFFF...FF guaranteed?

Yes.
N869
6.3.1.3 Signed and unsigned integers
[#1] When a value with integer type is converted to another
integer type other than _Bool, if the value can be
represented by the new type, it is unchanged.
[#2] Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type
until the value is in the range of the new type.

((-1) + (UCHAR_MAX + 1)) == UCHAR_MAX == 0xF...F
Would work on some machines.
IIRC some preserve the sign on a shift right (I
hope this is still right...) ... thus this would loop forever.

No.
The type of (x) is unsigned char.
Its value is positive after being initialized to -1,
and stays positive until it is decreased to zero,
at which time the loop ends.
 
John F

pete said:
John said:
:

Is propagation to 0xFFFF...FF guaranteed?

Yes.
N869
6.3.1.3 Signed and unsigned integers
[#1] When a value with integer type is converted to another
integer type other than _Bool, if the value can be
represented by the new type, it is unchanged.
[#2] Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type
until the value is in the range of the new type.

((-1) + (UCHAR_MAX + 1)) == UCHAR_MAX == 0xF...F

AH!!! Thanks!
No.
The type of (x) is unsigned char.
Its value is positive after being initialized to -1,
and stays positive until it is decreased to zero,
at which time the loop ends.

I should go to sleep :) Thanks.

unsigned int y=0;
unsigned char x=-1;
while( y++, x >>= 1 );

does the job...

going to bed _now_ ...
John
 
Michael Wojcik

In the programming language that *is* C, the implementation *must*
define CHAR_BIT. It's not optional.
To complicate matters, char is not required by the C standard to be the
machine "byte". It is only required to be at least 8 bits. And a
machine "byte" is not always 8 bits....

True. However:
So, given that it can't strictly be done in C

I don't think this is true. Consider that:

- integers must have a pure binary representation, which means
value bits are in order
- you can inspect the representation of any object using a
pointer to unsigned char

Given that, try this code:

#include <stdio.h>
int main(void)
{
unsigned i, c;
unsigned char *bp1, *bp2;

if (sizeof i > 2)
{
i = 0;
bp1 = bp2 = (unsigned char *)&i;
bp1++; /* point to second byte */
bp2 += sizeof i - 2; /* point to second-to-last byte */
for (c = 0; *bp1 == 0 && *bp2 == 0; c++)
i = 1u << c;
printf("bytes have %d bits in this implementation\n", (int)c - 1);
}
return 0;
}

This works for both big- and little-endian architectures, as far as I
can see, because it tests when the shift operator alters the second
or penultimate bytes. I don't offhand see a way for a conforming
implementation (where sizeof(unsigned) > 2) to produce the wrong
answer.

Extending the program to handle the case where sizeof(unsigned) is 1
or 2 is left as an exercise for the reader.

--
Michael Wojcik (e-mail address removed)

"Well, we're not getting a girl," said Marilla, as if poisoning wells were
a purely feminine accomplishment and not to be dreaded in the case of a boy.
-- L. M. Montgomery, _Anne of Green Gables_
 
slebetman

Peter said:
Yes, which is why it's best to use the term 'byte' in the C language
sense, and terms like octet or machine word for other senses.

A hardware 'byte' and 'octet' are again different things. In hardware,
a byte may be 6 bits but an octet by definition is always 8 bits even
on machines with 6 bit bytes (in which case, on that machine, it takes
2 bytes to correctly encode an octet). Which is why it is best to use
the term 'char' if you want to talk about byte in the C language sense
because you can't use the term octet to talk about bytes except in the
strict case where a byte is 8 bits.

So the original question should really have asked to calculate the size
of 'char' rather than 'byte'. But then again, in that case, a char is
always CHAR_BIT bits by definition and the code to calculate is useless
in the real world.
 
Jordan Abel

A hardware 'byte' and 'octet' are again different things. In hardware,
a byte may be 6 bits but an octet by definition is always 8 bits even
on machines with 6 bit bytes (in which case, on that machine, it takes
2 bytes to correctly encode an octet).

And a "machine word" is much more likely to be an int than a "byte" of
any sort.
 
Neil

david said:
Hi, Rod,

Martin and Vladimir have given a shorter method to find the bit size of a
byte in C :). I think maybe Joe intended to get the answer by
calculation, not by looking it up in an included file.

In a programming language like C, the compiler may predefine CHAR_BIT.
But I wonder whether there are cases where we don't know the bit
size of a byte on the hardware CPU and have to calculate it in the
programming language. The case sounds like a joke: if we don't know the
bits of the CPU, how could a programming language work on it?

Of course the compiler does not have to be the same size as the CPU.
x86 is a good example: there are 16-, 32- and 64-bit compilers.
The compiler determines the byte size, not the CPU.
It could have 32-bit bytes on an 8-bit CPU.
 
slebetman

Jordan said:
And a "machine word" is much more likely to be an int than a "byte" of
any sort.

Hmm.. this is slowly getting to be off-topic in c.l.c. Anyway...

A byte does not refer to a word. A machine word in hardware does indeed
usually correspond to an int. A byte is usually defined as the smallest
addressable unit of memory for an architecture whereas a word is
usually defined as the native register size of the architecture
(whatever that means).

The smallest addressable memory may not always be 8 bits. And on
machines where this is smaller than 8 bits (for example 6 bits was once
very common) the C standard requires chars to be at least 8 bits. Hence
the C standard requires that on such machines two bytes be used to
represent a char. It is also common to find modern machines,
particularly DSPs, where the machine byte is 32 bits. On such machines
it is not possible to address smaller than 32 bit values (you can
access individual octets though by masking and shifting but you cannot
address individual octets).
 
Rod Pemberton

david ullua said:
Thanks for both of your reply.
I take a look at limits.h and other head files, there is really
something need to be read.
Before i knew CHAR_BIT is a defined value, I wrote the following
snippet to do the job:
(now i realize it is not necessary)
char c = '\01';
int i=0;
do
{
i++;
printf("%d:%x(dex)\n",i,c);
c = c<<1;
}while(c>0);
printf("bit count in a byte:%d",i);

An alternate method that should work for ones and twos complement integers:

#include <stdio.h>
int main(void)
{
unsigned short i,j;
/* i and j must have the same type; substitute whichever type is needed */
/*
unsigned long i,j;
char i,j;
*/

unsigned short k;
for(i=1,j=0,k=0;i!=j;k++,j=i,i<<=1);
k--;
printf("\nbits %d\n",k);
return(0);
}

Rod Pemberton
 
stathis gotsis

Hmm.. this is slowly getting to be off-topic in c.l.c. Anyway...

A byte does not refer to a word. A machine word in hardware does indeed
usually correspond to an int. A byte is usually defined as the smallest
addressable unit of memory for an architecture whereas a word is
usually defined as the native register size of the architecture
(whatever that means).

The smallest addressable memory may not always be 8 bits. And on
machines where this is smaller than 8 bits (for example 6 bits was once
very common) the C standard requires chars to be at least 8 bits. Hence
the C standard requires that on such machines two bytes be used to
represent a char. It is also common to find modern machines,
particularly DSPs, where the machine byte is 32 bits. On such machines
it is not possible to address smaller than 32 bit values (you can
access individual octets though by masking and shifting but you cannot
address individual octets).

Based on what you said i came to the assumption that CHAR_BIT is the minimum
multiple of the number of bits in the machine byte which is equal or greater
than 8. So, given the CHAR_BIT value, one can find one or two numbers, one
of which will be the machine byte size. That means that the machine byte can
be roughly determined through C.
 
slebetman

stathis said:
Based on what you said i came to the assumption that CHAR_BIT is the minimum
multiple of the number of bits in the machine byte which is equal or greater
than 8. So, given the CHAR_BIT value, one can find one or two numbers, one
of which will be the machine byte size. That means that the machine byte can
be roughly determined through C.

Hmm.. But what about a machine with 9 bit bytes where CHAR_BIT is 8
(the C compiler simply ignores the most significant bit)? I believe
that the standard doesn't allow padding bits in char but I think in
this case the most significant bit is not a padding bit, it is simply
ignored by the compiler author. In this case a string of chars will
still be contiguous from the point of view of C but you won't be able
to ever find out the size of the machine byte from C.

I'm not so sure about how conformant is such an implementation. Any
standard experts here think it's valid?
 
Keith Thompson

stathis gotsis said:
Based on what you said i came to the assumption that CHAR_BIT is the minimum
multiple of the number of bits in the machine byte which is equal or greater
than 8. So, given the CHAR_BIT value, one can find one or two numbers, one
of which will be the machine byte size. That means that the machine byte can
be roughly determined through C.

You're assuming that "the machine byte" is meaningful.

If by "the machine byte" you mean the smallest directly addressable
unit of memory, I've worked on systems with CHAR_BIT==8 where the
smallest addressable unit of memory is 64 bits. (Strictly speaking,
setting CHAR_BIT to 64 would have been more correct, but that would
have broken too much code and killed compatibility with other
systems.)

If you want to know what a "machine byte" is, look for the word "byte"
in the CPU reference manual. I don't believe any other method can
give you a reliable and meaningful answer.
 
Peter Nilsson

Based on what you said i came to the assumption that CHAR_BIT is
the minimum multiple of the number of bits in the machine byte which
is equal or greater than 8.

No. CHAR_BIT is a C implementation construct that refers exclusively to
the C implementation on which it is defined. The underlying platform
need not have _any_ similarity in terms of 'machine bytes' or 'words'.

As Keith has pointed out, there have been hosted 8-bit byte
implementations where the only addressable unit on the underlying
architecture was a 64-bit word.

There have also been 36-bit machines where it was common assembly
practice to pack 6 6-bit character entities into a word, but the C
implementations used 4 9-bit bytes.
So, given the CHAR_BIT value, one can find one or two numbers, one
of which will be the machine byte size.

No, you can't. It's likely that CHAR_BIT will match the underlying
architecture because implementations tend to promote efficiency,
but this is not guaranteed.
That means that the machine byte can be roughly determined
through C.

The fundamental issue (from clc's point of view) is that C is an
abstract language. If you want to gauge the 'machine byte' (whatever
that may mean to you), then you should abandon ISO C and use
implementation specific constructs or assumptions. [Thus reducing the
portability of your code.]
 
