Bit-fields and integral promotion


CBFalconer

Kevin said:
Section 6.3.1.1p2:

"The following may be used in an expression wherever an int or
unsigned int may be used:

- An object or expression with an integer type whose integer
conversion rank is less than the rank of int and unsigned int.
- A bit-field of type _Bool, int, signed int or unsigned int.

If an int can represent all values of the original type, the value
is converted to an int; otherwise it is converted to an unsigned
int. These are called the integer promotions."

The ambiguity arises from what the "original type" is, and hence
what "all values" are. In the case of the 4-bit unsigned bitfield,
is it of type unsigned int, so all values are 0..UINT_MAX, or are
all values 0..15?

I'd always understood it to be the latter interpretation, so it
promotes to int. A quick check of our compiler agrees with this -
it promotes to int, unless in pcc compatibility mode where it
promotes to unsigned int.

This may be supported by 6.7.2.1p9 which states:

"A bit-field is interpreted as a signed or unsigned integer type
consisting of the specified number of bits."

That at least gives wording strong enough to allow the bit-field to
have a distinct type with range 0..15 for the purposes of 6.3.1.1p2.

Maybe the way to attack it is by how any sane code generator
designer would go about it. The first thing to do is to get the
memory block holding the thing in question into a register. The
next is to shift that register so that the field in question is
right justified. Now the question arises of what to do with the
unspecified bits. They may either be masked off to 0 (i.e. the
field was unsigned) or jammed to copies of the left hand bit of the
original field (i.e. the field was signed, assuming 2's
complement). For 1's complement things are the same here, and for
sign-magnitude different (closer to unsigned treatment, but move
the sign bit over to its proper place).

After that we have an entity in a register, which is assumedly the
most convenient size that is normally used for ints, signed or
unsigned, and we can proceed from there. It seems to me that most
designers would opt for the unsigned version, because it is simpler
to process, and they are allowed to.

That means that the signed/unsigned characteristic of the bit field
is propagated into any expressions using it.

It also means that wherever given the choice, the designer will
make a bit field unsigned because it means less processing and less
chance of overflows and consequent UB.
 

Joe Wright

CBFalconer said:
Aha. The installation can interpret a bit field as either signed
or unsigned, but I believe needs to document which choice it has
taken. Thus either original behaviour can be legitimate, depending
on the system documentation. This could be used as a test for what
the documentation should say.

#include <stdio.h>

struct S
{
    unsigned int a:4;   /* 4-bit unsigned bit-field */
    unsigned int b:16;  /* 16-bit unsigned bit-field */
};

void foo(void) {
    puts("foo");
}

void bar(void) {
    puts("bar");
}

int main(void)
{
    struct S s;
    s.a = 0;
    s.b = 0;

    unsigned char c = 0;
    unsigned u = 0;

    /* foo is printed when the subtraction is done in (signed) int */
    if (s.a - 5 < 0)
        foo();
    else
        bar();

    if (s.b - 5 < 0)
        foo();
    else
        bar();

    if (c - 5 < 0)
        foo();
    else
        bar();

    if (u - 5 < 0)
        foo();
    else
        bar();

    return 0;
}


My output is..

foo
foo
foo
bar

...and it must be so with any C compiler. It's called integral promotion.

From K&R2, A6.1, p. 197:

"A character, short integer or an integer bit-field, all either
signed or not, or an object of enumeration type, may be used in an
expression wherever an integer may be used. If an int can represent all
the values of the original type, then the value is converted to int;
otherwise the value is converted to unsigned int."

In the first three cases the value of the narrower type is converted
to int before 5 is subtracted, yielding a negative result. In the
fourth case int cannot represent the maximum value of u, so u remains
unsigned and the result is therefore never negative.
 

Wojtek Lerch

CBFalconer said:
Kevin said:
"A bit-field is interpreted as a signed or unsigned integer type
consisting of the specified number of bits."

That at least gives wording strong enough to allow the bit-field to
have a distinct type with range 0..15 for the purposes of 6.3.1.1p2.

Maybe the way to attack it is by how any sane code generator
designer would go about it. [...]

Not really. This isn't about what a sane implementor is likely to do. It's
more about how much the C standard promises to forgive an insane
implementor...

....
After that we have an entity in a register, which is assumedly the
most convenient size that is normally used for ints, signed or
unsigned, and we can proceed from there. It seems to me that most
designers would opt for the unsigned version, because it is simpler
to process, and they are allowed to.

The only thing they are allowed to do is decide whether a bit-field declared as
plain "int" is signed or unsigned. Once that's been decided, you can forget
about plain "int".

After you've gone through the shifting and masking or whatever and you have
the bit pattern in a register, you don't get to decide, again, whether to
interpret it as a signed int or an unsigned int. The standard says that it
depends on whether the range of the "original type" fits into the range of
int. The text is unclear on what the "original type" means, but doesn't
sound as if it were meant to give implementors a choice here.
 

Jack Klein

Maybe the way to attack it is by how any sane code generator
designer would go about it. The first thing to do is to get the
memory block holding the thing in question into a register. The
next is to shift that register so that the field in question is
right justified. Now the question arises of what to do with the
unspecified bits. They may either be masked off to 0 (i.e. the
field was unsigned) or jammed to copies of the left hand bit of the
original field (i.e. the field was signed, assuming 2's
complement). For 1's complement things are the same here, and for
sign-magnitude different (closer to unsigned treatment, but move
the sign bit over to its proper place).

After that we have an entity in a register, which is assumedly the
most convenient size that is normally used for ints, signed or
unsigned, and we can proceed from there. It seems to me that most
designers would opt for the unsigned version, because it is simpler
to process, and they are allowed to.

Up to here, you're doing OK.

That means that the signed/unsigned characteristic of the bit field
is propagated into any expressions using it.

Now you've stumbled.

Think about it, you have just copied a storage unit full of bits into
a register, and perhaps right shifted that register to place the bit
field in the least significant bits of that register. Because the bit
field is defined as unsigned, you fill all the higher bits of the
registers with 0, most likely with a bitwise AND. If the bit field
had been signed and contained a positive value, you would have done
the same.

So now you have the value of the bit field in an int-sized or larger
register. Is it in a signed int register, or an unsigned int
register? Er, there aren't such things, it's just in an int-sized
register. Perhaps in 1948 there was a vacuum tube and relay based
computer that actually had, and needed, separate signed integer and
unsigned integer registers, but I rather doubt it.

So we still have the value of the bit field, right justified with
leading 0 bits, in an integer register. Is it signed or unsigned?
Impossible to tell at this point, since the object representation of a
positive value in a signed integer type is required by the standard to
be absolutely identical to that of the same value in the corresponding
unsigned integer type.

So we have a value in a register that is either a signed or unsigned
int, impossible to tell from looking at the bits. Whether the C
object type of that register is signed int or unsigned int depends on
what happens next at the object code level:

1. On some architectures and for some operations, it depends on what
processor instructions are executed using the contents of the
register. I seem to remember some processors that had different
instructions for signed and unsigned operations such as multiply and
divide, but I could be wrong.

2. Far more commonly, the actual significance of whether the register
was signed or unsigned only happens after the operation instruction is
performed, and affects the interpretation of the result, or even of
whether or not the result is defined.

So there is no difference in overhead whatsoever in converting the
unsigned bit field to either a signed or unsigned int.

It also means that wherever given the choice, the designer will
make a bit field unsigned because it means less processing and less
chance of overflows and consequent UB.

That can be stated much more simply and succinctly as:

The programmer should use unsigned bit fields when only positive
values need to be stored, and signed bit fields when both positive and
negative values are used.
 

Jonathan Burd

Joe said:
#include <stdio.h>

struct S
{
    unsigned int a:4;
    unsigned int b:16;
};

void foo(void) {
    puts("foo");
}

void bar(void) {
    puts("bar");
}

int main(void)
{
    struct S s;
    s.a = 0;
    s.b = 0;

    unsigned char c = 0;
    unsigned u = 0;

    if (s.a - 5 < 0)
        foo();
    else
        bar();

    if (s.b - 5 < 0)
        foo();
    else
        bar();

    if (c - 5 < 0)
        foo();
    else
        bar();

    if (u - 5 < 0)
        foo();

(u - 5 < 0) will always be false.

    else
        bar();

<snip>


Regards,
Jonathan.
 

CBFalconer

Jack said:
Up to here, you're doing OK.


Now you've stumbled.

Think about it, you have just copied a storage unit full of bits
into a register, and perhaps right shifted that register to place
the bit field in the least significant bits of that register.
Because the bit field is defined as unsigned, you fill all the
higher bits of the registers with 0, most likely with a bitwise
AND. If the bit field had been signed and contained a positive
value, you would have done the same.

No, you've missed the complications involved in assuming the bit
field to be signed. That means the other bits have to be set to
copies of the field's sign bit, in either 1's or 2's complement
machines. For sign magnitude the appropriate bit has to be
exchanged with the sign bit, after zeroing the extra bits. These
manipulations are more complex than the unsigned version (which
simply zeroes some bits), and thus to be avoided. Laziness is a
virtue here.

In all cases we now have a bit pattern in a register, and external
type knowledge saying whether that pattern describes a signed or
unsigned integer. That external knowledge comes from the original
declaration of the bit field. That knowledge also governed whether
or not to go through the sign-extension gyrations described above.

All further processing is done as if the reworked register content
had been loaded in one fell swoop from somewhere, together with the
un/signed type knowledge.

Having gotten here with our sane code generator implementor, I
maintain we now have the right clue about how to handle the usual
arithmetic conversions on the bit field. We now base them on the
original declaration as un/signed, because that minimizes the
work. This is the final clue as to what the standard should say,
were it to say anything, which so far it does not AFAICT. We do
not base it on the range of values the bit field can hold.

.... snip ...
The programmer should use unsigned bit fields when only positive
values need to be stored, and signed bit fields when both positive
and negative values are used.

We are not trying to constrain the programmer, we are trying to
interpret what s/he actually wrote.
 

xarax

TTroy said:
Like I said in my other post, there is total confusion around integral
promotions and arithmetic conversions. All 5 of the 5 C programmers at
my workplace don't understand it, which says a lot (about them and the
topic). I bet Chris Torek is the only one who truly understands it.

You're saying that the standard allows an unsigned
integer type to be demoted to signed integer type?
 

Wojtek Lerch

xarax said:
You're saying that the standard allows an unsigned
integer type to be demoted to signed integer type?

"If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int. These
are called the integer promotions." (6.3.1.1p2)
 

kuyper

xarax said:
....
You're saying that the standard allows an unsigned
integer type to be demoted to signed integer type?

No, we're saying that the standard REQUIRES an unsigned type whose
values can all be represented by an int to be PROmoted to an int. See
section 6.3.1.1p2.
 

Jack Klein

No, you've missed the complications involved in assuming the bit
field to be signed. That means the other bits have to be set to
copies of the field's sign bit, in either 1's or 2's complement
machines. For sign magnitude the appropriate bit has to be
exchanged with the sign bit, after zeroing the extra bits. These
manipulations are more complex than the unsigned version (which
simply zeroes some bits), and thus to be avoided. Laziness is a
virtue here.

The gyrations involved in sign extending negative signed bit fields
on non 2's complement platforms are relevant, and no different than
those such a platform must go through to convert an ordinary signed
integer type with a negative value to a wider signed type. If there
are any such monstrosities still in existence with current C compiler
support, they pay for the obsolete architecture.

In all cases we now have a bit pattern in a register, and external
type knowledge saying whether that pattern describes a signed or
unsigned integer. That external knowledge comes from the original
declaration of the bit field. That knowledge also governed whether
or not to go through the sign-extension gyrations described above.

All further processing is done as if the reworked register content
had been loaded in one fell swoop from somewhere, together with the
un/signed type knowledge.

What does the "signed/unsigned" knowledge have to do with it? On most
architectures, chars have fewer bits than ints and UCHAR_MAX <
INT_MAX. So in an expression involving an unsigned char, you wind up
with the same thing, namely a narrower bit field filling an int size
object, and the knowledge of whether the object it came from was
signed or unsigned. Despite the fact that the value originated in an
unsigned char, the int-sized object must be treated as signed.

Having gotten here with our sane code generator implementor, I
maintain we now have the right clue about how to handle the usual
arithmetic conversions on the bit field. We now base them on the
original declaration as un/signed, because that minimizes the
work. This is the final clue as to what the standard should say,
were it to say anything, which so far it does not AFAICT. We do
not base it on the range of values the bit field can hold.

Admittedly it is unfortunate that the standard does not specifically
mention bit fields in describing the usual integer conversions, and
hopefully that can be rectified in a TC or later version.

But since the standard selected what they call "value preserving" over
"sign preserving" operation, it would be seriously inconsistent and a
real source of problems if an 8-bit unsigned char promoted to a signed
int but an 8-bit unsigned bit field promoted to an unsigned int. That
would be rather absurd, don't you think?
... snip ...

We are not trying to constrain the programmer, we are trying to
interpret what s/he actually wrote.

Ah, you snipped your particular statement that my comment addressed,
so I am putting it back in:

It also means that wherever given the choice, the designer will
make a bit field unsigned because it means less processing and less
chance of overflows and consequent UB.

I misinterpreted your meaning, so my comment doesn't apply. I was
thrown off by what I think is some incompleteness in your wording. I
think what you meant to say by "make a bit field unsigned" would be
better conveyed by the words "make an unsigned bit field promote to
unsigned int".

But despite the omission from the standard, it seems silly to think
that the compiler designer is given a choice here. Since all other
promotions and conversions are rather scrupulously defined, I find it
hard to believe that the intent was to leave the results of using an
unsigned bit field in an expression implementation defined. In fact,
nothing is implementation-defined unless the standard specifically
states that it is implementation-defined.

In fact, given the lack of mention, using the value of a bit field in
an expression technically produces undefined behavior based on the
wording of the standard today.
 

Wojtek Lerch

Jack Klein said:
In fact, given the lack of mention, using the value of a bit field in
an expression technically produces undefined behavior based on the
wording of the standard today.

It's not exactly a lack of mention: 6.3.1.1p2 does mention bit-fields, and
it's quite clear that it attempts to define a behaviour. The only thing
that is not clear is whether the "original type" is meant to refer to the type
the bit-field is "interpreted as" according to 6.7.2.1p9, or simply to the
type specifier used in the declaration of the field, as the list in
6.3.1.1p2 seems to suggest.

In short, it's not undefined behaviour. It's unclearly defined behaviour.
:-/
 

Christian Bau

Jack Klein said:
That can be stated much more simply and succinctly as:

The programmer should use unsigned bit fields when only positive
values need to be stored, and signed bit fields when both positive and
negative values are used.

Since this is about the only place where there is a difference between
"signed int" and "int", I would say

The programmer should use bit fields using the type "unsigned int" when
only positive values need to be stored, and bit fields using the type
"signed int" when both positive and negative values are used, but
_never_ bit fields using the type "int" on its own, because then you
don't know whether the bitfield will actually be signed or unsigned.

Independent from that, the integer promotions will promote any signed
bitfield to "int", and any unsigned bit field with fewer bits than type
"int" will be promoted to "int" as well. But that won't help if you
tried to store a negative value into a bitfield that was specified as
"int" if the compiler decided to make it unsigned int.
 

xarax

Wojtek Lerch said:
It's not exactly a lack of mention: 6.3.1.1p2 does mention bit-fields, and
it's quite clear that it attempts to define a behaviour. The only thing that
is not clear is whether the "original type" is meant to refer to the type the
bit-field is "interpreted as" according to 6.7.2.1p9, or simply to the type
specifier used in the declaration of the field, as the list in 6.3.1.1p2 seems
to suggest.

In short, it's not undefined behaviour. It's unclearly defined behaviour. :-/

I think it's a question of whether bit-fields
of any length less than an int are actually
promoted at all, or whether the compiler is
allowed to treat a bit-field as a full-sized
integer (even though the source bits may be
narrower than an int).

Is a bit-field that is declared as

"unsigned int foo:CHAR_BIT;"

treated the same as an unsigned char, or
is it "unsigned int" that happens to be
loaded from a memory location that is only CHAR_BIT
wide? Are bit-fields defined in the standard
as "promoted", or are only "ordinary" integer
types "promoted" to int?
 

CBFalconer

xarax said:
.... snip ...

Is a bit-field that is declared as

"unsigned int foo:CHAR_BIT;"

treated the same as an unsigned char, or
is it "unsigned int" that happens to be
loaded from a memory location that is only CHAR_BIT
wide? Are bit-fields defined in the standard
as "promoted", or are only "ordinary" integer
types "promoted" to int?

If you google back in this thread for my contributions, you will
see I favor the second alternative, and my reasons for it.
 

Kevin Bracey

CBFalconer said:
Maybe the way to attack it is by how any sane code generator
designer would go about it. The first thing to do is to get the
memory block holding the thing in question into a register. The
next is to shift that register so that the field in question is
right justified. Now the question arises of what to do with the
unspecified bits. They may either be masked off to 0 (i.e. the
field was unsigned) or jammed to copies of the left hand bit of the
original field (i.e. the field was signed, assuming 2's
complement). For 1's complement things are the same here, and for
sign-magnitude different (closer to unsigned treatment, but move
the sign bit over to its proper place).

After that we have an entity in a register, which is assumedly the
most convenient size that is normally used for ints, signed or
unsigned, and we can proceed from there. It seems to me that most
designers would opt for the unsigned version, because it is simpler
to process, and they are allowed to.

All very well, but how is this any different to the case of loading
an unsigned char? The standard states that an unsigned char is promoted
to int (assuming int is bigger than char). In the following structure,
why should a and b be treated differently?

struct
{
unsigned char a;
unsigned b:8;
} x;

It seems logical to me that bitfields should follow the same basic "value-
preserving" promotion rules as other sub-integer types. The handling
involved is basically the same, even though it may require a bit more manual
work in the code generator. Mind you, I predominantly work on a CPU
with no hardware support for signed char, so even the standard types can
require manual sign-extension.

The fact that bitfields may require a bit more work than native widths
doesn't strike me as sufficient reason to have different semantics.

That means that the signed/unsigned characteristic of the bit field
is propagated into any expressions using it.

It also means that wherever given the choice, the designer will
make a bit field unsigned because it means less processing and less
chance of overflows and consequent UB.

I kind of agree - certainly, our compiler makes bitfields unsigned by
default. Nonetheless, unsigned bitfields narrower than 32 bits still
promote to int, because that's what we believe the standard requires.

Regardless of the formal type, the unsignedness can still be remembered. But
if you're going to bother doing that, you can do much better than just
remember an "unsigned" (ie top bit clear) flag - our compiler is capable of
tracking the number of significant low order bits in an expression.

For example, after promotion, x.b above is of type int, but the compiler
knows the value of the expression occupies 8 bits (if interpreted as
unsigned), or 9 bits (if interpreted as signed) [call it 8u/9s for short].

This information then propagates through expressions:

x.a           [8u/9s]
x.b           [8u/9s]
x.a + x.b     [9u/10s]
x.a - x.b     [32u/10s]  {result could be negative, thus 32 unsigned bits}
x.a ^ x.b     [8u/9s]
x.a * x.b     [16u/17s]
x.a & 0xffff  [8u/9s]

A good use of this significant bit information is to skip
narrowing/sign-extending stages when storing back into a narrow type; our
architecture has no instructions specifically for this purpose when working
in registers.
 

CBFalconer

Kevin said:
.... snip ...

All very well, but how is this any different to the case of loading
an unsigned char? The standard states that an unsigned char is
promoted to int (assuming int is bigger than char). In the following
structure, why should a and b be treated differently?

struct
{
unsigned char a;
unsigned b:8;
} x;

Because a char, of any flavor, occupies a complete addressable
unit, and can be loaded and stored without affecting anything else
(at least as far as C is concerned). That does not apply to
bitfields, which may spread over byte demarcations (but not over
int demarcations). In addition, a char is a defined type. A
bitfield isn't. C does not have subranges and close typing.
 

Lawrence Kirby

On Fri, 28 Jan 2005 16:37:52 +0000, xarax wrote:

....
That should not change the answer. Remember that the
field is unsigned, regardless of its bit width. Therefore,
the subtraction expression is unsigned.


Check their documentation to see if they support
unsigned bit fields. They may be ignoring the
"unsigned" qualifier.

If they are C compilers they are required to support unsigned bit-fields.

Lawrence
 

Kevin Bracey

CBFalconer said:
Because a char, of any flavor, occupies a complete addressable
unit, and can be loaded and stored without affecting anything else
(at least as far as C is concerned). That does not apply to
bitfields, which may spread over byte demarcations (but not over
int demarcations).

I'd agree that there's a potential difference in the implementation (although
I'm sure there are implementations that load & store chars like a bitfield,
and others that can load aligned bitfields like x.b as a char). But is that
sufficient reason to cause such a significant difference in the semantics?

The very decision to adopt "value-preserving" promotions was to minimise
unexpected behaviour. Having bitfields alone be "unsigned-preserving" would
be rather unexpected, surely?

I suppose it's just unexpected if you have my view of bitfields though - I
automatically think of "unsigned :8" as just being a custom sub-integer type,
akin to unsigned char. Maybe you think of it as an unsigned int which just
happens to be not allocated all its bits.

In addition, a char is a defined type. A bitfield isn't.

I think 6.7.2.1p9 disagrees with you. The bitfield's type may not have a
name, but it is a type.
 

Lawrence Kirby

What can never be negative? s.a can't, but s.a-5 can.


6.7.2.1p9: "A bit-field is interpreted as a signed or unsigned integer type
consisting of the specified number of bits." The type of s.a is a 4-bit
unsigned type. Since all the values of such a type can be represented by
int, the integer promotions convert it to int rather than to unsigned int,
and the value of s.a-5 is -5 rather than UINT_MAX-4.

This seems to make bit-field specifications part of C's type system,
unless you want to say that "is interpreted as" means something other than
"is". I would really hate to have to go there. However if bit-fieldness is
a property of type it is rather unfortunate that this isn't mentioned in
6.2.5 (the definition of C's types). If it isn't a property of type then
for example the conversion rule specified in 6.3.1.3p2 has problems e.g.
when trying to assign the value 16 to a 4 bit unsigned bit-field:

"Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in
the new type until the value is in the range of the new type."

There is enough of the standard that assumes that bit-field width is a
type property to make it difficult to interpret it otherwise. If it is
then consider 6.3.1.1p2:

"If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int."

Here "original type" presumably means the unpromoted type or the type of
our original value i.e. what we are promoting from. If bit-field width is
part of a type then the width must be significant in determining whether
the value is promoted to int or unsigned int.

In which case I agree that s.a - 5 must evaluate to -5 in the original
example.

Lawrence
 

Alex Fraser

[big snip]

What are you trying to say?

That bit-fields declared with plain int should be unsigned?
That bit-fields should be "unsigned preserving"?
Both? Something else?

Promoting a signed bit-field to (signed) int requires extra effort to copy
or move the sign bit, but promoting an unsigned bit-field to (signed) int
does not.

Alex
 
