Byte alignment in structures and unions

anon.asdf

Hi!

I want to assign the number 129 (binary 10000001) to the MSB (most
significant byte) of a 4-byte long and leave the other, lower bytes
intact!

-working on a normal Pentium (...endian)

I want to do it with code that does NOT use shifts (<<) or bitwise
operations (| &)!!
So the compiler will have to do the work, and I'll introduce
appropriate structs and unions.


#include <stdio.h>

struct each_of_four {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
}
/*__attribute__ ((packed))*/
;

union align_long_and_each_of_four {
long dummy; /* 4 bytes */
struct each_of_four four;
}
/*__attribute__ ((packed))*/
;


int main(void)
{
long val; // 4 bytes

/****************** TEST A: COMPILER ERROR - WHY?
*********************/
((union align_long_and_each_of_four) val).four.byte3 = (unsigned
char) 129;






#define FUNNY_NUMBER ((union align_long_and_each_of_four) \
(const long) ((129<<24) | (val & 16777215))).four.byte3
// 16777215 = 2^24-1

printf("test FUNNY_NUMBER: %d\n", FUNNY_NUMBER);

/****************** TEST B: COMPILER ERROR - WHY?
*********************/
((union align_long_and_each_of_four) val).four.byte3 = FUNNY_NUMBER;

return 0;
}


Compiler error report--->
test_align.c:25: error: invalid lvalue in assignment
test_align.c:39: error: invalid lvalue in assignment



How can this be fixed??

Thanks
anon.asdf
 
anon.asdf

/****************** TEST A: COMPILER ERROR - WHY?
*********************/
((union align_long_and_each_of_four) val).four.byte3 = (unsigned
char) 129;


The really interesting thing here is that the following code DOES work!

{
union align_long_and_each_of_four tmp;

tmp.four.byte3 = (unsigned char) 129;
}

But still - how can the compiler error in TEST A be fixed??
 
Eric Sosman

/****************** TEST A: COMPILER ERROR - WHY?
*********************/
((union align_long_and_each_of_four) val).four.byte3 = (unsigned
char) 129;






#define FUNNY_NUMBER ((union align_long_and_each_of_four) \
(const long) ((129<<24) | (val & 16777215))).four.byte3
// 16777215 = 2^24-1

printf("test FUNNY_NUMBER: %d\n", FUNNY_NUMBER);

/****************** TEST B: COMPILER ERROR - WHY?
*********************/
((union align_long_and_each_of_four) val).four.byte3 = FUNNY_NUMBER;

Because you cannot cast to or from a union (or struct)
type: They are not "scalar types" (6.5.4p2). Keep in mind
that a cast is an operator that converts a value, not a
magical "let's pretend" construct. And in any case, the
value produced by a cast operator has the same status as
a value produced by (for example) a unary minus operator:
You cannot write `-x = 42', either.
How can this be fixed??

One way is

((unsigned char*)&val)[3] = 129;

Of course, this fails miserably if `val' is not four bytes
long with the MSB in the fourth position. A better way is

val = (val & 0xffffffUL) | (129UL << 24);

(Yes, I know you said you didn't want to use shifts or
bitwise operators. Tough: It's a better way anyhow.)

A final thought: *Every* solution has the problem that
it makes non-portable assumptions about what happens to
the value of `val' when you reach in and hammer one of its
bytes. When you do so, you have left the guarantees of the
C language behind, and will need to make your way in
uncharted territory without their protection. Things would
be somewhat better with `unsigned long', but ...
 
Army1987

I want to do it with code that does NOT use shifts (<<) , bit-
operations (| &) !!
l %= 0x01000000;
l += 129 * 0x01000000;
This works regardless of endianness.
So the compiler will have to do the work and I'll introduce
appropriate structs and unions.
#include <stdio.h>

struct each_of_four {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
}
/*__attribute__ ((packed))*/
What was wrong with unsigned char bytes[4]? It does the same thing
that the attribute you commented out would do on some compilers, but
in standard C.
union align_long_and_each_of_four {
long dummy; /* 4 bytes */
struct each_of_four four;
}
/*__attribute__ ((packed))*/
;
What was wrong with { long dummy; unsigned char four[4]; }?
int main(void)
{
long val; // 4 bytes

/****************** TEST A: COMPILER ERROR - WHY?
*********************/
((union align_long_and_each_of_four) val).four.byte3 = (unsigned
char) 129;
Because the result of a cast isn't an lvalue.
Try
((unsigned char *)&val)[3] = 129;
No unions and no struct needed.
#define FUNNY_NUMBER ((union align_long_and_each_of_four) \
(const long) ((129<<24) | (val & 16777215))).four.byte3
Didn't you say you didn't want to use bitwise operations?
 
Army1987

/****************** TEST A: COMPILER ERROR - WHY?
*********************/
((union align_long_and_each_of_four) val).four.byte3 = (unsigned
char) 129;
[snip]
But still - how can the compiler error in TEST A be fixed??
If you *really* want to do that, try
(union align_long_and_each_of_four *)val->four.byte3 = 129;
But there are better ways to do that, see my other reply.
 
anon.asdf

One way is

((unsigned char*)&val)[3] = 129;

Thank you for the insights!

((unsigned char*)&val)[3] = 129;
is elegant.

I wonder if the compiler resolves it (above) to the same shifts as
val = (val & 0xffffffUL) | (129UL << 24);
or utilizes some tighter optimization, if the architecture allows it.
??

Things would
be somewhat better with `unsigned long', but ...

How does `unsigned long' change the situation?

-anon.asdf
 
anon.asdf

l %= 0x01000000;
l += 129 * 0x01000000;
This works regardless of endianness.


Unfortunately this does not work! Try

{
long val = (129<<24) + 1;
val %= 0x01000000;
val += 129 * 0x01000000;
printf("%ld\n", val); // get -2147483647, but should be -2130706431
}
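The printed value can be traced by hand. This worked arithmetic assumes 32-bit two's complement `int` and `long`, and assumes the overflowing `int` expressions `129<<24` and `129 * 0x01000000` simply wrap around, which is what typically happens on x86 but is undefined behavior as far as the C standard is concerned. Integer division in C99 truncates toward zero, so the remainder takes the sign of the dividend:

```latex
\begin{aligned}
129 \cdot 2^{24} &= 2164260864 \;\to\; 2164260864 - 2^{32} = -2130706432 && \text{(wraps in 32-bit \texttt{int})}\\
\texttt{val} &= -2130706432 + 1 = -2130706431\\
-2130706431 \mathbin{\%} 2^{24} &= -2130706431 - (-126)\cdot 2^{24} = -16777215 && \text{(quotient truncated toward zero)}\\
-16777215 + (-2130706432) &= -2147483647
\end{aligned}
```

The remainder is negative (its representation is 0xFF000001), so the top byte is 0xFF rather than 0 before 129 times 2^24 is added; with an unsigned type the remainder would simply be 1, and the sum would be 0x81000001, the intended -2130706431 when reinterpreted as a signed 32-bit value.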

What was wrong with { long dummy; unsigned char four[4]; }?

Nothing. I could use:
((unsigned char *)&dummy)[3] = four[3];


Try
((unsigned char *)&val)[3] = 129;
No unions and no struct needed.

Yes - that's perfect!
Didn't you say you didn't want to use bitwise operations?

True.
But I'm hoping the compiler will resolve it to a constant, so the
shifts are only in the c-code, but not in the machine code.

Thanks for your comments!

-anon.asdf
 
anon.asdf

If you *really* want to do that, try
(union align_long_and_each_of_four *)val->four.byte3 = 129;

Thanks! That's good, just what I was looking for!
But you forgot the & and the parentheses:

((union align_long_and_each_of_four *)&val)->four.byte3 = 129;

Regards,
anon.asdf
 
anon.asdf

#define FUNNY_NUMBER ((union align_long_and_each_of_four) \
True.
But I'm hoping the compiler will resolve it to a constant, so the
shifts are only in the c-code, but not in the machine code.

Thanks for your comments!

-anon.asdf

My comment here is incorrect! It can never be a constant, since it
includes the variable val.
-anon.asdf
 
Chris Torek

... ((unsigned char*)&val)[3] = 129; is elegant.

Elegant, but not terribly portable, and on some machines, a lot
slower than the shift-and-mask method:
I wonder if the compiler resolves it (above) to the same shifts as
val = (val & 0xffffffUL) | (129UL << 24);
or utilizes some tighter optimization, if the architecture allows it.

This depends on the architecture *and* the optimizer.

Taking the address of variables defeats some optimizers entirely.
In such cases, the compiler may "throw up its hands in defeat" as
it were, and compile code like:

store reg, mem | put "val" into RAM so it can be modified piece-wise
movi #129, t0 | tempreg = constant
store t0, mem+3 | set mem[3] = tempreg
load reg, mem | pull "val" back out of RAM

which, on register-oriented machines where RAM is slow compared to
the CPU, may take a dozen or more clock cycles. (Clever caches may
manage to shrink this to just 3 clock cycles in the best case: one
for the first store, one for the second store done "in parallel" with
the move-immediate, and one for the load.)

The shift-and-mask version might instead compile to:

movih #0xff00, t0 | tempreg = 0xff00 << 16
andn reg, t0, reg | val &= ~tempreg
movih #0x8100, t0 | tempreg = 0x8100 << 16 (i.e. 129UL << 24)
or reg, t0, reg | val |= tempreg

which, although it is still four instructions, executes in two
clock cycles (two instructions per clock), regardless of cache
activity and RAM and so on.

Other optimizers are a bit (or even a lot) more clever, and can
indeed turn the one sequence into the other.

The main disadvantage to the "access individual bytes of variable"
method is that it not only depends on the size of bytes -- which
tends to be exactly 8 bits across a wide variety of machines today,
so that you are relatively safe there -- but also on the "endian-ness"
of the CPU, which tends to vary. The shift-and-mask version,
although it is more verbose in source form, is a lot easier for
most optimizers.
 
Eric Sosman

One way is

((unsigned char*)&val)[3] = 129;


Thank you for the insights!

((unsigned char*)&val)[3] = 129;
is elegant.

I wonder if the compiler resolves it (above) to the same shifts as
val = (val & 0xffffffUL) | (129UL << 24);
or utilizes some tighter optimization, if the architecture allows it.
??

Which of the very many C compilers is "the" compiler
you have in mind?

(No, don't answer: It's a rhetorical question, intended
to make you think.)

How does `unsigned long' change the situation?

It doesn't -- for people who already "know" that a
long has four eight-bit bytes arranged in Little-Endian
order and using two's complement representation with no
padding bits and no traps. Such people already "know"
a good deal more than the C language guarantees.

It's probably fairly safe to ignore the possibility
of padding bits, trap representations, and formats other
than two's complement; such things are definitely out of
fashion these days and you're unlikely to encounter them.
(But it *is* a fashion-driven industry; things that were
once chic may become so again ...) Even so, there are
plenty of machines whose longs use eight eight-bit bytes,
plenty of machines that arrange their longs (of whatever
length) in Big-Endian order, and even some machines that
use 32-bit bytes. C can run on all of these -- but your
program will not run on them if you use too much of what
you "know."

Sometimes it is necessary to make use of system-specific
knowledge in order to do something that portable C cannot
do or cannot do well. But those occasions are far rarer
than many people seem to suppose; there is usually a way
to get it done (for many, many values of "it") without
resorting to trickery. The only reason you have given for
using trickery is "I want to do it" this way -- I don't
find that a compelling reason.
 
pete

Try
((unsigned char *)&val)[3] = 129;
No unions and no struct needed.

You can assign the value of 129
to the highest addressed byte of any object,
this way:

((unsigned char *)&val)[sizeof val - 1] = 129;
 
Army1987

That'd be the representation of a negative integer.
l %= 0x01000000;
l += 129 * 0x01000000;
This works regardless of endianness.
It would work if l were unsigned long, and either 129 * 0x01000000
fitted in a signed long, or I wrote l += 129U * 0x01000000; (or
l += 0x81000000; of course).
 
Walter Roberson

Army1987 wrote:
I want to assign the number 129 (binary 10000001)
to the MSB (most significant byte) of a 4-byte long
and leave the other lower bytes in-tact!
Try
((unsigned char *)&val)[3] = 129;
No unions and no struct needed.
You can assign the value of 129
to the highest addressed byte of any object,
this way:
((unsigned char *)&val)[sizeof val - 1] = 129;

Yes, that should indeed assign into the highest addressed byte.
Unfortunately the highest addressed byte might not be the MSB
(most significant byte). On big-endian machines, it would
often be the lowest addressed byte that is the MSB.
 
Keith Thompson

pete said:
Try
((unsigned char *)&val)[3] = 129;
No unions and no struct needed.

You can assign the value of 129
to the highest addressed byte of any object,
this way:

((unsigned char *)&val)[sizeof val - 1] = 129;

But the question was how to assign a value to the most significant
byte, not the highest addressed byte.
 
CBFalconer

pete said:
Try
((unsigned char *)&val)[3] = 129;
No unions and no struct needed.

You can assign the value of 129
to the highest addressed byte of any object,
this way:

((unsigned char *)&val)[sizeof val - 1] = 129;

Seems to give funny results on my machine (which happens to place
the MSByte of an integer at the lowest address). The world
is not defined by an x86.
 
Walter Roberson

That'd be the representation of a negative integer.

Not necessarily.

A) We don't know how big a byte is on the target machine. It
might not be overflow.
B) If you are working in signed mode on an 8-bit byte,
then it is overflow and so not defined;
C) If you are working unsigned, it is not overflow, but if the
machine is a separated-sign machine, the correspondence between
sign bit and arithmetic values is unspecified (but other
representation constraints pretty much imply the separated sign
would have to be the most significant bit.)
 
anon.asdf

That'd be the representation of a negative integer.


{
long val = (129<<24) + 1;
val %= 0x01000000;
val += 129 * 0x01000000;
printf("%ld\n", val); // get -2147483647, but should be -2130706431

}

referring to the above
It would work if val were unsigned long, and either 129 * 0x01000000
fitted in a signed long, or I wrote val += 129U * 0x01000000; (or
val += 0x81000000; of course).

It can also work if val is signed long - as follows:
{
long val = 0; /* must be initialized before %= reads it */
val %= 0x01000000U;
val += 129U * 0x01000000;
printf("%ld\n", val);
}

-anon.asdf
 
Army1987

That'd be the representation of a negative integer.

Not necessarily.

A) We don't know how big a byte is on the target machine. It
might not be overflow.
Speak for yourself. I do know how big a byte is on the OP's
machine.
B) If you are working in signed mode on an 8 bit byte,
then it is overflow and so not defined;
Well, do you think anybody will speak of single bytes in a larger
object in terms of a signed char?
C) If you are working unsigned, it is not overflow, but if the
machine is a separated-sign machine, the correspondence between
sign bit and arithmetic values is unspecified (but other
representation constraints pretty much imply the separated sign
would have to be the most significant bit.)
When the sign bit is set, the value is negative (provided it isn't
a trap), period. This is true in any of the three allowed
representations. And I happen to know that the OP has two's
complement and no trap representation.
 
