Unions, storage, ABI's

Koen

Hi!

Does anyone know what the standard says about the way unions are
stored in C? I mean the following:

Let's say you have a union with a double and a char field:

union MyUnion_t
{
double Double;
char Char;
};

MyUnion_t aUnion;

Is it standardized somehow which byte of the allocated storage the
Char field will use?

And a related question: if you dump unions in binary form to a file,
and then reload them from the file on a different platform, or with a
program compiled by a different compiler, are you guaranteed to get
back what you stored? (I think not, but I'm not sure)

And another point: if you use the same union across different ABI's,
will that work without problems? For example: if you get a pointer to
such a union from a library compiled by a specific compiler, and then
use it in another module compiled with a different compiler, will
aUnion.Char work as expected?

Someone on the comp.std.c group found this:
C99 6.7.2.1p14 says:

The size of a union is sufficient to contain the largest of its
members. The value of at most one of the members can be stored in
a union object at any time. A pointer to a union object, suitably
converted, points to each of its members (or if a member is a bit-
field, then to the unit in which it resides), and vice versa.

That doesn't seem to guarantee much, does it?
I was hoping to use (*ptr).Char or (*ptr).Double without having to cast
anything. Looks like no more details are specified about whether this is
supposed to work across different ABI's:

MyUnion_t theUnion;
somelib->getUnion(&theUnion);

where this is in my own program, and somelib is a function table into a
library possibly compiled by another compiler.
Will the library return a union with the same binary layout as my own
program would do?

Koen
 
infobahn

Koen said:
Hi!

Does anyone know what the standard says about the way unions are
stored in C? I mean the following:

Let's say you have a union with a double and a char field:

union MyUnion_t
{
double Double;
char Char;
};

MyUnion_t aUnion;

Is it standardized somehow which byte of the allocated storage the
Char field will use?

&aUnion.Double and &aUnion.Char point to the same address (they compare
equal once both are converted to a common type such as void *).

And a related question: if you dump unions in binary form to a file,
and then reload them from the file on a different platform, or with a
program compiled by a different compiler, are you guaranteed to get
back what you stored? (I think not, but I'm not sure)

You're right; you're not.
 
Keith Thompson

Koen said:
Does anyone know what the standard says about the way unions are
stored in C? I mean the following:

Let's say you have a union with a double and a char field:

union MyUnion_t
{
double Double;
char Char;
};

MyUnion_t aUnion;

Is it standardized somehow which byte of the allocated storage the
Char field will use? [...]

I was hoping to use (*ptr).Char or (*ptr).Double without having to cast
anything. Looks like no more details are specified about whether this is
supposed to work across different ABI's:

MyUnion_t theUnion;
somelib->getUnion(&theUnion);

where this is in my own program, and somelib is a function table into a
library possibly compiled by another compiler.
Will the library return a union with the same binary layout as my own
program would do?

Of course you can use (*ptr).Char or (*ptr).Double to refer to the
corresponding members (or, more idiomatically, ptr->Char or
ptr->Double). Casting would be neither necessary nor useful, since
the expression is already of the correct type. If theUnion contains
valid data, you can access it; if it doesn't, casting won't help.
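
As a minimal sketch of that point, the following shows member access through a pointer with no cast anywhere. The struct SomeLib function table and the fill_with_double() function are hypothetical stand-ins for the "somelib" in the question, not part of any real library:

```c
#include <assert.h>
#include <stddef.h>

union MyUnion_t {
    double Double;
    char Char;
};

/* Hypothetical function table, modelled on "somelib" from the question. */
struct SomeLib {
    void (*getUnion)(union MyUnion_t *out);
};

/* Hypothetical library function: stores a double in the caller's union. */
void fill_with_double(union MyUnion_t *out)
{
    assert(out != NULL);
    out->Double = 3.25;
}

/* Calls through the table, then reads the member with -> and no cast. */
double fetch_double(const struct SomeLib *lib)
{
    union MyUnion_t theUnion;
    union MyUnion_t *ptr = &theUnion;

    lib->getUnion(ptr);
    return ptr->Double;   /* already has type double */
}
```

Reading ptr->Double is only meaningful here because the library last stored into Double; the member name selects the type, it does not convert the stored value.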

There's no guarantee that code generated by different compilers will
use the same layout for unions, structs, or anything else. But if the
compilers are on the same system, and you can link code generated by
them into a single executable, the compiler vendors will almost
certainly have made some effort to make their layouts compatible.
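
Since nothing in the standard promises matching layouts, one cheap sanity check is to let each module report how it sees the union and compare the reports at startup. The sketch below assumes a hypothetical struct UnionLayout and helper names; it catches size mismatches only, not differences in floating point format or byte order:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

union MyUnion_t {
    double Double;
    char Char;
};

/* Hypothetical descriptor of how one compiler lays out the union. */
struct UnionLayout {
    size_t union_size;   /* sizeof(union MyUnion_t) */
    size_t double_size;  /* sizeof(double) */
    int char_bit;        /* CHAR_BIT from <limits.h> */
};

/* Each module would compile its own copy of this function. */
struct UnionLayout describe_layout(void)
{
    struct UnionLayout l;

    l.union_size = sizeof(union MyUnion_t);
    l.double_size = sizeof(double);
    l.char_bit = CHAR_BIT;
    return l;
}

/* Returns 1 when both descriptions agree field by field, 0 otherwise. */
int layouts_match(const struct UnionLayout *a, const struct UnionLayout *b)
{
    assert(a != NULL && b != NULL);
    return a->union_size == b->union_size
        && a->double_size == b->double_size
        && a->char_bit == b->char_bit;
}
```

In practice the library would export its own descriptor (for example through the function table) and the program would compare it against describe_layout() compiled locally.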
 
Jens.Toerring

Koen said:
Does anyone know what the standard says about the way unions are
stored in C? I mean the following:
Let's say you have a union with a double and a char field:
union MyUnion_t
{
double Double;
char Char;
};
MyUnion_t aUnion;

You need

union MyUnion_t aUnion;

here, since MyUnion_t isn't typedef'ed but is just a tag...

Is it standardized somehow which byte of the allocated storage the
Char field will use?

What you cited later, i.e. C99 6.7.2.1p14, tells you that both the
'Double' and 'Char' members will be at the very first address of the
union. I.e.

&aUnion.Double == (double *) &aUnion
&aUnion.Char == (char *) &aUnion

That's what is meant by that sentence, i.e. you can use &aUnion after
suitable casting as a pointer to either the 'Double' or the 'Char'
member (whether doing so makes much sense is a different question).
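
The address guarantee described above can be checked directly. This is only a small sketch; the helper names members_share_address() and union_is_large_enough() are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

union MyUnion_t {
    double Double;
    char Char;
};

/* Returns 1 if every member starts at the address of the union itself. */
int members_share_address(const union MyUnion_t *u)
{
    const void *base = u;

    assert(u != NULL);
    return (const void *)&u->Double == base
        && (const void *)&u->Char == base;
}

/* Returns 1 if the union is at least as large as its largest member. */
int union_is_large_enough(void)
{
    return sizeof(union MyUnion_t) >= sizeof(double);
}
```

Both functions should return 1 on any conforming implementation. What the standard leaves open is which byte of the double's representation Char overlays in terms of value, since that depends on the floating point format and byte order.
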
And a related question: if you dump unions in binary form to a file,
and then reload them from the file on a different platform, or with a
program compiled by a different compiler, are you guaranteed to get
back what you stored? (I think not, but I'm not sure)

On a different platform this will definitely not be guaranteed to
work - the floating point format can be completely different (even
sizeof(double) may differ) and the number of bits in a char could
also be different (i.e. the value of CHAR_BIT from <limits.h>), as
well as the character encoding. Even when compiled with a different
compiler you could theoretically get into trouble if the compilers use
different numbers of bits to store floating point numbers, but I would
guess that this is an extremely unlikely case.
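
One common way around these problems is to store a text form together with a tag that says which member is valid, instead of dumping the raw bytes. The following is a rough sketch under assumptions: the enum MyKind tag and the encode_union()/decode_union() names are invented here, and 17 significant digits are enough to round-trip an IEEE 754 double but may not suit other formats:

```c
#include <assert.h>
#include <stdio.h>

union MyUnion_t {
    double Double;
    char Char;
};

/* Hypothetical tag recording which member was stored. */
enum MyKind { KIND_DOUBLE, KIND_CHAR };

/* Writes "D <value>" or "C <code>" into buf; returns snprintf's result. */
int encode_union(char *buf, size_t n, enum MyKind kind,
                 const union MyUnion_t *u)
{
    assert(buf != NULL && u != NULL);
    if (kind == KIND_DOUBLE)
        return snprintf(buf, n, "D %.17g", u->Double);
    return snprintf(buf, n, "C %d", (int)u->Char);
}

/* Parses what encode_union wrote; returns 1 on success, 0 otherwise. */
int decode_union(const char *buf, enum MyKind *kind, union MyUnion_t *u)
{
    int code;

    assert(buf != NULL && kind != NULL && u != NULL);
    if (sscanf(buf, "D %lf", &u->Double) == 1) {
        *kind = KIND_DOUBLE;
        return 1;
    }
    if (sscanf(buf, "C %d", &code) == 1) {
        *kind = KIND_CHAR;
        u->Char = (char)code;
        return 1;
    }
    return 0;
}
```

Note that the char is written as a numeric code, so a difference in character encoding between the two platforms would still have to be handled separately.
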
And another point: if you use the same union across different ABI's,
will that work without problems? For example: if you get a pointer to
such a union from a library compiled by a specific compiler, and then
use it in another module compiled with a different compiler, will
aUnion.Char work as expected?

As a rule I would expect it to work, since a lot of things might break
if different compilers use different chars or doubles (you probably
wouldn't be able to link at all in that case, since the libraries would
need to use a different libc). But I don't think there's a promise in
the standard that it will always work.

Someone on the comp.std.c group found this:
That doesn't seem to guarantee much does it?
I was hoping to use (*ptr).Char or (*ptr).Double without having to cast
anything. Looks like no more details are specified about whether this is
supposed to work across different ABI's:

No, you don't need a cast. The sentence just says that you can use the
address of the union after suitable conversion as a pointer to each
of its members, which guarantees that each member lies at the start
of the union. But there's nothing wrong with using 'ptr->Char' or
'ptr->Double' without the cast - by specifying the member you already
tell the compiler which type is meant. So 'ptr->Char' is a char and
'ptr->Double' a double without any casts.

MyUnion_t theUnion;
somelib->getUnion(&theUnion);
where this is in my own program, and somelib is a function table into a
library possibly compiled by another compiler.
Will the library return a union with the same binary layout as my own
program would do?

As I wrote above, it's very likely to work, but I don't see that there's
a guarantee. But if it doesn't work I would expect things to fail at the
linking stage, since such differences would very likely lead to a lot of
trouble all over the place.
Regards, Jens
 
CBFalconer

Koen said:
Does anyone know what the standard says about the way unions are
stored in C? I mean the following:

Let's say you have a union with a double and a char field:

union MyUnion_t
{
double Double;
char Char;
};

MyUnion_t aUnion;

Nobody seems to have pointed out that this declaration is wrong.
There is no type MyUnion_t known. Only "union MyUnion_t" is a
valid type. This is why some people would rather declare the type
in a typedef:

typedef union MyUnion_t {
double Double;
char Char;
} MyUnion_t;

Same for structures.
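
A short sketch of how the two spellings relate once the typedef is in place. The tag and the typedef name live in separate name spaces in C, so reusing MyUnion_t for both is legal; the accessor functions get_double() and get_char() are only there for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef union MyUnion_t {
    double Double;
    char Char;
} MyUnion_t;

/* Parameter spelled with the tag: "union MyUnion_t". */
double get_double(const union MyUnion_t *u)
{
    assert(u != NULL);
    return u->Double;
}

/* Parameter spelled with the typedef name: same type, shorter name. */
char get_char(const MyUnion_t *u)
{
    assert(u != NULL);
    return u->Char;
}
```

Objects declared as "union MyUnion_t a;" and "MyUnion_t b;" have the same type, so either can be passed to either function and they can be assigned to each other.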
 
Keith Thompson

CBFalconer said:
Nobody seems to have pointed out that this declaration is wrong.
There is no type MyUnion_t known. Only "union MyUnion_t" is a
valid type. This is why some people would rather declare the type
in a typedef:

typedef union MyUnion_t {
double Double;
char Char;
} MyUnion_t;

Same for structures.

Actually it was pointed out, but of course Usenet is asynchronous.

Code that uses a struct, union, or enum tag as a type name is often an
indication that the author is compiling the code as C++ (which does
allow this).

This is yet another argument against compiling C with a C++ compiler;
you can miss errors like this that will prevent the code from
compiling with a C compiler. If you actually have a need to produce
code that compiles as both C and C++, you need to compile it with both
C and C++ compilers; each will catch errors that the other doesn't.
(If you think you have such a need, it's very likely that you don't.)
 
CBFalconer

Keith said:
.... snip ...

Code that uses a struct, union, or enum tag as a type name is often
an indication that the author is compiling the code as C++ (which
does allow this).

This is yet another argument against compiling C with a C++ compiler;
you can miss errors like this that will prevent the code from
compiling with a C compiler. If you actually have a need to produce
code that compiles as both C and C++, you need to compile it with both
C and C++ compilers; each will catch errors that the other doesn't.
(If you think you have such a need, it's very likely that you don't.)

You can avoid this by calling gcc through an alias that imposes the
"-x c" option. I call that alias cc, and it automatically appends
--help when no options are given, or imposes "-x c -W -Wall -ansi
-pedantic -Wwrite-strings -Wfloat-equal -gstabs+ -O1" for anything
else.
 
Koen

Keith said:
Actually it was pointed out, but of course Usenet is asynchronous.

Code that uses a struct, union, or enum tag as a type name is often an
indication that the author is compiling the code as C++ (which does
allow this).

This is yet another argument against compiling C with a C++ compiler;
you can miss errors like this that will prevent the code from
compiling with a C compiler. If you actually have a need to produce
code that compiles as both C and C++, you need to compile it with both
C and C++ compilers; each will catch errors that the other doesn't.
(If you think you have such a need, it's very likely that you don't.)

You guys are absolutely right about this, thanks for pointing that out!
I kind of missed that (but OTOH it wasn't really the point of my post).
Koen
 
Koen

Koen said:
Hi!

Does anyone know what the standard says about the way unions are
stored in C? I mean the following:

Let's say you have a union with a double and a char field:

union MyUnion_t
{
double Double;
char Char;
};

MyUnion_t aUnion;

Is it standardized somehow which byte of the allocated storage the
Char field will use?

Thanks for all the help! I think I can conclude that there is no real
guarantee for the cross-ABI / cross-platform compatibility of binary
dumps.
Koen
 
Jens.Toerring

Koen said:
Thanks for all the help! I think I can conclude that there is no real
guarantee for the cross-ABI / cross-platform compatibility of binary
dumps.

Yes, definitely. It probably will work if you stay with the same
platform, but once the files have to be read back in on a different
platform you're out of luck. I learned that the hard way some
years ago...
Regards, Jens
 
Keith Thompson

Yes, definitely. It probably will work if you stay with the same
platform, but once the files have to be read back in on a different
platform you're out of luck. I learned that the hard way some
years ago...

And if your data contains pointers, they won't be usable across
executions of the same program. It might rarely make sense to write
pointers to a file if you're going to read them back in the same
execution of the program.

(I suspect the OP knows this, but we've seen code here that does
this.)
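
A usual workaround for the pointer problem is to keep linked data in an array and store array indices rather than addresses, since an index stays meaningful after the array is reloaded. The sketch below is only an illustration; struct Node, NO_NODE and sum_list() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "null" index marking the end of a list. */
#define NO_NODE ((size_t)-1)

struct Node {
    int value;
    size_t next;   /* index of the next node in the array, not a pointer */
};

/* Sums a list stored in nodes[0..count-1], starting at index head. */
long sum_list(const struct Node *nodes, size_t count, size_t head)
{
    long total = 0;
    size_t i = head;

    assert(nodes != NULL);
    while (i != NO_NODE) {
        assert(i < count);   /* a stored index must stay inside the array */
        total += nodes[i].value;
        i = nodes[i].next;
    }
    return total;
}
```

This only removes the dependence on addresses. The width of size_t and int still varies between platforms, so a file format meant to travel would need the same care as the union dump discussed above.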
 
