Real Life Unions

bluejack

A recent post asking for help with unions reminded me of this
component of the C language that I have only used a couple of times,
and those almost entirely out of personal whim -- Unions for the sake
of Unions simply because I wanted to see one in action.

Granted: it makes it possible to save a few bytes of storage when you
have something that can be a chicken or a rooster, but not both, and
you're always going to know which it is.

But I don't think I've ever encountered a problem where I smacked
myself in the head and said, "A union would be perfect for this!" It
always seems like a design error to go jamming different types into a
single symbol, or at least a design that's asking for trouble.

That said, I don't write device drivers, kernels, or software for
embedded devices. Space is not usually the constraint that concerns
me: clarity, maintainability, and buglessness are almost always my #1
goals.

But I was curious, in the community of C programmers, whether Unions
had an honored place in the engineer's toolkit, or whether they were
more like an old awl kicking around the dark corners of the box. Feel
free to provide instructive examples of great moments in Union history!
 
Eric Sosman

bluejack said:
A recent post asking for help with unions reminded me of this
component of the C language that I have only used a couple of times,
and those almost entirely out of personal whim -- Unions for the sake
of Unions simply because I wanted to see one in action.

Granted: it makes it possible to save a few bytes of storage when you
have something that can be a chicken or a rooster, but not both, and
you're always going to know which it is.

But I don't think I've ever encountered a problem where I smacked
myself in the head and said, "A union would be perfect for this!" It
always seems like a design error to go jamming different types into a
single symbol, or at least a design that's asking for trouble.

That said, I don't write device drivers, kernels, or software for
embedded devices. Space is not usually the constraint that concerns
me: clarity, maintainability, and buglessness are almost always my #1
goals.

But I was curious, in the community of C programmers, whether Unions
had an honored place in the engineer's toolkit, or whether they were
more like an old awl kicking around the dark corners of the box. Feel
free to provide instructive examples of great moments in Union history!

IMHO unions are closer to the dark corners than to the
well-lit top drawer, but they're a tool that would be missed
if someone borrowed them and forgot to return them to your
tool chest.

"Space-saving" has become something of a dirty word, along
with "efficiency." But this is largely a backlash against
excess, against the ritualistic drive to save one more byte
or one more microsecond in a "Hello, world!" program. There
remain situations where savings are important.

And among those, space-saving situations arise more often
than (justifiable) time-saving situations. You've mentioned
the use of unions in "poor man's O-O" polymorphic situations,
and it's easy to see that in a rich object hierarchy with a
lot of more-than-kin-but-less-than-kind relationships the space
savings could be huge. Also, in such situations space savings
can easily equate to time savings: compared to the modern CPU,
memory is d-o-g s-l-o-w and getting slower (we reward our memory
manufacturers for density and for cost, not for speed). If you
can pack four Thing instances into a cache line instead of just
two, you stand a chance of cutting the cache miss rate in half
and making your program 30-60% faster. That sort of efficiency
isn't in the "dirty word" department; it's nothing like the
foolish exercise of shaving a microsecond off a loop that will
run three times total.
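A minimal sketch of the cache-line argument, assuming three hypothetical payload types (`circle`, `rect`, `label`) and a 64-byte cache line; none of these names come from the thread. The same "Thing" is laid out once as a plain struct and once with a union. The exact sizes depend on the ABI, but on a typical 64-bit platform the struct version is about 56 bytes and the union version about 32, so one fits per line in the first case and two in the second.

```c
#include <stddef.h>

/* Hypothetical payloads: each Thing is exactly one of these at a time. */
struct circle { double radius; };
struct rect   { double width, height; };
struct label  { char text[24]; };

/* All three members stored side by side, although only one is ever used. */
struct thing_as_struct {
    int kind;
    struct circle c;
    struct rect   r;
    struct label  l;
};

/* The members overlap, so the payload is only as large as the biggest one. */
struct thing_as_union {
    int kind;
    union {
        struct circle c;
        struct rect   r;
        struct label  l;
    } u;
};

/* How many whole objects of thing_size bytes fit in line_size bytes. */
size_t things_per_line(size_t thing_size, size_t line_size)
{
    return line_size / thing_size;
}
```

Comparing `things_per_line(sizeof(struct thing_as_struct), 64)` with `things_per_line(sizeof(struct thing_as_union), 64)` on the target platform shows whether the union buys anything there.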

You mentioned device drivers. There's a field that's ripe
for polymorphic programming: You'll have a generic framework
that bridges between the device-blind and device-aware parts
of the system, and maybe in the device-aware layer you'll have
further abstraction (we're in the SCSI driver, but are we
talking to a disk or to a tape?). A convenient way to build
such frameworks is to let the lowest-level pieces build structs
with all the information their specific devices need, and then
hand them off to higher-level layers that are unaware of the
specifics. This is polymorphism in action, chum, and unions
are a fine way to achieve it. Not the only way, but a good one.

So, always look for the union label.
 
Beej Jorgensen

bluejack said:
Feel free to provide instructive examples of great moments in Union
history!

Some Internet sockets implementations used to use a union like this
(some probably still do) for IPv4 addresses:

struct in_addr {
    union {
        struct { u_char s_b1,s_b2,s_b3,s_b4; } S_un_b;
        struct { u_short s_w1,s_w2; } S_un_w;
        u_long S_addr;
    } S_un;
};
#define s_addr S_un.S_addr

This way you could look at the internet address as 4 8-bit chunks, 2
16-bit chunks, or 1 32-bit chunk (using the define to access it as
"foo.s_addr".)
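A rough sketch of the same idea with fixed-width types. The names `ipv4_view` and `ipv4_from_quad` are made up for illustration and are not part of any sockets header. The bytes are stored most significant first; what the 16-bit and 32-bit members then read back depends on the machine's byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old in_addr layout, using fixed-width types. */
union ipv4_view {
    uint8_t  bytes[4];   /* four 8-bit chunks */
    uint16_t words[2];   /* two 16-bit chunks */
    uint32_t addr;       /* one 32-bit chunk  */
};

/* Store the dotted quad a.b.c.d with the most significant byte first. */
union ipv4_view ipv4_from_quad(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    union ipv4_view v;
    v.bytes[0] = a;
    v.bytes[1] = b;
    v.bytes[2] = c;
    v.bytes[3] = d;
    return v;
}
```

After `ipv4_from_quad(192, 168, 0, 1)`, `v.bytes[0]` is 192 everywhere, while `v.addr` is 0xC0A80001 on a big-endian machine and 0x0100A8C0 on a little-endian one.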

Turns out most people used the 32-bit representation, and under Linux,
anyway, the union is long gone:

typedef uint32_t in_addr_t;
struct in_addr
{
    in_addr_t s_addr;
};

But looking down the code a little more, union is used again for the
IPv6 stuff:

/* IPv6 address */
struct in6_addr
{
    union
    {
        uint8_t u6_addr8[16];
        uint16_t u6_addr16[8];
        uint32_t u6_addr32[4];
    } in6_u;
#define s6_addr in6_u.u6_addr8
#define s6_addr16 in6_u.u6_addr16
#define s6_addr32 in6_u.u6_addr32
};

-Beej
 
Ian Collins

bluejack said:
A recent post asking for help with unions reminded me of this
component of the C language that I have only used a couple of times,
and those almost entirely out of personal whim -- Unions for the sake
of Unions simply because I wanted to see one in action.
But I was curious, in the community of C programmers, whether Unions
had an honored place in the engineer's toolkit, or whether they were
more like an old awl kicking around the dark corners of the box. Feel
free to provide instructive examples of great moments in Union history!

One large swag of code that uses (big) unions is Xlib, the standard Unix
windowing library. All X events are represented as structures within one
union, and the first member of every structure is the event type.
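The layout described above can be sketched roughly as follows. The event IDs, structure names and fields here are hypothetical and much smaller than the real Xlib definitions; the point is only that every structure starts with the same `int type` member, so a handler can inspect `type` before deciding which member to read.

```c
/* Hypothetical event IDs and structures, loosely modelled on the Xlib idea. */
enum { EV_KEY = 1, EV_BUTTON = 2 };

struct key_event    { int type; int keycode; };
struct button_event { int type; int x, y; };

/* Every member starts with an int type, so ev->type is valid for any event. */
union event {
    int                 type;
    struct key_event    key;
    struct button_event button;
};

/* Returns a small summary value so the dispatch is easy to observe. */
int handle_event(const union event *ev)
{
    switch (ev->type) {
    case EV_KEY:    return ev->key.keycode;
    case EV_BUTTON: return ev->button.x + ev->button.y;
    default:        return -1;   /* unknown event: ignore it */
    }
}
```

A new event kind only needs a new structure in the union and a new `case`; the signature of `handle_event` stays the same.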
 
Ernie Wright

bluejack said:
[...]
But I don't think I've ever encountered a problem where I smacked
myself in the head and said, "A union would be perfect for this!" It
always seems like a design error to go jamming different types into a
single symbol, or at least a design that's asking for trouble.

That said, I don't write device drivers, kernels, or software for
embedded devices. Space is not usually the constraint that concerns
me: clarity, maintainability, and buglessness are almost always my #1
goals.

But I was curious, in the community of C programmers, whether Unions
had an honored place in the engineer's toolkit, or whether they were
more like an old awl kicking around the dark corners of the box. Feel
free to provide instructive examples of great moments in Union history!

One of the commercial programs I worked on supports a plug-in API that
uses unions. Plug-ins are passed a pointer to a function,

int ( *execute )( void *, LWCommandCode cmd, int argc,
                  const DynaValue *argv, DynaValue *result );

that can be used to invoke any of several hundred commands implemented
by the program. The command's arguments and return value are wrapped
in DynaValues, a union of more primitive types,

http://home.comcast.net/~erniew/lwsdk/docs/dynaval.html

so that commands can take arguments and return values of arbitrary
type.

That's obviously not the only way this could be handled. The program
could supply a structure or an array containing pointers to every
command function, for example. But using the one execute() function
greatly simplifies the linkage between program and plug-in, makes it
possible for the program to type-check the command arguments, and makes
forward and backward compatibility much easier to support.

Different versions of the program support different sets of commands.
Newer versions have new commands not supported by older versions, and
commands are occasionally deprecated. Using the execute() mechanism,
a single plug-in binary will work with all versions of the program. If
the plug-in attempts to execute() a command that isn't supported, the
program can handle that gracefully, and therefore so can the plug-in.
With a structure or table of function pointers, the program and plug-in
would potentially be meeting at a mismatched interface, and bad things
would happen that neither could control.
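As a rough illustration of this kind of interface (not the actual API: `struct value`, the command codes and the return codes below are all invented for the sketch, and the real DynaValue layout is not reproduced), a single entry point can check the argument kinds and reject commands it does not know:

```c
/* Hypothetical value wrapper; the real DynaValue layout is not shown here. */
enum value_kind { VAL_NONE, VAL_INT, VAL_FLOAT, VAL_STRING };

struct value {
    enum value_kind kind;       /* says which union member is meaningful */
    union {
        int         i;
        double      f;
        const char *s;
    } as;
};

/* Hypothetical command codes; a newer host would simply know more of them. */
enum { CMD_ADD = 1, CMD_NEGATE = 2 };
enum { EXEC_OK = 0, EXEC_UNKNOWN_CMD = -1, EXEC_BAD_ARGS = -2 };

/* Single entry point: checks argument count and kinds before running cmd. */
int execute(int cmd, int argc, const struct value *argv, struct value *result)
{
    switch (cmd) {
    case CMD_ADD:
        if (argc != 2 || argv[0].kind != VAL_INT || argv[1].kind != VAL_INT)
            return EXEC_BAD_ARGS;
        result->kind = VAL_INT;
        result->as.i = argv[0].as.i + argv[1].as.i;
        return EXEC_OK;
    case CMD_NEGATE:
        if (argc != 1 || argv[0].kind != VAL_FLOAT)
            return EXEC_BAD_ARGS;
        result->kind = VAL_FLOAT;
        result->as.f = -argv[0].as.f;
        return EXEC_OK;
    default:
        return EXEC_UNKNOWN_CMD;   /* unsupported command: reported, not a crash */
    }
}
```

A caller built against a newer command set gets `EXEC_UNKNOWN_CMD` back from an older host and can fall back, which is the graceful failure described above.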

This isn't arcane: communication between separate execution units is
pretty commonplace. Nor are unions being used merely to save a few
bytes. And it's not polymorphism in the typical sense: we wouldn't say
that we're defining hundreds of variants of execute().

The plug-in API does provide an alternative,

int ( *evaluate )( void *, const char *cmdstring );

in which the command and its arguments are passed in a string. There
are arguments favoring both approaches.

- Ernie http://home.comcast.net/~erniew
 
William Ahern

A recent post asking for help with unions reminded me of this
component of the C language that I have only used a couple of times,
and those almost entirely out of personal whim -- Unions for the sake
of Unions simply because I wanted to see one in action.

Granted: it makes it possible to save a few bytes of storage when you
have something that can be a chicken or a rooster, but not both, and
you're always going to know which it is.

But I don't think I've ever encountered a problem where I smacked
myself in the head and said, "A union would be perfect for this!" It
always seems like a design error to go jamming different types into a
single symbol, or at least a design that's asking for trouble.

I do use unions for space savings, and I often run across situations where
that benefit is significant. But more than that, I use unions because they
make my code more concise and its intent more explicit.

Most recent example, reading an interleaved RTSP stream. Interleaved
means that RTP packets (normally sent over UDP) are mixed in w/ the RTSP
stream control exchange (on TCP). When you break those down there are
several different types of RTP messages, and several different types of
RTSP messages. But, at the front of the stream processing pipeline, you
just want to grab the next message off of the stream, and at the other end
spit out another generic message. So, you can imagine unions come in very
handy here, and they make the code quite digestible.

When you have a complex structure, it can be hard to keep track of each
member, and what it's used for. A union says, "these elements are
mutually exclusive, yet share a similar relationship to this parent".
Being able to convey such relationships in code is priceless.

I suppose it's sort of obvious in the above that I usually employ unions
as members within larger structures, and not as standalone data
structures.
 
Kenneth Brody

bluejack wrote:
[...]
But I don't think I've ever encountered a problem where I smacked
myself in the head and said, "A union would be perfect for this!" It
always seems like a design error to go jamming different types into a
single symbol, or at least a design that's asking for trouble.
[...]

Consider a database in which there could be multiple types of records
in the file, with the layout described by:

struct record
{
    int type;
    union
    {
        TYPE_A A;
        TYPE_B B;
        TYPE_C C;
    }
    data;
};

Another example is passing "messages" to an event-driven program.
There are multiple types of messages, such as keyboard input, mouse
input, windows messages, and so on. By creating a struct which
contains the message type, plus a union of all of the different
types of messages, you can pass the struct to a single event
handler routine, rather than having to pass the message to a
different function for each type of event. (Plus, you can add
new event types without affecting existing event handlers.)
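To make the record layout concrete, here is a small sketch. The definitions of `TYPE_A`, `TYPE_B` and `TYPE_C`, the `record_type` enum and `describe_record` are placeholders invented for the example; only `struct record` itself follows the layout given above.

```c
#include <stdio.h>

/* Hypothetical record layouts standing in for TYPE_A, TYPE_B and TYPE_C. */
typedef struct { int id; }        TYPE_A;
typedef struct { double amount; } TYPE_B;
typedef struct { char name[16]; } TYPE_C;

enum record_type { REC_A, REC_B, REC_C };

struct record {
    int type;            /* one of enum record_type: says which member is live */
    union {
        TYPE_A A;
        TYPE_B B;
        TYPE_C C;
    } data;
};

/* Format any record into buf. Returns what snprintf returns (the length
   of the formatted text), or -1 for an unknown record type. */
int describe_record(const struct record *r, char *buf, size_t len)
{
    switch (r->type) {
    case REC_A: return snprintf(buf, len, "A id=%d", r->data.A.id);
    case REC_B: return snprintf(buf, len, "B amount=%.2f", r->data.B.amount);
    case REC_C: return snprintf(buf, len, "C name=%s", r->data.C.name);
    default:    return -1;
    }
}
```

One routine handles every record type, and adding a `TYPE_D` means one more union member and one more `case`, without changing the function's signature.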

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
Kenny McCormack

Kenneth Brody said:
Another example is passing "messages" to an event-driven program.
There are multiple types of messages, such as keyboard input, mouse
input, windows messages, and so on. By creating a struct which
contains the message type, plus a union of all of the different
types of messages, you can pass the struct to a single event
handler routine, rather than having to pass the message to a
different function for each type of event. (Plus, you can add
new event types without affecting existing event handlers.)

You are all, of course, talking around the problem as usual.

The point is that there is nothing topical in this newsgroup that you
can do with a union that you couldn't do with a struct (at the cost of
using more memory/space - another thing we're not allowed to talk about
here).
 
Adrian Hawryluk

Kenny said:
You are all, of course, talking around the problem as usual.

The point is that there is nothing topical in this newsgroup that you
can do with a union that you couldn't do with a struct (at the cost of
using more memory/space - another thing we're not allowed to talk about
here).

If you are implying that you can always cast a struct over any address,
then OK, but doing so is not... as safe? as intuitive? And it loses the
meaning except in documentation (which every programmer does, right? ;))

A union can be used to group structs that can stand in for one another.
It can also be used to find, at compile time, the largest amount of memory
you would have to allocate for any of a set of structs. This matters for
message passing with something like the QNX Send-Receive-Reply mechanism:
the message has to be copied from one process space to another, so you
need to allocate a buffer of an appropriate size.
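A minimal sketch of the buffer-sizing idea, with made-up message structs and a made-up `receive_into` helper (this is not the QNX API). Because a union is at least as large as its largest member, `sizeof(union any_msg)` gives a compile-time bound for a receive buffer.

```c
#include <string.h>

/* Hypothetical message types exchanged between two processes. */
struct msg_open  { int code; char path[64]; };
struct msg_read  { int code; int fd; unsigned nbytes; };
struct msg_close { int code; int fd; };

/* The union is at least as large as its largest member, so one object of
   this type can hold whichever message arrives. */
union any_msg {
    struct msg_open  open;
    struct msg_read  read;
    struct msg_close close;
};

/* Copy an incoming message of len bytes into the buffer.
   Returns 0 on success, -1 if the message cannot fit. */
int receive_into(union any_msg *buf, const void *src, size_t len)
{
    if (len > sizeof *buf)
        return -1;
    memcpy(buf, src, len);
    return 0;
}
```

If a larger message type is added to the union later, the buffer grows with it and no size constant has to be updated by hand.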

The polymorphic-*like* ability is very useful and is safer than a
programmer casting, since a cast could be to anything, whereas a union at
least tells the programmer that the type should be one of a smaller
defined set.


Adrian
 
Kenny McCormack

If you are implying that you can always cast a struct over any address,
then OK, but doing so is not... as safe? as intuitive? And it loses the
meaning except in documentation (which every programmer does, right? ;))

I'm not sure what you mean by this, but all I'm saying is that if you
have:

union foo { object1;object2;object3 }

that is exactly the same as:

struct foo { object1;object2;object3 }

in terms of what you can do with it.

In particular, using unions to do on-the-fly type "spoofing" (which I
always thought was their primary purpose) is OT in this NG.
 
bluejack

struct record
{
    int type;
    union
    {
        TYPE_A A;
        TYPE_B B;
        TYPE_C C;
    }
    data;
};

Right; this is how I've used them in the past, and the need to provide
an explicit type indicator to inform the handling code what is in the
union is displeasing to me.

However, with many of the examples provided, I see how unions can
create flexible, and extensible *interfaces*. The case of the X11
event model being a prime example. This, to me, was the enlightenment
I sought, and I thank everyone for their input!

Back to my original silly metaphor, if something is a Chicken or a
Rooster but not both you might define a union to express this. Then,
defining a function that accepts the type indicator and the union, if
at a later date your code grows to encompass Cornish Game Hens,
Turkeys, and Roast Peking Duck, you do not need to change the
interface.

Naturally, the same thing could be effectively accomplished with a
type indicator and a void pointer, but all things considered, a union
is safer than a void pointer any day.

And, sure, a struct *could* do the job: but if the types are mutually
exclusive, and you will need a type indicator to point code to the
meaningful member, then a union is the *right* structure.

Thanks!
 
William Ahern

You are all, of course, talking around the problem as usual.

The point is that there is nothing topical in this newsgroup that you
can do with a union that you couldn't do with a struct (at the cost of
using more memory/space - another thing we're not allowed to talk about
here).

By that same logic there's nothing one could do w/ a struct that one
couldn't do with careful, mindful, and tedious bit operations on elements
of arrays of unsigned char.

There's nothing one can do with typedefs that one cannot do without. We
can even begin throwing out arithmetic operators.
 
Ben Pfaff

William Ahern said:
There's nothing one can do with typedefs that one cannot do
without.

How do you call va_arg for a function pointer type without
typedef?
 
micans

A recent post asking for help with unions reminded me of this ...
free to provide instructive examples of great moments in Union history!

Let me add another question to this. One of my applications uses a
sparse matrix library. Values are encoded as floats or doubles (it is
a typedef, set at compile time).
Now there are occasions where I'd like to store something else, say a
pointer to some object specific to that matrix entry.
The need arises only infrequently. I then code in whatever way seems
logical (using ad hoc data structures), but I've wondered about
replacing the value with a union of say { void* obj, double val }. It
probably hinges on the extent to which code can be reused for the two
separate cases. I believe there is certainly scope for that in my
application - it will be application specific in general.

It is tempting to compare this with some container type that works on
(void*) data with callbacks operating on the data (say a generic hash
library). In case of a sparse matrix this approach would penalize the
default case (doubles) prohibitively in terms of speed. I believe the
union approach would probably be alright in this respect (but you
don't get the same code genericity).

Any thoughts on this?
stijn
 
William Ahern

How do you call va_arg for a function pointer type without
typedef?

Aha! You don't. You could manage just as well without function pointers.

Ok. Now this is just getting silly. =)
 
Kenneth Brody

bluejack said:
struct record
{
    int type;
    union
    {
        TYPE_A A;
        TYPE_B B;
        TYPE_C C;
    }
    data;
};

Right; this is how I've used them in the past, and the need to provide
an explicit type indicator to inform the handling code what is in the
union is displeasing to me; [...]
Back to my original silly metaphor, if something is a Chicken or a
Rooster but not both you might define a union to express this. Then,
defining a function that accepts the type indicator and the union, if
at a later date your code grows to encompass Cornish Game Hens,
Turkeys, and Roast Peking Duck, you do not need to change the
interface.

Naturally, the same thing could be effectively accomplished with a
type indicator and a void pointer, but all things considered, a union
is safer than a void pointer any day.
[...]

What about a linked list of fowl? Yes, the linked list could be
a list of void pointers to the item, but unions allow you to have
a list of the items themselves. (Ditto for "array" instead of
"linked list", though in that case you need to make sure that if
the size of the union changes, you need to rebuild any libraries
using it.)

 
Yevgen Muntyan

Ben said:
How do you call va_arg for a function pointer type without
typedef?

#include <stdio.h>
#include <stdarg.h>

void foo (void)
{
    printf ("Hi there!\n");
}

void func (int a, ...)
{
    va_list ap;
    va_start (ap, a);

    {
        void (*f) (void) = va_arg (ap, void(*)(void));
        if (f)
            f ();
    }

    va_end (ap);
}

int main (void)
{
    void (*f) (void) = foo;
    func (0, f);
    func (0, foo);
    return 0;
}
 
Ben Pfaff

void (*f) (void) = va_arg (ap, void(*)(void));

That is non-standard code:

7.15.1.1 The va_arg macro

Synopsis
1 #include <stdarg.h>
type va_arg(va_list ap, type);

[...] The parameter type shall be a type name specified such
that the type of a pointer to an object that has the
specified type can be obtained simply by postfixing a * to
type.
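For comparison, a sketch of the typedef route that satisfies the quoted wording: with a typedef name, postfixing a `*` does give a valid pointer type. The names `callback_fn`, `bump` and `call_all` are invented for this example.

```c
#include <stdarg.h>

/* With a typedef, appending * to the type name yields a valid pointer type,
   which is what the quoted paragraph asks of va_arg's second argument. */
typedef void (*callback_fn)(void);

static int calls;                    /* counts callback invocations */
static void bump(void) { calls++; }

/* Calls each of the count callbacks passed after the first argument;
   returns how many were non-null and therefore called. */
int call_all(int count, ...)
{
    va_list ap;
    int called = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        callback_fn f = va_arg(ap, callback_fn);
        if (f) {
            f();
            called++;
        }
    }
    va_end(ap);
    return called;
}
```

A call such as `call_all(2, bump, bump)` invokes `bump` twice; a null callback has to be passed with an explicit cast, for example `(callback_fn)0`, so that the argument has the type `va_arg` expects.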
 
Mark McIntyre

But I was curious, in the community of C programmers, whether Unions
had an honored place in the engineer's toolkit,

They're ideal for interfaces. Take a look at a sockets implementation
sometime. A popular programming language also uses a union "under the
hood" to store its untyped data type.


--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Yevgen Muntyan

Ben said:
void (*f) (void) = va_arg (ap, void(*)(void));

That is non-standard code:

7.15.1.1 The va_arg macro

Synopsis
1 #include <stdarg.h>
type va_arg(va_list ap, type);

[...] The parameter type shall be a type name specified such
that the type of a pointer to an object that has the
specified type can be obtained simply by postfixing a * to
type.

Oh, so that's UB. I believed gcc would emit a warning in a case like
this (with -ansi -pedantic -Weverything).

Thanks,
Yevgen
 
