Hmmm... I think that would be one heck of a rubbish compiler (or more
precisely *optimizing* compiler)!
If the standard allows that kind of stuff, then it simply is not
bullet-proof enough.
Many people think so; C is not the language to use if you want
bullet-proof code.
Because tmp occurs in both lines. Every compiler should notice that!
The rule violated by your code is written to cover situations where that
is not the case, without creating a special exception for when it is the
case. Type punning is fundamentally a hack; if you really want to
write bullet-proof code, you want a language where such a hack
constitutes a syntax error or a constraint violation; ideally, a
language which doesn't even present you with a mechanism for describing
such behavior. For bullet-proof code, you don't want a language which
actually defines the behavior that C leaves undefined.
Even if I "obfuscate" like this:
tmp.c1 = 1;
struct b *cp = (struct b*)&tmp;
printf("%d\n", cp->c1);
I'd expect any compiler that gets it wrong, to be complete rubbish.
Why? Because it's an optimizer BUG.
Why?
Because any simple compiler, that does not optimize... gets it right!
And if any simple compiler gets it right, then any optimization must
guarantee to get it right as well.
I mean: if the C standard allows one to create optimizers that result
in such ... ummm... surprises (read: "rubbish"), then the standard is
faulty in my eyes.
OK: here's an example of the kind of code for which the relevant rules
were defined, which may give you a better understanding of why they
exist. It's enormously simplified from any realistic example, but the
key features of this example that are relevant to the aliasing rules are
also commonplace features of much real-world code:
struct department {
    int department_id;
    // ...
};
struct employee {
    int employee_id;
    int department_id;
    // ...
};
// The key point is that employee_id and department_id are both at the
// start of their structs, and have the same type.
static void lay_off_employees(
    struct department *dept,
    struct employee employees[],
    int num
){
    for(int emp = 0; emp<num; emp++)
    {
        if(employees[emp].department_id == dept->department_id)
        {
            // Deal with lots of other issues.
            // We're done now!
            employees[emp].employee_id = -1;
        }
    }
    // ...
}
Key point to notice: dept->department_id is not changed through an
lvalue of type "struct department" anywhere inside the lay_off_employees
loop. It can therefore be treated as a loop invariant, which need only
actually be evaluated once, before the loop begins. In other words, the
implementation is free to compile that function as if it were written:
int dept_id = dept->department_id;
for(int emp=0; emp<num; emp++)
{
    if(employees[emp].department_id == dept_id)
    ...
This is one of the simplest and oldest of loop optimization strategies.
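To make the equivalence concrete, here is a minimal sketch that puts the two forms side by side and runs them on the same data. The names lay_off_reload, lay_off_hoisted and forms_agree, the returned count, and the sample IDs are all my own inventions for illustration; the structs are cut down to the two members that matter. It only shows the well-defined case, where dept really points at a struct department:

```c
#include <assert.h>

struct department { int department_id; };
struct employee   { int employee_id; int department_id; };

/* As written: dept->department_id is read again on every pass. */
int lay_off_reload(const struct department *dept,
                   struct employee employees[], int num)
{
    int count = 0;
    for (int emp = 0; emp < num; emp++) {
        if (employees[emp].department_id == dept->department_id) {
            employees[emp].employee_id = -1;
            count++;
        }
    }
    return count;
}

/* As an optimizer may treat it: the invariant is read once, up front. */
int lay_off_hoisted(const struct department *dept,
                    struct employee employees[], int num)
{
    int dept_id = dept->department_id;
    int count = 0;
    for (int emp = 0; emp < num; emp++) {
        if (employees[emp].department_id == dept_id) {
            employees[emp].employee_id = -1;
            count++;
        }
    }
    return count;
}

/* Hypothetical check: run both forms on identical, non-aliasing data
   and return how many employees were laid off. */
int forms_agree(void)
{
    struct department d = { 7 };
    struct employee a[3] = { {1, 7}, {2, 8}, {3, 7} };
    struct employee b[3] = { {1, 7}, {2, 8}, {3, 7} };
    int n1 = lay_off_reload(&d, a, 3);
    int n2 = lay_off_hoisted(&d, b, 3);

    assert(n1 == n2);
    for (int i = 0; i < 3; i++)
        assert(a[i].employee_id == b[i].employee_id);
    return n1;
}
```

As long as dept does not overlap the employees array, the two functions cannot be told apart from the outside, which is exactly what licenses the transformation.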
Now, consider the following ridiculous code:
employees[523].employee_id = human_resources.department_id;
lay_off_employees((struct department*)(employees+523),
employees, 600);
The possibility of writing such code means that dept->department_id is
not actually a loop invariant. If it weren't for C's anti-aliasing
rules, the behavior of such code would be defined, and as soon as
emp==523, the value of dept->department_id would have to change to -1.
The simplest approach to making that work would be to accept the
performance hit of repeatedly re-evaluating dept->department_id during
every pass through the loop, just in case it might change as a result of
something that happens inside the loop. There are more complicated
approaches that can be used to reduce the performance hit, by retrieving it
only once at the start of the loop, and a second time after emp==523.
However, those ARE more complicated approaches, and they don't scale
well when the number of different expressions that might alias each
other gets as large as 5 or 10 different expressions (in typical
real-world code the number of potentially-aliasing expressions is much
larger than that, particularly if there's a lot of pointers floating
around).
Because it is undefined behavior for *dept to refer to the same object
as employees[emp], an implementation is allowed to not worry about the
possibility that someone would write such ridiculous code, so they can
treat dept->department_id as if it actually were a loop invariant that
could be moved out of the loop. Only code that has undefined behavior
would be broken by treating it as a loop invariant.
This seems like a trivial issue in this simplified context, but keep in
mind that, without the anti-aliasing rules, any lvalue derived from a
pointer could potentially alias any other lvalue, if that pointer had
the right value and compatible alignment requirements. The anti-aliasing
rules guarantee that a compiler only needs to worry about aliasing
between lvalues of certain types, greatly reducing the problem, with
corresponding increases in the feasibility of performing such optimizations.
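A much smaller sketch of the same distinction, using scalar types instead of structs. The function names store_same_type and store_different_type are hypothetical, chosen only for this illustration, and what an optimizer actually does with either function depends on the implementation and its settings:

```c
#include <assert.h>

/* Both parameters are int*, so they are allowed to point at the same
   object. The compiler has to allow for that and re-read *a after the
   store through b. */
int store_same_type(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;      /* 2 if a and b point at the same int, otherwise 1 */
}

/* An int object may not be modified through an lvalue of type float,
   so the compiler may assume the store through b leaves *a alone and
   simply return 1. Passing pointers to the same object here would be
   undefined behavior, so this sketch never does that. */
int store_different_type(int *a, float *b)
{
    *a = 1;
    *b = 2.0f;
    return *a;
}

/* Hypothetical usage, restricted to calls with defined behavior. */
void aliasing_demo(void)
{
    int x = 0, y = 0;
    float f = 0.0f;

    assert(store_same_type(&x, &x) == 2);       /* legal aliasing */
    assert(store_same_type(&x, &y) == 1);       /* distinct objects */
    assert(store_different_type(&x, &f) == 1);  /* distinct objects */
}
```

In the first function the possibility of aliasing is real and must be honored; in the second, the types alone rule it out for any program with defined behavior.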