Tim Rentsch
Joshua Maurice said: [snip]
I was wondering if you would give me your expert opinion on a closely
related issue, namely the union DR and a related problem. I've had a
bunch of musings and questions up on comp.std.c++ for a while now, and
I've received no replies. They involve several problems inherited from
C, some of which are closely related to the strict aliasing rule and
the union DR.
"Effective type" rules, not "strict aliasing" rules. These terms
are different, and only "effective type" is defined by the ISO
Standard. "Strict aliasing" is a gcc-ism, and means something
different.
[snip]
Ok. Let's look at a couple of simpler examples.
#include <stdlib.h>
int main()
{
    void* p;
    int* x;
    float* y;
    p = malloc(sizeof(int) + sizeof(float));
    x = p;
    y = p;
    *x = 1;
    *y = 2;
    return *y;
}
Does the above program have any undefined behavior?
No.
What about the following program?
#include <stdlib.h>
void foo(int* x, float* y)
{
    *x = 1;
    *y = 2;
}
int main()
{
    void* p;
    int* x;
    float* y;
    p = malloc(sizeof(int) + sizeof(float));
    x = p;
    y = p;
    foo(x, y);
    return *y;
}
Again, the behavior of this program is defined.
The problem I see is that the strict aliasing rule's intent AFAIK is
to allow a compiler to transform foo to:
void foo(int* x, float* y)
{
    *y = 2;
    *x = 1;
}
as an "int*" and a "float*" may not alias. However, this would break
the program: the read of "*y" in main would then read a memory
location through a float lvalue that was last written through an
int lvalue.
Again I think you mean "effective type" rules rather than "strict
aliasing" rules. Yes it is, and yes it would. The rules for effective type
are poorly constructed and don't allow as much freedom as
most people expect.
Hopefully you can now see how this is related to the union
DR, e.g.:
#include <stdlib.h>
void foo(int* x, float* y)
{
    *x = 1;
    *y = 2;
}
int main()
{
    union { int x; float y; } u;
    foo(&u.x, &u.y);
    return u.y;
}
The linked thread in comp.lang.c++ itself links to a DR resolution on
the C++ standard committee's website which, IMHO, does nothing to
address these problems.
(I haven't tried to read up on any c++ issues.)
In short: We want the compiler to be able to optimize assuming that
sufficiently differently typed pointers do not alias. This allowance
has been phrased as "You may not read an object through an lvalue of a
sufficiently different type." We must also support starting an
object's lifetime by writing to member sub-objects through an lvalue
of that member sub-object, such as:
#include <stdlib.h>
typedef struct T { int x; int y; } T;
int main()
{
    T* t;
    t = malloc(sizeof(T));
    t->x = 1;
    return t->x;
}
Taken together, this leads to the union DR (mentioned above), and to the
closely related problem that any userland memory allocator appears at
least to run afoul of the union DR, and possibly also to have undefined
behavior, because a piece of memory may be treated as several
sufficiently different types while in the care of the userland memory
allocator.
Frankly, I don't see how you can have your cake and eat it too. You
could throw out the strict aliasing rule as an optimization allowance,
and then it all works, but that's definitely not preferred. I think
it's unacceptable to say that you can't have userland memory
allocators either.
Again I believe you mean effective type rules.
It's perfectly possible to have user-written memory allocators.
The only restriction is that they might not be portable to other
implementations, and that's why malloc() exists: to be a portable
solution. But nothing in the Standard prevents user-written
memory allocators.
If we could start over, some special language
syntax which takes a pointer and a type could signal the start and end
of object lifetimes, and with that I think everything would fall into
place; but we can't do that without breaking all existing C code, so
that's out. Something has to give, and I don't see what.
Aliasing is related to, but not totally synchronized with, object
lifetimes. Also there may be, using unions, two objects that
occupy exactly the same memory locations.
PS: There's a minor issue as well, namely the following program:
#include <stdlib.h>
typedef struct T { int x; int y; } T;
typedef struct U { int x; int y; } U;
int main()
{
    void* v;
    T* t;
    U* u;
    int* x;
    int* y;
    v = malloc(sizeof(T) + sizeof(U));
    t = v;
    u = v;
    if (&t->y != &u->y)
        return 1;
    x = &t->x;
    y = &t->y;
    *x = 1;
    *y = 2;
    /* Ok. Do we have a T object or a U object?
       Why? All I see are writes through int lvalues. */
    x = &u->x; /* UB? */
    return *x; /* UB? */
}
On "x = &u->x;": gray area, but IMO the stronger argument is that this is defined.
On "return *x;": here I think the stronger argument is that this may be undefined,
because u->x was never used to store a value (and because the
object at '*t' does not alias with the object at '*u').
What do you make of it? IIRC, the intent was to disallow
reading a T object through a U lvalue, but I don't quite know what
that means when all of the reads and writes are made through sub-
object fundamental type lvalues, never aggregate lvalues in C. (The
situation is a little different for C++ due to virtual et al.) I think
we could solve this by saying that as long as all of the fundamental
writes and reads are consistent with a single complete object type,
then it's defined behavior. This punts a little bit though, as we need
to define when object lifetimes begin and end, which I expounded upon
at length above.
The rules regarding effective type and the conditions under which
aliasing is allowed or not allowed need to be revisited and written
appropriately. Unfortunately that is quite a difficult task.