A basic (?) problem with addresses (gcc)

Tim Rentsch

Joshua Maurice said:
[snip]

I was wondering if you would give me your expert opinion on a very
related issue, namely the union DR and a related issue. I've had a
bunch of musing and questions up on comp.std.c++ for a while now, and
I've received no replies. It involves several problems inherited from
C, some of which are closely related to the strict aliasing rule and
the union DR.

"Effective type" rules, not "strict aliasing" rules. These terms
are different, and only "effective type" is defined by the ISO
Standard. "Strict aliasing" is a gcc-ism, and means something
different.
[snip]

Ok. Let's look at a couple of simpler examples.

#include <stdlib.h>
int main()
{
    void* p;
    int* x;
    float* y;

    p = malloc(sizeof(int) + sizeof(float));
    x = p;
    y = p;
    *x = 1;
    *y = 2;
    return *y;
}

Does the above program have any undefined behavior?
No.
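
A minimal sketch of why the answer is "No", assuming the C99 effective type rules (6.5p6) are read the usual way: storage from malloc has no declared type, so each store through a non-character lvalue sets the effective type for the reads that follow. The function name reuse_storage is hypothetical and only wraps the program above so the result can be checked.

```c
#include <stdlib.h>

/* Hypothetical helper, not from the thread: the same stores as the
   program above, wrapped in a function that returns the value read. */
float reuse_storage(void)
{
    /* Storage from malloc has no declared type. */
    void *p = malloc(sizeof(int) + sizeof(float));
    int *x = p;
    float *y = p;
    float result;

    if (p == NULL)
        return -1.0f;   /* allocation failed; nothing to demonstrate */

    *x = 1;             /* effective type of the storage becomes int */
    *y = 2;             /* the store through a float lvalue changes it to float */
    result = *y;        /* read through float matches the effective type */

    free(p);
    return result;
}
```

The only read is through a float lvalue after a float store, so no access conflicts with the effective type at the time it happens.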

What about the
following program?

#include <stdlib.h>
void foo(int* x, float* y)
{
    *x = 1;
    *y = 2;
}
int main()
{
    void* p;
    int* x;
    float* y;

    p = malloc(sizeof(int) + sizeof(float));
    x = p;
    y = p;
    foo(x, y);
    return *y;
}

Again the behavior for this program is defined.

The problem I see is that the strict aliasing rule's intent AFAIK is

Again I think you mean "effective type" rules rather than "strict
aliasing" rules.

to allow a compiler to transform foo to:
void foo(int* x, float* y)
{
    *y = 2;
    *x = 1;
}
as an "int*" and a "float*" may not alias. However, this would break
the program from the read in main of "*y" which reads a memory
location through a float lvalue which was last written to through an
int lvalue.

Yes it is, and yes it would. The rules for effective type
are poorly constructed and don't allow as much freedom as
most people expect.

Hopefully you should see how this is related to the union
DR now, ex:

#include <stdlib.h>
void foo(int* x, float* y)
{
    *x = 1;
    *y = 2;
}
int main()
{
    union { int x; float y; };
    foo(&x, &y);
    return y;
}

The linked thread in comp.lang.c++ itself links to a DR resolution on
the c++ standard committee's website which IMHO does nothing to
address these problems.

(I haven't tried to read up on any c++ issues.)

In short: We want the compiler to be able to optimize assuming that
sufficiently differently typed pointers do not alias. This allowance
has been phrased as "You may not read an object through an lvalue of a
sufficiently different type." We must also support starting an
object's lifetime by writing to member sub-objects through an lvalue
of that member sub-object, such as:

#include <stdlib.h>
typedef struct T { int x; int y; } T;
int main()
{
    T* t;
    t = malloc(sizeof(T));
    t->x = 1;
    return t->x;
}

Taken together, this leads to the union DR (mentioned above), and the
highly related problem that any userland memory allocator appears to
at least run afoul of the union DR, and also possibly have undefined
behavior because a piece of memory may be treated as several
sufficiently different types while in the care of the userland memory
allocator.

Frankly, I don't see how you can get your cake and eat it too. You
could throw out the strict aliasing rule as an optimization allowance,

Again I believe you mean effective type rules.

and then it all works, but that's definitely not preferred. I think
it's unacceptable to say that you can't have userland memory
allocators either.

It's perfectly possible to have user-written memory allocators.
The only restriction is they might not be portable to other
implementations, and that's why malloc() exists, to be a portable
solution. But nothing in the Standard prevents user-written
memory allocators.
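
As a rough illustration of a user-written allocator that stays within the effective type rules as I read them, here is a sketch of a bump ("arena") allocator layered on a single malloc'd block. Because the block has no declared type, each piece takes its effective type from whatever is stored into it. All names here (Arena, arena_init, arena_alloc, arena_reset, arena_free) are hypothetical, and the sketch assumes the caller passes a sensible nonzero alignment such as sizeof of the type being allocated.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena type: one malloc'd block handed out in pieces. */
typedef struct Arena {
    unsigned char *base;  /* start of the block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} Arena;

/* Returns 1 on success, 0 if the underlying malloc failed. */
int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hands out n bytes at an offset that is a multiple of align
   (align must be nonzero). Returns NULL when the arena is exhausted. */
void *arena_alloc(Arena *a, size_t n, size_t align)
{
    size_t start = (a->used + align - 1) / align * align;

    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Makes the whole block available again; old contents are abandoned. */
void arena_reset(Arena *a)
{
    a->used = 0;
}

void arena_free(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

After arena_reset, the same bytes can be handed out again and written through a different type; a store through the new type gives the storage a new effective type, which is the same mechanism the malloc examples earlier in the thread rely on. An allocator built on a declared array (for example a static char buffer) is where the portability concern becomes sharper.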
If we could start over, some language-special
syntax which takes a pointer and a type could signal the start and end
of object lifetimes, and with that I think everything would fall into
place - but we can't do that without breaking all existing C code, so
that's out. Something has to give, and I don't see what.

Aliasing is related to, but not totally synchronized with, object
lifetimes. Also there may be, using unions, two objects that
occupy exactly the same memory locations.

PS: There's a minor issue as well, namely the following program:

#include <stdlib.h>
typedef struct T { int x; int y; } T;
typedef struct U { int x; int y; } U;
int main()
{
    void* v;
    T* t;
    U* u;
    int* x;
    int* y;

    v = malloc(sizeof(T) + sizeof(U));
    t = v;
    u = v;

    if (&t->y != &u->y)
        return 1;

    x = &t->x;
    y = &t->y;

    *x = 1;
    *y = 2;
    /* Ok. Do we have a T object or a U object?
       Why? All I see are writes through int lvalues. */

    x = &u->x; /* UB? */

Gray area but IMO the stronger argument is that this is defined.

    return *x; /* UB? */

Here I think the stronger argument is that this may be undefined,
because u->x was never used to store a value (and because the
object at '*t' does not alias with the object at '*u').

}

What do you want to make of it? IIRC, the intent was to disallow
reading a T object through a U lvalue, but I don't quite know what
that means when all of the reads and writes are made through sub-
object fundamental type lvalues, never aggregate lvalues in C. (The
situation is a little different for C++ due to virtual et al.) I think
we could solve this by saying that as long as all of the fundamental
writes and reads are consistent with a single complete object type,
then it's defined behavior. This punts a little bit though, as we need
to define when object lifetimes begin and end, which I expounded upon
at length above.

The rules regarding effective type and the conditions under which
aliasing is allowed or not allowed need to be revisited and written
appropriately. Unfortunately that is quite a difficult task.
 
Tim Rentsch

Nick Bowler said:
Note that the effective type rules (i.e., -fstrict-aliasing) do not
prevent you from converting pointers between two types with the same
alignment. The rules only apply when you dereference such a pointer.
So -fstrict-aliasing doesn't violate the above C89 requirements.


The OP's posted code has undefined behaviour in C99 because it violates
the "shall" requirement in 6.5#7. This is a requirement not in C89, but
as I don't have a copy handy, I can't say whether or not gcc's
-fstrict-aliasing option renders the compiler non-conforming to C89.

It would be interesting to know if there is a strictly conforming C89
program that both (a) violates no constraints of C99, and (b) has
undefined behaviour in C99 due to the new rules about effective types.


Yes and no. C99 adopted new wording which comes just shy of officially
blessing this use: in C89, the results of reading a union member other
than the last stored member were undefined.

Implementation defined, not undefined.

C99 drops this text and
instead says that assigning to a union member causes the bytes of the
object representation of other members to become unspecified.

Almost. Bytes of other members _outside the bytes of the member
being assigned_ become unspecified. Bytes of other members that
are inside the region of the member being assigned have well-defined
values (that depend on implementation-defined details, but still
well-defined).
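
A small sketch of the "inside the region" case, assuming uint32_t is available on the implementation: after a store to the wider member, the bytes seen through an overlapping unsigned char array are simply the object representation of the stored value. Which byte holds which part of the value is implementation-defined, so the sketch compares against a memcpy of the same value instead of hard-coding a byte order. The names Pun and union_bytes_match are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical union: both members cover exactly the same bytes. */
typedef union Pun {
    uint32_t u;
    unsigned char bytes[sizeof(uint32_t)];
} Pun;

/* Returns 1 when the bytes read through the union equal the object
   representation of value as copied out with memcpy. */
int union_bytes_match(uint32_t value)
{
    Pun p;
    unsigned char copy[sizeof value];

    p.u = value;                          /* store through one member */
    memcpy(copy, &value, sizeof value);   /* reference representation */

    /* Every byte of p.bytes lies inside the member just assigned, so
       none of them falls into the "unspecified" category. */
    return memcmp(p.bytes, copy, sizeof copy) == 0;
}
```

The unspecified case would need a union whose other member is larger than the one assigned, for instance storing to a char member of a union that also holds an int; the int's remaining bytes are then the ones that take unspecified values.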
 
Seebs

You mean effective type rules. The term 'strict aliasing' is a gcc-ism
and independent of the C Standard.

Whoops. Yeah. I think they actually mean the same thing, but the
term "effective type rules" is a clearer and more specific one.

-s
 
Joshua Maurice

Gray area but IMO the stronger argument is that this is defined.


Here I think the stronger argument is that this may be undefined,
because u->x was never used to store a value (and because the
object at '*t' does not alias with the object at '*u').

Indeed, but conceptually that makes no sense. I understand the desire
to say that, but I went to great lengths when I had the early return:
if (&t->y != &u->y)
return 1;

"&t->x" and "&u->x" have equivalent values past that early return, so
I find it difficult to internally justify calling the above undefined
behavior. It just makes no sense within the given framework, and
especially as commonly implemented. I think you'd have to do data
dependency analysis in the aliasing analysis for a compiler to
break the above program, which doesn't seem to be the intent of the
lifetime rules and strict aliasing rules at all.

Also, I think you could use some offsetof manipulation to make this
apparent contradiction even clearer.
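
One possible shape for that offsetof variant, as a sketch only: the struct types are used solely to compute offsets, and every store and load goes through a plain int lvalue. The function name write_then_read_by_offset is hypothetical, and the offset comparison plays the role of the early return in the earlier program.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct T { int x; int y; } T;
typedef struct U { int x; int y; } U;

/* Hypothetical helper: returns the value read back, or -1 if the
   layouts differ or the allocation fails. */
int write_then_read_by_offset(void)
{
    size_t size = sizeof(T) > sizeof(U) ? sizeof(T) : sizeof(U);
    unsigned char *raw;
    int *py;
    int result;

    /* Same role as the early return on &t->y != &u->y. */
    if (offsetof(T, y) != offsetof(U, y))
        return -1;

    raw = malloc(size);
    if (raw == NULL)
        return -1;

    py = (int *)(raw + offsetof(T, y));
    *py = 2;                                   /* store through an int lvalue */
    result = *(int *)(raw + offsetof(U, y));   /* load through an int lvalue,
                                                  same address as the store */
    free(raw);
    return result;
}
```

No lvalue of type T or U ever appears here, so it is hard to see which rule the load could violate; the open question is whether spelling the same accesses as t->y and u->y is allowed to change that.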

PS: anyone know of any real implementations which do aliasing analysis
based on struct types? IIRC, and based on some quick experimentation,
newer versions of gcc only do alias analysis for primitive types.
 
Tim Rentsch

Joshua Maurice said:
Indeed, but conceptually that makes no sense.

At some level I agree with you. This also is a gray area;
certainly a case can be made that the behavior here is
defined, and I can't identify any specific Standard
requirement that isn't met. The question is whether an
implementation is allowed to "know" that objects of the two
struct types cannot overlap (absent a visible union type
that includes them both, which is the case here). I think
there are reasonable arguments on both sides. Certainly
this issue merits clarification.

I understand the desire
to say that, but I went to great lengths when I had the early return:
if (&t->y != &u->y)
return 1;

"&t->x" and "&u->x" have equivalent values past that early return, so
I find it difficult to internally justify calling the above undefined
behavior. It just makes no sense within the given framework,

An implementation is _allowed_ to make use of the identity
'&t->y == &u->y' for code following the return statement,
but certainly it isn't _required_ to. And we may expect
that, in the presence of undefined behavior (assuming that
such exists here), an implementation might very well deduce
several consequences that are not logically consistent.
That is the nature of undefined behavior - something doesn't
make sense.

and especially as commonly implemented.

What implementations do does not affect whether the behavior
is undefined. Certainly it may affect what behavior we
expect to see, but not whether the behavior is defined.

I think you'd have to do data
dependency analysis in the aliasing analysis for a compiler to
break the above program, which doesn't seem to be the intent of the
lifetime rules and strict aliasing rules at all.

First: ITYM effective type rules. "Strict aliasing" is a
gcc-ism, and not synonymous with (and probably not the same
as) the effective type rules.

Second: in fact, I think it is precisely the point of
effective type rules to enable certain kinds of code
transformations, and those code transformations may very
well depend on data flow analysis as well as type
information. So I think I draw the opposite conclusion
about intent -- effective type rules are provided so
that code transformations can be done that otherwise
might change the semantics of a naive translator.
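
To make the kind of transformation concrete, here is a small sketch with a hypothetical function name, set_and_read. Under the effective type rules a store through a float lvalue cannot legitimately modify an object whose effective type is int, so a translator may, but need not, return the constant without reloading *i.

```c
/* Hypothetical example, not from the thread. */
int set_and_read(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    /* A translator relying on the effective type rules may assume the
       store to *f left *i untouched and return 1 directly. A naive
       translator would reload *i from memory here instead. */
    return *i;
}
```

When i and f point to distinct objects, both translations give the same answer, which is the case the rules are meant to serve. The programs earlier in the thread are the awkward ones, where both pointers refer to the same malloc'd bytes.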
Also, I think you could use some offsetof manipulation to make this
apparent contradiction even clearer.

Here I'm not sure what contradiction you're talking about.
The questions are gray, neither black nor white, and could
be made either. It is only when an issue resolves to both
black _and_ white that there is a contradiction. The issue
I see is that the matter is unclear, not that there is (yet)
any contradiction. Making the behavior here undefined might
seem counter-intuitive, or stupid, but as far as I know it
wouldn't result in any inconsistency.
 
