Effective types of union members


Tim Rentsch

Francis Moreau said:
Tim Rentsch said:
[...]
I don't think it can be so simple. What about:

u.b = 1.23;
int *ip = &u.a;
int i = *ip;

Is the access via *ip valid under strict aliasing rules? This is the
example in the gcc man page that is declared invalid, yet it seems to me
to meet your requirements: "the lvalue is of type int, and the effective
type is also of type int".
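For reference, here is a self-contained sketch of the two accesses being contrasted. It assumes the `union u { int a; double b; }` declared later in this thread and that `sizeof(int) <= sizeof(double)`; the function names are hypothetical. GCC's manual documents reading a different union member through the union object itself (as in `pun_via_member`) as allowed even with `-fstrict-aliasing`, whereas reading the same bytes through a plain `int *` (as in `pun_via_pointer`, the man page case) is the form it says might not work. `pun_via_memcpy` is only a comparison point: it copies the bytes explicitly and so does not depend on either reading of the aliasing rules.

```c
#include <string.h>

/* The union used in this thread (assumed declaration). */
union u { int a; double b; };

/* Store b, then read a through the union object itself. GCC documents
   this kind of type punning as supported with -fstrict-aliasing. The
   value returned depends on the representation of double. */
int pun_via_member(double d)
{
    union u u;
    u.b = d;
    return u.a;
}

/* The gcc man page case: the int member is read through a plain int
   pointer, so the union type is no longer part of the access
   expression. GCC says this might not work with -fstrict-aliasing. */
int pun_via_pointer(double d)
{
    union u u;
    u.b = d;
    int *ip = &u.a;
    return *ip;
}

/* Comparison only: copy the leading sizeof(int) bytes of d into an
   int object explicitly (assumes sizeof(int) <= sizeof(double)). */
int pun_via_memcpy(double d)
{
    int i;
    memcpy(&i, &d, sizeof i);
    return i;
}
```

Because the result depends on how `double` is represented, a check of this sketch compares the union read against the `memcpy` copy rather than against a fixed number.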

Do you mean under effective type rules, or under strict aliasing
rules? As far as I know 'strict aliasing' is defined by GNU/FSF
as part of gcc, and is more restrictive than the effective type
rules of C99 -- in other words, under 'strict aliasing' rules some
programs that are defined under the ISO standard would behave
other than as the ISO standard requires. So it's important
to know which one you are asking about.

OK, so it looks like we agree that the example given in GCC's man page,
which is the same as the one above, _is_ defined by the standard as a
valid alias case.

However it looks like the Standard doesn't handle the union special case
correctly when it comes to aliasing, since unions remove all
optimisations based on type:

union u { int a; double b; };

/* at that point int can alias double */
int *i;
double *d;

void foo(int *pi, double *pd) { /* ... */ }

int main (void)
{
union u u;
i = &u.a;
d = &u.b;
foo(i, d);
}

So 'strict aliasing' seems the correct thing to do.
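To make the optimisation concern concrete, here is a variant of `foo` with a hypothetical body, since the thread elides the real one; the name `foo_sketch` and its `int` return value are assumptions for illustration. A compiler using type-based alias analysis may assume that the store through `pd` cannot change `*pi`, and so return 1 without reloading it. When called as `foo_sketch(&u.a, &u.b)`, both pointers designate the start of the same union object, and whether that assumption is still valid is the question being debated here.

```c
/* The union from the thread, through which an int and a double can
   occupy the same storage. */
union u { int a; double b; };

/* Hypothetical body standing in for the elided foo(). With type-based
   alias analysis a compiler may assume the store through pd does not
   modify *pi, and fold the final read of *pi to the constant 1. */
int foo_sketch(int *pi, double *pd)
{
    *pi = 1;
    *pd = 2.0;
    return *pi;
}
```

With two distinct objects, for example `int x; double y; foo_sketch(&x, &y)`, the result is 1 whatever the optimisation level; only the overlapping case is in doubt.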

Two comments:

1. In the context of discussing standard C, I
think it's better not to use the term "strict
aliasing" since it does not have a well-defined
meaning (not counting the meaning that "it's
whatever gcc takes it to mean").

2. The question you bring up is one where the
Standard itself is unclear. There are at least
four different questions, namely, (a) What
does the Standard actually say about this?;
(b) What do people generally take it to mean?;
(c) What did the members of WG14 intend the
Standard to require in this case (and that
is probably different for different members)?;
and (d) What semantics are most in keeping
with how the Standard describes the language
otherwise? As far as I know none of these
questions has a clear answer, so when you
ask about what is "the correct thing to do",
all I can say is it's not clear, even though
I don't know which question you are trying
to answer.

By the way, your example doesn't say whether
the complete union type is visible at the
point where the function foo() is defined,
and this will make a difference, at least in
some cases, in what behavior people expect for
this example.
 

Johannes Schaub (litb)

Ben said:
Its purpose is to explain the behaviour of accesses that *are*
permitted. There is no doubt at all that u.a is allowed after setting
u.b so the meaning needs to be explained.


I don't know what you mean by "broken". Are you saying that the rules
are wrong, i.e. that the example is indeed UB but that you think it
should not be? If you mean that the situation could be clearer, then I
agree.

I think the aliasing rules don't really say what the committee wants them
to say regarding unions.
I presume that "accessed by" (in 6.5 p7) includes the use of u even
though the full expression (u.b) has a non-union type. I don't see any
other sane way to read it.

You can read it as allowing the value of an object that is contained as a
member of a union or struct (or as an element of an array) to be copied by
simply copying that aggregate or union, for example.
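That reading can be illustrated with a sketch in which the stored `double` is accessed through lvalues of the union type, relying on the clause of 6.5p7 that permits "an aggregate or union type that includes one of the aforementioned types among its members". The function name `copy_whole` is hypothetical, and the sketch only shows the access pattern; it does not settle which reading of "accessed by" is the intended one.

```c
/* The union from the thread (assumed declaration). */
union u { int a; double b; };

/* The double stored in src is copied as part of the whole union: the
   assignment dst = src accesses it through lvalues of type union u,
   and only the final read uses an lvalue of type double. */
double copy_whole(double d)
{
    union u src, dst;
    src.b = d;
    dst = src;
    return dst.b;
}
```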

If we suppose that it includes the use of the union, the behavior would
still be undefined, because we would then have two lvalues, of different
types, through which we access the object:

- The union lvalue
- The double/int lvalue

So reading through the other member will use a union lvalue (so to speak),
right. But the rule says "An object shall have its stored value accessed
only by an lvalue expression that has one of the following types". So it is
not enough that we use the union lvalue; the double/int lvalue has to comply
as well, and that is where the behavior becomes undefined.

This of course is only true if the members of the union are not given
their own effective types, and the effective type of the accessed object
is instead tracked by the "active member". This is what the C committee
seems to say is the case in its analysis of the DR.
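The "active member" reading can be modelled with explicit bookkeeping. The sketch below is a toy model under that assumption only: `tracked_u`, `store_a`, `store_b` and the `read_*_matches` helpers are hypothetical names, and no implementation is implied to track the active member at run time this way.

```c
#include <stdbool.h>

/* The union from the thread (assumed declaration). */
union u { int a; double b; };

/* Which member was stored most recently (hypothetical bookkeeping). */
enum member { MEMBER_NONE, MEMBER_A, MEMBER_B };

/* A union paired with a record of its "active member". In this model
   the effective type of the union's storage follows that member. */
struct tracked_u {
    union u u;
    enum member active;
};

/* Storing through a member makes that member the active one. */
void store_a(struct tracked_u *t, int v)
{
    t->u.a = v;
    t->active = MEMBER_A;
}

void store_b(struct tracked_u *t, double v)
{
    t->u.b = v;
    t->active = MEMBER_B;
}

/* A read through a member matches the effective type only if that
   member is active; otherwise the model treats the read as one whose
   lvalue type does not match, which is the undefined case above. */
bool read_a_matches(const struct tracked_u *t)
{
    return t->active == MEMBER_A;
}

bool read_b_matches(const struct tracked_u *t)
{
    return t->active == MEMBER_B;
}
```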
 

Francis Moreau

Ben Bacarisse said:
[...]
This is a known issue. The aliasing rules in the presence of unions are broken.

I don't know what you mean by "broken". Are you saying that the rules
are wrong, i.e. that the example is indeed UB but that you think it
should not be?

I think they're broken because, as the rules stand, the example is defined
but it should not be.
If you mean that the situation could be clearer, then I agree.

I agree.
That's not (exactly) what it says and since the devil is in the details,
the wording matters.

Well, so far, the explanation you gave (thanks for that) of how it's
undefined behaviour doesn't rely on any part of the standard. Instead
it makes assumptions about what the standard was meant to say.
I presume that "accessed by" (in 6.5 p7) includes the use of u even
though the full expression (u.b) has a non-union type. I don't see any
other sane way to read it.

Because there's no sane way to read it.
 

Francis Moreau

Tim Rentsch said:
Francis Moreau said:
Tim Rentsch said:
[...]


I don't think it can be so simple. What about:

u.b = 1.23;
int *ip = &u.a;
int i = *ip;

Is the access via *ip valid under strict aliasing rules? This is the
example in the gcc man page that is declared invalid, yet it seems to me
to meet your requirements: "the lvalue is of type int, and the effective
type is also of type int".

Do you mean under effective type rules, or under strict aliasing
rules? As far as I know 'strict aliasing' is defined by GNU/FSF
as part of gcc, and is more restrictive than the effective type
rules of C99 -- in other words, under 'strict aliasing' rules some
programs that are defined under the ISO standard would behave
other than as the ISO standard requires. So it's important
to know which one you are asking about.

OK, so it looks like we agree that the example given in GCC's man page,
which is the same as the one above, _is_ defined by the standard as a
valid alias case.

However it looks like the Standard doesn't handle the union special case
correctly when it comes to aliasing, since unions remove all
optimisations based on type:

union u { int a; double b; };

/* at that point int can alias double */
int *i;
double *d;

void foo(int *pi, double *pd) { /* ... */ }

int main (void)
{
union u u;
i = &u.a;
d = &u.b;
foo(i, d);
}

So 'strict aliasing' seems the correct thing to do.

Two comments:

1. In the context of discussing standard C, I think it's better not to
use the term "strict aliasing" since it does not have a well-defined
meaning (not counting the meaning that "it's whatever gcc takes it to
mean").

2. The question you bring up is one where the Standard itself is
unclear. There are at least four different questions, namely, (a)
What does the Standard actually say about this?; (b) What do people
generally take it to mean?; (c) What did the members of WG14 intend
the Standard to require in this case (and that is probably different
for different members)?; and (d) What semantics are most in keeping
with how the Standard describes the language otherwise? As far as I
know none of these questions has a clear answer, so when you ask about
what is "the correct thing to do", all I can say is it's not clear,
even though I don't know which question you are trying to answer.

All these questions seem interesting, but I would add (e): When will the
Standard be fixed?
By the way, your example doesn't say whether the complete union type
is visible at the point where the function foo() is defined, and this
will make a difference, at least in some cases, in what behavior
people expect for this example.

The union type is visible where the function foo() is defined; otherwise
the comment would make no sense.
 
