Is the aliasing rule symmetric?

  • Thread starter Johannes Schaub (litb)

Johannes Schaub (litb)

Posting my SO question to usenet:

Hello all. I had a discussion with someone on IRC and this question turned
up. We are allowed by the Standard to change an object of type `int` through a
`char` lvalue.

    int a;
    char *b = (char*) &a;
    *b = 0;

Would we be allowed to do this in the opposite direction, if we know that
the alignment is fine?

The issue I'm seeing is that the aliasing rule does not cover the following
simple case if one considers the aliasing rule to be a non-symmetric
relation:

    int a;
    a = 0;

The reason is that each object contains a sequence of `sizeof(obj)`
`unsigned char` objects (called the "object representation"). If we change
the `int`, we will change some or all of those objects. However, the
aliasing rule only states that we are allowed to change an `int` through a
`char` or `unsigned char` lvalue, but not the other way around. Another example:

    int a[1];
    int *ra = a;
    *ra = 0;

Only one direction is described by 3.10/15 ("An aggregate or union type that
includes..."), but this time we need the other way around ("A type that is
the element or non-static data member type of an aggregate...").

Is the other direction implied?

James Kanze

> Posting my SO question to usenet:
> Hello all. I had a discussion with someone on IRC and this
> question turned up. We are allowed by the Standard to change
> an object of type `int` through a `char` lvalue.
>     int a;
>     char *b = (char*) &a;
>     *b = 0;

Up to a point. Formally, you have undefined behavior if you try
to access the value of a after this. Practically, the value of
a will depend on the architecture: setting a to 42, then doing
the modification above, will give different results on a Sparc
than on a PC.
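
A minimal sketch of that experiment. Formally, the final read of `a` is the questionable step noted above; in practice mainstream compilers give the byte-order-dependent result described. The name `clear_first_byte` is hypothetical.

```cpp
// Hypothetical helper reproducing the experiment: set an int, then overwrite
// its lowest-addressed byte through a char lvalue and read the int back.
int clear_first_byte(int value)
{
    int a = value;
    char* b = reinterpret_cast<char*>(&a);  // a char lvalue may refer to any object
    *b = 0;                                 // changes only the first byte of a
    return a;                               // little-endian (PC): low-order byte cleared
                                            // big-endian (Sparc): high-order byte cleared
}
```

With a 32-bit `int`, `clear_first_byte(42)` gives 0 on a little-endian PC and 42 on a big-endian Sparc, since 42 fits entirely in the low-order byte.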

> Would we be allowed to do this in the opposite direction, if
> we know that the alignment is fine?

Formally, no. Practically, if the alignment and the size are
sufficient, you should get the same results as a memcpy.
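
For the opposite direction, the usual portable idiom is to copy the bytes into a genuine `int` with `std::memcpy` rather than reading them through an `int*` that points into a `char` array. A sketch; `load_int` and `store_int` are hypothetical names, and the byte order of the buffer is whatever the platform uses.

```cpp
#include <cstring>

// Hypothetical helper: read an int from raw bytes. No int lvalue ever refers
// to the char storage, so neither aliasing nor alignment is an issue.
int load_int(const unsigned char* bytes)
{
    int value;
    std::memcpy(&value, bytes, sizeof value);  // copy the object representation in
    return value;
}

// Hypothetical helper: the matching store, copying an int's bytes out.
void store_int(unsigned char* bytes, int value)
{
    std::memcpy(bytes, &value, sizeof value);
}
```

Because only bytes are copied, the buffer may even be misaligned for `int`. In C++20, `std::bit_cast` offers the same kind of conversion between two trivially copyable types of equal size.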

> The issue I'm seeing is that the aliasing rule does not cover the following
> simple case if one considers the aliasing rule to be a non-symmetric
> relation:
>     int a;
>     a = 0;

I'm not sure I understand.

> The reason is that each object contains a sequence of `sizeof(obj)`
> `unsigned char` objects (called the "object representation").

I'm not sure that "contains" is the right word. Each complete
object lives in a sequence of sizeof(obj) bytes of raw memory.
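
Those bytes can be inspected through `unsigned char` lvalues, which is the direction the rule explicitly permits. A sketch, with `dump_bytes` as a hypothetical name; the order of the output reflects the platform's byte order.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format the object representation of any object as
// hex bytes, reading it only through unsigned char lvalues.
template <typename T>
std::string dump_bytes(const T& obj)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        char hex[4];  // two hex digits, a space, and the terminating NUL
        std::snprintf(hex, sizeof hex, "%02x ", static_cast<unsigned>(p[i]));
        out += hex;
    }
    return out;
}
```

For example, `dump_bytes(42)` yields `2a 00 00 00 ` on a typical x86 machine with a 32-bit `int`.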
> If we change the `int`, we will change some or all of those
> objects. However, the aliasing rule only states that we are allowed
> to change an `int` through a `char` or `unsigned char` lvalue, but not
> the other way around. Another example:

The standard doesn't quite say that. It says that access to the
stored value of an object must be through an lvalue of the
object type, or through an lvalue of a char or unsigned char
type (plus a few other cases which don't concern us here).

>     int a[1];
>     int *ra = a;
>     *ra = 0;
> Only one direction is described by 3.10/15 ("An aggregate or union type that
> includes..."), but this time we need the other way around ("A type that is
> the element or non-static data member type of an aggregate...").
> Is the other direction implied?

Certainly not. On some rare machines, ints may have trap
values, and the values in the bytes in an array of unsigned char
may correspond to one of those values. Replace int with float,
and most of the machines I know do have trap values.

There is one important exception: if you memcpy bytes out of
some type, you can memcpy those bytes back into an object of
that type, and you're guaranteed to get the same value, i.e.:

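#include <cstring>
#include <iostream>

int main()
{
    float in = 3.14159f;
    float out;
    unsigned char buf[sizeof(float)];
    std::memcpy(buf, &in, sizeof(float));   // copy the bytes out...
    std::memcpy(&out, buf, sizeof(float));  // ...and back into a float
    std::cout << out << std::endl;
}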

is guaranteed to output 3.14159, but

float out;
unsigned char buf[sizeof(float)];
random_fill(buf, buf + sizeof(float));
memcpy(&out, buf, sizeof(float));
std::cout << out << std::endl;

is undefined behavior, and might even crash.
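
The round-trip guarantee applies to trivially copyable types; a generic sketch can make that precondition explicit with a `static_assert`. `round_trip` is a hypothetical name, and this version also assumes `T` is default constructible.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy an object's bytes into a buffer and back into a
// fresh object, as in the float example above.
template <typename T>
T round_trip(const T& in)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying is only guaranteed for trivially copyable types");
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, &in, sizeof(T));   // bytes out of the source object
    T out;
    std::memcpy(&out, buf, sizeof(T));  // same bytes into a new object
    return out;
}
```

Filling `buf` from anywhere other than a genuine object of type `T` loses the guarantee, which is exactly the random-bytes case above.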

Joshua Maurice

I've done my best to educate myself on this topic, and as far as I can
tell, there is no consensus on some of the finer details. I've had a
thread up on comp.std.c++ for a few weeks now, with no replies.

In short, this is what I suggest:

When:
1- a constructor is called on a piece of memory (such as with a
new-expression, including placement new), or
2- you make a write to a piece of storage through an lvalue of a type
X with trivial initialization,
this ends the lifetime of any object which occupied that storage (and
any sub-object thereof), and it starts the lifetime of a new object of
type X at that storage. (Do not confuse "ending lifetime" with
"calling destructor". The destructors will not be called, but the
lifetimes of the objects will still end.) In practice, this is
basically required for all C code to work.
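
Point 1 is the case the standard spells out directly: a placement new-expression constructs an object in existing storage and begins its lifetime there. A sketch, assuming suitably aligned `unsigned char` storage; `storage` and `make_int_in_storage` are hypothetical names.

```cpp
#include <new>

// Hypothetical storage buffer, aligned and sized for an int.
alignas(int) unsigned char storage[sizeof(int)];

// Start the lifetime of an int in 'storage' and return the pointer produced
// by the new-expression; later accesses should go through that pointer.
int* make_int_in_storage(int value)
{
    return ::new (static_cast<void*>(storage)) int(value);
}
```

Since `int` has a trivial destructor, nothing needs to be called before the storage is reused.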

Any attempt to read an object through an lvalue of a type
"sufficiently different" from the effective type of the object (such
as reading an int object through a short lvalue) is undefined
behavior. An explicit exception is that you're always allowed to read
from or write to an object through a char or unsigned char lvalue.
Actually, this char and unsigned char allowance may be restricted to
POD types - I'm not quite sure, and no one else is either.
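
The `char`/`unsigned char` allowance covers writes as well as reads, which is what lets a byte-by-byte copy loop work on an object. A minimal sketch, conservatively restricted to trivially copyable types given the uncertainty above; `copy_bytes` is a hypothetical name, not a library function.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: copy 'src' into 'dst' one byte at a time, reading and
// writing both objects only through unsigned char lvalues.
template <typename T>
void copy_bytes(T& dst, const T& src)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "restricted to trivially copyable types in this sketch");
    const unsigned char* from = reinterpret_cast<const unsigned char*>(&src);
    unsigned char* to = reinterpret_cast<unsigned char*>(&dst);
    for (std::size_t i = 0; i < sizeof(T); ++i)
        to[i] = from[i];
}
```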

Thus, the following program does not have undefined behavior:

    int main()
    {
        int a;
        short* b = reinterpret_cast<short*>(&a);

        a = 1;
        *b = 2;
        a = 3;
        return a;
    }

The following program does have undefined behavior:

    int main()
    {
        int a;
        short* b = reinterpret_cast<short*>(&a);

        a = 1;
        *b = 2;
        //a = 3;
        return a; // reading short object through int lvalue
    }
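
If the goal is to look at part of an `int` as a `short`, copying the bytes sidesteps the question, because no `short` lvalue ever refers to the `int`. A sketch, assuming `sizeof(short) <= sizeof(int)`; `leading_short` is a hypothetical name, and which half of the `int` it sees depends on byte order.

```cpp
#include <cstring>

// Hypothetical helper: read the first sizeof(short) bytes of an int's
// object representation as a short, via memcpy rather than a short*.
short leading_short(int a)
{
    static_assert(sizeof(short) <= sizeof(int), "sketch assumes short fits in int");
    short s;
    std::memcpy(&s, &a, sizeof s);
    return s;
}
```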

Thus, try not to think about it as an aliasing rule. Think about it as
a rule which restricts the types of lvalues for which you may legally
access objects. This then allows a compiler to do optimization based
on aliasing analysis.
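
The optimization this permits can be sketched with two hypothetical functions. In `store_int_float`, a compiler may assume the `float` store cannot modify `*i` and return 1 without reloading; calling it with both pointers referring to the same object would be undefined. In `store_int_bytes`, the `unsigned char` stores may alias `*i`, so the compiler has to reload it.

```cpp
#include <cstddef>

// An int lvalue and a float lvalue are assumed not to refer to the same
// object, so a compiler may fold the return value to 1.
int store_int_float(int* i, float* f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}

// unsigned char lvalues may alias anything, so *i must be re-read after the
// loop; if 'bytes' points at *i and n == sizeof(int), the result is 0.
int store_int_bytes(int* i, unsigned char* bytes, std::size_t n)
{
    *i = 1;
    for (std::size_t k = 0; k < n; ++k)
        bytes[k] = 0;
    return *i;
}
```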