Strict aliasing rules in ISO C: does anyone understand them?


nicolas.riesch

I am trying to understand the strict aliasing rules in the C Standard.
Since gcc applies these rules by default, I want to be sure that I
understand this issue fully.

For questions (1), (2) and (3), I think that the answers are all "yes",
but I would be glad to have strong confirmation.

As for questions (4), (5) and (6), I really don't know. Please help!

--------

The Standard says (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf,
section 6.5, paragraph 7):

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
  the object,
- a type that is the signed or unsigned type corresponding to the
  effective type of the object,
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
  types among its members (including, recursively, a member of a
  subaggregate or contained union), or
- a character type.


***** Question (1) *****

Consider two structs with different tag names:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

The compiler is free to assume that p1 and p2 point to different memory
locations and don't alias, because two structs with different tags are
different types.

In the standard, we read the wording "effective type of the object"
many times.

This "effective type of the object" may be an "int", "double", etc, but
may also be a "struct" type, right ???

And I suppose it may also be an "array" type or a "union" type as
well, is that correct ???


***** Question (2) *****

In the little program that follows, the line "printf("%d\n", *x);"
normally prints 123, but an optimizing compiler can print garbage
instead of 123. Is my reasoning correct ???

On the other hand, the line "printf("%d\n", p1->i);" always prints 999
as expected, right ???

----

#include <stdio.h>
#include <stdlib.h>

struct s1 { int i; double f; };


int main(void)
{
struct s1* p1;
int* x;

p1 = malloc(sizeof(*p1));
p1->i = 123; // object of type 'struct s1' contains 123

x = &(p1->i);

printf("%d\n", *x); // I try to access a value stored in an
                    // object of type 'struct s1'
                    // through *x, which is of type 'int'.
                    // I think this is not allowed by the standard!

*x = 999; // I store 999 in *x, which is of type 'int'

printf("%d\n", p1->i); // I access a value stored in *x, which is of
                       // type 'int', by *p1 (as p1->i is a shortcut
                       // for (*p1).i), which is of type 'struct s1'
                       // but contains a member of type 'int'.
                       // I think this is allowed by the standard.


return 0;
}


***** Question (3) *****

The Standard forbids (if I am not mistaken) a pointer of type
"struct A *" from accessing data written through a pointer of type
"struct B *", as they are different types.

This means that the common practice of faking inheritance in C, as in
this code snippet, is now utterly wrong, is that correct ???


--- myfile.c ---

#include <stdio.h>
#include <stdlib.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point { int x;
int y;
};

struct Color_Point { int x;
int y;
Color color;
};

struct Color_Point2{ struct Point point;
Color color;
};

int main(int argc, char* argv[])
{

struct Point* p;

struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;

p = (struct Point*)my_color_point;

printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
// a "struct Color_Point" object using a "struct Point*" pointer is
// forbidden by the Standard ???


struct Color_Point2* my_color_point2 = malloc(sizeof(struct Color_Point2));
my_color_point2->point.x = 100;
my_color_point2->point.y = 200;
my_color_point2->color = RED;

p = (struct Point*)my_color_point2;

printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
// a "struct Color_Point2" object using a "struct Point*" pointer is
// forbidden by the Standard ???


p = &my_color_point2->point;

printf("x:%d, y:%d\n", p->x, p->y); // but this is correct, right ???


return 0;
}


Is the line "p = (struct Point*)my_color_point" also a case of what is
called "type-punning" ???


***** Question (4) *****

In the Standard, chapter 6.5.2.3, it is written:

One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the complete type
of the union is visible. Two structures share a common initial
sequence if corresponding members have compatible types (and, for
bit-fields, the same widths) for a sequence of one or more initial
members.

I find this statement completely obscure.

Let's have:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

A compiler is free to assume that *p1 and *p2 don't alias.

If we just put a union declaration like this before this code, then it
acts like a flag to the compiler, indicating that pointers to "struct
s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
to the same location.

union p1_p2_alias_flag { struct s1 st1;
struct s2 st2;
};

There is no need to use "union p1_p2_alias_flag" for accessing data,
and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
anywhere else. I mean, it is possible to access data using p1 and p2
directly.

Do you agree, everybody ???


***** Question (5) *****

This question is really hard.

Let's have this code snippet:

---------
#include <stdio.h>

int main (void)
{

struct s1 {int i;
};

struct s1 s = {77};

unsigned char* x = (unsigned char*)&s;
printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
// Standard says data stored in "struct s1" type can be read by pointer
// to "char"

x[0] = 100; // here, I write data in "char" objects !!!
x[1] = 101;
x[2] = 102;
x[3] = 103;

printf("%d\n", s.i); // but data stored in "char" objects cannot be
// read by pointer to "struct s1" ???

return 0;
}
-----------

For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
(int)x[3]);", I can rewrite the Standard clause like this:

An object [ here, s of type "struct s1" ] shall have its stored value
accessed only by an lvalue expression that has one of the following
types:
[ blah blah blah ]
- a character type [ in our example, x[0], x[1], x[2], x[3] ].
  // This is our case, so everything is OK so far!


But what about the line "printf("%d\n", s.i);" ??????
I have read the Standard again and again, but I cannot explain how it
can work. If I rewrite the Standard clause, it gives:

An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
stored value accessed only by an lvalue expression that has one of the
following types:
- a type compatible with the effective type of the object, [ this is
  not our case ]
- a qualified version of a type compatible with the effective type of
  the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to the
  effective type of the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object, [ still not
  our case ]
- an aggregate or union type that includes one of the aforementioned
  types among its members [ we read through "s", which is of type
  "struct s1", but it does not contain a member of type "char" ]
  (including, recursively, a member of a subaggregate or contained
  union), or
- a character type. [ definitely not our case ]

We see that none of these conditions applies in our case.

Where is the flaw in my reasoning ???
Does the last "printf" line of this code snippet work or not ??? And
why ???


***** Question (6) *****

I often see this code used with socket programming:

struct sockaddr_in my_addr;
...
bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

The function bind(...) needs a pointer to "struct sockaddr", but
my_addr is a "struct sockaddr_in".
So, in my opinion, the function bind is not guaranteed to access the
contents of the object my_addr safely.

Does anyone know why this code is not broken (or whether it is) ???
 

Christian Bau

***** Question (2) *****

In the little program that follows, the line "printf("%d\n", *x);"
normally prints 123, but an optimizing compiler can print garbage
instead of 123. Is my reasoning correct ???

On the other hand, the line "printf("%d\n", p1->i);" always prints 999
as expected, right ???

----

#include <stdio.h>
#include <stdlib.h>

struct s1 { int i; double f; };


int main(void)
{
struct s1* p1;
int* x;

p1 = malloc(sizeof(*p1));
p1->i = 123; // object of type 'struct s1' contains 123

x = &(p1->i);

printf("%d\n", *x); // I try to access a value stored in an
                    // object of type 'struct s1'
                    // through *x, which is of type 'int'.
                    // I think this is not allowed by the standard!

*x = 999; // I store 999 in *x, which is of type 'int'

printf("%d\n", p1->i); // I access a value stored in *x, which is of
                       // type 'int', by *p1 (as p1->i is a shortcut
                       // for (*p1).i), which is of type 'struct s1'
                       // but contains a member of type 'int'.
                       // I think this is allowed by the standard.


return 0;
}

This is all ok. The only unusual thing with structs is that there can be
padding, and that storing into any struct member could modify any
padding in the struct. If there is padding between int i and double f,
then p1->i = 123 could modify the padding, while *x = 999 couldn't.

***** Question (3) *****

The Standard forbids (if I am not mistaken) a pointer of type
"struct A *" from accessing data written through a pointer of type
"struct B *", as they are different types.

This means that the common practice of faking inheritance in C, as in
this code snippet, is now utterly wrong, is that correct ???


--- myfile.c ---

#include <stdio.h>
#include <stdlib.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point { int x;
int y;
};

struct Color_Point { int x;
int y;
Color color;
};

struct Color_Point2{ struct Point point;
Color color;
};

int main(int argc, char* argv[])
{

struct Point* p;

struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;

p = (struct Point*)my_color_point;

This is undefined behavior. There is no guarantee that my_color_point is
correctly aligned for a pointer of type (struct Point *).
printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
// a "struct Color_Point" object using a "struct Point*" pointer is
// forbidden by the Standard ???

Yes. There is an exception: if the compiler has seen a declaration of a
union with members of type "struct Point" and "struct Color_Point", then
accessing the common initial members of both structs is legal, even
writing to a member of one struct and reading it as a member of the
other struct.

struct Color_Point2* my_color_point2 = malloc(sizeof(struct Color_Point2));
my_color_point2->point.x = 100;
my_color_point2->point.y = 200;
my_color_point2->color = RED;

p = (struct Point*)my_color_point2;

Yes, you can always cast a pointer to struct to a pointer of the first
member.
printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
// a "struct Color_Point2" object using a "struct Point*" pointer is
// forbidden by the Standard ???

That's fine.
p = &my_color_point2->point;

printf("x:%d, y:%d\n", p->x, p->y); // but this is correct, right ???


return 0;
}

Is the line "p = (struct Point*)my_color_point" also a case of what is
called "type-punning" ???


***** Question (4) *****

In the Standard, chapter 6.5.2.3, it is written:

One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the complete type
of the union is visible. Two structures share a common initial
sequence if corresponding members have compatible types (and, for
bit-fields, the same widths) for a sequence of one or more initial
members.

I find this statement completely obscure.

Let's have:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

A compiler is free to assume that *p1 and *p2 don't alias.

Exactly.
If we just put a union declaration like this before this code, then it
acts like a flag to the compiler, indicating that pointers to "struct
s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
to the same location.

union p1_p2_alias_flag { struct s1 st1;
struct s2 st2;
};

There is no need to use "union p1_p2_alias_flag" for accessing data,
and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
anywhere else. I mean, it is possible to access data using p1 and p2
directly.

Yes, that is right.

***** Question (5) *****

This question is really hard.

Let's have this code snippet:

---------
#include <stdio.h>

int main (void)
{

struct s1 {int i;
};

struct s1 s = {77};

unsigned char* x = (unsigned char*)&s;
printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
// Standard says data stored in "struct s1" type can be read by pointer
// to "char"

That is if sizeof (int) >= 4, which is nowhere guaranteed.

x[0] = 100; // here, I write data in "char" objects !!!
x[1] = 101;
x[2] = 102;
x[3] = 103;

printf("%d\n", s.i); // but data stored in "char" objects cannot be
// read by pointer to "struct s1" ???

Assuming that sizeof (int) == 4, you have changed every single bit in
the representation of s.i. If the representation is not a trap
representation, you are fine. And it would even be OK if, for example,
the result after storing three bytes, combined with the last remaining
byte of the number 77, were a trap representation, because you never
access that value.


return 0;
}



For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
(int)x[3]);", I can rewrite the Standard clause like this:

An object [ here, s of type "struct s1" ] shall have its stored value
accessed only by an lvalue expression that has one of the following
types:
[ blah blah blah ]
- a character type [ in our example, x[0], x[1], x[2], x[3] ].
  // This is our case, so everything is OK so far!

But what about the line "printf("%d\n", s.i);" ??????
I have read the Standard again and again, but I cannot explain how it
can work.

If the bytes stored are a valid representation of an int, then that is
what it prints. If not, it is undefined behavior. A specific compiler
might guarantee that ints have no trap representations.

***** Question (6) *****

I often see this code used with socket programming:

struct sockaddr_in my_addr;
...
bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

The function bind(...) needs a pointer to "struct sockaddr", but
my_addr is a "struct sockaddr_in".
So, in my opinion, the function bind is not guaranteed to access the
contents of the object my_addr safely.

Does anyone know why this code is not broken (or whether it is) ???

Depends on the declarations of the types involved. And remember that the
C Standard is not the only standard. For example, the C Standard doesn't
guarantee that 'a' + 1 == 'b', but if your C implementation uses ASCII
or Unicode for its character set, then the ASCII standard or the Unicode
standard would give you that guarantee.

In your case, it could be that POSIX guarantees that the code is
correct. So it will work on any implementation that conforms to the
POSIX standard (no matter whether it conforms to the C Standard or not),
even though it might not work on an implementation that conforms to the
C Standard but not to POSIX.
 

Jack Klein

I am trying to understand the strict aliasing rules in the C Standard.
Since gcc applies these rules by default, I want to be sure that I
understand this issue fully.

For questions (1), (2) and (3), I think that the answers are all "yes",
but I would be glad to have strong confirmation.

As for questions (4), (5) and (6), I really don't know. Please help!

--------

The Standard says (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf,
section 6.5, paragraph 7):

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
  the object,
- a type that is the signed or unsigned type corresponding to the
  effective type of the object,
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
  types among its members (including, recursively, a member of a
  subaggregate or contained union), or
- a character type.


***** Question (1) *****

Consider two structs with different tag names:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

The compiler is free to assume that p1 and p2 point to different memory
locations and don't alias, because two structs with different tags are
different types.

In the standard, we read the wording "effective type of the object"
many times.

This "effective type of the object" may be an "int", "double", etc, but
may also be a "struct" type, right ???

And I suppose it may also be an "array" type or a "union" type as
well, is that correct ???

Yes.

***** Question (2) *****

In the little program that follows, the line "printf("%d\n", *x);"
normally prints 123, but an optimizing compiler can print garbage
instead of 123.

No, an optimizing compiler must still output "123" for this line.

Is my reasoning correct ???

On the other hand, the line "printf("%d\n", p1->i);" always prints 999
as expected, right ???

----

#include <stdio.h>
#include <stdlib.h>

struct s1 { int i; double f; };


int main(void)
{
struct s1* p1;
int* x;

p1 = malloc(sizeof(*p1));
p1->i = 123; // object of type 'struct s1' contains 123

x = &(p1->i);

printf("%d\n", *x); // I try to access a value stored in an
                    // object of type 'struct s1'
                    // through *x, which is of type 'int'.
                    // I think this is not allowed by the standard!

The effective type of *p1 is 'struct s1'. The effective type of p1->i
is 'int'. 'x' is a pointer to int, and you have initialized it with a
pointer to an int. This is perfectly legal.

Since the int contains the value 123, and 'x' quite properly points to
that int, *x must retrieve the int value 123. It can't do anything
else.

*x = 999; // I store 999 in *x, which is of type 'int'

printf("%d\n", p1->i); // I access a value stored in *x, which is of
                       // type 'int', by *p1 (as p1->i is a shortcut
                       // for (*p1).i), which is of type 'struct s1'
                       // but contains a member of type 'int'.
                       // I think this is allowed by the standard.


return 0;
}


***** Question (3) *****

The Standard forbids (if I am not mistaken) a pointer of type
"struct A *" from accessing data written through a pointer of type
"struct B *", as they are different types.

This means that the common practice of faking inheritance in C, as in
this code snippet, is now utterly wrong, is that correct ???


--- myfile.c ---

#include <stdio.h>
#include <stdlib.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point { int x;
int y;
};

struct Color_Point { int x;
int y;
Color color;
};

struct Color_Point2{ struct Point point;
Color color;
};

int main(int argc, char* argv[])
{

struct Point* p;

struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;

p = (struct Point*)my_color_point;

printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in

This is undefined behavior, pure and simple. It works on many
implementations, but is not guaranteed at all.

[snip]
Is the line "p = (struct Point*)my_color_point" also a case of what is
called "type-punning" ???

Type punning is not a term defined by the standard, but I would say
that the act of assigning the pointer via a cast is not type punning.
Accessing a member of the foreign structure type through the pointer
is.

***** Question (4) *****

In the Standard, chapter 6.5.2.3, it is written:

One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the complete type
of the union is visible. Two structures share a common initial
sequence if corresponding members have compatible types (and, for
bit-fields, the same widths) for a sequence of one or more initial
members.

I find this statement completely obscure.

Let's have:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

A compiler is free to assume that *p1 and *p2 don't alias.

If we just put a union declaration like this before this code, then it
acts like a flag to the compiler, indicating that pointers to "struct
s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
to the same location.

union p1_p2_alias_flag { struct s1 st1;
struct s2 st2;
};

There is no need to use "union p1_p2_alias_flag" for accessing data,
and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
anywhere else. I mean, it is possible to access data using p1 and p2
directly.

It seems unlikely that a compiler could find a way to prevent it from
working in general, even if the implementer tried, but such behavior
would not render the compiler non-conforming.

On the other hand, since your structure only contains a single member,
and the first member always begins at the same address as the
structure itself, this particular usage can't fail.

Still, the behavior is undefined. Which means the language standard
places no requirements on it at all.
Do you agree, everybody ???


***** Question (5) *****

This question is really hard.

Let's have this code snippet:

---------
#include <stdio.h>

int main (void)
{

struct s1 {int i;
};

struct s1 s = {77};

unsigned char* x = (unsigned char*)&s;
printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
// Standard says data stored in "struct s1" type can be read by pointer
// to "char"

x[0] = 100; // here, I write data in "char" objects !!!
x[1] = 101;
x[2] = 102;
x[3] = 103;

The standard does not say that you can do this. You are assuming that
sizeof(int) is at least 4, and there are implementations where that is
not true. Accessing, let alone writing to, x[1], x[2], or x[3] might
be outside the bounds of the int and the struct, producing undefined
behavior.
printf("%d\n", s.i); // but data stored in "char" objects cannot be
// read by pointer to "struct s1" ???

return 0;
}

No, the point is that accessing s.i, an int, after storing data into
that memory using a different object type, is undefined. You might
have created a bit pattern that does not represent a valid value for
the int, called a trap representation.
-----------

For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
(int)x[3]);", I can rewrite the Standard clause like this:

An object [ here, s of type "struct s1" ] shall have its stored value
accessed only by an lvalue expression that has one of the following
types:
[ blah blah blah ]
- a character type [ in our example, x[0], x[1], x[2], x[3] ].
  // This is our case, so everything is OK so far!

I have worked on a platform where sizeof(int) is 1, and several where
sizeof(int) is 2. I have never worked on a platform where sizeof(int)
is 3, but C allows it. On any of these platforms you would be
invoking undefined behavior.
But what about the line "printf("%d\n", s.i);" ??????

Even assuming that sizeof(int) >= 4 on your implementation, you have
to understand that all types, other than unsigned char, can have trap
representations, that is bit patterns that do not represent a valid
value for the type. By writing arbitrary bit patterns into an int,
you may have created an invalid bit pattern in that int. When you
access that invalid bit pattern as an int, the behavior is undefined.
I have read the Standard again and again, but I cannot explain how it
can work. If I rewrite the Standard clause, it gives:

An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
stored value accessed only by an lvalue expression that has one of the
following types:
- a type compatible with the effective type of the object, [ this is
  not our case ]
- a qualified version of a type compatible with the effective type of
  the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to the
  effective type of the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object, [ still not
  our case ]
- an aggregate or union type that includes one of the aforementioned
  types among its members [ we read through "s", which is of type
  "struct s1", but it does not contain a member of type "char" ]
  (including, recursively, a member of a subaggregate or contained
  union), or
- a character type. [ definitely not our case ]

We see that none of these conditions applies in our case.

The standard provides a specific list of what is allowed. Lists like
this are always exhaustive. That means anything not on the list is
undefined.
Where is the flaw in my reasoning ???

There is no flaw in your reasoning, the code produces undefined
behavior.
Does the last "printf" line of this code snippet work or not ??? And
why ???

There is no question of "work". Whatever it does is just as right or
wrong as anything else that might happen as far as the language is
concerned. That's what undefined behavior means. The C standard does
not know or care what happens.

***** Question (6) *****

I often see this code used with socket programming:

struct sockaddr_in my_addr;
...
bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

The function bind(...) needs a pointer to "struct sockaddr", but
my_addr is a "struct sockaddr_in".
So, in my opinion, the function bind is not guaranteed to access the
contents of the object my_addr safely.

Does anyone know why this code is not broken (or whether it is) ???

That depends on the definition of 'struct sockaddr_in'. If its first
member is a 'struct sockaddr', the code is legal and well defined
because a pointer to a structure can always be converted to a pointer
to its first member. If not, then the code produces undefined
behavior if the called function actually uses the pointer to access
members of a 'struct sockaddr'.

You use terms like "broken" and "work", which do not really apply as
far as undefined behavior in C is concerned. They are subjective
terms at best. Code is "broken" if it does not do what you want, you
consider it to "work" if it does. If it produces undefined behavior,
it may "work" on one compiler but be "broken" on another, and both
compilers can be standard conforming.
 

S.Tobias

Christian Bau said:
[snip]
***** Question (5) *****

This question is really hard.

Let's have this code sniplet:

---------
#include <stdio.h>

int main (void)
{

struct s1 {int i;
};

struct s1 s = {77};

unsigned char* x = (unsigned char*)&s;
printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
// Standard says data stored in "struct s1" type can be read by pointer
// to "char"

That is if sizeof (int) >= 4, which is nowhere guaranteed.

x[0] = 100; // here, I write data in "char" objects !!!
x[1] = 101;
x[2] = 102;
x[3] = 103;
Let's suppose that we copy the value from another int:
int i = 42;
unsigned char *y = (unsigned char *)&i;
for (size_t k = 0; k < sizeof i; k++)
    x[k] = y[k];
Storing values through character lvalues did not change the effective
type of the struct or its member, therefore it's okay (the compiler must
re-read the value from memory).

The effective type of a declared object is always its declared type.
The effective type of an allocated object is the one last imprinted on
it by storing a value through a non-character lvalue, or by copying
(memcpy, memmove, or as an array of character type); if there is none,
it is the type of the lvalue it is accessed with.
Assuming that sizeof (int) == 4, you have changed every single bit in
the representation of s.i. If the representation is not a trap
representation, you are fine. And it would even be OK if, for example,
the result after storing three bytes, combined with the last remaining
byte of the number 77, were a trap representation, because you never
access that value.

(all agreed)

[snip]
For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
(int)x[3]);", I can rewrite the Standard clause like this:

An object [ here, s of type "struct s1" ] shall have its stored value
accessed only by an lvalue expression that has one of the following
types:
[ blah blah blah ]
- a character type [ in our example, x[0], x[1], x[2], x[3] ].
  // This is our case, so everything is OK so far!

But what about the line "printf("%d\n", s.i);" ??????
I have read the Standard again and again, but I cannot explain how it
can work.

It means this: a struct s1 object can be legally accessed with a character
lvalue (including writing data to the struct). Since it's legal,
the compiler must take it into consideration when later accessing
struct s1. Either it can prove that character lvalues did not refer
to the struct object, or it must re-read the struct value from memory.

This is not the case with other types:
assert(sizeof(int) == sizeof(short));
int i = 42;
short *ps = (short *)&i; //assume that alignment is the same
*ps = 54; //this access is UB; since it is not legal to access int object
//with short lvalue, compiler need not assume that object `i'
//was actually changed
printf("%d\n", i); //may print cached value 42
//(the Std says it can do or not do virtually anything)

For another example: when a value is stored through `short' lvalue,
the compiler need not assume that `struct s1' object was changed,
because `struct s1' does not contain a `short' member.
 

S.Tobias

Christian Bau said:
What's more important: `p1->i' and `p2->i' don't alias, even though they
have the same type!

However p1 and p2 _may_ point at the same object.
((char*)p1)[0] = 0;
At this point the compiler cannot blindly assume that `*p2' wasn't modified.

(As I said above, they may point to the same location.)

After the compiler sees the union declaration, it is obliged to assume
that `p1->i' and `p2->i' may refer to (alias) the same object.
(However, it still need not assume that expressions `*p1' and `*p2' alias
the same object, since they are incompatible types).
 

Dik T. Winter

With a caveat. It is free to assume that as long as nothing is assigned
to either p1 or p2.
> However p1 and p2 _may_ point at the same object.

In that case the compiler can not assume that *p1 and *p2 don't alias.
 

S.Tobias

Dik T. Winter said:
struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

A compiler is free to assume that *p1 and *p2 don't alias.
[snip]
However p1 and p2 _may_ point at the same object.

In that case the compiler can not assume that *p1 and *p2 don't alias.

I don't agree; otherwise aliasing rules would have no purpose.
Since `*p1' and `*p2' have incompatible types, the compiler may assume
(act as if) they don't refer to the same object, it doesn't have to prove
that both pointers don't point at the same location.
I believe that the compiler even needn't assume that these two alias
the same object:
*p1
*(struct s2 *)p1
The decision whether to alias or not to alias can be based on
the type of lvalue (mainly).

Can you give an example where `*p1' and `*p2' alias the same object
while the behaviour is defined? (...And where the aliasing is actually
relevant; e.g., `&*p1' and `&*p2' don't count.)
Perhaps reading from allocated and separately initialized object, but
this is not a situation when aliasing rules are very important.
 
Dik T. Winter

> Dik T. Winter said:
> > In article said:
> > > >> struct s1 {int i;};
> > > >> struct s2 {int i;};
> > > >>
> > > >> struct s1 *p1;
> > > >> struct s2 *p2;
> > > >>
> > > >> A compiler is free to assume that *p1 and *p2 don't alias. > [snip]
> > > However p1 and p2 _may_ point at the same object.
> >
> > In that case the compiler can not assume that *p1 and *p2 don't alias.
>
> I don't agree,

Sorry, I missed that p1 and p2 have different types. Indeed, p1 and p2
_may_ point at the same object, but the only way to let that happen is
by either undefined or implementation defined behaviour. So you were
right.
 
Thad Smith

Christian said:
--- myfile.c ---

#include <stdio.h>
#include <stdlib.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point { int x;
int y;
};

struct Color_Point { int x;
int y;
Color color;
};

struct Color_Point2{ struct Point point;
Color color;
};

int main(int argc, char* argv[])
{

struct Point* p;

struct Color_Point* my_color_point = malloc(sizeof(struct
Color_Point));
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;

p = (struct Point*)my_color_point;


This is undefined behavior. There is no guarantee that my_color_point is
correctly aligned for a pointer of type (struct Point *).

Doesn't the fact that the value of my_color_point was returned by malloc
guarantee correct alignment?

Thad
 
Christian Bau

Thad Smith said:
Doesn't the fact that the value of my_color_point was returned by malloc
guarantee correct alignment?

In this case, yes.

If you use

struct Color_Point* my_color_point = malloc(sizeof(struct
Color_Point) * 2);
++my_color_point;
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;

p = (struct Point*)my_color_point;

you get undefined behavior.
 
Old Wolf

Christian said:
struct Point { int x;
int y;
};

struct Color_Point { int x;
int y;
Color color;
};
int main(int argc, char* argv[])
{

struct Point* p;
p = (struct Point*)my_color_point;

This is undefined behavior. There is no guarantee that
my_color_point is correctly aligned for a pointer of type
(struct Point *).

I think all structs must have the same alignment requirements.
However there is UB because one struct might have different
padding to the other.
 
Tim Rentsch

Jack Klein said:
On 13 Oct 2005 07:39:48 -0700, (e-mail address removed) wrote in
comp.lang.c: [snip]
***** Question (4) *****

In the Standard, chapter 6.5.2.3, it is written:

One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial part
of any of them anywhere that a declaration of the complete type of the
union is visible. Two structures share a common initial sequence if
corresponding members have compatible types (and, for bit-fields, the
same widths) for a sequence of one or more initial members.

I find this statement completely obscure.

Let's have:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

A compiler is free to assume that *p1 and *p2 don't alias.

If we just put a union declaration like this before this code, then it
acts like a flag to the compiler, indicating that pointers to "struct
s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
to the same location.

union p1_p2_alias_flag { struct s1 st1;
struct s2 st2;
};

There is no need to use "union p1_p2_alias_flag" for accessing data,
and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
anywhere else.
I mean, it is possible to access data using directly p1 and p2.

It seems unlikely that a compiler could find a way to prevent it from
working in general, even if the implementer tried, but such behavior
would not render the compiler non-conforming.

On the other hand, since your structure only contains a single member,
and the first member always begins at the same address as the
structure itself, this particular usage can't fail.

Still, the behavior is undefined. Which means the language standard
places no requirements on it at all.

It isn't clear what behavior you think is undefined, since
what is supposed to be executed is stated only approximately.
However, let's consider a particular example:

struct s1 {int i; int j;};
struct s2 {int x; int y;};

union p1_p2_alias_flag {
struct s1 st1;
struct s2 st2;
};

int
affected_function( struct s1 *p1, struct s2 *p2 ){
p1->j = 3;
p2->y = 4;
return p1->j;
}


There is no undefined behavior in 'affected_function'.
Moreover, there are legal calls to the function that must
return '4' as a value.

Of course, it is possible to choose argument values (such as
NULL) for calls to the function that result in undefined
behavior; but the function must work for the legal cases
when the two pointers point to the same address. And I
think that's what the OP was asking about.
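Whether compilers actually honour the pointer-based calls Tim describes is a separate, practical question: GCC's manual (under -fstrict-aliasing) only promises that type punning works when the memory is accessed through the union type. A sketch of the same three accesses done through the union object itself, relying only on the common-initial-sequence guarantee of 6.5.2.3 quoted above (the function name through_union is hypothetical):

```c
struct s1 {int i; int j;};
struct s2 {int x; int y;};

union p1_p2_alias_flag {
    struct s1 st1;
    struct s2 st2;
};

/* Same stores as affected_function, but every access goes through the
   union object u. j and y are corresponding int members of the common
   initial sequence, so reading u.st1.j after storing u.st2.y is covered
   by 6.5.2.3 and yields the value stored last. */
int through_union(void)
{
    union p1_p2_alias_flag u;
    u.st1.i = 1;
    u.st1.j = 3;
    u.st2.y = 4;        /* store via the st2 view */
    return u.st1.j;     /* inspect via the st1 view: 4 */
}
```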
 
nicolas.riesch

Thank you very much, all of you, for having taken the time to answer my
quite confused questions.
I understand now that my interpretation of the standard was totally
wrong.

For those who will have problems with these aliasing rules and will
read this thread, this is my final interpretation of the standard.
I hope I have made no mistake this time ( but if I have, tell me ).

The Standard says (
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf chapter 6.5
):

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
the object,
- a type that is the signed or unsigned type corresponding to the
effective type of the object,
- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
types among its members
(including, recursively, a member of a subaggregate or contained
union), or
- a character type.

The wording of these rules is not very clear, but here is my attempt
at an explanation.

Vocabulary:
an object is a memory location.
an aggregate is a struct or an array.
a character type can be "char", "signed char", or "unsigned char".


Let's have this code:

struct s1 {int i; double d;};
struct s2 {int i; double d;};
// struct s1 and struct s2 are different types, because their tag
// names s1 and s2 are different.

int *pi;
struct s1 *p1;
struct s1 *p1_a;
struct s2 *p2;


1) The "objects" which are mentioned in the Standard are really just
memory locations.
So far, there is NO NOTION OF POINTERS at all.
Pointers are just a means of accessing the objects, no more.
You just take a sheet of paper ( which represents your computer's
memory ), and draw rectangles symbolizing all the objects you work with
in your code.
Let's suppose that in our computer memory, we have one location
containing an int, three instances of struct s1 and two of struct s2.


You should obtain something like this ( rectangles are represented
here by pairs of brackets [...] ) :

[int]
[ struct s1 [int] [double] ]
[ struct s1 [int] [double] ]
[ struct s1 [int] [double] ]
[ struct s2 [int] [double] ]
[ struct s2 [int] [double] ]

So far, we have a visual representation of all the objects we work
with.
We have 16 objects on our paper:
- one "int" object
- three "struct s1" objects
- each struct s1 object contains an "int" object
- each struct s1 object contains a "double" object
- two "struct s2" objects
- each struct s2 object contains an "int" object
- each struct s2 object contains a "double" object

For accessing these objects, we use these pointers in our code:

pi, p1, p1_a, p2

Our task now is to find, for each object (=location), which
pointers may access it.


2) I take the visual representation above and label the objects (obj1),
(obj2), ... so that I can explain more easily.

[int (obj1)]
[ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
[ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
[ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
[ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
[ struct s2 (obj14) [int (obj15)] [double (obj16)] ]

Now, let's take each location one after another and see which
pointers may also access them.

The object (=memory location) obj1 is of type "int".
It can be accessed (= read or modified) by *pi, which is an lvalue of
type "int".
It can also be accessed by p1->i, which is a shortcut for (*p1).i,
and *p1 is of type "struct s1 containing "int" as a member".
It can similarly be accessed by p1_a->i and p2->i.

The object obj2 is of type "struct s1".
It can be accessed by *p1_a which is also of type "struct s1".
It cannot be accessed by *pi which is of type "int".
It cannot be accessed by *p2 which is of type "struct s2".

The object obj3 is of type "int".
It can be accessed by *pi which is of type "int".
It can be accessed by p1->i, which is a shortcut for (*p1).i, and
*p1 is of type "struct s1 containing "int" as a member".
The same way, it can be accessed by p1_a->i.
But it cannot be accessed by p2->i, which is a shortcut for (*p2).i,
because *p2 is of type "struct s2 containing "int" as a member".
It is not explicitly mentioned in the standard, but if access is
done through a struct, its type must match the type of the container of
the object we want to access.

We can do similar analysis for all the remaining locations obj4,
obj5 ...

Just one word about my misunderstanding of the Standard as I first read
it.
At first, I tried to find directly whether two pointers may alias, but
that was the wrong approach and led to a dead end.
I understand now that it is easier to think first about MEMORY
LOCATIONS (=objects), AND ONLY THEN think about which pointers may
access this location, by seeing if they comply with the rules of the
Standard, as I just did hereabove.
This gives for each location a set of pointers that may access it, and
the compiler considers each of these sets as pointers that may alias
and access the same object.
This way, the Standard becomes more readable and logical.

In practice, the problem is often not to do this thorough analysis for
each object in memory.
It is more of the kind "I work with this object, can I access it with
this pointer ? and can I also access it with this other pointer ?".
"In particular, if I write data in this object using this pointer, can
this other pointer read these data ?"


*** about type-punning ***

double d = 1.234;
int* i = &d;

printf("%d\n", *i); // WRONG

1.234 is stored in an object of type "double".
We try to access it through *i, which is of type "int".
The result is undefined.

If you want to inspect the content of d, you can do this (reading
through unsigned char is safe, since unsigned char has no trap
representations):
unsigned char* c = (unsigned char*)&d;
and you can access the data with c[0], c[1], ..., c[sizeof d - 1]
(that is c[0] to c[3] only if a double is 4 bytes long; on common
systems it is 8).
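A small sketch of that byte-wise inspection, written so it does not assume any particular size for double (the function names are hypothetical). Every byte is read through an unsigned char lvalue, which the last item of the 6.5 list allows for an object of any type; memcpy is used for the way back.

```c
#include <stddef.h>
#include <string.h>

/* Copy the object representation of d into out[], reading each byte
   through an unsigned char lvalue (allowed for an object of any type). */
void double_bytes(double d, unsigned char out[sizeof (double)])
{
    const unsigned char *c = (const unsigned char *)&d;
    for (size_t k = 0; k < sizeof d; k++)
        out[k] = c[k];
}

/* Rebuild a double from such bytes; memcpy also copies byte-wise, so
   the result has the same representation and compares equal. */
double double_from_bytes(const unsigned char in[sizeof (double)])
{
    double d;
    memcpy(&d, in, sizeof d);
    return d;
}
```

The actual byte values printed for 1.234 depend on the floating-point format and byte order, so they are not shown here.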


*** about pointer to char ***

Besides, don't forget that as the Standard rule says, a pointer to char
can access any object of any type !
When the location referenced by a pointer to char is updated, the
compiler must assume that any data stored in any type may have been
modified.
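As a concrete sketch of what this means for the optimizer (the function is hypothetical): after the bytes of an int are overwritten through an unsigned char lvalue, the compiler may not reuse the value it just stored through the int lvalue, and must re-read the object.

```c
#include <stddef.h>

/* Hypothetical example: store 7 through an int lvalue, then zero every
   byte of the same object through unsigned char lvalues. Character-type
   lvalues may access any object, so the final read must see the byte
   writes; all-bits-zero represents the value 0 for int. */
int clear_then_read(int *pi)
{
    *pi = 7;
    unsigned char *c = (unsigned char *)pi;
    for (size_t k = 0; k < sizeof *pi; k++)
        c[k] = 0;
    return *pi;   /* cannot be folded to 7 */
}
```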

But don't think that this kind of code allows you to bypass the
aliasing rules:

struct A *a;
struct B *b;

b = (struct B*)(char*)a;

This won't make "*b" able to access data in "struct A", because "*b" is
of type "struct B".
It is the type of the dereferenced pointer that matters. The
intermediate casting to "char*" is thus totally useless and won't give
"b" more access possibilities.


*** about inheritance ***

struct Point { int x;
int y;
};


struct Color_Point { int x;
int y;
Color color;
};


struct Color_Point2{ struct Point point;
Color color;
};

struct Point* p;
struct Color_Point* my_color_point;
struct Color_Point2* my_color_point2;

my_color_point = malloc(sizeof(struct Color_Point));
my_color_point2 = malloc(sizeof(struct Color_Point2));

p = (struct Point*)my_color_point; // WRONG
// *p, which is of type "struct Point", cannot access data stored at
// location *my_color_point, which is an object of type "struct
// Color_Point".

p = &my_color_point2->point; // GOOD
// *p, which is of type "struct Point", can access data stored at
// location (*my_color_point2).point, which is also of type "struct
// Point".

p = (struct Point*)my_color_point2; // GOOD
// *p, which is of type "struct Point", can access data stored at
// location (*my_color_point2).point, which is also of type "struct
// Point".
// We see that in fact, this is exactly the same case as the
// previous one!
// C guarantees that a pointer to a struct, converted to a pointer to
// the type of its first member, points to that first member object.
// Just notice that this guarantee is about the conversion, and that
// the fact that we can access data stored in an object is granted to
// us by the aliasing rules, exactly as in the previous example.
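A self-contained sketch of the two GOOD cases above, under the assumption that the object is allocated and then written through its own members (the function name is hypothetical). Both ways of getting a struct Point * designate the same first-member subobject, which is what 6.7.2.1 of n1124 promises for a suitably converted pointer to a structure.

```c
#include <stdlib.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point { int x; int y; };
struct Color_Point2 { struct Point point; Color color; };

/* Returns 1 if both pointers designate the first member and read back
   the stored values, 0 if not, -1 if malloc fails. */
int first_member_demo(void)
{
    struct Color_Point2 *cp = malloc(sizeof *cp);   /* no cast needed */
    if (cp == NULL)
        return -1;
    cp->point.x = 10;
    cp->point.y = 20;
    cp->color = GREEN;

    struct Point *a = &cp->point;           /* no cast at all */
    struct Point *b = (struct Point *)cp;   /* cast to first member type */

    int ok = (a == b) && b->x == 10 && b->y == 20;
    free(cp);
    return ok;
}
```

In both cases the lvalues *a and *b have type struct Point and the object they designate is a struct Point, so the aliasing rules are satisfied without any special case.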


*** final word ***

When working with pointers, there seems to be no need to cast pointers.
( I don't speak here of casting objects, like casting a "double" to an
"int" for instance, which is of course allowed.
It is casting pointers, like "double*" to "int*" or "struct s1*" to
"struct s2*" which is dangerous. )
In fact, every time a pointer is cast to point to a different type, the
alias rules interfere and lead to undefined behaviour.
So, to avoid any aliasing problem, the best way seems never to cast
pointers, with these two exceptions:

a) cast a pointer to char*, so that it can access the byte
representation of the object ( cast to unsigned char* is best ), as
allowed by the aliasing rules.

b) cast of a pointer to struct to a pointer to its first member type,
like in the last example "p = (struct Point*)my_color_point2;".
( but this one is not really necessary, as we can just take the
address of the first member as in the example
"p = &my_color_point2->point;", which avoids a cast ).


As for pointers to void, such as those returned by malloc, there is no
need to cast them, as pointers to void may be assigned to and from
pointers to any type.
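For example, a minimal sketch (the helper make_point is hypothetical) where the void * returned by malloc is converted implicitly on assignment, and `sizeof *p' keeps the allocation size tied to the pointer's type:

```c
#include <stdlib.h>

struct Point { int x; int y; };

/* Hypothetical helper: malloc's void * result converts implicitly to
   struct Point *, so no cast is written. Returns NULL on failure. */
struct Point *make_point(int x, int y)
{
    struct Point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```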


Any suggestion about something I could have missed or misunderstood ?


Best regards
 
Netocrat

On Wed, 19 Oct 2005 00:17:33 -0700, nicolas.riesch wrote:

A few corrections but generally what you wrote was accurate.
struct s1 {int i; double d;};
struct s2 {int i; double d;};
// struct s1 and struct s2 are different types, because their tag
// names s1 and s2 are different.

They are of the same "effective type".

....
struct s2 *p2; ....
[int (obj1)]
[ struct s1 (obj2) [int (obj3)] [double (obj4)] ] ....
The object obj2 is of type "struct s1". ....
It cannot be accessed by *p2 which is of type "struct s2".

Actually it can be, since s2 has "a type compatible with the effective
type of" s1.
The object obj3 is of type "int". ....
But it cannot be accessed by p2->i, which is a shorcut for (*p2).i,
because *p2 is of type "struct s2 containing "int" as a member"."

Same applies here.

[...]
*** about inheritance ***

struct Point { int x;
int y;
};


struct Color_Point { int x;
int y;
Color color;
};


struct Color_Point2{ struct Point point;
Color color;
};

struct Point* p;
struct Color_Point* my_color_point;
struct Color_Point2* my_color_point2;

my_color_point = malloc(sizeof(struct Color_Point));
my_color_point2 = malloc(sizeof(struct Color_Point2));

Your analysis being based on malloc'd memory is flawed - malloc'd memory
is properly aligned for any object and until it is written to, it has no
effective type. So let's assume instead that you'd caused the pointers to
reference static objects of the type that they point to or that your code
has written such an object into the malloc'd memory to establish its
effective type.
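To illustrate the point about allocated memory, here is a sketch under the C99 6.5p6 reading (the function name is hypothetical): storage from malloc has no declared type, a store through a non-character lvalue gives it that lvalue's type as its effective type for later reads, and a later store of a different type changes the effective type again.

```c
#include <stdlib.h>

/* Hypothetical example: reuse one allocated block first as an int, then
   as a double. Each read uses the same type as the most recent store,
   so every access matches the effective type at that point. */
double reuse_storage(void)
{
    size_t n = sizeof (double) > sizeof (int) ? sizeof (double) : sizeof (int);
    void *mem = malloc(n);
    if (mem == NULL)
        return -1.0;

    int *pi = mem;       /* void * converts implicitly */
    *pi = 42;            /* effective type is now int */
    int i = *pi;         /* read through int: fine */

    double *pd = mem;
    *pd = i + 0.5;       /* effective type becomes double */
    double r = *pd;      /* read through double: fine */

    free(mem);
    return r;
}
```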
p = (struct Point*)my_color_point; // WRONG
// *p, which is of type "struct Point", cannot access data stored at
// location *my_color_point, which is an object of type "struct Color_Point".

But this code is not accessing data, it's setting a pointer. Since struct
Color_Point's initial elements are those of struct Point in the same
order, it can't have stricter alignment requirements. There's nothing
wrong with the code.

With the above assumption that my_color_point points to an object with the
effective type struct Color_Point, it is "wrong" to try to access p->y,
but not to access p->x. This is because the first member of a structure
is always at the same (initial, unpadded) location, but the second member
may be preceded by an arbitrary amount of padding. In practice it's
unlikely that a compiler would precede y with different amounts of
padding in each structure type, but in theory it's possible.

[...]
the best way seems never to cast pointers, with these two exceptions:
[to char* so as to access an object's bytes; to a pointer type compatible
with the first member(s) of a struct]

That's good practice (there are occasional other exceptions).

[...]
 
S.Tobias

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
the object,
- a type that is the signed or unsigned type corresponding to the
effective type of the object,
- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
types among its members
(including, recursively, a member of a subaggregate or contained
union), or
This one means that an object of type int may be accessed through
a (bigger) struct type that contains an int member.
struct s { double d; int i; };
void f(int *pi, struct s *ps)
{
*pi;
*ps = /*...*/;
*pi; /* must be re-read from memory */
}
- a character type.

Yes, but remember that this is not the whole story. They are type-based
aliasing rules, and there are other (expression-based) rules, too.

For example:
struct sx { int x; } *px;
struct sy { int y; } *py;
void *pv = malloc(...);
px = pv; py = pv;
*px = ...;
py->y; //BAD, object does not have `y' member

Example 2:
int ai[2][2];
ai[0][2]; //BAD, this is not the same as ai[1][1]
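Tobias's `struct s' example earlier in this post can be made concrete with a small sketch (the name reread_demo and the stored values are hypothetical): because struct s has an int member, a store through a `struct s' lvalue may modify the int that *pi designates, so the compiler must re-read *pi instead of reusing the earlier value.

```c
struct s { double d; int i; };

/* Returns how much *pi changed across the struct store. */
int reread_demo(int *pi, struct s *ps)
{
    int before = *pi;               /* first read */
    *ps = (struct s){ 2.0, 99 };    /* may overwrite the int *pi designates */
    int after = *pi;                /* must be re-read from memory */
    return after - before;
}
```

Called as reread_demo(&obj.i, &obj) with obj holding i == 7, the function returns 92; with unrelated objects it returns 0.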

....
struct s1 {int i; double d;};
struct s2 {int i; double d;};
// struct s1 and struct s2 are different types, because their tag
// names s1 and s2 are different.

int *pi;
struct s1 *p1;
struct s1 *p1_a;
struct s2 *p2;

For accessing these objects, we use these pointers in our code:
pi, p1, p1_a, p2

Our work will be now to find for each object (=location) which
pointers may access it.
You've latched onto bad terminology. What matters is the type of the
lvalue. An lvalue is like a window through which you access an object;
a pointer is like an arrow. You don't access objects with pointers.
Pointers may merely be parts of expressions that eventually may
be lvalues. Objects are not locations, but memory ranges.

[int (obj1)]
[ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
[ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
[ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
[ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
[ struct s2 (obj14) [int (obj15)] [double (obj16)] ]

Now, let's take each location one after another and see which
pointers may also access them.

The object (=memory location) obj1 is of type "int".
It can be accessed (= read or modified) by *pi, which is an lvalue of
type "int".
Yes.
It can also be accessed by p1->i, which is a shortcut for (*p1).i,
and *p1 is of type "struct s1 containing "int" as a member".
No, `obj1' doesn't have member `i' (in fact, it's not a struct at all).
It can similarly be accessed by p1_a->i and p2->i.
Idem.


The object obj2 is of type "struct s1".
It can be accessed by *p1_a which is also of type "struct s1".
Right.
It cannot be accessed by *pi which is of type "int".
It can. One of its members (obj3) is of `int' type, so that one can
be accessed, which means that the containing object can be accessed
as well (when you access a member, you access the whole object, too).
It cannot be accessed by *p2 which is of type "struct s2".
Indeed, it can't.
The object obj3 is of type "int".
It can be accessed by *pi which is of type "int".
It can be accessed by p1->i, which is a shortcut for (*p1).i, and
*p1 is of type "struct s1 containing "int" as a member".
The same way, it can be accessed by p1_a->i.
Yes. Moreover, it can be accessed with an expression `*p1' (IOW, that
expression may read value, or change the subobject), provided that
`p1' points at the right location (obj2).
But it cannot be accessed by p2->i, which is a shortcut for (*p2).i,
because *p2 is of type "struct s2 containing "int" as a member".
[assuming that `p2' may point to obj2]
No, this is because `struct s1' (which is the type of obj2) does not have
`s2::i' member (sorry for C++ notation; struct members have their own
namespace for each struct type).
It is not explicitly mentioned in the standard, but if access is
done through a struct, its type must match the type of the container of
the object we want to access.
It is mentioned at the member access operators. If it weren't, nobody
would argue this.

Just one word about my misunderstanding of the Standard as I first read
it.
At first, I tried to find directly whether two pointers may alias, but
that was the wrong approach and led to a dead end.
Again: pointers don't alias, lvalues may...
I understand now that it is easier to think first about MEMORY
LOCATIONS (=objects), AND ONLY THEN think about which pointers may
access this location, by seeing if they comply with the rules of the
Standard, as I just did hereabove.
Pointers may or may not point to locations, which is covered by
different rules.
This gives for each location a set of pointers that may access it, and
the compiler considers each of these sets as pointers that may alias
and access the same object.
This way, the Standard becomes more readable and logical.

In practice, the problem is often not to do this thorough analysis for
each object in memory.
It is more of the kind "I work with this object, can I access it with
this pointer ? and can I also access it with this other pointer ?".
"In particular, if I write data in this object using this pointer, can
this other pointer read these data ?"
Again: what matters is the EXPRESSION, not pointers that may be
one of its components.
*** about type-punning ***

double d = 1.234;
int* i = &d;
The last one is suspicious.

printf("%d\n", *i); // WRONG
Right. Technically, it's UB.
unsigned char* c = (unsigned char*)&d;
and you can access the data with c[0], c[1], c[2] and c[3].
Right.


*** about pointer to char ***

Besides, don't forget that as the Standard rule says, a pointer to char
can access any object of any type !
Pointer to character type may *point to* any object (of any type).
So can pointer to void.
When the location referenced by a pointer to char is updated, the
compiler must assume that any data stored in any type may have been
modified.
No, when an object is modified through an *lvalue of character type*,
then compiler must assume anything might have been modified (unless
it can prove otherwise).
But don't think that this kind of code allows you to bypass the
aliasing rules:

struct A *a;
struct B *b;

b = (struct B*)(char*)a;
The struct cast is suspicious.

This won't make "*b" able to access data in "struct A", because "*b" is
of type "struct B".
It is the type of the dereferenced pointer that matters.
More-or-less, yes.

*** final word ***

When working with pointers, there seems to be no need to cast pointers.
( I don't speak here of casting objects, like casting a "double" to an
"int" for instance, which is of course allowed.
It is casting pointers, like "double*" to "int*" or "struct s1*" to
"struct s2*" which is dangerous. )
No, casting is sometimes necessary (where there's no implicit conversion),
and is always safe where conversion is well defined.
In fact, every time a pointer is cast to point to a different type, the
alias rules interfere and lead to undefined behaviour.

No, aliasing rules have to do with lvalues. Period. End of story.

Pointers (in the way you talk about them) are subject to conversion rules.
 
Netocrat

On Wed, 19 Oct 2005 00:17:33 -0700, nicolas.riesch wrote:

A few corrections but generally what you wrote was accurate.


They are of the same "effective type".

OK my reading of the standard was incomplete - they're not of the same
effective type after all. Your original statement and the follow-ons that
I mistakenly corrected stand.

[...]
But this code is not accessing data, it's setting a pointer. Since struct
Color_Point's initial elements are those of struct Point in the same
order, it can't have stricter alignment requirements. There's nothing
wrong with the code.

With the above assumption that my_color_point points to an object with the
effective type struct Color_Point, it is "wrong" to try to access p->y,
but not to access p->x.

....but in the context of aliasing, yes, it's not guaranteed that you will
get the expected value when reading p->x.
 
nicolas.riesch

S.Tobias said:
[int (obj1)]
[ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
[ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
[ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
[ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
[ struct s2 (obj14) [int (obj15)] [double (obj16)] ]
The object (=memory location) obj1 is of type "int".
It can be accessed (= read or modified) by *pi, which is a lvalue of
type "int".
It can also be accessed by p1->i, which is a shortcut for (*p1).i,
and *p1 is of type "struct s1 containing "int" as a member".
No, `obj1' doesn't have member `i' (in fact, it's not a struct at all).

I was having this example in mind, in fact:

#include <stdio.h>

struct s1 {int i; double d;};

int main(void)
{
    struct s1 mys1;

    struct s1* p1 = &mys1;
    int* pi = &mys1.i;

    (*p1).i = 123;

    printf("%d\n", *pi); // we read here the value of *pi, which is of
                         // type "int", which has been written in the
                         // previous line by using *p1, which is of
                         // type "struct s1"
    return 0;
}


Again: pointers don't alias, lvalues may...

Absolutely, I must never forget that.

Pointer to character type may *point to* any object (of any type).
So can pointer to void.

Yes, pointer to void can *point to* any object, but it cannot be
dereferenced, so it cannot *access* it.

No, when an object is modified through an *lvalue of character type*,
then compiler must assume anything might have been modified (unless
it can prove otherwise).

Expressing it this way is better, yes.


And thank you very much for your comment.
I still need to read it carefully until I am sure I understand
everything.
 
Netocrat

(e-mail address removed) wrote: [...]
It is not explicitly mentioned in the standard, but if access is
done through a struct, its type must match the type of the container of
the object we want to access.
It is mentioned at the member access operators. If it weren't, nobody
would argue this.

Is this an area where the draft and final version differ? I see no
mention of it in N869's "6.5.2.3 Structure and union members" which is the
section to which I presume you're referring.

[...]
 
nicolas.riesch

(e-mail address removed) wrote: [...]
It is not explicitly mentioned in the standard, but if access is
done through a struct, its type must match the type of the container of
the object we want to access.
It is mentioned at the member access operators. If it weren't, nobody
would argue this.

I see my explanation was unclear.
Here is the reason why obj3 cannot be accessed by p2->i :

p2->i is a shortcut for (*p2).i, and *p2 is of type "struct s2
containing "int" as a member".

So far, one could think that *p2 could access obj3, as no rule seems to
forbid it.
But the Standard doesn't say that an lvalue complying with these rules
CAN access the object (it only MAY, and sometimes, it even CANNOT for
other reasons).
Here, the answer is that obj3 is included in a "struct s1", which is in
a different location from any "struct s2" object because they are
different types.
So, a pointer to any "struct s2" OR TO ANY OF ITS MEMBERS cannot access
any location of a "struct s1".
 
S.Tobias

Netocrat said:
(e-mail address removed) wrote: [...]
It is not explicitly mentioned in the standard, but if access is
done through a struct, its type must match the type of the container of
the object we want to access.
It is mentioned at the member access operators. If it weren't, nobody
would argue this.

Is this an area where the draft and final version differ?
In the relevant parts - no. (In p.5 the first sentence has been dropped,
and the rest differs by one letter.)
I see no
mention of it in N869's "6.5.2.3 Structure and union members" which is the
section to which I presume you're referring.
My bad, sorry. It's not explicitly mentioned, but can be derived.
Pp. 3 and 4 refer to a "member of a structure or union object"; it means
the operator (and behaviour) is defined iff the _object_ has the specified
member. (However, I decline to explain what exactly it should be; I think
the Std means the effective type of the object; it's one of the questions
on my list to c.s.c.)

Anyway, if the Std text is not enough, then at least Example 3 shows
the intention; if it were allowed (in the example) to access `t1::m'
with `p2->m' (or vv.), then the second part of the example would
be moot, as well as the "special guarantee" of p. 5 would.
 
