[union] Are pointers to inherited structs valid?

  • Thread starter Maciej Labanowicz
Maciej Labanowicz

Hi,

Please analyze the following example:

/*--[beg:test.c]-------------------------------------------------*/
01:
02: #include <stdio.h> /* printf */
03: #include <stdlib.h> /* EXIT_SUCCESS */
04:
05: struct a_s { int x; };
06: struct b_s { struct a_s super; int y; };
07: struct c_s { struct b_s super; int z; };
08:
09: union common_u {
10: struct a_s * ptr_a;
11: struct b_s * ptr_b;
12: struct c_s * ptr_c;
13: };
14:
15: int main(void)
16: {
17: struct c_s c;
18: union common_u common;
19:
20: ((struct a_s *)(&c))->x = 5;
21: ((struct b_s *)(&c))->y = 6;
22: c.z = 7;
23:
24: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
25:
26: common.ptr_c = &c;
27: common.ptr_c->z += 10;
28:
29: common.ptr_a->x += 20;
30: common.ptr_b->y += 30;
31:
32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
33:
34: return EXIT_SUCCESS;
35: }
/*--[eof:test.c]-------------------------------------------------*/

/*--[beg:output]-------------------------------------------------*/
01: x=5,y=6,z=7
02: x=25,y=36,z=17
/*--[eof:output]-------------------------------------------------*/

These structs implement inheritance of members:

a_s
|
+b_s
|
+c_s

So the casts on lines 20 and 21 are valid in C.
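
A small sketch of what that validity rests on, reusing the struct definitions from test.c. The function name initial_member_addresses_agree is made up for this illustration. It relies on the rule that a pointer to a structure object, suitably converted, points to its initial member.

```c
#include <assert.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Hypothetical helper: returns 1 when converting a 'struct c_s *' to the
 * type of each nested initial member gives that member's address. */
int initial_member_addresses_agree(struct c_s *pc)
{
    assert(pc != 0);
    return (struct b_s *) pc == &pc->super
        && (struct a_s *) pc == &pc->super.super;
}
```

This only covers the explicit casts; it says nothing yet about reading the pointer through a different union member.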

'union common_u' contains pointers to all of those structs.

Line 26 assigns the address of 'c' (the leaf of the tree) to the
union member ptr_c.

So the 'common.ptr_c' pointer is valid (line 27 is correct).

The question is:
Are the remaining union pointers valid (ANSI C / ISO C89)?
That is, common.ptr_a and common.ptr_b (lines 29 and 30).

Best Regards
 
Barry Schwarz

Assuming N1570 is still current in this area, look at footnote 95.

BTW, in the real world this code justifies terminating employment.
 
Shao Miller

[...]
04:
05: struct a_s { int x; };
06: struct b_s { struct a_s super; int y; };
07: struct c_s { struct b_s super; int z; };
08:
09: union common_u {
10: struct a_s * ptr_a;
11: struct b_s * ptr_b;
12: struct c_s * ptr_c;
13: };
14:
15: int main(void)
16: {
17: struct c_s c;
18: union common_u common;
[...]
26: common.ptr_c = &c;
27: common.ptr_c->z += 10;
28:
29: common.ptr_a->x += 20;
30: common.ptr_b->y += 30;
31:
32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
33:
34: return EXIT_SUCCESS;
35: }
[...]


The question is:
Are the remaining union pointers valid (ANSI C / ISO C89)?
That is, common.ptr_a and common.ptr_b (lines 29 and 30).

You appear to be type-punning the value of the 'ptr_c' member as a
'struct a_s *' on line 29 and as a 'struct b_s *' on line 30.

In C89, we can see this:

"A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type. Similarly, pointers to
qualified or unqualified versions of compatible types shall have the
same representation and alignment requirements. Pointers to other
types need not have the same representation or alignment requirements."

so my answer to your question would be "no". However in practice, it's
probably always going to work. In C99, we can see this:

"A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type.39) Similarly, pointers to
qualified or unqualified versions of compatible types shall have the
same representation and alignment requirements. All pointers to
structure types shall have the same representation and alignment
requirements as each other. All pointers to union types shall have the
same representation and alignment requirements as each other. Pointers
to other types need not have the same representation or alignment
requirements.

39) The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions."

But the implementation's actual pointer representation could be
complicated and so there's still no guarantee. If you can dream up a
pointer representation, then you can dream up a counter-example to your
code's portability.

- Shao Miller
 
Tim Rentsch

Barry Schwarz said:
On Tue, 1 Jan 2013 03:45:48 -0800 (PST), Maciej Labanowicz


Assuming N1570 is still current in this area, look at footnote 95.

It isn't just that the union member access needs to get the right
bytes -- it is also important that the representations of the
different members agree. That agreement holds under C99 and C11,
but not under C89/C90.
 
Tim Rentsch

Maciej Labanowicz said:
The question is:
Are the remaining union pointers valid (ANSI C / ISO C89)?
That is, common.ptr_a and common.ptr_b (lines 29 and 30).

As a practical matter it should work. Strictly speaking it is
not guaranteed under C89/C90/C95, though it is under C99 and the
current standard, C11.

However, even though you can (most probably) get away with this
approach, code like this should raise a BIG RED FLAG whenever you
see it, especially if you are the one writing it. What you want
to do can easily be done in a way that's completely type safe
(ie, without using either casts or void *), as the printf() call
shows. Why use casting or type punning when not absolutely
necessary? Is there something else about what you're trying to
do that makes a cast-free approach unattractive? If there is,
you probably should ask about that, because it's likely a
different approach would reduce or eliminate that shortcoming,
and give an overall better result.
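
For illustration only, here is a minimal sketch of the cast-free style suggested above, reusing the struct definitions from the original post. The helper names bump_a, bump_b and update_all are invented for this example; each pointer to a base part is obtained by member access, so no cast, union or void * is involved.

```c
#include <assert.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Hypothetical helpers that each work on one level of the hierarchy. */
static void bump_a(struct a_s *a, int by) { a->x += by; }
static void bump_b(struct b_s *b, int by) { b->y += by; }

/* Applies the same updates as lines 27-30 of test.c and returns x + y + z. */
int update_all(struct c_s *c)
{
    assert(c != 0);
    bump_a(&c->super.super, 20);  /* a 'struct a_s *' via member access */
    bump_b(&c->super, 30);        /* a 'struct b_s *' via member access */
    c->z += 10;
    return c->super.super.x + c->super.y + c->z;
}
```

Starting from x=5, y=6, z=7 this leaves x=25, y=36, z=17, the same values the original program prints.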
 
Tim Rentsch

Shao Miller said:
[...]
04:
05: struct a_s { int x; };
06: struct b_s { struct a_s super; int y; };
07: struct c_s { struct b_s super; int z; };
08:
09: union common_u {
10: struct a_s * ptr_a;
11: struct b_s * ptr_b;
12: struct c_s * ptr_c;
13: };
14:
15: int main(void)
16: {
17: struct c_s c;
18: union common_u common;
[...]
26: common.ptr_c = &c;
27: common.ptr_c->z += 10;
28:
29: common.ptr_a->x += 20;
30: common.ptr_b->y += 30;
31:
32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
33:
34: return EXIT_SUCCESS;
35: }
[...]


The question is:
Are the remaining union pointers valid (ANSI C / ISO C89)?
That is, common.ptr_a and common.ptr_b (lines 29 and 30).

You appear to be type-punning the value of the 'ptr_c' member as a
'struct a_s *' on line 29 and as a 'struct b_s *' on line 30.

[in C99 and C11, pointers to struct have the same representation,
but in C89/C90 this guarantee is not present.]

Good to have this pointed out - thank you for tracking it down.

[under the C99 rules --]
But the implementation's actual pointer representation could be
complicated and so there's still no guarantee. If you can dream up a
pointer representation, then you can dream up a counter-example to
your code's portability.

The stipulation that all pointers to structs have the same
representation and alignment requirements means that the
type-punning union member access has to work. That's what
having the same representation means -- that the same object
representation (ie, the same bytes) will have the same value.
Any choice of representations for the two cases that doesn't
produce identical results here means the two representations
are not the same, ie, the implementation is not conforming
(under C99/C11 rules).
 
Shao Miller

Tim Rentsch said:
The stipulation that all pointers to structs have the same
representation and alignment requirements means that the
type-punning union member access has to work. That's what
having the same representation means -- that the same object
representation (ie, the same bytes) will have the same value.
Any choice of representations for the two cases that doesn't
produce identical results here means the two representations
are not the same, ie, the implementation is not conforming
(under C99/C11 rules).

Here are three examples that I would consider to be counter-examples:

1. A 'struct any *' pointer representation that is a simple index.

This could provide a level of indirection into a table. The table
element could have type and bounds information, along with some other
form of address for the pointee. When the representation (a simple
index) is loaded into a 'struct bar *' instead of into a 'struct foo *',
a trap could be generated.

2. A 'struct any *' pointer representation that encodes bounds
information. While the original post "has this covered" because the
bounds of the original pointee encompass the bounds of the members
and sub-members, it's not safe in the general case. When the
representation is loaded into a 'struct bigger *' instead of a 'struct
smaller *', the bounds mismatch could generate a trap.

3. A 'struct any *' pointer representation that encodes type
information. Maybe for the sole reason of generating a trap when the
representation is loaded into an incompatible pointer type of object.

It seems clear to me that size, alignment, argument promotion (none) and
format of 'struct foo *' and 'struct bar *' must be the same, but I
don't yet understand how that ties into compatible types nor into
defined behaviour, since

"Certain object representations need not represent a value of the
object type. If the stored value of an object has such a representation
and is read by an lvalue expression that does not have character type,
the behavior is undefined. If such a representation is produced by a
side effect that modifies all or any part of the object by an lvalue
expression that does not have character type, the behavior is
undefined.41) Such a representation is called a trap representation."

Why can a valid 'struct foo *' value's representation represent a valid
'struct foo *' value but not a trap for a 'struct bar *'? For example,
it might be useful to trap a 'const struct baz *' representation read
into a 'struct baz *' object. A single bit in the representation would
be sufficient for that. The representation would be the same, wouldn't it?

- Shao Miller
 
Shao Miller

Example #1: "...interchangeability as arguments to functions..."

/* libbaz.h */

typedef void f_baz_callback(structptr_t);

extern void BazFunc(f_baz_callback * Callback, structptr_t StructPtr);

/* libbaz.c */

typedef struct any * structptr_t;
#include "libbaz.h"

void BazFunc(f_baz_callback * callback, structptr_t sptr) {
/*
* 'struct any' is an incomplete object type.
* Trap representations are more limited than if it were
* a complete object type.
*
* A trap representation for _any_ pointer type could
* still be present. A trap representation for _any_
* 'struct XXX *' could still be present.
*
* A trap representation based on bounds could still be present
* if 'sptr' is non-null, but somehow indicates 0 bytes
* of storage, or some other invalid value.
*
* A trap representation based on lifetime could still be
* present. Same with 'const'-ness.
*
* etc.
*
* foo.c and bar.c have a different type for 'sptr', but
* since the representation is the same, there's no problem.
*/
callback(sptr);
}

/* foo.c */

typedef struct s_foo * structptr_t;
#include "libbaz.h"

struct s_foo {
int i;
};

f_baz_callback foo_callback;
void foo_callback(structptr_t sptr) {
sptr->i = 42;
}

void foo_func(void) {
struct s_foo foo;

BazFunc(foo_callback, &foo);
}

/* bar.c */

typedef struct s_bar * structptr_t;
#include "libbaz.h"

struct s_bar {
double d;
};

f_baz_callback bar_callback;
void bar_callback(structptr_t sptr) {
sptr->d = 3.14159;
}

void bar_func(void) {
struct s_bar bar;

BazFunc(bar_callback, &bar);
}

Example #2: "...and members of unions."

/* libnextgen.h version 1.0 */

struct apple;
struct orange;

union u_dyn_obj {
struct apple * apple;
struct orange * orange;
};

extern void NextGenFunc(union u_dyn_obj DynamicObject);

/* libnextgen.h version 2.0 */

struct apple;
struct orange;
struct dog;
struct cat;

union u_dyn_obj {
struct apple * apple;
struct orange * orange;
struct dog * dog;
struct cat * cat;
};

extern void NextGenFunc(union u_dyn_obj DynamicObject);

/* user.c */

#include "libnextgen.h"

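/*
* Assumed for illustration: the header only declares 'struct apple',
* so some definition is needed before an object of that type can
* be defined. The member is a placeholder.
*/
struct apple { int weight; };

void UserFunc(void) {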
struct apple apple;
union u_dyn_obj dyn_obj;

/*
* It doesn't matter which version of the header we
* were built with, _nor_ which version of the library
* is installed, because the representation (and thus
* size) and alignment are always going to be the same.
*
* We only work with apples and oranges, but 2.0's
* support for dogs and cats doesn't affect us.
*/
dyn_obj.apple = &apple;
NextGenFunc(dyn_obj);
}

Example #3: "... Such a representation is called a trap representation."

/* hmmm1.c */

#include <stdlib.h>
#include <stdio.h>

struct s_smaller {
char arr[4];
};

struct s_bigger {
char arr[sizeof (struct s_smaller)];
double d;
};

int main(void) {
void * storage;
struct s_smaller * smaller;

/* Allocate enough storage for an s_smaller */
storage = calloc(1, sizeof (struct s_smaller));
if (!storage)
return 0;
smaller = storage;

/*
* Problem #3.1: Although the representation is
* the same for both types, the value cannot
* point to an s_bigger due to insufficient storage.
* There's enough storage for arr, but that's
* irrelevant.
*/
(*((struct s_bigger **) &smaller))->arr[0] = 'C';

printf("Result: %s\n", (char *) storage);

return 0;
}

Example #4: "... Such a representation is called a trap representation."

/* hmmm2.c */

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct s_smaller {
char arr[4];
};

struct s_bigger {
char arr[sizeof (struct s_smaller)];
double d;
};

union u_of_ptrs {
struct s_smaller * smaller;
struct s_bigger * bigger;
};

void discard_provenance(
union u_of_ptrs * left,
union u_of_ptrs * right,
union u_of_ptrs * combined
);

int main(void) {
void * storage;
union u_of_ptrs first, first_backup, second, third;

/* Allocate enough storage for an s_bigger */
storage = calloc(1, sizeof (struct s_bigger));
if (!storage)
return 0;

/* Plenty of storage for an s_smaller */
first.smaller = storage;

/* Backup */
memcpy(&first_backup, &first, sizeof first_backup);

/* Free some storage */
storage = realloc(storage, sizeof (struct s_smaller));
if (!storage)
return 0;

/* Right amount of storage */
second.smaller = storage;

/* Compare the representations */
if (memcmp(&first_backup, &second, sizeof first_backup))
return 0;

/* Discard any "provenance" for a later test */
discard_provenance(&first_backup, &second, &third);

/*
* Problem #4.1: second.bigger cannot point to an
* s_bigger, as there's insufficient storage.
* There's storage enough for arr, but that's
* irrelevant.
*/
second.bigger->arr[0] = '1';

/*
* Problem #4.2: Same problem with first.bigger, even
* though its "provenance" was from the earlier allocation.
*/
first.bigger->arr[1] = '2';

/*
* Problem #4.3: Same problem with third.bigger, even
* though its "provenance" has been discarded.
*/
third.bigger->arr[2] = '3';

printf("Result: %s\n", (char *) storage);

return 0;
}

void discard_provenance(
union u_of_ptrs * left,
union u_of_ptrs * right,
union u_of_ptrs * combined
) {
unsigned char * lp = (void *) left;
unsigned char * rp = (void *) right;
unsigned char * cp = (void *) combined;
unsigned char * end = (void *) (combined + 1);

while (cp < end)
*cp++ = *lp++ & *rp++;
}

I would certainly appreciate a C99-/C11-conforming implementation that
is able to catch the problems of examples #3 & #4. One way would be to
treat a given representation as a trap representation for one object
type but not for another, where the types are not compatible.

My interpretation of "same representation and alignment requirements"
for struct pointer types is along the lines of:

- If there are padding bits in one, there are padding bits at the
same positions in the other
- If there are parity bits in one, there are parity bits at the same
positions in the other
- If a segment is encoded in one, then it is encoded in the same way
in the other
- If type information is encoded in one, then it is encoded in the
same way in the other
- If bounds information is encoded in one, then it is encoded in the
same way in the other
- If lifetime/duration information is encoded in one, then it is
encoded in the same way in the other
- etc.

Since this interpretation supports the fair examples #1 & #2 as well as
the more contrived examples #3 & #4, I fail to understand the benefit of
adopting a more restrictive interpretation which seemingly prohibits the
problems of #3 and #4 from being caught; perhaps with trap
representations. But perhaps I've misunderstood.

- Shao Miller
 
Tim Rentsch

Shao Miller said:
Here are three examples that I would consider to be counter-examples:

1. A 'struct any *' pointer representation that is a simple index.

This could provide a level of indirection into a table. The table
element could have type and bounds information, along with some other
form of address for the pointee. When the representation (a simple
index) is loaded into a 'struct bar *' instead of into a 'struct foo
*', a trap could be generated.

2. A 'struct any *' pointer representation that encodes bounds
information. While the original post "has this covered" because the
bounds of the original pointee encompass the bounds of the members
and sub-members, it's not safe in the general case. When the
representation is loaded into a 'struct bigger *' instead of a 'struct
smaller *', the bounds mismatch could generate a trap.

3. A 'struct any *' pointer representation that encodes type
information. Maybe for the sole reason of generating a trap when the
representation is loaded into an incompatible pointer type of object.

These ideas aren't consistent with how the Standard uses the
notion of having the same representation in other instances. For
example, an object of type (int) has the same representation and
alignment requirements as an object of type (const int). Yet it's
ridiculous to think that loading an (int) object through a pointer
of type (const int *) might cause a trap when accessing the object
just as a plain int wouldn't, despite the two types being distinct
and not compatible.

Shao Miller said:
It seems clear to me that size, alignment, argument promotion (none)
and format of 'struct foo *' and 'struct bar *' must be the same, but
I don't yet understand how that ties into compatible types nor into
defined behaviour, since

"Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.41) Such a representation is called a trap
representation."

Why can a valid 'struct foo *' value's representation represent a
valid 'struct foo *' value but not a trap for a 'struct bar *'? For
example, it might be useful to trap a 'const struct baz *'
representation read into a 'struct baz *' object. A single bit in the
representation would be sufficient for that. The representation would
be the same, wouldn't it?

No. I expect you're thinking of "representation" as more or less
synonymous with "format", but representation means more than that.
The representation of a type is the mapping from the bits (ie, the
byte values of the object representation) to values in the type's
abstract value space, including trap values. If two types have
the same representation, that means the two mappings produce
corresponding values (ie, for each object representation) in the
two abstract value spaces. For C, corresponding values are what
would be produced by conversion between the two types in question.
In other words, if types A and B have the same representation,
then copying the bytes (eg, with memcpy()) from an 'A a;' into a
'B b;' must give the same results as 'b = (B) a;'. Any change in
behavior between the two cases means the two representations are
not the same. Accessing via type B using a union member access
works the same way that the memcpy() would.

For pointers, there is the additional concern that the converted
or corresponding value be a non-trap value in the abstract value
space of the new pointer type. However, in the particular example
here (ie, in the original posting, even though since disappeared
in the subthread), we know the pointer conversions have to work
because of the way the particular structs being pointed to are
nested.
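
A rough sketch of the comparison described above, applied to the original post's types: reading the pointer through another union member versus converting it with a cast. The function name union_read_matches_cast is invented for this example. Under the C99/C11 reading argued here the function would be expected to return 1; C89/C90 gives no such guarantee, so treat it as an illustration rather than a portability test.

```c
#include <assert.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

union common_u {
    struct a_s *ptr_a;
    struct b_s *ptr_b;
    struct c_s *ptr_c;
};

/* Hypothetical check: store through ptr_c, read through the other
 * members, and compare each result with an explicit conversion. */
int union_read_matches_cast(struct c_s *pc)
{
    union common_u u;

    assert(sizeof u.ptr_a == sizeof u.ptr_c);
    u.ptr_c = pc;
    return u.ptr_a == (struct a_s *) pc
        && u.ptr_b == (struct b_s *) pc;
}
```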
 
Shao Miller

Tim Rentsch said:
These ideas aren't consistent with how the Standard uses the
notion of having the same representation in other instances. For
example, an object of type (int) has the same representation and
alignment requirements as an object of type (const int). Yet it's
ridiculous to think that loading an (int) object through a pointer
of type (const int *) might cause a trap when accessing the object
just as a plain int wouldn't, despite the two types being distinct
and not compatible.

Ok. I agree with your example. But 2 points:

- The representation of 'int' is discussed in much greater detail than
the representation of any pointer type. Pointer representations are
much more opaque and free for the implementation to decide upon.

- I don't think it makes practical sense to encode type information in
the padding bits of an 'int', but it certainly seems useful to encode
extra information in a pointer representation, since they are derived
types with abstract values.

Surely if, in

void somefunc(void) {
unsigned char c;
/* ... */
}

'c' is permitted to have a trap representation due to its "provenance,"
then it is especially convenient that pointer representations are
opaque, so "provenance" or other meta-data can be encoded directly. No?

Tim Rentsch said:
No. I expect you're thinking of "representation" as more or less
synonymous with "format",

Yes, you are right about that.

Tim Rentsch said:
but representation means more than that.
The representation of a type is the mapping from the bits (ie, the
byte values of the object representation) to values in the type's
abstract value space, including trap values. If two types have
the same representation, that means the two mappings produce
corresponding values (ie, for each object representation) in the
two abstract value spaces. For C, corresponding values are what
would be produced by conversion between the two types in question.
In other words, if types A and B have the same representation,
then copying the bytes (eg, with memcpy()) from an 'A a;' into a
'B b;' must give the same results as 'b = (B) a;'. Any change in
behavior between the two cases means the two representations are
not the same.

I'm struggling to reconcile that with C99's 3.17p1 and 6.2.6.1. 3.17p1:

"value
precise meaning of the contents of an object when interpreted as
having a specific type"

I'm missing the part where it's possible for the same object
representation to represent the same value for two incompatible types,
since the value depends on the type.

Regarding conversion, 6.3p2 has that

"Conversion of an operand value to a compatible type causes no change
to the value or the representation."

Why mention both of them instead of simply "representation," if there's
a one-to-one correspondence between representation and value, given
compatible type? (Let alone incompatible types with the same
representation.)

Regarding pointer conversion, 6.3.2.3p1 has that

"For any qualifier q, a pointer to a non-q-qualified type may be
converted to a pointer to the q-qualified version of the type; the
values stored in the original and converted pointers shall compare equal."

Doesn't this explicitly hint that a 'const int *' value's representation
is permitted to be a trap representation for an 'int *', but not the
other way around? It seems convenient that such meta-data can be
directly encoded into the pointer representation, since pointer
representation is so opaque.

There's also p7:

"A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned57) for the pointed-to type, the
behavior is undefined. Otherwise, when converted back again, the result
shall compare equal to the original pointer. ..."

Doesn't this explicitly hint that it's not the most portable idea to do
anything much with a converted pointer other than to eventually convert
it back before using it? If I understand you correctly, there's no
conversion happening, as the value is simply becoming one in a different
type's value space, so there's no problem with p7.

Regarding your equivalence between the 'memcpy' and the cast for two
types with the same representation, 6.5.4p4 has that

"Preceding an expression by a parenthesized type name converts the
value of the expression to the named type. This construction is called a
cast.89) A cast that specifies no conversion has no effect on the type
or value of an expression."

If '(B) a' is already the same value as 'a' due to the types having the
same representation, then there is no conversion, right? If that's the
case, then the type of '(B) a' should be 'A'. Like 3.17p1, type and
value are once again tied together, so it seems to me that incompatible
types can have incompatible values.

HOWEVER, you said _corresponding_values_. So I'd ask: May a value in
the value space for type 'A' not have a corresponding, but invalid value
in the value space for type 'B'? If it may, then I fail to understand
why the original post's code is well-defined in C99 and C11.

Tim Rentsch said:
Accessing via type B using a union member access
works the same way that the memcpy() would.

I absolutely agree with your equivalence between 'memcpy' and union
members. Also: Re-interpreting the object representation with something
like:

A * ptr;
(*((B **) &ptr));

(where types 'A' and 'B' have the same representation.)

Tim Rentsch said:
For pointers, there is the additional concern that the converted
or corresponding value be a non-trap value in the abstract value
space of the new pointer type. However, in the particular example
here (ie, in the original posting, even though since disappeared
in the subthread), we know the pointer conversions have to work
because of the way the particular structs being pointed to are
nested.

Ah, that answers my last question, above. But there's a bit of a jump
in the logic that I can't grasp, and that's why the nesting of the
structures in the original example has anything at all to do with the
corresponding pointer value having to work. Yes, I agree that the
original example's bounds are covered because of the nesting, but I
don't understand why that's the only important subject.

To back up a bit from the original example, 'char *' and 'void *' have
the same representation. Would you say that in:


void reinterpret(void) {
void * vp = &vp;
vp = (*((char **) &vp)) + 1;
}

the expression-statement has Standard-defined behaviour? I'm worried
about this example because an implementation might wish to represent
"the stride" of the pointer arithmetic, just as "Multi-Dimensional Array
Simulator"[1] does. Implicit and explicit conversions (like the
promotions, casts, equality and ternary semantics, etc.) seem to offer
all the protection we need, while re-interpretation does not.

- Shao Miller

[1] http://www.iso-9899.info/wiki/Code_snippets
 
Shao Miller

I absolutely agree with your equivalence between 'memcpy' and union
members. Also: Re-interpreting the object representation with something
like:

A * ptr;
(*((B **) &ptr));

(where types 'A' and 'B' have the same representation.)

I meant where 'A *' and 'B *' have the same representation.
 
Shao Miller

I meant where 'A *' and 'B *' have the same representation.

(And alignment requirements.)

However, please allow me to retract this equivalence between type-punning
via union members and 'memcpy'. After reviewing some discussion with
Mr. Clive Feather, I'm no longer so sure. He points out that the stored
value has an effective type associated with it, and that the lvalue
attempting to access that stored value has a type not permitted by 6.5p7.

- Shao Miller
 
Shao Miller

Ok. I agree with your example. But 2 points:

- The representation of 'int' is discussed in much greater detail than
the representation of any pointer type. Pointer representations are
much more opaque and free for the implementation to decide upon.

- I don't think it makes practical sense to encode type information in
the padding bits of an 'int', but it certainly seems useful to encode
extra information in a pointer representation, since they are derived
types with abstract values.

Surely if, in

void somefunc(void) {
    unsigned char c;
    /* ... */
}

'c' is permitted to have a trap representation due to its "provenance,"
then it is especially convenient that pointer representations are
opaque, so "provenance" or other meta-data can be encoded directly. No?


Yes, you are right about that.


I'm struggling to reconcile that with C99's 3.17p1 and 6.2.6.1. 3.17p1:

"value
precise meaning of the contents of an object when interpreted as
having a specific type"

I'm missing the part where it's possible for the same object
representation to represent the same value for two incompatible types,
since the value depends on the type.

Regarding conversion, 6.3p2 has that

"Conversion of an operand value to a compatible type causes no change
to the value or the representation."

Why mention both of them instead of simply "representation," if there's
a one-to-one correspondence between representation and value, given
compatible type? (Let alone incompatible types with the same
representation.)

Regarding pointer conversion, 6.3.2.3p1 has that

"For any qualifier q, a pointer to a non-q-qualified type may be
converted to a pointer to the q-qualified version of the type; the
values stored in the original and converted pointers shall compare equal."

Doesn't this explicitly hint that a 'const int *' value's representation
is permitted to be a trap representation for an 'int *', but not the
other way around? It seems convenient that such meta-data can be
directly encoded into the pointer representation, since pointer
representation is so opaque.

There's also p7:

"A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned for the pointed-to type, the
behavior is undefined. Otherwise, when converted back again, the result
shall compare equal to the original pointer. ..."

Doesn't this explicitly hint that it's not the most portable idea to do
anything much with a converted pointer other than to eventually convert
it back before using it? If I understand you correctly, there's no
conversion happening, as the value is simply becoming one in a different
type's value space, so there's no problem with p7.

Regarding your equivalence between the 'memcpy' and the cast for two
types with the same representation, 6.5.4p4 has that

"Preceding an expression by a parenthesized type name converts the
value of the expression to the named type. This construction is called a
cast. A cast that specifies no conversion has no effect on the type
or value of an expression."

If '(B) a' is already the same value as 'a' due to the types having the
same representation, then there is no conversion, right? If that's the
case, then the type of '(B) a' should be 'A'. Like 3.17p1, type and
value are once again tied together, so it seems to me that incompatible
types can have incompatible values.

HOWEVER, you said _corresponding_values_. So I'd ask: May a value in
the value space for type 'A' not have a corresponding, but invalid value
in the value space for type 'B'? If it may, then I fail to understand
why the original post's code is well-defined in C99 and C11.


I absolutely agree with your equivalence between 'memcpy' and union
members. Also: Re-interpreting the object representation with something
like:

A * ptr;
(*((B **) &ptr));

(where types 'A' and 'B' have the same representation.)


Ah, that answers my last question, above. But there's a bit of a jump
in the logic that I can't grasp, and that's why the nesting of the
structures in the original example has anything at all to do with the
corresponding pointer value having to work. Yes, I agree that the
original example's bounds are covered because of the nesting, but I
don't understand why that's the only important subject.

To back up a bit from the original example, 'char *' and 'void *' have
the same representation. Would you say that in:


void reinterpret(void) {
    void * vp = &vp;
    vp = (*((char **) &vp)) + 1;
}

Since else-thread I'm retracting the union member type-punning
equivalence with this kind of raw re-interpretation, please allow me to
also retract this example and replace it with:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u = { &u };
    u.cp = u.cp + 1;
}

the expression-statement has Standard-defined behaviour? I'm worried
about this example because an implementation might wish to represent
"the stride" of the pointer arithmetic, just as "Multi-Dimensional Array
Simulator"[1] does. Implicit and explicit conversions (like the
promotions, casts, equality and ternary semantics, etc.) seem to offer
all the protection we need, while re-interpretation does not.

[1] http://www.iso-9899.info/wiki/Code_snippets
 
Tim Rentsch

Shao Miller said:
Ok. I agree with your example. But 2 points:

- The representation of 'int' is discussed in much greater
detail than the representation of any pointer type. Pointer
representations are much more opaque and free for the
implementation to decide upon.

That doesn't change the point I was making.
- I don't think it makes practical sense to encode type
information in the padding bits of an 'int', but it certainly
seems useful to encode extra information in a pointer
representation, since they are derived types with abstract
values.

Even if that's true, it doesn't change what the Standard mandates.
Surely if, in

void somefunc(void) {
    unsigned char c;
    /* ... */
}

'c' is permitted to have a trap representation due to its
"provenance,"

It isn't. You are either mis-remembering or have misunderstood.
then it is especially convenient that pointer
representations are opaque, so "provenance" or other meta-data can be
encoded directly. No?

Irrelevant. Such a statement might be an argument for changing
a future Standard, but it has no bearing on what is said
in the current Standard.
No. I expect you're thinking of "representation" as more or less
synonymous with "format",

Yes, you are right about that.
but representation means more than that.
The representation of a type is the mapping from the bits (ie, the
byte values of the object representation) to values in the type's
abstract value space, including trap values. If two types have
the same representation, that means the two mappings produce
corresponding values (ie, for each object representation) in the
two abstract value spaces. For C, corresponding values are what
would be produced by conversion between the two types in question.
In other words, if types A and B have the same representation,
then copying the bytes (eg, with memcpy()) from an 'A a;' into a
'B b;' must give the same results as 'b = (B) a;'. Any change in
behavior between the two cases means the two representations are
not the same.

I'm struggling to reconcile that with C99's 3.17p1 and 6.2.6.1.
[quoted paragraph snipped]

I'm missing the part where it's possible for the same object
representation to represent the same value for two incompatible
types, since the value depends on the type.

I don't see why you are confused. There is no wording that
forbids it, and it's obviously possible, as 'int' and 'const int'
illustrate. On many machines 'int' and 'long' provide another
example. Or two of the three character types.
Regarding conversion, 6.3p2 has that

"Conversion of an operand value to a compatible type causes no
change to the value or the representation."

Why mention both of them instead of simply "representation," if
there's a one-to-one correspondence between representation and
value, given compatible type? (Let alone incompatible types
with the same representation.)

Do you think the Standard includes a sentence saying compatible
types must have the same representation and alignment requirements?

Incidentally, there isn't a one-to-one correspondence between object
representations and values (necessarily, that is). The mapping is
_from_ object representations _to_ the abstract value space, but it
need not be one-to-one; also, the abstract value space includes
"trap values" which correspond to trap representations but are not
'values' as the Standard normally uses the term.
Regarding pointer conversion, 6.3.2.3p1 has that

"For any qualifier q, a pointer to a non-q-qualified type may be
converted to a pointer to the q-qualified version of the type; the
values stored in the original and converted pointers shall compare
equal."

Doesn't this explicitly hint that a 'const int *' value's
representation is permitted to be a trap representation for an 'int
*', but not the other way around? [snip]

No. Converting a valid 'const int *' to an 'int *' is well-defined
and must succeed.
There's also p7:

"A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned for the pointed-to type, the
behavior is undefined. Otherwise, when converted back again, the
result shall compare equal to the original pointer. ..."

Doesn't this explicitly hint that it's not the most portable idea to
do anything much with a converted pointer other than to eventually
convert it back before using it?
No.

If I understand you correctly, there's no conversion happening,
as the value is simply becoming one in a different type's value
space, so there's no problem with p7.

What I think you mean is there is no change to the object
representation (which I didn't say and which doesn't have to
be true). What I said was basically that the result must be the
same whether the object representation changes or not (in cases
where the two types involved have the same representation).
Regarding your equivalence between the 'memcpy' and the cast for two
types with the same representation, 6.5.4p4 has that

"Preceding an expression by a parenthesized type name converts the
value of the expression to the named type. This construction is called
a cast. A cast that specifies no conversion has no effect on the
type or value of an expression."

If '(B) a' is already the same value as 'a' due to the types having
the same representation, then there is no conversion, right?

Wrong. Casting always does a conversion, even if the conversion
doesn't change either the value or the object representation.
Assignment also always does a conversion, even if the types are
the same. Furthermore for the case we are discussing, namely two
pointer-to-structure types, if the referenced types are different
then the value spaces of the two pointer types are disjoint, so
it can't be the case that the two values are the same.
If that's the case, then the type of '(B) a' should be 'A'.
Like 3.17p1, type and value are once again tied together, so it
seems to me that incompatible types can have incompatible
values.

This sentence is gibberish.
HOWEVER, you said _corresponding_values_. So I'd ask: May a
value in the value space for type 'A' not have a corresponding,
but invalid value in the value space for type 'B'? If it may,
then I fail to understand why the original post's code is
well-defined in C99 and C11.

I shouldn't have to explain this again. Converting the value
with a cast has to work, because of how the structs are nested.
Therefore reinterpreting the object representation using a union
member access has to work, because that's what "having the same
representation" means.
I absolutely agree with your equivalence between 'memcpy' and union
members. Also: Re-interpreting the object representation with
something like:

A * ptr;
(*((B **) &ptr));

(where types 'A' and 'B' have the same representation.)

That doesn't work, as I think you pointed out subsequently,
because of effective type rules. Except for that, yes, same
idea.
Ah, that answers my last question, above. But there's a bit of
a jump in the logic that I can't grasp, and that's why the
nesting of the structures in the original example has anything
at all to do with the corresponding pointer value having to
work. Yes, I agree that the original example's bounds are
covered because of the nesting, but I don't understand why
that's the only important subject.

There are two important facts: one, the struct values are
nested appropriately; and two, the pointers to those structs
have the same representation (and alignment requirements).
Therefore the type-punning union member access gets a set
of bits that are both interpreted correctly and valid for
the type in question.
To back up a bit from the original example, 'char *' and 'void
*' have the same representation. Would you say that in:


void reinterpret(void) {
    void * vp = &vp;
    vp = (*((char **) &vp)) + 1;
}

Again, there is a violation of effective type rules in this case,
but if the analogous thing were done using union member access
then yes it has to work.
the expression-statement has Standard-defined behaviour? I'm
worried about this example because an implementation might wish
to represent "the stride" of the pointer arithmetic, just as
"Multi-Dimensional Array Simulator"[1] does. Implicit and
explicit conversions (like the promotions, casts, equality and
ternary semantics, etc.) seem to offer all the protection we
need, while re-interpretation does not.

You're confusing what you think might be a good idea with
what the Standard mandates. My comments are concerned only
with the latter.
 
Shao Miller

Again, there is a violation of effective type rules in this case,
but if the analogous thing were done using union member access
then yes it has to work.

And here is the analogous thing, offered elsethread:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u = { &u };
    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
}

The 'u.cp' expression marked by the comment (having type 'char *') is an
lvalue whose type is not one of those listed by 6.5p7, but it attempts
to access the value of 'u.vp'. (Doesn't it?) This appears to yield
undefined behaviour, doesn't it? Or would you suggest that the 'u'
sub-expression (having the union type) is the lvalue for purposes of
6.5p7, and that the type of the containing expression 'u.cp' doesn't matter?

- Shao Miller
 
Tim Rentsch

Shao Miller said:
And here is the analogous thing, offered elsethread:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u = { &u };
    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
}

The 'u.cp' expression marked by the comment (having type 'char *') is
an lvalue whose type is not one of those listed by 6.5p7, but it
attempts to access the value of 'u.vp'. (Doesn't it?) This appears
to yield undefined behaviour, doesn't it? Or would you suggest that
the 'u' sub-expression (having the union type) is the lvalue for
purposes of 6.5p7, and that the type of the containing expression
'u.cp' doesn't matter?

Look harder. Think more. Write less.
 
Shao Miller

Look harder. Think more. Write less.

Please don't resort to this sort of personally-directed nonsense as
you've done before. If you don't have an answer, please simply say so.
If you really think I've missed something, it'd certainly be more
helpful to point it out instead of implying laziness or stupidity.

If you think I write too much, well, I think you write too little
Standard, and too much "Mr. T. Rentsch knows best." Unfortunately, that
doesn't work for me, as your knowledge isn't directly accessible to me.
I'm sorry if that makes our discussions difficult! If you choose to
help me to understand your valuable perspective, I'll be appreciative.

Just in case you're nit-picking an error in the code that hardly seems
relevant to the meat of the question, please allow me to offer the
corrected code:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u;
    u.vp = &u;
    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
}

int main(void) {
    reinterpret();
    return 0;
}

Otherwise, would anyone else please point out what I might've missed
about whether or not the above example results in undefined behaviour?
The "shall"[6.5p7] is outside of a constraint, so that'd seem to be
undefined behaviour if the lvalue under consideration is 'u.cp'. If the
lvalue is 'u', then its union type _is_ permitted by 6.5p7 (as
acknowledged in a previous post, above), but it'd be good to know
_which_ is the lvalue under consideration.

- Shao Miller
 
Shao Miller

It isn't. You are either mis-remembering or have misunderstood.

Committee Discussion in Defect Report #260:

"In addition the C Standard does not prohibit an implementation from
tracking the provenance of the bit-pattern representing a value. An
indeterminate value happening to have a bit pattern that is identical to
a bit pattern representing a determinate value is not sufficient to
allow access to the indeterminate value free from undefined behavior."

That suggests to me that real implementation representatives discussed
it, and some of them must have argued that there is more to object
representation and value than a simple mapping. I suggest that there
are other meta-considerations (such as "indeterminate value"), some of
which are crucial to an implementation that wishes to have "enforceable
coding rules":

http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1663.pdf

'c' above is permitted to have a trap representation without even having
that fact coded into its object representation. If I've misunderstood,
then I apologize. If you have further knowledge of the status of DR
#260, then please share! :)

- Shao Miller
 
Shao Miller

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u;
    u.vp = &u;
    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
}

int main(void) {
    reinterpret();
    return 0;
}

Otherwise, would anyone else please point out what I might've missed
about whether or not the above example results in undefined behaviour?
The "shall"[6.5p7] is outside of a constraint, so that'd seem to be
undefined behaviour if the lvalue under consideration is 'u.cp'. If the
lvalue is 'u', then its union type _is_ permitted by 6.5p7 (as
acknowledged in a previous post, above), but it'd be good to know
_which_ is the lvalue under consideration.

Mr. Clive D. W. Feather very kindly gave his valuable time and shared in
agreement about this code.

6.5p7 makes this undefined behaviour, just as it does for the original
post's use of the two different union members, despite the two
pointer-to-structure types having the same representation and alignment
requirements.

The penultimate bullet of 6.5p7 regarding unions is so that the
following code is well-defined:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u, v;
    u.vp = &u;

    /* Union lvalue on right accesses the stored value */
    v = u;
    (void) v;
}

int main(void) {
    reinterpret();
    return 0;
}

I'm glad that if I've lost some marbles, someone else lost the same ones. :)

- Shao Miller
 
Tim Rentsch

Shao Miller said:
It isn't. You are either mis-remembering or have misunderstood.

Committee Discussion in Defect Report #260: [snip]

The type unsigned char does not have trap representations. There
are no exceptions. Types that don't have trap representations
never have a trap representation.

In C11, accessing a variable like 'c' above before it has been
initialized is undefined behavior. But that is because C11
added (relative to, eg, N1256) a specific statement regarding
such cases, stating explicitly that the behavior is undefined;
it has nothing to do with provenance or trap representations.
Indeed, seeing that this proviso was added in C11 makes it
obvious that DR 260 doesn't apply to cases like the example
above, because otherwise there would be no reason to add it.
 
