Let's consider these functions though:
int foo(int* x, short* y)
{
*x = 1;
*y = 2;
return 1;
}
int bar(int* x, short* y)
{
*x = 1;
*y = 2;
return *x;
}
Let's consider functions foo and bar. Let's suppose that x and
y alias in both. For function foo, there is no undefined
behavior even though both alias (at least according to what
appears to be the prominent interpretation of these rules).
I'm not sure about C here, but in C++, there is definitely
undefined behavior in foo if x and y alias. In fact, there
would be undefined behavior even if foo were simply:
void foo(const int* x, const short* y)
{
printf("%d, %d\n", *x, *y);
}
If the two pointers point to the same physical address, there is
no way that the memory they point to can be both an int and
a short. And the C++ standard clearly says:
If a program attempts to access the stored value of an
object through an lvalue of other than one of the following
types the behavior is undefined:
[...]
and short for int or vice versa isn't in the list. And I'm
certain that the intent in C is the same: C definitely allows
trapping representations for integer values, and reading part of
an int as a short could conceivably result in a trapping
representation for a short. (Think of a one's complement
machine which traps on -0.)
The problem becomes more interesting if we replace short with
unsigned char. In that case, my version is legal and defined
behavior: accessing a stored value through an lvalue of char or
unsigned char type is in the list after the cited paragraph.
(IIRC, in C, this exception only applies to unsigned char; for
some reason, C++ added plain char to the list.) But what about
the original version, which modifies? Is modifying an int
through an unsigned char* undefined behavior? What if it
results in a trapping representation in the int? Or is it just
undefined behavior if you access the int? And of course,
modifying all of the bytes in the int, from a bytewise copy of
another int, has to be fully defined behavior.
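The bytewise copy that "has to be fully defined" can be sketched roughly as follows. This is only an illustration: the helper names copy_int_bytewise and copied_value are hypothetical and do not come from the thread. Every access to the int's storage goes through an unsigned char lvalue, and all sizeof(int) bytes are overwritten from a valid int, so the destination ends up with a complete, valid representation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copy one int into another, one byte at a time,
// through unsigned char pointers.
void copy_int_bytewise(int* dst, const int* src)
{
    assert(dst != 0 && src != 0);
    unsigned char*       d = reinterpret_cast<unsigned char*>(dst);
    const unsigned char* s = reinterpret_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i != sizeof(int); ++i)
        d[i] = s[i];   // each access is through an unsigned char lvalue
}

// Hypothetical wrapper: returns a copy of v made by the bytewise loop.
int copied_value(int v)
{
    int out = 0;
    copy_int_bytewise(&out, &v);
    return out;        // every byte of out was replaced by a byte of v
}
```

The open question in the text is the partial case: overwriting only some of the bytes, which this sketch deliberately avoids.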
Is the following a well-formed C++ program without UB?
#include <cstdlib>
using namespace std;
int main()
{
void* p = malloc(sizeof(int) + sizeof(float));
int*x = (int*) p;
*x = 1;
}
Let's hope so.
Seriously, in C++ at least, a POD "exists" as soon as the memory
for it is allocated. I think the standard could be clearer, but
I'm pretty sure that the intent is that memory allocated with
malloc (or with the operator new function) is potentially an
object of any POD type which fits, and becomes an object of
a specific POD type when it is used as such. The assignment to
*x means that the memory between p and p + sizeof(int) contains
an int object (and that using it as any other type of object is
undefined behavior).
Again, an interesting case is something like:
void f(float v)
{
void* p = malloc(sizeof(float));
memcpy(p, &v, sizeof(float));
float* pf = (float*)p;
printf("%.5f\n", *pf);
}
What is the type of the object starting at p? (IIRC, the
specification of memcpy says that it copies "as if" through
unsigned char*, so we've effectively used the object as an
unsigned char[].) So using it as a float would seem to violate
§3.10/15 (in the C++ standard). IMHO, the above *should* be
legal and well defined; i.e. if I call f(3.14159), the above
should display "3.14159". But I'm not sure that this is the
case as the standard is currently written. §3.8 is very clear
that the lifetime of an object of type T begins when storage
with the proper alignment and size is obtained, but with malloc
or the operator new function (both of which return a void*, and
guarantee alignment sufficient for any type of object), what is
the type of the object whose lifetime has just begun?
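For comparison, here is a sketch of a variant that sidesteps the question rather than answering it: the bytes are copied back out of the malloc'd block into a declared float, so the only float lvalue ever read refers to an object whose type is not in doubt. The function name round_trip is hypothetical, and treating allocation failure as a fatal assert is just a shortcut for the sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: push a float through raw malloc'd storage and back
// using memcpy in both directions.
float round_trip(float v)
{
    void* p = std::malloc(sizeof(float));
    assert(p != 0);                          // sketch: allocation failure is fatal
    std::memcpy(p, &v, sizeof(float));       // bytes of v into the raw storage
    float result;
    std::memcpy(&result, p, sizeof(float));  // bytes back into a declared float
    std::free(p);
    return result;                           // same object representation as v
}
```

This says nothing about whether reading *pf directly is defined; it only shows the form that does not depend on the answer.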
What about the following?
#include <cstdlib>
using namespace std;
int main()
{
void* p = malloc(sizeof(int) + sizeof(float));
int*x = (int*) p;
*x = 1;
float* y = (float*) p;
*y = 1;
}
I think this is covered by the last bullet in §3.8: "The
lifetime of an object of type T ends when: [...]-- the storage
which the object occupies is reused or released." Although it's
not nearly as clear as it should be, I would consider the last
assignment as "reusing" the storage as a float, so that the int
object at p ceases to exist, and a new object with float type
comes into existence. In the case of memory obtained by means
of malloc or the operator new function, I think the standard
*should* say (but doesn't) that an object of type T only begins
to exist when the raw memory is initialized as a type T, and
that reinitializing it with a different type causes a new object
to begin to exist. In sum, such memory behaves very much like
a union of all types which fit, and you can only access the last
assigned value. (But you also need special wording for cases
where the "object" is initialized using memcpy or something
similar---writing through an unsigned char*. While the current
wording seems defective, it's hard to find adequate wording.)
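One way to make the reuse explicit, instead of relying on a plain assignment, is placement new. The following is a sketch of that alternative, not the code under discussion: the function name reuse_storage is hypothetical, and it assumes (as the standard guarantees for malloc) that the block is suitably aligned for both int and float.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical helper: create an int in raw storage, then reuse the same
// storage for a float, using placement new for both.
float reuse_storage()
{
    void* p = std::malloc(sizeof(int) + sizeof(float));
    assert(p != 0);                  // sketch: allocation failure is fatal

    int* x = new (p) int(1);         // an int object now lives at p
    assert(*x == 1);

    float* y = new (p) float(1.0f);  // storage reused: the int's lifetime
                                     // ends and a float's lifetime begins
    float result = *y;               // only the last-created object is read

    std::free(p);
    return result;
}
```

With placement new there is no doubt about which object exists at each point, which is exactly the "union of all types which fit" behaviour described above.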
I've had a thread up on comp.std.c++ for a while now about these
issues, and I've gotten 0 replies. It's quite frustrating.
In short, I would argue that both of the above programs have no UB in
C++, nor do their equivalent programs in C. You need both programs above to
have no UB in order to have user-space memory allocators in standard
conforming C++. I think that the standard's intent is not to forbid
user-space C++ standard conforming pooling memory allocators.
Let's look at "3.8 Object Lifetime / 1, 2, 4, 5, 6, and 7". Each of
those paragraphs makes reference to "reusing storage", something which is
distinct from "releasing the storage". "Reusing the storage" of an
object ends that object's lifetime. What else can this mean besides
the following?
void* p = malloc(sizeof(int) + sizeof(float));
int*x = (int*) p;
*x = 1;
float* y = (float*) p;
*y = 1; /* reuse of storage, the int object's lifetime ends, and
the float object's lifetime begins */
Agreed. The real questions are: what is the type of the object
between the malloc and the *x = 1 statement (since an object
lifetime has apparently begun, according to §3.8), and (more
importantly) what about the case where you initialize using
something like memcpy?
Furthermore, let's look at the rules in "3.8 Object Lifetime". "3.8
Object Lifetime / 1" is actually nonsensical as written. Consider:
void* p = malloc(sizeof(char));
Well, we've allocated storage with proper alignment and size for an
arbitrarily large number of types, and if those types have a trivial
constructor, such as:
struct T1 {};
struct T2 {};
struct T3 {};
//etc.
then an object of each of those types exists at that location. So, an
arbitrarily large number of distinct complete objects coexist in "*p"
according to that reading of the rules, which is entirely
nonsensical.
I see we're thinking along the same lines.
Unfortunately, as I've expounded at length in the thread on
comp.std.c++, the sensible way forward isn't clear. However,
some of the proposed changes to C++0x in 3.10 / 15 are taking
the language in quite the wrong direction IMO.
We need to solve a couple of basic problems. The most important and
basic is: when does the lifetime of a POD class even begin?
That's a good question: do PODs have lifetime? I'd argue yes,
but it's not the lifetime defined in §3.8. Accessing an
uninitialized POD is undefined behavior, and if you can't access
an object, how can you say it exists?
#include <cstdlib>
using namespace std;
struct T1 { int x; int y; };
struct T2 { int x; int y; };
Just a note: the answers in the following may differ between
C and C++. I don't have a copy of the C standard handy to
verify what it says, but it does use a subtly different
definition of type than C++, with terms like "compatible types".
(If memory serves me correctly, I think that if two structs both
have a tag, and the tag is different, then the types are not
compatible, and so the effect is the same here. But I'm far
from sure.)
int main()
{
void* p = 0;
T1 * t1 = 0;
T2 * t2 = 0;
int * x = 0;
if (sizeof(T1) != sizeof(T2))
return 1;
if ( (char*)(& t1->y) - (char*) t1 != (char*)(& t2->y) -
(char*) t2 )
return 1;
p = malloc(sizeof(T1));
/* Do we have a T1 object here? Presumably no. Otherwise we also
have a T2 object here, and we definitely don't want to start talking
about two distinct complete objects occupying the same storage at the
same time. */
t1 = (T1*) p;
/* T1 object yet? Presumably the answer hasn't changed since the
above comment. */
x = & t1->x;
/* T1 object yet? */
*x = 1;
/* Do we have a T1 object here? Maybe. I just see a write through
an int lvalue. I see no writes nor reads through a T1 lvalue. I see
nothing that favors T1 over T2, besides some sort of data dependency
analysis through the member-of operator. However, there isn't even a
hint of data dependency analysis in the standard with regards to
object lifetime rules. */
x = & t1->y;
*x = 2;
/* Do we have a T1 object here? The answer must be yes, or we'll
never have a T1 object. However, again, I see nothing to favor having
a T1 object over a T2 object besides data dependency analysis through
the member-of operator. */
t2 = (T2*) p;
return t2->y; /* UB? Why? Why is reading "t1->y" not UB, but
reading "t2->y" is UB? In other words, why do we have a T1 object, but
not a T2 object? */
}
Also, what if we used offsetof hackery to initialize both int members
of the T1 object without using a member-of operator on a T1 lvalue?
As far as I can tell, gcc doesn't even bother doing aliasing analysis
on anything besides primitive types, for exactly the reasons outlined
above. They must not have seen a sensible way to differentiate between
T1 and T2, just as I cannot.
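The offsetof variant mentioned above might look something like the sketch below. It is only an illustration under stated assumptions: the function name init_without_member_access is hypothetical, the bytes are written and read with memcpy rather than through int lvalues (so the sketch itself stays on safe ground), and it assumes, and checks, that y sits at the same offset in T1 and T2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct T1 { int x; int y; };
struct T2 { int x; int y; };

// Hypothetical helper: fill in both members of a would-be T1 without ever
// applying the member-of operator to a T1 lvalue, then read y back using
// T2's layout.
int init_without_member_access()
{
    assert(offsetof(T1, y) == offsetof(T2, y));  // assumption made explicit

    void* p = std::malloc(sizeof(T1));
    assert(p != 0);                              // sketch: failure is fatal
    unsigned char* bytes = static_cast<unsigned char*>(p);

    int one = 1;
    int two = 2;
    std::memcpy(bytes + offsetof(T1, x), &one, sizeof(int));
    std::memcpy(bytes + offsetof(T1, y), &two, sizeof(int));

    int y;
    std::memcpy(&y, bytes + offsetof(T2, y), sizeof(int));

    std::free(p);
    return y;
}
```

Nothing in this code names T1 or T2 except inside offsetof and sizeof, which makes the point sharper: no access here could tell the two types apart.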
Gcc may be basing its decision on the meaning of "compatible
type" in C. Again, purely from memory (perhaps someone from the
C group could confirm), I think that given:
typedef struct { int i; } T1;
typedef struct { int i; } T2;
, in C, T1 and T2 are "compatible types", and behave more or
less as if they were the same type. (In C++, they are two
distinct types.)