I disagree, and I'll elaborate below.
Assume that there is no int object at address 0x0000000C. Do you
believe that the behavior of an access to ptr is well-defined *by
the C standard*?
Assuming you really meant "access to ptr" and not "access to *ptr", my
answer is: No, then no.
The first access is during the 'memcpy'. The implementation defines the
representations of both 'x' and 'ptr'. If they do not have the same
size, there could be an out-of-bounds condition. If they do have the
same size, then this access is fine and we move on.
The second access is during the lvalue conversion that determines the
operand of the unary '*' operator. This second access has
implementation-defined behaviour, since the implementation defines the
mapping from object representations to values. Anything not defined (by
omission) or explicitly defined to be a trap representation would be a
trap representation. In that case, the lvalue conversion would yield
undefined behaviour. Otherwise, the value is a valid value and we get
past that lvalue conversion.
After that point, we can fret about the '*' operator. Whether or not
there's an object at address 0xC is not a consideration until this
point, so my understanding goes.
What I would say is an extremely relevant piece of Standard has
accidentally been snipped:
" 6.2.6.1 General
1 The representations of all types are unspecified except as stated
in this subclause.
2 Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of which
are either explicitly specified or implementation-defined."
Until the indirection in the last line, your example above should have
no different expectation for undefined/defined behaviour when compared to:
#include <string.h>
#include <stdio.h>
int main(void) {
    float f;
    unsigned int x = 0x4048F5C3;
    /* stays in bounds only if sizeof f >= sizeof x (implementation-defined) */
    memcpy(&f, &x, sizeof x);
    printf("%f\n", f);
    return 0;
}
If you'd say that this is undefined behaviour rather than
implementation-defined behaviour, please do explain why.
A trap representation is a representation such that accessing an object
holding it has undefined behavior. Undefined behavior is behavior that
is not defined by the C standard, regardless of whether the
implementation chooses to define it.
Did you just say that implementation-defined behaviour is a subset of
undefined behaviour? I don't think you did, given that you discuss
"implementation-defined," down below.
I think you think that I'm arguing that there is undefined behaviour and
that it so happens to be defined by Microsoft, so I'm claiming it to be
"well-defined". I'm not! I'm saying that there are parts where the
Standard calls something "implementation-defined" instead of
"undefined", and this is one such instance.
I tried to explain this before with the three paragraphs in response to
Ben about "predictability." "Well-defined" to me means that either the
Standard defines it directly, or defines some of it and defines that an
implementation defines the rest of it. Undefined behaviour doesn't
match either of those, even if the implementation provides definitions.
Was that not clear?
[...]
I don't think there needs to be an inference that there was a dereference
to a null pointer, but other than that, I think you're right. I really
wish I'd said "NULL_CLASS_PTR_DEREFERENCE" in the beginning, so it
wouldn't have been an issue of discussion. The trap values that Geoff
had discussed were not C trap representations, so I was likewise being
loose with "null pointers." I am filled with regret about that. As
we've already discussed, such a pointer value does not compare equal to
a C null pointer, so the debugger could hardly claim that it's a C null
pointer.
And in fact it doesn't claim either that 0x0000000C is a C null pointer, or
that it's any kind of null pointer.
I don't think it's intentional, but I'm going to share a result of this:
I've been contacted by another regular who has the impression that we've
been arguing for a long time about whether or not 0xC is a null pointer.
while (1) {
You("It's not a null pointer.");
Me("Well I didn't mean a C null pointer. Sorry.");
}
What do you mean by "any kind of null pointer"? After this discussion,
I didn't think it was remotely possible to believe that anything other
than the C definition could be spoken of. Heh.
There are at least 4 IDs that WinDbg uses for an invalid memory access:
1. BAD_PTR_DEREFERENCE
2. NULL_CLASS_PTR_DEREFERENCE
3. NULL_DEREFERENCE
4. STRING_DEREFERENCE
An exception can be analyzed (with '!analyze -v') and a "default bucket
ID" will be chosen by WinDbg. I seldom debug programs where it chooses
#1 or #4. I usually see it choose #2 or #3. A garbage pointer usually
just yields a default bucket ID "progtype_FAULT" (such as "DRIVER_FAULT"
or "APPLICATION_FAULT"), with no mention of "NULL" anywhere.
I disagree with your interpretation of both linked discussions. In both
cases, there was a null pointer dereference in C (or C++) code, which
resulted in an attempted dereference of a non-null but invalid pointer
(similar to 0x0000000C) in the generated machine code.
Machine code *implements* C semantics; it needn't precisely mirror them.
I don't think you understood my interpretation.
Scott: "This is a NULL pointer dereference in NTFS."
Chad: "Once again we are dereferencing a NULL pointer and once again our
program is crashing..."
Chad: "...the real problem was that your pointer was NULL?"
Chad: "So there ya go. A whole lot of null dereferences..."
In none of the cases was 'NULL' dereferenced. In none of the cases was
"null pointer" typed. That is, people don't always say precisely what
they mean. I'll try to do better in the future.
[...]
Considering a Windows pointer with representation 0xC, I was referring
to what Ben had asked about and to what I had typed later in the
same post: "The C Standard _plus_ the implementation say: It can be
stored, read, passed, discarded, converted, compared, its size
determined, etc. Pretty well anything that doesn't involve using the
pointer for indirect access."
Sure, but the C standard doesn't permit any of those things. It's well
known that an implementation can define behavior that isn't defined by
the C standard. Such behavior is still "undefined behavior" as defined
by 3.4.3.
Discussed above. I'm claiming it's defined by the Standard to be
implementation-defined behaviour, not that it's
undefined-behaviour-relative-to-the-Standard and which happens to be
defined elsewhere. I could be wrong, as always, but I've yet to
understand why that might be.
[...]
int main(void) {
    void * vp1 = (void *) 0xC;
    void * vp2 = vp1;
    return 0;
}
This is implementation-defined, not undefined. If the implementation
does not define the result of the cast to be a trap representation, or
better yet, defines that pointers are implemented as unsigned integers
with all value bits, then there is no undefined behaviour. This is the
case for Windows NT, as far as I'm aware.
No, it's undefined. It may additionally be defined by the
implementation, but it's not implementation-defined.
I don't understand this perspective, given 6.3.2.3p5:
"An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not be
correctly aligned, might not point to an entity of the referenced type,
and might be a trap representation.67)"
The phrase "implementation-defined behavior", as defined by the C
standard, refers *only* to behavior that is explicitly referred to
by the standard as "implementation-defined". It doesn't just mean
"behavior that is defined by the implementation".
Yes, and that's the stuff I'm talking about. (See above.)
In your program, the behavior of the initialization:
void * vp2 = vp1;
is not defined by the C standard. For example, it could cause the
program to crash in a conforming implementation. It is therefore,
by definition, undefined behavior. If an implementation chooses
to define its behavior, that doesn't change any of the above --
and the implementation is not obligated to document its choice.
You're talking about undefined behaviour. I'm talking about
implementation-defined behaviour. This line cannot stand on its own...
We have to ask, "Was the lvalue conversion of 'vp1' defined?"
The answer is, "Yes, because it has a valid value."
Then we ask, "How do we know that?"
The answer is, "Because it was initialized with the valid value that was
the result of the cast."
Then we ask, "How do we know the result of the cast was a valid value?"
The answer is, "Because the result of the cast conversion is
implementation-defined, and Microsoft's implementation defined it to be
a valid value."
If we'd encountered any lack of definitions along the way, there'd be
undefined behaviour. Or, more naturally, we could work from the
beginning, too.
They are pointer values such that accessing them has undefined
behavior. They are neither null pointers, nor pointers to any
object, nor pointers just past the end of any object. It's not 100%
clear to me, from the standard's definition of "trap representation",
that they *must* be trap representations, but I believe that
they are.
Well I suppose it's hard to prove either way, given that it's C90. I
just figured that it is up to the implementation to define which
representations map to which of {value, non-value}. I'm about C99%
sure, so being shown otherwise would be a valuable learning experience.
Speculation: Perhaps what bothers you is the idea that "undefined
behavior" and "trap representation" imply "This is evil, don't
touch!!!". They don't. It's perfectly valid for an implementation
to define the behavior of something that has "undefined behavior" in
the standard, or to use a C trap representation for its own purposes.
Good guess, but nope.
Certainly their representation can provide hints to a debugger; we've
seen that demonstrated. That doesn't cause them not to be trap
representations.
Agreed.