C Standard Regarding Null Pointer Dereferencing

Shao Miller

I'm sorry you seem to be implying that I haven't thought about this
carefully.  [snip]

Oh no, I didn't mean that at all.  It's obvious you have
thought about it carefully.  If you hadn't, your question
wouldn't have come up.  You see what I'm saying now?
Thanks, Tim. I apologize for misunderstanding. I also agree that if
UB is intended for the case of:

(void)*(char *)0;

then something, somewhere in the text needs to be modified; 6.5.3.2
would be an easy place for it, should it be the case that the wording
is carelessly imprecise. An alternative might be that the wording is
accurate and intended, allowing for no undefined behaviour in the
original post's question. Any attempted use of the _value_ of the
result of '*(char *)0' is undisputedly undefined behaviour,
since there is no defined means to obtain such a value (it is defined
that there is no object to obtain a value from).
 
Keith Thompson

Ben Bacarisse said:
Obviously I thought that the paragraph about scalars was not to be taken
as applying recursively. Reading your point below, that now seems to be a
daft way to read it, but let me just say why I took it to be so (if only
for a few hours!). First, just after the paragraph about scalars, p12
reads:

12 The rest of this subclause deals with initializers for objects that
have aggregate or union type.

which I took to mean "and the previous clauses don't" for no good reason
other than I seem to have a habit of reading more than is intended into
phrases that are simply informative. Second, p20 starts:

20 If the aggregate or union contains elements or members that are
aggregates or unions, these rules apply recursively to the
subaggregates or contained unions.

as if to suggest that "these rules" are applied recursively but only
down to the level of sub-aggregates not the scalars within them. Again,
this is just reading too much into it. That the rules about aggregates
apply recursively does not mean that the others about scalars don't.

But it would be nice if that section were clearer. You and I
both took a while to figure out how the wording reflects the (in
retrospect fairly obvious) intent.
As I studied 6.7.8, I was relieved to discover this; otherwise a
compiler wouldn't be required to diagnose

double arr[] = { "hello" };

and that would be bad.

Yes, and for that reason alone it seems clear (now) that p11 must apply
to all enclosed scalars as much as to the top-level ones.

But I'm not entirely comfortable interpreting ambiguous wording by
picking the interpretation that doesn't lead to bad consequences.

I think the "applies recursively" wording should be earlier in that
section.
 
Ben Bacarisse

Keith Thompson said:
Ben Bacarisse said:
Keith Thompson writes:
As I studied 6.7.8, I was relieved to discover this; otherwise a
compiler wouldn't be required to diagnose

double arr[] = { "hello" };

and that would be bad.

Yes, and for that reason alone it seems clear (now) that p11 must apply
to all enclosed scalars as much as to the top-level ones.

But I'm not entirely comfortable interpreting ambiguous wording by
picking the interpretation that doesn't lead to bad consequences.

I think the "applies recursively" wording should be earlier in that
section.

Agreed.
 
Tim Rentsch

pete said:
I don't know the standard.

You have undefined behavior because
you have a null pointer as an operand of the indirection operator.

This is the relevant text:

6.5.3.2 Address and indirection operators
Constraints
1 The operand of the unary & operator shall be either
a function designator,
the result of a [] or unary * operator,
or an lvalue that designates an object
that is not a bit-field and is not declared
with the register storage-class specifier.

Wrong operator. Indirection is '*', not '&'.
 
Tim Rentsch

pete said:
I was discussing '*'.

The following is the original post:

This e-mail is in regards to how a C translator/compiler should handle
the expression:

*(char *)0

I know that, but the paragraph you quoted was talking
about the address operator, not the indirection operator.

Here is the quoted paragraph again:
This is the relevant text:

6.5.3.2 Address and indirection operators
Constraints
1 The operand of the unary & operator shall be either
a function designator,
the result of a [] or unary * operator,
or an lvalue that designates an object
that is not a bit-field and is not declared
with the register storage-class specifier.

This constraint is not about the indirection operator.
 
Shao Miller

I found the relevant portion of the text of 'n1256.pdf' which renders:

(void)*(char *)0;

to yield undefined behaviour. It is "Cast operators", 6.5.4,
Semantics 4:

"Preceding an expression by a parenthesized type name converts the
value of the expression to the named type."

It is _here_ that we _require_ a _value_ for the _result_ (of the
expression) that is '*(char *)0'. Why? There are two casts here.
The cast to 'char *' is fine. The cast to void is not, since it
cannot be performed without a value for '*(char *)0'. This is before we
consider the whole as a void expression.

There is no definition for the value of the result of '*(char *)0'.

Unfortunately, we are still left with a tricky case; that of:

*(void *)0;

To summarize some points:
1. The value of the expression '0' is cast to 'void *'
2. The result of [1] has type 'void *'.
3. The result of [1] is a pointer.
4. The result of [1] is a null pointer.
5. The result of [1] is a null pointer constant.
6. The result of [1] is a scalar.
7. The result of [1] has a value (as per the cast conversion).
8. The value of the result of [1] is not assigned to a pointer.
9. We apply the unary '*' operator.
10. We give this operator in [9] the expression '(void *)0' as its
operand.
11. The result of [1] is thus the resulting (let's say, "effective")
operand in [9].
12. The effective operand in [9] does not point to a function.
13. The effective operand in [9] does not point to an object, as per
[4].
14. The result of [9] cannot be an lvalue by [13].
15. The effective operand in [9] has type 'void *', as per [2].
16. The result of [9] is defined to have type 'void', due to [15].
17. No invalid value has been assigned to the pointer, as per [8].
18. The result of [9] is not required to possess a value.
19. No undefined behaviour, thus far.
20. The entirety of '*(void *)0' is an expression.
21. The expression in [20] is a void expression.
22. The void expression in [20] has a non-existent value, congruent
with [18].
23. The void expression in [20] is evaluated, including for its side
effects.
24. The expression in [20] does not access a volatile object.
25. The expression in [20] does not modify an object.
26. The expression in [20] does not modify a file.
27. The expression in [20] does not call a function.
28. The expression statement '*(void *)0;' yields no undefined
behaviour and is fully defined as a legal expression statement.

Opinions, thoughts, pedantry?

Thank you for all of the feedback so far. This subject matter is good
to know in case of writing an implementation.

- Shao Miller
 
Ben Bacarisse

pete said:
You have undefined behavior because
you have a null pointer as an operand of the indirection operator.

Whilst I believe that's true in this case it is not (as I am sure you
know) a sufficient condition. For example, both

&*(char *)0

and

sizeof *(char *)0

are explicitly well-defined despite having * applied to a null pointer
operand. This may be the germ of Shao Miller's question. Since a void
expression is evaluated "for its side effects" can the absence of side
effects render the expression in question unevaluated?

For what it's worth, my view is that this wording is there simply to
give permission for an implementation to optimise away such side-effect
free evaluations as the one the OP gave originally. That does not
change the fact that it is undefined.

<snip>
 
Tim Rentsch

Shao Miller said:
I found the relevant portion of the text of 'n1256.pdf' which renders:

(void)*(char *)0;

to yield undefined behaviour. It is "Cast operators", 6.5.4,
Semantics 4: [snip elaboration]

The expression '*(char *)0' is undefined behavior if it
is evaluated. Any subsequent cast is irrelevant to the
question about whether the behavior is defined.
 
Shao Miller

  &*(char *)0

and

  sizeof *(char *)0

are explicitly well-defined despite having * applied to a null pointer
operand.  This may be the germ of Shao Miller's question.  Since a void
expression is evaluated "for its side effects" can the absence of side
effects render the expression in question unevaluated?
'Twas the original germ, yes. Then we agreed to take the text
literally and that the expression is evaluated. I still didn't see
any UB, because:

A value for the result of application of the unary '*' is not a
requirement of the text for the unary '*' operator in 'n1256.pdf'.

Consider:

(*p).f();

Where '(*p)' yields an lvalue (assuming 'p' points to an object), thus
satisfying the requirement for an object by the membership '.'
operator. But ponder what "value" was involved for '(*p)' in this
case. The aggregate value of the object pointed to by 'p' in its
entirety? I should hardly think so.
For what it's worth, my view is that this wording is there simply to
give permission for an implementation to optimise away such side-effect
free evaluations as the one the OP gave originally.
Agreed.


That does not
change the fact that it is undefined.
As per my post in response to the original post, and due to the cast
when casting to '(void)', I now agree. Not due to application of the
unary '*' operator to a null pointer which has not been assigned a
null pointer value. See also the challenge with:

*(void *)0;
 
Shao Miller

The expression '*(char *)0' is undefined behavior if it
is evaluated.  Any subsequent cast is irrelevant to the
question about whether the behavior is defined.
If and only if you do not take the text for the unary '*' operator
literally. That text describes undefined behaviour when a null
pointer value has been assigned to the pointer. Here we have a null
pointer, not a null pointer value assigned to a pointer.

We agreed that it's possible that that text might be imprecise, and
might need to be addressed, did we not? But it's also possible that
it's precise, and there is no undefined behaviour until casting to
'(void)'.

Would you agree?
 
Tim Rentsch

Shao Miller said:
If and only if you do not take the text for the unary '*' operator
literally. That text describes undefined behaviour when a null
pointer value has been assigned to the pointer. Here we have a null
pointer, not a null pointer value assigned to a pointer.

We agreed that it's possible that that text might be imprecise, and
might need to be addressed, did we not? But it's also possible that
it's precise, and there is no undefined behaviour until casting to
'(void)'.

Would you agree?

I don't. The wording could be better, but there is no
doubt about the meaning. The Standard is written in
formal English but it is not a math textbook, and it's
at best a waste of time to read it like one.

If you want to get technical, it can NEVER be the case
that the operand of an indirection operator has been
assigned. In the expression '*p', where p has been
declared to be of some pointer type, the operand 'p'
has already been converted to a value by virtue of
6.3.2.1p2. There is no difference between '*p' and
'*(char*)0' in this regard -- both operate on values,
not objects. So it's completely nonsensical to try to
understand "has been assigned" as applying to one class
of operand expression but not another. They are all
just values.
 
Seebs

We agreed that it's possible that that text might be imprecise, and
might need to be addressed, did we not? But it's also possible that
it's precise, and there is no undefined behaviour until casting to
'(void)'.

Would you agree?

No. It is adequately precise, adequately clear, and the undefined behavior
is unambiguous.

-s
 
Ben Bacarisse

Shao Miller said:
'Twas the original germ, yes. Then we agreed to take the text
literally and that the expression is evaluated. I still didn't see
any UB, because:

A value for the result of application of the unary '*' is not a
requirement of the text for the unary '*' operator in 'n1256.pdf'.

It says "if it [the operand] points to an object the result is...". If
the pointer does not point to an object the behaviour is undefined by
omission. There is a clarifying clause about invalid pointers but it
adds no new meanings, which is fortunate since it uses the clumsy "if an
invalid value has been assigned to the pointer" phrase. So *E is
defined only when E points to a function or when E points to an object.
To which function or object does (char *)0 point?

I don't understand your text about "a value for the result of [the]
application" not being "a requirement of the text" but I don't think I
need to. *E is defined when E points to a function or an object and I
think your example fails both tests.
Consider:

(*p).f();

Where '(*p)' yields an lvalue (assuming 'p' points to an object), thus
satisfying the requirement for an object by the membership '.'
operator. But ponder what "value" was involved for '(*p)' in this
case. The aggregate value of the object pointed to by 'p' in its
entirety? I should hardly think so.

Why? I should think exactly that.

As per my post in response to the original post, and due to the cast
when casting to '(void)', I now agree. Not due to application of the
unary '*' operator to a null pointer which has not been assigned a
null pointer value.

Let me get this clear. You are saying that (void)*(char *)0 is UB due
to the cast and, presumably, that *(char *)0 is not because there is no
cast? If so, I won't ask you to re-hash the argument -- I'll find it in
my news feed if I want to go look.

As you can see from the above, I disagree.
See also the challenge with:

*(void *)0;

Also undefined for the same reason.
 
Shao Miller

I don't.  The wording could be better, but there is no
doubt about the meaning.
After your fine reference to the text below, I'd have to agree.
 The Standard is written in
formal English but it is not a math textbook, and it's
at best a waste of time to read it like one.
I am not aware of anyone who's reading it like a math textbook and I'd
have to agree. It could be worthwhile reading its fine detail and
discussing and resolving perceived ambiguities, for the case where one
might be interested in developing a translator for C.
If you want to get technical,
Indeed I did.
it can NEVER be the case
that the operand of an indirection operator has been
assigned.  In the expression '*p', where p has been
declared to be of some pointer type, the operand 'p'
has already been converted to a value by virtue of
6.3.2.1p2.  There is no difference between '*p' and
'*(char*)0' in this regard -- both operate on values,
not objects.  So it's completely nonsensical to try to
understand "has been assigned" as applying to one class
of operand expression but not another.  They are all
just values.
This is to me an extremely valuable reference to the text of
'n1256.pdf'. I agree that with this reference in mind, it's
nonsensical to treat "If an invalid value has been assigned to the
pointer" as being intended to mean anything other than "If the operand
has an invalid value"... If only the text said "operand." It
doesn't. It says "pointer."

Is there any doubt that the operand has a value? We can assign
'(char *)0' or even '(void *)0' to an object. I don't think there's any
doubt that the operand has a value.

This could potentially be a cause for confusion, since sentences 2 and
3 explicitly use "operand" and "points to" and "has type". The next
sentence could very well mean, "if the value of the operand _is_ an
invalid value..." (Emphasis mine.) It could also mean, "if the value
of the operand was an invalid value assigned to the operand..."

Do you understand why I am asking about all of this? In the execution
environment if we attempt to access an object at an invalid location,
it should be undisputed as undefined behaviour. But expression
evaluation != execution. Evaluation of a constant scalar expression
such as '(char *)0' need not be "executed" at all. That is to say,
the text defines an attempted object access to an invalid location as
undefined behaviour. It could even be trapped by the best
implementation. But evaluation of an expression which is an
application of the unary '*' operator does not necessitate an object
access to any location. If it did, the text should include something
like:

"The result of evaluation of the unary '*' operator shall be the value
of an object pointed to by the operand, if the operand points to an
object."

But that might not be the case. Consider these:

(*p).f();
(*q).x = 10;
*r = 11;
(*s)();

For 'p', 'q' and 'r', if they point to an object, the result is an
lvalue. It's not a "value". There's no need to "fetch" the "value"
during the indirection at all, is there? Thus we only get undefined
behaviour if they _don't_ point to an object, which is a determination
that might only be possible during execution.

For 's', the indirection is intended to result in a function
designator. Not an lvalue. Not a "value".

It is clear that many people have tied evaluation of the unary '*'
operator to "yielding an object, pointed-to by the operand" in their
thinking. But this is not the case.

Also consider a Turing machine implementation with a tape and a head.
In the 'q' example above, if 'q' were assigned the value
'(struct foo *)0', the head might move to position zero, where "read"
and "write" are invalid. Neither a read nor a write is attempted. Then
the head moves by the offset of the 'x' member. At last, we attempt a
write when we assign, assuming that reads and writes are valid at that
position. Why should there be undefined behaviour by moving the head to
position 0 any more so than to any other location which is invalid for
objects or for which the validity is not guaranteed?

Does anyone understand why "has been assigned" could be important?

char *p;
*p = 'Y';

If the Turing machine's head attempts to move to the location as per
'p', that location might not be a valid location for the head to move
to. Undefined behaviour. But how can you have _an_expression_ with a
_constant_scalar_value_ at _translation_time_ (let alone during
execution) possibly represent an invalid location for the head to move
to?
 
Keith Thompson

Shao Miller said:
If and only if you do not take the text for the unary '*' operator
literally. That text describes undefined behaviour when a null
pointer value has been assigned to the pointer. Here we have a null
pointer, not a null pointer value assigned to a pointer.

We agreed that it's possible that that text might be imprecise, and
might need to be addressed, did we not? But it's also possible that
it's precise, and there is no undefined behaviour until casting to
'(void)'.

Would you agree?

I certainly agree that the wording in the description of unary "*"
needs to be improved. On the other hand, I think the intent is
reasonably unambiguous.

6.5.3.2p4:

The unary * operator denotes indirection. If the operand points
to a function, the result is a function designator; if it points
to an object, the result is an lvalue designating the object. If
the operand has type "pointer to type", the result
has type "type". If an invalid value has been assigned
to the pointer, the behavior of the unary * operator is undefined.

The phrase "has been assigned to" makes sense only for a pointer
*object*, but the operand of "*" needn't be an lvalue; it can be any
arbitrary expression of pointer type.

The unqualified word "pointer" very often means "pointer object",
but it can also mean "pointer value"; see, for example, the Standard's
description of the value returned by malloc(). I think the author of
the above paragraph just momentarily failed to make the distinction
properly.

The intent is that applying "*" to an invalid pointer value has
undefined behavior (a null pointer is "invalid" in this context).
I'm sure beyond reasonable doubt that the authors intended this to
apply whether the operand is an lvalue or not. Apart from having
it apply only to lvalues being nonsensical, if that were the intent
it could have been worded much more clearly.
 
Shao Miller

No.  It is adequately precise, adequately clear, and the undefined behavior
is unambiguous.
Thank you for your opinion. I would have also appreciated some
reasoning, in order to help to convince myself of this. The opinion
is appreciated regardless of the lack of reasoning. Poll-wise,
dereferencing a null pointer is undefined behaviour during
evaluation. Reason-wise, it's still incomplete, for me.
 
Shao Miller

It says "if it [the operand] points to an object the result is...".  If
the pointer does not point to an object the behaviour is undefined by
omission.
But it also defines the result in terms of its type, based on the type
of the operand. In our situation, this is defined. The omission of
when the pointer does not point to an object only impacts the
definition of the result, _when_ that result is an _lvalue_. It
doesn't apply to functions, for example. The "has type" clause covers
all three situations:
1. The operand points to an object
2. The operand points to a function
3. The operand points to neither, but has a pointer type
There is a clarifying clause about invalid pointers but it
adds no new meanings, which is fortunate since it uses the clumsy "if an
invalid value has been assigned to the pointer" phrase.
Whole-heartedly agreed as clumsy, if and only if it's not explicitly
there for a good reason.
 So *E is
defined only when E points to a function or when E points to an object.
To which function or object does (char *)0 point?
Its type is defined when the operand has a pointer type. It's
_further_ defined as an lvalue or a function designator, under certain
_additional_ circumstances.
I don't understand your text about "a value for the result of [the]
application" not being "a requirement of the text" but I don't think I
need to.  *E is defined when E points to a function or an object and I
think your example fails both tests.
Again, what do you mean by '*E'? Do you mean "the value of '*E'" or
"the type of '*E'" or both or neither or more than both?
Why?  I should think exactly that.
So when 'p' points to a structure and we apply '*' to it, you suggest
that evaluation entails the requirement for knowing the value of '*p'
altogether? If so, does that knowledge require fetching the value?
Let me get this clear.  You are saying that (void)*(char *)0 is UB due
to the cast
The cast to 'void', yes, since a cast requires a value.
and, presumably, that *(char *)0 is not because there is no
cast?
There is no cast in evaluation of this expression that requires a
value for '*(char *)0'. One versus two casts, above. Note that the
result of '*(char *)0' has type 'char', and is thus not a void
expression. '*(void *)0' has type 'void', and is thus a void
expression.
 If so, I won't ask you to re-hash the argument -- I'll find it in
my news feed if I want to go look.
I apologize for re-hashing it anyway, should that inconvenience you.
Perhaps it'll help someone else.
As you can see from the above, I disagree.
That's fine, and your attention has been appreciated. I mean it.
Also undefined for the same reason.
Uncertain for me.
 
Shao Miller

I certainly agree that the wording in the description of unary "*"
needs to be improved.
Agreed. Thank you.
On the other hand, I think the intent is
reasonably unambiguous.
If I had been able to find a discussion concerning this agreed-upon
ambiguity, I would have said "somewhat unambiguous." This is the
first discussion I am aware of, so I would say "all but entirely
unambiguous." :)
The phrase "has been assigned to" makes sense only for a pointer
*object*, but the operand of "*" needn't be an lvalue; it can be any
arbitrary expression of pointer type.
Agreed. Thanks.
The unqualified word "pointer" very often means "pointer object",
but it can also mean "pointer value"; see, for example, the Standard's
description of the value returned by malloc().
Agreed. Thanks.
I think the author of
the above paragraph just momentarily failed to make the distinction
properly.
Possibly. But the use of "operand" in the other sentences, the use of
"value of" in other parts of the text for other operators, does leave
me uncertain.
The intent is that applying "*" to an invalid pointer value has
undefined behavior (a null pointer is "invalid" in this context).
I'm sure beyond reasonable doubt that the authors intended this to
apply whether the operand is an lvalue or not.  Apart from having
it apply only to lvalues being nonsensical, if that were the intent
it could have been worded much more clearly.
This intent certainly seems likely if we treat the responses to this
discussion as providing evidence for intended meaning to common
interpretation. Thank you for this type of evidence, for what it's
worth.
 
Seebs

Thank you for your opinion. I would have also appreciated some
reasoning, in order to help to convince myself of this. The opinion
is appreciated regardless of the lack of reasoning. Poll-wise,
dereferencing a null pointer is undefined behaviour during
evaluation. Reason-wise, it's still incomplete, for me.

I really don't think the problem is reasoning, because you've seen tons
of that, and it's had no effect whatsoever. I don't know what the issue
is; this really is pretty straightforward. '(char *) 0' is a pointer, and
it does not point to a valid object, therefore, '*(char *) 0' is undefined
behavior to evaluate. In the abstract machine, expressions are evaluated,
with a few specialized exceptions (such as sizeof). So whether or not you
cast it, or use the value, the expression *is* evaluated, and evaluating the
expression produces undefined behavior.

I honestly can't see what the confusing part is. The expression is
evaluated, and evaluating the expression yields undefined behavior, therefore
undefined behavior occurs. There's nothing tricky or fancy going on.
The only possible problems with the wording are cases in which it's still
clear what is intended, even if the text is poorly-phrased.

-s
 
Seebs

But it also defines the result in terms of its type, based on the type
of the operand. In our situation, this is defined. The omission of
when the pointer does not point to an object only impacts the
definition of the result, _when_ that result is an _lvalue_.

No, it impacts the definition of what happens. There is no definition
given for what happens.

Okay, look at it this way:

*(char *) 0

What is the defined result of evaluating this expression? Show us the
explicit definition of what you get when you evaluate this.

If you can't, it's undefined. Failure-to-define is sufficient to make
behavior undefined; even if you don't think there's an explicit statement that
it's undefined, unless you can provide the definition, it's still undefined.

-s
 
