Proposal for Amendment to Section 6.5.3.2, Unary * Operator

Shao Miller

A proposal for an amendment of C99's section 6.5.3.2 follows:

--- AMENDMENT ---

6.5.3.2 Address and indirection operators

Constraints

1 <NO AMENDMENT PROPOSED>

2 The operand of the unary * operator shall have pointer type. <NO
AMENDMENT PROPOSED>

3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

4 Except when the unary * operator and its operand are together the
operand to the unary & operator, the operand of the unary * operator
shall not be a pointer type pointing to an incomplete type.

Semantics

5 <AMENDED FROM POINT 3 TO 5. NO OTHER AMENDMENT PROPOSED>

6 <AMENDED FROM POINT 4 TO 6> The unary * operator denotes
indirection. If the operand points to a function, the result is a
function designator; if it points to an object, the result is an
lvalue designating the object. If the operand has type ``pointer to
type'', the result has type ``type''. If the operand is an invalid
value, the behavior of the unary * operator is undefined.87)

Forward references: storage-class specifiers (6.7.1), structure and
union specifiers
(6.7.2.1). <NO AMENDMENT PROPOSED>

.... ... ... <NO AMENDMENTS PROPOSED BEFORE FOOTNOTE 87>

87) Thus, &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to ((E1)+(E2)). It is always true that if E is a function
designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P
is an lvalue and T is the name of an object pointer type, *(T)P is an
lvalue that has a type compatible with that to which T points.

Among the invalid values for dereferencing a pointer by the unary *
operator are a null pointer, an address inappropriately aligned for
the type of object pointed to, and the address of an object after the
end of its lifetime.

--- INTENDED AUDIENCE ---

The intended audience for this proposal is:

- The ISO/IEC JTC1/SC22/WG14 international standardization working
group for the programming language C

- C implementors (developers of C implementations with conformance to
the standard in mind)

- C programmers

--- RATIONALE ---

The previous text in the C99 international standard allowed for
certain language constructs to lead to undefined behaviour which a C
implementation might choose to define. This amendment encourages
implementators and programmers to avoid allowance of and usage of
these constructs, respectively, to help to maximize the portability of
a C program across different implementations.

These language constructs claimed above as leading to undefined
behaviour are perhaps rarely explored or used historically, so there
should be little impact to implementations beyond two additional
constraint violations to diagnose.

--- IMPLICATIONS FOR C IMPLEMENTORS ---

The amendment's constraint 3 requires a diagnostic message given code
like:

int main(void) {
int i = *(int *)0;
return 0;
}

because it is known at translation-time (pre-evaluation) that this is
an attempt to access an object pointed to by a null pointer, which is
erroneous.

The amendment still allows for:

#include <stddef.h>
size_t sz = sizeof *(int *)0;

and:

int *ip = &*(int *)0;
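
A minimal sketch of why these two forms are harmless under the existing C99 text: neither one evaluates the indirection, so no object at a null address is ever accessed. The function name unevaluated_forms_ok is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when both "still allowed" forms
   behave as the C99 text describes. */
int unevaluated_forms_ok(void)
{
    /* The operand of sizeof is not evaluated here; only its type is used. */
    size_t sz = sizeof *(int *)0;

    /* In &*E neither operator is evaluated (C99 6.5.3.2p3), so the
       result is simply the null pointer E converted to int *. */
    int *ip = &*(int *)0;

    return sz == sizeof(int) && ip == NULL;
}
```

Calling unevaluated_forms_ok() from main should yield 1 on a conforming implementation, since neither statement accesses any object.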

The amendment's constraint 4 requires a diagnostic message given code
like:

void func(void *vp) {
return *vp;
}

and:

int main(void) {
int i = 5;
int *ip = &i;
*(void *)ip;
return 0;
}

and:

#include <stddef.h>
struct s;
struct s *sp = NULL;
size_t sz = sizeof *(struct s *)ip;

because:

- it is intended to disallow the unary-expression of the unary *
operator and its operand to constitute a void expression merely
because the operand has type ``pointer to void'' (satisfying
constraint 2 and pre-amendment semantic point 4, sentence 3).

- it is intended to disallow the unary-expression of the unary *
operator and its operand to produce a result which has an incomplete
type.

The amendment still allows for:

int main(void) {
int i = 5;
void *vp = &i;
int *ip = &*vp;
return 0;
}

and:

#include <stddef.h>
struct s;
struct s *sp = NULL;
int main(void) {
void *vp = &*sp;
return 0;
}

where it is intended to allow the semantics to be followed so that the
value of a pointer to an incomplete type can be assigned to another
pointer, by not evaluating either the unary & or unary * operator.

The amendment's semantic point 6 removes the mention of an assignment
and should have no implications.

--- IMPLICATIONS FOR C PROGRAMMERS ---

Any code applying the unary * operator to a null pointer constant, or
a cast thereof, other than as an operand of sizeof or unary &, will
draw a diagnostic message as a constraint violation.

Any code applying the unary * operator to a pointer to an incomplete
type, other than as an operand of unary &, will draw a diagnostic
message as a constraint violation.

Any code that assumes there must be an invalid value assigned to some
pointer in order for the behaviour to be undefined for applying the
unary * operator to any operand is erroneous code, by this amendment.

--- CONSTRUCTIVE FEEDBACK ---

If you please! Thank you!

- Shao Miller
 
Shao Miller

Gah.  "Implementators" is a silly typo.
Double-gah. The code example:

#include <stddef.h>
struct s;
struct s *sp = NULL;
size_t sz = sizeof *(struct s *)ip;

should instead read:

#include <stddef.h>
struct s;
struct s *sp = NULL;
size_t sz = sizeof *sp;
 
Marcin Grzegorczyk

Shao said:
[...]
4 Except when the unary * operator and its operand are together the
operand to the unary & operator, the operand of the unary * operator
shall not be a pointer type pointing to an incomplete type.

Instead of "pointer type pointing to an incomplete type", you could just
say "pointer to an incomplete type". In addition, recently there has
been a consensus that completeness is a scoped property of a type
(see N1439), and the current C1X draft has adopted the change, so it
should really be "pointer to an incomplete object type".
[...]
87) Thus, &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to ((E1)+(E2)). It is always true that if E is a function
designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P
is an lvalue and T is the name of an object pointer type, *(T)P is an
lvalue that has a type compatible with that to which T points.

Among the invalid values for dereferencing a pointer by the unary *
operator are a null pointer, an address inappropriately aligned for
the type of object pointed to, and the address of an object after the
end of its lifetime.

Perhaps that's because it's getting late here, but I fail to see any
difference between the above wording and that of footnote 87 in N1256.
 
Shao Miller

Shao said:
[...]
4 Except when the unary * operator and its operand are together the
operand to the unary & operator, the operand of the unary * operator
shall not be a pointer type pointing to an incomplete type.

Instead of "pointer type pointing to an incomplete type", you could just
say "pointer to an incomplete type".  In addition, recently there has
been a consensus that completeness is a scoped property of a type
(see N1439), and the current C1X draft has adopted the change, so it
should really be "pointer to an incomplete object type".

Thanks for the constructive feedback, Marcin!

I will have to look for this C1X draft. Is 'void' considered an
incomplete object type in C1X, do you know? I will try to find the
draft you've kindly referred to and find out, but if it's readily
available in your memory, please do offer it. :)
[...]
87) Thus, &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to ((E1)+(E2)). It is always true that if E is a function
designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P
is an lvalue and T is the name of an object pointer type, *(T)P is an
lvalue that has a type compatible with that to which T points.

Among the invalid values for dereferencing a pointer by the unary *
operator are a null pointer, an address inappropriately aligned for
the type of object pointed to, and the address of an object after the
end of its lifetime.

Perhaps that's because it's getting late here, but I fail to see any
difference between the above wording and that of footnote 87 in N1256.

Excellent catch. The amendment should have:

87) <NO AMENDMENT PROPOSED FOR FOOTNOTE>

My sincere apologies for the time spent comparing. :(

And another thank-you.
 
Tim Rentsch

Shao Miller said:
A proposal for an amendment of C99's section 6.5.3.2 follows:
[snip unaffected passages.]

3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

Worse than pointless.

4 Except when the unary * operator and its operand are together the
operand to the unary & operator, the operand of the unary * operator
shall not be a pointer type pointing to an incomplete type.

A waste of time, or (more likely) worse.
 
Shao Miller

Shao Miller said:
A proposal for an amendment of C99's section 6.5.3.2 follows:
[snip unaffected passages.]
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

Worse than pointless.

Are you suggesting that an implementation should be allowed to choose
to define the behaviour upon application of unary '*' to a null
pointer constant expression during translation? Are you suggesting
that there's no value in issuing a diagnostic message to the
programmer that they are trying to do something which is silly? I'm
sure you have good reason(s) for saying this. Would you kindly
share? Is there already a diagnostic message that covers this?

A waste of time, or (more likely) worse.

Is there already a diagnostic message that covers this? Are there
circumstances in which it might be useful to apply unary '*' to an
incomplete type?
 
Marcin Grzegorczyk

Shao said:
[...]
I will have to look for this C1X draft. Is 'void' considered an
incomplete object type in C1X, do you know?

Yes; the wording of 6.2.5p19 is unchanged from N1256 except that
"incomplete type" has been replaced with "incomplete object type".

I will try to find the
draft you've kindly referred to and find out, but if it's readily
available in your memory, please do offer it. :)

All the documents published by WG14 can be easily found on their
website: <http://www.open-std.org/jtc1/sc22/wg14/www/documents>
 
Eric Sosman

Shao Miller said:
A proposal for an amendment of C99's section 6.5.3.2 follows:
[snip unaffected passages.]
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

Worse than pointless.

Are you suggesting that an implementation should be allowed to choose
to define the behaviour upon application of unary '*' to a null
pointer constant expression during translation?

Yes, of course. An implementation is always permitted to
provide its own definitions for behaviors the Standard leaves
undefined. In fact, it may be impossible for an implementation
to avoid doing so: If your program indulges in undefined behavior,
it will do *something* on any concrete implementation, and that
something, whatever it is, serves as a definition.

For example, many implementations define the behavior upon an
attempted dereference of NULL as "a SIGSEGV signal occurs."
 
Ike Naar

3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

The "unary * operator shall not be a null pointer constant expression"
seems already covered by the standard:

An integral constant expression with the value 0, or such an
expression cast to type void * , is called a null pointer constant. If
a null pointer constant is assigned to or compared for equality to a
pointer, the constant is converted to a pointer of that type. Such a
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.

This says a null pointer constant (`` 0 '', `` 1/42 '', `` '-'-'-' '', etc)
is converted to a pointer when it is assigned to or compared for equality
to a pointer, not when it is the operand of an ``*'' operator.
So `` * 0 '' applies the ``*'' operator to an integer, and that's a
violation of the constraint

The operand of the unary * operator shall have pointer type.

A little experiment with three available compilers shows that
all of them reject the construct `` * 0 '':
invalid type argument of `unary *' (GNU)
cannot dereference non-pointer type (Sun)
error: operand of "*" must be a pointer (Comeau)
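
The conversion rule quoted above can be sketched as follows: each unusual-looking integer constant expression with value 0 only becomes a null pointer when it is assigned to or compared with a pointer. The function name null_constants_agree is hypothetical, and some compilers may warn about the less conventional spellings.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every null pointer constant below
   converts to a null pointer on assignment. */
int null_constants_agree(void)
{
    int *a = 0;           /* the usual spelling */
    int *b = 1/42;        /* integer constant expression with value 0 */
    int *c = '-'-'-';     /* '-' minus '-', also value 0 */
    void *d = (void *)0;  /* such an expression cast to void * */

    /* The conversion happens in the initialisations and comparisons,
       never because of a unary * operator. */
    return a == NULL && b == NULL && c == NULL && d == NULL
        && a == b && b == c;
}
```

By contrast, writing * 0 never triggers that conversion, which is why the compilers listed above reject it as a non-pointer operand.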
 
lawrence.jones

In comp.std.c Shao Miller said:
Are you suggesting that an implementation should be allowed to choose
to define the behaviour upon application of unary '*' to a null
pointer constant expression during translation?

Many implementations currently allow dereferencing a null pointer (at
least for reading) to access the data at address 0. Although no C
object can be located there, the data might still be interesting. Other
implementations trap when a null pointer is dereferenced (at least for
writing) and some programs depend on that behaviour to produce a trap.
Why should such implementations be prohibited from allowing such
accesses in the most obvious manner? Particularly since dereferencing a
null pointer constant by accident is exceedingly unlikely.
 
Keith Thompson

The "unary * operator shall not be a null pointer constant expression"
seems already covered by the standard:

An integral constant expression with the value 0, or such an
expression cast to type void * , is called a null pointer constant. If
a null pointer constant is assigned to or compared for equality to a
pointer, the constant is converted to a pointer of that type. Such a
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.

This says a null pointer constant (`` 0 '', `` 1/42 '', `` '-'-'-' '', etc)
is converted to a pointer when it is assigned to or compared for equality
to a pointer, not when it is the operand of an ``*'' operator.
So `` * 0 '' applies the ``*'' operator to an integer, and that's a
violation of the constraint

The operand of the unary * operator shall have pointer type.

A little experiment with three available compilers shows that
all of them reject the construct `` * 0 '':
invalid type argument of `unary *' (GNU)
cannot dereference non-pointer type (Sun)
error: operand of "*" must be a pointer (Comeau)

(void*)0 is also a null pointer constant.

Shao is suggesting, I think, that *(void*)0 should be a constraint
violation, not merely undefined behavior.
 
Ike Naar

Ike Naar said:
Shao Miller said:
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

The "unary * operator shall not be a null pointer constant expression"
seems already covered by the standard:
[snip]
So `` * 0 '' applies the ``*'' operator to an integer, and that's a
violation of the constraint
[snip]

(void*)0 is also a null pointer constant.

Shao is suggesting, I think, that *(void*)0 should be a constraint
violation, not merely undefined behavior.

Okay.

Looking at the other half of Shao's amendment #3:
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

According to the current standard, *(void*)0 and *(T*)0 for type T
are merely undefined behaviour, not a constraint violation.
Implementations are still allowed to define these constructs in
some way for their own purpose, and there exist implementations
that use this freedom, e.g. to implement the offsetof macro.

Wouldn't amendment #3 break such implementations?
 
Shao Miller

A proposal for an amendment of C99's section 6.5.3.2 follows:
[snip unaffected passages.]
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

Worse than pointless.

Are you suggesting that an implementation should be allowed to choose
to define the behaviour upon application of unary '*' to a null
pointer constant expression during translation?

Yes, of course. An implementation is always permitted to
provide its own definitions for behaviors the Standard leaves
undefined.  In fact, it may be impossible for an implementation
to avoid doing so: If your program indulges in undefined behavior,
it will do *something* on any concrete implementation, and that
something, whatever it is, serves as a definition.

For example, many implementations define the behavior upon an
attempted dereference of NULL as "a SIGSEGV signal occurs."

Right. Absolutely. But why should anyone attempt to dereference a
null pointer _constant_ via unary '*'? The proposal concerned
constraints whose violations can be determined at translation-time.
Null pointer dereferencing at run-time should not be any different.

It seems to me that one would only wish to dereference a null pointer
constant via unary '*' if they were expecting an object at address
0... Such as the interrupt vector table on x86, for instance, without
mapping it somewhere else. If this is a reason why this isn't
_already_ a constraint, so be it. :)
 
Shao Miller

Ike Naar said:
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

The "unary * operator shall not be a null pointer constant expression"
seems already covered by the standard:
[snip]
So `` * 0 '' applies the ``*'' operator to an integer, and that's a
violation of the constraint
[snip]

(void*)0 is also a null pointer constant.

Shao is suggesting, I think, that *(void*)0 should be a constraint
violation, not merely undefined behavior.

Okay.

Looking at the other half of Shao's amendment #3:
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

According to the current standard, *(void*)0 and *(T*)0 for type T
are merely undefined behaviour, not a constraint violation.
Implementations are still allowed to define these constructs in
some way for their own purpose, and there exist implementations
that use this freedom, e.g. to implement the offsetof macro.

Wouldn't amendment #3 break such implementations?

Yes, Keith is right. '0' is already a constraint violation, since
it's not a pointer. '0' (and its friends that you listed) cast to any
pointer type would be against the proposed constraint, since it's
silly to write code that knowingly attempts this... Right?

GCC (as an easily accessible example) implements 'offsetof' using the
binary '->' membership operator, but not unary '*' (if I recall
correctly). Perhaps you are quite right and an implementation might
use '&(*(type *)0).member' rather than '&((type *)0)->member'. But
then, if they allow for the dereference there... Isn't that a can
o'worms against application of unary '*' to a null pointer in the rest
of a programmer's code?

Of course, a switch to turn off conformance (due to special
circumstances) might be an option, too.
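
For readers unfamiliar with the idiom under discussion, here is a rough sketch of the two null-pointer-based spellings of an offsetof-style macro. This is not how any particular implementation is known to define offsetof; the macro names MY_OFFSETOF_ARROW and MY_OFFSETOF_STAR and the struct sample type are invented for illustration, and both macros rely on behaviour the standard leaves undefined, so they only work where the implementation chooses to support them.

```c
#include <stddef.h>

/* Hypothetical structure used only for this illustration. */
struct sample {
    char   tag;
    double value;
    int    count;
};

/* Non-portable sketches: both form the address of a member through a
   null pointer, which standard C leaves undefined. Many mainstream
   compilers happen to compute the member offset. */
#define MY_OFFSETOF_ARROW(type, member) ((size_t)&((type *)0)->member)
#define MY_OFFSETOF_STAR(type, member)  ((size_t)&(*(type *)0).member)

/* Returns 1 when both spellings agree with the standard offsetof. */
int offsets_match(void)
{
    return MY_OFFSETOF_ARROW(struct sample, value) == offsetof(struct sample, value)
        && MY_OFFSETOF_STAR(struct sample, value)  == offsetof(struct sample, value)
        && MY_OFFSETOF_ARROW(struct sample, count) == offsetof(struct sample, count)
        && MY_OFFSETOF_STAR(struct sample, count)  == offsetof(struct sample, count);
}
```

Since (E)->member is defined as (*(E)).member, the two macros are the same expression in different clothing; under the proposed constraint 3 both would presumably need a diagnostic.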
 
Shao Miller

Many implementations currently allow dereferencing a null pointer (at
least for reading) to access the data at address 0.  Although no C
object can be located there, the data might still be interesting.  Other
implementations trap when a null pointer is dereferenced (at least for
writing) and some programs depend on that behaviour to produce a trap.
Why should such implementations be prohibited from allowing such
accesses in the most obvious manner?  Particularly since dereferencing a
null pointer constant by accident is exceedingly unlikely.

Aha. Do you mean as 'char' or such, Larry? As in, not a C object of
the program nor of the implementation, but of the external
environment.

I agree that dereferencing a null pointer constant by accident is
unlikely.

But for such interesting scenarios, could switching off conformance
for this constraint be an acceptable price to pay for a programmer?
Or is undefined behaviour better here; allowing the implementation to
make the choice and define the behaviour? The proposal has
portability in mind, but perhaps this rare circumstance _needs_ to be
un-portable?

Some additional thank-yous are in order. Thanks Tim, Eric, Ike,
Keith, Larry!
 
Richard Kettlewell

Many implementations currently allow dereferencing a null pointer (at
least for reading) to access the data at address 0. Although no C
object can be located there, the data might still be interesting.
Other implementations trap when a null pointer is dereferenced (at
least for writing) and some programs depend on that behaviour to
produce a trap. Why should such implementations be prohibited from
allowing such accesses in the most obvious manner? Particularly since
dereferencing a null pointer constant by accident is exceedingly
unlikely.

It's perhaps also worth mentioning that any given implementation is
already free to warn about *NULL without any change to the standard at
all, just as they warn about all sorts of other things.

Perhaps Shao should be lobbying implementors instead?
 
James Kuyper

On Aug 1, 11:08 pm, (e-mail address removed) wrote: ....
Aha. Do you mean as 'char' or such, Larry? As in, not a C object of
the program nor of the implementation, but of the external
environment.

"char" has nothing to do with it; It could just as easily be *(struct
tm*)0 that is the "interesting data" stored in that memory location.

Also, it's entirely possible that it is a block of memory used by the
implementation, so long as it's a block of memory not accessible to C
programs with defined behavior. That excludes not only the objects
defined by the C program, but also those objects defined by the C
standard library whose addresses are made available to the program, such
as the char array whose address is returned by asctime(). That's because
a null pointer must not compare equal to any of those addresses.

Also, keep in mind that the "implementation" includes everything needed
to make a C program behave in the manner required by the standard - it
therefore includes not only the compiler and the linker, but also the
operating system and even the hardware that all of that code is running
on. Therefore, memory used by the operating system or the hardware also
counts as memory used by the implementation.

....
But for such interesting scenarios, could switching off conformance
for this constraint be an acceptable price to pay for a programmer?
Or is undefined behaviour better here; allowing the implementation to
make the choice and define the behaviour? The proposal has
portability in mind, but perhaps this rare circumstance _needs_ to be
un-portable?

No, the whole point of making it undefined is to avoid forcing such
implementations to be non-conforming.

This is one of the common reasons for behavior not being defined by the
C standard: there exist multiple different behaviors, each of which is
the most reasonable behavior in at least one environment, so the
committee decided not to mandate any one of those behaviors. If the list of
reasonable behaviors is small enough, or at least easily described, the
standard may leave the behavior unspecified, but require it to be
one of the items on such a list. However, when the reasonable range of
behaviors is sufficiently varied or sufficiently extreme, the standard
goes one step further and makes the behavior undefined, as it does in
this case.
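
To make the "unspecified" category concrete, here is a small sketch using a standard example that is not from this thread: the order in which function arguments are evaluated. The standard permits either order but nothing else, in contrast to the null dereference case where no list of outcomes is given. All names below (note, add, argument_order) are hypothetical.

```c
#include <stddef.h>

static char call_log[3];   /* two recorded ids plus a terminator */
static size_t call_count;

/* Records which argument expression ran, in the order it ran. */
static int note(char id)
{
    call_log[call_count++] = id;
    return 0;
}

static int add(int x, int y)
{
    return x + y;
}

/* Returns "ab" or "ba": the evaluation order of the two arguments is
   unspecified, so either result is allowed, but no third outcome is. */
const char *argument_order(void)
{
    call_count = 0;
    call_log[2] = '\0';
    (void)add(note('a'), note('b'));
    return call_log;
}
```

A portable program can rely on getting one of the two strings, but not on which one.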

A fundamental objective of the C standard is to be flexible enough to
allow fully conforming implementations just about everywhere; as a
result, there are in fact fully conforming C implementations for just
about every platform (at least, for C89 - fully conforming
implementations of C99 are much rarer). A great many of the languages
that are not as flexible are implemented by relying upon C in some way,
precisely because they can rely upon C being available. One cost of this
flexibility is that the programmer cannot count on the behavior of
certain constructs being the same on all implementations of C.

However, that just means that such constructs should only be used in
code that's intended to be implementation-specific; and even then, such
constructs should only be used if more portable ways of achieving the
same objective are either not available or unacceptably inefficient.
 
Eric Sosman

A proposal for an amendment of C99's section 6.5.3.2 follows:
[snip unaffected passages.]
3 Except when the unary * operator and its operand are not evaluated,
the operand of the unary * operator shall not be a null pointer
constant expression, or such an expression cast to any pointer type.

Worse than pointless.

Are you suggesting that an implementation should be allowed to choose
to define the behaviour upon application of unary '*' to a null
pointer constant expression during translation?

Yes, of course. An implementation is always permitted to
provide its own definitions for behaviors the Standard leaves
undefined. In fact, it may be impossible for an implementation
to avoid doing so: If your program indulges in undefined behavior,
it will do *something* on any concrete implementation, and that
something, whatever it is, serves as a definition.

For example, many implementations define the behavior upon an
attempted dereference of NULL as "a SIGSEGV signal occurs."

Right. Absolutely. But why should anyone attempt to dereference a
null pointer _constant_ via unary '*'?

To invoke the behavior a particular implementation defines
for that construct, of course.

The proposal concerned
constraints whose violations can be determined at translation-time.
Null pointer dereferencing at run-time should not be any different.

"Violation?" It is not a "violation" to invoke undefined
behavior, it's just venturing beyond the Standard's own guarantees.
The fact that the Standard is silent on some point does not imply
that all other standards and system definitions are.

It seems to me that one would only wish to dereference a null pointer
constant via unary '*' if they were expecting an object at address
0... Such as the interrupt vector table on x86, for instance, without
mapping it somewhere else. If this is a reason why this isn't
_already_ a constraint, so be it. :)

Your example, it seems to me, answers your "why should anyone?"
question pretty well.
 
