Proposal for Amendment to Section 6.5.3.2, Unary * Operator


Andrew Smallshaw

Many implementations currently allow dereferencing a null pointer (at
least for reading) to access the data at address 0. Although no C
object can be located there, the data might still be interesting. Other
implementations trap when a null pointer is dereferenced (at least for
writing) and some programs depend on that behaviour to produce a trap.

That was precisely the reason that came to mind for me: deliberately
triggering an exception as a kind of "hardcoded breakpoint" to
trace through interesting areas of code without the need to set
large numbers of breakpoints manually in the debugger. There are
cleaner ways of doing this of course, such as assertions, but not
all platforms trigger a breakpoint on a failed assert(). It seems
to me that (in C at least) attempting to second guess what the
programmer may or may not want to do is generally a bad idea.
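For comparison, one of the "cleaner ways" could be sketched roughly as follows. This is only an illustration and not code from the thread: `trace_point`, `TRACE_POINT`, `trace_hits` and `trace_should_abort` are hypothetical names, and it assumes that calling `abort()` is an acceptable way to stop. Whether a debugger actually breaks at that point remains platform-specific.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper names; nothing here comes from the thread. */
static unsigned long trace_hits;   /* how many trace points have been reached */
static int trace_should_abort;     /* set nonzero to stop at the next trace point */

static void trace_point(const char *file, int line)
{
    ++trace_hits;
    fprintf(stderr, "trace point reached at %s:%d\n", file, line);
    if (trace_should_abort)
        abort();   /* raises SIGABRT; whether a debugger breaks here is platform-specific */
}

#define TRACE_POINT() trace_point(__FILE__, __LINE__)

/* Example of an "interesting area of code" instrumented with a trace point. */
static int interesting_code(int x)
{
    TRACE_POINT();
    return x * 2;
}
```

Unlike a deliberate null pointer dereference, this stays within behaviour the Standard defines, at the price of not trapping in the same way on every platform.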
 

Shao Miller

"char" has nothing to do with it; It could just as easily be *(struct
tm*)0 that is the "interesting data" stored in that memory location.
Agreed. I should have said "a C type" instead of "'char' or
such". :) Sorry. What I was trying to ask of Larry was how "Many
implementations currently allow...to access the data at address 0" if
"no C object can be located there". I think he meant "by using a C
type,"; that we pretend/act-as-if there's a C object there. 'char'
was just an example of a C type.
Also, it's entirely possible that it is a block of memory used by the
implementation, so long as it's a block of memory not accessible to C
programs with defined behavior. That excludes not only the objects
defined by the C program, but also those objects defined by the C
standard library whose addresses are made available to the program, such
as the char array whose address is returned by asctime(). That's because
a null pointer must not compare equal to any of those addresses.
Right.
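The guarantee being relied on here can be shown in a small sketch. The function name `object_addresses_are_non_null` is made up for illustration; the only assumption is a hosted implementation that provides `asctime()`.

```c
#include <stddef.h>
#include <time.h>

/* Returns 1 if none of the object addresses below compares equal to a null pointer. */
static int object_addresses_are_non_null(void)
{
    static int static_object;
    int automatic_object = 0;
    struct tm when = {0};
    const char *text;

    when.tm_mday = 1;          /* keep every field within its normal range */
    text = asctime(&when);     /* points into storage owned by the library */

    return &static_object != NULL
        && &automatic_object != NULL
        && text != NULL;
}
```

A conforming implementation has to make this return 1, which is why whatever lives at the null pointer's address cannot be any of these objects.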

Also, keep in mind that the "implementation" includes everything needed
to make a C program behave in the manner required by the standard - it
therefore includes not only the compiler and the linker, but also the
operating system and even the hardware that all of that code is running
on. Therefore, memory used by the operating system or the hardware also
counts as memory used by the implementation.
Whatever provides the abstract machine; sure. :)
...


No, the whole point of making it undefined is to avoid forcing such
implementations to be non-conforming.
But that is actually a "yes" and not a "no." :)
This is one of the common reasons for behavior not being defined by the
C standard: there exist multiple different behaviors, each of which is
the most reasonable behavior in at least one environment, so the
committee decided not to mandate any one of those behaviors. If the list of
reasonable behaviors is small enough, or at least easily described, the
standard may leave the behavior unspecified, but require it to be
one of the items on such a list. However, when the reasonable range of
behaviors is sufficiently varied or sufficiently extreme, the standard
goes one step further and makes the behavior undefined, as it does in
this case. Right.
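For contrast with the undefined case, here is a small sketch of unspecified behavior, where the Standard permits a short list of outcomes. The helper names (`note`, `add`, `evaluate_arguments`, `order`) are hypothetical. The order in which the two arguments are evaluated is unspecified, so the recorded order may be either "ab" or "ba", but nothing else.

```c
/* Hypothetical helpers that record the order in which they are called. */
static char order[3];
static int position;

static int note(char tag)
{
    order[position++] = tag;
    return 0;
}

static int add(int a, int b)
{
    return a + b;
}

static void evaluate_arguments(void)
{
    position = 0;
    order[2] = '\0';
    /* The order in which the two arguments are evaluated is unspecified. */
    (void)add(note('a'), note('b'));
}
```

A program that works with either order is portable; one that needs a particular order is relying on its implementation.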


A fundamental objective of the C standard is to be flexible enough to
allow fully conforming implementations just about everywhere; as a
result, there are in fact fully conforming C implementations for just
about every platform (at least, for C89 - fully conforming
implementations of C99 are much rarer). A great many of the languages
that are not as flexible, are implemented by relying upon C in some way,
precisely because they can rely upon C being available. One cost of this
flexibility is that the programmer cannot count on the behavior of
certain constructs being the same on all implementations of C.
Right. C has a great Standard. :)
However, that just means that such constructs should only be used in
code that's intended to be implementation-specific; and even then, such
constructs should only be used if more portable ways of achieving the
same objective are either not available or unacceptably inefficient.
Agreed.

So then, the proposal seems to have negative merit by your post, since
the cost of removing implementations' freedoms (via conformance) on
the matter is not outweighed by the gain for programmers of their code
being well-defined by the Standard (portable), where that code
involves the current subject matter. That is, such code needs to be
un-portable. Fair enough.

Thanks, James. :)
 

Shao Miller

     "Violation?"  It is not a "violation" to invoke undefined
behavior, it's just venturing beyond the Standard's own guarantees.
The fact that the Standard is silent on some point does not imply
that all other standards and system definitions are.
Perhaps I wasn't clear, sorry. I was talking about the constraints of
the proposal. There would be no undefined behaviour, granting the
constraints of the proposal, so there would be no invocation of
undefined behaviour. I was not suggesting that there is currently any
constraint violation by attempting these things. There isn't, and
that's the point of the proposal.
     Your example, it seems to me, answers your "why should anyone?"
question pretty well.
Thanks, Eric. :)
 

Shao Miller

That was precisely the reason that came to mind for me: deliberately
triggering an exception as a kind of "hardcoded breakpoint" to
trace through interesting areas of code without the need to set
large numbers of breakpoints manually in the debugger.  There are
cleaner ways of doing this of course, such as assertions, but not
all platforms trigger a breakpoint on a failed assert().  It seems
to me that (in C at least) attempting to second guess what the
programmer may or may not want to do is generally a bad idea.
Thanks, Andrew. :)

I get the impression that multiple posters believe that:
- Dereferencing a null pointer or a 'void *' has its uses
- Those uses are implementation-specific
- Dereferencing a null pointer or a 'void *' is thus deliberately
undefined

Fair enough. :)
 

Shao Miller

They are trying to do something which _you think_ is silly. The very
few times that I have written code like that I had very good reasons
to do it. It is very unlikely that anyone would write that kind of
code not knowing what they are doing.
Wrong. I don't think it's silly at all. If you read other posts of
mine in comp.lang.c or other posts in this very thread ("x86 interrupt
vector table"), I think you'll reconsider this statement.

But now it has been established by multiple posters why this does not
belong in the C Standard. It was possible that posters might have
said, "Yeah! Great proposal." There was only one way to find out.
 

Shao Miller

Let's say you write code for MacOS 9 and you want to make sure that
you detect when your code dereferences null handles. What would you
do?
I don't see what this has to do with the bit of post you've included.

Translation-time: Null pointer constants -> integer constant
expression
Run-time: Null pointer -> The value of an object yielding a null
pointer

int main(void) {
  char byte_at_zero;
  /* Translation-time null pointer dereference */
  byte_at_zero = *(char *)0;
  return 0;
}

int main(void) {
  int x = 0;
  char *cp = (char *)x;
  char byte_at_zero;
  /* Run-time null pointer dereference */
  byte_at_zero = *cp;
  return 0;
}

See how the latter requires a little extra effort to dereference a
null pointer while the former would issue a diagnostic message by the
proposed amendment?
If you don't know, then you don't know enough to figure out the
negative consequences of your proposal.
Do you understand the above implications of the proposed constraints?

In your question "What would you do?", _when_ do you wish to determine
if your code dereferences null pointers? _Before_ execution or
_after_ execution has initiated?
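A side note on the two cases, as a hedged sketch rather than a statement about any particular compiler: only the conversion of a null pointer constant is guaranteed by the Standard to yield a null pointer, while converting an integer object that happens to hold zero is implementation-defined. The function names are hypothetical, and `uintptr_t` (an optional type) is used in place of `int` only to avoid a size-mismatch warning.

```c
#include <stddef.h>
#include <stdint.h>   /* uintptr_t is optional in the Standard; assumed available here */

static int constant_zero_is_null(void)
{
    char *cp = (char *)0;   /* 0 is a null pointer constant, so cp is a null pointer */
    return cp == NULL;
}

static int runtime_zero_is_null(void)
{
    uintptr_t x = 0;
    char *cp = (char *)x;   /* implementation-defined conversion of an integer value */
    return cp == NULL;      /* 1 on most platforms, but not guaranteed */
}
```

On the common platforms where a null pointer is all bits zero, both functions return 1, which is what the run-time example relies on.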
 

Seebs

They are trying to do something which _you think_ is silly. The very
few times that I have written code like that I had very good reasons
to do it. It is very unlikely that anyone would write that kind of
code not knowing what they are doing.

Exactly. "Undefined" is precisely right; it should not be a constraint
violation, nor should it have defined behavior. An implementation is welcome
to document what they choose to do in the case of a particular kind of
undefined behavior, and some do to good ends.

-s
 

Shao Miller

It wasn't about dereferencing null pointers, it was about
dereferencing null _handles_ in MacOS 9 and earlier.
Well if you really think that it would be such a terribly good idea
for me to put my reading glasses on, then I shall oblige. :) You
could have written "null jellyfish" and it probably still would have
read as "null pointers" at the time.
A "handle" in
that OS is a pointer to a pointer to the memory that you want to
access. Dereferencing a null handle would first read a pointer from
location zero in memory, then dereference that pointer. And since
memory location zero was not protected in any way, and contained zero
bytes, reading from a null handle would read garbage data, while
writing through a null handle would likely crash your system.
Right.
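The double indirection described above can be sketched in portable C. This is only a model of the idea: the `Handle` typedef and `read_through_handle` are hypothetical and are not the Mac OS Toolbox definitions. It shows the two separate dereferences, either of which can go wrong.

```c
#include <stddef.h>

typedef char **Handle;   /* hypothetical stand-in for a pointer-to-pointer handle */

/* Reads one byte through a handle; returns 1 on success, 0 if either level is null. */
static int read_through_handle(Handle h, char *out)
{
    if (h == NULL || *h == NULL)
        return 0;
    *out = **h;   /* first * fetches the pointer, second * fetches the data */
    return 1;
}
```

Without the explicit checks, a null handle would make the first dereference read whatever is stored at location zero and then use it as a pointer, which is the situation the 0xdeadbeef trick is meant to catch.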

Executing a simple statement * (long *) NULL = 0xdeadbeef;
Ugh. Well that's perfectly fair.
would
rectify the situation. Reading through a null handle would first read
the value 0xdeadbeef, converted to a pointer, and dereferencing that
pointer would give a bus error - exactly what you want. And writing
through a null handle would also give a bus error, crashing the
application but leaving the system running - exactly what you want.
What I think you are saying is that the proposal would require major
code changes, where:

* (long *) NULL = 0xdeadbeef;

would require replacement with something like:

typedef long *handle;
handle safety_handle = NULL;
/* ... */
*safety_handle = 0xdeadbeef;

Which would obviously be a nuisance on established codebases.
You are proposing, for no good reason whatsoever, a change that makes
this illegal. As I said, you didn't understand the consequences.
It's totally logical for you to believe that. I have to disagree,
nonetheless. Please do not equate the proposal and my arguments with
a personal preference nor with a lack of understanding of the
consequences. It's been a good sharing of ideas, in my opinion.

Thank you for demonstrating your case for why this proposal does not
belong in any standard of C, Christian. :)
 

Shao Miller

Is there already a diagnostic message that covers this?  Are there
circumstances in which it might be useful to apply unary '*' to an
incomplete type?

Sure we can throw away the amended point #3, since we can agree that:

Real C programs depend on the undefined behaviour of "dereferencing"
a null pointer constant cast to a complete object type under certain
circumstances.

But as for the amended point #4, does anyone have thoughts that they
might like to share?

Consider the following:

#include <stdlib.h>

int main(void) {
  void *vp;
  size_t sz = sizeof *vp;
  return 0;
}

Would it be useful for the Standard to define this as a constraint
violation? Could there be any code anywhere that depends on
"dereferencing" a pointer to an incomplete object type?

Thanks for thinking about it.
 

Ben Bacarisse

Shao Miller said:
Consider the following:

#include <stdlib.h>

int main(void) {
  void *vp;
  size_t sz = sizeof *vp;
  return 0;
}

Would it be useful for the Standard to define this as a constraint
violation? Could there be any code anywhere that depends on
"dereferencing" a pointer to an incomplete object type?

It already is, isn't it? (6.5.3.4 p1).

Incidentally, I would not describe the above as dereferencing a pointer
since the * is not evaluated.
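The point that the operand of sizeof is not evaluated can be illustrated with a complete type, where the expression is valid. A minimal sketch; the function name is hypothetical.

```c
#include <stddef.h>

static size_t size_through_null_pointer(void)
{
    double *dp = NULL;
    return sizeof *dp;   /* operand is not evaluated; only its type (double) is used */
}
```

No dereference happens at run time, so the null pointer is harmless here. With `void *` the same expression fails for a different reason: the type of the operand is incomplete.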
 

Shao Miller

It already is, isn't it?  (6.5.3.4 p1).

How do you know the type of '*vp'? How do you know that it's an
incomplete object type, exactly? Perhaps we have undefined behaviour
before we reach the relevance of 6.5.3.4 p1?

As an easily accessible example, GCC issues a warning for
"dereferencing `void *' pointer" in:

#include <stdlib.h>

int main(void) {
  void *vp = NULL;
  *vp;
  return 0;
}

But could GCC and C, in general, benefit from an explicit constraint
violation? Are there any disadvantages to the amended constraint #4,
as modified by Mr. M. Grzegorczyk's suggestion? Note that GCC aborts
translation with "dereferencing pointer to incomplete type" for:

#include <stdlib.h>

int main(void) {
  struct foo;
  struct foo *fp = NULL;
  *fp;
  return 0;
}

Obviously GCC is != every C implementation, but does it make sense
that these two incomplete types can be treated differently by an
implementation?
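One difference between the two incomplete types may be worth keeping in mind: a struct type can be completed later in the translation unit, whereas void can never be completed. A minimal sketch, with a hypothetical member `value` and function `read_value`:

```c
struct foo;                               /* incomplete: size and members unknown */

static int read_value(struct foo *fp);    /* a pointer to an incomplete type is fine */

struct foo { int value; };                /* the type is complete from here on */

static int read_value(struct foo *fp)
{
    return (*fp).value;                   /* unary * now yields a complete object type */
}
```

Once the struct is complete, applying unary '*' is unremarkable. The question in this thread concerns only the case where the type is still incomplete at the point of use.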
Incidentally, I would not describe the above as dereferencing a pointer
since the * is not evaluated.

They were two separate questions and I agree with you. The second
question was not intended to describe the code example. Sorry about
the confusion.

Thanks as always, Ben!
 

Ben Bacarisse

Shao Miller said:
How do you know the type of '*vp'? How do you know that it's an
incomplete object type, exactly?

From your previous posting I can't see why you are asking this. The
answer is "by looking at the declaration of vp".
Perhaps we have undefined behaviour
before we reach the relevance of 6.5.3.4 p1?

What happens before evaluating the sizeof expression that might be UB?

<snip a different question about void pointers>
 

Nick Keighley

Sure we can throw away the amended point #3, since we can agree that:

  Real C programs depend on the undefined behaviour of "dereferencing"
a null pointer constant cast to a complete object type under certain
circumstances.

how can you "depend on undefined behaviour"? It's semantic nonsense.
But as for the amended point #4, does anyone have thoughts that they
might like to share?

Consider the following:

#include <stdlib.h>

int main(void) {
  void *vp;
  size_t sz = sizeof *vp;
  return 0;

}

Would it be useful for the Standard to define this as a constraint
violation?

well, it's an error. Isn't it already a CV?
 Could there be any code anywhere that depends on
"dereferencing" a pointer to an incomplete object type?

there's all sorts of weird and wrong code out there
 

James Kuyper

how can you "depend on undefined behaviour"? It's semantic nonsense.

What I think he means is "depend upon definitions of the behavior from
other sources when the behavior is undefined according to the C
standard", which is, of course, quite different from what he actually said.

A program can depend upon such definitions, and many (most?) do, one way
or another. Those definitions can come from other standards, or be
specific to a particular platform or implementation.
 

Marcin Grzegorczyk

Shao said:
[...]
As an easily accessible example, GCC issues a warning for
"dereferencing `void *' pointer" in:

#include <stdlib.h>

int main(void) {
  void *vp = NULL;
  *vp;
  return 0;
}

GCC treats pointers to void in a somewhat unusual way -- almost like
pointers to char, and even allows arithmetic on them as an extension.
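The extension mentioned above can be sketched as follows. This is a hedged illustration: it assumes a compiler that defines `__GNUC__` and is run in its default GNU mode (strict options such as `-pedantic-errors` would reject the first branch), and the function name is hypothetical.

```c
static int bytes_advanced(void)
{
    char buffer[4] = {0};
#if defined(__GNUC__)
    void *start = buffer;
    void *next = start + 1;        /* GNU extension: arithmetic as if sizeof(void) were 1 */
    return (int)((char *)next - (char *)start);
#else
    char *start = buffer;
    char *next = start + 1;        /* portable: do the arithmetic on char * */
    return (int)(next - start);
#endif
}
```

Both branches advance by one byte. Only the second is defined by the Standard, so code meant to be portable would use `char *` throughout.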
[...] Note that GCC aborts
translation with "dereferencing pointer to incomplete type" for:

#include <stdlib.h>

int main(void) {
  struct foo;
  struct foo *fp = NULL;
  *fp;
  return 0;
}

Obviously GCC is != every C implementation, but does it make sense
that these two incomplete types can be treated differently by an
implementation?

Apparently it did make sense to some GCC developer.

Keep in mind that the difference between undefined behaviour and a
constraint violation is really only that the latter *requires* a
diagnostic. A constraint violation does not prevent a successful
translation; footnote 8 in 5.1.1.3 makes that pretty clear.

Thus, even if dereferencing a pointer to an incomplete type were made a
constraint violation (which I agree might be a good idea; I think most
implementations do issue a diagnostic in this case), it would not make a
difference to GCC. It would still be allowed to compile your first
example, as long as it produced some diagnostic message - which it
already does.
 

Shao Miller

Ben said:
From your previous posting I can't see why you are asking this. The
answer is "by looking at the declaration of vp".

While I believe that we're in agreement anyway, I asked about '*vp' and
not 'vp'. What I think you are suggesting is that the type of 'vp' is
enough to determine the type of '*vp'. Surely the type of an expression
is a translation-time property as I've previously argued and you seem to
agree...

But it's been argued (for unary '*') that not matching "If the operand
points to a function...if it points to an object..." yields undefined
behaviour. So might it be useful to define a constraint violation
instead of undefined behaviour? That's what I'm asking, even before we
get to 'sizeof'.

An example would be an implementation taking '*vp' and spontaneously
changing the type of the expression to 'char' without a diagnostic,
which would be a legitimate type for 'sizeof', but surely not a Good
Thing... Or is it?
What happens before evaluating the sizeof expression that might be UB?

Ah, see above.
<snip a different question about void pointers>

Actually, it wasn't. It was about an incomplete object type; a
'struct'. I asked because it's another incomplete object type for
consideration of the amended point #4.
 

Shao Miller

Marcin said:
Shao said:
[...]
As an easily accessible example, GCC issues a warning for
"dereferencing `void *' pointer" in:

#include <stdlib.h>

int main(void) {
  void *vp = NULL;
  *vp;
  return 0;
}

GCC treats pointers to void in a somewhat unusual way -- almost like
pointers to char, and even allows arithmetic on them as an extension.
Right.
[...] Note that GCC aborts
translation with "dereferencing pointer to incomplete type" for:

#include <stdlib.h>

int main(void) {
  struct foo;
  struct foo *fp = NULL;
  *fp;
  return 0;
}

Obviously GCC is != every C implementation, but does it make sense
that these two incomplete types can be treated differently by an
implementation?

Apparently it did make sense to some GCC developer.

Right.

Keep in mind that the difference between undefined behaviour and a
constraint violation is really only that the latter *requires* a
diagnostic. A constraint violation does not prevent a successful
translation; footnote 8 in 5.1.1.3 makes that pretty clear.

Agreed; that's the point of the amended point #4.
Thus, even if dereferencing a pointer to an incomplete type were made a
constraint violation (which I agree might be a good idea; I think most
implementations do issue a diagnostic in this case), it would not make a
difference to GCC. It would still be allowed to compile your first
example, as long as it produced some diagnostic message - which it
already does.

Agreed. I don't mind the translation. But the diagnostic sure seems
sensible.
 

Shao Miller

James said:
What I think he means is "depend upon definitions of the behavior from
other sources when the behavior is undefined according to the C
standard", which is, of course, quite different from what he actually said.

"Quite different"?

If every instance of "undefined behavior" was replaced with "the
behavior is undefined according to the C standard" in this newsgroup,
how much longer would it take to read posts?

"Definitions of the behavior from other sources"? It might be that the
only reason a real C program works is because a programmer has noted
that an implementation consistently translates a certain form of
undefined behaviour in a certain way; there might be no source code for
the implementation and the implementation mightn't document its choices.

If it's still confusing, Mr. N. Keighley, simply see other posters'
posts in this thread. There are real C programs which depend on
application of unary '*' to a null pointer constant cast to a complete
object type. Such application is UB. Thus, real C programs depend on
UB. Or, (hopefully equivalently) such application yields behaviour
which is undefined according to the C Standard. Thus, real C programs
depend on behavior which is undefined according to the C Standard.
A program can depend upon such definitions, and many (most?) do, one way
or another. Those definitions can come from other standards, or be
specific to a particular platform or implementation.

Agreed. Thanks for attempting to offer clarification, Mr. J. Kuyper.
Why "quite different," I do not understand but do not need to.
 

Shao Miller

Shao said:
If every instance of "undefined behavior" was replaced with "the
behavior is undefined according to the C standard" in this newsgroup,
how much longer would it take to read posts?

Please allow me to attempt to compact two potential responses into one post:

A: You mean "newsgroups"

B: Congratulations.
 

Ben Bacarisse

Shao Miller said:
While I believe that we're in agreement anyway, I asked about '*vp'
and not 'vp'.

And I answered about *vp. If you'd asked about the type of 1 + x I'd
have said you look at the declaration of x. The same is true of *vp -- it's
the declaration of vp that answers your question (unless I've
misunderstood the question, but it seems clear enough).
What I think you are suggesting is that the type of
'vp' is enough to determine the type of '*vp'. Surely the type of an
expression is a translation-time property as I've previously argued
and you seem to agree...

Yes, though I remember the history rather differently.
But it's been argued (for unary '*') that not matching "If the operand
points to a function...if it points to an object..." yields undefined
behaviour. So might it be useful to define a constraint violation
instead of undefined behaviour? That's what I'm asking, even before
we get to 'sizeof'.

Then it was a bad idea to start with an example that is already a
constraint violation. I have no opinion on any other constraints you'd
like to add, at least not unless you define what and when they might
occur.
An example would be an implementation taking '*vp' and spontaneously
changing the type of the expression to 'char' without a diagnostic,
which would be a legitimate type for 'sizeof', but surely not a Good
Thing... Or is it?

No, and it would not be a C compiler if it did. I am not sure there is
any value in having C include extra constraints for non-C compilers to
ignore.

<snip>
 
