Hi!
There is a system where 0x0 is a valid address, but 0xffffffff isn't.
How can null pointers be treated by a compiler (besides the typical
"solution" of still using 0x0 for "null")?
Ignoring all the debate that has been triggered by your list
(which I have snipped), here is the answer I think you may be
looking for.
Suppose you are the compiler-writer for this machine. Suppose
further that you have decide to use 0xffffffff (all-one-bits)
as your internal representation for the null pointer, so that:
char *p = 0;
use(*p);
will trap at runtime, even though 0 is a valid address. How will
you, as the compiler-writer, achieve this?
The answer lies in your code generator. At any point in dealing
with the conversion of C source code to machine-level instructions,
you *always* know the type(s) of all the operand(s) of every
operator. This is of course absolutely necessary on most machines.
Consider, for instance, something like:
sum = a + b;
If a and b are ordinary "int"s, you probably need to generate an
integer-add instruction with integer operands:
ld r1, ... # load variable "a" into integer register
ld r2, ... # load variable "b" into integer register
add r0,r1,r2 # compute integer sum, reg+reg -> reg
st r0, ... # store sum back to memory
while if "a" and "b" are ordinary "double"s, you probably need to
generate a double-add instruction with double operands:
ldd f2, ... # load double "a" into f2/f3 register pair
ldd f4, ... # load double "b" into f4/f5 register pair
addd f0,f2,f4 # compute double-precision sum
std f0, ... # store sum back into memory
If one operand is an "int" and one is a "double", you have to
convert the int to a double and do the addition as two doubles,
and so on. The only way to know which instructions to generate is
to keep track of the types of all the operands.
So, now you have a chunk of C source level code that includes
the line:
p = 0;
where "p" has type "char *", i.e., a pointer type. The operand on
the right-hand-side of the assignment is an integer *and* is a
constant (you must keep track of this, too, but of course you will,
in order to do constant-folding). So you have an assignment that
has "integer constant zero" as the value to be assigned. Inside
the compiler, you check, and you GENERATE DIFFERENT CODE!
if (is_integer_constant(rhs) && value(rhs) == 0)
generate_store(lhs, 0xffffffff);
else
...
and thus, what comes out in the machine code is:
mov r0, -1 # set r0 to 0xffffffff
st r0, ... # store to pointer "p"
Likewise, in places where you have a comparision or test that
might be comparing a pointer to integer-constant-zero, you check
for this in the compiler, and generate the appropriate code:
is_null = false;
if (is_pointer(lhs) && is_integer_constant(rhs) && value(rhs) == 0) {
is_null = true;
ptr_operand = lhs;
} else if (is_pointer(rhs) &&
is_integer_constant(lhs) && value(lhs) == 0) {
is_null = true;
ptr_operand = rhs;
}
if (is_null)
generate_compare(ptr_operand, 0xffffffff);
else
...
There is only one place this goes wrong, and that is:
extern void somefunc(int firstparam, ...);
...
somefunc(3, ptr1, 0, ptr3); /* where "0" is meant to be a null pointer */
Here, inside your compiler, you see that the second parameter is
an integer constant zero, so you check the function prototype to
see if you need a pointer in this position. All you have is the
literal "..." part, so you must assume that this is really an
integer here, not a pointer. You pass zero (0x00000000) instead
of 0xffffffff. But this source code call is wrong! The programmer
*should* have used a cast:
somefunc(3, ptr1, (char *)0, ptr3);
In this version, you have a cast of an integer constant zero to a
pointer type, which produces 0xffffffff as appropriate, and only
then looks at the prototype. As before, there is no extra information
given by the prototype, but now you pass 0xffffffff as desired.
Now, given that I believe you have indeed correctly identified how
to do this inside the compiler:
- AFAIK I can identify contexts where `0' is used as a pointer and use
the numeric value 0xffffffff rather then 0x0. Is this correct?
Yes.
In particular, should `void* p;' initialize p to "null pointer" rather
then "zero" (so it has to be placed in ".data" rather then ".bss" in
terms of typical implementations if "null pointer" is not represented
as all bits 0)?
If "p" has static duration, yes. If "p" has automatic duration it
does not have to have any useful value upon creation. Note that
this also applies to structures containing pointers:
struct S { int a; long *b; double c; void *d; };
static struct S x;
will have to put x in a data segment in order to set x.b and x.d
to 0xffffffff internally. Unions may also contain pointers, but
in C89 only the first element is initialized, so only if the first
element is a pointer will you have to do this. (C99 offers designated
initializers, but those just fall out naturally.)
(You may, depending on implementation, want to have an "all one
bits" segment that you place either before or after your "bss"
segment. This will handle pointers that are not members of
structures.)
Worse, should `memset(&p, 0, sizeof(void*))' set p to the
"null pointer" rather then "zero"?
No.
Should casts from int to void* convert (int)0 (bits: 0x0) to (void*)0
(bits: 0xffffffff)?
If the (int)0 is an integer *constant*, yes (because semantically,
a cast is just an assignment to an unnamed temporary, except that
an actual temporary would be an lvalue and a cast produces an rvalue).
If the int that happens to contain zero is *not* a constant, this
is up to you -- but I would not. This allows programmers to write:
int zero = 0;
char *p = (char *)zero;
... now use *p to access hardware location 0 ...
I know that this topic has been discussed a lot. That's even one of the
reasons I'm not sure what the real answers are - I remember too many of
them and can't tell the right ones from the wrong ones...
The usual answer is to skip all of this and simply make sure that
hardware-location-zero is occupied, so that no *C* object or function
actually has address zero. Of course, this does not trap erroneous
null-pointer dereferences, but C implementations are rarely kind
to programmers that way. We seem to prefer to drive our race cars
without seatbelts.