C
Chris Torek
so, <char** cp> can be stored in <void* vp>.
Yes.
I thought a ** (pointer to pointer) needs a ** type to store
Ideally, given a value of some type T, the best place to store
it is an object declared with that type. Given:
typedef /* insert some data type here */ T;
/*
eg,
typedef double T;
or typedef short T;
or typedef char **T;
or typedef struct foo *T;
or even
typedef void (*T)(void);
*/
you would then do:
T var;
...
var = value;
to store the value of type T in a variable of type T.
Sometimes, however, we have -- for whatever reason -- a variable
(or other object, if "variable" refers only to named objects)
with the "wrong" type. For instance:
double var;
...
var = some_integer_valued_function();
What is required is that the object into which we save this value
be "sufficiently capacious" to store any possible value. So
this question:
but here we are storing a ** into a single *. are they of same size ?
is not quite aimed correctly. If they were the same size, that
might be good enough, because at least the "place we are saving
the value" has enough *bits* to store the value. But "same size"
is not really good enough -- an int and a float are often the
same size:
#include <stdio.h>
int main(void) {
    printf("%d %d\n", (int)sizeof(int), (int)sizeof(float));
    return 0;
}
This will print "4 4" on many machines (I have one here that does
.... well, actually, I have quite a few, but will use just one of
them). But we cannot store just any float in an int, nor vice
versa:
#include <stdio.h>
#include <limits.h>
int main(void) {
    float f;
    int i;
    f = 2109876541;   /* too many digits for a float to hold exactly */
    i = 3.5;          /* the fraction is discarded */
    printf("%d (should be %d); %f (should be %f)\n",
        (int)f, 2109876541, (double)i, 3.5);
    return 0;
}
On my machine, I get:
% ./t
2109876480 (should be 2109876541); 3.000000 (should be 3.500000)
So we see that on my machine, even though int and float are the
same size, they cannot store all the same values. The "int" variable
loses the fraction, while the "float" variable sometimes loses the
last few digits of any "too-big" integer (integers small enough to
fit in the float's significand are saved exactly, but larger values
are rounded to the nearest representable value, which is a multiple
of some power of two).
If we were to store "int"s in "double"s, though, at least on this
particular machine, all "int" values would fit without rounding
problems. So a "double" suffices to store any "int" value: it has
enough bits *and* it never "corrupts" any of the original bits.
Even though assigning INT_MAX to a "double" changes the set of bits
stored (see <http://web.torek.net/torek/c/numbers.html> for a
discussion of representations), we can later convert the double
back to an int, and we always get the original number back.
What happened back in the 1980s was that the ANSI C committee --
the people involved in X3J11 -- were at some meeting(s), and had
some discussion(s), in which various members said they wanted to
come up with a new data type. This new type should be "big enough"
to store any valid data-pointer type.
On most machines, all data pointer types are the same, but on a
few -- especially back then -- there were some that had more than
one "kind" of pointer, having the bits moved around (byte vs word
pointers) on some machines, or even different sizes of pointers on
others. But even on these "oddball" machines, there was always
some way to store the "worst case" pointers. Even if the compiler
might have to have a special struct or union type internally, there
would be *some* way to gather up all the information from any
"specific" pointer type -- a "T *" for some data-type T -- into a
"generic" form.
What the X3J11 (ANSI C) committee folks needed to do, then, was
require implementations to provide a "generic data pointer" type,
along with pairs of chunks of implementation-specific code. One
chunk of each pair would take a *specific* pointer value, gather
up everything about it, and store it into this "generic pointer".
The other chunk would take one of these "generic pointers", having
been constructed earlier from some specific pointer, and convert
it back to the appropriate specific pointer.
For discussion, let us call the generic pointer type "genptr_t",
for the moment. The ANSI folks might even have considered using
a name like this (I was not at the meetings, I only read some
of the eventual paperwork, so I do not know if they did).
On machines with multiple "flavors" of pointers, these "chunks
of code" (as I am calling them here) were usually one pair of
instructions. For instance, a machine with both byte and word
pointers will usually be able to point to any individual byte
with a byte pointer, so it can use "byte pointer" for genptr_t.
Then, given a value of type "T *", the value itself is either
already a byte pointer, or is a word pointer. So:
T *orig;
genptr_t save_it;
... something that sets "orig" ...
save_it = (genptr_t)orig;
needs to use "convert word pointer to byte pointer" if "T *" is
a word pointer. If "T *" is already a byte pointer, it just
uses an ordinary "copy" instruction (often spelled "move", but
it copies, rather than emptying out the original).
To restore the original value:
... code that modifies "orig" ...
orig = (T *)save_it;
we just need to reverse the process, either with another copy, or
with a "convert byte pointer to word pointer" instruction. Note
that the compiler did (and does) not have to know whether "save_it"
originally came from a byte or word pointer. You, the C programmer,
tell the compiler which kind to change it *to*. (In this particular
case, you supply the type with a cast.) It is *your* responsibility
to make sure that whatever value is in save_it "came from" a value
of the right type.
On machines with only one "flavor" of pointer, like the x86 for
instance, any pointer is already suitable for a genptr_t, so we
only need to use the machine's "copy value" ("mov") instruction.
So this makes writing a C compiler for these machines even
easier. All the "genptr_t" does is leave room for the oddball
machines: it costs nothing on the x86.
The weirdest part of this whole story, though, is the spelling that
the X3J11 committee folks came up with for "genptr_t". Instead of
having some <stdwhatever.h> file provide it, or having the compiler
pre-load it using that name, they decided to have the compiler
pre-load it, but using the spelling "void *".
So "void *" is really a genptr_t -- a generic pointer. This is
a very special pointer, in two ways:
- it is always big enough to hold *any* data pointer (not
necessarily a function pointer, though), and
- you can convert "void *" to and from "T *", where T is any
data type, without using a cast. (Other pointer conversions
require using casts.)
That second property is probably what convinced the committee
to spell the type "void *", building it into the compilers.
Note, by the way, that "void *" is itself a data type, so one
can point to a "void *", giving a "void **". This is just a
regular old (non-generic) pointer type, though: it points
specifically to a "void *", never to any other type.