> So the implementors did it this way just because the compiler should
> be able to distinguish between pointers to structures and structures?
The answer ultimately boils down to "Dennis Ritchie did it this way
because he liked it that way", more or less.
> (i.e., if I have a pointer to a structure called myStruct, and try
> to access its members with the dot operator, the compiler would
> treat it like a structure.)
There *are* languages that have both structured types and pointers
(including pointers to structured types), *and* do precisely this.
(The one I used myself is Mesa.) If C did this, one would be able
to write:
    struct S {
        ...
        int member;
        ...
    };
    ...
    struct S var;
    struct S *ptr;
    ...
    ptr = &var;
    ...
    use(ptr.member); /* NOT ALLOWED IN C, but OK in Mesa */
The compiler would see that "ptr" has type "pointer to struct S",
and would treat the "." operator as meaning "follow the pointer to
the struct, then access the named member." But C does not do this:
the "." operator *demands* that its left hand operand have some
struct (or union) type -- such as "struct S" -- and only then does
the "access the named member" part. C then has a second operator,
"->", that *demands* that its left hand operand have some pointer
to struct or union type -- such as "pointer to struct S" -- and
only then does the "follow the pointer to access the named member"
part.
The two actions are slightly different: one follows a pointer and
accesses a named member, while the other simply accesses a named
member. But both have a common element ("access named member")
and it is clear which action to perform: "if pointer to struct S,
follow then access; if struct S, just access; if neither pointer
to struct S nor actual struct S, error". As such, two separate
operators are not *required*. They just happen to be what Dennis
did, probably because he liked it that way.
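Here is a small compilable sketch of the two C operators side by side. The one-field "struct S" and the function names are made up for illustration. The only point is that "." wants a struct, "->" wants a pointer to a struct, and "(*ptr).member" spells out what "->" does.

```c
/* Hypothetical stand-in for the "struct S" above. */
struct S {
    int member;
};

/* Sets var.member through a pointer with "->", then reads it with ".". */
int set_via_arrow(int value)
{
    struct S var;
    struct S *ptr = &var;

    ptr->member = value;      /* follow ptr, then access "member" */
    /* ptr.member would be rejected: ptr is not a struct */
    return var.member;        /* var is a struct S, so "." applies directly */
}

/* "ptr->member" is shorthand for "(*ptr).member". */
int set_via_star_dot(int value)
{
    struct S var;
    struct S *ptr = &var;

    (*ptr).member = value;    /* *ptr has type struct S, so "." is legal */
    return ptr->member;       /* same field, reached the other way */
}
```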
> But since pointers are variables,
Pointers *can be* (stored in) variables. Pointers themselves are
just values of type "pointer to <some other type>". Once you store
such a pointer in a variable, that variable is just like any other
variable. In particular, you can take its address:
    struct S *ptr;
    int error;
    extern int new_S(struct S **);
    ...
    error = new_S(&ptr);
Here new_S() takes the address of a variable of type "pointer to
struct S", and fills in the variable. It needs the address of that
variable because C passes arguments by value (every time, even for
arrays: it is just that the "value" of an array is quite peculiar).
The address of "ptr", &ptr, has type "pointer to (pointer to struct
S)", which in C is spelled "struct S **".
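A minimal sketch of what "pointer to pointer to struct S" buys you. It uses a made-up helper, point_at(), which is not part of the original example. Because point_at() receives &ptr, it can change the caller's variable ptr itself, not just a copy of it.

```c
#include <stddef.h>   /* NULL */

struct S {
    int member;   /* hypothetical field */
};

/* Hypothetical helper: stores "target" into whatever "struct S *"
   variable pp points to. pp itself is passed by value, but *pp is
   the caller's own pointer variable. */
void point_at(struct S **pp, struct S *target)
{
    *pp = target;
}

/* Returns 1 if point_at() really changed the caller's ptr, else 0. */
int demo_point_at(void)
{
    struct S var = { 5 };
    struct S *ptr = NULL;
    struct S **pp = &ptr;   /* &ptr has type "pointer to pointer to struct S" */

    point_at(pp, &var);
    return ptr == &var && ptr->member == 5;
}
```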
Note that new_S has to follow the address you gave it, to find the
actual object of type "pointer to struct S" to fill in:
    #include <stdlib.h>    /* for malloc() */

    int new_S(struct S **retp) {
        struct S *tmp;

        tmp = malloc(sizeof *tmp);
        if (tmp == NULL)
            return FAILED_TO_ALLOCATE_MEMORY;
        tmp->member = 42;
        *retp = tmp;
        return OK;
    }
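For anyone who wants to compile new_S() as-is, here is one way to fill in the pieces the post leaves out. The values of OK and FAILED_TO_ALLOCATE_MEMORY, the one-field struct S, and demo_new_S() are all assumptions made for this sketch. The body of new_S() is the one shown above.

```c
#include <stdlib.h>   /* malloc, free */

struct S {
    int member;   /* hypothetical field */
};

/* Hypothetical error codes: the post names them but does not define them. */
enum { OK = 0, FAILED_TO_ALLOCATE_MEMORY = 1 };

int new_S(struct S **retp)
{
    struct S *tmp;

    tmp = malloc(sizeof *tmp);   /* void * converts to struct S * on assignment */
    if (tmp == NULL)
        return FAILED_TO_ALLOCATE_MEMORY;   /* *retp left untouched */
    tmp->member = 42;
    *retp = tmp;                 /* fill in the caller's pointer variable */
    return OK;
}

/* Usage: allocate, inspect, release. Returns member, or -1 on failure. */
int demo_new_S(void)
{
    struct S *ptr;
    int error, value;

    error = new_S(&ptr);
    if (error != OK)
        return -1;
    value = ptr->member;
    free(ptr);                   /* malloc()ed objects last until free()d */
    return value;
}
```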
We could replace each "tmp" with "*retp":
    *retp = malloc(sizeof **retp);
    if (*retp == NULL)
        return FAILED_TO_ALLOCATE_MEMORY;
    (*retp)->member = 42;
    *retp = *retp; /* obviously redundant! */
but I prefer the version with "tmp", because all the *retp's are
repetitive and (*retp)->member is awkward. Note, however, that these
two versions do something slightly different too: if new_S returns
FAILED_TO_ALLOCATE_MEMORY, the version with "tmp" leaves *retp
unchanged. The version without it has set *retp to NULL. If the
call looks like:
    error = new_S(&ptr);
then one version leaves "ptr" unchanged, and possibly uninitialized,
on error, while the other leaves it set to NULL. You can of course
modify the version with "tmp":
    if (tmp == NULL) {
        *retp = NULL;
        return FAILED ...
or even just:
    *retp = tmp = malloc(sizeof *tmp);
    if (tmp == NULL) ...
    tmp->member = 42;
    /* *retp is already set */
    return OK;
(which is probably how I would write it, if I wanted "ptr" changed
even on failure).
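Since malloc() rarely fails on demand, the sketch below swaps in a hypothetical alloc_S() that fails when asked to, so the two failure behaviors can actually be observed. new_S_keep() follows the "tmp" version and leaves *retp alone. new_S_null() follows the "*retp = tmp = ..." version and stores NULL. The names, the fail flag, and the error-code values are all invented for this illustration.

```c
#include <stdlib.h>   /* malloc, free, NULL */

struct S {
    int member;   /* hypothetical field */
};

enum { OK = 0, FAILED_TO_ALLOCATE_MEMORY = 1 };   /* hypothetical codes */

/* Hypothetical stand-in for malloc() that fails when "fail" is nonzero. */
static struct S *alloc_S(int fail)
{
    return fail ? NULL : malloc(sizeof(struct S));
}

/* "tmp" version: on failure, *retp is left exactly as it was. */
int new_S_keep(struct S **retp, int fail)
{
    struct S *tmp = alloc_S(fail);

    if (tmp == NULL)
        return FAILED_TO_ALLOCATE_MEMORY;
    tmp->member = 42;
    *retp = tmp;
    return OK;
}

/* "*retp = tmp = ..." version: on failure, *retp becomes NULL. */
int new_S_null(struct S **retp, int fail)
{
    struct S *tmp;

    *retp = tmp = alloc_S(fail);
    if (tmp == NULL)
        return FAILED_TO_ALLOCATE_MEMORY;
    tmp->member = 42;
    return OK;   /* *retp is already set */
}
```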
There are a number of key things to note here:
- Each pointer has a type. Each object (roughly, "variable")
also has a type.
- Any pointer variable can store a pointer value of the type
given by its declaration.
- Pointer values coming from malloc() are "special": if you
supply the correct size, they are valid for use as an object
of that size. For this reason, they have the funky/bizarre
type "void *", which gets converted automatically by assignments.
The objects allocated by malloc() also have a special lifetime:
they last until explicitly free()d. All other C objects have
either "static" duration -- they live as long as the program
runs -- or "automatic" duration, allocated on entry to their "{}" block
and deallocated when the block ends.
- The "&" operator takes the name of an object (such as an
ordinary variable) and produces a value of pointer type, pointing
to the object. The type of this pointer depends on the type
of the object, and basically just has "pointer to" shoved in
front of the English-language expansion of the C type. If
the object has type "T", the pointer has type "pointer to T".
- The "*" operator follows a pointer to whatever object it
points to. It must point to some object for this to work.
Naturally enough, the object should have the type indicated by
the pointer: following a "pointer to T" gives you an object
of type "T". (I am deliberately ignoring pointers to
functions here.)
- A valid pointer value is either NULL or the address of some
object somewhere.
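To make those rules concrete, here is a short sketch that touches static, automatic, and malloc()ed objects along with the "&"/"*" pair. The function names are invented for illustration.

```c
#include <stdlib.h>   /* malloc, free */

/* Static duration: exists for the whole run of the program. */
static int counter;

/* Returns how many times it has been called; "counter" keeps its
   value between calls because it has static duration. */
int bump(void)
{
    return ++counter;
}

/* "&" and "*" on an automatic (block-scope) object. */
int amp_star_demo(void)
{
    int x = 1;        /* automatic: lives until this block ends */
    int *p = &x;      /* x has type int, so &x has type "pointer to int" */

    *p = 99;          /* following p gives the int object x itself */
    return x;
}

/* Allocated object: outlives this function, until someone calls free(). */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);   /* void * converts to int * here */

    if (p != NULL)
        *p = value;
    return p;
}
```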
Everything else simply follows from this. In new_S(), "retp" names
an object of type "pointer to pointer to struct S". Assuming retp
has a valid, non-NULL value, it points to an object of type "pointer to struct
S" (or in C, "struct S *"). The object at *retp need not have a
valid value to start with, because we are going to overwrite it.
(Some programmers prefer to make sure it has a valid value anyway,
e.g., by setting it to NULL when first creating it. But new_S()
does not depend on this.)
The call to malloc() asks for enough bytes for a "struct S". This
either succeeds, getting a valid "struct S" that will exist until
explicitly free()d, or fails and returns NULL. The return value
has type "void *", but we stick this into "tmp", which has type
"struct S *", thus converting it. We then check to see if the
result was NULL (or, in the last version of new_S(), copy tmp to
*retp first, then check for NULL). If tmp is NULL, we return a
failure error code, for whoever called new_S() to deal with.
If malloc() succeeded, on the other hand, we use "tmp" -- which has
type "pointer to struct S" and now points to a valid object -- with
the "->" operator, to get to "tmp->member". This is just shorthand
for (*tmp).member. The *tmp action follows the pointer in tmp
-- which as we know, is a valid pointer pointing to a real object
-- and accesses the object. The object has type "struct S", so
this gives us the "struct S" that the "." operator requires, and
the ".member" part then accesses the structure's "member" field,
which we set to 42 (The Answer, according to the Hitchhiker's Guide
series).
Last, we set *retp (if we have not already done so) and return OK,
meaning "new_S succeeded". This tells our caller that it is OK to
use *retp, which in the example call, names the same object as
"ptr":
    error = new_S(&ptr);
If "retp" is &ptr, then "*retp" must be "ptr". The * and &
effectively cancel each other out. So if error == OK, meaning
"nothing went wrong", ptr is now set to a valid value from malloc(),
and ptr->member is set to 42.
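A tiny sketch of the "* and & cancel" point, using an invented function name. Given retp == &ptr, writing through (*retp)->member changes the very struct that ptr points to.

```c
struct S {
    int member;   /* hypothetical field */
};

/* Returns 1 if "*&ptr" and "&*ptr" both come back to ptr, and a write
   through *retp (i.e., through ptr) reaches var. */
int star_amp_cancel(void)
{
    struct S var = { 42 };
    struct S *ptr = &var;
    struct S **retp = &ptr;     /* as in the call new_S(&ptr) */

    (*retp)->member = 7;        /* *retp is ptr, so this changes var */
    return *&ptr == ptr && &*ptr == ptr && var.member == 7;
}
```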
(A function like "new_S" practically begs for one named "release_S"
or "discard_S" or "delete_S" or some such. In this case, such a
function need only call free(), but with more complicated data
structures, you might do more in the S-destruction function. Note
also that the names "S_new" and "S_delete" might be better in some
ways: all "struct S" operations might have names that start with S_.)
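One possible shape for such a pair, using the S_new/S_delete names suggested above. This is only a sketch: the error codes are invented. Having S_delete() take a "struct S **" so that it can also null out the caller's pointer is just one design choice, not something the discussion requires.

```c
#include <stdlib.h>   /* malloc, free, NULL */

struct S {
    int member;   /* hypothetical field */
};

enum { OK = 0, FAILED_TO_ALLOCATE_MEMORY = 1 };   /* hypothetical codes */

/* Constructor, in the "S_" naming style. */
int S_new(struct S **retp)
{
    struct S *tmp = malloc(sizeof *tmp);

    if (tmp == NULL)
        return FAILED_TO_ALLOCATE_MEMORY;
    tmp->member = 42;
    *retp = tmp;
    return OK;
}

/* Destructor: for this simple struct it only needs free(). Taking a
   struct S ** lets it also set the caller's pointer to NULL, so no
   dangling pointer is left behind in the caller's variable. */
void S_delete(struct S **sp)
{
    free(*sp);
    *sp = NULL;
}
```

A caller would then pair them as "S_new(&p)" ... "S_delete(&p)", with p becoming NULL after the delete.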