well, a whole lot, not too much of it especially good, about whether
C has "pass by reference".
The key item that C lacks -- and that C++ and Pascal have, which is
what makes those two languages actually possess pass-by-reference --
is an "implicit dereference".
In Pascal, we can write (where T is some type, either one predefined
by the language or one we have defined ourselves):
procedure foo(var n : T); begin
    ... operate on n ...
end;
In C++, we can write:
void foo(T & n) {
    ... operate on n ...
}
In both cases, we cannot, in foo(), "re-bind" n to refer to some
other variable. It always and forever -- or at least until foo()
returns, so that it ceases to exist -- refers to the parameter
supplied by the caller. This is a useful and common property of
language-provided pass-by-reference, but (I claim) not actually
necessary.
More importantly, in both cases, in order to change the caller's
variable, we do something like this in foo():
n = 42; /* for C++; in Pascal, use n := 42 */
Note that we do not adorn the variable name "n" with any special
punctuators or operators here. We use the same syntax we would
use if n were a local variable. (I include this second sentence
because in some languages, for which one might argue whether they
possess by-reference arguments, even ordinary local variables
require some sort of punctuation in order to assign values to
them.)
In both cases, the caller then does nothing to distinguish the
by-reference call from a by-value call:
foo(x); /* modifies x */
bar(x); /* does not modify x */
This means the programmer must know whether any given function or
procedure modifies any of its arguments. (It is not clear to me
that this last is a *required* property. If there were a language
in which the declaration and/or definition of foo() itself made it
clear that a parameter n was by-reference, but also forced callers
to acknowledge the by-reference call, would the language meet the
definition of "by reference" calls? I am not aware of any such
language, and without one, I think the question is moot.)
In C, by contrast, in order for a function foo() to change its
caller's variable, we must adorn not only the call:
foo(&x); /* permits foo to modify x */
but more importantly, we must decorate every use of "n" within
foo() itself:
void foo(T * n) {
    ... operate on *n ...
}
If, within foo(), we attempt to operate on "n" instead of "*n",
we "re-bind" n to point to some other object of type T. This is
impossible in Pascal and C++ (and Fortran at least through F77,
for that matter, although Fortran allows value-result instead
of by-reference).
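
A minimal sketch of that distinction, taking T to be int. The names
set_through, rebind_only, and other are made up for illustration and
appear nowhere else in this discussion:

```c
int other = 7;      /* hypothetical second object of type T (here, int) */

/* Simulated by-reference: the unary "*" reaches the caller's variable. */
void set_through(int *n) {
    *n = 42;
}

/* Assigning to n itself only re-binds the callee's copy of the pointer. */
void rebind_only(int *n) {
    n = &other;     /* n now points at "other", not at the caller's variable */
    *n = 99;        /* modifies "other"; the caller's variable is untouched */
}
```

After "int x = 1; set_through(&x);" x holds 42, and a following
"rebind_only(&x);" leaves x at 42 while other becomes 99.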
Mr Dionne appears to believe that C's arrays are an exception, and
are passed by reference. While C's arrays *are* exceptional, this
is not where the exception occurs. C remains pass-by-value. The
gimmick is that the "value" of an array is a pointer to the array's
first element. In:
void f(void) {
    char buf[100];
    char *p;
    p = buf;
we attempt to copy the "value" of buf -- an array -- to the pointer
variable p. The "value" of the array is a pointer to the array's
first element, so this sets p to point to &buf[0]. Likewise, if we
go on to call a function g() and pass the "value" of buf:
    g(buf);
    ...
}
then g() receives, as its value, a pointer to the array's first
element -- a pointer to buf[0]. Within g(), as for any simulation
of by-reference in C, we have to use the unary "*" operator in
order to write to buf[0]:
void g(char * ptr) {
    * ptr = 42;
    ...
}
and if we fail to prefix "ptr" with the unary "*" operator when
assigning, we will "re-bind" it so that it no longer points to
"buf" at all:
ptr = "oops";
Now *ptr is no longer 42 (in any current character set anyway):
ptr now points to the first element of an anonymous (unnamed) array
of 5 "char"s that are set to {'o','o','p','s','\0'} and must not
be changed (they may or may not actually be read-only, but the
effect of attempting to change them is undefined).
The sneaky part of this is that, given a pointer to buf[0] -- and
no reassignment of ptr itself, i.e., leave out the "oops" line --
we can use that pointer to access buf[1] through buf[99] as well.
That is why, in f() and g() both, we can do:
p[23] = 'x'; /* in f() */
or:
ptr[24] = 'y'; /* in g() */
None of this has much to do with calling mechanisms. The array /
pointer interconversion rules in C happen long before any concerns
with parameter-passing. It is merely the case that, because calls
*are* by-value and the "value" of an array is a pointer to its
first element, function calls are a very common point at which
the interconversion occurs.
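
A small sketch of both points, assuming nothing beyond standard C.
The function names decays_without_call, mark, and roundtrip are
hypothetical, chosen only for this illustration:

```c
#include <stddef.h>

/* Returns 1: the "value" of an array is a pointer to its first
   element, and no function call is involved in the conversion. */
int decays_without_call(void) {
    char buf[100];
    char *p = buf;          /* same conversion that happens in g(buf) */
    return p == &buf[0];
}

/* Writes through a pointer parameter; ptr[i] means *(ptr + i). */
void mark(char *ptr, size_t i, char c) {
    ptr[i] = c;
}

/* Passes an array by value (that is, as a pointer to its first
   element) and reads the stored character back through the array. */
char roundtrip(void) {
    char buf[100] = {0};
    mark(buf, 24, 'y');     /* buf converts to &buf[0] here */
    return buf[24];
}
```

Here decays_without_call() returns 1 and roundtrip() returns 'y'.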
The other syntactic gimmick that trips people up is that C allows
the programmer to lie about certain parameter types. If some
function receives a pointer of type "pointer to T", the programmer
is allowed to declare the formal parameter *as if* it had type
"array N of T". (The array size can be included or omitted.
Except for the new C99 meaning for "static", the size is not
used.) Hence, we can write the following:
#include <stdio.h>
void h(char paramvar[100]) {
    char localvar[100];
    printf("sizeof paramvar: %lu\n",
        (unsigned long)sizeof paramvar);
    printf("sizeof localvar: %lu\n",
        (unsigned long)sizeof localvar);
}
Actually *executing* this function, however, is instructive.
The output is typically:
sizeof paramvar: 4 (or sometimes 2 or 8; rarely, 1)
sizeof localvar: 100 (always exactly 100)
In addition, we can demonstrate that "paramvar" actually has type
"char *" -- not "char [100]" -- by re-binding it:
void h(char paramvar[100]) {
    char localvar[100];
    char other[100];
    printf("sizeof paramvar: %lu\n",
        (unsigned long)sizeof paramvar);
    printf("sizeof localvar: %lu\n",
        (unsigned long)sizeof localvar);
    paramvar = localvar; /* OK */
    localvar = other; /* ERROR */
}
A diagnostic is required (and occurs) for the assignment to localvar,
because arrays are not modifiable lvalues (although the form of
the error message is not dictated by the standard and it is often
a bit peculiar). No diagnostic is required, and generally none
occurs, for the assignment to paramvar, because it is not an array;
its type is "char *", not "char [100]".
The Standard actually says that the type is to be rewritten by the
compiler:
... A declaration of a parameter as ``array of type'' shall be
adjusted to ``pointer to type,'' and a declaration of a
parameter as ``function returning type'' shall be adjusted
to ``pointer to function returning type,'' as in 6.2.2.1.
(C99 draft, section 6.7.1). (The second part of the sentence above
means that you can declare a parameter as if it were itself a function,
instead of using the function-pointer syntax:
void operate(T f(args));
and
void operate(T (*f)(args));
mean the same thing in a prototype, and either version can be used
in the definition of the function "operate".)
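
As a concrete sketch of that adjustment, with T and args both taken
to be int. The names apply_a, apply_b, twice, and appliers are
invented for this example:

```c
/* Parameter declared with function type; the compiler adjusts it
   to "pointer to function returning int". */
int apply_a(int f(int), int x) {
    return f(x);
}

/* Parameter declared with explicit pointer-to-function type. */
int apply_b(int (*f)(int), int x) {
    return (*f)(x);
}

/* Hypothetical callback used to exercise both versions. */
int twice(int v) {
    return 2 * v;
}

/* Both functions have the same type after adjustment, so a single
   pointer type can hold either one. */
int (*const appliers[2])(int (*)(int), int) = { apply_a, apply_b };
```

Both apply_a(twice, 21) and apply_b(twice, 21) yield 42, and either
entry of appliers can be called the same way.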
Finally, it is worth pointing out that, in languages that have or
allow by-reference function calls, it may (or may not, depending on
the language definition) be possible to discover whether the compiler
actually uses by-reference, or simulates it with value-result. The
following C++ program fragment (though using printf() instead of cout)
illustrates a feature of by-reference:
#include <stdio.h>
int globalvar;
void f(int& x) {
    x = 42;         /* x is globalvar itself, so this is visible at once */
    printf("globalvar is now %d\n", globalvar);
    x = 43;
}
int main(void) {
    globalvar = 1;
    f(globalvar);
    printf("globalvar is now %d\n", globalvar);
}
The output of this program *must* be:
globalvar is now 42
globalvar is now 43
If this program is converted to Fortran 77 in the obvious way,
the effect of the program becomes undefined. Actually running it
may produce:
globalvar is now 1
globalvar is now 43
In this case, the compiler used value-result. This value-result
mechanism can be simulated in C as:
#include <stdio.h>
int globalvar;
void f(int *xp) {
    int x = *xp;    /* copy in at entry */
    x = 42;
    printf("globalvar is now %d\n", globalvar);
    x = 43;
    *xp = x;        /* copy out at exit */
}
int main(void) {
    globalvar = 1;
    f(&globalvar);
    printf("globalvar is now %d\n", globalvar);
    return 0;
}
The difference between by-reference and value-result is that, in
by-reference, every occurrence of the apparently-ordinary variable
within a given function/procedure is -- at least before optimization
-- turned into an appropriate dereference. In the value-result
case, however, the value is passed to the callee, and the callee
returns a new result to be stored back in the original variable.
The effect is that there is a single dereference at entry to the
function/procedure, and a second one at exit. (Whether the copy
occurs in the callee, as in the C simulation, or the caller, is up
to the compiler -- the difference is not generally detectable.)
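
Another way to make the two mechanisms disagree, sketched here in C
under the same simulation as above, is to pass the same variable for
two parameters. The function names bump_both_ref and bump_both_vr
are hypothetical:

```c
/* Simulated by-reference: every use of a parameter is a dereference,
   so when ap and bp point to the same int it is incremented twice. */
void bump_both_ref(int *ap, int *bp) {
    *ap = *ap + 1;
    *bp = *bp + 1;
}

/* Simulated value-result: one dereference per parameter at entry,
   one at exit.  When ap and bp point to the same int, both local
   copies start from the old value, so the net effect is a single
   increment. */
void bump_both_vr(int *ap, int *bp) {
    int a = *ap;    /* copy in */
    int b = *bp;
    a = a + 1;
    b = b + 1;
    *ap = a;        /* copy out */
    *bp = b;
}
```

Starting from x == 1, bump_both_ref(&x, &x) leaves x at 3, while
bump_both_vr(&x, &x) leaves it at 2. With two distinct variables the
two versions behave identically.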