(It is indeed undefined behavior, and it does in fact break on
x86 architecture systems that use the "RET N" instruction form
for fixed-argument functions. What happens, of course, is that
the "RET 12" or whatever in the fixed-argument callee removes the
wrong number of bytes from the caller's stack frame, so that
after calling the function, all your local variables go kaboom.
Depending on whether the caller uses ENTER/LEAVE and/or a frame
pointer, the caller itself may also not be able to return.)
Of course. That's pretty much the raison d'être (as the French would
put it) of the "cdecl" calling convention. I would imagine that the
original authors of C invented cdecl (though they may not have called
it that) precisely so that printf() could be implemented.
Notes for the prissy:
1) I say "cdecl", using that as a term in and of itself, to avoid
any implication that the cdecl calling convention is part of or
gets any funding from or is in any way associated with the
C language (the holy grail of this ng).
2) It is probably OT to even mention calling conventions, but
I personally think it is OK to discuss cdecl as long as we make
it clear, as we must, that it is technically OT.
To me, the word "cdecl" refers to the program for turning C
declarations into pseudo-English and vice versa
... as in:
% cdecl
cdecl> explain int (*foo)(arg)
declare foo as pointer to function (arg) returning int
cdecl> declare p as pointer to array 3 of pointer to function (args) returning pointer to pointer to char
char **(*(*p)[3])(args)
(Note: my copy of cdecl disappeared long ago and the above is made
up from memory.)
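(Incidentally, that last declaration does check out as real C. A
quick sketch, with a concrete parameter list substituted for the
made-up "(args)" -- the "(int)" here is purely my own invention,
for illustration:

    /* p: pointer to array 3 of pointer to
       function (int) returning pointer to pointer to char */
    char **(*(*p)[3])(int);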
That said, there are some fundamental errors above.
C grew out of the B -> NB (New B) -> C progression, as Dennis
Ritchie has noted. The first C compilers were done on the PDP-11.
(See http://cm.bell-labs.com/cm/cs/who/dmr/chist.html for details.)
The PDP-11 has a hardware-denoted "stack pointer" register ("sp")
with "push" and "pop" semantics implied by its subroutine call
instruction. (As with many such machines, there are separate "user"
and "kernel"/"system" stack pointers as well, controlled by the
"PSW" or Processor Status Word. As such, one can talk about the
"usp" vs the "ssp": user and system stack pointers. But this is
invisible to ordinary user programs.)
The "most natural" design of assembly code on the PDP-11 has the
stack pointer adjusted by the caller. The reason is that if you
push the arguments on the stack and then call a function, you end
up with a stack on which the return address sits on top (pushed
last), so a "return from subroutine" instruction pops it
automatically but leaves the arguments behind. Unlike the x86,
there is no "return and pop more arguments" instruction -- you
must use a separate pop (or add to the stack pointer), and of
course, this has to go in the caller, because the "rts pc" has
already returned.
Given these "machine facts", as it were, the obvious, simple, easy
way to handle subroutine calls in a language that supports recursion
and uses the machine's stack is to evaluate the arguments right to
left, pushing each value as soon as it is computed, and then emit
the "call subroutine" instruction followed by the "pop as many bytes
as we pushed" instruction. Variadic routines are easy to write as
well. If you take the address of the stack element just above the
return-address, you have a pointer to the first argument. Each
subsequent argument is the next two bytes (because all arguments
are "int"s -- early C did not have "long", and nobody actually used
floating-point). You could even write an "nargs" function:
just trace up the call stack, look at the instruction following the
call, and figure out how many bytes it pops (and divide by two since
two PDP-11 bytes gives you one PDP-11 int).
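Here is a minimal sketch of that argument-walking trick, in roughly
the C of the era. It assumes the pure PDP-11 model just described:
every argument is a two-byte "int", pushed contiguously, so the
address just past the first named parameter is the address of the
second argument. (The function name and its "count first" protocol
are my own invention, and on any modern ABI this is flagrantly
undefined behavior; it is here only to show why caller-pops made
variadic routines so easy.)

    /* sum(n, a, b, ...): add up the n int arguments after n.
       Assumes all arguments sit contiguously on the stack. */
    int
    sum(n)
        int n;
    {
        int *ap = &n + 1;   /* just past n: the first variadic arg */
        int tot = 0;

        while (n-- > 0)
            tot += *ap++;   /* advance one two-byte "int" at a time */
        return tot;
    }

A call such as sum(3, 10, 20, 30) would, on that model, yield 60.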
Of course, "nargs" stops working once you support "long" and
floating-point ("double") arguments, which use 4 and 8 bytes (2
and 4 ints) respectively. (It also stops working when you get a
big separate-I&D PDP-11 and are unable to read instructions
because they are in the wrong address space.)
The problems came in when C started growing up and trying to move
out of the restricted "everything is a PDP-11" world. Suddenly
there were Honeywell machines with 9-bit bytes, the Interdata with
32-bit "int"s, machines on which "char" should properly be unsigned,
and so on. And of course, there were machines like the Intel x86,
with its "RET 12" instructions that both returned *and* popped
arguments off the caller's stack -- but required knowing for sure
exactly how many bytes to pop.
The ANSI C89 folks decided to allow Microsoft and other vendors
to use the "RET 12" style instructions, by the simple means of
requiring that a prototype be in scope before calling variadic
functions like printf(). The C compiler could assume that a call
like:
f(1, 2L);
that passed 6 bytes (a two-byte "int" plus a four-byte "long")
called a non-variadic function f() that ended
with a "RET 6". On the other hand, given:
int g(int, ...);
g(1, 2L);
that same C compiler would know that g() used a plain "RET"
instruction, and pop the 6 bytes itself.
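In source form, the distinction the compiler relies on looks
something like the sketch below. The declarations are illustrative
(the names f and g come from the example above, and the 6-byte
figure assumes the two-byte "int" and four-byte "long" of the
16-bit x86):

    int f(int, long);   /* fixed arguments: callee may pop ("RET 6") */
    int g(int, ...);    /* variadic: caller must pop (plain "RET") */

    void
    demo(void)
    {
        f(1, 2L);   /* compiler may emit a callee-pops sequence */
        g(1, 2L);   /* compiler must emit a caller-pops sequence */
    }

Omit the prototype for g() and a callee-pops compiler has no way to
tell the two cases apart, which is exactly why C89 requires the
prototype to be in scope.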
Fortunately, the Intel x86 CPU remained obscure and was never used
in any popular hardware, so nobody actually has to declare their
variadic functions in advance.
(OK, I *am* kidding: fortunately,
C compiler vendors used the slower ret-and-separate-pop instruction
sequence, so that C programmers could get away with sloppy code.
And as it turns out, ret-and-separate-pop is now often faster
anyway.)