JKop said:
[snip]
I'm currently reading Bjarne's book, 3rd Edition, I'm on page 277 so far. He
alluded earlier in the book that he would explain this later in the book,
but what the heck I thought it would be worth a bit of a debate.
It's a great book, although never would I recommend it to a beginner.
Agreed.
I wasn't actually sure of whether a pointer would have to be used. Like
consider:
int main()
{
int j;
j = 4;
}
No pointer is needed to set it to 4. I was thinking that maybe a pointer
wouldn't be needed either when you pass by reference to another function?
Well. Yes. Technically there is no requirement to use a hidden pointer.
In practice the reference has to be implemented somehow. Things are easy
if the function gets inlined: The compiler can record the fact that there
are 2 names refering to the same memory location and use that. If the
function gets not inlined then some other mechnism has to take place.
Currently all compilers use hidden pointers. I don't even know if there
is another mechanism available that can do the same thing. But I think
that the technique of hidden pointers is the simplest one (both, in
implementing it into a compiler and at runtime) and it would be hard
to beat it.
Anwyay, I haven't a clue how function calls operate under the hood, I hear
people pandy the term "stack" around, but haven't a clue what it means!
Something to do with how or where the local variables are stored?
That could become a lengthy explanation right now. Most of it is really OT
and some assumptions will be made. So everybody not really interested, stop
reading right now
In the following I assume you have never looked at a CPU at the instruction
level, so be patient
Also: different CPU's work differently in the internals,
so what I'd like to present is what I encountered during my studies, tests and
implementations.
In the internals of a CPU there are various registers. A register is nothing more
then a memory cell inside the CPU chip, where it can be electrically connected to
other parts of the CPU (such as the arithmetic unit). Some of those registers have
a special purpose, some are just used to give the program a way to park values
and be able to access them quickly.
One of those special registers is the PC (PC == Programm Counter). In it the memory
address is stored from where the next instruction from memory is read and executed.
Another one is the SP (SP == Stack Pointer). It contains a memory address. There are
special CPU instructions which use the SP, namely PUSH and POP. A PUSH operation stores
a value at the address given by SP and decrements the value stored in the SP. A POP
operation does the opposite: increment the SP and fetch the value from memory given
by the new content of SP. Both operations (+some additional ones), paired with the SP
form, what is generally referred to as 'The stack'. It is a simple way of 'dynamic
memory allocation' which grows and shrinks very easily, without much overhead.
Typically the memory layout of a program out work looks like this
+----------------+---------------------+-----------------------------+-----------------+
| Code section | Data section | free Memory pool | Stack section |
+----------------+---------------------+-----------------------------+-----------------+
^
|
SP |
+-------|-+
| o |
+---------+
with the SP pointing at the last available memory cell (Yes, the SP can be thought of as a
pointer which is located inside the CPU).
Now back to the PC. It also is logically a pointer. The CPU uses it when the next instruction
cycly begins. The PC contains the address of the next instruction to execute (which is just
a number, as anything at this level). OK. So the byte gets loaded and fed into the decoding
logic, which figures out what parts of the CPU need to be connected with other parts based
in the individual bits in that instruction byte. At the same time the PC gets incremented
by 1. (If the instruction is a multibyte instruction, the logic detects this and loads the
next view bytes to get at the complete instruction sequence, each time incrementing the PC.
So how is a JUMP instruction implemented? Simple. After decoding the instruction it is executed
and the jump target (the address of where to jump to) is loaded. This address is simply loaded
into the PC. In this way the next instruction that the CPU will execute is the one given by
the jump target. Simple, isn't it (At the CPU level things get incredible simple, as long as
no caching or pipelining comes into play).
OK. What about a CALL instruction? (JUMP is the equivalent of goto while CALL is a function
call). Well, the CPU has knowledge of parameters, this has to be implemented in other stages.
I'll ignore that for now and come back to it later. Nevertheless a CALL instruction is still
nothing more then a goto (a JUMP) with the added benefit that somewhere it is stored where
the goto came from, such that it is possible to return to that location later. Now where
is this information stored. You guessed it: on the stack. When a CALL instruction is executed
things start out the very same is in a JUMP case: the address is loaded. But instead of just
assigning this address to the PC, the previous content of PC is stored with a push on the stack.
Only after that the PC gets it's new value. When the CPU then executes the correspconding RETURN
instruction things go the other way round: a number is fetched from the stack (pop) and stored
in the PC.
Lets say the CPU has this instruction sequence (all numbers are hex)
0000 LD SP,FFFF ; set the stack pointer to ehe end of memory
0003 CALL 0007 ; call the function at memory location 0010
0006 HALT ; halt the CPU, the CPU effectivly freezes
...
0007 LD B,AA ; load register B with AA
0009 RET
(LD == Load)
The CPU The memory
+-----------------------+
| PC: 0000 | 0000
| SP: 0000 | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: 00 | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | | |
....+----+----+----+----+----+---+----+----+----+
You see the CPU on the left side, with the memory (Ram) on the right side. The Ram has already
been loaded with the abov program (02 is the opcode for Load, CD for CALL, C9 for RETurn, etc.)
The CPU starts executing. The first instruction is fetched from memory location 0000 (the content
of PC equals 0000), which is 02, a Load register SP instruction. The logic inside the CPU knows
that the next 2 Bytes after that 02 contain the value with which to load the SP. So this value
is also fetched SP loaded with it. After that instruction has executed, the situation is
The CPU The memory
+-----------------------+
| PC: 0003 | 0000
| SP: FFFF | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: 00 | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | | |
....+----+----+----+----+----+---+----+----+----+
The CPU starts it next cycle by loading the instruction based on the value in PC (which is 0003).
Loading from the memory at address 0003 brings up the OpCode of CD, which is a call instruction.
Again: The logic inside the CPU loads the next 2 bytes also from Ram (because they contain the
call target address). Due to this loading the PC has increased to a value of 0006
intermediate state !
The CPU The memory
+-----------------------+
| PC: 0006 | 0000
| SP: FFFF | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: 00 | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | | |
....+----+----+----+----+----+---+----+----+----+
but since this is a CALL instruction, the value of the PC is pushed onto the stack: The value
of the PC is stored at the memory address contained in SP
The CPU The memory
+-----------------------+
| PC: 0006 | 0000
| SP: FFFF | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: 00 | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | 06 | 00 |
....+----+----+----+----+----+---+----+----+----+
and the SP is decremented
The CPU The memory
+-----------------------+
| PC: 0006 | 0000
| SP: FFFD | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: 00 | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | 06 | 00 |
....+----+----+----+----+----+---+----+----+----+
Only after that, the previously read call target address value is assigned to the PC (that was
0007)
The CPU The memory
+-----------------------+
| PC: 0007 | 0000
| SP: FFFD | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: 00 | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | 06 | 00 |
....+----+----+----+----+----+---+----+----+----+
Instruction finished. Next instruction. Again: Use PC as the address from where to load the next
instruction: 0007 and the instruction is 08, which is a Load register B instruction.
Decoding, ... you already know what is going on, it is always the same cycle. After that
instruction the situation looks like this:
The CPU The memory
+-----------------------+
| PC: 0009 | 0000
| SP: FFFD | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: AA | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | 06 | 00 |
....+----+----+----+----+----+---+----+----+----+
Next instruction from memory address 0009 -> C9, which is a return instruction.
Return takes the Satck pointer SP, increments it and fetches the value from there. In our
example the SP contains FFFD, thus the poped values are 00 06, and the SP gets incremented
to FFFF
The CPU The memory
+-----------------------+
| PC: 0009 | 0000
| SP: FFFF | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: AA | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | 06 | 00 |
....+----+----+----+----+----+---+----+----+----+
The read value is assigned to the PC
The CPU The memory
+-----------------------+
| PC: 0006 | 0000
| SP: FFFF | +----+----+----+----+----+----+----+----+----+----+--...
| | | 02 | FF | FF | CD | 07 | 00 | 76 | 08 | AA | C9 |
| Register B: AA | +----+----+----+----+----+----+----+----+----+----+
+-----------------------+
FFFF
....+----+----+----+----+----+---+----+----+----+
| | | | | | | | 06 | 00 |
....+----+----+----+----+----+---+----+----+----+
and when the CPU starts it next instruction cycle, program execution continues at memory address
0006
(the content of PC).
So this is basically how 'function calling' works at the CPU level and what role the Stack has.
I did a Z80 emulation some time ago (to help in programming a small home built microcomputer, which
controls same hardware. Here are the corresponding functions in that emulation ( I leave out some
of the low level functions, I guess you can deduce what they do from their names).
class CPU {
public:
void Run();
protected:
void PushStack( WORD Adr );
WORD PopStack();
void JMP(), CALL(), RET();
BYTE m_Ram[0x10000];
WORD m_PC;
WORD m_SP;
BYTE m_RegB;
BYTE m_RegC;
};
void CPU::Run( CZ80CpuDoc* pDoc )
{
BYTE OpCode;
while( true ) {
OpCode = m_Ram[m_PC++];
(this->*m_Fnktions[OpCode])();
}
}
void CPU::JMP()
{
WORD Adr;
Adr = FetchAdress();
m_PC = Adr;
}
void CPU::RET ()
{
m_PC = PopStack();
}
void CPU::CALL ()
{
WORD Adress = FetchAdress();
PushStack( m_PC );
m_PC = Adress;
}
void CPU:
ushStack( WORD Adr )
{
BYTE byte1, byte2;
byte1 = Adr & 0xFF;
byte2 = (Adr >> 8) & 0xFF;
m_Ram[--m_SP] = byte2;
m_Ram[--m_SP] = byte1;
}
WORD CPU:
opStack ()
{
BYTE byte1, byte2;
WORD adr;
byte1 = m_Ram[m_SP++];
byte2 = m_Ram[m_SP++];
adr = To16( byte2, byte1 );
return adr;
}
But we are not finished right now. You will say: All fine, but what about parameters?
Well. The CPU doesn't know anything about parameters. It is the compilers job to translate
parameter passing into something that, when executed, gives the illusion that some parameters
have been passed. There are multiple ways to do that. 2 commonly used are: Pass by register
and pass by pushing it on the stack.
Pass by register is simple: The opcodes simply load one of the registers with the required value
and the function is built in such a way as to expect the value in exactly that register. But
the number of registers inside the CPU is usually limited, so only a handfull of parameters
can be passed that way! What to do then?
Eg. Pass by pushing onto the stack.
The CPU has other instructions, which Push and Pop values from and to the stack (just as the PC
was pushed during the CALL instruction):
LD B, AA // load register with required value (may be a constant or a load
// from a specific memory cell, or not done at all eg. if the passed
// value is the result of a computation and already in that register
PUSH B // push value on the stack ( Memory[SP] <- value; SP-- )
CALL 0100 // call the function
POP B // pop the pushed argument from above from the stack
...
0100 // this is the function and we know:
// if we do a pop right now, we get at the return address
// but the second pop, will give a hand at the passed argument.
// this also means, we can use the value of the stack pointer SP
// to access the passed argument.
... // this function could also decide to allocate some local variables
// one way would be, to just decrease the stack pointer and thus
// reserve some memory cells.
//
// the memory layout then looks like this
//
// ...+----+----+----+----+----+----+----+----+----+....
// | | | | | | | | | |
// ...+----+----+----+----+----+----+----+----+----+....
// ^ | | |----| | |
// | +-------------+ | +----|
// | local variables | |
// | return address |
// SP |
// passed arguments
//
// when the function does this, it is clear, that it has to
// correct the SP back to the location of the return address
// before executing the RET instruction. But since the function
// knows how many bytes it has moved the SP to the left, it is also
// able to move the SP back to the right.
...
RET
There is much more to say about this. But then this reply would break all records in length.
So please let me stop right now, and suggest to continue your studies by some literature
about assembly (which would also have the benefit that it is tailored to your specific
type of CPU).
Basically: At the CPU level, things get really simple. After all it is just a machine at work.