Steven T. Hatton said:
Is there anything that gives a good description of how source code is
converted into a translation unit, then object code, and then linked. I'm
particularly interested in understanding why putting normal functions in
header files results in multiple definition errors even when include guards
are used.
This is what I posted some time ago. It may be of some help
to you.
--------
First of all let me introduce a few terms and clearify
their meaning:
source code file The files which contains C or C++
code in the form of functions and/or
class definitions
header file Another form of source file. Header files
usually are used to seperate the 'interface'
description from the actual implementation
which resides in the source code files.
object code file The result of feeding a source code file through
the compiler. Object code files already contain
machine code, the one and only language your computer
understands. Nevertheless object code at this stage
is not executable. One object code file is the direct
translation of one source code file und thus usually
lacks external references, eg. the actual implementation
of functions which are defined in other source code files.
library file a collection of object code files. It happens frequently that
a set of object code files is always used together. Instead
of always listing all those object code files during the
link process it is often possible to build a library from
them and use the library instead. But there is no magic
with a library. A library can be seen as some repository
where one can deposit object code files such that the library
forms a collection of them.
compiling the process of transforming the source code files into
object code file. C and C++ define the concept of 'translation
unit'. Each translation unit (normally: one single source code
file) is translated independently of all other translation units.
linking the process of combining multiple object code files and libraries
into an executable. During the linking process all external references
of one object code file are examined and the linker tries to find
modules which satisfy those external references.
In practice the whole process works as follows:
Say you have 2 source files (with errors, we will return to them later)
main.c
******
int main()
{
foo();
}
test.c
******
void foo()
{
printf( "test\n" );
}
and you want to create an executable. The steps are
as in the graphics:
main.c test.c
+----------------+ +-----------------------+
| | | |
| int main() | | void foo() |
| { | | { |
| foo(); | | printf( "test\n" ); |
| } | | } |
+----------------+ +-----------------------+
| |
| |
v v
********** **********
* Compiler * * Compiler *
********** **********
| |
| |
| |
main.obj v test.obj v
+--------------+ +--------------+
| machine code | | machine code |
+--------------+ +--------------+
| |
| |
+------------------+ +--------------------+
| |
v v
************* Standard Library
* Linker *<----------+--------------------+
************* | eg. implementation |
| | of printf or the |
| | math functions |
| | |
| +--------------------+
main.exe v
+-------------------------+
| Executable which can |
| be run on a particluar |
| operating system |
+-------------------------+
So the steps are: compile each translation unit (each source file) independently
and then link the resulting object code files to form the executable. To do that
misssing functions (like printf or sqrt) are added by linking in a prebuilt library
which contains the object modules for them.
The important part is:
Each translation unit is compiled independently! So when the compiler compiles
test.c it has no knowledge about what happend in main.c and vice versa. When the
compiler tries to compile main.c it eventually reaches the line
foo();
where main.c tries to call function foo(). But the compiler has never heared about
a function foo! Even if you have compiled test.c prior to it, when main.c is
compiled this knowledge is already lost. Thus you have to inform the compiler
thar foo() is not a typing error and that there indeed is somewhere a function
called foo. You do this with an function prototype:
main.c
+----------------+
| void foo(); |
| |
| int main() |
| { |
| foo(); |
| } |
+----------------+
|
|
v
**********
* Compiler *
**********
|
Now the compiler knows about this function and can do its job. In very much the same way
the compiler has never heared about a function called printf(). printf is not part of
the 'core' language. In a conforming C implementation it has to exist somewhere, but
printf() is not on the same level as 'int' is. The compiler knows about 'int' and
what it means, but printf is just a function call and the compiler has to know its
parameters and return type in order to compile a call to it. Thus you have to inform
the compiler of its existence. You could do this in very much the same way as you
did it in main.c, by writing a prototype. But since this is needed so often and
there are so many other functions available, this very fast gets boring and error prone.
Thus somebody else has already provided all those protoypes in a seperate file, called
a header file, and instead of writing the protoypes by yourself, you simply 'pull in'
this header file and have them all available:
test.c
+-----------------------+
| #include <stdio.h> |<-+
| | |
| void foo() | |
| { | |
| printf( "test\n" ); | |
| } | |
+-----------------------+ |
| |
| |
v |
********** stdio.h v
* Compiler * +-------------------------------------+
********** | ... |
| | int printf( const char* fmt, ... ); |
| ... |
+-------------------------------------+
And now the compiler has everything it needs to know to compile test.c
Since main.c and test.c could have been compiled successfully they can be linked
to the final executable which can be run. During the process of linking the linked
figures out that there is a call to foo() in main.obj. Thus the linker tries to find
a function called foo. It finds this function by searching through the object
module test.obj. The linker thus inserts the correct memory address for foo
into main.obj and also includes foo from test.obj into the final executable. But
in doing so, the linker also figures out, that in function foo() there is a call
to printf. The linker thus searches for a function printf. It finds it in the
standard library, which is always searched when linking a C program. There the
linker finds a function printf and this function thus is included into the
final executable too. printf() by itself may use other functions to do its
work but the linker will find all of them in the standard library and include
them into the final executable.
There is one thing left to talk about. While main.c is correct from a technical
point of view it is still unsatisfying. Imagine that our functoni foo() has
a much more complicated argument list. Also imagine that your program does not
consist of just those 2 translation units but instead has 100-dreds of them and
that foo() needs to be called in 87 of them. Thus you would have write a prototype
in every single one of them. I think I don't have to tell you what that means: All those
prototypes need to be correct and just in case function foo() changes (things like
that happen), all those 87 prototypes need to be updated. So how can you do that?
You already know the solution, you have used it already. You do pretty much
the same as you did in the case of stdio.h. You write a header file and
include this instead of the prototype:
main.c
+-------------------+ test.h
| #include "test.h" |<---------+-------------+
| | | void foo(); |
| int main() | | |
| { | +-------------+
| foo(); |
| } |
+-------------------+
|
|
v
**********
* Compiler *
**********
|
Now you can include that header file in all the 87 translation units which
need to know about foo(). And if the prototype for foo() needs some update
you do it in one central place: by editing file test.h. All 87 translation
units will pull in this updated protype when they are recompiled.
HTH