C++ Project Files?????


JKop

Here's what I know so far:

You have a C++ project. You have source files in it. When you go to
compile it, the first thing that happens is that the preprocessor sticks
the header files into each source file.

So now you have your ".cpp" files all ready, without any "#include"
or "#define" in them.

Let's assume that there are 2 source files in this project, "a.cpp" and
"b.cpp".

The 2 source files are compiled separately, referring to functions
and global variables in each other via "extern" statements, which are
usually within header files. For instance "a.cpp" can call a function
in "b.cpp" simply by having a function prototype.
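A minimal single-file sketch of what those declarations look like. It assumes the header and the two source files are pasted into one file purely to keep the example self-contained; `g_counter`, `bump` and `use_from_a` are invented names. The standard library uses the same pattern for `std::cout`: <iostream> only declares it (`extern ostream cout;`), and the definition lives in the compiled runtime library.

```cpp
#include <cassert>

// --- What a header such as "b.hpp" would contain: declarations only. ---
extern int g_counter;  // declares a global variable that is defined elsewhere
int bump(int by);      // function prototypes are implicitly "extern"

// --- What "a.cpp" would contain: it compiles against the declarations alone. ---
int use_from_a() {
    int after = bump(2);       // call a function defined "elsewhere"
    return after + g_counter;  // read a variable defined "elsewhere"
}

// --- What "b.cpp" would contain: the one and only definitions. ---
int g_counter = 10;
int bump(int by) {
    g_counter += by;
    return g_counter;
}
```

Split across real files, only "b.cpp" would contain the definitions; the linker connects the calls in "a.cpp" to them.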

So both your ".cpp" files are ____________ into object files. (what
word should I use in the blank? "compiled"?)

So now the linker links the object files together, and voilà, now
"a.cpp" and "b.cpp" share their functions, class definitions and
global variables.

So what we're looking at is something like this:

.cpp & .hpp -> .cpp -> .o -> .exe


Each source file has its headers put into it, then it's turned into
an object file, then the linker gathers up all the object files and
makes an executable file.

----

QUESTIONS:

What the hell is a library file? I've heard that once your source
files have been turned into object files, the linker then adds
library files, and then turns it into an executable. So what the hell
is a library file?!

----

Could someone please give me a complete list of all the
different file types associated with a C++ project, eg.:

.cpp  Source code file
.hpp  Header file
.o    Object file


I learned C++ from a book and I pretty much have it by the reins now,
except for one thing: the book had nothing in it about the actual
form of a C++ project, ie. how there are source files and header files
and how the linker links object files.


Thanks a lot for your time.


-JKop
 

Peter van Merkerk

What the hell is a library file? I've heard that once your source
files have been turned into object files, the linker then adds
library files, and then turns it into an executable. So what the hell
is a library file?!

A library file is nothing more than a couple of object files grouped
together.
 

MW Ron

JKop said:
Here's what I know so far:

You have a C++ project. You have source files in it. When you go to
compile it, the first thing that happens is that the preprocessor sticks
the header files into each source file.

So now you have your ".cpp" files all ready, without any "#include"
or "#define" in them.

This is a translation unit. It is important to understand this, because
some things can be accessed from outside the translation unit and some
cannot.
Let's assume that there are 2 source files in this project, "a.cpp" and
"b.cpp".

The 2 source files are compiled separately, referring to functions
and global variables in each other via "extern" statements, which are
usually within header files. For instance "a.cpp" can call a function
in "b.cpp" simply by having a function prototype.

So both your ".cpp" files are ____________ into object files. (what
word should I use in the blank? "compiled"?)

Generally they are just called object files.
So now the linker links the object files together, and voilà, now
"a.cpp" and "b.cpp" share their functions, class definitions and
global variables.

So what we're looking at is something like this:

.cpp & .hpp -> .cpp -> .o -> .exe

The source file "includes" the headers, but yes, this is basically
right. I'd say...

source -> object
object + runtime -> executable (not only .exe files but also DLLs)

QUESTIONS:

What the hell is a library file?

There are two types of libraries. One is a source library: just a
collection of templates and functions in a number of sources that you
add to your project as needed. The STL is a source library.

The other is often called a static library. It is an object code file,
usually with a .lib or .a extension: the source code files compiled
into object code but not yet processed into an executable format,
i.e. it hasn't been dead-stripped of unused code.
I've heard that once your source
files have been turned into object files, the linker then adds
library files, and then turns it into an executable. So what the hell
is a library file?!

It is just like a normal object file, except that with a library you
can add or remove files/functions from it. Using libraries saves a lot
of time, since one assumes that a library is fully tested code: it does
not need to be debugged, it does not need to be compiled again, and so on.

Hope this helps,

Ron

--

Ron Liechty - http://www.metrowerks.com
 

Philipp Bachmann

What the hell is a library file? I've heard that once your source
A library file is nothing more than a couple of object files grouped
together.

To supplement this statement: there are two types of libraries, static
and dynamic/shared ones. Peter wrote about static ones. If your system supports
dynamic linking, there also exists the second type, which carries much
more information with it than the static ones; in particular it can point to other
shared libraries it depends on. In this sense, shared libraries are more similar
to executables, which can carry the same kind of information with them,
than to static libraries. It's the job of another linker, the run-time linker, to
load these dynamic libraries.

How you use libraries is a different story: using either kind is designed
to be as similar as possible, especially on Unix operating systems.

Shared libraries have another useful property. Not only can they be
link-edited (what you've called "linked") against, they can also be loaded
from within your code (Unix: "dlopen()", MS Win32: "LoadLibrary()") to
implement what are often referred to as "plugins".
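A minimal sketch of the plugin idea, using the POSIX calls Philipp names. `load_symbol` is a hypothetical helper, not part of any library; a real plugin host would cast the returned address to the function-pointer type it expects. On glibc versions before 2.34 you may need to link with -ldl, and on Windows the rough counterparts are LoadLibrary() and GetProcAddress().

```cpp
#include <cassert>
#include <dlfcn.h>  // POSIX dynamic loading: dlopen, dlsym, dlclose, dlerror
#include <string>

// Hypothetical helper: load the shared library at `path` at run time and look
// up `symbol` in it. Returns nullptr and fills `error` on failure. On success
// the library is deliberately left loaded, because the returned address is
// only valid while the library stays in memory.
void* load_symbol(const char* path, const char* symbol, std::string& error) {
    void* handle = dlopen(path, RTLD_NOW);  // a null path means "the program itself"
    if (handle == nullptr) {
        const char* msg = dlerror();
        error = msg ? msg : "dlopen failed";
        return nullptr;
    }
    dlerror();  // clear any stale error state before calling dlsym
    void* address = dlsym(handle, symbol);
    if (const char* msg = dlerror()) {  // dlsym reports failure via dlerror()
        error = msg;
        dlclose(handle);
        return nullptr;
    }
    return address;
}
```

If the plugin exports its entry point with `extern "C"`, the symbol name passed to dlsym() is not mangled, which keeps the lookup string predictable.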

Cheers,
Philipp.
 

JKop

Thanks a lot, MW Ron.


So, an object file...


Let's take for a second a source-code file, ".cpp", or a module as some
people refer to it. Let's say it's called "boo.cpp", and it looks like this:


#include "boo.hpp"

int Hello(char k)
{
//blah blah blah
return 5;
}


After the Preprocessor gets at it, it looks like:


int Hello(char);

int Hello(char k)
{
//blah blah blah
return 5;
}
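The transformation above can be mimicked with a toy function. This is a hypothetical sketch, far simpler than a real preprocessor (no <...> search paths, macros or #ifdef handling); it only shows that #include is plain textual pasting, done before the compiler proper ever sees the code.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy model of what happens to #include "name": the directive line is
// replaced, as plain text, by the contents of the named header.
// Headers not found in `headers` are left untouched.
std::string expand_includes(const std::string& source,
                            const std::map<std::string, std::string>& headers) {
    const std::string directive = "#include \"";
    std::istringstream in(source);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        if (line.rfind(directive, 0) == 0) {  // the line starts with #include "
            std::string::size_type close = line.find('"', directive.size());
            std::string name = line.substr(directive.size(), close - directive.size());
            auto it = headers.find(name);
            if (it != headers.end()) {
                // A header may itself contain #include lines, so expand those too.
                // (A header that includes itself would recurse forever here.)
                out += expand_includes(it->second, headers);
                continue;
            }
        }
        out += line + '\n';
    }
    return out;
}
```

Given a map entry "boo.hpp" -> "int Hello(char);\n" and the boo.cpp text, it returns the prototype followed by the definition, just like the listing above.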


And then from there it's made into an object file. (BTW, are they just
simply called object files?). So now we have "boo.o". So now what exactly
is "boo.o"? Is the file platform specific, or compiler specific, or is it
totally universal and can it be used with any project in any environment
(eg. MSVisual C++) and with any compiler?
So the file "boo.o" just contains the function "Hello". Now some other
".cpp" file in the same project wants to use "Hello". Let's call this file
"ret.cpp". "Ret.cpp" looks like this:

#include "boo.hpp"

int main(void);

int main(void)
{
Hello('r');
return 0;
}

So "ret.cpp" is turned into "ret.o" (Is the correct word here "build"?) So
at this stage, "ret.o" still knows nothing about where the "Hello"
function's source code actually is. So now Mr Linker comes in, takes
"ret.o" and "boo.o" and introduces them. From there, they're compiled into a
".exe".

So again, back to libraries, can I just rename "boo.o" to "boo.lib" and
start e-mailing it to all my buddies for them to use in their C++ projects, eg.
making a game for the PS2, the interface for a microwave, WAP on a mobile
phone?

---

So at heart, is a C++ program just a load of ".cpp" files?

Then these ".cpp" files get a load of "extern" statements via header files.
Then each ".cpp" file is built separately into an "object file", and then
the linker links the object files together and analyzes all the "extern"
statements to share stuff between the files.


---

So, basically still, I want to know how the whole process works, in each
stage, with each type of file created, from "int main(void)" to .EXE!!


Thanks for your time!


PS I'm really starting to get fond of C++!

-JKop
 

Karl Heinz Buchegger

JKop said:
And then from there it's made into an object file. (BTW, are they just
simply called object files?). So now we have "boo.o". So now what exactly
is "boo.o"? Is the file platform specific, or compiler specific,

Both.
The file contains the machine code which represents the function.
or is it
totally universal and can it be used with any project in any environment
(eg. MSVisual C++) and with any compiler?

Usually: no
In special cases: yes

But this is dependent on your compiler vendor.
So the file "boo.o" just contains the function "Hello". Now some other
".cpp" file in the same project wants to use "Hello". Let's call this file
"ret.cpp". "Ret.cpp" looks like this:

#include "boo.hpp"

int main(void);

int main(void)
{
Hello('r');
return 0;
}

So "ret.cpp" is turned into "ret.o" (Is the correct word here "build"?)

We say: it is compiled to ret.o
So
at this stage, "ret.o" still knows nothing about where the "Hello"
function's source code actually is.
right.

So now Mr Linker comes in, takes
"ret.o" and "boo.o" and introduces them. From there, they're compiled into a
".exe".

:) Yes.
So again, back to libraries, can I just rename "boo.o" to "boo.lib"

Usually: no.

You use another program, a librarian, to collect multiple *.o files
and create a library from them.
and
start e-mailing it to all my buddies for them to use in their C++ projects, eg.
making a game for the PS2, the interface for a microwave, WAP on a mobile
phone?

You can do the very same with the original boo.o file. Your buddies
can link it into their programs and use the functionality you provide.
Libraries come into play if you don't have just one single .o file.
E.g. I have a solid modeller. This solid modeller resides in ~35 *.cpp
files. I can compile them to *.o files and when I need that modeller
in one project I can include all of them into the project. I think you
recognize the problem: in every single project I have to include those
35 files. I must not forget one of them or else the linker will not be able
to link in the functions in it. On the other hand, if I add a 36th file
to that modeller, I have to update all of the using projects with that
additional file. So what can I do? I can create a library. This library
contains all 35 files, and I just have to specify the library to the linker
to enable it to link against my modeller. No longer 35 files, instead
just one file - the library. And just in case the modeller needs
a 36th file, I simply compile it and put it into the library as well. The
projects using that library don't need to be updated, since they link
against the library, and the library contains everything needed for
linking in the modeller.
---

So at heart, is a C++ program just a load of ".cpp" files?

Then these ".cpp" files get a load of "extern" statements via header files.
Then each ".cpp" file is built separately into an "object file", and then
the linker links the object files together and analyzes all the "extern"
statements to share stuff between the files.

Basically: correct.

Maybe the following is of some use to you. It revolves around the
question "Why header files?", but is also a short introduction to
how the process of compilation and linking works:

*******************************************************************************************

First of all let me introduce a few terms and clarify
their meaning:

source code file   The files which contain C or C++
                   code in the form of functions and/or
                   class definitions

header file        Another form of source file. Header files
                   usually are used to separate the 'interface'
                   description from the actual implementation
                   which resides in the source code files.

object code file   The result of feeding a source code file through
                   the compiler. Object code files already contain
                   machine code, the one and only language your computer
                   understands. Nevertheless object code at this stage
                   is not executable. One object code file is the direct
                   translation of one source code file and thus usually
                   has unresolved external references, eg. to the actual
                   implementation of functions which are defined in other
                   source code files.

library file       A collection of object code files. It happens frequently that
                   a set of object code files is always used together. Instead
                   of always listing all those object code files during the
                   link process it is often possible to build a library from
                   them and use the library instead. But there is no magic
                   with a library. A library can be seen as some repository
                   where one can deposit object code files such that the library
                   forms a collection of them.

compiling          The process of transforming the source code files into
                   object code files. C and C++ define the concept of 'translation
                   unit'. Each translation unit (normally: one single source code
                   file) is translated independently of all other translation units.

linking            The process of combining multiple object code files and libraries
                   into an executable. During the linking process all external references
                   of one object code file are examined and the linker tries to find
                   modules which satisfy those external references.


In practice the whole process works as follows:
Say you have 2 source files (with errors, we will return to them later)

main.c
******

int main()
{
foo();
}

test.c
******

void foo()
{
printf( "test\n" );
}

and you want to create an executable. The steps are
as shown in the diagram:


      main.c                       test.c
 +----------------+       +-----------------------+
 |                |       |                       |
 | int main()     |       | void foo()            |
 | {              |       | {                     |
 |   foo();       |       |   printf( "test\n" ); |
 | }              |       | }                     |
 +----------------+       +-----------------------+
         |                            |
         |                            |
         v                            v
    ************                 ************
    * Compiler *                 * Compiler *
    ************                 ************
         |                            |
         |                            |
         |                            |
main.obj v                   test.obj v
  +--------------+             +--------------+
  | machine code |             | machine code |
  +--------------+             +--------------+
         |                            |
         |                            |
         +-----------+  +-------------+
                     |  |
                     v  v
                *************       Standard Library
                *  Linker   *<---+--------------------+
                *************    | eg. implementation |
                      |          | of printf or the   |
                      |          | math functions     |
                      |          |                    |
                      |          +--------------------+
             main.exe v
         +-------------------------+
         | Executable which can    |
         | be run on a particular  |
         | operating system        |
         +-------------------------+


So the steps are: compile each translation unit (each source file) independently
and then link the resulting object code files to form the executable. To do that,
missing functions (like printf or sqrt) are added by linking in a prebuilt library
which contains the object modules for them.

The important part is:
Each translation unit is compiled independently! So when the compiler compiles
test.c it has no knowledge about what happened in main.c and vice versa. When the
compiler tries to compile main.c it eventually reaches the line
foo();
where main.c tries to call function foo(). But the compiler has never heard about
a function foo! Even if you have compiled test.c prior to it, when main.c is
compiled this knowledge is already lost. Thus you have to inform the compiler
that foo() is not a typing error and that there indeed is somewhere a function
called foo. You do this with a function prototype:


      main.c
 +----------------+
 | void foo();    |
 |                |
 | int main()     |
 | {              |
 |   foo();       |
 | }              |
 +----------------+
         |
         |
         v
    ************
    * Compiler *
    ************
         |

Now the compiler knows about this function and can do its job. In very much the same way
the compiler has never heard about a function called printf(). printf is not part of
the 'core' language. In a conforming C implementation it has to exist somewhere, but
printf() is not on the same level as 'int' is. The compiler knows about 'int' and
what it means, but printf is just a function call and the compiler has to know its
parameters and return type in order to compile a call to it. Thus you have to inform
the compiler of its existence. You could do this in very much the same way as you
did it in main.c, by writing a prototype. But since this is needed so often and
there are so many other functions available, this very quickly gets boring and error prone.
Thus somebody else has already provided all those prototypes in a separate file, called
a header file, and instead of writing the prototypes yourself, you simply 'pull in'
this header file and have them all available:


      test.c
 +-----------------------+
 | #include <stdio.h>    |<-+
 |                       |  |
 | void foo()            |  |
 | {                     |  |
 |   printf( "test\n" ); |  |
 | }                     |  |
 +-----------------------+  |
         |                  |
         |                  |
         v                  |
    ************    stdio.h v
    * Compiler *   +-------------------------------------+
    ************   | ...                                 |
         |         | int printf( const char* fmt, ... ); |
                   | ...                                 |
                   +-------------------------------------+

And now the compiler has everything it needs to know to compile test.c.
Since main.c and test.c can now be compiled successfully, they can be linked
to the final executable which can be run. During the process of linking, the linker
figures out that there is a call to foo() in main.obj. Thus the linker tries to find
a function called foo. It finds this function by searching through the object
module test.obj. The linker thus inserts the correct memory address for foo
into main.obj and also includes foo from test.obj into the final executable. But
in doing so, the linker also figures out that in function foo() there is a call
to printf. The linker thus searches for a function printf. It finds it in the
standard library, which is always searched when linking a C program. There the
linker finds a function printf, and this function thus is included into the
final executable too. printf() by itself may use other functions to do its
work, but the linker will find all of them in the standard library and include
them into the final executable.
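The resolution process just described can be modelled in a few lines. This is a toy sketch, not how any real linker is implemented: `ObjectFile`, `LinkResult` and `toy_link` are invented names, real object files also hold machine code and relocation records, and the "printf needs write" dependency in the usage example is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of an object file: what it defines and what it needs.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;  // symbols implemented in this file
    std::set<std::string> needs;    // external references, e.g. "printf"
};

struct LinkResult {
    std::vector<std::string> included;  // files that go into the executable
    std::set<std::string> unresolved;   // references nothing could satisfy
    std::set<std::string> duplicates;   // symbols defined more than once
};

// Every object file you list is always linked in. A library member is pulled
// in only if it defines a symbol that is still needed, and its own references
// are then resolved the same way (foo -> printf -> ...).
LinkResult toy_link(const std::vector<ObjectFile>& objects,
                    const std::vector<ObjectFile>& library) {
    LinkResult result;
    std::set<std::string> defined;
    std::set<std::string> needed;
    auto add = [&](const ObjectFile& obj) {
        result.included.push_back(obj.name);
        for (const auto& sym : obj.defines)
            if (!defined.insert(sym).second) result.duplicates.insert(sym);
        needed.insert(obj.needs.begin(), obj.needs.end());
    };
    for (const auto& obj : objects) add(obj);

    std::vector<bool> taken(library.size(), false);
    bool progress = true;
    while (progress) {  // keep scanning until no new member gets pulled in
        progress = false;
        for (std::size_t i = 0; i < library.size(); ++i) {
            if (taken[i]) continue;
            for (const auto& sym : library[i].defines) {
                if (needed.count(sym) && !defined.count(sym)) {
                    taken[i] = true;
                    add(library[i]);
                    progress = true;
                    break;
                }
            }
        }
    }
    for (const auto& sym : needed)
        if (!defined.count(sym)) result.unresolved.insert(sym);
    return result;
}
```

Linking main.obj (needs foo) and test.obj (defines foo, needs printf) against a "standard library" of printf.obj, write.obj and sqrt.obj pulls in printf.obj and write.obj but leaves sqrt.obj out, since nothing asked for sqrt. Listing two files that both define the same symbol shows up in `duplicates`.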

There is one thing left to talk about. While main.c is correct from a technical
point of view it is still unsatisfying. Imagine that our function foo() has
a much more complicated argument list. Also imagine that your program does not
consist of just those 2 translation units but instead has hundreds of them, and
that foo() needs to be called in 87 of them. Thus you would have to write a prototype
in every single one of them. I think I don't have to tell you what that means: all those
prototypes need to be correct and, just in case function foo() changes (things like
that happen), all those 87 prototypes need to be updated. So how can you do that?
You already know the solution; you have used it already. You do pretty much
the same as you did in the case of stdio.h. You write a header file and
include it instead of the prototype:

      main.c
 +-------------------+              test.h
 | #include "test.h" |<---------+-------------+
 |                   |          | void foo(); |
 | int main()        |          |             |
 | {                 |          +-------------+
 |   foo();          |
 | }                 |
 +-------------------+
          |
          |
          v
     ************
     * Compiler *
     ************
          |

Now you can include that header file in all the 87 translation units which
need to know about foo(). And if the prototype for foo() needs some update
you do it in one central place: by editing file test.h. All 87 translation
units will pull in this updated prototype when they are recompiled.
 

Karl Heinz Buchegger

Karl said:

Sorry. No. They are *linked* into an executable.
compile: transform the human readable source code into machine code
link: put individual pieces together to form the whole.
 

JKop

Thanks a trillion!! You really do have the gift of teaching, anticipating
the student's confusion, curiosity and questions! Plus you obviously enjoy
it too!


Having digested all that...


Say for instance my "boo.hpp" not only contained a prototype of "Hello", but
also the definition.

We have 5 "*.cpp" files that all include "boo.hpp". There's no problem with
compiling the "*.cpp" files into object files, but then when Mr Linker takes
over, he shouts "Function defined 5 times! ERROR!!". I can fully understand
that.

But now we stick "inline" before the definition in "boo.hpp". We re-compile
the entire executable, and voilà, it works. This I DON'T understand! I know
that the function is expanded wherever it's called, but there are still 5
definitions of it! How come all of a sudden they no longer conflict?! Can
you explain please?

--

You say that object files aren't always "universal". Take this scenario:

John is writing a game for the XBox. Let's say his environment is called
"Purlple Five C++". He has written a certain class called "Fillex". He has
the declarations in a ".hpp". He compiles his "Fillex.cpp" into
"Fillex.obj".

John's friend Mary is writing an interface for a top-of-the-range fridge.
Her environment is "Yellow Krk C++". She wants to use the "Fillex" class.
She gets the header file firstly off John. John then sends her "Fillex.obj".

Are you saying that John's "Fillex.obj" won't necessarily work for Mary? And
thus that object files aren't "standardized"? Have I got that right?

And thus are library files not standardized either?


Is there anything in the C++ Standard about Object Files or Library Files?


-------


Right now I'm using Dev-C++ and I'm writing Win32 progs. I look in my
project and I see:

main.cpp
currency.cpp
currency.hpp


Let's say somewhere in "main.cpp" I have:

cout << "Hello!!";


Therefore I'm gonna need:

#include <iostream>


But we still don't have a definition of the object called "cout". We have a
declaration of its class alright, including its member variables and
member functions. But we don't have an actual definition of cout:

ostream cout;

Nor do we have a definition of the member functions of the "ostream" class.

When I compile my project I have "main.o" and "currency.o". I see absolutely
nothing about any other object files or library files that the linker puts
in there with it. I had to search the web looking for an answer to this
until I finally found out that "in the background" the linker has linked
with all these libraries that I knew nothing about. Fair enough, but I want
more control than that, I want to specify which object files / library files
have earned that status of being in my prog.

So then someone might just say "Okay, just don't call any of its functions
and ignore it". But then think of this: let's say one of these object files
has a line in it:

int hkj[587363];

(This is a global variable)

I don't want my prog taking that big dirty chunk of memory!! ESPECIALLY
WITHOUT ME KNOWING ABOUT IT!! So now I want to find out ALL the global
variables that are going to be in my ".exe", but how the hell can I?!! All
the object files / libraries are in machine code, right?!

And then there's another little pickle I have with it. Let's say I have a
function in one of my ".cpp" files called:

unsigned long int square(unsigned short int kk);

Now I compile my ".cpp" and no problem. But then Mr Linker goes and links
with all these libraries I didn't ask for and voilà, one of them has a
"square" function. If you had loads and loads of libraries then I assume
that eventually you're gonna run into this problem of a "multiple
definition". Right?

Are my arguments justified?

---

Another scenario...


We have 2 files

a.cpp
b.cpp


Contents of b.cpp :
~


int Hello(void)
{

return 7;
}


~


Contents of a.cpp :

~


extern int Hello(void);

int main(void);


int main(void)

{
return Hello();
}



~


Now I can definitely understand why, in "a.cpp", there is the following
line:

extern int Hello(void);


...but why is the -


int main(void);


- function prototype necessary?

For example, why is the following necessary:



Here's a source file called "jack.cpp" and it's the only one in the project.
(BTW, is "project" the right lingo?)

~


int rest(void);

int rest(void)
{
return 5;
}

int main(void)
{
rest();
return 4;
}


~


My question is: Why is that declaration of -

int rest(void);


- necessary, especially since the definition of "rest" appears before
"main".


The only reason I can think of is that maybe throughout writing your prog
you may switch around the order of the functions within the ".cpp" file. And
instead of going and sticking function prototypes everywhere, C++ just gives
you a kick up the arse and says "Just put function prototypes in there
regardless of the order of the functions. Just do what I say, believe me I'm
saving you some hassle". Am I right?


----


And again, Karl, thank you very very much for your very helpful help.


-JKop
 

Peter van Merkerk

JKop said:
Say for instance my "boo.hpp" not only contained a prototype of "Hello", but
also the definition.

We have 5 "*.cpp" files that all include "boo.hpp". There's no problem with
compiling the "*.cpp" files into object files, but then when Mr Linker takes
over, he shouts "Function defined 5 times! ERROR!!". I can fully understand
that.

But now we stick "inline" before the definition in "boo.hpp". We re-compile
the entire executable, and voilà, it works. This I DON'T understand! I know
that the function is expanded wherever it's called, but there are still 5
definitions of it! How come all of a sudden they no longer conflict?! Can
you explain please?

Part of the explanation is that when a function call really is inlined,
the linker doesn't have to do anything for it; in fact it does not even
need to know about the function, because everything that needs to be
done for it is handled by the compiler.

When you put inline before the function definition, you tell the
compiler, if possible, not to call the function but to insert the code of
the function instead. This is somewhat similar to what happens when you
use macros instead of normal functions, except that macros are handled
by the preprocessor and inline functions by the compiler. (Tip: if you
have the choice, prefer inline functions.) Inlining is done for speed; it
avoids the function call overhead, which may be significant if the
function body is small, and allows for better optimization because the
compiler knows exactly what happens before the function is called, what
happens inside the function and what happens after the function. If
the function is compiled separately the compiler has to be conservative
because it does not know everything that might help it to produce
optimal code. The downside of inlining is that it may lead to larger
executables, because the code of the function is duplicated instead of
called. On processors with cache memory it may even result in a
performance penalty. When code is inlined the linker does not have to do
anything for that function.

However the inline keyword is just a hint. The compiler may decide for
whatever reason not to honor your request. In that case the compiler
emits an ordinary out-of-line copy of the function in every translation
unit (.cpp file) that needs one. The C++ standard explicitly allows an
inline function to be defined in every translation unit that uses it,
as long as all those definitions are identical (which they are when they
all come from the same header). Inline functions still have external
linkage unless you also declare them static, but the compiler typically
marks these copies so that the linker keeps just one of them and silently
discards the rest. Therefore the linker won't complain about having the
same function multiple times.
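A minimal sketch of JKop's scenario: the full definition of `Hello` sits in the header, marked inline, so every one of the five .cpp files may include it. Here the header text and one caller are pasted into a single file so the sketch compiles on its own; `call_hello` is a hypothetical caller.

```cpp
#include <cassert>

// ---- boo.hpp (sketch) ----
// "inline" allows this full definition to appear in every .cpp file that
// includes boo.hpp. All the copies are identical because they come from the
// same header, so the linker does not report a multiple definition.
inline int Hello(char k) {
    (void)k;   // parameter unused in this placeholder body
    return 5;  // stands in for "blah blah blah"
}

// ---- one of the five .cpp files that #include "boo.hpp" ----
int call_hello() { return Hello('r'); }
```

Removing the inline keyword and including the header from two or more .cpp files brings back the "multiple definition" link error JKop describes.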

--

You say that object files aren't always "universal". Take this scenario:

John is writing a game for the XBox. Let's say his environment is called
"Purlple Five C++". He has written a certain class called "Fillex". He has
the declarations in a ".hpp". He compiles his "Fillex.cpp" into
"Fillex.obj".

John's friend Mary is writing an interface for a top-of-the-range fridge.
Her environment is "Yellow Krk C++". She wants to use the "Fillex" class.
She gets the header file firstly off John. John then sends her "Fillex.obj".

Are you saying that John's "Fillex.obj" won't necessarily work for Mary? And
thus that object files aren't "standardized"? Have I got that right?

You are correct, object files are not really standardized. With object
files generated from C code there is a chance you might be able to mix
them with object files generated with another compiler.

However the chance of being able to mix two object files generated from
C++ code compiled with different C++ compilers is extremely low. The
reason is not so much the object file itself, but the name mangling of the
C++ compiler. With C functions, the name is put more or less as-is into
the object file. However because of function overloading this
scheme could not be used for C++. In C++ there can be many functions
with the same name, but with different parameter types. C++ compilers
add extra stuff to the function name to uniquely identify overloaded
functions in the object file.

For example the function names:
void fun(int i, char c);
void fun(const char* s);

May be transformed to:
fun@void@int@char@
fun@void@const_char_ptr@

The problem is that the name mangling algorithm isn't standardized.
Every C++ compiler does it in its own way. So even if the structure of
object files is compatible, it is impossible to link two object files
generated by two different C++ compilers because the names in the object
files don't match.
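For a look at real mangled names rather than the illustrative ones above: with GCC or Clang on Itanium C++ ABI platforms such as Linux, `void fun(int, char)` becomes the symbol `_Z3funic` and `void fun(const char*)` becomes `_Z3funPKc`; MSVC uses a different scheme. The sketch below uses the GCC/Clang-specific `abi::__cxa_demangle` to turn such names back into C++, so it will not compile with MSVC. It also shows `extern "C"`, the usual way to give a function an unmangled, C-compatible name. `demangle` and `add_c` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang only, Itanium C++ ABI)
#include <string>

// Two overloads: in the object file they need different names, so the
// compiler mangles the parameter types into each symbol.
void fun(int, char) {}
void fun(const char*) {}

// extern "C" requests C linkage: overloading is not allowed, and the symbol
// is emitted without C++ name mangling (some platforms still add a leading
// underscore), so C code or another compiler can refer to it.
extern "C" int add_c(int a, int b) { return a + b; }

// Turns a mangled name back into readable C++. Returns the input unchanged
// if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return mangled;
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

On such a system, `demangle("_Z3funic")` yields "fun(int, char)". The `nm` tool lists the mangled symbols actually present in an object file.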
And thus are library files not standardized either?

You are learning fast: that is correct!
Is there anything in the C++ Standard about Object Files or Library Files?

No this is beyond the scope of the C++ standard (so technically we are
off-topic here ;-)).

-------


Right now I'm using Dev-C++ and I'm writing Win32 progs. I look in my
project and I see:

main.cpp
currency.cpp
currency.hpp


Let's say somewhere in "main.cpp" I have:

cout << "Hello!!";


Therefore I'm gonna need:

#include <iostream>


But we still don't have a definition of the object called "cout". We have a
declaration of its class alright, including its member variables and
member functions. But we don't have an actual definition of cout:

ostream cout;

Nor do we have a definition of the member functions of the "ostream" class.

When I compile my project I have "main.o" and "currency.o". I see absolutely
nothing about any other object files or library files that the linker puts
in there with it. I had to search the web looking for an answer to this
until I finally found out that "in the background" the linker has linked
with all these libraries that I knew nothing about.

Usually the runtime library is linked in, unless you explicitly tell the
linker not to. Note that before the first line of your main() function is
executed, a lot of code in the runtime library has already executed, for
example the code that fills in the argc and argv arguments.
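One way to see for yourself that code runs before main() is a global object with a constructor. The argc/argv setup Peter mentions lives in the runtime's own start-up code, which you normally never see as C++ source, but constructors of global objects are also run during program start-up. `StartupProbe` and `probe_ran` are hypothetical names. Strictly speaking, the C++ standard allows this initialization to be deferred until something else in the same file is used, but mainstream implementations run it before main().

```cpp
#include <cassert>

namespace {
bool g_constructed = false;  // constant-initialized to false before any code runs

// Hypothetical probe: a global object whose constructor records that it ran.
struct StartupProbe {
    StartupProbe() { g_constructed = true; }
};

StartupProbe g_probe;  // dynamically initialized by the program's start-up code
}  // namespace

// True once g_probe's constructor has run; with common toolchains that has
// already happened before the first statement of main().
bool probe_ran() { return g_constructed; }
```

Calling `probe_ran()` as the very first thing in main() returns true.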
Fair enough, but I want
more control than that, I want to specify which object files / library files
have earned that status of being in my prog.

Note that not necessarily everything that is in the library will also
go into your executable. Most linkers are selective at object file level;
object files in a library that aren't used won't be put into the final
executable. Some compiler+linker combinations can be even more selective
than that.
So then someone might just say "Okay, just don't call any of its functions
and ignore it". But then think of this: let's say one of these object files
has a line in it:

int hkj[587363];

(This is a global variable)

I don't want my prog taking that big dirty chunk of memory!! ESPECIALLY
WITHOUT ME KNOWING ABOUT IT!! So now I want to find out ALL the global
variables that are going to be in my ".exe", but how the hell can I?!! All
the object files / libraries are in machine code, right?!

Usually you can tell the linker to generate a map file so you can see
what went into the executable, and how many bytes those items added to
the executable. Controlling what goes into the executable and what does not
is a lot harder, and how to do it depends on the tools you are using
(which is beyond the scope of this newsgroup). At some point you will
have to ask yourself if it is really worth the effort.
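To put a number on the array from the question: its size is simply element count times element size. The sketch assumes nothing beyond standard C++; a 4-byte int is common but implementation-defined, which is why the byte count is computed with sizeof rather than hard-coded. `kElements` and `kBytes` are invented names.

```cpp
#include <cassert>
#include <cstddef>

// The global array from the question. It has static storage duration, so if
// it ends up in the executable it exists for the whole run of the program.
int hkj[587363];

// Total size = element count * element size. With a 4-byte int that is
// 587363 * 4 = 2349452 bytes, about 2.2 MiB.
constexpr std::size_t kElements = sizeof(hkj) / sizeof(hkj[0]);
constexpr std::size_t kBytes = sizeof(hkj);
static_assert(kElements == 587363, "element count");
static_assert(kBytes == kElements * sizeof(int), "bytes = elements * sizeof(int)");
```

A linker map file would list a global like `hkj` together with roughly this size.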
And then there's another little pickle I have with it. Let's say I have a
function in one of my ".cpp" files called:

unsigned long int square(unsigned short int kk);

Now I compile my ".cpp" and no problem. But then Mr Linker goes and links
with all these libraries I didn't ask for and voila, one of them has a
"square" function. If you had loads and loads of libraries then I assume
that eventually you're gonna run into this problem of a "multiple
definition". Right?

Yes, but to avoid name clashes C++ has namespaces.
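Here is a sketch of how namespaces help. Below, the poster's square and a function from a hypothetical signal-processing library share both a name and a parameter list. In the same namespace they could not coexist, because C++ cannot overload on return type alone. In separate namespaces they are distinct functions with distinct linker symbols. The namespace names are invented for the example.

```cpp
namespace my_math {
// The poster's function, now referred to as my_math::square.
unsigned long square(unsigned short kk) {
    return static_cast<unsigned long>(kk) * kk;
}
}  // namespace my_math

namespace signal_lib {  // hypothetical third-party library
// Same name and parameters, different meaning: a square wave of period 100.
int square(unsigned short t) {
    return (t % 100) < 50 ? 1 : -1;
}
}  // namespace signal_lib
```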
Are my arguments justified?

You make good points. Just don't be overly worried that (sometimes a lot)
more goes into an executable than needs to. Often
this is not really a big issue.
---

Another scenario...


We have 2 files

a.cpp
b.cpp


Contents of b.cpp :
~


int Hello(void)
{
    return 7;
}


~


Contents of a.cpp :

~


extern int Hello(void);

int main(void);


int main(void)
{
    return Hello();
}



~
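The usual place for such declarations is a header. Below is a sketch of a hypothetical b.hpp together with the matching definitions from b.cpp. They are shown in one listing so that it compiles on its own. The sketch also shows why extern is optional on function declarations, and so why a prototype such as int main(void); adds nothing: function declarations have external linkage by default. A global variable, by contrast, needs extern to be a mere declaration instead of a definition. The counter hello_calls is an addition for illustration.

```cpp
// ---- hypothetical header "b.hpp", included by a.cpp and b.cpp ----
#ifndef B_HPP
#define B_HPP
extern int hello_calls;  // declaration only: defined in exactly one .cpp
int Hello(void);         // function declarations are extern by default
#endif

// ---- b.cpp: exactly one definition of each ----
int hello_calls = 0;

int Hello(void) {
    ++hello_calls;
    return 7;
}
```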


Now I can definitely understand why, in "a.cpp", there is the following
line:

extern int Hello(void);


...but why is the -


int main(void);


- function prototype necessary?

It isn't necessary.
For example, why is the following necessary:

Here's a source file called "jack.cpp" and it's the only one in the project.
(BTW, is "project" the right lingo?)

~


int rest(void);

int rest(void)
{
    return 5;
}

int main(void)
{
    rest();
    return 4;
}


~


My question is: Why is that declaration of -

int rest(void);


- necessary, especially since the definition of "rest" appears before
"main".

The declaration is not necessary in this case. Since the definition of
rest() appears before main() you can omit the declaration of rest(). Just
try it.

The declaration of rest() is only needed if the definition of main()
comes before the definition of rest(), or if rest() is defined in a
different translation unit (.cpp file). Without the declaration of
rest() the compiler would encounter "rest" for the first time when
compiling the main function, and at that point the compiler would not
know what "rest" is and would flag an error.
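Here is a sketch of the case where the declaration is required. The caller comes first, so without the forward declaration the compiler would not yet know rest() when it reaches the call. The function jack_main is a hypothetical stand-in for main(), used so that the listing can be compiled as part of a larger test.

```cpp
// Forward declaration: tells the compiler that rest() exists and what its
// signature is, so it may be called before its definition appears.
int rest(void);

int jack_main(void) {  // stands in for main(); calls rest() "too early"
    return rest() + 1;
}

int rest(void) {       // the definition may come later in the file,
    return 5;          // or live in another .cpp file entirely
}
```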

I would like to compliment you on your questions. With this attitude
you will no doubt become an excellent C++ programmer. Over the last few years
I have seen too many people who just know how to push buttons in
the IDE without knowing, or even being interested in, how things work. It is
good to see that there are still people like you who want to understand
what really happens under the hood. In my experience those are by far
the best people to work with.

Regards,
 

E. Robert Tisdale

JKop said:
You have a C++ project. You have source files in it.
When you go to compile it, the first thing that happens is
the C preprocessor sticks the header files into each source file.

A typical C++ compiler translates your C++ program in *phases*:

1. the C preprocessor,
2. the C++ compiler proper emits assembler,
3. the assembler emits machine [object] code,
4. the link editor finds the object code and library archives,
resolves all of the external links and emits an executable program.

These phases may themselves be executed in two or more phases.
So now you have your ".cpp" files all ready,
without any "#include" or "#define" in them.

No. The C preprocessor *merges* the header files with the source file,
*expands* the macros and emits a single *translation unit* per source file.
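Here is a small illustration of what "expands the macros" means. The macros below are hypothetical. The comments show the text that the compiler proper receives after preprocessing. With GCC or Clang, the -E switch prints a translation unit like this. With MSVC, /E prints it and /P writes it to a file.

```cpp
#include <string>  // the entire contents of this header are pasted in here

// Hypothetical macros, as they might appear in a .cpp or .hpp file.
#define SQUARE(x) ((x) * (x))
#define GREETING "hello"

// The preprocessor replaces every macro use textually before compilation.
int area = SQUARE(7);      // becomes: int area = ((7) * (7));
int nine = SQUARE(1 + 2);  // becomes: int nine = ((1 + 2) * (1 + 2));

std::string greet() {
    return GREETING;       // becomes: return "hello";
}
```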
Let's assume that there's 2 source files in this project,
"a.cpp" and "b.cpp".

The 2 source files are compiled separately, referring to functions
and global variables in each other via "extern" statements, which are
usually within header files. For instance "a.cpp" can call a function
in "b.cpp" simply by having a function prototype.

So both your ".cpp" files are ____________ into object files.
(what word should I use in the blank? "compiled"?)

Compiled is fine.
So now the [link editor] links the object files together
and voila, now "a.cpp" and "b.cpp" share their functions,
class definitions, global variables.


So what we're looking at is something like this:

.cpp & .hpp -> [translation unit] -> .s -> .o -> .exe

Each [translation unit] has its headers put into it

then it's turned into an assembler file
then it's turned into an object file
then the linker gathers up all the object files,

resolves all of the links
and [emits] an executable file.

----

QUESTIONS:

What the is a library file? I've heard that once your source
files have been turned into object files, the linker then adds
library files, and then turns it into an executable.

So what is a library file?

The correct term is *library archive*.
It's simply a special file containing one or more object files
and [usually] some sort of table of contents
to help the link editor find the correct object file efficiently.
Could someone please give me a full complete list of all the
different file types associated with a C++ project, e.g.:

.cpp Sourcecode File
.hpp Header File
.o Object File

file.ii
C++ source code which should not be preprocessed.
file.cc
file.cp
file.cxx
file.cpp
file.c++
file.C
C++ source code which must be preprocessed.
Note that in .cxx, the last two letters
must both be literally x.
Likewise, .C refers to a literal capital C.
file.s
Assembler code.
file.S
Assembler code which must be preprocessed.

It depends upon your implementation.
I learned C++ from a book and I pretty much have it by the reins now,
but except for one thing, the book had nothing in it about the actual
form of a C++ project, i.e. how there's source files and header files
and how the linker links object files.

That's the Integrated Development Environment (IDE)
that came with your compiler.
It is supposed to hide the details of how all of these things get done.
 

Karl Heinz Buchegger

Peter has already answered your questions in another thread and
there is not much to add to it. However:
--

You say that object files aren't "universal" always. Take this scenario:

John is writing a game for the XBox. Let's say his environment is called
"Purple Five C++". He has written a certain class called "Fillex". He has
the declarations in a ".hpp". He compiles his "Fillex.cpp" into
"Fillex.obj".

John's friend Mary is writing an interface for a top of the range fridge.
Her environment is "Yellow Krk C++". She wants to use the "Fillex" class.
She gets the header file firstly off John. John then sends her "Fillex.obj".

Are you saying that John's "Fillex.obj" won't necessarily work for Mary? And
thus that object files aren't "standardized"? Have I got that right?

Right.
Technically there are 2 main problems.
Different computers use different processors, or as we say: not all the
world is a PC (in former times we said: not all the world is a VAX).
Now what does that mean? Your processor executes programs. Well, those
programs don't look like anything you or I write in C or C++. At
the processor level, programs are just a stream of numbers, where each number
has a specific meaning.
For example, a program for my Z80 computer (the Z80 is a CPU built by Zilog in the
late 70's and is still popular for small single-board computers) may
look like this

F3 3E 08 0E FF 06 FF 05 C2 07 00 0D C2 05 00 3D C2 03 00

That's a program (actually the start of a program). Every number has a specific
meaning, and that meaning is different for each type of CPU. E.g. for a MOS Technology
6502 CPU those numbers mean something completely different.
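To make the point that the bytes only mean something once you fix the CPU, here is a toy decoder that reads the byte stream above as Z80 code. It is a sketch that handles only the opcodes occurring in this example. The function name and output format are my own, and Zilog's mnemonic for the conditional jump with opcode C2 is JP NZ. A decoder for a different CPU would turn the same bytes into entirely different instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Decodes only the few Z80 opcodes used in the example byte stream;
// a real disassembler knows hundreds more.
std::vector<std::string> decode_z80(const std::vector<std::uint8_t>& code) {
    std::vector<std::string> out;
    char buf[32];
    // Reads an operand byte, or 0 if the stream is cut off.
    auto byte_at = [&code](std::size_t j) -> unsigned {
        return j < code.size() ? code[j] : 0u;
    };
    std::size_t i = 0;
    while (i < code.size()) {
        const unsigned op = code[i];
        switch (op) {
        case 0xF3: out.push_back("DI");    i += 1; break;
        case 0x05: out.push_back("DEC B"); i += 1; break;
        case 0x0D: out.push_back("DEC C"); i += 1; break;
        case 0x3D: out.push_back("DEC A"); i += 1; break;
        case 0x3E: case 0x0E: case 0x06: {  // LD r,n: one operand byte
            const char* reg = op == 0x3E ? "A" : (op == 0x0E ? "C" : "B");
            std::snprintf(buf, sizeof buf, "LD %s,%02XH", reg, byte_at(i + 1));
            out.push_back(buf);
            i += 2;
            break;
        }
        case 0xC2: {  // JP NZ,nn: 16-bit address, low byte first
            const unsigned addr = byte_at(i + 1) | (byte_at(i + 2) << 8);
            std::snprintf(buf, sizeof buf, "JP NZ,%04XH", addr);
            out.push_back(buf);
            i += 3;
            break;
        }
        default:      // opcode not covered by this sketch
            std::snprintf(buf, sizeof buf, "DB %02XH", op);
            out.push_back(buf);
            i += 1;
            break;
        }
    }
    return out;
}
```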

In case you wonder, here is the assembler source for the above sequence

0000  F3            DI
0001  3E 08         LD  A,08H
0003  0E FF     V1: LD  C,FFH
0005  06 FF     V2: LD  B,FFH
0007  05        V3: DEC B
0008  C2 07 00      JP  NZ,V3
000B  0D            DEC C
000C  C2 05 00      JP  NZ,V2
000F  3D            DEC A
0010  C2 03 00      JP  NZ,V1

The whole thing is a waiting loop: it counts register B down from 0xFF to 0
inside a loop that counts register C down from 0xFF to 0, and repeats all of this
8 times using register A. (DI = disable interrupts, LD = load, DEC = decrement,
JP = jump, NZ = not zero.) A Z80 CPU has some registers (a register is a memory
cell in the CPU itself which can be used to store values or to do arithmetic),
but other CPUs have a different number of registers. E.g. in a Z80 one can do
arithmetic mostly in the A register only; in a 68000 (if I recall correctly) one
can use any register for arithmetic. And then there are CPUs which don't have
registers at all, or have special instructions for doing certain things, or have
a minimal subset of instructions (that was, e.g., one of the topics of the 'wars'
a number of years ago: which is better, CISC or RISC? CISC stands for complex
instruction set computer, RISC means reduced instruction set computer).
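To make the counting concrete, here is a C++ sketch that mirrors the waiting loop instruction by instruction, using 8-bit unsigned variables as the registers. It counts how often DEC B executes. Each inner counter starts at 0xFF, so one pass of the A loop runs DEC B 255 * 255 = 65,025 times, slightly fewer than 0xFFFF. The 8 passes give 520,200 in total. The function name is my own.

```cpp
#include <cstdint>

// Mirrors the Z80 waiting loop with 8-bit "registers" and returns how many
// times the innermost DEC B instruction executes.
std::uint32_t z80_delay_loop_iterations() {
    std::uint32_t dec_b_count = 0;
    std::uint8_t a = 0x08;              // LD A,08H
    do {                                // V1:
        std::uint8_t c = 0xFF;          // LD C,FFH
        do {                            // V2:
            std::uint8_t b = 0xFF;      // LD B,FFH
            do {                        // V3:
                --b;                    // DEC B
                ++dec_b_count;
            } while (b != 0);           // JP NZ,V3
            --c;                        // DEC C
        } while (c != 0);               // JP NZ,V2
        --a;                            // DEC A
    } while (a != 0);                   // JP NZ,V1
    return dec_b_count;
}
```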

So much for different CPUs. So if Mary's fridge runs on a different CPU than
John's XBox, and chances are high that it does, there is simply no way that the XBox
program will do anything meaningful on the fridge CPU.
Imagine a world where all people use the same words. That means: our dictionaries
contain the same words. But now imagine an island whose people of course use the very
same words, but each word may mean something different. So when I say
"This is a nice car", for the people on this island it may mean (retranslated):
"I hereby declare holy war". So when I visit that island and think I am paying
my host a compliment, he immediately kills me. Even if we use the same words
we are not speaking the same language :)

Even if both computers run the same CPU, there are important differences. Computers
are usually not standalone hardware; they run what is called 'the operating system'.
The OS gives every program a set of operations that can be performed, e.g. output
a character to the console, check if a character is waiting from the keyboard,
set a pixel to white, read a sector from the floppy disk, things like that.
Now, the way to request such a service differs from operating system
to operating system. On some systems it means: put some values in some CPU
registers and call the function starting at address xxxxx. On another operating
system it may be: put the same values into the same CPU registers but call
the function starting at address yyyyy. On other operating systems it may
be: put other values in different CPU registers and execute a software interrupt
instruction, or ... You see, there are endless ways to do the same thing. Your
compiler knows about the convention used on that operating system and emits
the right instructions (the right numbers) to do it the way the OS expects
things to happen. If you switch the operating system, then you need a compiler
which knows about the convention used on that OS.
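From the programmer's side this usually looks as follows, shown here as a sketch. The source makes a portable request such as "write this text to standard output", and the compiler and runtime library for each target turn it into whatever that OS expects. Compilers also predefine macros that describe the target. _WIN32, __APPLE__ and __linux__ are widely used conventions, but they are not part of the C++ standard. The function names say and target_os are made up.

```cpp
#include <cstdio>
#include <string>

// Portable request: the runtime library for each OS translates this into
// the system-specific way of writing to the console.
bool say(const char* text) {
    return std::fputs(text, stdout) >= 0;  // fputs returns EOF on failure
}

// Compile-time view of the target, based on compiler-predefined macros.
std::string target_os() {
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "Apple (macOS/iOS)";
#elif defined(__linux__)
    return "Linux";
#else
    return "unknown";
#endif
}
```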
 

JKop

I must say I'm overwhelmed with the help I've gotten! Thank you Peter van
Merkerk, thank you Karl Heinz Buchegger, and thank you MW Ron.


(Please read this using a monospace font)



Having digested that... (Here I go again!)


Have I got the following correct?:



.cpp & .hpp ---> PREPROCESSOR
                      |
                      V
              Translation Unit ---> COMPILER
                                       |
                                       V
                               .obj Object File ---> LINKER <--- Other ".obj"s
                                                        |
                                                        V
                                                     .EXE !!!



(Q: Is there a "translation unit" file generated? If so, what's its
extension? I presume it's universal to all compilers, right?)


... Is that right??



If so, I now understand why there's no "multiple definition" problems from
the linker as regards inline functions defined within a header file, reason
being: The inline function becomes a part of the translation unit generated
from EVERY ".cpp" file that includes the header file, with... wait for it...
internal linkage!(ie. the "static" is implied)!!



(BTW, I'm a great fan of specifics, therefore even with my header files I
write "extern int Hello(char*);" and with my function declarations, I never
write "int Hello()", I always write "int Hello(void)", I also always include
"public:" at the start of my classes, even though it's not required, but I
like it! Anyway, what I'm getting at is: I'm going to start writing "static
inline int Hello(char);"!!!)


Concordantly, Mr Linker won't even hear about these functions! And what he
doesn't know, he can't complain about! I just love when a complex concept
has been broken down to get a true understanding of it!

It's oh so simple.


----


Next thing, I'm going to have a search of the web for "C++ namespaces". If I
don't find anything, yous will hear from me!

----


So just how much of all this is in the C++ Standard?

I presume that the Preprocessor is.
I presume that the Compiler is.

Is the Linker a part of the C++ Standard?


...this may sound naive, but why the hell isn't name mangling standardized?!


---

Now, C++ progs start at the function entitled "main", right? Is this in the
C++ Standard? Can it be changed? And if it can be changed, is this permitted
by the C++ Standard? For instance, take a Win32 program (don't worry, I'm
not going off-topic here!), you generate a new project and you're given:

int WinMain(char blah, float nre, int k)
{


return 0;
}


So I think to myself: "Is my program starting at "WinMain", or is it
starting at "main", and is "main" just hidden away in some object/library
file and looks like this:

int main(void)
{
//blah blah blah
WinMain('j',7.8,4);
}


I, in vain, browse through my project settings seeking enlightenment,
looking for an answer. One thing I tried though, I wrote a "main" function
alongside the "WinMain" function, and to my surprise, _MY_ "main" function
overrode the "WinMain" function, which suggests to me that, Yes, my program
does begin with "main", and Yes, that one of these crappy little object
files that I didn't ask for DOES in fact have a "main" function in it. So
now my next question is, "Why don't these 2 "main" functions generate a
"multiple definition" error with Mr Linker. I'm just guessing here, but does
it have something to do with "namespaces"? As soon as I've posted this I'm
off to learn about namespaces.


Thank you all for your time and help.


PS. C++ is the best programming language, right?!!



-JKop
 

Karl Heinz Buchegger

JKop said:
Having digested that... (Here I go again!)

Have I got the following correct?:

.cpp & .hpp ---> PREPROCESSOR
                      |
                      V
              Translation Unit ---> COMPILER
                                       |
                                       V
                               .obj Object File ---> LINKER <--- Other ".obj"s
                                                        |
                                                        V
                                                     .EXE !!!

Correct.

(Q: Is there a "translation unit" file generated? If so, what's its
extension? I presume it's universal to all compilers, right?)

It depends :)
If a 'translation unit' file really is generated, then it
usually is some temporary file with whatever extension the system
developer chose. But there need not be such a file at all. Some
operating systems provide e.g. piping capabilities. That means the
output of one program is fed as input to a second program immediately,
without storing it on disc. In such a scenario the preprocessor and the
real compiler run in parallel and the output of the preprocessor is 'piped'
into the compiler.

Having said that: most systems have a compiler switch which enables
you to save a copy of that translation unit after the preprocessor
has processed it. You then choose whatever extension pleases you.
If so, I now understand why there's no "multiple definition" problems from
the linker as regards inline functions defined within a header file, reason
being: The inline function becomes a part of the translation unit generated
from EVERY ".cpp" file that includes the header file, with... wait for it...
internal linkage!(ie. the "static" is implied)!!
Yep.


(BTW, I'm a great fan of specifics, therefore even with my header files I
write "extern int Hello(char*);" and with my function declarations, I never
write "int Hello()", I always write "int Hello(void)", I also always include
"public:" at the start of my classes, even though it's not required, but I
like it! Anyway, what I'm getting at is: I'm going to start writing "static
inline int Hello(char);"!!!)

I am not sure if that's a good idea.
As we say: 'When in Rome, do as the Romans do'.

It is not common C++ style to write static before inline. Personally
I think it would cause more confusion in a team than it is worth.
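One technical point about the inline exchange is worth adding. In standard C++, an inline function declared without static has external linkage, not internal linkage. What prevents "multiple definition" errors is the one-definition rule. It allows an inline function to be defined in every translation unit that uses it, as long as all the definitions are identical, and it remains one function with one address throughout the program. Writing static inline does give internal linkage, but then each translation unit gets its own separate copy. The sketch below marks where each part would live. The file and function names are hypothetical.

```cpp
// ---- what might be in a hypothetical header "hello.hpp" ----

// Plain inline: external linkage. Every .cpp that includes the header gets
// an identical definition; the one-definition rule explicitly permits this,
// and the program still has a single hello_inline with a single address.
inline int hello_inline(char c) { return c + 1; }

// static inline: internal linkage. Each .cpp gets its own private copy,
// so the linker never sees a shared symbol for it.
static inline int hello_static(char c) { return c + 1; }

// ---- what might be in one of the .cpp files including it ----
int use_both(char c) { return hello_inline(c) + hello_static(c); }
```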
Concordantly, Mr Linker won't even hear about these functions! And what he
doesn't know, he can't complain about! I just love when a complex concept
has been broken down to get a true understanding of it!

It's oh so simple.

You will find that most so called 'complex concepts' are very simple
in reality. The tricky part is to come up with the simple solution :)
 

Peter van Merkerk

So just how much of all this is in the C++ Standard?
I presume that the Preprocessor is.
I presume that the Compiler is.

Is the Linker a part of the C++ Standard?

If you ever get your hands on the holy standard you will probably be
disappointed. It is a very formal description (== boring and hard to read)
of the C++ language, and only the language not the tools! It does tell how
things should behave (in lawyer talk), but not how they are to be implemented.
Consequently you won't find anything about the linker or some equivalent to
the ASCII drawings we have seen in this thread. It is certainly not a good
starting point to learn C++. I use the C++ standard at times to see if what
I am doing is legal C++ and to see if a construct is guaranteed to behave
the same on all compliant compilers.
...this may sound naive, but why the hell isn't name mangling
standardized?!

Good question, I don't know. Maybe it was considered to be beyond the scope
of the C++ standard and/or the standards committee wanted to give the
implementers freedom (I don't know what good that has brought us).
Now, C++ progs start at the function entitled "main", right?

Not necessarily. For example, if there are global objects, their constructors
are (maybe*) called before main(). There are also some technicalities why
neither main() nor even the global constructors are the first code that is
executed (see below).

* IIRC according to the standard the construction of global objects may be
postponed until the object is accessed for the first time (I don't have the
standard within arm's reach at the moment to confirm this).
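Here is a small sketch of a global constructor doing work before main(). The Announcer class and the init_log function are hypothetical. On the deferral mentioned in the footnote: the standard makes it implementation-defined whether the dynamic initialization of a global happens before main() starts. If it is deferred, it must still happen before the first use of a function or variable defined in the same translation unit. In practice, common compilers initialize the globals of an ordinary executable before main().

```cpp
#include <string>
#include <vector>

// A log that is safe to use during static initialization: a function-local
// static is constructed the first time the function is called.
std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical class whose constructor records that it ran.
struct Announcer {
    explicit Announcer(const char* what) { init_log().push_back(what); }
};

// A global object: its constructor runs during dynamic initialization,
// typically before the first statement of main().
Announcer global_announcer("global_announcer constructed");
```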
Is this in the C++ Standard?

Yes, main() is discussed in the C++ standard.
Can it be changed?

As far as the C++ standard is concerned no. But if you hack the runtime
library possibly yes.
And if it can be changed, is this permitted
by the C++ Standard?
No.

For instance, take a Win32 program (don't worry, I'm
not going off-topic here!), you generate a new project and you're given:

int WinMain(char blah, float nre, int k)
{


return 0;
}


So I think to myself: "Is my program starting at "WinMain", or is it
starting at "main", and is "main" just hidden away in some object/library
file and looks like this:

int main(void)
{
//blah blah blah
WinMain('j',7.8,4);
}

In the runtime library there is an entry point that is usually set with some
linker directives (this is all very platform/tool specific and thus off-topic
for this group). The entry point is the address of the first code to be
executed when a program is loaded. You might think that the entry point
points to main() or WinMain(), but that is not the case. Before main() or
WinMain() can be executed, all kinds of initializations need to be performed,
global objects need to be constructed, and in the case of main() the command line
arguments need to be transformed into argc/argv parameters. This is all done by
the runtime library. After the startup code in the runtime library has
finished, main() or WinMain() will be called. When the main() function
returns, shutdown code needs to be executed. So the code of the main()
function is neither the first nor the last code of a program that is
executed. That also explains why a "do nothing" program requires so many
kilobytes.
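The sequence just described can be modelled in a few lines. This is a toy sketch and not any real runtime library: toy_crt_startup, EntryFn and RunResult are invented names. The function builds argc/argv, calls a main-like function, and then does "shutdown" work with the result, recording each phase so that the order is visible.

```cpp
#include <string>
#include <vector>

// Signature of a user-supplied entry function, shaped like main(int, char*[]).
using EntryFn = int (*)(int, char*[]);

struct RunResult {
    int exit_code;
    std::vector<std::string> trace;  // the phases that ran, in order
};

// Toy model of runtime startup code (hypothetical, not any real library's).
RunResult toy_crt_startup(EntryFn user_main, std::vector<std::string> args) {
    RunResult result{0, {}};
    result.trace.push_back("runtime initialized");  // heap, I/O, globals...

    std::vector<char*> argv;                         // build argc/argv
    for (std::string& s : args) argv.push_back(&s[0]);
    argv.push_back(nullptr);                         // argv[argc] is null

    result.trace.push_back("calling main");
    result.exit_code = user_main(static_cast<int>(args.size()), argv.data());

    result.trace.push_back("shutdown code ran");    // flush, destroy globals...
    return result;
}
```

A captureless lambda converts to EntryFn, so a test can pass a throwaway "main" directly.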

The articles below discuss how things work on the MS Windows platform:
http://www.microsoft.com/msj/archive/S569.aspx
http://msdn.microsoft.com/msdnmag/issues/01/01/hood/
I, in vain, browse through my project settings seeking enlightenment,
looking for an answer. One thing I tried though, I wrote a "main" function
alongside the "WinMain" function, and to my surprise, _MY_ "main" function
overrode the "WinMain" function, which suggests to me that, Yes, my program
does begin with "main", and Yes, that one of these crappy little object
files that I didn't ask for DOES in fact have a "main" function in it. So
now my next question is, "Why don't these 2 "main" functions generate a
"multiple definition" error with Mr Linker. I'm just guessing here, but does
it have something to do with "namespaces"?

No, there is no main() function in the runtime library so there is no
conflict. I don't remember the details, but main() is a somewhat special
case.
PS. C++ is the best programming language, right?!!

Cross-posting this line to a newsgroup dedicated to another programming
language is a sure way to start a flamewar :)

Personally I don't believe there is such a thing as the "best programming
language". Every programming language (including C++) has its strengths and
weaknesses, which make it a good choice for some applications and a poor
choice for others. In my opinion C++ is a good choice for a wide
range of applications, however it is not always the best choice. In particular,
languages designed for a small application domain can do better in those
domains than C++. If you were allowed to learn only one programming language,
I think C++ would be the best choice for various reasons. But it is always
good to know a few other languages as well, preferably not C++ wannabes,
but for example Python, which nicely complements C++; it is strong where C++
is weak and vice versa. But for the moment just stick with C++ :)
 

JKop

Thanks.

Not necessarily. For example if there are global objects, their
constructor is (maybe*) called before main(). There are also some
technicalities why main() or even the global constructors are not the
first code that is being executed (see below).


*Eyes widen* Enlightenment!


My program _does_ in fact start at "main". So I have:


int main(void)
{

return 0;
}



We _should_ have a blank prog. (Which in Win32, would yield a prog
with no Win32API references AT ALL)


What didn't cross my mind is that in one of the aforementioned "I
didn't ask for" libraries, there's something akin to this:


class CrappyClass
{
public:    // without this the constructor is private and the global below won't compile
    CrappyClass(void)
    {
        //evil stuff
    }
};


CrappyClass CrappyObjectThatGetsInitializedBeforeMain;


Solution, have a chat with Mr Linker; tell him I only want _my_
object files to be linked. Now... apart from the initialization of
_my own_ objects, my prog _does_ start at main.

Looks like my paranoia as regards "libraries I didn't ask for" _is_
in fact justified. First it was just global variables taking up a few
more bytes; then it was all my token names being used up; and now
finally, the gremlins have hijacked my prog.
 

Pete Becker

JKop said:
...this may sound naive, but why the hell isn't name mangling standardized?!

Different compilers can lay objects out differently, can use different
schemes for handling exceptions, and do lots of other things
differently. Being able to link incompatible object files together
wouldn't do you any good.
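Where code built by different compilers (or C code) must meet, the common workaround is extern "C", sketched below. It gives a function C language linkage. On mainstream platforms its symbol is then the plain name, sometimes with a platform prefix such as a leading underscore, instead of a mangled one. It only works for functions whose parameter and return types are C-compatible, which fits Pete's point that the name is just one part of the binary interface. The function names here are made up.

```cpp
// C language linkage: the symbol name is not C++-mangled, so object files
// from other compilers or from C code can refer to it by its plain name.
extern "C" int add_ints(int a, int b) {
    return a + b;
}

// Ordinary C++ linkage: the symbol is mangled per the compiler's ABI,
// e.g. _Z12add_ints_cppii under the Itanium C++ ABI used by GCC and Clang
// on Linux; MSVC uses a different scheme.
int add_ints_cpp(int a, int b) {
    return a + b;
}
```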
 

Peter van Merkerk

Looks like my paranoia as regards "libraries I didn't ask for" _is_
in fact justified. First it was just global variables taking up a few
more bytes; then it was all my token names being used up; and now
finally, the gremlins have hijacked my prog.

Take a chill pill. Yes, you usually link in more than what is strictly
needed (or at least more than you think is needed; there may be a
difference). In most cases it does not really matter. In the rare cases where it
does matter you can get rid of the unneeded stuff, but that can take time
and effort. If you really want to, you can make Win32 programs as small as 2
KByte, which mostly consists of mandatory headers and padding needed because of
section alignment requirements. The links in my previous post give you some
pointers on how to take matters into your own hands if the need ever arises.
 

Richard Herring

Karl Heinz Buchegger said:
Sorry. No. They are *linked* into an executable.
compile: transform the human readable source code into
.... an intermediate representation of ...
 
