Why does execution start at main()?

Dan Pop

In said:
And call me stupid, but I need to do a google on "hosted" and
"freestanding" implementations.

No need to call you stupid: not every C programmer is supposed to be
familiar with the jargon of the C standard. But we're using it here,
because it's better than adopting our own jargon.

Dan
 
Dan Pop

In said:
On 12 May 2004 14:09:48 GMT, (e-mail address removed) (Dan Pop) wrote:
write the executable that is being started.

From the perspective of what the Standard stipulates, though, an
environment such as the one I described looks like it would be compliant. I

True, and irrelevant. The convention was made with *real* implementations
in mind, by someone interested in creating a language for solving concrete
problems. The standard merely codified the existing practice, as it had
no other valid alternative.

When the answer to a question *necessarily* involves practical aspects
of real implementations, there is little point in speculating about
hypothetical implementations allowed by the standard.

Dan
 
CBFalconer

Dan said:
The name is unimportant, what really matters is that a C startup
module that actually calls the main() function must exist on
hosted implementations. At least on those providing meaningful
support for main's arguments.

The only thing that forces such a "crt" module is the requirement
that main() be a re-entrant function. Without that, initial code
in main could do all the system-dependent initialization.
 
Dan Pop

In said:
The only thing that forces such a "crt" module is the requirement
that main() be a re-entrant function. Without that, initial code
in main could do all the system-dependent initialization.

Reentrance is a non-issue, as main could use an internal static flag to
distinguish between its initial call and the others. But who's going to
make the initial call of main()? ;-)

Dan
 
Malcolm

Dan Pop said:
The name is unimportant, what really matters is that a C startup
module that actually calls the main() function must exist on hosted
implementations. At least on those providing meaningful support for
main's arguments.

This is worth pointing out. On hosted implementations there has to be some
code to set everything up and code to free memory and flush buffers on exit.
So the compiler seldom if ever compiles main as a normal function with the
entry point also happening to be the first instruction in the executable.
 
Arthur J. O'Dwyer

Leor Zolman said:
Yet "something" has to pass argc and the argv[] array to main(). While not
specifically defined, it IS guaranteed by K&R. I don't know if there's a
standard that says whether quotes should be stripped from arguments (i.e.
echo arg1 "arg2 has spaces" arg3)...

Couldn't that "something" be the OS directly? It doesn't seem to me that
there's anything that would preclude argc/argv being set up for a C (or
whatever) program directly by the shell/command-processor, with the
C-generated executable having an entry point directly at the start of
main().

This is a highly unrealistic approach.

This is, however, essentially the approach taken by *nix systems
as I see and understand them. The shell sets up argv (and implicitly
argc), and passes those along to
a helper function supplied by the system which presumably sets up
argc and argv in an OS-standard location so that the C program can
access them. Now, it may be that in most *nix systems, argc and argv
are then fetched out of that standard location, possibly munged a
little, and then put somewhere different, by the "C runtime." But
there's no reason for an implementation to bother moving them around.

You don't want to include information about the C implementation in
other components of the system, which couldn't and shouldn't care less.
All you need is a well defined
interface for passing the command line information from the command line
processor to the called program, regardless of the language in which it
was originally written. It is the C startup module's job to obtain this
information and to pass it to main() according to the C standard's
conventions. This way, the command line processor need not care about
the language used to write the executable that is being started.

Special case of the above: the command line processor could use the
same well-known and widespread standard as the C language. Then programs
written in C don't need any magic startup code.

On the contrary, a well thought out argument *must* take [I/O startup]
issues into account. Note that even Unix, which is the ideal
environment for such an approach, uses the classical startup module
solution instead!

One reason not to offload "startup" chores onto the shell is that
the shell itself has to start up at some point, too. Thus we would
need to make sure the kernel started up the shell using that same
startup code. But if the startup code was invoked simply with a
system call, anyway, I suppose there wouldn't be any maintenance
problem there.

-Arthur
 
Keith Thompson

Mabden said:
Keith Thompson said:
Execution of what? Execution of the user's program starts with main.
What happens before that is implementation-specific; there may or may
not be anything called a "C runtime".

Yet "something" has to pass argc and the argv[] array to main(). While not
specifically defined, it IS guaranteed by K&R. I don't know if there's a
standard that says whether quotes should be stripped from arguments (i.e.
echo arg1 "arg2 has spaces" arg3)...

Nothing in the C standard addresses the presence or absence of quotes
in command-line arguments (argv[]).

<OT>
In Unix, there's nothing special about quotation marks as far as
program invocation is concerned. They're used by the shell as part of
the syntax that specifies what arguments are going to be passed (so in
that sense they're "stripped"), but once that's determined the string
passed as a program argument is just an arbitrary nul-terminated
string. I'm not as familiar with other systems, but I suspect many of
them are similar.
</OT>
 
CBFalconer

Malcolm said:
This is worth pointing out. On hosted implementations there has
to be some code to set everything up and code to free memory and
flush buffers on exit. So the compiler seldom if ever compiles
main as a normal function with the entry point also happening to
be the first instruction in the executable.

Barring some weird and wonderful use of statics, this cannot be
done in C because main must be callable recursively. The
following should run:

int main(int argc, char **argv)
{
    if (argc) return main(--argc, argv);
    return argc;
}
 
Richard Bos

CBFalconer said:
Barring some weird and wonderful use of statics, this cannot be
done in C because main must be callable recursively. The
following should run:

int main(int argc, char **argv)
{
    if (argc) return main(--argc, argv);
    return argc;
}

This use of statics need hardly be weird and wonderful. In fact, it can
be as simple as this:

int __main_has_been_initialised=0;

int main(int argc, char **argv)
{
    if (!__main_has_been_initialised) {
        /* System-specific code to set up the program's data tables,
           stack, and allocation arena, initialise the necessary statics
           other than __main_has_been_initialised, initialise argc and
           argv, and so forth. */
        __main_has_been_initialised=1;
    }

    if (argc) return main(--argc, argv);
    return argc;
}

All the rudiments of the startup code need to do now is initialise
__main_has_been_initialised, and call main(). Since the former can be
done by making it, and its initial value, part of the program image, all
that is left to do is call main() - making main() the startup code point
for that program!
Of course, this is not, AFAIAA, how most compilers and OSes actually
_do_ set up a program. But they _could_ do so, and it hardly qualifies
as "weird", IYAM.

Richard
 
Villy Kruse

Special case of the above: the command line processor could use the
same well-known and widespread standard as the C language. Then programs
written in C don't need any magic startup code.

Save for the fact that things get more complicated when you involve
initializing locales, arranging for atexit, loading the dynamic libraries,
arranging for cleanup code (someone mentioned flushing file buffers and
closing files) to be run after main returns, etc.

By the way, on Unix-like systems most programs really start in the code of
/lib/ld.so or something similar.

In its most simple form the startup code is similar to

exit( main(argc, argv) );

Setting up argc and argv in kernel code isn't difficult. All the other
things are really best suited for a user level program, and instead of
compiling it into the main program by the compiler, linking to a standard
piece of code seems the better approach. The side effect is that the compiler
doesn't have to make main() a special case.

For the C programmer, the program starts in main() and that should be the
only thing that really matters. What goes on before that, be it in
kernel code or some user level start up and clean up code, shouldn't be
of any concern to the programmer.

Villy
 
Dan Pop

In said:
Less so than discussions of programs without semicolons, though, no? ;-)

Pray tell, why are those programs irrelevant? They may not be
particularly interesting, once you've figured out how to code them, but
they are as relevant as you can get.

Dan
 
Michael Wojcik

Keith Thompson said:
[...]
No. Dan Pop gave the correct explanation. Execution does NOT start with
main(), but with the C runtime ("crt") code which sets up a runtime
environment for main(), e.g.,: initializing argc, argv and envp.

I haven't seen anyone point out yet that there is no "envp" argument
to main in C, in a conforming hosted implementation. In nonstandard
extensions to C, of course, there may be all sorts of things.
Execution of what? Execution of the user's program starts with main.
What happens before that is implementation-specific; there may or may
not be anything called a "C runtime".

Yet "something" has to pass argc and the argv[] array to main(). While not
specifically defined, it IS guaranteed by K&R.

Not exactly. What's guaranteed by the standard is that in a hosted
implementation, program execution will begin in main, which will have
two arguments if it's defined as having two arguments. (It can also
be defined with no parameters.) The first argument will be an int
with a non-negative value; the second argument is a char ** with
certain guarantees about its value and the value of the char * objects
it points to.

Those guarantees are significant, but they are not strong enough to
require that all hosted implementations actually pass anything useful
to main.
Couldn't that "something" be the OS directly? It doesn't seem to me that
there's anything that would preclude argc/argv being set up for a C (or
whatever) program directly by the shell/command-processor, with the
C-generated executable having an entry point directly at the start of
main().

Whether a hosted implementation's execution environment is
conceptually divided into two entities called "OS" and "C runtime",
or divided up any other way, is purely a matter of definition as far
as C is concerned and completely irrelevant to the initial call to
main.

In my favorite oddball C implementation, EPM C for the AS/400, the
C execution environment is not "bound" to a C program at all; it
is part of a multilanguage execution environment that was loaded
by the OS on a job-by-job basis as needed. The first time an EPM
program is executed in an OS/400 job (which might be an interactive
session, or a batch job, or various other things), the EPM
environment is loaded and initialized. After that, any compiled C
program object can be invoked from anywhere in that job, until the
job ends or the EPM environment is explicitly terminated (by a system
call or user command).
 
Stephen Sprunk

Dan Pop said:
Reentrance is a non-issue, as main could use an internal static flag to
distinguish between its initial call and the others. But who's going to
make the initial call of main()? ;-)

The same magical fairy that calls _start() on your implementation.

Of course, since most OSes need to support binaries from arbitrary languages,
picking a fixed name for program entry and using it for the C compiler's
runtime startup module is an obvious choice (though not the only one).
Runtime code for other languages would do something else if they have no
special meaning for the "main" label.

However, the short answer to the OP's question is "because the Standard says
so."

S
 
glen herrmannsfeldt

Beni said:
I have been programming in C for about a year now. It sounds silly,
but I never took the time to question why a C(or C++ or Java) program
execution begins only at the main(). Is it a convention or is there
some deeper underlying reason?

Fortran compilers I knew many years ago generated an object
program called MAIN for program units without a FUNCTION or
SUBROUTINE statement. I don't know, though, that the language
excluded calling a SUBROUTINE or FUNCTION the name MAIN, but
I think it would fail.

PL/I allows one to give the main program any name desired,
and the system figures out how to start it.

Note that in Java applets the outer level user written method
is not called main, but for other programs it is main.

Just a convention that got incorporated into the language.

-- glen
 
Dan Pop

In said:
The same magical fairy that calls _start() on your implementation.

This fairy has no clue about the C standard and doesn't even want to
hear about it.
Of course, since most OSes need to support binaries from arbitrary languages,
picking a fixed name for program entry and using it for the C compiler's
runtime startup module is an obvious choice (though not the only one).
Runtime code for other languages would do something else if they have no
special meaning for the "main" label.

That's the point: use a language neutral entry point and command line
passing convention and let the startup module of each language runtime
support handle the details. The functions performed by the startup module
belong to neither the OS nor the code generated by the compiler from the
application's sources. Neither of these entities wants to know anything
about the other.
However, the short answer to the OP's question is "because the Standard says
so."

There is no point in providing a stupid answer to a sensible question.

Dan
 
Kenneth Brody

Stephen Sprunk wrote:
[...]
The same magical fairy that calls _start() on your implementation.

Of course, since most OSes need to support binaries from arbitrary languages,
picking a fixed name for program entry and using it for the C compiler's
runtime startup module is an obvious choice (though not the only one).
[...]

Strange... Every O/S I've used has no concept of "a fixed name for program
entry point", as executables don't even include symbol tables. (Except for
debugging purposes, if you so choose.)
 
Dan Pop

In said:
Fortran compilers I knew many years ago generated an object
program called MAIN for program units without a FUNCTION or
SUBROUTINE statement. I don't know, though, that the language
excluded calling a SUBROUTINE or FUNCTION the name MAIN, but
I think it would fail.

Nope, if the compiler is careful. Because MAIN is not a reserved name,
the compiler must do whatever magic is needed to avoid conflicts:

fangorn:~/tmp 41> cat main.f
subroutine main
print *,'hello world'
end

call main
end
fangorn:~/tmp 42> f77 -c main.f
fangorn:~/tmp 43> nm main.o | grep -i main
00000000 T main_
0000003d T MAIN__

So, it appears that the name of the main program is actually MAIN_ and
the compiler mangles its symbols by appending an underscore (to avoid
clashes with the C library, because there is nothing preventing the
Fortran programmer from using names like puts and getc in his own code).
So, let's see what happens if I rename my main as main_:

fangorn:~/tmp 47> nm main.o | grep -i main
00000000 T main___
0000003d T MAIN__

The compiler detected the clash and appended two underscores to my main_
to keep both my code and its startup module happy. Needless to say, both
versions work as expected if properly linked:

fangorn:~/tmp 48> f77 main.f
fangorn:~/tmp 49> nm a.out | grep -i main
U __libc_start_main@@GLIBC_2.0
080486f0 T main
08048694 T main___
080486d1 T MAIN__
fangorn:~/tmp 50> ./a.out
hello world

This exercise also reveals the presence of a main in the executable
binary. A strong clue that the Fortran startup module is merely an
extension of the C startup module (the Fortran startup module begins
with a main function).

Of course, this analysis is valid only for the implementation used in
this experiment (g77 under Linux), but it shows that Glen's issue is
easily tractable.

Dan
 
Stephen Sprunk

Kenneth Brody said:
Strange... Every O/S I've used has no concept of "a fixed name for program
entry point", as executables don't even include symbol tables. (Except for
debugging purposes, if you so choose.)

Executables for the particular OS I checked (Linux) certainly have a symbol
table: nm shows a _start near the beginning of every binary I checked.
However, it just occurred to me to strip a binary (not the default) and the
symbol table does indeed disappear completely. Apparently the _start symbol
isn't magical, so there must be something in the ELF (or whatever) object
format that dictates entry at _start in my implementation.

S
 
glen herrmannsfeldt

Kenneth Brody wrote:

(snip)
Strange... Every O/S I've used has no concept of "a fixed name for program
entry point", as executables don't even include symbol tables. (Except for
debugging purposes, if you so choose.)

OS/360, and its successors, include external symbols in the
load module. The entry point is indicated, I believe in symbolic
form, in the end record of the load module. More complete symbol
tables can be included for debugging purposes.

Even more, a single load module can have multiple entry points,
similar to the multiple ENTRY points allowed in a Fortran subroutine.

-- glen
 
