Why does execution start at main()?

glen herrmannsfeldt

Nope, if the compiler is careful. Because MAIN is not a reserved name,
the compiler must do whatever magic is needed to avoid conflicts:

(snip of one example)
This exercise also reveals the presence of a main in the executable
binary. A strong clue that the Fortran startup module is merely an
extension of the C startup module (the Fortran startup module begins
with a main function).

Of course, this analysis is valid only for the implementation used in
this experiment (g77 under Linux), but it shows that Glen's issue is
easily tractable.

I believe that in the OS/360 case, and I believe that VS/Fortran
still follows this, the name of the main program is changed
with the NAME= compiler option. A main program without that
option would be unable to call subroutines or functions named MAIN.

I never had the desire to name a subroutine main.

-- glen
 
Daniel Rudy

And somewhere around the time of 05/14/2004 00:23, the world stopped and
listened as Stephen Sprunk contributed the following to humanity:
The same magical fairy that calls _start() on your implementation.

Of course, since most OSes need to support binaries from arbitrary languages,
picking a fixed name for program entry and using it for the C compiler's
runtime startup module is an obvious choice (though not the only one).
Runtime code for other languages would do something else if they have no
special meaning for the "main" label.

However, the short answer to the OP's question is "because the Standard says
so."

S

The main() function is used by the C/C++ compiler to designate the entry
point of the program to the linker in either a.out, object, or obj
format. In the Unix environment, the ELF header specifies the entry
point of the program. Even in DOS/Windows, there is a field in the EXE
header that's labeled entry point. Here's a segment of the elf man page
on my FreeBSD Unix system:

Elf32_Addr    Unsigned program address
Elf32_Half    Unsigned halfword field
Elf32_Off     Unsigned file offset
Elf32_Sword   Signed large integer
Elf32_Word    Field or unsigned large integer
Elf32_Size    Unsigned object size

typedef struct {
        unsigned char   e_ident[EI_NIDENT];
        Elf32_Half      e_type;
        Elf32_Half      e_machine;
        Elf32_Word      e_version;
        Elf32_Addr      e_entry;        /* **** Entry Point */
        Elf32_Off       e_phoff;
        Elf32_Off       e_shoff;
        Elf32_Word      e_flags;
        Elf32_Half      e_ehsize;
        Elf32_Half      e_phentsize;
        Elf32_Half      e_phnum;
        Elf32_Half      e_shentsize;
        Elf32_Half      e_shnum;
        Elf32_Half      e_shstrndx;
} Elf32_Ehdr;

e_entry - This member gives the virtual address to which the system
first transfers control, thus starting the process. If the file has no
associated entry point, this member holds zero.

For the C/C++ programming languages, main() is defined as the start of
execution of the program, which the operating system calls. The program
header, which is produced by the linker and is programming-language
independent, gives the operating system all the info it needs to
properly load and execute the program.

#include <stdio.h>

int main(void)          /* <--- Entry Point */
{
    printf("Hello World\n");
    return 0;
}

In Pascal (My forte, and off topic in this group), the outermost begin
statement denotes the entry point.

Program Hello(Input, Output);

Begin                   { <--- Entry Point }
  Writeln('Hello World');
End.


HTH.
 
Old Wolf

Stephen Sprunk said:
Executables for the particular OS I checked (Linux) certainly have a symbol
table: nm shows a _start near the beginning of every binary I checked.
However, it just occurred to me to strip a binary (not the default) and the
symbol table does indeed disappear completely. Apparently the _start symbol
isn't magical, so there must be something in the ELF (or whatever) object
format that dictates entry at _start in my implementation.

One of the ELF header fields is the offset at which execution starts.
There's an interesting thread where a guy tried to make the smallest
possible ELF executable:
http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
 
August Derleth

I never had the desire to name a subroutine main.

What if you were designing a program for an electrical engineering bunch
and needed to determine the input in milli-amps to a specific part of the
circuitry? Remember that FORTRAN is case-insensitive as per the specs.

;)

(My point, in case it isn't obvious: "That word doesn't (always) mean what
you think it means.")
 
glen herrmannsfeldt

What if you were designing a program for an electrical engineering bunch
and needed to determine the input in milli-amps to a specific part of the
circuitry? Remember that FORTRAN is case-insensitive as per the specs.

That was supposed to be past tense.

For many years my favorite name for quick programs, or anything else
needing a name, what some people call foo, has been "this". So I have
had this.for, this.c, and finally this.java. It turns out
that "this" is a reserved word in Java, and a class can't be named
this.

There is another reason to rename main, though, and that is the
ability to put multiple programs into a library.

-- glen
 
James Kanze

|> |> > [...]
|> > > No. Dan Pop gave the correct explanation. Execution does NOT
|> > > start with main(), but with the C runtime ("crt") code which
|> > > sets up a runtime environment for main(), e.g.,: initializing
|> > > argc, argv and envp.

|> > Execution of what? Execution of the user's program starts with
|> > main. What happens before that is implementation-specific; there
|> > may or may not be anything called a "C runtime".

|> Yet "something" has to pass argc and the argv[] array to main().
|> While not specifically defined, it IS guaranteed by K&R. I don't
|> know if there's a standard that says whether quotes should be
|> stripped from arguments (i.e. echo arg1 "arg2 has spaces" arg3)...

The actual contents of argv are system dependent. I've used systems
which simply separated on blanks, quotes or no, and an implementation
which always called with argc == 2 and argv[1] containing the command
line would be legal.
 
James Kanze

(e-mail address removed) (Dan Pop) writes:

|> In <[email protected]> Leor Zolman
|> >>|> >>> [...]
|> >>> > No. Dan Pop gave the correct explanation. Execution does NOT
|> >>> > start with main(), but with the C runtime ("crt") code which
|> >>> > sets up a runtime environment for main(), e.g.,: initializing
|> >>> > argc, argv and envp.

|> >>> Execution of what? Execution of the user's program starts with
|> >>> main. What happens before that is implementation-specific; there
|> >>> may or may not be anything called a "C runtime".

|> >>Yet "something" has to pass argc and the argv[] array to main().
|> >>While not specifically defined, it IS guaranteed by K&R. I don't
|> >>know if there's a standard that says whether quotes should be
|> >>stripped from arguments (i.e. echo arg1 "arg2 has spaces" arg3)...

|> >Couldn't that "something" be the OS directly? It doesn't seem to me
|> >that there's anything that would preclude argc/argv being set up
|> >for a C (or whatever) program directly by the
|> >shell/command-processor, with the C-generated executable having an
|> >entry point directly at the start of main().

|> This is a highly unrealistic approach.

So unrealistic that it is actually used by some OS's. Like all Posix
compliant systems, or Linux, for example.

Of course, just parsing the command line isn't all that crt0 has to do;
under Posix, for example, the OS only knows low level file descriptors,
and crt0 sets up the FILE* stdin, stdout and stderr. (Although as far
as the Posix standard is concerned, it doesn't have to be this way.)

|> You don't want to include information about the C implementation in
|> other components of the system, that couldn't and shouldn't care
|> less.

Unix, at least, traditionally has an intimate relationship with C.

|> All you need is a well defined interface for passing the command
|> line information from the command line processor to the called
|> program, regardless of the language in which it was originally
|> written. It is the C startup module's job to obtain this
|> information and to pass it to main() according to the C standard's
|> conventions. This way, the command line processor need not care
|> about the language used to write the executable that is being
|> started.

In practice, under Unix, it is the exec'ing process which sets up the
argc/argv for the invoked process. This leads to some interesting
questions: is it even possible to implement a hosted C under Unix, since
there is no way to ensure that argv[0] contains the program name (and in
fact, several common Unix programs count on the fact that it doesn't).

And of course, about all the standard requires is that argv[0] either
contain the program name, or be an empty string. Provided that argc is
greater than zero, of course -- it would be perfectly conforming for an
implementation to always start main with argc == 0. And even
reasonable, if the system had no concept of command line.
 
James Kanze

(e-mail address removed) (Dan Pop) writes:

|> >> No. Dan Pop gave the correct explanation. Execution does NOT
|> >> start with main(), but with the C runtime ("crt") code which sets
|> >> up a runtime environment for main(), e.g.,: initializing argc,
|> >> argv and envp.

|> >where does it say this in the standard?

|> 5.1.2.2.1

|> The values of argc and argv must come from somewhere. This somewhere
|> is typically called the C runtime environment or C startup module.

Or the OS, or the invoking process, or ...
 
Joe Wright

glen said:
That was supposed to be past tense.

For many years my favorite name for quick programs, or anything else
needing a name, what some people call foo, has been "this". So I have
had this.for, this.c, and finally this.java. It turns out
that "this" is a reserved word in Java, and a class can't be named
this.

There is another reason to rename main, though, and that is the
ability to put multiple programs into a library.

It has never occurred to me to put complete programs in a library.
Why would you do that?
 
Dik T. Winter

> Executables for the particular OS I checked (Linux) certainly have a symbol
> table: nm shows a _start near the beginning of every binary I checked.
> However, it just occurred to me to strip a binary (not the default) and the
> symbol table does indeed disappear completely. Apparently the _start symbol
> isn't magical, so there must be something in the ELF (or whatever) object
> format that dictates entry at _start in my implementation.

No. Under Linux (and its Unix predecessors) execution starts at code
address 0 or somesuch. Indeed, a fixed address.
 
Dik T. Winter

> glen herrmannsfeldt wrote: ....
>
> It has never occurred to me to put complete programs in a library.
> Why would you do that?

In some OS's (and I think Glen is describing one of them), a program
was processed either from a library (system or custom) or from your
current set of directly available files. So when you have a set of
commands you regularly use, you put those in a library, and put that
library in the search list.
 
Stephen Sprunk

Dik T. Winter said:
Executables for the particular OS I checked (Linux) certainly have a symbol
table: nm shows a _start near the beginning of every binary I checked.
However, it just occurred to me to strip a binary (not the default) and the
symbol table does indeed disappear completely. Apparently the _start symbol
isn't magical, so there must be something in the ELF (or whatever) object
format that dictates entry at _start in my implementation.

No. Under Linux (and its Unix predecessors) execution starts at code
address 0 or somesuch. Indeed, a fixed address.

Linux will load ELF objects and begin execution wherever the ELF header
specifies. There may be common values for those addresses, but any value in
the first 2GB of virtual memory works on the i386 platform. I don't know
about other platforms Linux runs on, nor do I have any clue how COFF objects
work on i386.

[ Which, of course, means that _start is not meaningful to the loader and
therefore is specific to GCC on my platform -- something I hadn't bothered
to investigate before ]

S
 
Villy Kruse

For a dynamically linked program there is a separate symbol table for use by
the dynamic linker, and that symbol table isn't removed by strip. Neither is
it shown by plain nm; objdump -T (or nm -D) will display it.

Dik T. Winter said:
No. Under Linux (and its Unix predecessors) execution starts at code
address 0 or somesuch. Indeed, a fixed address.

Must be "somesuch", as address 0 does not become part of your program
under Linux.

The object contains some headers, and one of the fields of one of the headers
has the address of the entry point. This address is determined by the linker:
on Unix, usually by the -e option or, if no -e option is specified, by the
address of a default symbol name.

Things do get quite a bit more complicated for a dynamically linked program.
For such a program, execution starts in the dynamic linker; the main program
would be pretty helpless until the dynamic linker has found and loaded
the C runtime library.

Villy
 
Irrwahn Grausewitz

Stephen Sprunk said:
"Dik T. Winter" <[email protected]> wrote:

Linux will load ELF objects and begin execution wherever the ELF header
specifies. There may be common values for those addresses, but any value in
the first 2GB of virtual memory works on the i386 platform. I don't know
about other platforms Linux runs on, nor do I have any clue how COFF objects
work on i386.

<OT>
FWIW, the COFF headers for program images contain a field called
'AddressOfEntryPoint', telling the loader the starting address
relative to the image base.
</OT>

Regards
 
Dan Pop

In said:
(e-mail address removed) (Dan Pop) writes:

|> In <[email protected]> Leor Zolman
|> >>|> >>> [...]
|> >>> > No. Dan Pop gave the correct explanation. Execution does NOT
|> >>> > start with main(), but with the C runtime ("crt") code which
|> >>> > sets up a runtime environment for main(), e.g.,: initializing
|> >>> > argc, argv and envp.

|> >>> Execution of what? Execution of the user's program starts with
|> >>> main. What happens before that is implementation-specific; there
|> >>> may or may not be anything called a "C runtime".

|> >>Yet "something" has to pass argc and the argv[] array to main().
|> >>While not specifically defined, it IS guaranteed by K&R. I don't
|> >>know if there's a standard that says whether quotes should be
|> >>stripped from arguments (i.e. echo arg1 "arg2 has spaces" arg3)...

|> >Couldn't that "something" be the OS directly? It doesn't seem to me
|> >that there's anything that would preclude argc/argv being set up
|> >for a C (or whatever) program directly by the
|> >shell/command-processor, with the C-generated executable having an
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|> >entry point directly at the start of main().
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|> This is a highly unrealistic approach.

So unrealistic that it is actually used by some OS's. Like all Posix
compliant systems, or Linux, for example.

Nope. None of them invokes the main function *directly*.

Dan
 
Dan Pop

In said:
(e-mail address removed) (Dan Pop) writes:

|> >> No. Dan Pop gave the correct explanation. Execution does NOT
|> >> start with main(), but with the C runtime ("crt") code which sets
|> >> up a runtime environment for main(), e.g.,: initializing argc,
|> >> argv and envp.

|> >where does it say this in the standard?

|> 5.1.2.2.1

|> The values of argc and argv must come from somewhere. This somewhere
|> is typically called the C runtime environment or C startup module.

Or the OS, or the invoking process, or ...

Concrete examples, please, but not *bogus* ones. Under Unix, main does
receive its arguments from the startup module and not from anything else.
It is the startup module that receives them from the invoking process, via
the OS, but this is NOT the same thing.

Dan
 
Michael Wojcik

In practice, under Unix, it is the exec'ing process which sets up the
argc/argv for the invoked process.

Debatable. In Unix-like systems, the caller of the exec* family does
set the values used for argv, and by extension for argc, but IMO this
does not constitute "set[ting] up argc/argv for the invoked process".
The caller does not modify the invoked process's address space. It's
still up to the C startup code to create the actual parameter list for
main.

This leads to some interesting
questions: is it even possible to implement a hosted C under Unix, since
there is no way to ensure that argv[0] contains the program name (and in
fact, several common Unix programs count on the fact that it doesn't).

And of course, about all the standard requires is that argv[0] either
contain the program name, or be an empty string.

Since the standard does not define "program name", any non-empty
string is acceptable. In fact, as I read C90 5.1.2.2.1, the "program
name" is no more and no less than the contents of argv[0] (if argc >
0 and argv[0][0] != 0). Correspondence with any string known to the
OS or the user is felicitous but not required.

--
Michael Wojcik (e-mail address removed)

The surface of the word "profession" is hard and rough, the inside mixed with
poison. It's this that prevents me crossing over. And what is there on the
other side? Only what people longingly refer to as "the other side".
-- Tawada Yoko (trans. Margaret Mitsutani)
 
glen herrmannsfeldt

Joe Wright wrote:

(snip)
It has never occurred to me to put complete programs in a library. Why
would you do that?

For OS/360 and successors, through MVS, OS/390, z/OS...
the link editor has the ability to read its own output,
and the usual form of an object library is such output.

There are some cases where it is useful to store main programs
in such a library, and later link them with other subroutines,
either from a library or supplied by the user.

One might, for example, have a main program that would do
numerical integration or root finding on a supplied function.
Source for the function would then be linked with a main
program from a library and run.

As far as complete programs, not just main programs without
called subprograms, yes, OS/360 and successors store them
in libraries. Maybe similar to Java's jar files, a single
disk file contains multiple executable programs.

-- glen
 
James Kanze

(e-mail address removed) (Michael Wojcik) writes:

|> In article
|> <[email protected]>, James

|> > In practice, under Unix, it is the exec'ing process which sets up
|> > the argc/argv for the invoked process.

|> Debatable. In Unix-like systems, the caller of the exec* family does
|> set the values used for argv, and by extension for argc, but IMO
|> this does not constitute "set[ting] up argc/argv for the invoked
|> process". The caller does not modify the invoked process's address
|> space. It's still up to the C startup code to create the actual
|> parameter list for main.

At least in some older Unix, the parameters to exec were copied directly
into the new program stack by the OS. All the start up routine did with
them was pass the argc and the argv on.

|> > This leads to some interesting questions: is it even possible to
|> > implement a hosted C under Unix, since there is no way to ensure
|> > that argv[0] contains the program name (and in fact, several
|> > common Unix programs count on the fact that it doesn't).

|> > And of course, about all the standard requires is that argv[0]
|> > either contain the program name, or be an empty string.

|> Since the standard does not define "program name", any non-empty
|> string is acceptable. In fact, as I read C90 5.1.2.2.1, the "program
|> name" is no more and no less than the contents of argv[0] (if argc >
|> 0 and argv[0][0] != 0). Correspondence with any string known to the
|> OS or the user is felicitous but not required.

Well, I guess that's one way of looking at it. So the program we
generally call bash has the name -bash when started by login.
 
Dan Pop

In said:
As far as complete programs, not just main programs without
called subprograms, yes, OS/360 and successors store them
in libraries. Maybe similar to Java's jar files, a single
disk file contains multiple executable programs.

Probably as a lame compensation for not providing a decent hierarchical
file system. Such a file could be considered as a poor surrogate of a
directory.

Dan
 
