c source

Michael Gaab

I need some help in finding a program written in C that is large enough that
it takes 10-20 minutes to compile/link on an *average* machine.
Why? I am in an HPC class and I am thinking about trying to speed up
compiling of larger programs. At least not so large at first.

I have done a bunch of searching. I have found some apps but most of them
are too big and too complicated.


I know this is a bit off topic, apologies to the purists..
Thanks for any help.

Mike
 
Sidney Cadot

Michael said:
> I need some help in finding a program written in c that is large enough that
> it takes 10-20 minutes to compile/link on an *average* machine.
> Why? I am in a HPC class and I am thinking about trying to speed up
> compiling of larger programs. At least not so large at first.

Just a few questions and thoughts:

Should it be a single program file, or are multiple files ok? Are you
planning on timing the compilation-step proper, or also the
preprocessing, assembly, and linking steps?

Will this be just an exercise, or are you aiming for something that is
practically useful? In the latter case there should be examples already
out there :)

> I have done a bunch of searching. I have found some apps but most of them
> are too big and too complicated.

Something that compiles for 10--20 minutes is almost bound to be either
big or complicated. One way of getting e.g. gcc to slow down (even with
not-so-big-or-not-so-complicated programs) is to enable function
inlining and set the inlining threshold to a very high value. This
worked wonders for the simple chess program I once wrote; it took ages
to compile. Of course, this is mainly because the compiler's data
structures blow up. You're essentially measuring near-worst-case cache
behaviour for memory/cpu throughput or even I/O throughput (swapping) in
cases like this.

Another way of producing high compile times is of course to generate
source code. However, this would usually yield non-typical code compared
to real application code, so that is perhaps not suited for you. And to
get in the 10--20 minutes range, it may be better to write a program that
generates a program that generates a program, or something like that :)

Of course, in C++ this would be a much easier task since you can do
awful things with templates.

Best regards,

Sidney
 
fermath

Hi

Michael said:
> I need some help in finding a program written in c that is large enough
> that it takes 10-20 minutes to compile/link on an *average* machine.
> Why? I am in a HPC class and I am thinking about trying to speed up
> compiling of larger programs. At least not so large at first.
>
> I have done a bunch of searching. I have found some apps but most of them
> are too big and too complicated.
>
> I know this is a bit off topic, apologies to the purists..
> Thanks for any help.
>
> Mike

Something that takes 10-20 minutes to compile needs to be complicated ... :)
 
Michael B Allen

> I need some help in finding a program written in c that is large enough
> that it takes 10-20 minutes to compile/link on an *average* machine.
> Why? I am in a HPC class and I am thinking about trying to speed up
> compiling of larger programs. At least not so large at first.
>
> I have done a bunch of searching. I have found some apps but most of
> them are too big and too complicated.

Look at big open source projects like X, the Linux kernel, or Samba. All
of these are several hundred thousand lines of code. Of course anything
this size is going to be complicated. If you need something predictable
and ANSI that you can try with different compilers and platforms, then I
think your best bet will be to dynamically generate it. Think of 10
C constructs that can be "exploded" to be huge like huge static structure
initializers or large combinations of conditional statements that call
random functions and weave the whole thing together into one big 1MB set
of 10 source files. You probably only need it to compile for 2 minutes
and loop 10 times for startup costs to be statistically insignificant as
opposed to compiling something for 20 minutes.
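To make the "explode a few constructs" idea concrete, here is a minimal sketch of a generator function, assuming two of the constructs mentioned above: a large static initializer and a chain of small functions full of conditionals. The name `emit_unit`, its parameters, and the generated symbol names are all hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write one synthetic translation unit to `out`.
 * `unit` tags the symbol names so several units can be linked together;
 * `nfuncs` (assumed >= 1) controls how much code is emitted. */
void emit_unit(FILE *out, int unit, int nfuncs)
{
    int i;

    /* Construct 1: a large static array initializer. */
    fprintf(out, "static const int table_%d[] = {\n", unit);
    for (i = 0; i < nfuncs * 16; i++)
        fprintf(out, "%d,%s", i * 7 % 1013, (i % 8 == 7) ? "\n" : " ");
    fprintf(out, "0 };\n\n");

    /* Construct 2: many small functions with conditionals, each one
     * calling the function emitted just before it. */
    for (i = 0; i < nfuncs; i++) {
        fprintf(out, "static int f_%d_%d(int x)\n{\n", unit, i);
        fprintf(out, "    if (x %% %d == 0) return table_%d[%d];\n",
                i + 2, unit, i);
        if (i > 0)
            fprintf(out, "    if (x > %d) return f_%d_%d(x - 1) + 1;\n",
                    i, unit, i - 1);
        fprintf(out, "    return x + %d;\n}\n\n", i);
    }

    /* One external entry point per unit, so a main file can call it. */
    fprintf(out, "int unit_%d(int x)\n{\n    return f_%d_%d(x);\n}\n",
            unit, unit, nfuncs - 1);
}
```

A driver could call `emit_unit` once per output file (say, 10 files) and raise `nfuncs` until the set reaches roughly the 1MB mentioned above. How long the result takes to compile will depend on the compiler and flags, so the size would have to be tuned by experiment.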

Mike
 
Michael Gaab

Sidney Cadot said:
> Just a few questions and thoughts:
>
> Should it be a single program file, or are multiple files ok?

Multiple files.

> Are you
> planning on timing the compilation-step proper, or also the
> preprocessing, assembly, and linking steps?

We were planning on parallelizing the compilation, e.g., gcc -c. I am not
a compiler expert, or for that matter an expert at anything. :) Just keep
plugging along with what God gave me. Doing the best I can.

> Will this be just an exercise, or are you aiming for something that is
> practically useful? In the latter case there should be examples already
> out there :)

It is for a project in a parallel programming class. Now that I have a better
understanding of HPC, I think I would have chosen a different thing to
parallelize. But then on the other hand, it seems very practical if
compilation/linking could be sped up on large projects. Not sure about
that assumption because I have never worked on any.

> Something that compiles for 10--20 minutes is almost bound to be either
> big or complicated. One way of getting e.g. gcc to slow down (even with
> not-so-big-or-not-so-complicated programs) is to enable function
> inlining and set the inlining threshold to a very high value. This
> worked wonders for the simple chess program I once wrote, it took ages
> to compile.

Would you mind sharing the chess program? The issue in our class is whether
we can realize any speedup, not whether we have developed the code
ourselves.

I am not sure about inlining, I'll have to play around with that a bit.

> Of course, this is mainly because the compiler's data
> structures blow up. You're essentially measuring near-worst-case cache
> behaviour for memory/cpu throughput or even I/O throughput (swapping) in
> cases like this.
>
> Another way of producing high compile times is of course to generate
> source code. Anyway, this would usually yield non-typical code compared
> to real application code, so that is perhaps not suited for you.

Like I said, we are only interested in speedup. How do you generate source
code? A flag?

> And to
> get in the 10--20 minutes range, it may be better to write a program
> generates a program that generates a program, or something like that :)

How would you do that?

Thanks a lot for your help.
Mike
 
Michael Gaab

Michael B Allen said:
> Look at big open source projects like X, the Linux kernel, or Samba.

I have and they are way too complicated for what I need. I am sure I could
do it, just don't have the time.

> All
> of these are several hundred thousand lines of code. Of course anything
> this size is going to be complicated.

Yes, very complicated. More complicated than I had imagined.

> If you need something predictable
> and ansi that you can try with different compilers and platforms then I
> think your best bet will be to dynamically generate it. Think of 10
> C constructs that can be "exploded" to be huge like huge static structure
> initializers or large combinations of conditional statements that call
> random functions and weave the whole thing together into one big 1MB set
> of 10 source files.

This is what I need. We have limited disc space on the cluster.
How elaborate would the functions need to be?

> You probably only need it to compile for 2 minutes
> and loop 10 times for startup costs to be statistically insignificant as
> opposed to compiling something for 20 minutes.

That sounds reasonable.
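For what it's worth, the "compile for 2 minutes and loop 10 times" measurement could be sketched roughly as below. The function name `mean_seconds` is hypothetical, and the command string passed to it is an assumption about how the build would be invoked. It uses only standard C `time()` and `difftime()`, so the resolution is about one second, which should be adequate for runs that last minutes.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: run `cmd` through the shell `reps` times and
 * return the mean wall-clock seconds per run. Returns -1.0 if reps < 1
 * or if system() could not start the command. */
double mean_seconds(const char *cmd, int reps)
{
    time_t start, end;
    int i;

    if (reps < 1)
        return -1.0;

    start = time(NULL);
    for (i = 0; i < reps; i++) {
        if (system(cmd) == -1)
            return -1.0;
    }
    end = time(NULL);

    /* Averaging over several runs spreads one-off startup costs. */
    return difftime(end, start) / reps;
}
```

A call such as `mean_seconds("gcc -c big.c -o big.o", 10)` would give the per-compile average, where `big.c` stands for whatever generated file is being measured.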

Thanks for your help.
Mike
 
Michael Gaab

> Unless, of course, you consider flocking out the multiple C files over a
> farm of compiling machines, then collecting the results for linking.
> However, this would probably be too trivial.

Yes trivial with a small project but not so with a program like Linux.
I think we are going to break our project into two parts. Part one will be
to prove the concept. Part two will be to reverse engineer Linux using
Rational and start to develop a scheme to parallelize Linux. I said start.

> Sure, look at http://libchess.sourceforge.net. However it is not nearly
> finished and not in a usable state at the moment. If you want I can send
> you a small tarball with a version that is complete enough to solve
> chess problems. A suggestion: this may be a better thing to parallelize
> than a full compiler, at least it's a well-understood and limited problem.

Too late in the semester to change now. With suggestions from you and one
other fellow I was able to create a single .5 MB file that took over 3
minutes to compile. I'll take this file and duplicate it a number of times,
but I will also break it up in such a way as to simulate a real program
with dependencies, etc.

> in gcc, you'll have to use -finline-limit=<some number>. This worked
> nicely on the chess example since it will also inline recursive function
> invocations :)

Thanks.


> With a program, which may be programmed in C.

Oh, I misunderstood.

> With difficulty. It would be rather perverse.

I am sure.

Thanks again.
Mike
 
Sidney Cadot

Hi Michael,

> We were planning on parallelizing the compilation, e.g., gcc -c. I am not
> a compiler expert, or for that matter an expert at anything. :) Just keep
> plugging along with what God gave me. Doing the best I can.

If you're not both a compiler expert and a parallelism expert, I'd say
this is aiming way, way too high. I personally know a guy who did a PhD
in parallelization of parsers (which is just the first step of the
compiler) and this took him 5 years.

The compiler-proper has several stages: first parsing (building an
abstract syntax tree), and then a couple of stages that perform
transformations on the AST, then emitting code. Even the parallelization
of one stage would be a tremendous task.

Unless, of course, you consider flocking out the multiple C files over a
farm of compiling machines, then collecting the results for linking.
However, this would probably be too trivial.

> It is for a project in a parallel program class.

I would seriously urge you to reconsider doing this. You'll get nowhere
if you're not willing to spend a couple of years on this.

> Now that I have a better
> understanding of HPC, I think I would have chosen a different thing to
> parallel. But then on the other hand, it seems very practical if
> complication/linking could be sped up on large projects. Not sure about
> that assumption because I have never worked on any.

Farming out compilations over many machines is practical, and has been
done, at the file level. Anything else, I'd say, is 'only' of
theoretical interest for the time being.
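As a rough illustration of file-level farming, here is a minimal sketch that runs one compile job per source file with a cap on concurrent jobs. It uses local POSIX processes as a stand-in for a farm of machines, so distributing the jobs over a network is outside its scope. The function name `compile_all` and its parameters are hypothetical, and it assumes the caller has no other child processes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `compiler -c file` for each file, keeping at
 * most `max_jobs` children alive. Returns the number of failed jobs. */
int compile_all(const char *compiler, const char *const files[],
                size_t nfiles, size_t max_jobs)
{
    size_t next = 0, running = 0;
    int failures = 0;

    if (max_jobs == 0)
        max_jobs = 1;

    while (next < nfiles || running > 0) {
        /* Start new jobs until the concurrency limit is reached. */
        while (next < nfiles && running < max_jobs) {
            pid_t pid = fork();
            if (pid == 0) {
                execlp(compiler, compiler, "-c", files[next], (char *)NULL);
                _exit(127);             /* only reached if exec failed */
            }
            if (pid < 0)
                failures++;             /* fork failed: count it, move on */
            else
                running++;
            next++;
        }

        /* Wait for any one child to finish, freeing a job slot. */
        if (running > 0) {
            int status;
            pid_t done = wait(&status);
            if (done < 0) {
                if (errno == EINTR)
                    continue;
                failures += (int)running;   /* children were lost */
                running = 0;
            } else {
                running--;
                if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                    failures++;
            }
        }
    }
    return failures;
}
```

Calling `compile_all("gcc", files, nfiles, 4)` would leave one object file per source file in the current directory, ready for a single link step afterwards. Comparing wall-clock time for `max_jobs` of 1 against larger values is one way to measure the speedup.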

> Would you mind sharing the chess program? The issue in our class is whether
> or not we can realize any speedup, not whether or not we have developed the
> code ourselves or not.

Sure, look at http://libchess.sourceforge.net. However it is not nearly
finished and not in a usable state at the moment. If you want I can send
you a small tarball with a version that is complete enough to solve
chess problems. A suggestion: this may be a better thing to parallelize
than a full compiler, at least it's a well-understood and limited problem.

> I am not sure about inlining, I'll have to play around with that a bit.

in gcc, you'll have to use -finline-limit=<some number>. This worked
nicely on the chess example since it will also inline recursive function
invocations :)

> Like I said, we are only interested in speedup. How do you generate source
> code? A flag?

With a program, which may be programmed in C.

e.g.

#include <stdio.h>

/* Emits a C program with one puts() call per bottle count.
   The count comes from argv[1] and defaults to 0. */
int main(int argc, char *argv[])
{
    int howmany = 0;

    if (argc > 1)
        sscanf(argv[1], "%d", &howmany);

    puts("#include <stdio.h>");
    puts("int main(void)\n{");
    for (; howmany >= 0; howmany--)
        printf("    puts(\"%d bottles of beer on the wall.\");\n", howmany);
    puts("    return 0;\n}");
    return 0;
}

> How would you do that?

With difficulty. It would be rather perverse.

Best regards, Sidney
 
Jeff Rodriguez

Michael said:
> I need some help in finding a program written in c that is large enough that
> it takes 10-20 minutes to compile/link on an *average* machine.
> Why? I am in a HPC class and I am thinking about trying to speed up
> compiling of larger programs. At least not so large at first.
>
> I have done a bunch of searching. I have found some apps but most of them
> are too big and too complicated.
>
> I know this is a bit off topic, apologies to the purists..
> Thanks for any help.
>
> Mike

I can tell you right now, something that takes a long time to compile is
OpenSSL. That may be more complicated than you want though :)

Jeff
 
