What has C++ become?


Ian Collins

James said:
They're related, but yes: I should have made it clear that I was
talking about compiler dependencies, and not design coupling.
Just to clarify, your objections are practical (tool limitations) rather
than philosophical?

If that is the case and you can't get a better hammer, use a bigger one.

I like to include build times as one of my project requirements
(and yes, I do test it!). If the build times get too long, treat this
like any other design issue. Weigh the time/cost of design changes to
the code against design changes to the build environment. On past
projects, adding another machine to the build farm has been the more
cost effective option. This is probably more typical today with
plummeting hardware costs and rising labour costs.
 

James Kanze

Just to clarify, your objections are practical (tool
limitations) rather than philosophical?

My objections are always practical, rather than philosophical.
I'm a practicing programmer, not a philosopher. Using templates
today has a very definite cost.
If that is the case and you can't get a better hammer, use a
bigger one.

In other words, C++ isn't the language I should be using for
large applications? From what I can see, it's not really a very
good language, but all of the others are worse.

Note that the standard actually addressed this particular
problem, at least partially, with export, which the compiler
implementors have pretty much ignored. Part of the reason, no
doubt, is that it mainly affects application level code. And
there's really not that much use for templates at that level;
they're generally more appropriate for low level library code.
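
(For illustration only, a minimal sketch of what export was meant to buy you; only the EDG-based front ends ever implemented it, and the keyword was later dropped from the language again. The Stack class and file names are made up.)

    // stack.h -- clients see only the declaration
    #include <vector>

    export template <typename T>
    class Stack {
    public:
        void push(const T& value);
        T    pop();
    private:
        std::vector<T> items_;
    };

    // stack.cc -- the definitions live in one translation unit; with
    // export, editing this file would not force every client of
    // stack.h to recompile
    #include "stack.h"

    template <typename T>
    void Stack<T>::push(const T& value) { items_.push_back(value); }

    template <typename T>
    T Stack<T>::pop() { T v = items_.back(); items_.pop_back(); return v; }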

(The fact that there is a dependency on the implementation of
std::vector isn't generally considered a problem: std::vector is
part of the compiler, and when you upgrade the compiler, you do
a clean build anyway, regardless of how long it takes.)
I like to include build times as one of my project
requirements (and yes, I do test it!). If the build times get
too long, treat this like any other design issue. Weigh the
time/cost of design changes to the code against design changes
to the build environment. On past projects, adding another
machine to the build farm has been the more cost effective
option. This is probably more typical today with plummeting
hardware costs and rising labour costs.

The problem is less total build time (at least until it starts
reaching the point where you can't do a clean build over the
week-end); it is recompilation times due to a change. In large
applications, for example, header files are generally frozen
early, and only allowed to change exceptionally. Recompile
times aren't the only reason for this, of course, but they're
part of it.
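
(The usual way to keep such changes out of a frozen header is a compilation firewall, the pimpl idiom; a minimal sketch with made-up names:)

    // session.h -- frozen early; no implementation details leak out
    class Session {
    public:
        Session();
        ~Session();
        void connect(const char* host);
    private:
        class Impl;     // defined only in session.cc
        Impl* impl_;    // changes to Impl never touch this header
    };

    // session.cc -- a bug fix here recompiles exactly one file
    #include "session.h"
    #include <string>

    class Session::Impl {
    public:
        std::string host;
    };

    Session::Session() : impl_(new Impl) {}
    Session::~Session() { delete impl_; }
    void Session::connect(const char* host) { impl_->host = host; }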

As for adding a machine to the build farm: throwing more
hardware at a problem is often the simplest and most economic
solution (although in this case, the problem is perhaps more
linked with IO throughput than with actual CPU power---and
adding a machine can actually make things worse, by increasing
network load). But practically, in most enterprises, it's part
of a different budget:-(.
 

Ian Collins

James said:
As for adding a machine to the build farm: throwing more
hardware at a problem is often the simplest and most economic
solution (although in this case, the problem is perhaps more
linked with IO throughput than with actual CPU power---and
adding a machine can actually make things worse, by increasing
network load). But practically, in most enterprises, it's part
of a different budget:-(.
On my last couple of C++ projects, I was fortunate enough to be
responsible for both the build farm design and budget as well as the
software design. So neither problem arose :)
 

Ian Collins

Walter said:
Wow, I didn't know people actually used build farms for C++! How many
lines of code was that?

We never bothered to count.

I have been using distributed building for C and C++ for over a decade
now. All that's required is sensible compiler licensing and a decent
make system.
 

James Kanze

Wow, I didn't know people actually used build farms for C++!
How many lines of code was that?

And how many different versions does he need? If you have
separate debug and release versions, for each program, on each
target platform, you can easily end up with ten or fifteen
complete builds. And with enough templates in the header files,
it doesn't take very many lines of source code (less than a
million, even) to end up needing a build farm, just to be able to do
a clean build over the week-end.

Of course, you usually have the hardware anyway. Just tell the
programmers to not turn their machines off when they go home for
the week-end.
 

Noah Roberts

James said:
The important thing to realise is that they're a tool. Like
most (or even all) tools, they have a cost. If the advantages
of using the tool outweigh the cost, then you should use it. If
they don't, then you shouldn't.

Well, I can agree with that but you seem to be making stuff up to argue
against using a tool. Like asserting, without basis, that templates are
only useful in "lower level code", only decouple in "lower level code",
and various other things that, quite frankly, make no sense at all.

You can't use screws where they are useful if you've got some sort of
weird prejudice against screwdrivers.
 

Ian Collins

James said:
And how many different versions does he need? If you have
separate debug and release versions, for each program, on each
target platform, you can easily end up with ten or fifteen
complete builds. And with enough templates in the header files,
it doesn't take very many lines of source code (less than a
million, even) to end up needing a build farm, just to be able to do
a clean build over the week-end.
This project was about 300K lines including tests. A distributed clean
build (which included a code generation phase) took about 12 minutes,
which was too long (10 was the design limit). Any longer and the hit to
productivity would have justified adding another node.
 

ian-news

Walter said:
I've looked into trying to make the C++ compiler multithreaded (so it
could use multi core computers) many times. There just isn't any way to
do it, compiling C++ is fundamentally a sequential operation. The only
thing you can do is farm out the separate source files for separate
builds. The limit achievable there is when there is one node per source
file.

The problem of distributed building is best solved by a combination of
the build system and the compiler. The build system is responsible for
farming out jobs to cores and the compiler has to be parallel build
aware. Template instantiation is one area where some form of locking
of generated instantiation files may be required.

The two I use are gcc/GNU make, which supports parallel building, and
Sun CC/dmake, which supports parallel and distributed building.

The number of jobs per core depends on the nature of the code and
should be tuned for each project. Over a number of C++ projects I
have found 2 to 4 jobs per core to be a sweet spot. The projects all
used the many small source file model which works best with parallel
(and more so, distributed) building.

Parallel or distributed building has to be designed into your process
from day one. Poorly designed makefiles or code layout can lose you
many of the possible gains.
 

ian-news

Noah said:
Well, I can agree with that but you seem to be making stuff up to argue
against using a tool. Like asserting, without basis, that templates are
only useful in "lower level code", only decouple in "lower level code",
and various other things that, quite frankly, make no sense at all.

I think James is pretty clear in his mention of a cost/benefit
trade-off.

If your process is designed for rapid building to offset the cost of
extra coupling then the advantages of templates may outweigh the
cost. If a clean build of your project takes a long time, the
productivity cost will outweigh any benefits.
 

James Kanze

I've looked into trying to make the C++ compiler multithreaded
(so it could use multi core computers) many times. There just
isn't any way to do it, compiling C++ is fundamentally a
sequential operation. The only thing you can do is farm out
the separate source files for separate builds. The limit
achievable there is when there is one node per source file.

The input must be scanned sequentially, I'm pretty sure, since a
#define along the way can clearly affect how the following
source is read. And I rather suspect that it must also be
parsed sequentially, since the grammar is not context
free---whether a symbol is the name of a type, the name of a
template, or something else, affects parsing. But once you've
got your parse trees, couldn't you parallelize the processing of
each function: low-level optimization and code generation?
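
(A concrete illustration of that context sensitivity, with made-up names: the same tokens parse as a declaration or as an expression depending on what the compiler has already seen.)

    // Reading 1: T names a type, so "T * p;" declares a pointer.
    struct T {};
    void reading_one() {
        T * p;          // declaration of p
        (void)p;
    }

    // Reading 2: T names a variable, so the identical tokens are an
    // expression (a multiplication whose result is discarded, which
    // most compilers will flag with an unused-value warning).
    void reading_two() {
        int T = 2, p = 3;
        T * p;          // multiplication
    }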
 

James Kanze

I think James is pretty clear in his mention of a cost/benefit
trade-off.
If your process is designed for rapid building to offset the
cost of extra coupling then the advantages of templates may
outweigh the cost. If a clean build of your project takes a
long time, the productivity cost will outweigh any benefits.

The clean build isn't the problem. You can schedule that
overnight, or for a weekend. (For my library, a clean build for
all of the versions I support under Unix takes something like
eight hours. Which doesn't bother me too much.) The problem is
the incremental builds when someone bug-fixes something in the
implementation. For non-templates, that means recompiling a
single .cc file; for templates, recompiling all source files
which include the header. A difference between maybe 5 seconds,
and a couple of minutes. Which is a very significant difference
if you're sitting in front of the computer, waiting for it to
finish.
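
(One partial mitigation, where the set of instantiated types is known and closed, is to keep the member definitions out of the widely included header and instantiate explicitly in a single translation unit; then a fix to the implementation recompiles only that one file. Purely a sketch, with made-up names:)

    // table.h -- included everywhere; declarations only
    #ifndef TABLE_H
    #define TABLE_H
    #include <cstddef>
    #include <vector>

    template <typename T>
    class Table {
    public:
        void insert(const T& value);
        std::size_t size() const;
    private:
        std::vector<T> items_;
    };
    #endif

    // table.tcc -- member definitions; not included by clients
    template <typename T>
    void Table<T>::insert(const T& value) { items_.push_back(value); }

    template <typename T>
    std::size_t Table<T>::size() const { return items_.size(); }

    // table_instances.cc -- the one file that recompiles when the
    // implementation in table.tcc is fixed
    #include "table.h"
    #include "table.tcc"
    #include <string>

    template class Table<int>;
    template class Table<std::string>;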
 

Ian Collins

James said:
The clean build isn't the problem. You can schedule that
overnight, or for a weekend. (For my library, a clean build for
all of the versions I support under Unix takes something like
eight hours. Which doesn't bother me too much.) The problem is
the incremental builds when someone bug-fixes something in the
implementation. For non-templates, that means recompiling a
single .cc file; for templates, recompiling all source files
which include the header. A difference between maybe 5 seconds,
and a couple of minutes. Which is a very significant difference
if you're sitting in front of the computer, waiting for it to
finish.
You can say the same for a change to any header. There's always
something else to look at for a couple of minutes...
 

Matthias Buelow

Walter said:
My experiences with trying to accelerate C++ compilation led to many
design decisions in the D programming language. Each pass (lexing,
parsing, semantic analysis, etc.) is logically separate from the others,

Arguably, this is just a workaround for the basic problem that C++ (and
presumably D, as well) is a language where the program must be completely
recompiled and linked before execution. Incremental development where
new code can be directly loaded and tested in a running object image is
imho a more productive model for large program development.
 

Ian Collins

Matthias said:
Arguably, this is just a workaround for the basic problem that C++ (and
presumably D, as well) is a language where the program must be completely
recompiled and linked before execution. Incremental development where
new code can be directly loaded and tested in a running object image is
imho a more productive model for large program development.

A model which isn't unusual in C or C++ development; consider device
drivers and other loadable modules or plugins.
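
(A minimal sketch of that model on a POSIX system, using dlopen to pull freshly built code into a running process; the plugin name and its entry point are made up.)

    #include <dlfcn.h>
    #include <iostream>

    // Expects a shared object built separately, exposing:
    //     extern "C" void plugin_entry();
    // Build the host with -ldl on most systems.
    int main() {
        void* handle = dlopen("./plugin.so", RTLD_NOW);
        if (!handle) {
            std::cerr << "dlopen failed: " << dlerror() << '\n';
            return 1;
        }

        // Look up the entry point and call it -- the host keeps
        // running; only the plugin was rebuilt and reloaded.
        void (*entry)() =
            reinterpret_cast<void (*)()>(dlsym(handle, "plugin_entry"));
        if (entry)
            entry();

        dlclose(handle);
        return 0;
    }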
 

Ian Collins

Walter said:
Nearly instant rebuilds are a transformative experience for development.
Going off for 2 minutes to get coffee, read slashdot, etc., gets one out
of the 'zone'.

My "something else" was the next problem or test.
 

coal

Walter said:
A full build of the dmd compiler (using dmc++) takes 18 seconds on an
Intel 1.6 GHz machine <g>. 33 seconds for g++ on AMD 64 4000.

Do you give any thought to bringing either of those compilers on-line?
I think it would be a good idea. I know of two C++ compilers that
have taken small steps toward being available on-line.

Brian Wood
Ebenezer Enterprises
www.webEbenezer.net
 

Noah Roberts

James said:
The clean build isn't the problem. You can schedule that
overnight, or for a weekend. (For my library, a clean build for
all of the versions I support under Unix takes something like
eight hours. Which doesn't bother me too much.) The problem is
the incremental builds when someone bug-fixes something in the
implementation. For non-templates, that means recompiling a
single .cc file; for templates, recompiling all source files
which include the header. A difference between maybe 5 seconds,
and a couple of minutes. Which is a very significant difference
if you're sitting in front of the computer, waiting for it to
finish.

See the "Stable Dependencies Principle" and the "Stable Abstractions
Principle".

http://www.objectmentor.com/resources/articles/stability.pdf

"Thus, the software that encapsulates the *high level design model* of
the system should be placed into stable packages."

- Emphasis added -

"[The Stable Abstractions Principle] says that a stable package should
also be abstract so that its stability does not prevent it from being
extended."

Robert C. Martin's article on stability principles pretty much stands
against everything you've said in this thread to date. Templates are
the epitome of abstraction. Perhaps if you were not so anti-template
you'd do some looking into how to make the best use of them and you
would not be arguing about changing templates causing long builds; you'd
be well aware that you simply don't change templates that often.

Does this mean you'll never find a bug in a template? Of course not.
But if you find yourself often having to alter or fix templates that are
permeating your entire source tree, instead of a few modules, then the
problem is poor design and testing practices...it is not the fault of
templates.

Of course, you need to go back and read about the other design
principles that Martin describes in order to see the entire reasoning
behind why you put the *high level code* in your stable, abstract
packages. I'm not appealing to authority; Martin's stuff just happens to
be very good and the reasoning stands on its own.

The principles of OOD translate very well to Generic Programming.
 

Michael Furman

James said:
....

The clean build isn't the problem. You can schedule that
overnight, or for a weekend. (For my library, a clean build for
all of the versions I support under Unix takes something like
eight hours. Which doesn't bother me too much.) The problem is
the incremental builds when someone bug-fixes something in the
implementation. For non-templates, that means recompiling a
single .cc file; for templates, recompiling all source files
which include the header. A difference between maybe 5 seconds,
and a couple of minutes. Which is a very significant difference
if you're sitting in front of the computer, waiting for it to
finish.

I love it when compilation takes more than a couple of seconds: I have
extra time to think! Sometimes it ends with killing the compilation
and doing something else, rather than trying the result.

Michael Furman
 

Ian Collins

Walter said:
I've tried many times to multitask. I'll have the test suite running in
one window, a compile in a second, and edit documentation in a third.
All closely related, but I find that inevitably I get confabulated
switching mental contexts between them and screw things up.

I know, it's a male thing :(

My builds always run the unit tests, so that's one less window to worry
about.

This all goes to show that whatever you can do to improve build times
is worth the effort!
 

James Kanze

Token pasting is another feature that mucks up all hope of
doing things non-sequentially.

Generally speaking, the pre-processor is a real problem for a
lot of reasons.
Take a look at the rules for looking up names. What names the
compiler 'sees' depends very much on a sequential view of the
input, which affects overloading, which affects ...

Yes. Even without the problems in the grammar, name binding
supposes some degree of sequential reading.
Yes, you could probably do that in parallel for each function,
though you'd have to do a complex merge process to turn the
result into a single object file.

The standard doesn't require a single object file:).
I decided that wasn't worth the effort, because the bulk of
the time spent was in the front end which wasn't
parallelizable. The big gains would be in asynchronously
processing all those header files.

It could potentially be a significant gain if you did extensive
optimization. Except that, of course, extensive optimization,
today, means going beyond function boundaries, and we're back to
where we started. There probably are possibilities for
parallelization in some of the most advanced optimization
techniques, but I've not studied the issues enough to be sure.
(In the end, much advanced optimization involves visiting nodes
in a graph, and I think that there are ways to parallelize
this, although I don't know whether they are practical or only
theoretical.)
P.S. Even worse for C++ is that header files must be
reprocessed for every source file compilation. So, if you have
m source files, each with a header file, and every source file
winds up #include'ing every header (a normal, if regrettable,
situation), compilation times are O(m*m).

And for the application headers, even farming the compiles out
to different machines (in parallel) may not work; since the
application headers will normally reside on one machine, you may
end up saturating the network. (I've seen this in real life.
The usual ethernet degrades rapidly when the number of
collisions gets too high.)
The D programming language is designed so that import files
compile independently of where they are imported, so
compilation times are O(m).
P.P.S. Yes, I know all about precompiled headers in C++, but
there is no way to make pch perfectly language conformant. You
have to accept some deviation from the standard to use them.

I'm not sure of that, but you certainly need more infrastructure
than is present in any compiler I know of currently (except
maybe Visual Age). Basically, the compiler needs a data base:
the first time it sees a header, it notes all of the macros (and
name bindings?) used in that header, and stores the information
(including the macro definitions) in a data base. The next time
it sees the header, it checks whether all of the definitions are
the same, and uses the results of the previous compilation if
so.
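
(Just to make the idea concrete, a toy sketch of that validity check; this is not how any existing compiler does it.)

    #include <map>
    #include <string>

    // Recorded the first time a header is processed: every macro its
    // expansion depended on, together with the definition in effect
    // (an empty string standing in for "not defined").
    struct HeaderRecord {
        std::map<std::string, std::string> macrosSeen;
        // ... plus the stored result of the previous compilation
    };

    // The cached result may be reused only if every macro the header
    // depended on still has exactly the same definition.
    bool canReuse(const HeaderRecord& record,
                  const std::map<std::string, std::string>& currentMacros) {
        typedef std::map<std::string, std::string>::const_iterator Iter;
        for (Iter it = record.macrosSeen.begin();
             it != record.macrosSeen.end(); ++it) {
            Iter cur = currentMacros.find(it->first);
            std::string now = (cur == currentMacros.end())
                              ? std::string() : cur->second;
            if (now != it->second)
                return false;   // a relevant macro changed: reprocess
        }
        return true;
    }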
 
