how to design a replacement for C++

Joshua Maurice

Exactly. Joshua Maurice and I were just discussing this in
another thread. Apparently the forced conflation of interface
and implementation into the same file really bones Java build
systems. Score another point for the C++ separate compilation
model. It may not be perfect but it's much better than Java's
design for large projects. Worse is Better for the win.

Well, not to muck up this thread, but I was merely arguing that a file
level dependency graph, where an out of date node forces a rebuild of
everything downstream, aka the Make model, does not work well if you
want incremental Java compilation. This is because Java has
"interface" and "implementation" in the same file. (It's also because
the common operating procedure for Java programmers is to possibly
have circular references in the same compilation dir, to use other
Java names from the same compilation dir and not specify these
dependencies anywhere apart from the actual Java source, and so on.) I
did, however, note that there are approaches to do quite good
incremental Java builds, but I had to write my own tool to do so, and
I had to use some Sun-javac standard, not Java standard, APIs to get
the equivalent of gcc -M.

I'm not actually sure which approach is better. When headers can
include headers, it's a lot of maintenance using pImpl and keeping the
transitive header dependencies low compared to what you have to do in
Java. In Java, I think no actual change is needed to standard
operating procedure; no maintenance is required to make sure you have
the proper separation of "interface" and "implementation" nor manual
work to keep transitive header dependencies down. I might /guess/
that Java's compilation model is a better one for developers (barring
implementation concerns for the compiler and build system). However,
for Java, someone needs to write the incremental build system (or use
mine if I ever get it finished and open sourced from my company) as no
one has done this yet for a command line general purpose build system
like Make or Ant (and one really can't do it on top of Make).
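
For anyone who has not met the pImpl idiom mentioned above, here is a minimal sketch of what that maintenance looks like. The class name Widget, its members, and the file names in the comments are all made up for illustration; in a real project the first half would sit in a header and the second half in a source file.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- would live in widget.h (hypothetical file name) ----
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                               // defined below, where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    const std::string& name() const;
    void rename(std::string new_name);

private:
    struct Impl;                             // forward declaration only
    std::unique_ptr<Impl> impl_;             // clients never see the data members
};

// ---- would live in widget.cpp (hypothetical file name) ----
struct Widget::Impl {
    explicit Impl(std::string n) : name(std::move(n)), rename_count(0) {}
    std::string name;
    int rename_count;                        // adding members here touches no header
};

Widget::Widget(std::string name) : impl_(new Impl(std::move(name))) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

const std::string& Widget::name() const {
    assert(impl_ && "Widget used after being moved from");
    return impl_->name;
}

void Widget::rename(std::string new_name) {
    assert(impl_ && "Widget used after being moved from");
    impl_->name = std::move(new_name);
    ++impl_->rename_count;
}
```

The point is that a change to Widget::Impl only touches the source file, so a Make-style build recompiles one translation unit. The price is the extra indirection and the forwarding code, which is the manual work referred to above.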

More information is available in the other thread. Please feel free to
take any other build comments there:
Build systems (was Re: No unanswered question)
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/c830c7c07951f4a6#

Finally, I don't quite see how you can hack a Java style compilation
model onto C++. It seems like quite a big change. Can anyone point me
to a / the C++ module proposal, or walk me through the rough idea?
 
James Kanze

Sure. Both Java and C# are widely used for large scale
software development, including massively parallel
applications,

Hmmm. I know of more than one company that has tried using
Java for large scale applications, and have given up. (In this
case, large scale is measured in terms of quantity of code.)
But consider. There are lots of areas where there were competing
vendor server products, one written in Java and the other in C++,
where the Java one thrashed the C++ one, e.g. WebLogic versus the
Sybase web server.

Most successful Web servers use a mixture of languages. Java is
very successful for the front end interface: a small, frequently
changing program.
There are lots of spaces where a suite of Java or C# server
apps have displaced an older suite of C++ apps, for example,
in finance, SunGard Adaptive Analytics (C#) and Calypso (Java)
versus the older C++ products like Mysis Carma or SunGard
Infinity. In telecommunications in the 90s, Java server
products were displacing C++ products.

In the telecommunications projects I worked on, Java was used
for the user interfaces, but not much else.
Simplicity of development and maintenance is an important part of
suitability for large scale software development, and other languages
have that in spades over C++.

Do they? I'm sure some do (Ada 95, perhaps), but Java doesn't.
Once you get beyond two or three programmers, managing a project
in Java becomes a nightmare.
What brought me back to C++ recently is the opposite, writing
small scale numerical calculators for use within other
environments.

What's kept me in C++ is large scale projects, where reliability
was important.
 
Öö Tiib

Well, if you include massively scalable applications that run 24/7 or
24/6, that support massive simulations of millions of scenarios over a
grid, that support hundreds of users OLTP or DP dispersed
geographically over a number of continents, if you count all that as
"animating web pages or the like", I really don't know what to
say :)  

:-D Think. It just cannot be that the applications really run all these
scenarios and everything else you describe, and yet the outcome is
something as detrimental as the collapse of the institutions themselves,
with the worldwide economy following in their wake. So something is fake
there, and the thing only animates pre-made web pages for the poor users
in order to look smart. :)
I choose finance examples because I know them, but you could
pick examples in any industry including telco, energy, transportation,
etc, not to mention tooling coming out of Oracle and IBM.  Java has
moved into the server space in a big way over the last decade, and C#
is making inroads too.  It's factually incorrect to suggest otherwise.

No, Java and C# are fine languages. I was somewhat jokingly dodging
back, because you miss the real problems with C++.

Imagine ... say you are that usual good-for-nothing fatso teaching in
universities ... do you teach something that is said to be noob
friendly or do you teach something that is said to be complex? Now
imagine that you are the other fatso who wants papers from university
saying that he studied something ... do you learn something that is said
to be newbie-hand-holding-friendly or do you learn something that is
said to be difficult? We cannot lie to the fatsos, C++ is indeed more
feature-rich and complex to learn (and to teach) than most other
languages. There it goes, the fatsos wish it dead.

C++ lives so well *only* because it is an almost unavoidable necessity
in so many situations. Everyone tries to slag it off, tear it down or
discriminate against it, but in vain. If you need power ... then you
need it, nothing to be done. It takes more than a year of very hard
beating to get livable C++ out of a former Javascript or Vis-basic guy
but again ... sometimes there's nothing to do but cream the fatsos. ;-)
Do some of the Java and C# server apps and tooling have issues?  Yes.
Did some of the C++ server apps that they displaced also have issues?
Yes.  In most cases these are broad architectural design issues that
vendors screwed up, though, rather than language issues.

In most modern programming languages you can write *almost* anything.
The issues always arise because the people in charge did not realize in
time where the true bottlenecks and problems were, and once they did
realize it they did not manage to repair the screw-up.
 
Ian Collins

Hmmm. I know of more than one company that has tried using
Java for large scale applications, and have given up. (In this
case, large scale is measured in terms of quantity of code.)


Most successful Web servers use a mixture of languages. Java is
very successful for the front end interface: a small, frequently
changing program.

At this point, those of us who have had to configure and maintain Tomcat
run screaming from the room!

I used to write all of the server-side code in my web applications in
PHP, but now that all of the required library components are available,
I've gone back to using C++.
 
Ian Collins

It's been some time since I've had the opportunity of building a large
C++ code base, but my recollection is that it took hours, even after
reducing it considerably with precompiled headers and other tweaks.
This compared with minutes for a Java code base of similar size. I
don't think this is a unique observation, and issues with build times
may be of special significance to C++ developers, as compared to other
languages.

One reason is the output of a C++ build is the fully cooked application
where the output from Java compilation is like those part baked breads
supermarkets sell. The consumer has to take them home and finish the job!
Issues with maintenance and rebuilding have more to do with how
dependencies are structured than the .h/.cpp division. And to note,
it is not correct that C++ header files represent interface, they are
as much implementation as interface - including all the private bits.
These are not infrequently affected by maintenance.

True, but they also act as documentation of the interface. I often end
up creating an interface for a PHP class just so I can see all the
member function declarations in one place.
With both C++ and Java, genuine separation of interface and
implementation requires careful attention to design, but I would
suggest that the .h/.cpp separation doesn't particularly help, that's
just distributing implementation.

Fair enough.
 
Öö Tiib

Finally, I don't quite see how you can hack a Java style compilation
model onto C++. It seems like quite a big change. Can anyone point me
to a / the C++ module proposal, or walk me through the rough idea?

Roughly:
Every module or library has (should have) a unique namespace. Add two
operations:
1) You can mark stuff as exported from namespace. That stuff is
interface of module.
2) When someone imports that module's namespace, he gets the stuff
that the namespace exported declared, without the preprocessor involved.
More details-shmetails and so on. I really did hate the proposed
bit-shift-like semantics, but otherwise it felt like a good idea.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1778.pdf
 
Daniel

At this point, those of us who have had to configure and maintain Tomcat
run screaming from the room!
No argument from me on that :)

Myself, I have painful memories of getting combinations of jar files
from various projects to work. When Sun moved to SAX2, they didn't
freeze and deprecate the old SAX1 interfaces, rather, they just added
new methods to existing interfaces (Sun did that a lot.) In any
runtime environment, you could easily pull in several implementations
of these interfaces, some SAX1, some SAX2, as they were typically
found in more than one jar file in the classpath. Renaming a jar file
to start with the letter 'z', to ensure it was picked up last in an
automatically generated classpath, was not unheard of. And relying on
SAX readers was painful, they supported different compliance levels,
worked differently, and produced different output. It was hard to
know exactly whose implementation was being instantiated, because it
was based on abstract factories and reflection, and whichever
component managed to get their class name to the front. With every
new release of a third party component, something seemed to break.

This isn't so much about the language, as about management of change.
Although it would be nice to have language support for versioning,
which Java doesn't have.

-- Daniel
 
joe

Juha Nieminen said:
AFAIK (I'm not a Java programmer) there are significant differences
between C/C++ pointers and Java references.

C++ pointers are more tied to low-level memory addresses. You can
perform pointer arithmetic to them (eg. "pointer+1" to get a pointer
to the next element in an array pointed by 'pointer') and you can
unambiguously and consistently compare pointers (in standard C++ at
least with std::less and the other comparison templates).
("Consistently" in this case means that if you compare two pointers now
and the same pointers one minute from now, you will get the same
result.) Because C++ pointers are basically memory addresses, this
allows casting a pointer of one type to a different type (if you are
doing a reinterpret-cast, then the memory address doesn't change).
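
To make the three pointer properties above concrete, here is a small sketch. The helper names (nth, ordered_before, same_address_after_cast) are invented for this example and are not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Pointer arithmetic: p + i is the address of the i-th element after *p.
// The caller must make sure p + i stays inside the same array.
inline int nth(const int* p, std::ptrdiff_t i) {
    assert(p != nullptr);
    return *(p + i);
}

// std::less gives a strict total order over pointers, even when they point
// into unrelated objects (where the result of built-in < is unspecified).
inline bool ordered_before(const void* a, const void* b) {
    return std::less<const void*>()(a, b);
}

// reinterpret_cast changes the static type of the pointer, not the address.
inline bool same_address_after_cast(int* p) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(p);
    return static_cast<void*>(bytes) == static_cast<void*>(p);
}
```

For two distinct objects x and y, exactly one of ordered_before(&x, &y) and ordered_before(&y, &x) is true, and the answer stays the same for as long as both objects live. That stability is what a moving collector would break.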

There are both advantages and disadvantages to equating pointers with
memory addresses in C/C++. It allows more low-level flexibility when
dealing with raw memory and performing other tricks (such as
xor-lists), but on the other hand it makes it more rigid with respect
to the memory allocator (making it more difficult to implement things
like garbage collectors and memory compactors in a fool-proof way).

Java references are much more abstract. They are not tied to a specific
memory address, you can't compare them (other than for equality, I
think), you cannot cast them to incompatible types and obviously you
cannot perform pointer arithmetic.

AFAIK it's possible for a reference to be "pointing" to one memory
location at one moment, and a minute later the memory management system
having changed it to "point" to a completely different location
(completely transparently from the program's point of view). This
allows things like memory compaction (which is good for cache
optimization and other things). This is not possible in C++ because of
the raw wild memory pointers.

Where I come from, we call that a HANDLE (caps for emphasis, and NOT to
refer to the Windowsism).
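
A toy version of what I mean by a handle, in C++. HandleHeap and everything in it is hypothetical, and relocate_all() is only a stand-in for a real compactor: it moves every object and patches the table. This is just the handle-table reading of the idea; a collector could equally well patch direct references when it moves objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap: clients hold Handles (table indices), never addresses.
class HandleHeap {
public:
    typedef std::size_t Handle;

    Handle allocate(int value) {
        slots_.push_back(value);
        table_.push_back(slots_.size() - 1);
        return table_.size() - 1;
    }

    int& get(Handle h) {
        assert(h < table_.size());
        return slots_[table_[h]];            // one extra indirection per access
    }

    // Stand-in for compaction: reverse the physical storage order, then fix
    // up the table. Handles held by clients stay valid and unchanged.
    void relocate_all() {
        std::vector<int> moved(slots_.rbegin(), slots_.rend());
        for (std::size_t h = 0; h < table_.size(); ++h)
            table_[h] = slots_.size() - 1 - table_[h];
        slots_.swap(moved);
    }

    std::size_t physical_index(Handle h) const { return table_[h]; }

private:
    std::vector<int> slots_;                 // the "memory"; objects may move in it
    std::vector<std::size_t> table_;         // handle -> current location
};
```

Only the table knows where an object currently lives, so moving objects never invalidates what the program holds. A raw C++ pointer into slots_ would dangle after relocate_all().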
 
Branimir Maksimovic

AFAIK it's possible for a reference to be "pointing" to one memory
location at one moment, and a minute later the memory management
system having changed it to "point" to a completely different
location (completely transparently from the program's point of view).
This allows things like memory compaction (which is good for cache
optimization and other things). This is not possible in C++ because
of the raw wild memory pointers.

That means: stop the program, move the allocated memory blocks, then
update all references in memory. Imagine n threads running on n cpus and
your program becomes completely halted while the operation works.
This will also cause page thrashing and thus heavy swapping in
larger programs.
That's why GC per thread instead of GC per n threads is always
the better solution, and therefore processes that don't share memory are
always faster than threads that share an address space.

Greets
 
Juha Nieminen

joe said:
Where I come from, we call that a HANDLE (caps for emphasis, and NOT to
refer to the Windowsism).

The world would certainly be a better place if everybody used the same
unified terminology, but that's unfortunately not the case, and I'm assuming
that this reference/handle thing might be an example.
 
James Kanze

One reason is the output of a C++ build is the fully cooked application
where the output from Java compilation is like those part baked breads
supermarkets sell. The consumer has to take them home and finish the job!

And you have to hope that he does it correctly (using the right
versions of the libraries, etc.). As they say: write once,
debug everywhere.

But the argument concerned more the developer, who modifies one
small implementation detail (in a source file in C++), then does
make. With Java, every class which uses the modified class will
be recompiled. With C++, only the one source file will be
recompiled.
 
Juha Nieminen

Branimir Maksimovic said:
That means: stop the program, move the allocated memory blocks, then
update all references in memory. Imagine n threads running on n cpus and
your program becomes completely halted while the operation works.

You are both having a very simplistic view of how garbage collection and
memory compaction work (after all, the algorithms have significantly
improved since the early 90's), and underestimating the significance of
cache optimizations. You would be surprised how much significance cache
misses have to the efficiency of a program. (For example, the speed
difference between multiple cache-friendly memory allocations vs. the
same allocations done at random memory locations can easily make a speed
difference of almost an order of magnitude. At least 90% of the speed of
any given program is thanks to the CPU cache.)

Garbage collection *used* to work in the 90's like you describe (ie.
stop the program during a sweep, which could take a noticeable amount
of time), but modern efficient garbage collectors do it a lot more
smartly and efficiently. Likewise memory compaction can be done smartly
and efficiently. Algorithms have improved since those times.
 
Branimir Maksimovic

You are both having a very simplistic view of how garbage
collection and memory compaction work (after all, the algorithms have
significantly improved since the early 90's), and underestimating the
significance of cache optimizations. You would be surprised how much
significance cache misses have to the efficiency of a program. (For
example, the speed difference between multiple cache-friendly memory
allocations vs. the same allocations done at random memory locations
can easily make a speed difference of almost an order of magnitude.
At least 90% of the speed of any given program is thanks to the CPU
cache.)
I don't think that any of this goes in favor of gc.
http://confluence.atlassian.com/display/DOC/Garbage+Collector+Performance+Issues
http://stackoverflow.com/questions/2297920/jvm-outofmemory-error-death-spiral-not-memory-leak
http://stackoverflow.com/questions/771920/help-with-really-odd-java-gc-behavior
http://www.devproconnections.com/article/net-framework2/Get-to-Know-NET-4-0-s-CLR/2.aspx
"
But there's another feature in CLR 4.0 that a developer very concerned
about ASP.NET garbage collection performance can take advantage of:
garbage collection notifications. There's a "death spiral" that an
ASP.NET server can go into when it's running short of memory and under
extreme load. As memory gets low, the Gen 2 garbage collection runs to
try and free up memory. Often threads are blocked because they depend
on objects in Gen 2, like in-process session objects (you're not still
using in-process session objects are you?). While the Gen 2 garbage
collection runs, the server ends up with all threads blocked, resulting
in request queues growing. When the garbage collection finishes, the
server is hammered by all the requests that are backed up in the queue.
That backlog runs the server out of memory again, causing another Gen 2
garbage collection. With each iteration the queues get larger, until
the worker process recycles or the server crashes.
"
Garbage collection *used* to work in the 90's like you describe (ie.
stop the program during a sweep, which could take a noticeable amount
of time), but modern efficient garbage collectors do it a lot more
smartly and efficiently. Likewise memory compaction can be done
smartly and efficiently. Algorithms have improved since those times.

The problem is that when you move allocated memory blocks around,
you have to update all references in the program. Also, by scanning
references you force cache invalidation and page thrashing, and I don't
see how that can be much improved without stopping the whole program.


Greets
 
Joshua Maurice

And you have to hope that he does it correctly (using the right
versions of the libraries, etc.).  As they say: write once,
debug everywhere.

But the argument concerned more the developer, who modifies one
small implementation detail (in a source file in C++), then does
make.  With Java, every class which uses the modified class will
be recompiled.  With C++, only the one source file will be
recompiled.

The system could be improved so that if it detects that the "external
visible class file exported 'interface' " has not changed since last
compile, that is it has the same list of functions and fields with the
same types and names, then it doesn't need to recompile "clients" aka
class files which use that type name. If you just modify the internals
of a Java function, then the class "exported interface" will not
change, so there is no need to recompile "clients", but if you add,
remove, or modify the name or type of a field or function, then you do
have to recompile clients. This is exactly analogous to headers and
cpp files for C++. You just need some not-make-based logic to deal
with it.

PS: Yes I'm glossing over details. See the other thread in
comp.lang.c++ for the details:
Build systems (was Re: No unanswered question)
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/c830c7c07951f4a6#
Specific post:
http://groups.google.com/group/comp.lang.c++/msg/1775877bafa20eda
This specific paragraph, where I described how I applied my build-
system-in-progress to roughly 100 of those jars:
 
Keith H Duggar

The system could be improved so that if it detects that the "external
visible class file exported 'interface' " has not changed since last
compile, that is it has the same list of functions and fields with the
same types and names, then it doesn't need to recompile "clients" aka
class files which use that type name. If you just modify the internals
of a Java function, then the class "exported interface" will not
change, so there is no need to recompile "clients", but if you add,
remove, or modify the name or type of a field or function, then you do
have to recompile clients. This is exactly analogous to headers and
cpp files for C++. You just need some not-make-based logic to deal
with it.

The problem is that such systems require parsing of the file
and language specific analysis of the file. And that analysis
must examine not only the current contents but some previously
known contents as well. All that is significantly more complex
and costly than simply checking a timestamp, checksum, etc.

KHD
 
Keith H Duggar

How could it be irrelevant?

Because the point is not comparing C++ and Java build times.
It is comparing what Java build times /would be/ if it allowed
separation of implementation and interface files. Ie it is not
about Java vs C++ build times it's about Java vs Java and C++
vs C++ build times under two different code separation models:

a) all code in one file (Java and header only C++)

b) capability of separating some code into different files
(not supported by Java, supported by .hpp/.cpp C++).

This can actually be evaluated by simply making a change to
the /implementation/ of a Java function, measuring the compile
time of that one file and then measuring the compile time of
the incremental build which will require compiling many more
files, specifically the transitive closure of .java files that
import that function's class. (At least from what I understand
from some recent discussions.)
Compilation time is a practical matter, nothing else. If a
full rebuild would take just milliseconds, who would bother
at all?

Apparently it is much more than that or I doubt Joshua would
have been discussing the Java build dependency problems in
another thread.

http://groups.google.com/group/comp.lang.c++/msg/116fa28be7ebde6f
As long as a full rebuild of java still takes a magnitude
more than a fine-tuned C++ system, arguments about superiority
of the latter will hardly win. ;-)

It's not about Java vs C++ build times it's about Java vs Java
and C++ vs C++ build times under different code separation models.
Possibly could make a PhD thesis and impress some folks who only know
build from books...  Not anyone from practice.

Judging from Joshua's concerns it apparently has very real
practical implications. But, yeah, a Java fanboy compiling
online toy Fahrenheit-to-Celsius calculators won't grok the
concept.
I'm light on Java, but I saw people using 'interface' a lot. That
is certainly pure. And does plenty of separation too. And compiles
separately too, using your terms. Is anything preventing doing all
the job through interfaces? Stating that any implementation of the
interface is considered private stuff?

Yes you are prevented from doing "all the job through interfaces"
by the fact that at some point you must import a concrete class
to instantiate if you are actually going to do any actual work.

Thus that file now depends on the /implementation/ of the concrete
class and any modification to the /implementation/ will trigger
unnecessary compilation of all files in the transitive closure of
the imports (noting that the Java import command is directed not
symmetric). At least that is my understanding from limited Java
experience and Joshua's comments. Am I wrong about that?
At a really steep cost -- and leaving that immensely reduced time
still pretty high.

Which is irrelevant to the point as I hope you now understand.

KHD
 
Joshua Maurice

The problem is that such systems require parsing of the file
and language specific analysis of the file. And that analysis
must examine not only the current contents but some previously
known contents as well. All that is significantly more complex
and costly than simply checking a timestamp, checksum, etc.

Not really. (Yep, time to muck up this thread.)

To be clear, let's examine what happens in a Make system on C++. At
the first go, no object files exist, so the object files are built.
During this build, you use a language specific tool, something like
gcc -M, to extract the file dependencies. For the next build, you
do not rerun gcc -M when checking dependencies and such. You use only
the precomputed dependencies of gcc -M from the previous run, and file
timestamps, to determine which files are out of date. You then
recompile these out of date files, and rerun the language specific
tools, ex: gcc -M, to get the new file dependencies.
The problem is that such systems require parsing of the file
and language specific analysis of the file.
This is true of the standard C++ and Make solution, and I assume it's
true for your solution. Presumably, you use gcc -M or equivalent to
extract header file dependencies.
And that analysis must examine not only the current contents but some previously known contents as well
In the standard C++ and Make solution, and I assume also for your
solution, the gcc -M results are saved, traditionally to a .d file. On
the next make execution, these .d files are read in by make.
All that is significantly more complex
and costly than simply checking a timestamp, checksum, etc.
Once Make reads in the .d files of the saved state from the previous
run, it then uses simple rules, and file timestamps, to figure out
what's out of date. Specifically, it does not need to call gcc -M on
every call to correctly determine dependencies. It only needs to call
gcc -M when a file is recompiled, and generally such analysis is quite
cheap relative to the cost of a full compile, and it only happens on a
recompile, so it's quite worth it.
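
As a rough sketch of that check (not Make's actual implementation): the saved dependency lists stand in for the parsed .d files, and plain integers stand in for file timestamps. The names out_of_date, SavedDeps and FileTimes are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Timestamp = long;   // stand-in for a file modification time

// What the .d files from the previous build recorded:
// object file -> the source file and headers it was compiled from.
using SavedDeps = std::map<std::string, std::vector<std::string>>;
using FileTimes = std::map<std::string, Timestamp>;

// Decides whether `target` needs recompiling using only saved dependencies
// and timestamps. No source file is parsed here.
bool out_of_date(const std::string& target,
                 const SavedDeps& deps,
                 const FileTimes& mtime) {
    assert(!target.empty());
    auto built = mtime.find(target);
    if (built == mtime.end()) return true;        // never built
    auto saved = deps.find(target);
    if (saved == deps.end()) return true;         // no saved state: rebuild to be safe
    for (const std::string& prereq : saved->second) {
        auto t = mtime.find(prereq);
        // A missing prerequisite is treated as out of date in this sketch.
        if (t == mtime.end() || t->second > built->second) return true;
    }
    return false;
}
```

The language-specific step (gcc -M or equivalent) only runs when out_of_date says a file has to be recompiled anyway, and its output replaces the saved entry for that target.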

So, at face value, your complaints of my Java system apply to the
standard C++ and Make solution. Presumably, you also mean that the
language specific source file parsing must happen on every file on
every build system execution to determine dependencies, and that's
where you're wrong, at least for C++ and Java.

I am suggesting something exactly analogous for Java can exist. At the
first go, there are no output files, so the Java files are all out of
date, so they all get built. As part of the Java compilation process,
you can use language specific tools to extract the needed dependency
information. During the next build, you can use only the precomputed
dependency information and file timestamps to determine if there are
out of date Java files. If you find some out of date Java files, then
recompile those Java files and recompute the dependencies using the
language specific tools. You can then use the available dependency
information (without parsing any additional Java files) to see if any
further Java files are affected by the just-recompiled Java files. If
so, continue the cascading rebuild. Stop the cascading rebuild when
you find yourself in a position where all of the just-built Java files
do not affect Java files not yet built during this build.
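
Here is a minimal sketch of that cascading rebuild with termination, written in C++ only because of the group we are in. It is not my actual tool. BuildState, the compile callback and the "interface fingerprint" string are all assumptions made for the example, and it compiles each file at most once per build, which glosses over files in a cycle that really need compiling together.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// State saved from the previous build (hypothetical layout).
struct BuildState {
    // file -> files that referred to it last time (its direct dependents)
    std::map<std::string, std::set<std::string>> dependents;
    // file -> fingerprint of its exported interface as of the last compile
    std::map<std::string, std::string> interface_of;
};

// Hypothetical callback: recompiles one file and returns the fingerprint of
// its exported interface (names and types of its fields and functions).
using Compile = std::function<std::string(const std::string&)>;

// Recompiles the changed files, then only as many dependents as needed.
// Returns the files that were compiled, in order.
std::vector<std::string> incremental_build(BuildState& state,
                                           const std::set<std::string>& changed,
                                           const Compile& compile) {
    assert(compile && "a compile callback must be supplied");
    std::deque<std::string> work(changed.begin(), changed.end());
    std::set<std::string> queued(changed);
    std::vector<std::string> compiled;
    while (!work.empty()) {
        const std::string file = work.front();
        work.pop_front();
        const std::string fresh = compile(file);
        compiled.push_back(file);
        std::string& saved = state.interface_of[file];
        if (fresh == saved) continue;        // body-only change: cascade stops here
        saved = fresh;
        for (const std::string& dep : state.dependents[file])
            if (queued.insert(dep).second) work.push_back(dep);
    }
    return compiled;
}
```

A body-only edit to A.java compiles just A.java. A signature change in A.java also compiles its direct dependents, and the cascade continues past a dependent only if that dependent's own exported interface changed too.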

You do not need to reparse all of the Java files on every build. You
just need to employ some logic which isn't make-style "file level
dependency graph cascading rebuild without termination".
 
Daniel Pitts

Because the point is not comparing C++ and Java build times.
It is comparing what Java build times /would be/ if it allowed
separation of implementation and interface files. Ie it is not
about Java vs C++ build times it's about Java vs Java and C++
vs C++ build times under two different code separation models:

a) all code in one file (Java and header only C++)

b) capability of separating some code into different files
(not supported by Java, supported by .hpp/.cpp C++).

This can actually be evaluated by simply making a change to
the /implementation/ of a Java function, measuring the compile
time of that one file and then measuring the compile time of
the incremental build which will require compiling many more
files, specifically the transitive closure of .java files that
import that function's class. (At least from what I understand
from some recent discussions.)


Apparently it is much more than that or I doubt Joshua would
have been discussing the Java build dependency problems in
another thread.

http://groups.google.com/group/comp.lang.c++/msg/116fa28be7ebde6f


It's not about Java vs C++ build times it's about Java vs Java
and C++ vs C++ build times under different code separation models.


Judging from Joshua's concerns it apparently has very real
practical implications. But, yeah, a Java fanboy compiling
online toy Fahrenheit-to-Celsius calculators won't grok the
concept.


Yes you are prevented from doing "all the job through interfaces"
by the fact that at some point you must import a concrete class
to instantiate if you are actually going to do any actual work.

Thus that file now depends on the /implementation/ of the concrete
class and any modification to the /implementation/ will trigger
unnecessary compilation of all files in the transitive closure of
the imports (noting that the Java import command is directed not
symmetric). At least that is my understanding from limited Java
experience and Joshua's comments. Am I wrong about that?
Actually, you have an incorrect/oversimplified understanding of Java
compilation dependencies.

The only thing that might cause a compile-time dependency is interface
change, or constant (static final primitive/String variable) value
change. You can safely modify/add methods in one Java file without
recompiling other Java files which depend on it. Removing members *may*
result in a link-time error, but also does not require a rebuild. Modifying
constants may result in inconsistencies between references to that
constant until a rebuild.

The Java keyword "import" is not analogous to the #include directive. It
tells the compiler that "References to the imported class needn't be
fully qualified." For example, import java.util.List lets you refer to
"List" rather than "java.util.List" in the rest of the file. Importing
or not doesn't really affect compile time (there are a few exceptions,
but they are trivial, and unrelated to the classes imported.)
Which is irrelevant to the point as I hope you now understand.
Sorry, late-comer to this thread, so I can't comment on your point ;-)
 
Joshua Maurice

Actually, you have an incorrect/oversimplified understanding of Java
compilation dependencies.

  The only thing that might cause a compile-time dependency is interface
change, or constant (static final primitive/String variable) value
change.  You can safely modify/add methods in one Java file without
recompiling other Java files which depend on it.  Removing members *may*
result in a link-time error, but also does not require a rebuild.  Modifying
constants may result in inconsistencies between references to that
constant until a rebuild.

The Java keyword "import" is not analogous to the #include directive. It
tells the compiler that "References to the imported class needn't be
fully qualified."  For example, import java.util.List lets you refer to
"List" rather than "java.util.List" in the rest of the file. Importing
or not doesn't really affect compile time (there are a few exceptions,
but they are trivial, and unrelated to the classes imported.)

Actually, no. Let me continue mucking up this thread. First, let's get
some definitions out of the way:

A build is the act of compiling (and linking, etc.) source files into
"executable" files. A build system is a process or system for doing a
build. It may be the English "call gcc", or it could be an automated
script, or a makefile, etc.

An incremental build is a special kind of build. Each developer has his
own local view of the source. Suppose he does a full clean build, then
makes a change to some of the source files. He can then build only
some of the source files, and by selectively building only certain
source files, he can end up in a state equivalent to what he would get
if he did another full clean build. This is an incremental build. An
incremental build is a build which works on top of an already existing
build, and which skips some (preferably most) of the build steps which
would produce equivalent output to the previous build.

Finally, I define a correct incremental build as an incremental build
which produces output equivalent to a full clean build, including any
build errors. A correct incremental build system is a build system
which does incremental builds, and which only does correct incremental
builds. This is what every developer wants. It doesn't matter if the
build is fast if the build also produces output not equivalent to a
full clean build.

So, in order to guarantee a correct incremental build: if a Java file has its
"exported interface" change (ex: the type of a function is changed),
then this could affect the compilation of another Java file which
"imports" it (either with an import statement, or through a fully
qualified name, or a name when it's in the same package, etc.). If you
make a change to Java file A which would cause the next compilation of
Java file B to fail, but you skip compiling Java file B, then you do
not have a correct incremental build. A full clean build would fail
(and execution may also fail).

So yes, a change inside a Java function body does not require
recompiling any other Java file, but for a correct incremental build
system, a change to a Java class's "exported interface" does require
recompiling all direct dependents.
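
As a rough sketch of that rule (not my actual tool; RebuildPlanner and its method names are invented for this example, and it assumes the file-level dependencies are already known somehow):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical planner: decides which files to recompile after one file changes.
class RebuildPlanner {
    // Maps each source file to the files it refers to (its direct dependencies).
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    void addDependency(String file, String dependency) {
        dependsOn.computeIfAbsent(file, k -> new TreeSet<>()).add(dependency);
    }

    // The changed file is always recompiled. Its direct dependents are added
    // only when the change touched the exported interface.
    Set<String> filesToRecompile(String changed, boolean exportedInterfaceChanged) {
        Set<String> result = new TreeSet<>();
        result.add(changed);
        if (exportedInterfaceChanged) {
            for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
                if (entry.getValue().contains(changed)) {
                    result.add(entry.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RebuildPlanner planner = new RebuildPlanner();
        planner.addDependency("B.java", "A.java"); // B uses A
        planner.addDependency("C.java", "B.java"); // C uses B, not A

        // Change inside a function body of A: only A is rebuilt.
        System.out.println(planner.filesToRecompile("A.java", false)); // prints [A.java]
        // Change to A's exported interface: A and its direct dependent B.
        System.out.println(planner.filesToRecompile("A.java", true));  // prints [A.java, B.java]
    }
}
```

Deciding whether the exported interface actually changed is the hard part, and this sketch just takes it as an input.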
 

Daniel Pitts

That is mostly correct. I would qualify it by replacing "exported interface"
with "used exported interface". If the portion of the exported
interface which changed has not been used, then there need not be any
cascading builds.

It seems to me that it wouldn't be too difficult at build-time to create
a dependency relationship among classes as part of the build. It may
reduce the speed of full builds, but that seems unlikely to matter as it
would be the less-common case.
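
A minimal sketch of what I mean by tracking only the "used" portion, assuming the build records which members each class refers to (MemberUsageIndex and the member-name strings are hypothetical, not an existing tool or API):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical index built during compilation: member -> classes that use it.
class MemberUsageIndex {
    private final Map<String, Set<String>> usersOfMember = new HashMap<>();

    // Record that userClass refers to the given member, e.g. "A.size()".
    void recordUse(String userClass, String member) {
        usersOfMember.computeIfAbsent(member, k -> new TreeSet<>()).add(userClass);
    }

    // Only classes that use one of the changed members need a cascading rebuild.
    Set<String> classesAffectedBy(Set<String> changedMembers) {
        Set<String> affected = new TreeSet<>();
        for (String member : changedMembers) {
            affected.addAll(usersOfMember.getOrDefault(member, Collections.<String>emptySet()));
        }
        return affected;
    }

    public static void main(String[] args) {
        MemberUsageIndex index = new MemberUsageIndex();
        index.recordUse("B", "A.size()");
        index.recordUse("C", "A.name()");

        // A.size() changed: B uses it, C does not.
        System.out.println(index.classesAffectedBy(Collections.singleton("A.size()"))); // prints [B]
        // A member nobody uses changed: nothing cascades.
        System.out.println(index.classesAffectedBy(Collections.singleton("A.unused()"))); // prints []
    }
}
```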

FWIW, I work on some relatively large projects, and the unit tests are
what take the most time of the build. Completely rebuilding the
projects (sans unit tests) takes under a minute each, for most projects.
Yes, I would be happier if they ran in under 10 seconds, but the time
isn't a major loss of productivity.
 
