Andreas said:
I'm dreaming of reliable, non-full rebuilds. They shouldn't
depend on developers following guidelines (which all have their
"accepted exceptions", anyway), and shouldn't even depend on
developers contributing correct code (but detect all errors,
even those caused only in dependent classes.)
I'm viewing this problem from a CM point of view (although I'm
more of a developer than a CM person). Any change that gets into the
repository might have been checked in by a monkey or by a senior expert.
The build process shouldn't care. In the end it should say either "yes, the
project has been built" or "it could not be built due to these
errors", and it should do this with minimal use of processor
resources (on whatever machine it runs, be it a developer's
workstation or a dedicated compile server).
I'm aware that such a build tool chain either does not exist now,
or at least is not known to anyone participating in this thread.
I'd like to discuss how this could be done. First, what is
possible in principle - where are the theoretical limits?
Would dependency management necessarily be more expensive than
the unconditional full compile?
I've thought about this a bit, though not to the point of creating a design,
much less building prototypes. It seems to me that this approach is worth
investigating:
1. The interface of each class C in the system needs to be captured and
stored persistently, where "interface" means method signatures, field
definitions, and constant values. Superclass name too, to cover the changes
that can occur if what C inherits changes.
2. Dependencies also need to be captured and stored persistently. This will
be information of the form:
. Class D depends on (some feature of) the interface of class C
3. Whenever a class is compiled successfully:
A. Its new interface is constructed.
B. Changes to the previous interface are computed, and all classes dependent
upon something that changed are marked invalid (except for other classes
compiled by the same invocation of javac, of course -- they're up to date
and will be marked valid).
C. Its new dependencies are calculated.
D. Its stored interface and dependencies are updated.
E. It is marked valid.
4. Whenever a group of Java files is to be rebuilt, e.g. by Ant's <javac>
task, any classes marked invalid in step 3 are recompiled, in addition to
classes that are not up-to-date with their source files.
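To make steps 3 and 4 concrete, here is a minimal sketch, not a design. It assumes Java 16 or later (for records), keeps everything in memory where the real thing would need persistent storage, and works at the coarsest granularity: any interface change invalidates every dependent. All of the names (ClassInterface, BuildState, recordCompile, toRebuild) are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Step 1: the stored interface of one class (hypothetical representation). */
record ClassInterface(String superclass, Set<String> members) {}

/** In-memory stand-in for the persistent store of steps 1 and 2. */
final class BuildState {
    private final Map<String, ClassInterface> interfaces = new HashMap<>();
    // Reverse index: class C -> classes that depend on C.
    private final Map<String, Set<String>> dependents = new HashMap<>();
    // Forward index, kept so that stale reverse entries can be removed.
    private final Map<String, Set<String>> dependencies = new HashMap<>();
    private final Set<String> invalid = new HashSet<>();

    /**
     * Step 3, applied to all classes compiled by one compiler invocation.
     * dependsOn maps each compiled class to the classes it uses.
     */
    void recordCompile(Map<String, ClassInterface> compiled,
                       Map<String, Set<String>> dependsOn) {
        for (var entry : compiled.entrySet()) {
            String name = entry.getKey();
            ClassInterface fresh = entry.getValue();            // 3A
            ClassInterface old = interfaces.get(name);
            if (old != null && !old.equals(fresh)) {            // 3B
                for (String d : dependents.getOrDefault(name, Set.of())) {
                    // Classes compiled in the same invocation are up to date.
                    if (!compiled.containsKey(d)) invalid.add(d);
                }
            }
            // 3C and 3D: replace the stored dependencies and interface.
            for (String target : dependencies.getOrDefault(name, Set.of())) {
                dependents.get(target).remove(name);
            }
            Set<String> deps = Set.copyOf(dependsOn.getOrDefault(name, Set.of()));
            dependencies.put(name, deps);
            for (String target : deps) {
                dependents.computeIfAbsent(target, k -> new HashSet<>()).add(name);
            }
            interfaces.put(name, fresh);
        }
        invalid.removeAll(compiled.keySet());                   // 3E
    }

    /** Step 4: classes with stale sources plus classes marked invalid. */
    Set<String> toRebuild(Set<String> staleSources) {
        Set<String> result = new TreeSet<>(staleSources);
        result.addAll(invalid);
        return result;
    }
}
```

If D depends on C and C is later recompiled alone with a changed member, toRebuild reports D even though D's source file has not been touched.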
Notes:
"Class" is used a bit ambiguously above, sometimes to mean a .class file,
sometimes to mean all of the .class files built from a single source file.
Clearly if A$Inner depends upon B, it's A that's marked invalid when B
changes.
The obvious outstanding problems with the above are:
A. How granular should the dependency information be? If it's simply "A
depends on B", there will be a lot of unnecessary recompilations. If it's
"A depends separately on the following 20 method calls it makes to B", there
will be a vast amount of dependency information stored, updated, and
checked. I have no intuition for where the sweet spot lies.
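The two extremes can be compared with a small sketch. It assumes each dependency is recorded as a (user, target, member) triple; the names Use, coarse, and fine are hypothetical, and the code says nothing about where the sweet spot is, only what each end of the range costs.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** One recorded use: class `user` relies on `member` of class `target`. */
record Use(String user, String target, String member) {}

final class Granularity {
    /** Coarse ("A depends on B"): any change to target invalidates every user. */
    static Set<String> coarse(List<Use> uses, String target) {
        Set<String> out = new TreeSet<>();
        for (Use u : uses) {
            if (u.target().equals(target)) out.add(u.user());
        }
        return out;
    }

    /** Fine: only users of a member that actually changed are invalidated. */
    static Set<String> fine(List<Use> uses, String target, Set<String> changedMembers) {
        Set<String> out = new TreeSet<>();
        for (Use u : uses) {
            if (u.target().equals(target) && changedMembers.contains(u.member())) {
                out.add(u.user());
            }
        }
        return out;
    }
}
```

With uses of B.size() from A and Y, and of B.name() from X, a change to size() invalidates A, X, and Y under the coarse scheme but only A and Y under the fine one. The price is one stored record per member used rather than one per class pair.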
B. How to generate the dependency information. I presume it can be
calculated from .class file analysis, but I haven't verified this in detail,
nor do I know how expensive that would be. It would be awfully nice if
javac would generate it for us, but it doesn't.
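As a rough feasibility check, the class-level references can be read out of a .class file's constant pool with nothing but the standard library. The sketch below is an assumption about how such analysis might start, not a verified solution: ClassRefScanner is a made-up name, it stops after the constant pool, and it returns only the CONSTANT_Class entries (array types show up as descriptors such as [Ljava.lang.String;). One known gap: javac inlines compile-time constants, so a use of another class's static final constant leaves no reference in the .class file, and that dependency cannot be recovered this way.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

final class ClassRefScanner {
    /** Returns the names of all classes referenced from the constant pool. */
    static Set<String> referencedClasses(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readUnsignedShort();                  // minor version
            in.readUnsignedShort();                  // major version
            int count = in.readUnsignedShort();      // constant_pool_count
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {        // entries are numbered from 1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                     // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break; // Class
                    case 5: case 6: in.readLong(); i++; break; // Long/Double use two slots
                    case 3: case 4: case 9: case 10: case 11:
                    case 12: case 17: case 18: in.readInt(); break;   // 4-byte payloads
                    case 8: case 16: case 19: case 20:
                        in.readUnsignedShort(); break;                // 2-byte payloads
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int idx : classNameIndex) {
                if (idx != 0) names.add(utf8[idx].replace('/', '.'));
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The bytes would typically come from something like Files.readAllBytes on a compiled class. The result includes the class itself and its superclass, and it is a single linear pass over the front of the file, so the cost per class looks modest. Member-level dependencies would need the Fieldref and Methodref entries as well.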
C. How to represent the dependency "C calls a method on an instance of D
that's inherited from E". I think all of this information is required: if
the method definition changes, it will change in E, but we also need to mark
C invalid if D is reparented.
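One possible representation, sketched under the assumption that each use records both the static receiver type and the class that actually declares the member. InheritedUse and the two query methods are hypothetical names. The sketch only handles the receiver itself being reparented; a fuller version would have to record every class on the chain between D and E.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** `user` accesses `member` through type `receiver`; it is declared in `declaringClass`. */
record InheritedUse(String user, String receiver, String declaringClass, String member) {}

final class InheritedUses {
    /** Users to invalidate when `member` changes in the class that declares it. */
    static Set<String> onMemberChange(List<InheritedUse> uses, String declaringClass, String member) {
        Set<String> out = new TreeSet<>();
        for (InheritedUse u : uses) {
            if (u.declaringClass().equals(declaringClass) && u.member().equals(member)) {
                out.add(u.user());
            }
        }
        return out;
    }

    /** Users to invalidate when `reparented` gets a new superclass. */
    static Set<String> onReparent(List<InheritedUse> uses, String reparented) {
        Set<String> out = new TreeSet<>();
        for (InheritedUse u : uses) {
            // Only members the receiver inherits can be affected by its new parent.
            if (u.receiver().equals(reparented) && !u.declaringClass().equals(reparented)) {
                out.add(u.user());
            }
        }
        return out;
    }
}
```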
Inheritance adds some wrinkles. If Sub overrides a method it inherits from
Super, that doesn't really change its interface. Classes which previously
called Sub.meth() don't have to be recompiled. On the other hand, if Sub
defines a field that hides a field defined in Super, classes that accessed
Super.field through a reference of type Sub should be marked invalid.
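The asymmetry comes from how Java resolves the two kinds of member: methods are dispatched on the runtime class, while fields are selected by the static type at compile time. A small illustration (Base, Derived, and HidingDemo are made-up names):

```java
class Base {
    String name = "base";
    String describe() { return "base"; }
}

class Derived extends Base {
    String name = "derived";                           // hides Base.name
    @Override String describe() { return "derived"; }  // overrides Base.describe
}

final class HidingDemo {
    /** Field access uses the static type Base, so the hidden field is read. */
    static String fieldViaBase() {
        Base b = new Derived();
        return b.name;
    }

    /** Method calls dispatch on the runtime class, so the override runs. */
    static String methodViaBase() {
        Base b = new Derived();
        return b.describe();
    }

    /** Through the static type Derived, the hiding field is selected. */
    static String fieldViaDerived() {
        return new Derived().name;
    }
}
```

Here fieldViaBase() returns "base", methodViaBase() returns "derived", and fieldViaDerived() returns "derived". Before Derived declared its own name field, the last access would have been compiled against Base.name, possibly with a different type, which is why such callers need recompiling.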
Overloads add some more wrinkles. If a new overload is added, methods that
called the existing overloads and might now call the new one need to be
recompiled. For practical purposes, it should be fine to recompile callers
of any of the previous overloads, even if they're wholly disjoint.
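A small illustration of both halves of that paragraph. Before and After stand in for two versions of the same class, and Call and OverloadRule are hypothetical names for the conservative rule of keying invalidation on the method name alone.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** The class before the change: a single method f. */
final class Before {
    static String f(Object o) { return "f(Object)"; }
}

/** The same class after a more specific overload has been added. */
final class After {
    static String f(Object o) { return "f(Object)"; }
    static String f(String s) { return "f(String)"; }  // newly added overload
}

/** One recorded call site (hypothetical representation). */
record Call(String caller, String target, String method, String descriptor) {}

final class OverloadRule {
    /** Conservative rule: recompile every caller of any overload of `method`. */
    static Set<String> callersToRecompile(List<Call> calls, String target, String method) {
        Set<String> out = new TreeSet<>();
        for (Call c : calls) {
            // The descriptor is deliberately ignored.
            if (c.target().equals(target) && c.method().equals(method)) {
                out.add(c.caller());
            }
        }
        return out;
    }
}
```

Before.f("x") returns "f(Object)" while After.f("x") returns "f(String)", so a caller compiled against the old version keeps calling f(Object) until it is recompiled. Ignoring the descriptor recompiles a few callers unnecessarily but avoids redoing overload resolution inside the build tool.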
When a class is reparented, it probably makes more sense to mark all of its
dependents invalid, rather than to try to calculate exactly what changed.
Note that changing the interfaces an abstract class implements is a kind of
reparenting, since it can change the set of methods that the class defines.
I'm sure there are many more of these which further analysis would reveal.
One more note: this is an ideal open source project, since it could be
greatly useful to the development community and there is no money to be made
by solving it.