dependency-detection in java - Take 2


Andreas Leitgeb

Andreas Leitgeb said:
Say, I've got two classes A and B, one of which (A)
contains "static final" fields (SFFs), the other (B)
references these fields. ...

Obviously, I utterly failed to describe my problem before.

I've got a project tree full of classes (who hasn't?),
some of these contain SFFs, some use SFFs.

Now, someone else on the team checks in com.mycompany.fubar.SomeClass,
and everyone else on the project has it show up as new in
his working space and rebuilds it. (The rebuild might
even be central and automatic.) It's a waste of resources
to recompile the whole tree each time a single .java file is
checked in, so we (of course!) use ant.

And now, suppose this SomeClass that has been checked in
contains new values for existing SFFs, which are used elsewhere.
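The following small experiment shows why this bites. It is a sketch that assumes a JDK rather than a bare JRE, so that `ToolProvider.getSystemJavaCompiler()` returns a compiler. The class names `Consts` and `User` are made up for the demo.

The program compiles both classes together. It then recompiles only the class holding the constant, which is what a timestamp-driven incremental build does. `Consts.VALUE` is a compile-time constant, so javac copies its value into `User.class`, and `User` keeps seeing the old value.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleConstantDemo {

    // Writes the hypothetical class Consts, which holds one SFF.
    static Path writeConsts(Path srcDir, int value) throws Exception {
        Path file = srcDir.resolve("Consts.java");
        Files.writeString(file,
            "public class Consts { public static final int VALUE = " + value + "; }");
        return file;
    }

    // Runs javac with "-d outDir" on exactly the given source files.
    static void compile(JavaCompiler javac, Path outDir, Path... sources) {
        String[] args = new String[sources.length + 2];
        args[0] = "-d";
        args[1] = outDir.toString();
        for (int i = 0; i < sources.length; i++) {
            args[i + 2] = sources[i].toString();
        }
        if (javac.run(null, null, null, args) != 0) {
            throw new IllegalStateException("javac failed");
        }
    }

    /** Returns {value User sees, value Consts now holds}. */
    public static int[] run() throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("needs a JDK, not just a JRE");
        }
        Path src = Files.createTempDirectory("src");
        Path out = Files.createTempDirectory("classes");

        Path consts = writeConsts(src, 1);
        Path user = src.resolve("User.java");
        Files.writeString(user,
            "public class User { public static int get() { return Consts.VALUE; } }");

        compile(javac, out, consts, user);   // full build: both files in one javac run
        writeConsts(src, 2);                  // someone checks in a new constant value
        compile(javac, out, consts);          // incremental build: only the changed file

        // Parent loader null: only bootstrap classes and the temp output dir are visible.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { out.toUri().toURL() }, null)) {
            int seen = (Integer) loader.loadClass("User").getMethod("get").invoke(null);
            int actual = loader.loadClass("Consts").getField("VALUE").getInt(null);
            return new int[] { seen, actual };
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = run();
        System.out.println("User sees " + r[0] + ", Consts holds " + r[1]); // User sees 1, Consts holds 2
    }
}
```

Nothing fails here. No compile error and no runtime error occur; the wrong value is silently used.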

Now what?
The choices seem to be these:
* Always recompile everything.
* Force each developer to find all dependent files, and also
check in those with just a whitespace change, so they will
be picked up for compilation. This leaves the job of finding
dependencies to each developer...
* Force the developer to set some repository flag when he changes
a constant, and have the build process read that flag and
trigger a full compile... still error-prone, since the
developer might forget to set the flag.
New features (these would require changes to javac, ant, or new tools):
* Have javac create dependency information for each file it
compiles: a list of all referenced classes. Then ant could
consult this list and automatically add dependent files to
the list passed to javac, even if they weren't changed themselves.
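The last option could work roughly as in this sketch. It assumes that some tool has already produced, for each class, the set of classes it references; javac writes no such list today. The map format and the name `DependencyClosure` are hypothetical.

The ant side inverts the map. It then adds every class that references a changed class, directly or indirectly, to the set of files passed to javac.

```java
import java.util.*;

public class DependencyClosure {

    /**
     * @param references class -> classes it references (hypothetical dependency output)
     * @param changed    classes whose sources changed since the last build
     * @return changed classes plus everything depending on them, directly or indirectly
     */
    public static Set<String> toRecompile(Map<String, Set<String>> references, Set<String> changed) {
        // Invert "A references B" into "B is referenced by A".
        Map<String, Set<String>> referencedBy = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : references.entrySet()) {
            for (String target : e.getValue()) {
                referencedBy.computeIfAbsent(target, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // Worklist walk over the reverse edges. Following them transitively is
        // conservative: it also catches a constant in B computed from a constant in A.
        Set<String> result = new TreeSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            for (String dependent : referencedBy.getOrDefault(work.pop(), Set.of())) {
                if (result.add(dependent)) {
                    work.push(dependent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = Map.of(
            "B", Set.of("SomeClass"),   // B uses an SFF from SomeClass
            "C", Set.of("B"),           // C uses B
            "D", Set.of("Unrelated"));  // D is unaffected
        System.out.println(toRecompile(refs, Set.of("SomeClass"))); // [B, C, SomeClass]
    }
}
```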

Please don't waste your time telling me that your SFFs
never change. You're just lying to yourself, and your build process
is unreliable (or perhaps your project is small enough that
a full compile doesn't hurt, or you "just know" when to do
a full compile and when not).
This isn't only about SFFs, but also about classes changing
their interface incompatibly (this could be a typo by the developer,
but without compiling the dependents, it may go unnoticed for
a while!)
 

Michael Jung

Andreas Leitgeb said:
I've got a project tree full of classes (who hasn't?),
some of these contain SFFs, some use SFFs.
Now, someone else on the team checks in com.mycompany.fubar.SomeClass,
and everyone else on the project has it show up as new in
his working space and rebuilds it. (The rebuild might
even be central and automatic.) It's a waste of resources
to recompile the whole tree each time a single .java file is
checked in, so we (of course!) use ant.
And now, suppose this SomeClass that has been checked in
contains new values for existing SFFs, which are used elsewhere.
Please don't waste your time telling me that your SFFs
never change. You're just lying to yourself, and your build process
is unreliable (or perhaps your project is small enough that
a full compile doesn't hurt, or you "just know" when to do
a full compile and when not).
This isn't only about SFFs, but also about classes changing
their interface incompatibly (this could be a typo by the developer,
but without compiling the dependents, it may go unnoticed for
a while!)

SFFs should only be used locally, say within a package, so that when you change
an SFF, it is not a big hassle to rebuild the package. The problem arises
when you have that constant spread throughout a bigger project. It is, in
effect, an "uncontracted" interface or a global variable, defying all
encapsulation approaches, and should be avoided.

Sometimes it can't be avoided fully. IDL files with generated constants
are a good example; other external sources you may not change are others.
You can try to create facades, i.e. hide the generated code but provide your
own interface to it, forcing all users of the IDL and its constants to use a
getter instead.
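A facade of this kind can be very small. In this sketch, `GeneratedIdl` stands in for hypothetical generated code and `maxRetries()` is the hand-written getter.

Callers bind to the method, not to the value. A new constant value therefore only requires recompiling the facade, which inlines the constant itself.

```java
public class ConstantFacade {

    // Stand-in for generated IDL code (hypothetical): a public compile-time
    // constant that javac copies into every class that reads it directly.
    static final class GeneratedIdl {
        public static final int MAX_RETRIES = 3;
    }

    // Hand-written facade: callers compile against this method, so they pick up
    // a changed MAX_RETRIES at runtime once this class has been recompiled.
    public static int maxRetries() {
        return GeneratedIdl.MAX_RETRIES;
    }

    public static void main(String[] args) {
        System.out.println(maxRetries()); // 3
    }
}
```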

Even if you get ant or some other semantic analyser to solve that problem for
you, you may still be stuck with the runtime problem, when someone in a
distributed environment compiled against an old constant.

Michael
 

Roedy Green

Now what?
The choices seem to be these:

I think a solution might work like this:

You have a hardware "compile server". Its job is, when probed, to
fetch the latest source and recompile it. You might use a tool like
the Replicator to distribute the class files from the latest successful
compilation to everyone. That saves clients the work of doing a huge build
on a machine that may be too small to compile efficiently.

See http://mindprod.com/webstart/replicator.html

If you change a static final, the safest route is a clean recompile of
everything. It would be the duty of the person checking in such code
to warn the compile server, perhaps with a comment in the embedded
checkin log.

A little Java program would run on the compile server all the time;
it would spawn ant tasks and field requests to recompile.

One of its main functions is to maintain a coherent set of class files
from the last successful compile, to distribute to anyone who requests
them, in case some idiot checks in code that blows the build out of the
water.
 

Andreas Leitgeb

SFFs should only be used locally,

I'm dreaming of reliable, non-full rebuilds. They shouldn't
depend on developers following guidelines (which all have their
"accepted exceptions", anyway), and shouldn't even depend on
developers contributing correct code (but detect all errors,
even those caused only in dependent classes.)

I'm viewing this problem from a CM point of view (although I'm
more of a developer than a CM person). Any change that gets into the repository
might have been checked in by a monkey, as well as by a senior expert.
The build process shouldn't care. It should in the end say: "yes, the
project has been built", or "it could not be built due to these errors",
and furthermore the build process should do this with minimal use of
processor resources (on whatever machine it is applied, be it a
developer's workstation or a dedicated compile server).

I'm aware that such a build tool chain either doesn't exist now,
or at least isn't known to anyone participating in this thread.

I'd like to discuss how this could be done. First, what is
possible in principle, and where are the theoretical limits?
Would dependency management necessarily be more expensive than
the unconditional full compile?
Even if you get ant or some other semantic analyser to solve that problem for
you, you may still be stuck with the runtime problem, when someone in a
distributed environment compiled against an old constant.

The goal of this discussion is a build process (but not the full one!)
that yields the same result regardless of which of the java files were most
recently changed. So, in the end, (almost) every developer would use
that build process (just as almost everyone already uses ant now), and
given that they've checked out the same version, they'd get the same
jar file (except for the files' meta-information, like timestamps).
 

Andreas Leitgeb

Roedy Green said:
I think a solution might work like this:
You have a hardware "compile server".
That saves clients the work of doing a huge build
on perhaps a machine too tiny to compile efficiently.

I think it doesn't matter whether one wastes cycles on a full
build on each developer's PC or on a central server.
I want to discuss the effort saved by a new (yet to
be developed) build process that respects reverse
dependencies, making an incremental build as reliably
"correct" as a full rebuild.

Please see my reply to Michael Jung.
It seems to take me a few iterations to get my point
clear (not only to others, but also to myself).
 

Mike Schilling

Andreas said:
I'm dreaming of reliable, non-full rebuilds. They shouldn't
depend on developers following guidelines (which all have their
"accepted exceptions", anyway), and shouldn't even depend on
developers contributing correct code (but detect all errors,
even those caused only in dependent classes.)

I'm viewing this problem from a CM point of view (although I'm
more of a developer than a CM person). Any change that gets into the repository
might have been checked in by a monkey, as well as by a senior expert.
The build process shouldn't care. It should in the end say: "yes, the
project has been built", or "it could not be built due to these
errors", and furthermore the build process should do this with minimal
use of processor resources (on whatever machine it is applied, be it a
developer's workstation or a dedicated compile server).

I'm aware that such a build tool chain either doesn't exist now,
or at least isn't known to anyone participating in this thread.

I'd like to discuss how this could be done. First, what is
possible in principle, and where are the theoretical limits?
Would dependency management necessarily be more expensive than
the unconditional full compile?

I've thought about this a bit, though not to the point of creating a design,
much less building prototypes. It seems to me that this approach is worth
investigating:

1. The interface of each class C in the system needs to be captured and
stored persistently, where "interface" means method signatures, field
definitions, and constant values. Superclass name too, to cover the changes
that can occur if what C inherits changes.

2. Dependencies also need to be captured and stored persistently. This will
be information of the form:

. Class D depends on (some feature of) the interface of class C

3. Whenever a class is compiled successfully:

A. Its new interface is constructed.
B. Changes to the previous interface are computed, and all classes dependent
upon something that changed are marked invalid (except for other classes
compiled by the same invocation of javac, of course -- they're up to date
and will be marked valid).
C. Its new dependencies are calculated.
D. Its stored interface and dependencies are updated.
E. It is marked valid.

4. Whenever a group of Java files is to be rebuilt, e.g. by Ant's <javac>
task, any classes marked invalid in step 3 are recompiled, in addition to
classes that are not up-to-date with their source files.
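Steps 1 to 4 can be sketched as a small bookkeeping class. Several assumptions apply:

- Extracting interfaces and dependencies is done elsewhere, for example by .class file analysis.
- Dependencies are recorded per member. This is just one of the granularities discussed under problem A below.
- The overload and reparenting cases in the notes below are not handled.

The name `InterfaceTracker` and the key and descriptor strings are hypothetical.

```java
import java.util.*;

public class InterfaceTracker {

    // Step 1: class -> (member key -> descriptor), e.g. "VALUE" -> "I=1".
    private final Map<String, Map<String, String>> interfaces = new HashMap<>();
    // Step 2: class D -> (class C -> member keys of C that D uses).
    private final Map<String, Map<String, Set<String>>> dependencies = new HashMap<>();
    private final Set<String> invalid = new TreeSet<>();

    /** Step 3: record the results of one successful javac run. */
    public void compiled(Map<String, Map<String, String>> newInterfaces,
                         Map<String, Map<String, Set<String>>> newDependencies) {
        for (Map.Entry<String, Map<String, String>> e : newInterfaces.entrySet()) {
            String cls = e.getKey();
            Map<String, String> before = interfaces.getOrDefault(cls, Map.of());
            Map<String, String> after = e.getValue();
            // 3A/3B: members that were added, removed, or changed.
            Set<String> changed = new HashSet<>();
            for (String key : union(before.keySet(), after.keySet())) {
                if (!Objects.equals(before.get(key), after.get(key))) {
                    changed.add(key);
                }
            }
            for (Map.Entry<String, Map<String, Set<String>>> d : dependencies.entrySet()) {
                Set<String> used = d.getValue().getOrDefault(cls, Set.of());
                if (!Collections.disjoint(used, changed)) {
                    invalid.add(d.getKey());
                }
            }
            interfaces.put(cls, after);                 // 3D: store the new interface
        }
        dependencies.putAll(newDependencies);           // 3C/3D: store new dependencies
        invalid.removeAll(newInterfaces.keySet());      // 3E: this run's classes are valid
    }

    /** Step 4: invalid classes plus those whose sources are newer than their class files. */
    public Set<String> toRecompile(Set<String> outOfDate) {
        Set<String> result = new TreeSet<>(invalid);
        result.addAll(outOfDate);
        return result;
    }

    private static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> u = new HashSet<>(a);
        u.addAll(b);
        return u;
    }

    public static void main(String[] args) {
        InterfaceTracker t = new InterfaceTracker();
        t.compiled(
            Map.of("C", Map.of("VALUE", "I=1", "OTHER", "I=7"), "D", Map.of(), "E", Map.of()),
            Map.of("D", Map.of("C", Set.of("VALUE")), "E", Map.of("C", Set.of("OTHER"))));
        // C alone is recompiled with a new value for VALUE.
        t.compiled(Map.of("C", Map.of("VALUE", "I=2", "OTHER", "I=7")), Map.of());
        System.out.println(t.toRecompile(Set.of())); // [D]
    }
}
```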

Notes:
"Class" is used a bit ambiguously above, sometimes to mean a .class file,
sometimes to mean all of the .class files built from a single source file.
Clearly if A$Inner depends upon B, it's A that's marked invalid when B
changes.

The obvious outstanding problems with the above are:

A. How granular should the dependency information be? If it's simply "A
depends on B", there will be a lot of unnecessary recompilations. If it's
"A depends separately on the following 20 method calls it makes to B", there
will be a vast amount of dependency information stored, updated, and
checked. I have no intuition for where the sweet spot lies.

B. How to generate the dependency information. I presume it can be
calculated from .class file analysis, but I haven't verified this in detail,
nor do I know how expensive that would be. It would be awfully nice if
javac would generate it for us, but it doesn't.

C. How to represent the dependency "C calls a method on an instance of D
that's inherited from E". I think all of this information is required: if
the method definition changes, it will change in E, but we also need to mark
C invalid if D is reparented.

Inheritance adds some wrinkles. If Sub overrides a method it inherits from
Super, that doesn't really change its interface. Classes which previously
called Sub.meth() don't have to be recompiled. On the other hand, if Sub
defines a field that hides a field defined in Super, classes that accessed
Super.field should be marked invalid.

Overloads add some more wrinkles. If a new overload is added, methods that
called the existing overloads and might now call the new one need to be
recompiled. For practical purposes, it should be fine to recompile callers
of any of the previous overloads, even if they're wholly disjoint.

When a class is reparented, it probably makes more sense to mark all of its
dependents invalid, rather than to try to calculate exactly what changed.
Note that changing the interfaces an abstract class implements is a kind of
reparenting, since it can change the set of methods that the class defines.

I'm sure there are many more of these which further analysis would reveal.
One more note: this is an ideal open source project, since it could be
greatly useful to the development community and there is no money to be made
by solving it.
 

Mike Schilling

Andreas said:
I think it doesn't matter whether one wastes cycles on a full
build on each developer's PC or on a central server.
I want to discuss the effort saved by a new (yet to
be developed) build process that respects reverse
dependencies, making an incremental build as reliably
"correct" as a full rebuild.

One more point: a full rebuild will remove class files generated from source
files that have been deleted. Very few build systems do this for
incremental builds, even though it's not particularly difficult.
 

Michael Jung

The goal of this discussion is a build process (but not the full one!)
that yields the same result regardless of which of the java files were most
recently changed. So, in the end, (almost) every developer would use
that build process (just as almost everyone already uses ant now), and
given that they've checked out the same version, they'd get the same
jar file (except for the files' meta-information, like timestamps).

Is this what you want: every time a file A changes, all dependent files (B)
should be recompiled automatically? (You can't mean just that A itself gets
recompiled: when each developer compiles his own B, javac already notices that
A's class file is older than A.java and recompiles it.)

What would the end result of such an automatic build be if it yields lots
of errors? (Because the dependent .java files don't compile anymore, since
incompatible changes had to be introduced.)

Michael
 

Andreas Leitgeb

One more point: a full rebuild will remove class files generated from source
files that have been deleted. Very few build systems do this for
incremental builds, even though it's not particularly difficult.

Thanks for pointing that out! It's indeed a fundamental limitation
of incremental builds. Even if we don't delete a whole .java file,
we might have rearranged the code such that some helper class (with
e.g. $0 appended) is no longer generated, or we might have removed
non-public additional classes.

I'll have to think through it further...
(maybe what I wanted is actually impossible :-( )
 

Mike Schilling

Andreas said:
Thanks for pointing that out! It's indeed a fundamental limitation
of incremental builds. Even if we don't delete a whole .java file,
we might have rearranged the code such that some helper class (with
e.g. $0 appended) is no longer generated, or we might have removed
non-public additional classes.

The thing is, removing these files is easy, because it's simple to correlate
class files with the source file they came from; it's just that most build
systems don't bother. It's as simple as this:

Before the compilation step, find the .java file corresponding to each
.class file. If

A. There isn't one, or
B. It's in the set of files to be recompiled

then remove the .class file.
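This rule translates almost directly into code. The sketch works on relative paths with '/' separators. It maps `p/A$Inner.class` back to `p/A.java` by cutting at the first '$'.

Both are assumptions. The mapping ignores '$' characters inside ordinary class names, and it ignores non-public top-level classes declared in a differently named file. The name `OrphanClassFiles` is hypothetical.

```java
import java.util.*;

public class OrphanClassFiles {

    /** Maps "p/A$Inner.class" to "p/A.java", assuming '$' only marks nested classes. */
    static String sourceFor(String classFile) {
        String base = classFile.substring(0, classFile.length() - ".class".length());
        int slash = base.lastIndexOf('/');
        int dollar = base.indexOf('$', slash + 1);
        return (dollar >= 0 ? base.substring(0, dollar) : base) + ".java";
    }

    /** Class files to delete before compiling: source gone (A) or being recompiled (B). */
    public static Set<String> toDelete(Set<String> classFiles, Set<String> sources,
                                       Set<String> toRecompile) {
        Set<String> result = new TreeSet<>();
        for (String cls : classFiles) {
            String src = sourceFor(cls);
            if (!sources.contains(src) || toRecompile.contains(src)) {
                result.add(cls);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> classes = Set.of("p/A.class", "p/A$1.class", "p/B.class", "p/Gone.class");
        Set<String> sources = Set.of("p/A.java", "p/B.java");
        System.out.println(toDelete(classes, sources, Set.of("p/A.java")));
        // [p/A$1.class, p/A.class, p/Gone.class]
    }
}
```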
 

Mark Thornton

Mike said:
The thing is, removing these files is easy, because it's simple to correlate
class files with the source file they came from; it's just that most build
systems don't bother. It's as simple as this:

Before the compilation step, find the .java file corresponding to each
.class file. If

A. There isn't one, or
B. It's in the set of files to be recompiled

then remove the .class file.

This task is actually a bit harder than you suggest. It is legal (if bad
practice) to include top level non public classes (i.e. package level)
in .java files which do not match the name of the class.


Mark Thornton
 

Mike Schilling

Mark said:
This task is actually a bit harder than you suggest. It is legal (if
bad practice) to include top level non public classes (i.e. package
level) in .java files which do not match the name of the class.

You're right; I was discounting that because I've never heard of anyone
doing it. OK, let's add a flag to the build system to turn off the
behavior described above, which 97% of its users will never have to worry
about setting.
 

Andreas Leitgeb

Andreas Leitgeb said:
I'd like to discuss how this could be done. First, what is
possible in principle, and where are the theoretical limits?

Unfortunately, I have to accept that my ultimate goal is unreachable:
an incremental build that works with just about anything checked in,
even if it's invalid Java code in some subtle way (in which case it
would detect and report the failure).

This won't work. There are situations where, in full generality,
nothing but a full compile can detect the error, because
any two .java files (A.java and B.java) could each contain a second
(non-public) class with the same name (C). Only compiling A.java
and B.java in the same javac run will detect this problem.
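A build tool can only catch such a clash if it keeps an index of the top-level classes declared in every source file. That whole-tree view is exactly what a javac run over just the changed files lacks.

The sketch below assumes such an index exists, for example from a cheap declaration-only scan. Both the index and the name `DuplicateClassCheck` are hypothetical.

```java
import java.util.*;

public class DuplicateClassCheck {

    /**
     * @param declared source file -> fully qualified top-level classes it declares
     *                 (hypothetical input from a declaration-only scan of every source)
     * @return class name -> source files declaring it, for classes declared more than once
     */
    public static Map<String, List<String>> duplicates(Map<String, List<String>> declared) {
        Map<String, List<String>> byClass = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : declared.entrySet()) {
            for (String cls : e.getValue()) {
                byClass.computeIfAbsent(cls, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        byClass.values().removeIf(files -> files.size() < 2);   // keep only clashes
        byClass.values().forEach(files -> Collections.sort(files));
        return byClass;
    }

    public static void main(String[] args) {
        Map<String, List<String>> declared = Map.of(
            "p/A.java", List.of("p.A", "p.C"),   // A.java sneaks in a non-public class C
            "p/B.java", List.of("p.B", "p.C"));  // ...and so does B.java
        System.out.println(duplicates(declared)); // {p.C=[p/A.java, p/B.java]}
    }
}
```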

Also, as Mike pointed out, deleting .java files is another thing
that an incremental build cannot "correctly" deal with: it can't
remove orphaned .class files.

I'm not giving up yet.
Even if I won't find *all* possible errors incrementally, I can still
find the most typical errors that way. Even interface changes that do not
cause compile errors (which includes changes of constants) could
still be propagated properly to all dependent classes.

If javac were to add an attribute to the .class-file (containing "source"
and perhaps even the value) for each inlined foreign constant,
then third party tools (like e.g. ant) would have the necessary data
available to do dependency management. It would still be not at all
trivial for the third party tool, but quite likely trivial on javac's
side.
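If such an attribute existed, the third-party tool's side might look like this sketch. Each record holds the class that inlined a constant, the constant as "Owner.FIELD", and the value that was copied. This record format is hypothetical, since javac emits nothing of the kind.

The current values would come from the freshly compiled owners. Without recorded values, a tool would have to treat any recompile of the owner as a change.

```java
import java.util.*;

public class InlinedConstantCheck {

    /** One hypothetical "inlined foreign constant" entry from a user's class file. */
    static final class Use {
        final String user;          // class that inlined the constant
        final String constant;      // "Owner.FIELD"
        final Object inlinedValue;  // value javac copied into the user

        Use(String user, String constant, Object inlinedValue) {
            this.user = user;
            this.constant = constant;
            this.inlinedValue = inlinedValue;
        }
    }

    /** Classes whose inlined copy no longer matches the constant's current value. */
    public static Set<String> stale(List<Use> uses, Map<String, Object> currentValues) {
        Set<String> result = new TreeSet<>();
        for (Use u : uses) {
            // A constant that disappeared (lookup gives null) also counts as changed.
            if (!Objects.equals(u.inlinedValue, currentValues.get(u.constant))) {
                result.add(u.user);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Use> uses = List.of(
            new Use("p.B", "p.SomeClass.LIMIT", 10),
            new Use("p.C", "p.SomeClass.NAME", "fubar"));
        Map<String, Object> now = Map.of("p.SomeClass.LIMIT", 20, "p.SomeClass.NAME", "fubar");
        System.out.println(stale(uses, now)); // [p.B]
    }
}
```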

Another approach could be to have javac write out .depend files
as used in the C/C++ world - quite unlikely to ever happen.

javac could also take a new option telling it not to compile
unconditionally, but to check whether each given file needs compiling.
Advantage: it actually only checks forward dependencies. Disadvantage: the
check could be almost as expensive as a compile (I don't know, though).
This would mean that javac takes over (and does better) part
of ant's job.
 

Andreas Leitgeb

Mark Thornton said:
This task is actually a bit harder than you suggest. It is legal (if bad
practice) to include top level non public classes (i.e. package level)
in .java files which do not match the name of the class.

Actually, the difficulty depends on whether the compiler correctly
includes the "SourceFile"-attribute into the .class-file. (I think
most, if not all, do, but I've seen class-files without it - perhaps
the work of an obfuscator.)

It's still not exactly trivial, especially if, due to a developer's
error, two .java files contain the same package-level class.
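Reading the "SourceFile" attribute needs no library; the class-file layout in the JVM specification is enough. The sketch below is a hand-rolled reader, not an existing API.

It skips the constant pool, interfaces, fields, and methods, then looks at the class-level attributes. It returns null when the attribute is absent, for example after compiling with -g:none. The class `NotNamedAfterItsFile` is a made-up package-level class sharing a source file with a differently named public class.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SourceFileAttribute {

    /** Returns the SourceFile attribute of a loaded class, or null if it was not emitted. */
    public static String sourceFileOf(Class<?> cls) throws IOException {
        String simple = cls.getName().substring(cls.getName().lastIndexOf('.') + 1);
        try (InputStream raw = cls.getResourceAsStream(simple + ".class")) {
            if (raw == null) throw new IOException("class file not found for " + cls.getName());
            return read(new DataInputStream(raw));
        }
    }

    // Walks the class-file layout from the JVM specification up to the class attributes.
    static String read(DataInputStream in) throws IOException {
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        skip(in, 4);                                       // minor and major version
        int poolCount = in.readUnsignedShort();
        String[] utf8 = new String[poolCount];             // only Utf8 entries are kept
        for (int i = 1; i < poolCount; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;             // u2 length + modified UTF-8
                case 3: case 4: skip(in, 4); break;                // Integer, Float
                case 5: case 6: skip(in, 8); i++; break;           // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20: skip(in, 2); break;
                case 9: case 10: case 11: case 12: case 17: case 18: skip(in, 4); break;
                case 15: skip(in, 3); break;                       // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        skip(in, 6);                                       // access flags, this_class, super_class
        skip(in, 2 * in.readUnsignedShort());              // interfaces
        for (int table = 0; table < 2; table++) {          // fields, then methods
            int members = in.readUnsignedShort();
            for (int m = 0; m < members; m++) {
                skip(in, 6);                               // access flags, name, descriptor
                skipAttributes(in);
            }
        }
        int attributes = in.readUnsignedShort();           // class-level attributes
        for (int a = 0; a < attributes; a++) {
            String name = utf8[in.readUnsignedShort()];
            int length = in.readInt();
            if ("SourceFile".equals(name)) return utf8[in.readUnsignedShort()];
            skip(in, length);
        }
        return null;                                       // e.g. compiled with -g:none
    }

    static void skipAttributes(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            in.readUnsignedShort();                        // attribute name index
            skip(in, in.readInt());                        // attribute length, then its body
        }
    }

    static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);                         // unlike skipBytes, never skips short
    }

    public static void main(String[] args) throws IOException {
        // Both lines print the name of this source file (unless compiled with -g:none).
        System.out.println(sourceFileOf(SourceFileAttribute.class));
        System.out.println(sourceFileOf(NotNamedAfterItsFile.class));
    }
}

// A package-level class whose name does not match its source file, as in Mark's example.
class NotNamedAfterItsFile { }
```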

I dislike these additional classes. Their visibility should be
limited to the main class of that source (and their name autogenerated
like MyClass$0), or they should be treated like nested classes.
Not that I really expected this to change anytime soon...
 

Andreas Leitgeb

Michael Jung said:
Is this what you want: every time a file A changes, all dependent files (B)
should be recompiled automatically?

For some definition of "automatically", yes :)
I do *not* expect javac to handle reverse-dependencies (B) as it does
forward-dependencies. That would be a bad thing.

What I want instead is some team-work of ant and javac.
Example 1: ant passes all files (of the codebase) to javac (plus
some new option) and javac will first find all the changed ones,
and then all the type-"B" ones among the others and compile those
as well, but not those unrelated to all regenerated ones.
Example 2: javac emits extra information, from which ant could
determine the type-"B" candidates, and add them to the list
of files passed to javac.

As of now, ant only passes the changed files to javac, so javac
doesn't even have a chance of seeing any reverse-dependent ones,
and it surely shouldn't go searching for these the way it does
for forward-dependencies.
What would the end result of such an automatic build be if it
yields lots of errors? (Because the dependent .java files don't compile
anymore, since incompatible changes had to be introduced.)

Practical example:
A user checks in a new version of an interface, to which he
added a new method. At the next build, all classes within my codebase
(a concept known to ant) that implement that interface and
don't already implement the new method are supposed to produce
compile errors.
 

Michael Jung

Andreas Leitgeb said:
For some definition of "automatically", yes :)
I do *not* expect javac to handle reverse-dependencies (B) as it does
forward-dependencies. That would be a bad thing.
What I want instead is some team-work of ant and javac.
Example 1: ant passes all files (of the codebase) to javac (plus
some new option) and javac will first find all the changed ones,
and then all the type-"B" ones among the others and compile those
as well, but not those unrelated to all regenerated ones.
[...]

Sticking to this: This would require ant/javac to walk through all of the
codebase and then through all of the imports.

Say a developer recompiles his most volatile java file every so often. Each
time some remote change might possibly affect him, the whole engine starts? I
don't know, but it was one of the things that really turned me off in the
default eclipse settings, that every "save" would get the thing petrified,
because we had a rather large source tree, 90% of which I don't care about.
But some indirect dependency forced me to wait until the thing was through
with checking all of it.

In other words, when the code base gets big enough to warrant such an
import-analysis over "make all", the advantages of not doing it at all also
increase dramatically. YMMV.

Michael
 

Mark Thornton

Andreas said:
Actually, the difficulty depends on whether the compiler correctly
includes the "SourceFile"-attribute into the .class-file. (I think
The javac option -g:none should emit class files without this attribute.
There are also tools which will strip such attributes from class files
after compilation.
I dislike these additional classes.
So do I, but as they exist a dependency tool ought to detect them even
if only to emit a large raspberry! :)

Mark Thornton
 

Mike Schilling

Mark said:
The javac option -g:none should emit class files without this
attribute. There are also tools which will strip such attributes from
class files after compilation.

So do I, but as they exist a dependency tool ought to detect them even
if only to emit a large raspberry! :)

"I cannot work under these conditions!"
 

Roedy Green

I think it doesn't matter whether one wastes cycles on a full
build on each developer's PC or on a central server.
I want to discuss the effort saved by a new (yet to
be developed) build process that respects reverse
dependencies, making an incremental build as reliably
"correct" as a full rebuild.

How about a rule like this:

Your central compile server is notified on check-in. It does a
checkout and looks for the string "public static final". If it finds
it, it does a clean compile; if not, an incremental one.

This is different from doing the compiles at the client machines since
it is triggered as soon as possible. Client compiles would not be
triggered till much later, when someone does a checkout.

The server can serve the most recent coherent set of source and object
files, sending just the deltas. This is almost instantaneous, so the extra
time for a clean compile is almost irrelevant. When you compile on a client
machine, you have to wait. Furthermore, compiling takes longer on the client,
since client machines are not as powerful.
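Roedy's rule reduces to a text search over the checked-in files. In this sketch, the name `BuildModeRule` and the input, the full text of each checked-in source, are assumptions.

The search is deliberately as crude as the rule. It fires on comments and on unchanged declarations. It misses "static public final" and package-private "static final", so it trades a few needless clean builds, and some misses, for simplicity.

```java
import java.util.Collection;
import java.util.List;

public class BuildModeRule {

    enum Mode { CLEAN, INCREMENTAL }

    /** Picks the build mode from the full text of the checked-in source files. */
    public static Mode choose(Collection<String> checkedInSources) {
        for (String text : checkedInSources) {
            // Plain substring match: comments and unchanged declarations also trigger it,
            // while "static public final" or package-private "static final" do not.
            if (text.contains("public static final")) {
                return Mode.CLEAN;
            }
        }
        return Mode.INCREMENTAL;
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("class A { public static final int MAX = 5; }"))); // CLEAN
        System.out.println(choose(List.of("class B { int size() { return 0; } }")));          // INCREMENTAL
    }
}
```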
 

Andreas Leitgeb

Michael Jung said:
Andreas Leitgeb said:
For some definition of "automatically", yes :)
I do *not* expect javac to handle reverse-dependencies (B) as it does
forward-dependencies. That would be a bad thing.
What I want instead is some team-work of ant and javac.
Example 1: ant passes all files (of the codebase) to javac (plus
some new option) and javac will first find all the changed ones,
and then all the type-"B" ones among the others and compile those
as well, but not those unrelated to all regenerated ones.
[...]
Sticking to this: This would require ant/javac to walk through all of the
codebase and then through all of the imports.

"Walk through all the codebase" ... this sounds quite expensive, but
actually this happens with every incremental build already: ant checks
every .java-file in the codebase, whether it's newer than its .class
file.
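The per-file check looks roughly like this sketch. It is not Ant's actual code, and the name `StaleSources` is hypothetical.

The sketch assumes the usual layout, where `p/A.java` compiles to `p/A.class` under the output directory. Each test is cheap, but the walk still visits every source in the tree on every build.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleSources {

    /** Sources (relative to srcDir) whose .class file is missing or older than the .java file. */
    public static List<Path> outOfDate(Path srcDir, Path classDir) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(srcDir)) {     // visits every file in the tree
            sources = walk.filter(p -> p.toString().endsWith(".java"))
                          .sorted()
                          .collect(Collectors.toList());
        }
        List<Path> result = new ArrayList<>();
        for (Path src : sources) {
            Path rel = srcDir.relativize(src);
            String name = rel.toString();
            // Assumes p/A.java compiles to p/A.class (one top-level class per file).
            Path cls = classDir.resolve(name.substring(0, name.length() - ".java".length()) + ".class");
            if (!Files.exists(cls)
                    || Files.getLastModifiedTime(src).compareTo(Files.getLastModifiedTime(cls)) > 0) {
                result.add(rel);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("src");
        Path out = Files.createTempDirectory("classes");
        Files.writeString(src.resolve("A.java"), "class A {}");   // never compiled
        Files.writeString(src.resolve("B.java"), "class B {}");   // compiled after last edit
        Files.write(out.resolve("B.class"), new byte[0]);
        Files.setLastModifiedTime(src.resolve("B.java"), FileTime.fromMillis(1_000));
        Files.setLastModifiedTime(out.resolve("B.class"), FileTime.fromMillis(2_000));
        System.out.println(outOfDate(src, out));                  // [A.java]
    }
}
```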
Say a developer recompiles his most volatile java file every so often. Each
time some remote change might possibly affect him, the whole engine starts?

The advantage of javac handling the reverse dependencies itself could be
that it would trigger those recompilations only if the depended-upon
source has not just changed, but has actually changed its interface.
That would mean: a central Java class adding a new method/field or
changing only its implementation could even skip the reverse-dependency
handling. If, however, some class changed its interface (like changing
a static final, adding abstract methods, or removing non-abstract ones),
then anything other than following reverse dependencies leaves an
inconsistent state among your .class files.


Perhaps following reverse-dependencies isn't the only solution.
Having a way to detect binary-incompatible interface-changes
in any of the recompiled classes (during the normal incremental
build, like ant already supports) would let me know when to start
a full-compile. This might even be enough.
Actually, now that I think of it, this could even be done
without any enhancements to javac.
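One way to do it without touching javac is to snapshot each recompiled class's non-private API via reflection before and after the incremental build. A full compile is triggered when something was removed or changed (a constant value included) or when an abstract method was added.

This is a sketch under stated assumptions:

- Old snapshots are stored from the previous build.
- Comparing two versions of the same class in one JVM would need separate class loaders.
- Mike's overload case is not handled.

Here, two hand-written stand-in classes, `BeforeCheckin` and `AfterCheckin`, play the old and new versions; all names are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class ApiSnapshot {

    /** Non-private members of a class as strings; constant values are part of the entry. */
    public static Set<String> snapshot(Class<?> c) throws IllegalAccessException {
        Set<String> api = new TreeSet<>();
        Class<?> sup = c.getSuperclass();
        api.add("extends " + (sup == null ? "-" : sup.getName()));
        for (Class<?> i : c.getInterfaces()) {
            api.add("implements " + i.getName());
        }
        for (Field f : c.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (Modifier.isPrivate(mod) || f.isSynthetic()) continue;
            String entry = "field " + f.getType().getName() + " " + f.getName();
            if (Modifier.isStatic(mod) && Modifier.isFinal(mod)
                    && (f.getType().isPrimitive() || f.getType() == String.class)) {
                f.setAccessible(true);
                entry += " = " + f.get(null);              // an SFF: callers may have inlined it
            }
            api.add(entry);
        }
        for (Method m : c.getDeclaredMethods()) {
            int mod = m.getModifiers();
            if (Modifier.isPrivate(mod) || m.isSynthetic()) continue;
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getName)
                    .collect(Collectors.joining(",", "(", ")"));
            api.add((Modifier.isAbstract(mod) ? "abstract " : "") + "method "
                    + m.getReturnType().getName() + " " + m.getName() + params);
        }
        return api;
    }

    /** Entries suggesting a full compile: removed or changed ones, plus new abstract methods. */
    public static Set<String> incompatible(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(before);
        result.removeAll(after);                            // gone, or changed (e.g. new value)
        for (String entry : after) {
            if (entry.startsWith("abstract ") && !before.contains(entry)) result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) throws IllegalAccessException {
        System.out.println(incompatible(snapshot(BeforeCheckin.class), snapshot(AfterCheckin.class)));
        // [field int LIMIT = 10]
    }
}

// Stand-ins for one class before and after a check-in.
abstract class BeforeCheckin {
    public static final int LIMIT = 10;
    public abstract void run();
    public int size() { return 0; }
}

abstract class AfterCheckin {
    public static final int LIMIT = 20;                     // changed constant
    public abstract void run();
    public int size() { return 0; }
    public String describe() { return ""; }                 // compatible addition
}
```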

I don't know, but it was one of the things that really turned me off in the
default eclipse settings, that every "save" would get the thing petrified,

I never claimed that a build should be auto-started on
file change. That's quite comfortable for the guy with the big machine
(to whom a background compile is hardly noticeable). The other guys would
rather turn off auto-build-on-file-save (as well as auto-build-on-key-typed :)
In other words, when the code base gets big enough to warrant such an
import-analysis over "make all", the advantages of not doing it at all
also increase dramatically. YMMV.

That's what I feared since the start of this discussion. Is
dependency-analysis necessarily more (or almost as) expensive
than a full compile?
 
