how to design a replacement for C++

joe

Juha said:
The world would certainly be a better place if everybody used the
same unified terminology, but that's unfortunately not the case, and
I'm assuming that this reference/handle thing might be an example.

Well, that's obviously never going to happen, but is it so darned hard
to provide a glossary that gives the meaning of the terms so as to
resolve any potential ambiguity? (Rhetorical.) A lot of newsgroup
discussions go round and round because of the same failing: definitions are
not established at the outset, so they must painstakingly bubble up to the
surface after a thousand posts, and sometimes NEVER reach fruition because
of that lack.
 
joe

Juha said:
You both have a very simplistic view of how garbage collection
and memory compaction work (after all, the algorithms have
improved significantly since the early 90's), and you underestimate the
significance of cache optimizations. You would be surprised how much
cache misses affect the efficiency of a program. (For
example, the speed difference between multiple cache-friendly memory
allocations and the same allocations done at random memory locations
can easily approach an order of magnitude.
At least 90% of the speed of any given program is thanks to the CPU
cache.)

Garbage collection *used* to work in the 90's as you describe (i.e.
stop the program during a sweep, which could take a noticeable amount
of time), but modern efficient garbage collectors do it a lot more
smartly and efficiently. Likewise memory compaction can be done
smartly and efficiently. Algorithms have improved since those times.

So, how big is one of these new-fangled collectors? How many source files
and lines of code? How many developer-years are required to implement a
robust one? Examples if ya got em please.
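
(An aside on the cache claim quoted above: a small self-contained C++
sketch that sums the same array through a sequential and then a shuffled
index order. The array size is an arbitrary choice for illustration, and
the actual ratio will of course depend on the machine.)

// Rough illustration of cache locality: identical work, different access
// pattern. Expect the shuffled traversal to be noticeably slower on most
// hardware once the data no longer fits in cache.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

static double time_sum(const std::vector<int>& data,
                       const std::vector<std::size_t>& order) {
    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (std::size_t i : order) sum += data[i];
    auto stop = std::chrono::steady_clock::now();
    std::cout << "sum=" << sum << "  ";          // keep the work observable
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    const std::size_t n = 1u << 22;              // ~4M elements, arbitrary
    std::vector<int> data(n, 1);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});  // sequential

    std::cout << "sequential: " << time_sum(data, order) << "s\n";

    std::mt19937 rng(12345);
    std::shuffle(order.begin(), order.end(), rng);          // cache-hostile
    std::cout << "shuffled:   " << time_sum(data, order) << "s\n";
}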
 
Balog Pal

Joshua Maurice said:
So yes, a change inside a Java function body does not require
recompiling any other Java file, but for a correct incremental build
system, a change to a Java class's "exported interface" does require
recompiling all direct dependents.

I didn't use it, but my friend used to show me Visual Age for Java, like a
decade back. It had a 'code repository' instead of the classic source
files -- you saw all your packages, classes, and functions in a tree view, could
edit anything, got a full version control system at that function level,
etc.

I bet it had no problem doing your perfect incremental build, since it had
the nature of any change handy...

I wish I had a system like that for my C/C++ work -- unfortunately the language
is such that you cannot get rid of source files/translation units entirely. And
using external libraries would spoil a self-limiting approach too.
 
Balog Pal

Keith H Duggar said:
Because the point is not comparing C++ and Java build times.

Kind of tweaking the discussion to some made-up point. Sure, it is possible to
use a FU build system with Java -- I saw ours out of order from time to time,
causing it to go make-like, invoking the compiler on individual files or folders.
Then someone fixed it back to take the full file set, and it processed
everything in "no time" again. Looking for a potential speed-up of something
that is already fast enough is IMO fishy, especially when comparing to a
different model where that does not hold.

Yet,
It is comparing what Java build times /would be/ if it allowed
separation of implementation and interface files.

What exactly is it that disallows the said separation? As mentioned in another
post, VisualAge for Java IIRC did that very separation on its own. Before
Y2K. While the parallel product for C++ could not do it, as there having
source files seems unavoidable both in theory and in practice.

I.e., it is not
about Java vs C++ build times; it's about Java vs Java and C++
vs C++ build times under two different code separation models:
a) all code in one file (Java and header-only C++)
b) capability of separating some code into different files
(not supported by Java, supported by .hpp/.cpp C++).

And consider also
c) code repository, no files at all...
 
Juha Nieminen

Branimir Maksimovic said:

I didn't claim that garbage collection is completely problem-free and
extremely efficient in all possible situations. I was simply pointing out
that GC algorithms have improved since the early 90's and are much better
now than they were back then (which isn't the same thing as claiming that
they are perfect).

(And for the record, I'm not a GC nor Java fanboy. In fact, I hate the
guts of Java, but that's a different story. It's just that I don't like
to exaggerate problems which aren't there.)
The problem is that when you move allocated memory blocks around,
you have to update all references in the program; also, by scanning
references you force cache invalidation and page thrashing, and I don't
see how that can be much improved without stopping the whole program.

You seem to have the mentality that reference == pointer == memory
address.

As I already wrote, Java references are (AFAIK) more abstract than that
(or can internally be made more abstract because the language doesn't assume
they are memory addresses). Moving a block of memory from one place to
another doesn't necessarily mean that all references which work as handles
to that memory block need to be modified. It's possible to have an extra
indirection step, which allows changing the low-level memory address of
the memory block in only one place and have all the references work
without any change. (Think about it like a vtable.)
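
(A minimal C++ sketch of that extra indirection, using a hypothetical
handle table; it only shows that relocating an object can be done by
updating a single table slot, not how any particular JVM actually
implements references.)

// Handles are indices into a table of current addresses. Client code keeps
// the handle; only the table knows where the object currently lives, so a
// compacting collector can move the object and fix exactly one slot.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

struct HandleTable {
    std::vector<void*> slots;           // handle -> current address

    std::size_t add(void* p) { slots.push_back(p); return slots.size() - 1; }
    void* deref(std::size_t h) const { return slots[h]; }

    // "Compaction": move the object somewhere else and update one entry.
    void relocate(std::size_t h, void* to, std::size_t bytes) {
        std::memcpy(to, slots[h], bytes);
        slots[h] = to;
    }
};

int main() {
    HandleTable table;

    int* obj = static_cast<int*>(std::malloc(sizeof(int)));
    *obj = 123;
    std::size_t h = table.add(obj);     // clients hold h, not the address

    int* newPlace = static_cast<int*>(std::malloc(sizeof(int)));
    table.relocate(h, newPlace, sizeof(int));
    std::free(obj);                     // old storage can now be reclaimed

    // The handle still works; no client-side reference was rewritten.
    std::printf("%d\n", *static_cast<int*>(table.deref(h)));  // prints 123
    std::free(newPlace);
}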
 
Juha Nieminen

joe said:
So, how big is one of these new-fangled collectors? How many source files
and lines of code? How many developer-years are required to implement a
robust one? Examples if ya got em please.

You want me to do the googling for you? I'm sure you can do it yourself.

If you want an intro to modern garbage collection, try the wikipedia
page as a starting point.
 
James Kanze

On 7/30/2010 11:00 AM, Keith H Duggar wrote:
Actually, you have an incorrect/oversimplified understanding of Java
compilation dependencies.

And you have an incorrect/oversimplified understanding of C++
compilation dependencies. :)
The only thing that might cause a compile-time dependency is interface
change, or constant (static final primitive/String variable) value
change. You can safely modify/add methods in one Java file without
recompiling other Java files which depend on it. Removing members *may*
result in a link-time error, but also does not require a rebuild. Modifying
constants may result in inconsistencies between references to that
constant until a rebuild.

That's all in theory. In practice, the build systems I've seen
(Java or C++, with one exception: Visual Age) use the file as
the lowest level of granularity. Which means that a change will
trigger a recompilation, even if the change won't change the
generated code.
The Java keyword "import" is not analogous to the #include directive. It
tells the compiler that "References to the imported class needn't be
fully qualified." For example, import java.util.List lets you refer to
"List" rather than "java.util.List" in the rest of the file. Importing
or not doesn't really affect compile time (there are a few exceptions,
but they are trivial, and unrelated to the classes imported.)

Yes, but that's not the question. Java's import is more or less
like C++'s using. And Java "implicitly" includes anything that
is needed, whereas in C++, you have to explicitly include it.
But the inclusion mechanism is different---in C++, it's pure
textual inclusion, whereas in Java, the included data come from
the compiled .class file. But the fact remains, if you modify a
.java file, the timestamp on the file will tell the build
system that the .class file is out of date, and it will
recompile the .java file, producing a new .class file. Which
will then trigger a recompile of all of the .java files which
use this .class file. Whereas if you modify the implementation
code in a C++ source file (.cpp or .cc), the corresponding
object file will be recompiled, but the time stamp on the header
file (.hpp or .hh) will not be modified, and objects built from
sources dependent on the header will not be recompiled.
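
(For concreteness, a minimal sketch of that separation with a hypothetical
Widget class; it only illustrates why editing the .cpp leaves the header's
timestamp, and therefore the clients' object files, untouched.)

// widget.hpp -- the interface clients depend on; its timestamp only
// changes when the class definition itself changes.
#ifndef WIDGET_HPP
#define WIDGET_HPP

class Widget {
public:
    int value() const;   // declaration only
private:
    int value_ = 42;
};

#endif

// widget.cpp -- the implementation; editing this file recompiles only
// widget.o, because widget.hpp is not rewritten.
#include "widget.hpp"

int Widget::value() const {
    return value_;       // tweak this freely; clients are not rebuilt
}

// client.cpp -- depends only on widget.hpp, so a make-style build
// leaves client.o alone when widget.cpp changes.
#include "widget.hpp"
#include <iostream>

int main() {
    std::cout << Widget().value() << '\n';
}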

There are a lot of criticisms to be made of both solutions.
There's no reason why modifying the implementation of a member
function should trigger recompilation of all client sources (as
it does in Java), and there's no reason why correcting an error
in the Doxygen documentation should retrigger recompilation of
all client sources (as it does in both Java and C++---except, I
believe, Visual Age C++). It's also not a good thing that you
can get mixed versions of a class in a single program (most, but
not all, C++ systems; Java only detects this at runtime, because
it doesn't link until runtime, but it's much more tolerant with
regard to differences in the versions).
 
James Kanze

On Jul 30, 11:08 pm, Daniel Pitts

[...]
That is mostly correct. I would qualify "exported interface" as "used"
exported interface. If the portion of the exported
interface which changed has not been used, then there need not be any
cascading builds.
It seems to me that it wouldn't be too difficult at build-time to create
a dependency relationship among classes as part of the build. It may
reduce the speed of full builds, but that seems unlikely to matter as it
would be the less-common case.

There are many things that could be done to improve incremental
builds, but aren't. On all of the systems I've used, and all
but one I've heard about, the granularity of the build system is
the file. If you modify anything in a Java file, or at least
anything that changes the generated .class file (or causes its
timestamp to be updated), then the build system will cause any
classes which depend on that .class file to be rebuilt; in
practice, the only information the build system has about the
.class file is its time stamp. The same problem occurs with
"benign" changes (like adding comments, or a non-virtual
private member function) in a C++ header file.

At least one system I've heard about (but never used) tried to
do better. In C++, Visual Age C++ managed things at a much
lower level, maintaining meta-information concerning who really
used what, and how, and knew which changes affected what types
of use. (There was or is also a Visual Age Java, which I
suppose behaves similarly.)
FWIW, I work on some relatively large projects, and the unit
tests are what take the most time of the build.

A large project is (or should be) cut up into smaller
components. Normally, you're working on a single component, and
when you incrementally build during development, you'll only run
the unit tests for that component. (Before you commit your
changes, you'll run the unit tests for the entire project, but
if the unit tests for the component were complete, changes in
the component which pass those unit tests shouldn't cause
anything to fail elsewhere in the project.)
 
James Kanze


[...]
What exactly is it that disallows the said separation?

It's extra work. :) And it has runtime implications.
As mentioned in another post, VisualAge for Java IIRC did that
very separation on its own. Before Y2K. While the parallel
product for C++ could not do it, as there having source files
seems unavoidable both in theory and in practice.

Except that in practice, Visual Age C++ did keep different
things separate. And knew what sort of changes affected what
sort of use, so that it wouldn't compile client sources just
because you added a private, non-virtual function to the class
definition.
And consider also
c) code repository, no files at all...

Yes, but only Visual Age uses this model. And it does so for
both C++ and Java. (And as you said, it's been around for some
time now. I believe it's based on work done by Taligent, before
they folded. And I recall one of the developers from Taligent
explaining it to me sometime around 1992.)
 
James Kanze

On Jul 30, 1:31 pm, Joshua Maurice <[email protected]> wrote:

[...]
The problem is that such systems require parsing of the file
and language specific analysis of the file. And that analysis
must examine not only the current contents but some previously
known contents as well. All that is significantly more complex
and costly than simply checking a timestamp, checksum, etc.

They require meta-information of some sort. If the compiler
collaborates, it could easily generate that meta-information
when compiling. And evaluating that meta-information should
actually cost less than checking a checksum (but not a time
stamp).
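
(A crude sketch of what such compiler-assisted meta-information might look
like, under the idea described above: keep a digest of each unit's exported
interface separately from a digest of the whole file, and let only the
former trigger client rebuilds. The FNV-1a hash and the inline "interface
text" strings are illustrative inventions, not any real compiler's format.)

// Sketch: a build tool stores two digests per translation unit -- one of the
// whole source, one of just its exported declarations. Clients are dirty
// only when the interface digest changes, not on every edit.
#include <cstdint>
#include <iostream>
#include <string>

// FNV-1a, just to have a small deterministic digest for the sketch.
std::uint64_t digest(const std::string& text) {
    std::uint64_t h = 1469598103934665603ull;
    for (unsigned char c : text) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

struct UnitInfo {
    std::uint64_t sourceDigest;     // decides whether the unit itself rebuilds
    std::uint64_t interfaceDigest;  // decides whether *clients* rebuild
};

bool clientsNeedRebuild(const UnitInfo& before, const UnitInfo& after) {
    return before.interfaceDigest != after.interfaceDigest;
}

int main() {
    // An edit that touches only a function body: the source changes,
    // the exported interface does not.
    UnitInfo before{digest("int f(){return 1;}"), digest("int f();")};
    UnitInfo after {digest("int f(){return 2;}"), digest("int f();")};

    std::cout << std::boolalpha
              << "unit rebuilds:   " << (before.sourceDigest != after.sourceDigest) << '\n'
              << "clients rebuild: " << clientsNeedRebuild(before, after) << '\n';
}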
 
joe

Juha said:
You want me to do the googling for you? I'm sure you can do it
yourself.

If you want an intro to modern garbage collection, try the wikipedia
page as a starting point.

I don't need the other info. I was just curious about the stats. I have a
feeling it's going to be one of those monstrous things, though, which I
tend to avoid. If anyone actually KNOWS what I am curious about, please
speak up, thanks. Surely someone here has actually worked on a
sophisticated (not a personal project) one (?).
 
Öö Tiib

I don't need the other info. I was just curious about the stats. I have a
feeling it's going to be one of those monstrous things, though, which I
tend to avoid. If anyone actually KNOWS what I am curious about, please
speak up, thanks. Surely someone here has actually worked on a
sophisticated (not a personal project) one (?).

What man-years does it take to use? Downloading it from SourceForge, some
tinkering ... let's say 0.003 to 0.006 developer-years.
 
joe

Öö Tiib said:
What man-years does it take to use? Downloading it from SourceForge, some
tinkering ... let's say 0.003 to 0.006 developer-years.

No, I meant what it takes "to implement from scratch", and the other stats
which someone, no doubt, can rattle off in one second. I just want to
mentally catalog the technology. I have no need for said technology at
this time, though.
 
Öö Tiib

No, I meant what it takes "to implement from scratch", and the other stats
which someone, no doubt, can rattle off in one second. I just want to
mentally catalog the technology. I have no need for said technology at
this time, though.

I think there are not that many people who have implemented GC
libraries. The theory itself is not that complex. Even the most clever
algorithms are describable with a page of pseudo-code. What makes it
complex is all the platform-specific and implementation-specific
knowledge that is needed for implementing thread-safe, real-time,
garbage-collected memory management.

C++ does not yet have such essential things as virtual memory or
threads or atomic operations in it by the current standard. Also, the binary
representation of a polymorphic pointer to an object may differ from the
one that new returned ... and so on.

Therefore implementing it certainly takes good specialists with deep
knowledge ... and that is a situation where you may throw man-years in
endlessly and get odd crashes (read: "nothing") unless you have such
specialists available.
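
(To make the "page of pseudo-code" point concrete, here is a minimal,
hypothetical mark-and-sweep sketch in C++ over a hand-built object graph.
It is a toy: single-threaded, non-moving, with explicit roots -- i.e. it
skips exactly the parts the post above says are the hard part.)

// A toy mark-and-sweep collector: mark everything reachable from the roots,
// then sweep (delete) whatever was not marked.
#include <cstdio>
#include <vector>

struct Obj {
    bool marked = false;
    std::vector<Obj*> children;   // outgoing references
};

struct ToyHeap {
    std::vector<Obj*> all;        // every allocated object
    std::vector<Obj*> roots;      // externally reachable objects

    Obj* alloc() {
        Obj* o = new Obj;
        all.push_back(o);
        return o;
    }

    void mark(Obj* o) {
        if (!o || o->marked) return;
        o->marked = true;
        for (Obj* c : o->children) mark(c);
    }

    void collect() {
        for (Obj* r : roots) mark(r);           // mark phase
        std::vector<Obj*> live;
        for (Obj* o : all) {                    // sweep phase
            if (o->marked) { o->marked = false; live.push_back(o); }
            else delete o;
        }
        all.swap(live);
    }
};

int main() {
    ToyHeap heap;
    Obj* a = heap.alloc();
    Obj* b = heap.alloc();
    heap.alloc();                 // garbage: never referenced from a root
    a->children.push_back(b);
    heap.roots.push_back(a);
    heap.collect();
    std::printf("live objects: %zu\n", heap.all.size());  // prints 2
}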
 
Jerry Coffin

[ ... ]
At least one system I've heard about (but never used) tried to
do better. In C++, Visual Age C++ managed things at a much
lower level, maintaining meta-information concerning who really
used what, and how, and knew which changes affected what types
of use. (There was or is also a Visual Age Java, which I
suppose behaves similarly.)

Note, however, that Visual Age for C++ has been discontinued for
quite a while now -- replaced by XL C++, which has better
conformance, but uses a more conventional build process (normal make
based on file time stamps).
 
Jerry Coffin

[ ... ]
I didn't claim that garbage collection is completely problem-free
and extremely efficient in all possible situations. I was simply
pointing out that GC algorithms have improved since the early 90's
and are much better now than they were back then (which isn't the
same thing as claiming that they are perfect).

I think it's worth pointing out that this isn't really a matter of GC
algorithms having improved since the early 90's. Rather, it's a
matter of older JVMs mostly using algorithms that have been known
since the late 1950's and early 1960's, where current JVMs use
algorithms from the mid- to late-1980's.

I believe current production JVMs mostly use the basic algorithm from
"Generation Scavenging: A Non-disruptive High Performance Storage
Reclamation Algorithm", (Ungar, 1984) in combination with some (but
I don't think all) of the techniques from: "Tenuring Policies for
Generation-Based Storage Reclamation" (Ungar and Jackson, 1988).
 
joe

Öö Tiib said:
I think there are not that many people who have implemented GC
libraries. The theory itself is not that complex. Even the most clever
algorithms are describable with a page of pseudo-code. What makes it
complex is all the platform-specific and implementation-specific
knowledge that is needed for implementing thread-safe, real-time,
garbage-collected memory management.

C++ does not yet have such essential things as virtual memory or
threads or atomic operations in it by the current standard. Also, the binary
representation of a polymorphic pointer to an object may differ from the
one that new returned ... and so on.

Therefore implementing it certainly takes good specialists with deep
knowledge ... and that is a situation where you may throw man-years in
endlessly and get odd crashes (read: "nothing") unless you have such
specialists available.

I was looking for quantification associated with a successful modern
implementation rather than simple qualification. The example probably
need not be C++.
 
Jerry Coffin

[ ... ]
So, how big is one of these new-fangled collectors? How many source files
and lines of code? How many developer-years are required to implement a
robust one? Examples if ya got em please.

While the exact size and effort obviously varies, a few data points
are available. For example, a few years ago Ravenbrook Ltd. released
their Memory Pool System as open source. At the time, they claimed
that it represented 30 person years of effort. A quick check of the
files shows around 2.4 megabytes.

Most others of which I'm aware are probably in the same general
ballpark. Of course, some of that can depend on things like how many systems
it may have been ported to. More ports usually mean more work and
more code, which don't mean much unless you need those particular
targets.
 
joe

Jerry said:
[ ... ]
So, how big is one of these new-fangled collectors? How many source
files and lines of code? How many developer-years are required to
implement a robust one? Examples if ya got em please.

While the exact size and effort obviously varies, a few data points
are available. For example, a few years ago Ravenbrook Ltd. released
their Memory Pool System as open source. At the time, they claimed
that it represented 30 person years of effort. A quick check of the
files shows around 2.4 megabytes.

Most others of which I'm aware are probably in the same general
ballpark. Of course, some of that can depend on things like how many systems
it may have been ported to. More ports usually mean more work and
more code, which don't mean much unless you need those particular
targets.

Thanks for that data point.
 
