Java equivalent of C++ Spirit or Boost Graph Library ?

F

Fab

Hi !

Because of very interesting feature of Java that lacks to C++ :
memory handled by the language, I wonder if it would be eaily possible
to switch from C++ to Java ( In order to make the writing of an
interpreter for a functional language easier. )

I see some STL features like vectors are usable in Java.

But about other C++ libraries : do we often find something equivalent
in Java ?
For example :
With Boost Spirit, it's possible to define a language grammar using
Backus Naur Form : very cooler than lex/yacc/bison.
And Boost Graph Library provide maths graph structure, than can be
visited, studied, printed...

Please, does java provide something like these ?

Thanks for advance

Fabrice
 
C

Chris Uppal

Fab said:
Because of very interesting feature of Java that lacks to C++ :
memory handled by the language, I wonder if it would be eaily possible
to switch from C++ to Java ( In order to make the writing of an
interpreter for a functional language easier. )

It certainly possible, but you will be learning a new language -- don't assume
that familiarity with C++ will help you learn Java. (It will help in some way,
but it'll be a hindrance in other, and perhaps more important, ways.)

In particular, Java does not have templates, nor anything which even vaguely
resembles them. Hence there is not, and cannot be, an equivalent to the
template-based games which underlie the approach taken in the Boost
library(ies).

Of course, Java does have libraries -- it's just that they are based on
different principles.

With Boost Spirit, it's possible to define a language grammar using
Backus Naur Form : very cooler than lex/yacc/bison.

Look for ANTLR.

And Boost Graph Library provide maths graph structure, than can be
visited, studied, printed...

I don't know of any really advanced graph handling libraries in Java, myself.
(There are some pretty sophisticated graph libraries in C++, but I don't know
whether the Boost library is one of them). "Jung" and "JGraphT" are probably
worth a look (I haven't used either of them myself). If the (staggering) price
of "yFiles" is any indication of its quality and completeness then it must be
excellent ;-) No doubt you'll be able to find other packages...

The usual technique for finding stuff is to go to your favourite search engine.
For instance, type:
"graph library" java
into Google (it might help to add:
-chart
to the search terms, though that will eliminate some valid hits too[*]).

As a general point: you'll find that the libraries (standardised, open source,
or commercial) which are available tend to reflect what people are doing with
Java, so there is rather more support for tasks like drawing things or serving
up web pages, than for mathematical operations.

-- chris

([*] Interesting to note that Google's advert-selection algorithm responds to
me indicating a lack of interest in charts by serving up lots of charting
packages...)
 
C

Christopher Benson-Manica

Chris Uppal said:
In particular, Java does not have templates, nor anything which even vaguely
resembles them.

Surely the 1.5 generics can be said to at least "vaguely resemble" C++
templates? There are fundamental differences between how the features
are implmented in each language, but they enable similar design
decisions to be made.
Hence there is not, and cannot be, an equivalent to the
template-based games which underlie the approach taken in the Boost
library(ies).

I haven't looked at the Boost code, but I could definitely believe
that the code rampantly abuses templates, as C++ allows.
Of course, Java does have libraries -- it's just that they are based on
different principles.

It's worth noting (for OP) that Java's standard libraries are more
full-featured in many ways than anything a C++ library could offer -
for example, a Java author finds a wealth of thread support at his
fingertips, while a C++ author must turn to the operating system (or a
third-party package intended for that operating system) for help.
 
P

Patrick May

Christopher Benson-Manica said:
Surely the 1.5 generics can be said to at least "vaguely resemble"
C++ templates?

Only syntactically.
There are fundamental differences between how the features are
implmented in each language, but they enable similar design
decisions to be made.

Actually, they don't. Type erasure means that a generic class in
Java represents a single type, regardless of how it is parameterized.
This makes it impossible to use generics to implement techniques such
as James Coplien's Curiously Recurring Template idiom. Despite their
name, Java's generics do not support Generic Programming.

Generics in Java are only useful for eliminating the need to cast
in certain situations. They're too complex for the minimal value they
provide.

Regards,

Patrick
 
C

Christopher Benson-Manica

Only syntactically.

That sounds pretty vague :)
Actually, they don't. Type erasure means that a generic class in
Java represents a single type, regardless of how it is parameterized.

Well, perhaps I should have selected a more suitably vague definition
of "similar" for my comment. I came to Java from C++ and I admit to
being disappointed to learn that many things that could be done with
C++ templates were not possible in Java.
Generics in Java are only useful for eliminating the need to cast
in certain situations. They're too complex for the minimal value they
provide.

I disagree; even taken as mere syntactic sugar, the clarity
improvement that said sugar provides is significant.
 
J

Jeffrey Schwab

Fab said:
Because of very interesting feature of Java that lacks to C++ :
memory handled by the language, I wonder if it would be eaily possible
to switch from C++ to Java ( In order to make the writing of an
interpreter for a functional language easier. )

C++ provides excellent memory management via its standard library. Post
in comp.lang.c++.moderated. IMO, automatic garbage collection would be
a really bad reason to choose Java over C++, especially since I prefer
C++-style resource management.
I see some STL features like vectors are usable in Java.

But about other C++ libraries : do we often find something equivalent
in Java ?
For example :
With Boost Spirit, it's possible to define a language grammar using
Backus Naur Form : very cooler than lex/yacc/bison.
And Boost Graph Library provide maths graph structure, than can be
visited, studied, printed...

Myriad libraries are available for Java. The presence of specific
libraries would be an excellent reason for you to choose one language
over the other. Graph visualization especially seems to have more and
better open source options in Java.

Java does not have an equivalent of Boost, but the standard library is
enormous on its own. If your interpreted language can reasonably be
parsed with regular expressions and parse trees, you might not even need
a separate library for it.

C++ is tremendous. I am always amazed at what can be achieved
efficiently, both in terms of runtime resources and programmer effort,
with only the core language and standard library. It scales down as
well as it scales up, and its object model is both powerful and
graceful. The conventional wisdom is that Java programs require fewer
lines of code than equivalent C++ programs, but in my limited
experience, the programs being compared are really not equivalent.

The Java language enables a new programmer to start writing useful
programs far more quickly than does C++. The Java platform is enormous,
and is probably the best reason to choose Java over C++ for a specific
application. Though the core language is IMHO inferior to C++ its
relative simplicity and native support for multithreading are compelling
advantages. You will probably have a much easier time going from C++ to
Java than you would going the other way.
 
P

Patrick May

Christopher Benson-Manica said:
I disagree; even taken as mere syntactic sugar, the clarity
improvement that said sugar provides is significant.

Aye, but they could have been so much more.

Regards,

Patrick
 
T

Tor Iver Wilhelmsen

Christopher Benson-Manica said:
Surely the 1.5 generics can be said to at least "vaguely resemble" C++
templates?

Not formally - only if Sun foolishly had chosen to implement "real"
templating by synthesizing classes when you used generics, e.g. if use
of

ArrayList<String>

was turned into a class called

ArrayList_Generic$String

or some such nonsense.

There are fundamental differences between how the features
are implmented in each language, but they enable similar design
decisions to be made.

Yes, but the Java solution prevents foot-shooting.
 
F

Fab

Very thanks for the help and for the interesting answers !

Chris warns me about the difficulty of learning Java. Good thing :
must think about this. At first glance Java seemed to have very similar
syntax than C++. The main differences I was able to notice was the
simple inheritance and the non overloaded operators. I now realize it
is more deeply different.

About Spirit substitute, I see antlr is available as a package for my
debian. I discover it can work for Java and C++ too.

About BGL replacement :
Here is what it can do :
http://www.boost.org/libs/graph/doc/
a lot indeed but its pretty painful interface gives me headeache.
(So I wrap it for an easier use :
http://fabrice.marchant.free.fr/INFO/wrappers/DiGraphPtr.h
)
Looking at Jgraph. (There is a free version too). It seems indeed to
give "... more support for tasks like drawing things ... than for
mathematical operations."

Christopher wrote :
"...I could definitely believe that the code rampantly abuses
templates, ..."
Yes, your absolutely right. When you launch a compile including these
big templates (not really a lib then) on an old PII 350 MHz, you can go
and fetch a beer... But the executable is fast.

( More, maybe you would consider this as an abuse : the Boost MPL,
template metaprogramming give you the possibility to compute absolutely
all you want by the compiler ! Submit it a chess position and it 'll
solve mate at compile time. )

( I snip experts discussion about generic programming in Java. )

Jeffrey wrote :
"C++ provides excellent memory management via its standard library.
Post in comp.lang.c++.moderated."
I did post in comp.lang.c++. Moderated forum would be better for this
question ?

"IMO, automatic garbage collection would be a really bad reason to
choose Java over C++, especially since I prefer C++-style resource
management."
The interpreter I try to write is for a new Lisp-like language.
Mc Carthy created Lisp and Garbage Collection too ( I think ).
I do need GC too cos the project is a functional language that causes
cyclic references and it would be very hard to explicitely unallocate
the forms that are generated at each evaluation.

Here is an excerpt from the agressive book, The UNIX-Haters :

"C++ users, alas, are forced to pick up their garbage manually. Many
have been brainwashed into thinking that somehow this is more efficient
than using something written by experts especially for the platform
they use.
These same people probably prefer to create disk files by asking for
platter, track, and sector numbers instead of by name.
...a study comparing performance of programmer-optimized memory
management techniques in C versus using a standard garbage collector.
C programmers get significantly worse performance by rolling their own.
OK, suppose you are one of those enlightened C++ programmers who wants
a garbage collector portable and reusable..."

I agree with you in the way it would be very cleaner to be able to
accurately free all we have created. Relying on a GC seems at best
blurred.
However do not know how to avoid this.

"...If your interpreted language can reasonably be parsed with regular
expressions and parse trees, you might not even need a separate library
for it."

Joel de Guzman (Spirit) wrote : "True, there are tools such as
regular-expression libraries (such as boost regex) or scanners (such as
boost tokenizer), but these tools do not scale well when we need to
write more elaborate parsers. Attempting to write even a
moderately-complex parser using these tools leads to code that is hard
to understand and maintain. "

I wrote the parse part of my interpreter project using only the
Standard Template Library and the code is complicated, ugly...

Then I discovered Spirit, I was amazed about the clarity it provides.
Moreover, you program in BNF-like C++ directly ! No extra translation
step required.

Thank you again Jeffrey for your interesting comparison between the
languages.

Please, is there somewhere a classified list of the numerous Java
libraries (with a concise description) that we could browse by category
?

Regards

Fabrice
 
C

Chris Uppal

Christopher Benson-Manica wrote:

[me:]
Surely the 1.5 generics can be said to at least "vaguely resemble" C++
templates?

Not really, at least not as I see it. The two language features are completely
different (apart from superficial syntax); the "similarity" is only that there
is an overlap in what they are used /for/.

C++ templates are a way of doing real macros (or "template metaprogramming|" as
they like to call it). A strange, limited, and clunky feature -- or rather, an
elegant and enormously powerful idea given a strange, limited, and clunky
expression in C++. One of the many potential applications for that feature is
in creating type-safe collections.

Java's generics are a way of not having to write explicit casts. A strange,
limited, and clunky feature -- or rather, a moderately useful idea given a
strange, limited, and clunky expression in Java. One of the few potential
applications for that feature is in creating type-safe collections.

You may not agree (and especially, you may not agree with my value judgements),
but I hope you see what I mean.

Note that the OP's questions were about template applications outside the
overlap with Java's generics.

-- chris
 
C

Chris Uppal

Fab said:
Looking at Jgraph. (There is a free version too). It seems indeed to
give "... more support for tasks like drawing things ... than for
mathematical operations."

Small point: not "JGraph", but "JGraphT" -- a different package with a
different purpose (and licensing).

Please, is there somewhere a classified list of the numerous Java
libraries (with a concise description) that we could browse by category
?

Not that I know of.

The Sun-supplied libraries are already extensive enough that they could do with
a better indexing system than is available. Large external libraries (such as
the stuff produced under the Apache Jakarta umbrella) are similarly in need of
better indexing. And there's masses of stuff out there which doesn't fall into
either class.

Sometimes you can find helpful lists on the Web by starting with the names of a
few related libraries, and then searching for pages that mention all of them.
I've just found
http://java-source.net/
using that technique -- it may help you find out more of what's available.

-- chris
 
C

Chris Uppal

Jeffrey said:
C++ provides excellent memory management via its standard library.

I'm not sure that I'd call it "excellent", but however you judge it, it is not
at all suitable for the OP's purposes. He will need proper garbage collection,
so it's a choice between using a system-supplied one, or writing your own /in/
C++.

But a word of warning: GC algorithms/implementation tend (naturally) to be
tuned to the allocation patterns of the languages for which they are intended.
The pattern of references between objects in Java may be very different from
those naturally arising in a functional language implementation -- and it might
also depend on the style of the implementation. (I remember a paper comparing
the Boehm conservative collector for C/C++ in an implementation of Lisp and a
combinator-machine implementation of some FP language -- it didn't work well at
all for the latter application, probably because the reference networks were
much bigger and more tangled).

In this case, I'd be a bit wary in case the JVM's collector assumed that chains
of object references were short enough to be followed by explicitly recursive
function calls. If so, then there /may/ be a risk that it would run out of
stack space trying to GC an object-network that arose from some styles of FP
implementation. It's easy enough to test -- just create a long chain of
objects (longer than would naturally arise in the FP language implementation)
and see if the collector breaks ;-)

class Cell { Cell next; }

Cell head = new Cell();
Cell tail = head;
for (long i = 0; i < LONGEST_PLAUSIBLE_CHAIN; i++)
{
tail.next = new Cell();
tail = tail.next;
}

head = tail = null;

// now wait until the GC has had a chance to crash...


Please note that I'm not saying that the GC in any particular JVM /would/ be
unsuitable for an FP language implementation, only that there /might/ be an
issue (depending on many factors, including the implementation strategy of the
FP language itself).

-- chris
 
J

Jeffrey Schwab

Chris said:
I'm not sure that I'd call it "excellent", but however you judge it, it is not
at all suitable for the OP's purposes. He will need proper garbage collection,
so it's a choice between using a system-supplied one, or writing your own /in/
C++.

I really don't understand why you would even think something like that.
I disagree entirely with your premise. What makes you think that
garbage collection is indispensable to the authoring of an interpreted
language? The most popular dynamic languages are written in C, so
clearly, GC is not strictly necessary. C++, in fact, has a fantastic -
I do call it excellent - model of resource management. It's not
"improper" garbage collection, or a partial GC implementation, or
anything like it.

Don't get me wrong; I'm not trying to advocate either C++ or Java over
the other. The two languages have different approaches, and I'm not
sure why you think one of those approaches is the only right answer. I
definitely feel that GC is an awful reason to choose Java over C++;
memory is only one kind of resource, and I prefer deterministic
destruction to GC and finally-blocks. The functionality of the Java
platform, the cleanliness of the syntax, and the portability of the
compiled code are all much better reasons to prefer Java.
 
J

Jeffrey Schwab

Fab said:
Very thanks for the help and for the interesting answers !

Chris warns me about the difficulty of learning Java. Good thing :
must think about this. At first glance Java seemed to have very similar
syntax than C++. The main differences I was able to notice was the
simple inheritance and the non overloaded operators. I now realize it
is more deeply different.

About Spirit substitute, I see antlr is available as a package for my
debian. I discover it can work for Java and C++ too.

About BGL replacement :
Here is what it can do :
http://www.boost.org/libs/graph/doc/
a lot indeed but its pretty painful interface gives me headeache.
(So I wrap it for an easier use :
http://fabrice.marchant.free.fr/INFO/wrappers/DiGraphPtr.h
)
Looking at Jgraph. (There is a free version too). It seems indeed to
give "... more support for tasks like drawing things ... than for
mathematical operations."

Christopher wrote :
"...I could definitely believe that the code rampantly abuses
templates, ..."
Yes, your absolutely right. When you launch a compile including these
big templates (not really a lib then) on an old PII 350 MHz, you can go
and fetch a beer... But the executable is fast.

( More, maybe you would consider this as an abuse : the Boost MPL,
template metaprogramming give you the possibility to compute absolutely
all you want by the compiler ! Submit it a chess position and it 'll
solve mate at compile time. )

( I snip experts discussion about generic programming in Java. )

Jeffrey wrote :
"C++ provides excellent memory management via its standard library.
Post in comp.lang.c++.moderated."
I did post in comp.lang.c++. Moderated forum would be better for this
question ?

Possibly. The posters are generally far more familiar with the
language, and more accomplished in their fields. Many of them have
strong affections for smart pointers, though, which are likely to be
their suggestion to you re. memory management; let me disagree a priori. :)

Is this your post in comp.lang.c++?

http://groups-beta.google.com/group...e95bb/ec868a8be396b6df?hl=en#ec868a8be396b6df

"I need to use a GC in order to program an
interpreter for a functional language. Memory
handling would be otherwise difficult too much."

Do you understand RAII and deterministic destruction? Once you do, you
will probably be far less nervous about using C++. Of course you if
you go with Java, this group is certainly a great resource, and the
level of discourse in comp.lang.java.programmer seems (believe it or
not) to be generally higher than comp.lang.c++. :)
"IMO, automatic garbage collection would be a really bad reason to
choose Java over C++, especially since I prefer C++-style resource
management."
The interpreter I try to write is for a new Lisp-like language.
Mc Carthy created Lisp and Garbage Collection too ( I think ).

So sayeth Wikipedia.
I do need GC too cos the project is a functional language that causes
cyclic references and it would be very hard to explicitely unallocate
the forms that are generated at each evaluation.

No, that point does not follow from your requirements. At any time,
each dynamically created object A should be owned by exactly one object
B. You can have all the circular references you like, as long as you
avoid circular ownership. Circular references have gained notoriety
because they don't get properly collected by reference-counted garbage
collectors, e.g. Perl's. They're not a problem in either a
well-designed C++ program or a typical, mark & sweep Java implementation.
Here is an excerpt from the agressive book, The UNIX-Haters :

"C++ users, alas, are forced to pick up their garbage manually. Many
have been brainwashed into thinking that somehow this is more efficient
than using something written by experts especially for the platform
they use.
These same people probably prefer to create disk files by asking for
platter, track, and sector numbers instead of by name.

Well, **** the UNIX-Haters, too. C++ does not force you to pick up your
garbage "manually." It does, however, force you to think (for programs
beyond a certain size) about object ownership and other relationships.
It supports you in this endeavor with probably the best static type
system of any language in popular use. You can use (almost) whatever
design techniques you find most natural: procedural, object-oriented,
or even functional; low-level or abstract. (On the down-side, some
techniques, e.g. aspect-oriented programming, do require a silly amount
of typing.)
...a study comparing performance of programmer-optimized memory
management techniques in C versus using a standard garbage collector.
C programmers get significantly worse performance by rolling their own.

Right on. You're unlikely to roll a GC that's better than the HotSpot
VM's. Remember, though, that C is a very different language from C++;
C++ gives you vastly better ways to manage memory and other resources.
OK, suppose you are one of those enlightened C++ programmers who wants
a garbage collector portable and reusable..."

I agree with you in the way it would be very cleaner to be able to
accurately free all we have created. Relying on a GC seems at best
blurred.
However do not know how to avoid this.

Post some details (and preferably a little code) in clc++m and watch the
magic fly. :)
"...If your interpreted language can reasonably be parsed with regular
expressions and parse trees, you might not even need a separate library
for it."

Joel de Guzman (Spirit) wrote : "True, there are tools such as
regular-expression libraries (such as boost regex) or scanners (such as
boost tokenizer), but these tools do not scale well when we need to
write more elaborate parsers.

Do you need an elaborate parser? How does your grammar look? Can you
post some sample code from your interpreted language?
Attempting to write even a
moderately-complex parser using these tools leads to code that is hard
to understand and maintain. "

That has not been my experience. The only interpreted languages I've
written that actually got used regularly by other people have been very
domain-specific, but the parsers were at least moderately complex, and
the code was straight-forward to maintain. My languages of preference
for parsers have been C++ (because it lets me use one language for the
whole application) and Perl. I don't doubt that one *can* write hideous
code using regular expressions and tokenizers, but saying so is like
pointing out that only one end of a hammer is any good for whacking nails.
I wrote the parse part of my interpreter project using only the
Standard Template Library and the code is complicated, ugly...

Don't blame the STL! The STL on its own is not meant to be a complete
solution to every problem, any more than Java Collections are.
Then I discovered Spirit, I was amazed about the clarity it provides.
Moreover, you program in BNF-like C++ directly ! No extra translation
step required.

Cool. I have not used Spirit. On the other hand, I don't usually
specify languages in BNF. :)
Thank you again Jeffrey for your interesting comparison between the
languages.

Please, is there somewhere a classified list of the numerous Java
libraries (with a concise description) that we could browse by category
?

No, there's not CPAN-like central repository. This group seems to be as
good a place as any to ask about specific libraries, though. You might
start here to get some ideas:

http://directory.google.com/Top/Science/Math/Combinatorics/Software/Graph_Drawing/
 
K

kevin cline

Christopher said:
Surely the 1.5 generics can be said to at least "vaguely resemble" C++
templates?

The only thing Java generics have incommon with C++ templates is their
shared use of angle brackets. C++ templates generate code. Java
templates are merely syntactic sugar to allow more precise compile-time
type checking.
 
C

Chris Uppal

Jeffrey Schwab wrote:

[me:]
I really don't understand why you would even think something like that.
I disagree entirely with your premise.

To be honest I cannot understand how /you/ could possible think that. I'm not
trying to be offensive, and I don't want to come across that way, but I can't
think of any milder way to express my bafflement.

The OP stated that he was interested in creating a Lisp-like interpreted
language. There are /very/ few interpreted languages which are not
garbage-collected (the only one /I/ can think is/was early PostScript); and he
also stated that he was interested in Java because it came with GC built-in.
So I'm assuming that he wants his language to feature garbage collection too.
I'll go further than that: he'd have to be insane to consider creating a
general-purpose interpreted language which /didn't/ have garbage collection.
(Special-purpose languages may not need or benefit from it -- I have written
ones which do and -- rather more often -- ones which don't.)

So, how can he implement GC for his language in C++ ? He /can't/ just expose
C++'s memory management (or lack of it) at the level of his language -- by
hypothesis. There is no way to implement what is intrinsically a global
operation ("is this memory used anywhere ?") using only local reasoning. But
C++'s resource management is entirely local. So he'd have to build a garbage
collector /in/ C++. It might be that some of the C++ idioms would help in
implementing that -- for instance one might represent "pointers into the heap"
on the C++ stack as some sort of smart pointer which automatically maintained
the necessary housekeeping information so that the global GC could find out
which stack slots held references into the heap. (Actually, I think that would
be a very bad implementation strategy, but only for performance reasons -- it
would work, but it'd be slow.)

Another option would be to use the Boehm Conservative Collector in the C++
program. I've used that once myself (for a small language implemented in C).
That has potential problems, and I would not recommend it unless, for some
reason, you /have/ to use C or C++ as the implementation language (in which
case it becomes a reasonable option to evaluate). Again, though, that
collector supplants the C++ resource management, replacing it with a global
operation.

The /real/ solutions (the ones that don't attempt to cut corners) would be
either to implement the target language in another language which does feature
proper memory management, or to create a /real/ GC implementation in C++. That
is perfectly possible (note that the GC wouldn't be managing C++'s memory, only
the memory of the target language). The problem is that it is difficult and
messy to create a decent implementation (for instance it affects /all/ the
other code). Which is one reason why so many of the "classic" interpreted
languages (Perl, Python, etc) feature half-arsed GC implementations, and indeed
the early JVMs from Sun featured laughably inadequate attempts at GC too.

Incidentally, you said (in a different post in this thread) something like (not
an exact quote) "in any well-designed C++ program, each dynamically allocated
object will have exactly one owner". I agree, but it's important to realise
that "well designed" here means only "feasibly implementable in C++" -- it is a
/restriction/ imposed by the language, not a feature of well-designed programs
in general. The effect of C++'s lack of automatic /global/ resource management
is to reduce the design space to only those designs in which a clear notion of
ownership can be identified. Not all designs have that property, and indeed, I
would say that not many /do/ have that property, and that for a designer to
have to consider /whether/ that property can be engineered into some design is
just a waste of that designer's time and effort.

It's probably obvious by now that I consider GC a necessity for "real"
general-purpose languages, and hence that I consider C++ and C only suitable
for special cases and very small programs. Writing a very high-performance
language interpreter (such as a modern JVM) can certainly be considered one
such special-case -- and C++ wouldn't at all be a bad choice /if/ the
implementation were to be so sophisticated that it needs to hit the metal hard.
For a less extreme implementation it seems to me that one would be better off
treating the problem as just another programming task (one where things like
abstraction, clarity, and an unrestricted design-space, are more important than
access to bare metal) and write the implementation in a real programming
language -- Lisp, Smalltalk, Haskell, or even Java...

But I've been wandering off-topic a bit. The real point is that, since C++
doesn't have global memory management, using it to implement a language which
/does/ have that is tricky at best.

-- chris
 
J

Jeffrey Schwab

Chris said:
Jeffrey Schwab wrote:

[me:]
I really don't understand why you would even think something like that.
I disagree entirely with your premise.

To be honest I cannot understand how /you/ could possible think that. I'm not
trying to be offensive, and I don't want to come across that way, but I can't
think of any milder way to express my bafflement.

Ditto completely.
The OP stated that he was interested in creating a Lisp-like interpreted
language. There are /very/ few interpreted languages which are not
garbage-collected (the only one /I/ can think is/was early PostScript);

The languages are garbage-collected, but the implementations are not.
Even Java's implementation is written in C++. Nobody's saying his
functional language shouldn't have GC.
and he
also stated that he was interested in Java because it came with GC built-in.
So I'm assuming that he wants his language to feature garbage collection too.

Right, because he thinks he needs it. My point was that he might not,
and that he probably shouldn't choose his language based on the presence
of absence of GC.
I'll go further than that: he'd have to be insane to consider creating a
general-purpose interpreted language which /didn't/ have garbage collection.

I guess that depends on the definition of "general-purpose," but you're
certainly correct that the popular dynamic languages have GC.
(Special-purpose languages may not need or benefit from it -- I have written
ones which do and -- rather more often -- ones which don't.)

So, how can he implement GC for his language in C++ ?

Using the same techniques as Ruby, Python, Java, Perl... Do you really
foresee Groovy or BeanShell replacing LAMP?
He /can't/ just expose
C++'s memory management (or lack of it) at the level of his language -- by
hypothesis.

C++ does not lack memory management. Please stop implying otherwise.
It's getting irritating.
There is no way to implement what is intrinsically a global
operation ("is this memory used anywhere ?") using only local reasoning. But
C++'s resource management is entirely local.

No, it's as local as it needs to be. It doesn't force you to use a big,
global collector, although the standard containers do use allocators
that share the same memory pool.
So he'd have to build a garbage
collector /in/ C++. It might be that some of the C++ idioms would help in
implementing that -- for instance one might represent "pointers into the heap"
on the C++ stack as some sort of smart pointer which automatically maintained
the necessary housekeeping information so that the global GC could find out
which stack slots held references into the heap. (Actually, I think that would
be a very bad implementation strategy, but only for performance reasons -- it
would work, but it'd be slow.)

He wouldn't have to write any such thing, because the standard library
includes smart pointer implementations, and various others are freely
available. Rather than pan the performance on theoretical grounds, you
might want to get some numbers from the real world.

FWIW, I agree that smart pointers are not the way to go. I'd rather
have either Java-style GC, or use smart Factories.
Another option would be to use the Boehm Conservative Collector in the C++
program. I've used that once myself (for a small language implemented in C).
That has potential problems, and I would not recommend it unless, for some
reason, you /have/ to use C or C++ as the implementation language (in which
case it becomes a reasonable option to evaluate). Again, though, that
collector supplants the C++ resource management, replacing it with a global
operation.

C is not C++. Really.

All allocators of a given type have to use the same storage pool in
order to be compatible with the STL. They typically use static pools,
so they actually are global. The STL containers put memory back into
the pools as soon as they're through with it, which may be either faster
or slower than waiting for them to be marked & swept, depending on the
situation.
The /real/ solutions (the ones that don't attempt to cut corners) would be
either to implement the target language in another language which does feature
proper memory management, or to create a /real/ GC implementation in C++. That
is perfectly possible (note that the GC wouldn't be managing C++'s memory, only
the memory of the target language). The problem is that it is difficult and
messy to create a decent implementation (for instance it affects /all/ the
other code). Which is one reason why so many of the "classic" interpreted
languages (Perl, Python, etc) feature half-arsed GC implementations, and indeed
the early JVMs from Sun featured laughably inadequate attempts at GC too.

Have you seriously had a problem with GC in Python? They're not
"half-arsed" implementations.
Incidentally, you said (in a different post in this thread) something like (not
an exact quote) "in any well-designed C++ program, each dynamically allocated
object will have exactly one owner". I agree, but it's important to realise
that "well designed" here means only "feasibly implementable in C++" -- it is a
/restriction/ imposed by the language, not a feature of well-designed programs
in general.

We disagree again. You've effectively said that a certain type of GC is
critical, but that only Java (and I assume you acknowledge C#) provides
it. I say that deterministic destruction is a reasonable alternative.
The fact that C++ has better support for it than other languages does
not mean the language is more restrictive.
The effect of C++'s lack of automatic /global/ resource management
is to reduce the design space to only those designs in which a clear notion of
ownership can be identified. Not all designs have that property, and indeed, I
would say that not many /do/ have that property, and that for a designer to
have to consider /whether/ that property can be engineered into some design is
just a waste of that designer's time and effort.

I think we're viewing the world from very different angles. Determining
the hierarchy ownership has proven one of the most valuable techniques
at my disposal.
It's probably obvious by now that I consider GC a necessity for "real"
general-purpose languages, and hence that I consider C++ and C only suitable
for special cases and very small programs.

Why do you keep mentioning C?

It sounds like only Java has a GC you approve, ergo you consider only
Java a "real" general-purpose language. I suppose you might make an
allowance for C#.
Writing a very high-performance
language interpreter (such as a modern JVM) can certainly be considered one
such special-case -- and C++ wouldn't at all be a bad choice /if/ the
implementation were to be so sophisticated that it needs to hit the metal hard.
For a less extreme implementation it seems to me that one would be better off
treating the problem as just another programming task (one where things like
abstraction, clarity, and an unrestricted design-space, are more important than
access to bare metal) and write the implementation in a real programming
language -- Lisp, Smalltalk, Haskell, or even Java...

None of which are (except Smalltalk) are canonically implemented in
languages with native GC.
But I've been wandering off-topic a bit. The real point is that, since C++
doesn't have global memory management, using it to implement a language which
/does/ have that is tricky at best.

You're wrong. :)
 
C

Chris Smith

Pardon for jumping in...

Jeffrey Schwab said:
The languages are garbage-collected, but the implementations are not.
Even Java's implementation is written in C++. Nobody's saying his
functional language shouldn't have GC.

Right. Machine code is not garbage collected, and there exist garbage
collected interpreted languages, ergo it is possible to implement a
garbage collected language interpreter in a non-collected language.
That doesn't mean it is easy. In fact, I will go ahead and assert,
without explicit evidence, that it is among the most difficult things to
do well, throughout all of software development. I feel comfortable
asserting this because I've done it, and I've done most other things,
and I can compare my experience in the two.

There are ways to make it easier, of course. You can give up on certain
performance implementation techniques, for example by sticking with a
single kind of collector, and avoiding generational implementation. You
can give up on correct implementation and document the deficiencies, as
happens with both conservative collectors and practically all reference
counting implementations. But then you have to deal with the
deficiencies, and it's still not exactly easy unless you go the ref-
counting route.
Right, because he thinks he needs it. My point was that he might not,
and that he probably shouldn't choose his language based on the presence
of absence of GC.

I'd definitely think that if one can save perhaps six months of hard
development work, and get a first-class garbage collector in the
bargain, by choosing a certain implementation language, then that's a
pretty decent reason for the decision. Perhaps there are other factors
that would outweigh that advantage, but they'd have to be pretty big
factors.
C++ does not lack memory management. Please stop implying otherwise.
It's getting irritating.

C++ certainly lacks automatic memory management, resulting in
programmers doing a lot of the work that would be considered memory
management in a garbage collected language. You can be as picky as you
like about the words; but there's something being said there whether you
like the wording or not. Feel free to propose new wording, and I'm sure
people will be happy to consider it.
No, it's as local as it needs to be.

If we're talking about memory management at the C++ level, the language
specification makes it well-nigh impossible to provide a real garbage-
collected implementation of C++. The culprit is that there's too much
specified behavior involving memory layout and pointers. If more of
that kind of thing resulted in undefined behavior, then it may be
possible to provide a garbage-collected implementation; but that's not
the case.

Of course, none of that applies to the target interpreted language; but
you certainly can't push memory management back onto the C++ manual heap
in the same way you can in Java or any other garbage collected language.
A reasonable-performance garbage collector would require that the
interpreter allocate large blocks of memory for itself, probably using a
system call like sbrk rather than any C++ language concept, and then
provide its own memory management with that. Memory that's used by the
interpreted language could not be allocated from the standard C++ heap,
whereas that could easily be done in Java.
Rather than pan the performance on theoretical grounds, you
might want to get some numbers from the real world.

If you're referring to the statement that reference counting would yield
poor performance, that is well-documented enough throughout the gc
literature that there's no reason to ask anyone to test it yet again
every time they want to make the claim.
Have you seriously had a problem with GC in Python? They're not
"half-arsed" implementations.

I haven't used Python enough to know whether the problems are serious or
not. From reading documentation, though, it appears Python has a
correct garbage collector with poorer than needed performance. That may
be fine for many applications, and not for others. It also appears to
have taken them a while to finish (they had a broken garbage collector
until version 2.0). It doesn't appear to be buggy.

That being said, though, Python is a pretty big project with a lot of
developers. Things that are feasible for Python are not necessarily
feasible in other contexts. (Actually, I'm surprised that Python
doesn't have a better collector by now -- one that combines different GC
techniques according to what's best for the specific generation -- but I
suppose there may be enough overhead elsewhere that gc overhead doesn't
matter so much.
We disagree again. You've effectively said that a certain type of GC is
critical, but that only Java (and I assume you acknowledge C#) provides
it. I say that deterministic destruction is a reasonable alternative.

While being careful not to speak for Chris Uppal, I'd suggest it's very
likely that when he talks about a real garbage collector, he means one
that:

(a) doesn't sacrifice performance unnecessarily
(b) is correct (not conservative, collects all garbages)

Garbage collectors that provide deterministic destruction are okay for
very limited applications, but they have performance disadvantages that
prevent their being used for most applications. In most cases where
deterministic destruction is needed, the application should not be
implemented with a garbage collected heap at all.
The fact that C++ has better support for it than other languages does
not mean the language is more restrictive.

If you mean that a non-garbage collected language such as C++ is
sufficient for general purpose scripting, then you are somewhat outside
of the mainstream, and near-universal practice disagrees with you.
I think we're viewing the world from very different angles. Determining
the hierarchy ownership has proven one of the most valuable techniques
at my disposal.

It is provable (see work by Hans Boehm) that there are problems for
which solutions satisfying the need for an explicit free operation
require asymptotically greater running time than the equivalent
implementation in a deferred garbage collector. It is also provable
(see Andrew Appel) that the same is not true in the other direction.
This is not a matter of perspective. If you find assigning ownership
helpful to your design, you may continue to do it in any language; but
not needing to do so when it makes efficient solutions impossible is
objectively beneficial.
It sounds like only Java has a GC you approve, ergo you consider only
Java a "real" general-purpose language. I suppose you might make an
allowance for C#.

Huh? You are reading something into Chris's responses that isn't
actually there.
 
C

Chris Uppal

Chris said:
Pardon for jumping in...

Of course ;-)

It is provable (see work by Hans Boehm) that there are problems for
which solutions satisfying the need for an explicit free operation
require asymptotically greater running time than the equivalent
implementation in a deferred garbage collector. It is also provable
(see Andrew Appel) that the same is not true in the other direction.

Do you have references for that ? (I'm not disputing you -- indeed, something
like it seems intuitively obvious -- but I'd like to see the original papers.)

Huh? You are reading something into Chris's responses that isn't
actually there.

<nods> I assumed that Jeffrey had forgotten that [I knew that?] there are
several (as I would call it) "real" languages, designed /and/ implemented for
programming in-the-large. Java, at best[*], is only one of them.

-- chris

[*] I may as well repeat that I don't think Java is particularly
well-designed -- for programming at any scale -- but my reservations have
nothing to do with the quality of implementation of Sun's recent JVMs.
 
C

Chris Smith

Chris Uppal said:
Do you have references for that ? (I'm not disputing you -- indeed, something
like it seems intuitively obvious -- but I'd like to see the original papers.)

Regarding the first, I have nothing more specific that what I originally
wrote, which is that Hans Boehm is responsible for it. I looked, but
was unable to find a reference for it. The paper gave a specific
problem -- IIRC something to do with string processing -- that can't be
done in better than O(n^2) time if you need to free the memory
explicitly, but which uses O(n) time as part of a larger application
with a copying garbage collector. Essentially, tracking when a piece of
memory needs to be freed is quadratic, but the core algorithm is linear.

Andrew Appel is ultimately responsible for the second, though it's been
refined over time. The original argument is at
http://citeseer.ist.psu.edu/appel87garbage.html. I think it's fair to
say that the consensus is that there is no generalized justification for
the "faster" statement of the original paper, which depends on the
details of the machine architecture and isn't an asymptotic statement at
all; but "asymptotically at least as fast" definitely does follow.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,189
Members
46,735
Latest member
HikmatRamazanov

Latest Threads

Top