New C++ garbage collector

Keith H Duggar

GC exists to give objects of dynamic storage a well-defined "destroyed
and deallocated" state. What is so bad about that? Real people often
have to maintain ugly C++ code bases. It is pointless to mention that a
good programmer does not write ugly things. Where do we get these
"good" programmers? Check reality. The majority of C++ code on our
planet is written by below-average programmers.

The problem is that answering that question requires detailed
and clear thinking exploring all ramifications. Such analysis is
far beyond what the majority of posters bring to the table in a
casual unmoderated forum such as this.

Fortunately, there are people willing to invest the required work
to clearly think through and to discuss the issues more completely
in a moderated forum. Here is a place to start:

http://groups.google.com/group/comp.lang.c++.moderated/msg/35b67606d4c81fa2

to understand the problems with zombie states. But you should
read that entire thread which, though deceptively named, is about
GC in C++. If you don't want to read the whole thread, you should
at least pay very careful attention to the points David Abrahams
makes.

KHD
 
Öö Tiib

The problem is that answering that question requires detailed
and clear thinking exploring all ramifications. Such analysis is
far beyond what the majority of posters bring to the table in a
casual unmoderated forum such as this.

Fortunately, there are people willing to invest the required work
to clearly think through and to discuss the issues more completely
in a moderated forum. Here is a place to start:

   http://groups.google.com/group/comp.lang.c++.moderated/msg/35b67606d4...

to understand the problems with zombie states. But you should
read that entire thread which, though deceptively named, is about
GC in C++. If you don't want to read the whole thread, you should
at least pay very careful attention to the points David Abrahams
makes.

I have read plenty of such discussions.

1) No one argues with the fact that GC adds well-defined and
detectable corpse state for detecting dangling raw pointer
dereference.

2) I do not argue that it is far from what is GC in C# or Java. Call
it something else I don't care.

3) No one disputes that, without GC, dereferencing a dangling raw
pointer is the hardest programming error to detect, both with static
analysis and at run time.

4) I do not claim that there are a lot of cases where raw pointers are
needed in modern C++. There is a huge pile of templated containers plus
shared_ptr, scoped_ptr, unique_ptr, auto_ptr, intrusive_ptr, cow_ptr;
just describe what you need and it is there. A C++ application with
zero raw pointers is therefore very possible.

5) Because of 4), GC as a core language feature would be a waste. It is
fine as an option/library only for cases where raw pointers are used/
taken.

6) Classes to which raw pointers *are never* taken do not need to be
run-time checked for zombieness. A lack of such checks in classes to
which raw pointers *are* taken is statically analyzable. Such checks
may even help without GC on some implementations (even though it is UB
then).

7) I haven't seen anyone saying that he loves weak_ptr. No wonder. It
is ugly.

8) A weak_ptr and shared_ptr combo can in a lot of cases be refactored
back to a raw pointer with GC. That gains performance ... or at least my
profiling attempts show so. The checks have to be there with weak_ptr
anyway. shared_ptr is fine in a lot of cases but does not deserve
any silver-bullet-dance-festivals either; it has a refcount to manage
plus an additional indirection, so it costs.

9) GC makes necromancy possible, but am I a necromancer? No,
necromancy is evil black magic, and it is easier to detect than a
dangling pointer.

10) People who are making money by consulting and training C++ (like
Dave Abrahams) certainly love that C++ is such a pile of gotchas and
undefined behaviors from top to bottom. People who write software have
not entirely overlapping goals with them.
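
A minimal sketch of the run-time check that point 8 says has to be there with weak_ptr anyway. The Sensor type and both function names are hypothetical, invented only for illustration. The sketch shows the weak_ptr side only: the destroyed state is detected through lock() instead of dereferencing a dangling raw pointer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload type, used only for this illustration.
struct Sensor {
    int reading = 42;
};

// Returns the reading if the observed object is still alive, or -1 if it
// has already been destroyed. lock() is the run-time check.
int read_or_default(const std::weak_ptr<Sensor>& observer) {
    if (std::shared_ptr<Sensor> alive = observer.lock()) {
        return alive->reading;
    }
    return -1;  // destroyed object detected, no undefined behaviour
}

// Reads once while the owner exists and once after it is gone.
bool weak_ptr_detects_destruction() {
    std::shared_ptr<Sensor> owner = std::make_shared<Sensor>();
    std::weak_ptr<Sensor> observer = owner;

    const int before = read_or_default(observer);  // object alive
    owner.reset();                                 // last owner goes away
    const int after = read_or_default(observer);   // object destroyed

    assert(observer.expired());
    return before == 42 && after == -1;
}
```

Each lock() touches the reference count, which is part of the cost that point 8 refers to.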
 
Dilip

My view is that garbage collected C++ sucks, I have yet to be convinced
of the validity of an alternative view. :)

And you won't be because the moment someone disagrees with you, you
will probably start questioning their manhood, their lineage and what
not. As a passive observer I find your posts needlessly arrogant and
reeking of a general my-way-or-the-highway mentality that adds zero
value to any discussion you participate in.

I don't know why long-timers like James Kanze even bother to engage
you.
 
Andy Venikov

That's a blatant contradiction: an object with an arbitrary
lifetime has an arbitrary lifetime, not one controlled by some
smart pointer. In many applications, almost all instances of
delete are "delete this". (And as far as I know, "this" is
never a smart pointer.)
<snip>

Ok, I have nothing to add here, but I think a small clarification may
help this discussion.

Leigh, do you mean to say that you're talking about objects with
arbitrary BUT *deterministic* lifetime? In that case it makes sense and
I don't see any contradiction here.

Andy.
 
Andy Venikov

<snip>

Ok, I have nothing to add here, but I think a small clarification may
help this discussion.

Leigh, do you mean to say that you're talking about objects with
arbitrary BUT *deterministic* lifetime? In that case it makes sense and
I don't see any contradiction here.

Andy.

Ok, I just read your other post. It looks like that's exactly what you mean.
 
James Kanze

On Nov 2, 2:40 pm, Keith H Duggar wrote:

[...]
4) I do not claim that there are a lot of cases where raw pointers are
needed in modern C++. There is a huge pile of templated containers plus
shared_ptr, scoped_ptr, unique_ptr, auto_ptr, intrusive_ptr, cow_ptr;
just describe what you need and it is there. A C++ application with
zero raw pointers is therefore very possible.

Not really. None of the above smart pointers handle navigation,
for example (the major use of pointers on most of the systems
I have worked on in the past), and none are practical for
graphs.
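
One way to see why owning smart pointers are awkward for graphs is a two-node cycle built with shared_ptr. This is only a sketch; the Node type, the live_nodes counter and the function name are hypothetical. It counts how many nodes stay alive after every external handle is gone, then breaks the cycle by hand so the sketch itself does not leak.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0;  // number of Node objects currently alive

// Hypothetical graph node with an owning link to a neighbour.
struct Node {
    std::shared_ptr<Node> next;
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Builds the cycle a -> b -> a, drops both handles, and returns how many
// nodes are still alive although nothing outside the cycle owns them.
int nodes_stranded_by_cycle() {
    const int before = live_nodes;
    std::weak_ptr<Node> probe;  // non-owning handle kept for the cleanup
    {
        std::shared_ptr<Node> a = std::make_shared<Node>();
        std::shared_ptr<Node> b = std::make_shared<Node>();
        a->next = b;
        b->next = a;  // closes the cycle
        probe = a;
    }  // a and b go out of scope, but each node still owns the other

    const int stranded = live_nodes - before;

    // Break the cycle manually so that both nodes are finally destroyed.
    if (std::shared_ptr<Node> a = probe.lock()) {
        a->next.reset();
    }
    assert(live_nodes == before);
    return stranded;
}
```

Reference counting alone never reclaims the two nodes; a tracing collector, or plain raw pointers with an explicit owner, does not have this particular problem.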
5) Because of 4), GC as a core language feature would be a waste.

Except for the many applications which have intelligent objects,
and need to navigate between them.

Formally, GC must be a core language feature in order to work;
it places certain constraints on how the compiler can generate
code. (In practice, actual compilers for common architectures
don't violate those constraints, so the Boehm collector works.
Until it doesn't, because some future version of the compiler
violates those constraints.)
 
James Kanze

On 02/11/2010 18:31, Paavo Helde wrote:

No. You're the one who is trying to pretend that the two are
exclusive, when all of the users of GC in C++ I know use both.
The problem is that RAII requires destructors to be run; when
exactly do you run the destructor for a garbage collected
object?

You are also the one who doesn't bother reading other people's
postings, and prefers to invent strawmen to argue against.
Unlike memory some resources can not be freed arbitrarily when
the garbage collector feels like running. If you have to
explicitly call a cleanup method to close resources then why
bother with a garbage collector at all?

Because it handles one particular problem better than anything
else.

[...]
My view is that garbage collected C++ sucks, I have yet to be
convinced of the validity of an alternative view. :)

My view is that your view is nothing but prejudice, unsupported
by any real experience or sound arguments. (At least, you've
not presented any here.)
 
James Kanze

James Kanze wrote:

[...]
I am afraid I don't see them as such. I can agree that it is
possible for them to coexist but *completely* orthogonal would
require them to have no effect on one another whatsoever.

Which is largely the case. RAII depends on object lifetime, by
giving objects specific behavior on their death. GC has nothing
to do with object lifetime.

Perhaps the problem is that GC has been "oversold" by Java
fanatics. Even in Java, you have to deal with lifetime of
object issues.

Another problem, however, is that too many C++ programmers tend
to insist that every object have a deterministic lifetime. Many
don't, and if memory is managed automatically, most don't.
Without memory management, most destructors would be empty.
Fair enough. It is not an inherent advantage to always treat
all resources the same. If both sides of the interface agree
that different resources are to be treated differently and
both sides of the interface are designed with this
understanding, this is perfectly fine. This works with Java.
However, when you design with RAII, all resources get treated
the same.

Only if you want them to. Memory is very special as a resource,
for a number of reasons.
Any service code that uses RAII will encapsulate all resources
and clean up in destructors. This is the way RAII code gets
designed.

RAII doesn't handle all resources. It only handles those whose
lifetime can be easily mapped to a scope, so that the resource
can be freed automatically when that scope ends. This isn't
always, or even often, the case of memory. (If the memory maps
to a scope, you just declare it in that scope, and be done with
it.)
If the service code then gets used by client code that uses
GC, everything breaks since the client is essentially breaking
the contract.

Why? If the contract says that destructors must be called, then
the client code calls the destructor. I don't see where the
problem is there.

Could you give an example? (Just the contract, and perhaps
a short explanation of why you think that garbage collection
would break it.)
Sorry, maybe I was not clear enough. I am talking from the
service code point of view. A class doesn't know how it will
be used so can't rely anymore on its destructor being run.

A class certainly should know something about how it will be
used. Something like complex knows that it won't be used to
manage a TCP connection, for example. I'm sure that that's not
what you meant, but I can't figure any other meaning to assign
to those words.

In an application, a class has a role and very specific
responsibilities. In order to perform correctly, it defines
a contract with the client code. For some classes (very few, in
my experience), that contract will require that the client code
"dispose" of them---in C++, we would say call the destructor;
in Java, the class will have a "dispose" function which must be
called, etc. In C++, we have one very big advantage over Java:
if the moment the "dispose" function must be called corresponds
to the moment the object goes out of scope, the compiler will
call the method (the destructor in C++) automatically. If it
doesn't, we still have to call it explicitly (delete operator).
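
A small sketch of the two cases described above, with a hypothetical Session class whose contract requires disposal. Disposal is automatic where it coincides with the end of a scope and explicit (via delete) where it does not. The class, the log string and the function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical class whose contract requires timely disposal.
class Session {
public:
    explicit Session(std::string& log) : log_(log) { log_ += "open;"; }
    ~Session() { log_ += "close;"; }  // the "dispose" step
private:
    std::string& log_;
};

// Records the order of events for one scoped and one dynamic Session.
std::string scoped_then_dynamic() {
    std::string log;
    {
        Session scoped(log);  // disposal coincides with the end of scope
    }                         // the compiler calls ~Session() here

    Session* dynamic = new Session(log);  // lifetime not tied to a scope
    log += "work;";
    delete dynamic;           // the client has to dispose explicitly
    return log;
}
```

Calling scoped_then_dynamic() yields "open;close;open;work;close;". Neither case depends on how the memory behind the object is eventually reclaimed.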

All of this has nothing to do with garbage collection, and
remains unchanged in the presence of garbage collection. With
one big exception: without garbage collection, a lot of classes
which wouldn't otherwise require "dispose" do require it, because
it's the last chance they have to free memory. Many (most) of my
classes, for example, have the contract that you must call the
destructor *if* (and only if) there is no garbage collection.
Client/User code that creates an object from a class is
perfectly capable of knowing if the destructor will be run or
not since it is in control of deciding to use GC for this
object or use deterministic lifetime management (new or auto).
However, the service/library/class code cannot know and cannot
rely on the destructor always being run anymore. At that
point, any class that cleans up in its destructor is broken and
should never be used by a GC client.

No.

What does change, I guess, is that the service must specify the
fact that it must be disposed/destructed as part of its
contract. But that's really the case today anyway, since the
requirement isn't normally just to be destructed "sometime", but
to be destructed in a timely manner (something like scoped_lock,
for example). Which means that some techniques of manual memory
management are excluded as well: you can't, for example, put the
object (or a pointer to the object) in a vector, and only clean
up when there are too many objects in the vector.
So what this means:
If you write a class that will be used by other code, you must
either:
- Make sure it will never be used by GC client code and then you can
use RAII.
- Not use RAII at all in case it gets used by GC client code.
- If you only use memory, you might be OK. Implement the class in a
RAII way but if the class gets used by a GC client, hopefully the GC
will also clean up the memory you used when the destructor does not get
run.

More specifically, you have to specify a contract, that the
client code has to respect. Nothing new there, and you have to
do that with or without garbage collection.
If you write code that uses other code, you must know how it
was implemented internally so that you know if you can use GC
or if you can use RAII. To me, this breaks the concepts of
encapsulation and implementation hiding. Unless you *know*
that a class only uses memory as a resource, you can't use it
with GC, but I do not believe that the user should know how the
internals of the class are implemented.

I think you're getting hung up on "resources". Classes have
responsibilities and behavior. Some classes have a very
definite end of lifetime, with specific behavior. Client code
must ensure that this is respected. Garbage collection changes
nothing in all this.
I am sorry, maybe the domains we work in are different, maybe
it's because for me it is very much normal to use resources
other than memory, maybe it's because I have not seen a large
application that uses both RAII and GC at the same time, but
what I see is added complexity due to the lack of certainty
and the need to know too much about the other side of the
interface.

Independently of the domain: if the software is well written,
all you need to know about the other side of the interface is
the contract it adheres to. Beyond that, I'm sure that there are
domains where many, or even most, objects do require timely
disposal/destruction, and their lifetimes regularly end at the
end of scope. Even in my applications, there are some:
typically, a Transaction will be allocated on the stack, for
example, and its destructor will perform a roll-back if
commit hasn't been called on it. You don't allocate
Transactions dynamically, and if for some reason you have to
(say because it is to be shared between two asynchronous
threads), then you do use something like shared_ptr.

In typical data servers, and in GUI applications, such objects
are the exception (and having to allocate them dynamically,
rather than on the stack, is even more exceptional). At least
in my experience.
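
The Transaction pattern mentioned above could be sketched roughly as follows. This is an assumption about its shape, not the code from any real application: the journal string stands in for a real data store, and run() is a hypothetical driver.

```cpp
#include <cassert>
#include <string>

// Stack-allocated transaction: rolls back in its destructor unless
// commit() was called first.
class Transaction {
public:
    explicit Transaction(std::string& journal)
        : journal_(journal), committed_(false) {
        journal_ += "begin;";
    }
    void commit() {
        journal_ += "commit;";
        committed_ = true;
    }
    ~Transaction() {
        if (!committed_) {
            journal_ += "rollback;";
        }
    }
    Transaction(const Transaction&) = delete;             // not copyable
    Transaction& operator=(const Transaction&) = delete;

private:
    std::string& journal_;  // stand-in for a real data store
    bool committed_;
};

// Runs one transaction and returns what was recorded in the journal.
std::string run(bool succeed) {
    std::string journal;
    {
        Transaction tx(journal);
        if (succeed) {
            tx.commit();
        }
    }  // ~Transaction() runs here, on every path out of the scope
    return journal;
}
```

run(true) records "begin;commit;" and run(false) records "begin;rollback;". The roll-back depends on the destructor running at the end of the scope, which is a lifetime matter and does not involve the memory manager.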
 
Ian Collins

Are you saying that a garbage collector solves the problem of memory
leaks and dangling pointers better than RAII? If so you are deluded.
RAII's determinism trumps garbage collector and doesn't suffer from the
problem of when to free resources other than memory.

There you go again backing up James's straw man comment.

GC does indeed handle one particular problem (closures) better than
anything else. If your code uses them, GC can supplement RAII, it does
not conflict with it.

No one, especially James, has claimed that GC solves the problem of
memory leaks and dangling pointers better than RAII. It is simply
another tool in the box.
 
Matthias Meixner

On 02.11.2010 23:23, Paavo Helde wrote:
This is only possible if all objects are handled by the garbage
collector. In a mixed mode system where some objects are handled by GC
and some are not the destructor must be called since the object might
contain pointers to some non-managed objects and these have to be
cleaned up in the destructor.
 
Matthias Meixner

On 03.11.2010 17:25, Leigh Johnston wrote:
When using both RAII and garbage collection in C++ having to classify
all your classes as either garbage collectable or not garbage
collectable is an unreasonable requirement and has nothing to do with
one's logical program design; it is more an annoying, unnecessary

It is not required to classify _classes_ to be collectable or not as my
garbage collector demonstrates. The decision whether an object is
collectable or not is taken on _allocation_ of the objects. Therefore,
some objects may be handled by the collector whereas other objects of
the _same_ _class_ may be handled manually.

This is the same situation that you have with reference counting: The
objects do not need to know anything about reference counting. The same
holds here, except that mark and sweep is used instead of reference
counting.
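
The reference-counting half of that analogy can be sketched with standard facilities. The collector's own interface is not shown in this thread, so the sketch does not try to imitate it. Widget and the counter are hypothetical; the point is only that the management strategy is chosen where the object is allocated, and the class is unaware of it.

```cpp
#include <cassert>
#include <memory>

static int destroyed = 0;  // number of Widget destructors run so far

// Hypothetical class that knows nothing about how its instances are managed.
struct Widget {
    ~Widget() { ++destroyed; }
};

// Creates two objects of the same class under different management and
// returns how many of them were destroyed.
int destroy_both_ways() {
    const int before = destroyed;

    Widget* manual = new Widget;  // this instance is managed by hand
    {
        std::shared_ptr<Widget> counted(new Widget);  // this one is counted
    }  // last reference gone: destroyed automatically

    delete manual;  // destroyed explicitly
    return destroyed - before;
}
```

Both instances are destroyed, and Widget contains no code for either strategy.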
 
Bart van Ingen Schenau

James Kanze wrote:

    [...]
I am afraid I don't see them as such.  I can agree that it is
possible for them to coexist but *completely* orthogonal would
require them to have no effect on one another whatsoever.

Which is largely the case.  RAII depends on object lifetime, by
giving objects specific behavior on their death.  GC has nothing
to do with object lifetime.

Perhaps the problem is that GC has been "oversold" by Java
fanatics.  Even in Java, you have to deal with lifetime of
object issues.

Another problem, however, is that too many C++ programmers tend
to insist that every object have a deterministic lifetime.  Many
don't, and if memory is managed automatically, most don't.
Without memory management, most destructors would be empty.
<snip>

I think that, at least in this discussion, the two sides are not
talking with compatible starting points.
Many C++ developers start with the viewpoint that lifetime management
== memory management.
From the post that I respond to, I get the impression that for you,
James, lifetime management and memory management are not necessarily
connected. And GC is only about memory management and can coexist with
various lifetime management strategies on the objects contained within
that memory.

I can see how RAII and GC can coexist with that viewpoint, because
RAII is primarily about lifetime management (with the added bonus that
it can do memory management as an afterthought) and GC is about memory
management.

My biggest fear for the widespread use of GC is that people get lazy
in their lifetime management of memory-only objects ("what is the big
deal about calling the destructor? The GC will reclaim the memory
anyway."), which breaks horribly if a change to a sub-object (of a sub-
object of a sub-object) makes it an object that also uses non-memory
resources and makes the object require proper lifetime management.
This can have far-reaching ripple effects, in that the entire codebase
may have to be reviewed to ensure lifetime management is properly
applied in all cases where this sub-object is (directly or indirectly)
involved.

Bart v Ingen Schenau
 
James Kanze

On Nov 3, 4:56 pm, James Kanze wrote:

[...]
I think that, at least in this discussion, the two sides are not
talking with compatible starting points.
Many C++ developers start with the viewpoint that lifetime management
== memory management.
From the post that I respond to, I get the impression that for you,
James, lifetime management and memory management are not necessarily
connected. And GC is only about memory management and can coexist with
various lifetime management strategies on the objects contained within
that memory.

Exactly. That is my point. Memory management is distinct from
lifetime, in an absolute sense: the memory underlying an object
can continue to exist, and even be accessible as raw memory,
after the lifetime of an object has finished. The C++ standard
makes this distinction, but far too many programmers confuse the two.
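
The distinction can be made concrete with placement new, where the storage is obtained separately from the object constructed in it. A minimal sketch; Counter and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical type with a user-provided destructor.
struct Counter {
    int value;
    explicit Counter(int v) : value(v) {}
    ~Counter() {}
};

// Ends an object's lifetime while its storage stays valid and usable.
bool storage_outlives_object() {
    alignas(Counter) unsigned char storage[sizeof(Counter)];

    Counter* c = new (storage) Counter(7);  // object lifetime begins
    const int seen = c->value;
    c->~Counter();                          // object lifetime ends

    // The storage still exists and is accessible as raw bytes,
    // although no Counter object lives in it any more.
    std::memset(storage, 0, sizeof storage);
    return seen == 7 && storage[0] == 0;
}
```

Here the lifetime ends at the explicit destructor call, while the memory is reclaimed only when the enclosing function returns.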
I can see how RAII and GC can coexist with that viewpoint, because
RAII is primarily about lifetime management (with the added bonus that
it can do memory management as an afterthought) and GC is about memory
management.
My biggest fear for the widespread use of GC is that people get lazy
in their lifetime management of memory-only objects ("what is the big
deal about calling the destructor?

That is a real problem. It happens a lot in Java, for example.
But in my experience, such people are also careless about their
lifetime management without garbage collection.
The GC will reclaim the memory anyway."), which breaks
horribly if a change to a sub-object (of a sub- object of
a sub-object) makes it an object that also uses non-memory
resources and makes the object require proper lifetime
management.

This is a common objection, but I've yet to find a realistic
situation where it might occur. You can't just replace one
object type with another in code, without considering the
contract of the object in question.
This can have far-reaching ripple effects, in that the entire codebase
may have to be reviewed to ensure lifetime management is properly
applied in all cases where this sub-object is (directly or indirectly)
involved.

Some review would possibly be in order (although the situation
is not nearly as bad as when exceptions were introduced). In
fact, I don't recommend introducing garbage collection into working
projects; it's more something you should adopt when starting new projects.
 
Miles Bader

James Kanze said:
This is a common objection, but I've yet to find a realistic
situation where it might occur. You can't just replace one
object type with another in code, without considering the
contract of the object in question.

I've only been following this thread occasionally[*], but the whole
"don't call the constructor" position seems kind of absurd; even if you
only care about memory, surely non-gc-allocated memory resources are
common in C++, especially given an allocator like this that seems to
offer free mixing of allocation styles, and those will only be freed
properly if you run the constructor.

E.g., you use your "GC" allocator to allocate instances of X (cause in
your proggie it's annoying to track them), but X happens to use
std::vector for one of its fields...

-Miles


[*] 'cause it seems to have resulted in even more fanboy wanking than is
usual for c.l.c++ (I'd not have thought that's possible, but ...)
 
James Kanze

[...]
I've only been following this thread occasionally[*], but the whole
"don't call the constructor" position seems kind of absurd; even if you
only care about memory, surely non-gc-allocated memory resources are
common in C++,

Are they? In the programs I've worked on, they've been rather
rare, and in all cases, managed by special classes, which are
used in clearly delimited circumstances.
especially given an allocator like this that seems to offer
free mixing of allocation styles, and those will only be freed
properly if you run the constructor.

(I think by "constructor", you actually mean the destructor.
Constructors are always run.)
E.g., you use your "GC" allocator to allocate instances of X
(cause in your proggie it's annoying to track them), but X
happens to use std::vector for one of its fields...

I would not recommend a system where only some of the memory is
managed by the garbage collector. And the only resource
std::vector uses is memory, so there should be no problem
collecting objects which contain std::vector.
 
Miles Bader

James Kanze said:
I've only been following this thread occasionally[*], but the whole
"don't call the constructor" position seems kind of absurd; even if you
only care about memory, surely non-gc-allocated memory resources are
common in C++,

Are they? In the programs I've worked on, they've been rather
rare, and in all cases, managed by special classes, which are
used in clearly delimited circumstances.

Er, they're the _standard_ in normal C++... you know, what, "new Foo"
does... (what are you thinking of?)
(I think by "constructor", you actually mean the destructor.
Constructors are always run.)

Yes, thanks, I meant "destructor."
I would not recommend a system where only some of the memory is
managed by the garbage collector.

Why? Providing you run destructors from the GC, the additional memory
will be freed in either case. The GC won't have as much information as
a traditional all-or-nothing GC (it won't know about "secondary" outside
allocation), but presumably the user (of the GC) knows about this, and
the GC parameters reflect it.

[Anyway, "I would not recommend" is not an argument, of course...]
And the only resource
std::vector uses is memory, so there should be no problem
collecting objects which contain std::vector.

Yeah, but the point is that std::vector does its own allocations, which
don't automatically use the same allocator as the enclosing object.

So in order to avoid intrusive source modifications, it's convenient if
secondary allocations are handled transparently -- which they are, of
course, as long as you run the destructor...
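
A rough sketch of that point. A class-level operator new stands in for the "GC" allocator, and a counting allocator makes the vector's own allocations visible. All names here (X, CountingAllocator, Tally, tally_allocations) are hypothetical, and nothing in it is an actual collector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

static int pool_allocs = 0;    // calls to X::operator new ("GC" stand-in)
static int vector_blocks = 0;  // blocks currently held by the vector

// Minimal allocator that counts the blocks it has handed out.
template <class T>
struct CountingAllocator {
    typedef T value_type;
    CountingAllocator() {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++vector_blocks;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        --vector_blocks;
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return false;
}

struct X {
    std::vector<int, CountingAllocator<int> > items;  // secondary allocation
    static void* operator new(std::size_t size) {
        ++pool_allocs;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) { std::free(p); }
};

struct Tally {
    int pool_allocs;        // allocations that went through X::operator new
    int held_while_alive;   // vector blocks outstanding while X is alive
    int held_after_delete;  // vector blocks outstanding after ~X() has run
};

Tally tally_allocations() {
    const int pool_before = pool_allocs;
    const int blocks_before = vector_blocks;

    X* x = new X;           // the X object itself: X::operator new
    x->items.push_back(1);  // the element storage: the vector's allocator

    Tally t;
    t.pool_allocs = pool_allocs - pool_before;
    t.held_while_alive = vector_blocks - blocks_before;
    assert(t.held_while_alive >= 1);

    delete x;               // runs ~X(), which releases the vector's block
    t.held_after_delete = vector_blocks - blocks_before;
    return t;
}
```

Only the X object itself goes through X::operator new. The element storage is obtained and released by the vector's allocator, and the release happens in the destructor.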

-Miles
 
Keith H Duggar

I have read plenty of such discussions.

Great, you are aware of (at least some) of the issues. Then one
must wonder why you would pose such a naive question as "What is
so bad about [zombie states]?". Since clearly there are many bad
things about them and the issues involved are complex. Perhaps
that is why no one has yet proposed a fully consistent model for
combining C++ style deterministic destruction with GC.
1) No one argues with the fact that GC adds well-defined and
detectable corpse state for detecting dangling raw pointer
dereference.

And nobody argues that such zombie states introduce considerable
complexity into the language model /especially/ if, as you say
above, it is "detectable" at the language level.
2) I do not argue that it is far from what is GC in C# or Java.
Call it something else I don't care.

And where is the implementation of your imaginary "call it something
else" system? Where is the specification? The language model?
10) People who are making money by consulting and training C++ (like
Dave Abrahams) certainly love that C++ is such a pile of gotchas and
undefined behaviors from top to bottom. People who write software have
not entirely overlapping goals with them.

Fortunately we don't need to rely on bogus ad hominem arguments
such as above. Their /logic/ and reasoning are widely available
for anyone to /logically/ dispute.

KHD

PS. For those out there who ignorantly think ad hominem == insult,
Oo's point 10) above is a perfect example of an /actual/ ad hominem
argument. Personal attacks, insults, cursing, etc are usually /not/
ad hominem. They are usually just crap.

Rather, ad hominem is arguing that one should not believe the
arguments of a person because of personal qualities. In the example
above Oo argues one should dismiss Dave Abrahams (a highly logical,
intelligent man who expresses his points with exceptional clarity)
because he is a C++ consultant ... lmao.

Please learn this concept. It's just so dumb when somebody jumps
on any personal attack as "ad hominem". Total ignorance of logic.
 
Öö Tiib

I have read plenty of such discussions.

Great, you are aware of (at least some) of the issues. Then one
must wonder why you would pose such a naive question as "What is
so bad about [zombie states]?". Since clearly there are many bad
things about them and the issues involved are complex. Perhaps
that is why noone has yet proposed a fully consistent model for
combining C++ style deterministic destruction with GC.

Perhaps you took my question out of context and so classified me as
extremely naive. What I meant was that a "destroyed" state is better
than a "destroyed and reused for something else by the underlying
memory management" state. So ... if GC guarantees that the memory under
destroyed but still pointed-at objects is not reused, then I do not see
why it is bad.
And nobody argues that such zombie states introduce considerable
complexity into the language model /especially/ if, as you say
above, it is a "detectable" at the language level.

The complexity around these zombies is far worse currently. Reading
destroyed objects goes undetected on a lot of implementations.
Writing into not-yet-reused memory can be detected, but only with
special debugging options. Writing into already-reused memory
is again totally undetectable with most implementations. If the
language, for example, guarantees anything about dangling pointers
(thanks to GC), then how can it be worse or more complex?
And where is the implementation of your imaginary "call it something
else" system? Where is the specification? The language model?

I understand your question as "How can the language engine help?" If I
misunderstand, please clarify. I have tried the Boehm garbage
collector and it sort of ... worked well. It was like in the GC
languages, only that it did not have finalizers. I had to destroy
things to get the destructors called, but that GC did not forbid
explicit destruction and deallocation, so normal C++ worked OK. The
finalizers are something I actually do not find desirable. The issues
where I saw that the language could help:

1) Garbage collection scales well (better than without in some tests).
To gain even better performance from GC, the code has to be
tuned/optimized for garbage collection. Implementations can help
there.

2) To gain additional safety with pointers, some manual work has
to be done (this can be implemented as std templates, language
elements and features of the GC).

3) A garbage collector can naturally detect leaks, in most cases quite
soon after they happen. So it is a powerful tool for detecting leaks of
non-memory resources. A way to mark objects as holders of such
resources is missing from the language.
Fortunately we don't need to rely on bogus ad hominem arguments
such as above. Their /logic/ and reasoning are widely available
for anyone to /logically/ dispute.

Yes, you are correct. I love C++ as a flexible, elegant and powerful
language, but a lot of things in it are clearly too undefined or freaky
(bad). The experienced know their way around; the less experienced can
do hard-to-detect damage, and in some cases it is hard to teach how not
to. Dave Abrahams is a good specialist, intelligent and well respected,
but for some reason he wants one of the worst parts to stay bad. So my
ad hominem argument was trying to give a reason why he is so negative
there.

Bjarne Stroustrup has always been positive when discussing garbage
collection. I read somewhere a few years ago that he hopes to
see (at least optional) garbage collection as part of C++0x. While
the original architect of anything can be wrong on some point, this is
not one of those cases IMHO.
 
