Garbage collection in C++

J

James Kanze

"James Kanze" <[email protected]> wrote in message
[...]
That is, of course, the most obvious bullshit I've ever
seen. There are cases where garbage collection isn't
appropriate, and cases where it is necessary.
Humm.. Can you give an example or two of a scenario in which a
GC is absolutely required/necessary?

Off hand, no, but I think Hans Boehm had one.
For some reason, I can't seem to think of one off the top of
my head. What am I missing? Perhaps I misunderstand your use
of the word "necessary"...

Sort of. The statement wasn't meant to be taken as giving any
absolute limits, but rather simply to indicate that there is a
great range. (Of course, if you have limited resources to
develop the code, and garbage collection means you need to write
less lines of code---which it usually does---you could argue
that it is necessary to develop the code within budget.)
I have seen a couple of cases in which a GC covered up more
than a "simple memory leak". Simple example... What if the
memory leak was entangled in a logic error that sometimes
caused weird execution results. If the programmer was alerted
to the memory leak, perhaps he/she could have noticed the
logic error as well. Of course, this is highly contrived case.
However, it can occur.

Not with any of the organizations I've worked with. In the ones
which took quality seriously, code review and tests would have
detected the error. In the other ones, the leak likely would
have gone undetected.

Dangling pointers are a different issue. Garbage collection
allows reliable detection of dangling pointers to dynamically
allocated memory (but not to local variables); without garbage
collection, you can't detect the case reliably. And dangling
pointers are a known security risk for programs connecting to an
open network.
One bad thing about GC is that it sometimes pulls programmers
into a false sense of security,

Garbage collection is just a bunch of code. It doesn't "pull"
programmers in any way. Anthropomorphic analogies really don't
apply; code is not a living being.
and they end up creating a sloppy design. This is NOT the
fault of the GC, but of the lazy programmer.

Or the incompetent one. Or the mismanaged one. But programmers
don't need garbage collection to be lazy, incompetent or
mismanaged.
One can create efficiently engineered programs _and_ use a GC
at the same time.

Certainly. And that's why it should be an available option, to
be used as appropriate.
 
S

Stefan Ram

James Kanze said:
won't give you a saw, because you might cut yourself. C++ gives
you a chain saw, a rip saw, a band saw, and a couple of other

This reminds me of:

»Like Perl, C++ is a swiss army chainsaw of a programming
language. Unlike Perl, it's got all the blades
simultaneously and permanently cast in a fixed half-open
position. Don't turn it on.«

http://fare.livejournal.com/135078.html

My usual set of quotations regarding garbage collection
might be known to some readers:

»There were two versions of it, one in Lisp and one in
C++. The display subsystem of the Lisp version was faster.
There were various reasons, but an important one was GC:
the C++ code copied a lot of buffers because they got
passed around in fairly complex ways, so it could be quite
difficult to know when one could be deallocated. To avoid
that problem, the C++ programmers just copied. The Lisp
was GCed, so the Lisp programmers never had to worry about
it; they just passed the buffers around, which reduced
both memory use and CPU cycles spent copying.«

<[email protected]>

»A lot of us thought in the 1990s that the big battle would
be between procedural and object oriented programming, and
we thought that object oriented programming would provide
a big boost in programmer productivity. I thought that,
too. Some people still think that. It turns out we were
wrong. Object oriented programming is handy dandy, but
it's not really the productivity booster that was
promised. The real significant productivity advance we've
had in programming has been from languages which manage
memory for you automatically.«

http://www.joelonsoftware.com/articles/APIWar.html

»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations. The common code path
for new Object() in HotSpot 1.4.2 and later is
approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc
implementations in C require on average between 60 and 100
instructions per call (Detlefs, et. al.; see Resources).
And allocation performance is not a trivial component of
overall performance -- benchmarks show that many
real-world C and C++ programs, such as Perl and
Ghostscript, spend 20 to 30 percent of their total
execution time in malloc and free -- far more than the
allocation and garbage collection overhead of a healthy
Java application (Zorn; see Resources).«

http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html?ca=dgr-jw22JavaUrbanLegends

»Perhaps the most important realisation I had while developing
this critique is that high level languages are more important
to programming than object-orientation. That is, languages
which have the attribute that they remove the burden of
bookkeeping from the programmer to enhance maintainability and
flexibility are more significant than languages which just
add object-oriented features. While C++ adds object-orientation
to C, it fails in the more important attribute of being high
level. This greatly diminishes any benefits of the
object-oriented paradigm.«

http://burks.brighton.ac.uk/burks/pcinfo/progdocs/cppcrit/index005.htm
 
J

James Kanze

Attention! In practice the above is FALSE!

Apparently, then, you don't even know what garbage collection
does. Garbage collection manages memory. It does NOT manage
object lifetime. The two are different things (and the C++
standard very decisively keeps them separate).
If it were correct there would not have been the rather
lengthy discussion recently in comp.lang.c++.moderated among a
few of the world's foremost C++ experts regarding "zombie"
states etc.

(Don't be fooled by the topic. It was a Trojan Horse bearing
the gift of GC. For example see the subthread starting with
post 63. If the link fails search for "throwing default
constructors".)
When one tries to divorce, in practical terms, the concepts of
object "lifetime" and object "storage" the can opens and
spills worms all over your language model. Since you did not
post in that recent thread I'm not sure what your solution for
"destroyed but not deallocated"
is. If you have one please post it so that Andrei and the like
can employ (or at least research) it.

It's not a recent topic, and Andrei and I have discussed it in
newsgroups in the past, and unless his position has radically
changed, we more or less agree.

But I'm not sure what your question is. It's obvious that in
this case, garbage collection is necessary to protect against
and detect using dangling pointers. The idiom works because
with garbage collection, memory management is decoupled from
object lifetime; because even though the object has ceased to
exist, the memory behind it cannot be reused as long as anyone
has a pointer to it.
 
Z

zr

Attention! Garbage collection does NOT manage the lifetime of
objects.  It only manages memory.  It's generally useful in
multithreaded environments for performance reasons, but it
doesn't address issues as to which thread has the right to
access which objects when.  That problem is completely
orthogonal to how you manage memory (unless you're writing the
memory management code yourself, of course).
How can GC improve the performance of a multithreaded application? Can
you show a convincing example?
 
M

Matthias Buelow

Keith said:
Yes. However, garbage collection is /only/ going to reclaim
memory, eventually. It's not going to correct the logical and
potentially far more serious design bugs that leaked memory
in the first place.

Yes, GC does not correct any bugs, you're right about that ;). However,
there are many situations where having to do manual deallocation is just
additional work without any benefit. For example, if you're doing a lot
of tree handling, building trees, throwing them away again etc., you
don't really want to traverse the tree to free all the elements if you
throw it away. Much more convenient to simply "forget" the pointer to
the tree and let automatic memory management wipe it up.
(This is a bit relativized if you have a situation where you need to
have destructors run in tree nodes; in general, the concepts of
destructors and GC don't really work well together, and imho,
destructors are rather an ugly add-hoc hack that stems from the lack of
automatic memory management and because of that unfortunately has firmly
established itself in C++...)
 
M

Matthias Buelow

Juha said:
By this I mean that since there's no need for
objects to have (memory handling) responsibilities, it easily leads to
the programmer not creating such objects at all,

Isn't that a good thing? The best code is the one you don't have to
write. Down with bureaucracy. Creating objects just to keep track of
memory dependencies is... inane?
which in turn leads to
a more imperative style of programming, rather than a more modular style.

Imperative programming and modular programming are orthogonal concepts.
 
M

Matthias Buelow

zr said:
How can GC improve the performance of a multithreaded application? Can
you show a convincing example?

Many of today's programs (OO or not) have a high
allocation(/deallocation) rate. This means that most objects become
garbage quickly. With many GCs, their collection time is a function of
the amount of live data, that is, they don't have to traverse the
garbage at all. This means in total, in such a situation, there's less
work to do than in a manual deallocation scheme, where all the garbage
is being traversed aswell.
I don't know if multithreading is any factor here; with manual schemes,
you have to do synchronization, too.
 
Z

zr

Many of today's programs (OO or not) have a high
allocation(/deallocation) rate. This means that most objects become
garbage quickly. With many GCs, their collection time is a function of
the amount of live data, that is, they don't have to traverse the
garbage at all. This means in total, in such a situation, there's less
work to do than in a manual deallocation scheme, where all the garbage
is being traversed aswell.
I don't know if multithreading is any factor here; with manual schemes,
you have to do synchronization, too.

But even if in these (rare/common) cases where GC can improve
performance over manual deallocation, isn't there a better solution
which would give better performance than GC? Is performance a real
consideration for adding GC to the language?
 
M

Matthias Buelow

zr said:
But even if in these (rare/common) cases where GC can improve
performance over manual deallocation, isn't there a better solution
which would give better performance than GC? Is performance a real
consideration for adding GC to the language?

No, convenience is. If it increases performance, all the better.
 
A

anon

Chris said:
Humm.. Can you give an example or two of a scenario in which a GC is
absolutely required/necessary? For some reason, I can't seem to think of
one off the top of my head. What am I missing? Perhaps I misunderstand
your use of the word "necessary"...

1) without gc

class A
{
};

A* foo()
{
A *a = NULL;
A *b = NULL;

a = new A;
try
{
b = new A;
}
catch(...)
{
delete(a);
}

// do something

return a;
}

2) with GC
class A
{
};

A* foo()
{
std::auto_ptr< A > a( new A );
std::auto_ptr< A > b( new A );

// do something

return a.release();
}


Would this do?


Another example:

class X
{
public:
X()
{
throw 1111;
}
};

class A
{
public:
A() : a( new int ), b( new X )
{
}
// memory leak when b throws

int *a;
X *b;
};

vs

class A
{
public:
A() : a( new int ), b( new X )
{
}
// OK

std::auto_ptr< int > a;
std::auto_ptr< X > b;
};
 
C

Chris M. Thomasson

Matthias Buelow said:
Yes, GC does not correct any bugs, you're right about that ;). However,
there are many situations where having to do manual deallocation is just
additional work without any benefit. For example, if you're doing a lot
of tree handling, building trees, throwing them away again etc., you
don't really want to traverse the tree to free all the elements if you
throw it away. Much more convenient to simply "forget" the pointer to
the tree and let automatic memory management wipe it up.

I would build a node cache for the trees; why free the memory when you can
reuse some of it? You can do this in a GC environment, but it requires you
to traverse a "portion" of the tree.

Also, why not use a __partitioned__ region allocation algorithm for the
nodes in different trees? That way, a single call to `std::free()' will
promptly destroy all the memory which makes up a given trees nodes in a
single operation. AFAICT, that's more efficient than a GC. Region allocation
is not complicated and drastically amortizes the number of times you need to
call free.

Generally, I find that "clever" __partitioned__ region allocation can give
you the best of both worlds... You get deterministic manual memory
management, and the calls to free get heavily amortized.



(This is a bit relativized if you have a situation where you need to
have destructors run in tree nodes; in general, the concepts of
destructors and GC don't really work well together, and imho,
destructors are rather an ugly add-hoc hack that stems from the lack of
automatic memory management and because of that unfortunately has firmly
established itself in C++...)

I think dtors are very convenient...
 
M

Matthias Buelow

Chris said:
Generally, I find that "clever" __partitioned__ region allocation can
give you the best of both worlds...

You seem to completely fail to understand the point I was trying to
make: Using GC in my example is not to speed up the program, or
whatever, but to avoid even having to think abut this. "Clever"
partition schemes or whatever need thinking effort, that could be
directed at more interesting parts of the program than memory management.
 
J

James Kanze

Yes, GC does not correct any bugs, you're right about that ;).
However, there are many situations where having to do manual
deallocation is just additional work without any benefit.
Exactly.

For example, if you're doing a lot of tree handling, building
trees, throwing them away again etc., you don't really want to
traverse the tree to free all the elements if you throw it
away. Much more convenient to simply "forget" the pointer to
the tree and let automatic memory management wipe it up.
(This is a bit relativized if you have a situation where you
need to have destructors run in tree nodes; in general, the
concepts of destructors and GC don't really work well
together, and imho, destructors are rather an ugly add-hoc
hack that stems from the lack of automatic memory management
and because of that unfortunately has firmly established
itself in C++...)

Yes and no. Destructors and garbage collection are in some ways
orthogonal; destructors are exceedingly useful for local
variables, where they effectively replace finally blocks, only
with the provision that the client can't forget them when
needed. They are much less useful elsewhere, except for
substituting for garbage collection.
 
J

James Kanze

I would build a node cache for the trees;

In other words, extra work. That's exactly what Matthias was
arguing against.
why free the memory when you can reuse some of it?

Why do extra work when you don't have to?
You can do this in a GC environment, but it requires you to
traverse a "portion" of the tree.
Also, why not use a __partitioned__ region allocation
algorithm for the nodes in different trees? That way, a single
call to `std::free()' will promptly destroy all the memory
which makes up a given trees nodes in a single operation.
AFAICT, that's more efficient than a GC.

For the programmer? Since when is writing additional code more
efficient.
 
C

Chris M. Thomasson

Matthias Buelow said:
You seem to completely fail to understand the point I was trying to
make: Using GC in my example is not to speed up the program, or
whatever, but to avoid even having to think abut this. "Clever"
partition schemes or whatever need thinking effort,

Does GC promote a nanny state? Just kidding! Anyway, partitioned region
allocation schemes can definitely help reduce your "thinking effort". For
instance, take this very simple C pseudo-code into account:




static struct region_allocator g_ralloc = REGION_STATIC_INIT;
static struct tree g_tree1 = TREE_STATIC_INIT;
static struct tree g_tree2 = TREE_STATIC_INIT;
static struct tree g_tree3 = TREE_STATIC_INIT;
static struct tree g_tree4 = TREE_STATIC_INIT;


int main(void) {
unsigned i;
struct region_partition* rp1 = region_allocate(&g_ralloc, 4096);
struct region_partition* rp2 = region_allocate(&g_ralloc, 4096);

for (i = 0; i < 1234; ++i) {
struct node* node1 = region_partition_allocate(rp1, sizeof(*node));
struct node* node2 = region_partition_allocate(rp1, sizeof(*node));
struct node* node3 = region_partition_allocate(rp2, sizeof(*node));
struct node* node4 = region_partition_allocate(rp2, sizeof(*node));
tree_push(&g_tree1, node1);
tree_push(&g_tree2, node2);
tree_push(&g_tree3, node3);
tree_push(&g_tree4, node4);
}

region_flush(&g_ralloc);

return 0;
};




No need to explicitly traverse all of the trees. Instead, one single call to
`region_flush()' results in the automatic freeing of all its contained
partitions. You could also free individual partitions. See how it eliminates
the need to free individual nodes? Also, more than one tree can share a
single partition. In the example, trees 1-2 share partition 1, and trees 3-4
share partition 2. IMVHO, its a fairly flexible allocation scheme indeed.



that could be
directed at more interesting parts of the program than memory management.

IMVHO, efficient memory management is an interesting part of any program...
Humm...
 
C

Chris M. Thomasson

In other words, extra work. That's exactly what Matthias was
arguing against.
Why do extra work when you don't have to?

A node cache reduces the number of calls to `new'; IMHO, this is a good
thing.



For the programmer?

For the program.



Since when is writing additional code more efficient.

Its worth it that extra work increases the scalability, throughput and
performance of a program.
 
C

Chris M. Thomasson

Matthias Buelow said:
Many of today's programs (OO or not) have a high
allocation(/deallocation) rate. This means that most objects become
garbage quickly. With many GCs, their collection time is a function of
the amount of live data, that is, they don't have to traverse the
garbage at all. This means in total, in such a situation, there's less
work to do than in a manual deallocation scheme, where all the garbage
is being traversed aswell.
I don't know if multithreading is any factor here; with manual schemes,
you have to do synchronization, too.

There are multi-threaded allocation algorihtms that do not need any
synchronization whatsoever on thread local allocations/deallocations from a
region that is not empty. Remote deallocations can be implemented wait-free
with a single atomic fetch-and-add, or lock-free with a simple
compare-and-swap loop. Here is a very crude and quick example of one:

http://webpages.charter.net/appcore/vzoom/malloc/sergey_vzmem_thread.html
 
M

Matthias Buelow

Chris said:
There are multi-threaded allocation algorihtms that do not need any
synchronization whatsoever on thread local allocations/deallocations
from a region that is not empty.

Very good; I would think that this can (and is being) implemented for
automatic memory management, too. In the trivial case (no inter-thread
dependencies), it is obvious, and in other cases, you still have the
same synchronization problem with manual management, only it might've
moved from the alloc/dealloc routines into the application logic.
 
J

Juha Nieminen

Chris said:
Closing a file handle in a dtor can be _very_ problematic indeed; read
here:

Not more problematic than leaving the file handle open. (Besides, the
file handle was just an *example*.)
 
J

Juha Nieminen

James said:
destructors are exceedingly useful for local
variables, where they effectively replace finally blocks, only
with the provision that the client can't forget them when
needed. They are much less useful elsewhere, except for
substituting for garbage collection.

Memory is not the only resource which needs managing. (And even in
some cases what you are managing *is* memory, but not memory which a GC
can collect. Eg. the "memory" being managed might be, for example, an
mmapped file or a bitmap in the display hardware.)

Also sometimes even when managing pure memory, you might still want to
do something special before deallocating. (Weak pointers are a very good
example.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,164
Messages
2,570,901
Members
47,439
Latest member
elif2sghost

Latest Threads

Top