New C++ garbage collector

B

Bo Persson

Yannick said:
On 26/10/2010 18:58, Öö Tiib wrote:
[...]
I replied else-thread that certain resources (of which memory can
be considered one) may be suitable for non-deterministic
releasing. C++ and RAII is superiour to C++ and garbage
collection; this is my opinion of course and (hopefully) the
opinion of others here too.

And C++ with RAII *and* garbage collection is superior to
either. Why limit your options? (If all you have is a hammer,
everything looks like a nail. It's best to have many different
tools in your toolbox, so you can use the most appropriate.)

The problem is that garbage collection essentially breaks RAII.

With RAII, all resources are treated the same regardless of if they
are memory or file handle or sockets or whatever. Designing with
RAII, your destructor will take care of closing files or freeing
memory, everything works fine. RAII is absolutely reliable. As
long
as all resources are owned by objects and all objects are destroyed,
no resources will ever be leaked.

Right. And if the only resources you ever use is memory, GC will take
care of that.



Bo Persson
 
D

Dombo

Op 28-Oct-10 17:13, Yannick Tremblay schreef:
The result, IMO, is that the great "benefit" of GC which is to reduce
programming complexity by not having to worry about memory management
is totally trumped by the newly introduced complexity which is having
to implement finalise methods and having to worry about leaking
non-memory-only resources due to the destructor not being called.

Actually implementing the finalize() method in languages like Java and
C# is usually not recommended since you cannot rely on when, if ever, it
will be called. If you need something to be done no matter how a
function is left one would typically use a finally clause in these
languages. Unfortunately the finally clause puts the burden on the user
rather than on the provider of the functionality. The C# language does
support some features which partially address the lack of RAII support.

Having programmed C++, Java and C# (and a load of other programming
languages), and having most of my work done in C++ I must say that GC
does have its merits (not just for sloppy programmers) though I do miss
RAII at times in languages that do not support it. However different
programming language call for different approaches.
 
J

Juha Nieminen

Paavo Helde said:
Also, in some cases it is handy to know the actual reference count of the
object (e.g. when deciding if a separate copy of the object is needed or
not). This can be done easily with a reference-counting smartpointer, but
(I guess) not with GC.

Yeah, if you want to use a technique like copy-on-write (which can be
pretty useful in some situations), you definitely need reference counts
(or, to be more exact, you need to know if the object/data is currently
being shared or if the ownership is exclusive; I don't know if there is
any efficient way of knowing that other than reference-counting.)

However, although most (all?) GC'd languages don't have strong mechanisms
for implementing reference-counting of objects (mainly because they don't
use RAII), that doesn't mean that, at least in principle, being able to
implement reference counting of objects/data would be mutually exclusive
with automatic garbage collection. You simply have to keep the two things
(ie. RAII and GC) sufficiently separate so that you can choose when to
have deterministic constructor/copy/assignment/destructor calls, and when
it can be up to the GC engine to take care of it.
Also, in some convoluted programming style (e.g. Numeric Recipes) the
actually used pointers do not point to the allocated blocks, but before
or after, which may confuse GC.

Certainly if you use extremely low-level trickery with pointers, such
as implementing a xor-list, it will confuse a GC engine. (A xor-list is
like a regular doubly-linked list, but the next and prev pointers are
merged into one single pointer-sized variable using bit-xor. This means
that only one pointer-sized variable is needed per element rather than
two. The trick to make this work is that the xor-list iterator contains
two pointers, one to the current node and another to the next node. The
traversal back and forth in the list can then be done with xor operations
of these pointers and the xorred variable in the nodes.)
 
M

Matthias Meixner

Am 28.10.2010 18:36, schrieb Paavo Helde:
Also, in some convoluted programming style (e.g. Numeric Recipes) the
actually used pointers do not point to the allocated blocks, but before
or after, which may confuse GC.

That's why my garbage collector keeps a pointer to the allocated object
which is independent from the pointer value.
 
B

Bart van Ingen Schenau

Am 28.10.2010 18:36, schrieb Paavo Helde:


That's why my garbage collector keeps a pointer to the allocated object
which is independent from the pointer value.

Even that would be foiled by the coding style used by Numerical
Recipes.
In that book, they allocate, for example, an array of 100 elements
with the indices [10, 109]. To do this, they ask malloc() for an array
of 100 elements and then adjust the returned pointer to point 10
elements *before* the start of the block returned by malloc.
So, if malloc returned address 20 as the start of that block, the GC
engine will later see a pointer referring to address 10 (which may or
may not have been allocated) and no pointers in the application
referring to the memory area [20..119].

Bart v Ingen Schenau
 
J

Joshua Maurice

That's why my garbage collector keeps a pointer to the allocated object
which is independent from the pointer value.

Even that would be foiled by the coding style used by Numerical
Recipes.
In that book, they allocate, for example, an array of 100 elements
with the indices [10, 109]. To do this, they ask malloc() for an array
of 100 elements and then adjust the returned pointer to point 10
elements *before* the start of the block returned by malloc.
So, if malloc returned address 20 as the start of that block, the GC
engine will later see a pointer referring to address 10 (which may or
may not have been allocated) and no pointers in the application
referring to the memory area [20..119].

Just as a technical point, isn't that nonconforming code anyway? I
recall that there are rather strict rules about pointer math, and it
might be the case that pointer math-ing a pointer to point to anything
besides an element in the array or one-past-the-end is undefined
behavior. I can't recall the specifics though.

Finally, that undefined behavior is probably too widespread to ignore
for a reliable garbage collector, so this is rather pedantic.
 
J

Jorgen Grahn

I have nothing to add, but G-JdV had messed up the formatting of his
otherwise interesting posting. I've fixed it below:
C++ always had deterministic destruction. Existing C++ code and C++
idioms rely on this. If you call destructors when finalizing an object
from a GC, it will break C++ code that was not written with GC in
mind.

I have used MS's C++/CLI a bit which tries to provide full C++ in a
garbage collected environment. It separates the standard destructor
and the finalizer of an object. The run time makes sure that only one
of these gets called. The destructor get called as normal, when an
automatic object reaches end of scope or when a new-ed object is
deleted. The finalizer is called when the GC collects the object if it
was not destructed before. The context in which the finalizer runs is
entirely different from that of the destructor: you cannot use members
or base-objects. The fact that it also runs non-deterministicly gave
us a hard time trying to reproduce and debug problems in finalizer
code.

My current conclusion is that all objects that need deterministic life
time management (because they own resources with side-effects, other
than plain memory), must be implemented as non-GC objects. Then add a
trivial GC-ed wrapper object to expose the object to the GC world.
Even objects that need large amounts of memory (we work with 64 MB
datasets in a 32 bit OS), are to large to rely on a GC only.

In the end I have to agree with Leigh: C++ style deterministic life
time management does not mix well with a garbage collector. Still, a
unique advantage of a GC is that a dangling pointer will never refer
to memory other than the object for which it was allocated.

/Jorgen
 
J

James Kanze

[snip] In
other cases, shared_ptr doesn't work, and garbage collection
does. And there are cases where shared_ptr works, and garbage
collection doesn't.
Can you give examples of situations where one works but the
other not?

Garbage collection works even when there are cycles; shared_ptr
doesn't. Garbage collection also allows protecting against
dangling pointers.

The most obvious case where shared_ptr works, and garbage
collection doesn't, is when it does something different---you're
using a custom deleters, which don't delete. Otherwise, there
are a few cases where the object represents some critical
resource, and the destructor must be called immediately: most of
the time, some other type of smart pointer (auto_ptr or
scoped_ptr) is more appropriate in such cases, but in some rare
cases, shared_ptr is simpler---typically when for some reason
you need to put the object into a container.
 
J

James Kanze

On 27/10/2010 12:17, James Kanze wrote:

[...]
Objects with arbitrary lifetimes can be controlled with
a smart pointer, no need for garbage collection.

That's a blatent contradiction: an object with an arbitrary
lifetime has an arbitrary lifetime, not one controlled by some
smart pointer. In many applications, almost all instances of
delete are "delete this". (And as far as I know, "this" is
never a smart poionter.)

Garbage collection is useful, here, not to manage object
lifetime (garbage collection doesn't manage object lifetime),
but as a safety net, in order to be sure of being able to catch
dangling pointers.
shared_ptr employs RAII when taking ownership of an object via
its constructor.

Which is exactly why it cannot be used with objects which have
arbitrary lifetimes.
An object can encapsulate other sub-objects also utilizing
RAII.

An object can encapsulate other sub-objects, period. I'm not
sure I'd refer to that as RAII.
 
J

James Kanze


[...]
Smart pointers are more deterministic than GC,

I like that formulation:). Formally, you can't say "more
deterministic", something is either deterministic or not. (And
in another sense, everything that happens in a computer, at
least in a single thread, is deterministic.) But less formally,
I think you've made a reasonable point. In some specific cases,
it's possible to know more or less when the last pointer will
disappear, and it may be useful to be assured that the object
will be destructed exactly at that moment. (One obvious case is
when you are dealing with a variable number of such objects, all
of which must be destructed on leaving scope. If the objects
aren't copiable---and objects which manage resources normally
aren't---, then you need to use shared_ptr to manage them in
a standard container.)
so they work better if one needs deterministic cleanup of some
resource. This holds even for memory, if the memory chunks are
large and the available memory or address space is limited.
(AFAIK at least some GC implementations can accidentally keep
a memory block in use even if there are no pointers to it.)

I don't think that's the case, but it is easy to accidentally
keep a pointer (which will never be used) to a large block, or
a large graph of blocks.
Also, in some cases it is handy to know the actual reference
count of the object (e.g. when deciding if a separate copy of
the object is needed or not). This can be done easily with
a reference-counting smartpointer, but (I guess) not with GC.

It can be done with garbage collection as well; IIRC, some
implementations of java.lang.StringBuffer used reference
counting (in order to implement copy on write). But if you're
doing reference counting, you control all pointers to the
object, and cyclic references aren't possible (typically because
the object doesn't contain any pointers, e.g. a character
buffer), then garbage collection is rather superfluous.
Also, in some convoluted programming style (e.g. Numeric
Recipes) the actually used pointers do not point to the
allocated blocks, but before or after, which may confuse GC.

Such uses are undefined behavior in C or C++. On the other
hand, they can represent valid optimization techniques for
compilers, at least on certain machines; this is why garbage
collection must be part of the standard. (In practice, such
techniques are not advantageous on most modern architectures, so
compilers don't use them, and the Boehm collector works.)
 
J

James Kanze

On 26/10/2010 18:58, Tiib wrote:
[...]
I replied else-thread that certain resources (of which memory can be
considered one) may be suitable for non-deterministic releasing. C++
and RAII is superiour to C++ and garbage collection; this is my opinion
of course and (hopefully) the opinion of others here too.
And C++ with RAII *and* garbage collection is superior to
either. Why limit your options? (If all you have is a hammer,
everything looks like a nail. It's best to have many different
tools in your toolbox, so you can use the most appropriate.)
The problem is that garbage collection essentially breaks RAII.

That is completely false; the two are completely orthogonal.
With RAII, all resources are treated the same regardless of if
they are memory or file handle or sockets or whatever.

Which isn't necessarily an advantage.
Designing with RAII, your destructor will take care of closing
files or freeing memory, everything works fine. RAII is
absolutely reliable. As long as all resources are owned by
objects and all objects are destroyed, no resources will ever
be leaked. Once you introduce GC, you are in trouble because
you can't rely on the destructor being run for all objects
anymore. It depends, sometimes the destructor will be run,
sometimes the finalise will be run, depends, who knows.

No. You know when the destructor will run, and when it won't.
The result, IMO, is that the great "benefit" of GC which is to reduce
programming complexity by not having to worry about memory management
is totally trumped by the newly introduced complexity which is having
to implement finalise methods and having to worry about leaking
non-memory-only resources due to the destructor not being called.

Actual experience shows otherwise.
 
J

James Kanze


[...]
I don't have a fundamental problem with GC. However, I find it
difficult to use an approach that combine both RAII and GC. The
non-deterministic nature of GC mostly breaks RAII. I'd welcome
enlightment but at the moment, the only way I would find C++ with GC
of benefit would be if I was not using RAII at all and if the only
resources used was memory.

I think the key is to forget about "resources" (as something
special) for a moment, and think in terms of object lifetime.
Different objects have different constraints with regards to
lifetime:

-- For value oriented objects, it's largely irrelevant. The
same is true for a certain number of what I call
agents---objects which (typically) have no state of their
own, but exist only to do things on other objects (visiters,
observers, etc., which "connect" different objects).
Generally, these objects are allocated on the stack, and
copied as needed into the appropriate scope; the exceptions
are when agents need to be polymorphic, or when the objects
are too expensive to copy. Garbage collection is the
simplest solution for these, although normally, shared_ptr
also works fairly well.

-- Entity objects, whose lifetime is controlled by the
application semantics. Sometimes, they can just silently
die, in which case, garbage collection is all you need.
Most of the time, however (at least in my applications),
there has been a need for some sort of definite "death"; the
objects can't (or shouldn't) be used after this death, and
there may be specific behaviors associated with this death.

Neither garbage collection nor RAII (shared_ptr or other)
help in managing this lifetime, since it is determined by
events outside the program. And it's almost always
necessary to use some variant of the observer pattern to
notify all interested parties when its death occurs. (In
some simple cases, a combination of weak_ptr and
a shared_ptr to self in the object might be doable, but the
results are at least as complicated as a manual
implementation of the observer pattern, and can easily be
broken by future evolutions.) In such cases, garbage
collection can be used to improve robustness: the underlying
memory cannot be used for anything else as long as any
pointers to it exist (which shouldn't be the case, but
errors do happen), so you can mark it in some way, and test
for validity at the start of each member function.

-- Objects which manage external resources. These are the few
objects for which the term RAII is really appropriate. They
will normally be non-copiable, and in my experience, almost
always local variables. For the rare cases where local
variables are not appropriate, shared_ptr is (and not
garbage collection).

This is, of course, very general, and there are doubtlessly
other special cases which will occur. Still, my point remains:
you need to understand the role and the responsibilities of the
class (or the class hierarchy) before doing anything with the
class, and once you've done that, the choice of leaving it to
garbage collection, using shared_ptr, never allocating it
dynamically, or yet some other strategy, can be made.
 
J

James Kanze

On 27/10/2010 12:17, James Kanze wrote:
[...]
Just as RAII prevents memory and resource leaks, RAII also
prevents dangling pointers being an issue.
How?
We're talking about objects with arbitrary lifetimes here.
Objects with arbitrary lifetimes can be controlled with
a smart pointer, no need for garbage collection.
That's a blatent contradiction: an object with an arbitrary
lifetime has an arbitrary lifetime, not one controlled by some
smart pointer. In many applications, almost all instances of
delete are "delete this". (And as far as I know, "this" is
never a smart poionter.)
Garbage collection is useful, here, not to manage object
lifetime (garbage collection doesn't manage object lifetime),
but as a safety net, in order to be sure of being able to catch
dangling pointers.
shared_ptr employs RAII when taking ownership of an object via
its constructor.
Which is exactly why it cannot be used with objects which have
arbitrary lifetimes.
An object can encapsulate other sub-objects also utilizing
RAII.
An object can encapsulate other sub-objects, period. I'm not
sure I'd refer to that as RAII.
It is not a contradiction: I use smart pointers to control
arrays in my scripting language and it is up to the script to
determine when arrays are allocated and deallocated; if this
is not "arbitrary" then I don't know what is.

So how do you use smart pointers? shared_ptr obviously doesn't
work, since you cannot delete an object managed by a shared_ptr
at an arbitrary time.
Dangling pointers and memory leaks are problems which
disappear when RAII is used.

So you believe in silver bullets.
RAII is superiour to garbage collection.

RAII is orthogonal to garbage collection. A language with both
(e.g. C++ with the Boehm collector) is better than a language
with just one (C++ without the Boehm collector, or C#).
RAII is deterministic. Destructors exist, live with that
fact.

You seem to be the only person who seems to doubt it.
 
J

James Kanze

On 01/11/2010 12:07, James Kanze wrote:
If by "arbitrary lifetime" you mean the (totally obvious)
effect that GC object deallocation is non-deterministic

What on earth has that to do with anything? Given that garbage
collection has nothing to do with the lifetime of an object.
(It would help if you'd at least read the articles you respond
to.)

Arbirary lifetime means that the lifetime of the object depends
on some external event: a request to a server, a connection, the
state of hardware, a specific user interaction, etc. When that
event occurs, the object's lifetime must end. Regardless of
whether there is a shared_ptr floating around referring to it,
or anything else.

[...]
I would have thought you were less clueless and be able to acertain that
by "arbitrary lifetime" *I* was refering to the fact that one can have
multiple shared_ptr instances controlling the same object instead of
having one place in code to explicitly delete the controlled object.

Exactly. You no longer have arbitrary lifetime, you have
non-determistic lifetime. The object lifetime ends when the
last shared_ptr is destructed. Whenever that is... in all but
a few special cases, you really don't know (and the most common
reason for using shared_ptr is that you don't want to know).

The problem with shared_ptr (and delete in general, but you can
avoid it with delete if you have garbage collection installed)
is that memory allocation is strictly linked to object lifetime.
This is what you don't want; you don't want the memory
underlying the object to be recycled until the last pointer to
the object disappears, but you cannot use that as a criteria for
ending object lifetime; you must end the object lifetime even if
there are pointers to the object.
 
K

Keith H Duggar

Arbirary lifetime means that the lifetime of the object depends
on some external event: a request to a server, a connection, the
state of hardware, a specific user interaction, etc.  When that
event occurs, the object's lifetime must end.  Regardless of
whether there is a shared_ptr floating around referring to it,
or anything else.

So in the program at the end of this post, the integer pointed
to by n has "arbitrary lifetime" correct? Since the user must press
'y' to destroy it. And it would have exactly the same "arbitrary
lifetime" whether I used a naked pointer or a smart_ptr
instead of auto_ptr right?

So how is it the case that smart_ptr does not support objects
with arbitrary lifetime? Object lifetime is determined by the
/program logic/ as a whole not just the type of smart_pointer.

By the way, smart_ptr along with weak_ptr allows arbitrary lifetime
with multiple pointers that survive beyond the controlling pointer(s)
lifetime with error detection (in the form of thrown exceptions) when
a dangling weak_ptr is dereferenced.

Exactly the same level of protection from dangling pointers (though
not
automated obviously) you would have in a gc system that created
"zombie"
objects (by zeroing out memory or whatever).

<code>
#include <iostream>
#include <memory>

int main ( )
{
std::auto_ptr<int> n(new int(42)) ;
while ( std::cin ) {
char c = 0 ;
std::cout << "destroy? : " ;
std::cin >> c ;
if ( c == 'y' ) {
n.reset() ;
break ;
}
}

return 0 ;
}
</code>

KHD
 
J

James Kanze

So in the program at the end of this post, the integer pointed
to by n has "arbitrary lifetime" correct? Since the user must
press 'y' to destroy it. And it would have exactly the same
"arbitrary lifetime" whether I used a naked pointer or
a smart_ptr instead of auto_ptr right?

You can create trivial and uninteresting cases, where the
arbitrary lifetime corresponds to a scope. In realistic cases,
that won't be the case.
So how is it the case that smart_ptr does not support objects
with arbitrary lifetime? Object lifetime is determined by the
/program logic/ as a whole not just the type of smart_pointer.

Object lifetime is should be determined by the program logic,
not by the presence or absence of some pointer to the object.
(For objects where lifetime is an issue.)
By the way, smart_ptr along with weak_ptr allows arbitrary
lifetime with multiple pointers that survive beyond the
controlling pointer(s) lifetime with error detection (in the
form of thrown exceptions) when a dangling weak_ptr is
dereferenced.

There are some applications where that can be made to work,
where you can hierarchize the navigation in some way. At a cost
of a lot of extra complexity and fragility. And it doesn't
always work.
Exactly the same level of protection from dangling pointers
(though not automated obviously) you would have in a gc system
that created "zombie" objects (by zeroing out memory or
whatever).
<code>
#include <iostream>
#include <memory>
int main ( )
{
std::auto_ptr<int> n(new int(42)) ;
while ( std::cin ) {
char c = 0 ;
std::cout << "destroy? : " ;
std::cin >> c ;
if ( c == 'y' ) {
n.reset() ;
break ;
}
}
return 0 ;}
</code>

Replace auto_ptr with shared_ptr. Then make a copy of it in an
object somewhere. Continue operating after the break, handling
other commands.

Using smart pointers like this is a disaster waiting to happen.
 
M

Matthias Meixner

Am 01.11.2010 17:07, schrieb Leigh Johnston:
Lulz, are you being serious? "Test the object for validity at the start
of each member function"? Seriously? Either get a clue Mr Kanze or
stop trolling as it is tedious.

What should be wrong with it? On the contrary this is common practice
when writing security critical code: Check the validity of all
parameters and the this pointer is just one of them. Not obeying this
rule is one source of many security problems (e.g. resulting in buffer
overflows).

- Matthias Meixner
 
M

Matthias Meixner

Am 01.11.2010 23:07, schrieb Leigh Johnston:
Bullshit. The rule that needs to be obeyed is that you don't use
pointers to objects whose lifetime has ended. Once an object's lifetime
has ended the memory it occupied should either be freed or reused, not
marked as "invalid" by the programmer as doing so is plain crazy (this
pointer is not valid). An legitimate exception to this, for example,
would be a debug build of a memory allocator but this is the
implementation's responsibility and not the programmer's.

"invalid" is not the same as "whose lifetime has ended". Objects may be
in an invalid state that forbids to call most of its members but they
may still be alive.
 
Ö

Öö Tiib

Yes and this "invalid state" is just another "valid" state that an
object can be in and forms part of the object's class invariant, i.e. it
has nothing to do with garbage collection.

GC is for to have well-defined "destroyed and deallocated" state for
objects of dynamic storage. What is so bad about it? Real people have
often to maintain ugly C++ code-bases. It is pointless to mention that
good programmer does not write ugly things. Where to take them
"goods"? Check reality. Majority of C++ code on our planet is written
by sub-average programmers.

Static analyze tools can find lot of null pointer dereferences,
uninitialized variable usage, return value ignoring and bounds
violations. Run one on a unknown codebase and it discovers hundreds of
such. Can you point at static analyze tool that finds out noteworthy
fraction of dangling pointer usages in reasonable-sized code base?
There are none such things. That leaves run-time checks.

Touching dangling pointer runtime is UB both in standard and in
reality. Closest tool to check dangling pointers run-time is valgrind.
Debugging with it is close-to impossibly slow. Best performing
alternative to GC is weak_ptr. weak_ptr is so ugly when used that
every time i see one i want to kill it. I would put weak_ptr among
debugging tools not GC. GC provides weak_ptr-like-state for raw
pointers. GC does not affect performance so much like other debugging
tools so it does not deserve to be put among these. Some claim it even
improves performance. I have also observed it results with better
overall performance compared with weak_ptr usage.

The main thing what i do not understand is why you care? Do not use it
if you do not want it. Library like boost. Only changes features of
language slightly.
 
B

Bart van Ingen Schenau

Just as a technical point, isn't that nonconforming code anyway? I
recall that there are rather strict rules about pointer math, and it
might be the case that pointer math-ing a pointer to point to anything
besides an element in the array or one-past-the-end is undefined
behavior. I can't recall the specifics though.

You are completely right in your description of the pointer-math rules
and that my example incurs undefined behaviour.
Finally, that undefined behavior is probably too widespread to ignore
for a reliable garbage collector, so this is rather pedantic.

Bart v Ingen Schenau
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,145
Messages
2,570,824
Members
47,369
Latest member
FTMZ

Latest Threads

Top