A garbage collector for C++


Mingnan G.

Hello everyone.

I have written a garbage collector for standard C++ applications. It has
the following main features.

1) Deterministic Finalization
By providing deterministic finalization, the system can manage resources
as well as objects. The programming style is clear and easy, conforming
to the RAII (Resource Acquisition Is Initialization) idiom familiar to C++
programmers. Memory usage is very efficient: acyclic garbage is
reclaimed when the last reference to it is removed. A well-designed
application that eliminates cyclic data structures does not need
expensive garbage collection and always runs with minimal memory usage.

2) Accurate GC for C++
It is a fully accurate tracing garbage collector. All garbage objects
are identified by the system, with no conservative stack-frame guessing.
Fully optimizing C++ compilers are supported.

3) No Pause (less than 1 µs)
In this system, all application code automatically becomes fully
interruptible and GC-safe, so a scavenge can start at any point without
a rendezvous. A special concurrent tracing garbage collector avoids
root-set scanning entirely and does not suspend any thread at all. In
the worst racing case, the latency is less than one microsecond (not
millisecond), which makes it well suited to real-time systems.

4) Small Overhead
The runtime cost to application threads is far lower than that of
normal reference counting. If no scavenge is running concurrently,
there is no write-barrier overhead, no strong memory-ordering
requirement, and no synchronization overhead. No extra code or data
structures are injected for GC safe points. The whole system does not
require strong memory ordering, so it is suitable for most modern
processor architectures. Multi-processor concurrency can be further
exploited because the mutators and the collector run as separate threads.

5) Compatible Object Model
As in conventional C++, the system supports multiple inheritance,
objects as member variables, and object arrays. It supports C++ raw
pointers, unions, bit-fields, and hidden pointers; C++ templates; and
native objects and tracing.

6) Widely Portable
Even though it performs accurate tracing, the system does not require
any special information from the compiler. Applications can use any
standard C++ compiler, such as Visual C++ 8 or GCC. There are no special
platform requirements, such as the Win32 system calls SuspendThread or
GetWriteWatch; even virtual memory support is not necessary. Thus it can
be ported to a wide range of platforms.

Does anyone have any interest in this system? Any comments are welcome!

Mingnan G.
 

Roland Pibinger

Mingnan G. said:
Hello everyone.

I have written a garbage collector for standard C++ applications. It has
the following main features.

1) Deterministic Finalization
By providing deterministic finalization, the system can manage resources
as well as objects. The programming style is clear and easy, conforming
to the RAII (Resource Acquisition Is Initialization) idiom familiar to C++
programmers. Memory usage is very efficient: acyclic garbage is
reclaimed when the last reference to it is removed.

This probably means some kind of ref-counted "smart" pointer. With
real pointers and new-ed objects you can hardly get RAII??

Mingnan G. said:
Does anyone have any interest in this system? Any comments are welcome!

Unless you want to sell your GC as a commercial product, you could just
post links to the download and documentation.

Best regards,
Roland Pibinger
 

Chris Thomasson

Mingnan G. said:
Hello everyone.

I have written a garbage collector for standard C++ applications. It has
the following main features.
[...]

4) Small Overhead
The runtime cost to application threads is far lower than that of
normal reference counting.

I posted something on amortizing the cost of reference counting down to
virtually zero by using something called "proxy" garbage collection.
Unfortunately, I can't seem to find the exact message on Google, so I
appended my *copy to the end of this message...



If no scavenge is running concurrently, there is no write-barrier
overhead, no strong memory-ordering requirement, and no synchronization
overhead.

What sort of transactions occur between your collector and its mutators
during a "scavenge" operation? Do your mutators have any #StoreLoad
dependencies during a scavenge operation? I take it that the collector can
indeed interrupt mutators during scavenges, correct?



No extra code or data structures are injected for GC safe points. The
whole system does not require strong memory ordering, so it is suitable
for most modern processor architectures. Multi-processor concurrency can
be further exploited because the mutators and the collector run as
separate threads.

I am guessing, but from this it sounds like your design allows loads to
pass stores (e.g., a #LoadStore dependency), right? If so, how does your
approach differ from the following scheme:

http://citeseer.ist.psu.edu/domani00implementing.html

http://groups.google.com/group/comp.programming.threads/msg/16b31df060411bcd?hl=en



[...]

Any comments are welcome!

FWIW, I use a more relaxed version of the synchronization pattern I briefly
described here:

http://groups.google.com/group/comp.programming.threads/msg/e59c2dd1afb8e0a3?hl=en

For my proxy gc that I briefly introduced here:

http://groups.google.com/group/comp.programming.threads/msg/72684aa070964e6a

As you can see, the periodic/episodic synchronization usage pattern
amortizes the cost of memory barriers over successive epochs in time. This
allows mutators to make synchronization-free local reference count
adjustments; no atomic operations or memory barriers are required. The same
goes for collector<->mutator communications. So, a mutator can literally
adjust a reference count by doing this:


++this_mutator->refs[whatever];
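
To make the per-thread bookkeeping concrete, here is a minimal sketch; the
slot layout, the kMaxProxyObjects limit, and the acquire/release names are
illustrative assumptions, not VZOOM's actual code:

#include <cstddef>

// Sketch of per-mutator reference counts that a mutator adjusts with plain,
// non-atomic increments. The collector only reads these slots at epoch
// boundaries, where the periodic synchronization pattern supplies whatever
// memory ordering is needed.
constexpr std::size_t kMaxProxyObjects = 64;     // hypothetical limit

struct mutator_refs {
    long refs[kMaxProxyObjects] = {};            // owned by one thread only
};

thread_local mutator_refs this_mutator_refs;

inline void acquire_ref(std::size_t whatever) {
    ++this_mutator_refs.refs[whatever];          // no atomic op, no barrier
}

inline void release_ref(std::size_t whatever) {
    --this_mutator_refs.refs[whatever];
}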


Both VZOOM and Joe Seigh's RCU-SMR can be implemented in a way that can
dissect quiescent states out of operating-system-provided information...
Except, as I stated before, the stuff that I invented can be based on a
more relaxed sync pattern I created.


BTW, speaking of inventions, if you think you have something truly novel
that sets you apart from prior art, I suggest that you think about doing a
patent feasibility study, and possibly going through the patent application
process. I suggest that you read the following thread:

http://groups.google.com/group/comp...fb64e?q=atomic+patent&rnum=1#0b9072fea09fb64e

:O




I would also suggest that you include comp.programming.threads in the
discussion...

* ----------------
David Hopwood said:
[posting via Google because of news server problems -- sorry if this is
a duplicate]

Chris said:
You don't need to garbage collect everything. Proxy collection can allow
an application to get away from traditional GC, and all of the overhead
that comes along with it.

I always find it odd when people talk about the "overhead of GC" as though
other memory management techniques had no overheads.

You shouldn't assume that a hybrid of GC and some other memory management
technique will have lower overhead than applying GC to all objects. If the
other technique is some form of reference counting (as is commonly the case
when using proxy GC), then it definitely won't; refcounting is very
inefficient.

I agree, atomic lock-free refcounting can be very expensive. However, you
can simply amortize the cost of reference counting, if you're using
ref-counting to begin with, when you're implementing a proxy collector. You
don't reference count everything, "just" the collector structures. IMO,
using lock-free atomic reference counting to protect each node of a
data-structure would generate massive overheads. However, as you probably
know, there are other methods that can amortize the cost of actually
reference counting the collector structures. I created a straightforward
usage pattern for my VZOOM collector that is based on per-thread
periodic/episodic synchronizations...




- You don't need to activate the collector every time you access a shared
data-structure... You can set up a runtime-configurable dynamic ratio of
accesses:syncs (a rough sketch of this appears after these notes)...
Unfortunately, algorithms like RCU are not compatible with this setup,
because it simply requires an arbitrary number of references to persist
over quiescent states. Your application will drive right off a cliff if you
try to do this with RCU! IMO, even RCU-SMR can't really fit this bill,
because it was not really designed to allow the user to grab an arbitrary
number of references...


- Also, the collector I came up with can be implemented with POSIX and
dependent load-ordering. So, an application can use it to implement
lock-free reader patterns on many different operating system and processor
combinations. Can a traditional GC compete with the flexibility of a
portable user-space proxy collector wrt solving the reader/writer problem
in lock-free algorithms?
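
As a rough illustration of the accesses:syncs ratio mentioned above (the
names and the choice of fence are assumptions made for illustration, not
VZOOM internals), a mutator might amortize its synchronization like this:

#include <atomic>

// A mutator only pays for a memory barrier once every `sync_ratio`
// accesses, spreading the synchronization cost over many shared-structure
// accesses instead of paying it on every one.
struct mutator_sync {
    unsigned access_count = 0;
    unsigned sync_ratio   = 256;                  // tunable at runtime

    void on_access() {
        if (++access_count >= sync_ratio) {
            access_count = 0;
            std::atomic_thread_fence(std::memory_order_seq_cst); // periodic sync
        }
    }
};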
 

Mingnan G.

With real pointers and new-ed objects you can hardly get RAII??

I am not sure I understand what you mean. In this system, C++ raw
pointers are treated as weak references, as in Java: they will not keep
an object alive. Only smart pointers keep an object alive. New objects
are created much as with the C++ new operator. For example:

class Foo { ... };
CLockedPtr<Foo> p = hgc_new(Foo, (arguments...));

The system invokes the destructor of class Foo at the proper time. When
the object is acyclic, the system reclaims it immediately. When the
object is part of a reference cycle, the system reclaims it through
tracing garbage collection. Programmers need not care about these
things; they simply put their native-resource management code in the
destructor and let the system do the rest.
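
As a point of comparison, the same deterministic-finalization idea can be
sketched in plain standard C++ with std::shared_ptr. This is not the
CLockedPtr/hgc_new API above (which is not published in this thread), and
unlike shared_ptr the described system also reclaims reference cycles by
tracing:

#include <cstdio>
#include <memory>

// A native resource whose cleanup must be deterministic.
class File {
    std::FILE* fp_;
public:
    explicit File(const char* path) : fp_(std::fopen(path, "r")) {}
    ~File() { if (fp_) std::fclose(fp_); }   // resource released in the destructor
    File(const File&) = delete;
    File& operator=(const File&) = delete;
};

int main() {
    {
        std::shared_ptr<File> p = std::make_shared<File>("data.txt");
        // ... use *p ...
    } // last strong reference dropped: ~File() runs here, immediately
    return 0;
}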
... you could just post links to download and documentation.

Thanks for your kind advice. I am doing that currently. I need a place
to put my documentation and source code, and I need some time.
 

Chris Thomasson

Mingnan G. said:
I am not sure I understand what you mean. In this system, C++ raw
pointers are treated as weak references as Java. They will not retain
an object alive. Only smart pointers will keep an object alive.

You have to use smart pointers to keep a "specific object alive"? What kind
of "per-object overhead" are we talking about here? How does a mutator
thread use the smart pointer to interact with your collector's "scavenging"
operations?




FWIW, this was one of the reasons why proxy GC was created: mutator threads
can use raw pointers and dependent load-ordering to access shared
data-structures; no per-object smart-pointer logic is needed. As you
probably know, dependent load-ordering is supported on every existing arch
except DEC Alpha, which happens to require an rmb(). On Linux this type of
"special" barrier is expressed as read_barrier_depends(); it's a no-op on
everything but Alpha.

As I stated in an earlier post, your mutator threads do not need to track
each object individually. You can convert your per-object smart pointers
into per-epoch smart pointers and follow a simple RCU pattern:

http://www.rdrop.com/users/paulmck/RCU/

enter_non_quiescent();

// access shared data-structures

leave_non_quiescent();


This could be expressed in C++ as:

{
    gc::scoped_non_quiescent collected_scope;
    // access shared data-structures
}


The overhead behind enter_non_quiescent() can be as simple as a normal load,
and leave_non_quiescent() can be as simple as a normal store. No atomic ops
and no #StoreLoad or #LoadStore memory barriers required. No kidding.
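
To make that concrete, here is a minimal sketch of the mutator side of such
an epoch scheme: one plain load and one plain store on entry, one plain
store on exit. The names, layout, and the polling protocol implied by the
comments are assumptions for illustration, not Chris's actual proxy
collector:

// The collector advances g_collector_epoch and polls each mutator's slot,
// deferring reclamation until every mutator has either published the new
// epoch or gone quiescent; the mutator itself needs no atomic RMW and no
// #StoreLoad / #LoadStore barrier.
static const unsigned QUIESCENT = 0;

volatile unsigned g_collector_epoch = 1;         // advanced by the collector

struct mutator_epoch {
    volatile unsigned observed = QUIESCENT;      // written only by its owner
};

thread_local mutator_epoch this_epoch;

inline void enter_non_quiescent() {
    this_epoch.observed = g_collector_epoch;     // plain load, plain store
}

inline void leave_non_quiescent() {
    this_epoch.observed = QUIESCENT;             // plain store
}

namespace gc {
// RAII guard matching the scoped_non_quiescent usage sketched above.
struct scoped_non_quiescent {
    scoped_non_quiescent()  { enter_non_quiescent(); }
    ~scoped_non_quiescent() { leave_non_quiescent(); }
};
}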
 

Chris Thomasson

Mingnan G. said:
I am not sure I understand what you mean. In this system, C++ raw
pointers are treated as weak references as Java. They will not retain
an object alive. Only smart pointers will keep an object alive.

Do your smart pointers have an atomic CAS or XCHG operation? I mean, can I
do something like this pseudo-code:


static your_strongptr< Foo > p_Foo = your_new( Foo ... );


{
    your_weakptr< Foo > c_Foo, x_Foo = your_new( Foo ... );
    do {
        c_Foo = p_Foo;
        if ( c_Foo ) {
            x_Foo->LocalModify( c_Foo );
        }
    } while ( ! p_Foo.cas( c_Foo, x_Foo ) );  // retry until the CAS succeeds

    if ( c_Foo ) {
        c_Foo->ProcessExchangedWith( x_Foo );
    }
}




Can I do 100% lock-free programming with your smart pointers?
 

Mingnan G.

I am glad to see your kind messages and suggestions, thanks very much.

I am still carefully reading your posts and links. The following are the
questions I know how to answer.
What sort of transactions occur between your collector and its mutators during a "scavenge" operation?

During a "scavenge" operation, the system uses a multi-threading
synchronization mechanism between collector and mutators, so there is some
overhead. However, since most objects are acyclic, the scavenge operation
need not run as frequently as in prior-art garbage collectors. I recommend
running the tracing garbage collection at a higher priority than normal
application threads, but lower than real-time threads, so that the
expensive tracing GC can finish sooner and real-time threads are not
affected.
Do your mutators have any #StoreLoad dependencies during a scavenge operation? I take it that the collector can indeed interrupt mutators during scavenges, correct?

Mutators do have #StoreLoad dependencies during a scavenge operation. The
collector can interrupt any application thread at any point and start a
tracing collection without delay.
Do your smart pointers have an atomic CAS or XCHG operation?

Yes, but not always. During the "scavenge" operation there is some
complicated synchronization overhead; these synchronizations use atomic
CAS or XCHG.
your_weakptr< Foo > c_Foo, x_Foo = your_new( Foo ... );

In this system, your_weakptr< Foo > will not keep the newly created Foo
object alive. Therefore, when the execution of this statement has finished,
the newly created object is reclaimed immediately. That is probably not
what you wanted.

Mingnan G.
 

Mingnan G.

Can I do 100% lock-free programming with your smart pointers?

Maybe; I am studying those links you posted. The problem is that I want
to provide strict deterministic finalization: all acyclic garbage
objects are reclaimed in a predictable order. I am not sure per-epoch
smart pointers can do that.
 

Chris Thomasson

Mingnan G. said:
I am glad to see your kind messages and suggestions, thanks very much.

I am still carefully reading your posts and links. The following are the
questions I know how to answer.

During a "scavenge" operation, the system uses a multi-threading
synchronization mechanism between collector and mutators, so there is some
overhead. However, since most objects are acyclic, the scavenge operation
need not run as frequently as in prior-art garbage collectors. I recommend
running the tracing garbage collection at a higher priority than normal
application threads, but lower than real-time threads, so that the
expensive tracing GC can finish sooner and real-time threads are not
affected.

Mutators do have #StoreLoad dependencies during a scavenge operation. The
collector can interrupt any application thread at any point and start a
tracing collection without delay.

Yes, but not always. During the "scavenge" operation there is some
complicated synchronization overhead; these synchronizations use atomic
CAS or XCHG.

Okay. So you are using atomic operations for the scavenge operations...
Humm...



In this system, your_weakptr< Foo > will not keep the newly created Foo
object alive. Therefore, when the execution of this statement has finished,
the newly created object is reclaimed immediately. That is probably not
what you wanted.

You're correct. I wanted the newly created Foo object (x_Foo) to be CAS'd
into the global "smart-pointer" (p_Foo), thereby keeping it alive. The
"old" object (c_Foo) is the one that can be garbage collected. Humm, I
wonder if I can do atomic lock-free COW or PCOW based algorithms with your
setup... Humm... How about this:


static your_strongptr< Foo > p_Foo = your_new( Foo ... );


{
    your_strongptr< Foo > c_Foo, x_Foo = your_new( Foo ... );
    do {
        c_Foo = p_Foo;
        if ( c_Foo ) {
            x_Foo->LocalModify( c_Foo );
        }
    } while ( ! p_Foo.cas( c_Foo, x_Foo ) );  // retry until the CAS succeeds

    if ( c_Foo ) {
        c_Foo->ProcessExchangedWith( x_Foo );
    }
}


Now there are no weak pointers; could this work in your setup?
 

Chris Thomasson

Mingnan G. said:
Maybe, I am studying those links you posted. The problem is that I want
to provide strict deterministic finalization, all acyclic garbage
objects are reclaimed in a predictable ordering. I am not sure
per-epoch smart pointers may work.

Well, one of the good things about per-epoch smart pointers is that they are
very friendly to large shared linked data-structures. Reader threads can
walk through a such a data-structure by using normal raw pointers, and
normal loads and stores to access the structures nodes:


{
    gc::scoped_non_quiescent collected_scope;

    node_t *nx, *n = collection::head;
    while ( n ) {
        nx = n->nx;
        whatever( n );
        n = nx;
    }
}




When you are using per-object smart pointers, reader threads have to go
through the smart-pointer logic every time they access a node. The larger
the linked data-structure, the more overhead there is; negative scalability:


{
    your_strongptr< node_t > nx, n = collection::head;
    while ( n ) {
        nx = n->nx;
        whatever( n );
        n = nx;
    }
}



This is a simple example of why per-object stuff can be "very" expensive...
 

Chris Thomasson

Mingnan said:
I am glad to see your kind messages and suggestions, thanks very much.

I am still carefully reading your posts and links. The following are the
questions I know how to answer.

During a "scavenge" operation, the system uses a multi-threading
synchronization mechanism between collector and mutators, so there is some
overhead. However, since most objects are acyclic, the scavenge operation
need not run as frequently as in prior-art garbage collectors. I recommend
running the tracing garbage collection at a higher priority than normal
application threads, but lower than real-time threads, so that the
expensive tracing GC can finish sooner and real-time threads are not
affected.

Mutators do have #StoreLoad dependencies during a scavenge operation. The
collector can interrupt any application thread at any point and start a
tracing collection without delay.

Humm... I am curious as to how you are implementing the interrupt logic???
 

Mingnan G.

The interrupt logic in this system is quite straightforward. All
application code is fully interruptible, so a mutator can be interrupted at
any point. If an application thread invokes garbage collection, that thread
immediately enters the scavenge stage and starts the traversal. The other
mutators keep running concurrently with the thread that has become the
collector. During the scavenge stage you can do anything, including
creating new threads, creating new GC objects, etc. Higher-priority threads
might preempt the collector at any point, but this will make the collector
take longer to finish a collection.

Mingnan G.
 

Mingnan G.

I do not quite understand your "per-epoch" smart pointers. In this system,
a smart pointer is physically just one word holding the object address. It
traps assignments to pointers. Converting a strong smart pointer to a raw
pointer has no overhead, thanks to the C++ compiler's optimization; it is
just like a normal C/C++ pointer cast. Conversely, converting a raw pointer
to a strong smart pointer does cause a little overhead. The overhead is
very small, and in most cases programmers should not convert a raw pointer
to a strong pointer; a strong pointer should normally be derived from
another strong pointer. Conversion between strong pointers is faster than
converting from raw pointers.
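
To make the described shape concrete, here is a rough sketch of such a
one-word strong pointer; the on_acquire/on_assign/on_release hooks are
hypothetical placeholders, not the library's actual mechanism:

// The pointer is physically one word (the object address); assignments are
// trapped so the collector can observe reference changes; conversion to a
// raw pointer is free after inlining.
template <class T>
class StrongPtr {
    T* addr_;                                      // the single word
public:
    StrongPtr() : addr_(nullptr) {}

    // raw -> strong: the slightly more expensive direction
    explicit StrongPtr(T* raw) : addr_(raw) { on_acquire(addr_); }

    // strong -> strong: the recommended, cheaper way to derive a pointer
    StrongPtr(const StrongPtr& other) : addr_(other.addr_) { on_assign(addr_); }

    StrongPtr& operator=(const StrongPtr& other) {
        on_assign(other.addr_);                    // trapped assignment
        addr_ = other.addr_;
        return *this;
    }

    ~StrongPtr() { on_release(addr_); }

    // strong -> raw: no overhead, like an ordinary pointer cast
    operator T*() const { return addr_; }
    T* operator->() const { return addr_; }

private:
    static void on_acquire(T*) { /* hypothetical: register the reference   */ }
    static void on_assign(T*)  { /* hypothetical: record the pointer store */ }
    static void on_release(T*) { /* hypothetical: drop the reference       */ }
};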

Mingnan G.
 

Rui Maciel

Mingnan said:
Thanks for your kind advice. I am doing that currently. I need a place
to put my documentation and source code, and I need some time.

Why not release it as a sourceforge project? Sourceforge was created for
that exact purpose. It offers web hosting, CVS/subversion, project forums,
bug tracking and some other stuff I don't remember right now.

Here is the URL: http://sourceforge.net/


Hope this helps
Rui Maciel
 
