[added comp.programming.threads]
Well, they obviously leak if cycles are present. And while "an
order of magnitude slower" is obviously false as well, Hans
Boehm has performed a number of benchmarks, and they are slower
in a lot of cases.
You can definitely misuse reference counting. For instance, you would NOT
want to use it if you were iterating over a large linked-list, because you
would have to update the counter on each object visited during the
traversal. Consider a lock-free reader/writer pattern based on atomic
pointers:
_____________________________________________________________________
struct object {
    atomic_ptr<object> m_next;
    void read_only_operation() const;
};

static atomic_ptr<object> g_head;

void writer_push() {
    local_ptr<object> l_obj(new object);
    local_ptr<object> l_cmp(g_head);
    do {
        l_obj->m_next = l_cmp;
    } while (! g_head.cas(l_cmp, l_obj));
}

void writer_pop() {
    local_ptr<object> l_cmp(g_head);
    do {
        if (! l_cmp) { return; }
    } while (! g_head.cas(l_cmp, l_cmp->m_next));
}

void readers() {
    local_ptr<object> l_obj(g_head);
    while (l_obj) {
        l_obj->read_only_operation();
        l_obj = l_obj->m_next;
    }
}
_____________________________________________________________________
That would give you horrible performance on the readers _and_ the writers.
However, there is a solution. You can use reference counting to manage
so-called proxy collector objects. I can rewrite the code above using a
proxy garbage collector. I will use this implementation for the refactored
code:
http://appcore.home.comcast.net/misc/pc_sample_h_v1.html
Now I can write the above as:
_____________________________________________________________________
struct object {
    object* m_next;
    pc_node m_pcn;
    object() : m_next(NULL) {
        pc_node_init(&m_pcn);
    }
    void read_only_operation() const;
};

static void object_dtor(pc_node* pcn) {
    delete CONTAINER_OF(pcn, object, m_pcn);
}

static object* g_head = NULL;
static pc_master g_pcm = PC_MASTER_STATICINIT(&g_pcm, object_dtor);

void writer_push() {
    object* const l_obj = new object;
    object* l_cmp = g_head;
    do {
        l_obj->m_next = l_cmp;
    } while (! CASPTR(&g_head, &l_cmp, l_obj));
}

void writer_pop() {
    pc_region* const pcr = pc_acquire(&g_pcm);
    object* l_cmp = g_head;
    do {
        if (! l_cmp) { break; }
    } while (! CASPTR(&g_head, &l_cmp, l_cmp->m_next));
    pc_release(&g_pcm, pcr);
    if (l_cmp) {
        pc_mutate(&g_pcm, &l_cmp->m_pcn);
    }
}

void readers() {
    pc_region* const pcr = pc_acquire(&g_pcm);
    object* l_obj = g_head;
    while (l_obj) {
        l_obj->read_only_operation();
        l_obj = l_obj->m_next;
    }
    pc_release(&g_pcm, pcr);
}
_____________________________________________________________________
This gives excellent performance because the readers and writers can work
with raw pointers. It shows that reference counting can be used in a highly
effective manner with hardly any overhead, because the counting is
amortized: one pc_acquire/pc_release pair per traversal instead of one
count update per node visited. This is just one example of how flexible
atomic reference counting can be...
A lot depends on how you use dynamic memory. The runtime of a
typical collector depends on the amount of memory actually in
use when the garbage collector runs; the runtime of a typical
manual memory allocator depends on the number of allocations and
frees. An application which uses a lot of small, short-lived
objects, and has adequate memory to spare, will run significantly
faster with garbage collection. An application which only
allocates a few, very big objects will run faster with manual
collection or shared pointers.
What allocator algorithms are you talking about? There are allocators that
have virtually zero overhead, in that they have fast paths which don't make
any use of atomic ops and/or memory barriers...