Thread-safe reference counts.

J

jason.cipriani

I have some code where objects are dynamically allocated by some
thread, and then used by multiple threads and freed when they are no
longer needed. I think that reference counting is the most appropriate
way to handle this in my situation.

However, my code currently has a very obvious problem in it. Here is
how I have reference counting implemented for my objects, basically
(sorry about any errors I am typing this here in this message):

=== BEGIN CODE ===

class A {
public:
A ();
void AddRef ();
void Release ();
private:
~A ();
int refs_; // reference count
pthread_mutex_t mtx_; // protects refs_
};

// constructor inits reference count to 1
A::A ()
: refs_(1), mtx_(PTHREAD_MUTEX_INITIALIZER)
{
}

// increment count
void A::AddRef () {
pthread_mutex_lock(&mtx_);
++ refs_;
pthread_mutex_unlock(&mtx_);
}

// decrement count, destroying object at 0.
void A::Release () {
bool deleteme;
pthread_mutex_lock(&mtx_);
-- _refs;
deleteme = (_refs == 0);
pthread_mutex_unlock(&mtx_);
// <--- and here is the problem!
if (deleteme)
delete this;
}

=== END CODE ===

The problem is in Release(). There is a short period of time where the
mutex is unlocked but the object is already doomed for destruction; if
another thread calls AddRef() during that time, that other thread now
has a pointer to garbage and doesn't know it.

I can't delete this before unlocking the mutex. There is also a second
problem, where even if, hypothetically speaking, I could do the delete
inside the mutex lock in release, there's still the problem where
another thread may have called AddRef() in the mean time, and it is
blocking on that mutex, and by the time Release() returns if the
reference count had decreased to 0, the object is deleted, the thread
blocking in AddRef() continues, and operates on the object that's
already been deleted.

So I have two questions:

1) I have no specific reason for implementing this all by hand. I am
not completely familiar with boost, though. Is there some boost thread-
safe reference counting thing that I can use to take care of this all
for me?

2) In any case, how can I fix the above problems in my own code? I
can't quite get my head around it, it seems like it's not possible to
do reference counting from "within" the object; that I need some kind
of external "wrapper" around it instead. Also I am unsure about the
logic that I'd need to use to protect against the case where AddRef
blocks on the mutex, Release destroys the object, then AddRef
continues on a deleted object. Really I guess what I said in #1 was
kind of a lie: I certainly wouldn't *mind* not using boost, as I'm
currently not using boost for anything else in this project, and
installing boost on all the development machines is a *minor* hassle
(nothing I can't overcome, but I don't mind implementing this by hand
at this stage in this particular project, at least).

Thanks,
Jason
 
J

jason.cipriani

Is there some boost thread-
safe reference counting thing that I can use to take care of this all
for me?

I see boost::shared_ptr, which appears to do exactly what I want. I'll
just accept the fact that it works by magic and move on with my life.

Jason
 
S

Sam

So I have two questions:

1) I have no specific reason for implementing this all by hand. I am
not completely familiar with boost, though. Is there some boost thread-
safe reference counting thing that I can use to take care of this all
for me?

Boost's shared_ptr class does this. The flaw in your logic is that you
require explicit action by a thread to register and deregister a reference
to an instance of a class.

The correct approach is to incorporate registration and deregistration into
any instance of a pointer to an instance of the class, so this is handled
automatically by the pointer implementation. Any time the pointer is passed,
the copy constructor registers another reference to the class. ANy time a
pointer goes out of scope, the reference gets deregisters. When the
reference count goes to 0, by definition that means the last pointer to the
class instance just went out of scope, and the object can be safely
destructed.

That's what Boost's shared_ptr does. Although -- much like the rest of
Boost's code -- it is horribly inefficient, and suffers from certain design
flaws -- it is often a good starting point for your own reference-counted
pointers.
of external "wrapper" around it instead. Also I am unsure about the
logic that I'd need to use to protect against the case where AddRef
blocks on the mutex, Release destroys the object, then AddRef
continues on a deleted object.

You need to wrap your brain around the concept that /every/ time a pointer
to the object gets instantiated, the reference count gets increased, and
/every/ time a pointer to the object goes out of scope, the reference count
gets decreased, so when the reference count goes to 0, by definition, no
more pointers to the object exist and it's safe to delete it.

You don't want to do it manually, by hand, so that's why you want to use an
explicit pointer object class, that handles the reference counting as part
of its constructor, copy constructor, assignment operator, and destructor.
That's the only way to do it reliably.
Really I guess what I said in #1 was
kind of a lie: I certainly wouldn't *mind* not using boost, as I'm
currently not using boost for anything else in this project, and

Then don't. Resist the temptation to use Boost as long as you can. Many
years from now, you'll be thankful for that. Define and implement your own
pointer class, and use this as a learning experience. True, your reference
counting implementation will be much slower than Boost's. Boost does not use
mutexes for this, rather it uses CPU-specific atomic increment/decrement
instructions (and gets it partially wrong, by the way). But this is a good
way to learn some important principles of thread-safe programming.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQBH6v/Nx9p3GYHlUOIRAhS0AJwMC7C/6uwYtYNkRDdLk4h6AIHscwCfaoth
JKFZzwlCqo+COdKp3DpAbWk=
=HaO2
-----END PGP SIGNATURE-----
 
J

jason.cipriani

Thanks for the detailed reply, Sam. I appreciate it and it confirms a
lot of my suspicions.

That's what Boost's shared_ptr does. Although -- much like the rest of
Boost's code -- it is horribly inefficient, and suffers from certain design
flaws -- it is often a good starting point for your own reference-counted
pointers.

I can't comment on shared_ptr's efficiency since I have no experience
with it. Blinding efficiency is not important for this particular
application, however, but as far as "certain design flaws"... is there
anything in particular that I should watch out for? I'm not doing
anything weird, I have STL containers of shared_ptr<Object>'s, I
sometimes return shared_ptr<Object>'s from functions, other than that
it's pretty basic stuff. So far in my shared_ptr experiments it seems
to be well-behaved; although using "ptr.reset(new Object)" is a little
strange when you are used to "ptr = new Object" -- but it makes sense
(similarily, "thelist.push_back(MyPtr(new Object))" as opposed to
"thelist.push_back(new Object)" -- but I'm thankful for the explicit
constructor, it's helping me catch some mistakes).

You need to wrap your brain around the concept that /every/ time a pointer
to the object gets instantiated, the reference count gets increased, and
/every/ time a pointer to the object goes out of scope, the reference count
gets decreased, so when the reference count goes to 0, by definition, no
more pointers to the object exist and it's safe to delete it.

Thanks for explaining it this way; I had been thinking about it on a
wider scope, like "when this thread gets a pointer, the reference
count is incremented, and when this thread is done with it, decrement
it" -- which was confusing the heck out of me.
[snip] Define and implement your own
pointer class, and use this as a learning experience. True, your reference
counting implementation will be much slower than Boost's.[snip]
But this is a good
way to learn some important principles of thread-safe programming.

Definitely; at least for a learning experience, I was planning on
doing this at the end of this project. I appreciate the shared_ptr's
magic but it's always good to know what's going on under the hood.

Thanks again,
Jason
 
C

Chris Thomasson

[comp.programming.threads added]

I have some code where objects are dynamically allocated by some
thread, and then used by multiple threads and freed when they are no
longer needed. I think that reference counting is the most appropriate
way to handle this in my situation.

However, my code currently has a very obvious problem in it. Here is
how I have reference counting implemented for my objects, basically
(sorry about any errors I am typing this here in this message):
[...]

Your question about another thread calling AddRef while one is calling
Release and dropping to zero means that you need strong thread-safety which
shared_ptr does not provide. Here are some implementations you can take a
look at:


Joe Seighs work:

http://atomic-ptr-plus.sourceforge.net/

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/3c763698da537d8f



My work:


http://appcore.home.comcast.net/~appcore/vzoom/refcount


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/177609a5eff4a466
(100% portable version (e.g., POSIX Threads)



Here is some information on strong thread-safety:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e5167941d32340c6



If you have any questions, please feel free to ask. I know that we can work
out a soultion.
 
C

Chris Thomasson

Chris Thomasson said:
[comp.programming.threads added]

I have some code where objects are dynamically allocated by some
thread, and then used by multiple threads and freed when they are no
longer needed. I think that reference counting is the most appropriate
way to handle this in my situation.

However, my code currently has a very obvious problem in it. Here is
how I have reference counting implemented for my objects, basically
(sorry about any errors I am typing this here in this message):
[...]

Your question about another thread calling AddRef while one is calling
Release and dropping to zero means that you need strong thread-safety
which shared_ptr does not provide. Here are some implementations you can
take a look at:
[...]

Here is context that I accidentally snipped:


{
The problem is in Release(). There is a short period of time where the
mutex is unlocked but the object is already doomed for destruction; if
another thread calls AddRef() during that time, that other thread now
has a pointer to garbage and doesn't know it.

I can't delete this before unlocking the mutex. There is also a second
problem, where even if, hypothetically speaking, I could do the delete
inside the mutex lock in release, there's still the problem where
another thread may have called AddRef() in the mean time, and it is
blocking on that mutex, and by the time Release() returns if the
reference count had decreased to 0, the object is deleted, the thread
blocking in AddRef() continues, and operates on the object that's
already been deleted.
}


According to the text above, you need strong thread-safety; period. You not
following the rule of basic thread-safety which says that a thread _cannot_
acquire a reference to an object that it does not already own a reference
to. If your application does not follow that rule, then you need something
like Joe's excellent atomic_ptr.
 
G

Gianni Mariani

T

Torsten Robitzki

Chris said:
[comp.programming.threads added]

I have some code where objects are dynamically allocated by some
thread, and then used by multiple threads and freed when they are no
longer needed. I think that reference counting is the most appropriate
way to handle this in my situation.

However, my code currently has a very obvious problem in it. Here is
how I have reference counting implemented for my objects, basically
(sorry about any errors I am typing this here in this message):

[...]

Your question about another thread calling AddRef while one is calling
Release and dropping to zero means that you need strong thread-safety
which shared_ptr does not provide. Here are some implementations you can
take a look at:

This just means, that you are making a copy of an object where the
destructor is in progress. That's simply a bug and should be avoided ;-)

best regards,
Torsten
 
Y

Yannick Tremblay

-=-=-=-=-=-

That's what Boost's shared_ptr does. Although -- much like the rest of
Boost's code -- it is horribly inefficient, and suffers from certain design
flaws -- it is often a good starting point for your own reference-counted
pointers.
[...]

Then don't. Resist the temptation to use Boost as long as you can. Many
years from now, you'll be thankful for that. Define and implement your own
pointer class, and use this as a learning experience. True, your reference
counting implementation will be much slower than Boost's. Boost does not use
mutexes for this, rather it uses CPU-specific atomic increment/decrement
instructions (and gets it partially wrong, by the way). But this is a good
way to learn some important principles of thread-safe programming.

Care to clarify?

And does this also apply to std::tr1::shared_ptr which is based on
boost impelmentation on many platfroms.

Yan
 
S

Sam

Yannick said:
-=-=-=-=-=-

That's what Boost's shared_ptr does. Although -- much like the rest of
Boost's code -- it is horribly inefficient, and suffers from certain design
flaws -- it is often a good starting point for your own reference-counted
pointers.
[...]

Then don't. Resist the temptation to use Boost as long as you can. Many
years from now, you'll be thankful for that. Define and implement your own
pointer class, and use this as a learning experience. True, your reference
counting implementation will be much slower than Boost's. Boost does not use
mutexes for this, rather it uses CPU-specific atomic increment/decrement
instructions (and gets it partially wrong, by the way). But this is a good
way to learn some important principles of thread-safe programming.

Care to clarify?

The entire design is completely braindead. A separate object, a tiny
object that holds the reference count, gets instantiated for every object
tracked by shared_ptr. This is insane. Heavy use of shared_ptr is just going
to hammer your heap like there's no tomorrow. This is horrible design. The
correct approach is to store the reference count in superclass and derive
from it. You can't then attach shared_ptr to an arbitrary libstdc++ object,
of course, so you just multiply-inherit from it. How hard is that?

Then you have the shared_ptr itself, an object that holds two pointers, one
to the original object, the other to the reference count.

Which means that most compilers won't be able to hold a shared_ptr in CPU
registers, like ordinary pointers. Instead, things are going to get tossed
around in memory.

There are also several other design flaws as well. Like the fact that each
time a reference count gets updated, it involves a call to a library
function. Reference count increment/decrement cannot be inlined, due to a
real first class fookup in the templates. After I read through the class
definition, I just shook my head, laughed, then quickly coded my own
reference-counted object implementation, and ended up benchmarking it 15%
faster than Boost's disgusting implementation.
And does this also apply to std::tr1::shared_ptr which is based on
boost impelmentation on many platfroms.

Yes.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQBH64FMx9p3GYHlUOIRAlZqAJ4xAdLVbQFcD4u7f++dM+/C/X1K2ACffOh3
z7q9y9KZHR39q9sI8fpUOPU=
=wRH0
-----END PGP SIGNATURE-----
 
A

Alf P. Steinbach

* Sam:
Yannick said:
-=-=-=-=-=-

That's what Boost's shared_ptr does. Although -- much like the rest
of Boost's code -- it is horribly inefficient, and suffers from
certain design flaws -- it is often a good starting point for your
own reference-counted pointers.
[...]

Then don't. Resist the temptation to use Boost as long as you can.
Many years from now, you'll be thankful for that. Define and
implement your own pointer class, and use this as a learning
experience. True, your reference counting implementation will be much
slower than Boost's. Boost does not use mutexes for this, rather it
uses CPU-specific atomic increment/decrement instructions (and gets
it partially wrong, by the way). But this is a good way to learn some
important principles of thread-safe programming.

Care to clarify?

The entire design is completely braindead. A separate object, a tiny
object that holds the reference count, gets instantiated for every
object tracked by shared_ptr. This is insane. Heavy use of shared_ptr is
just going to hammer your heap like there's no tomorrow. This is
horrible design. The correct approach is to store the reference count in
superclass and derive from it. You can't then attach shared_ptr to an
arbitrary libstdc++ object, of course, so you just multiply-inherit from
it. How hard is that?

Then you have the shared_ptr itself, an object that holds two pointers,
one to the original object, the other to the reference count.

Which means that most compilers won't be able to hold a shared_ptr in
CPU registers, like ordinary pointers. Instead, things are going to get
tossed around in memory.

There are also several other design flaws as well. Like the fact that
each time a reference count gets updated, it involves a call to a
library function. Reference count increment/decrement cannot be inlined,
due to a real first class fookup in the templates. After I read through
the class definition, I just shook my head, laughed, then quickly coded
my own reference-counted object implementation, and ended up
benchmarking it 15% faster than Boost's disgusting implementation.

Have you looketh at boost::intrusive_ptr?



Cheers,

- Alf
 
G

gpderetta

Like the rest of Boost's code? That's not my experience.
[...]
Then don't. Resist the temptation to use Boost as long as you can. Many
years from now, you'll be thankful for that. Define and implement your own
pointer class, and use this as a learning experience. True, your reference
counting implementation will be much slower than Boost's. Boost does not use
mutexes for this, rather it uses CPU-specific atomic increment/decrement
instructions (and gets it partially wrong, by the way). But this is a good
way to learn some important principles of thread-safe programming.
Care to clarify?

The entire design is completely braindead. A separate object, a tiny
object that holds the reference count, gets instantiated for every object
tracked by shared_ptr. This is insane. Heavy use of shared_ptr is just going
to hammer your heap like there's no tomorrow.

Use a custom allocator. Or use a platform with a better default
allocator.
Also the draft standard (and I think the next boost release) has
make_shared or something like that that will encapsulate 'new' and
will allocate the reference count and your object with a single
allocation; as a plus it will be binary compatible with the default
separate referece counted shared_ptrs.
This is horrible design. The
correct approach is to store the reference count in superclass and derive
from it. You can't then attach shared_ptr to an arbitrary libstdc++ object,
of course, so you just multiply-inherit from it. How hard is that?

If you do not want to pay for a separate heap allocated reference
count, use boost::intrusive_ptr. Which doesn't force you to derive
from some superclass.

Not requiring changes in user classes is good desing, not horrible
design.

Also note that using a separate reference count gives you weak_ptr
(technically you can have it with an intrusive reference counted smart
pointer, but you would have to defer deallocation of the pointed
object untill the last weak pointer dies, which is Not What You Want
(TM) ).
Then you have the shared_ptr itself, an object that holds two pointers, one
to the original object, the other to the reference count.

Which means that most compilers won't be able to hold a shared_ptr in CPU
registers, like ordinary pointers.

Most compilers? Maybe from the last century. I would be surprised if
any half-recent compiler did that.
Instead, things are going to get tossed
around in memory.

There are also several other design flaws as well. Like the fact that each
time a reference count gets updated, it involves a call to a library
function.

On most plattforms boost shared_ptr uses inline asm or compiler
intrinsics for reference count updates. Calls to platfrom specific
mutex lock/unlock are only a fall back. And you know that, as you said
the same thing in the previous paragraph. Also I would like to know
how it "gets it partially wrong".
Reference count increment/decrement cannot be inlined, due to a
real first class fookup in the templates.

What is that supposed to mean?
After I read through the class
definition, I just shook my head, laughed, then quickly coded my own
reference-counted object implementation, and ended up benchmarking it 15%
faster than Boost's disgusting implementation.

15% is hardly impressive. It is unlikely that shared_ptr operations
are responsible of more than 1% of your application run time (if they
are, you are likely doing something wrong). Improving 15% on that
isn't a great accomplishment nor a good use of your time.

With a custom tailored shared ptr you can do much, much better (as,
I'm sure, Chris Thomasson is going to tell you ;) ). But of course you
lose most of the nice properties of shared_ptr: weak_ptr, support for
incomplete types, custom deleter, aliasing constructor etc, etc... and
of course a peer reviewed, widely tested design (even blessed by the
standard committee).

No.
 
M

Michael Oswald

gpderetta said:
15% is hardly impressive. It is unlikely that shared_ptr operations
are responsible of more than 1% of your application run time (if they
are, you are likely doing something wrong). Improving 15% on that
isn't a great accomplishment nor a good use of your time.

I once made a comparison of boost::shared_ptr and loki::SmartPtr, which uses
a custom allocator (both with non-intrusive reference counting). In debug
mode, loki was slightly slower because of massive policy templates, but
when compiled with -O2 it needed only 50% of the time of boost::shared_ptr.

Of course, their design is rather complementary. Boost::shared_ptr is a more
general and broad approach, while Loki provides possibilities to configure
every tiny bit with policies. Each has it's strength and weaknesses. I have
used both and didn't run into problems so far.

lg,
Michael
 
D

David Schwartz

This just means, that you are making a copy of an object where the
destructor is in progress. That's simply a bug and should be avoided ;-)

best regards,
Torsten

I agree. I would put it simply -- you cannot call 'AddRef' unless you
have an explicit or implicit reference to an object. The 'AddRef'
function is not special, it must be called with a reference just like
every other function.

The puzzle is this -- how did you get a pointer to object to call
'AddRef' on anyway?

I've heard a lot of talk about strong thread safety and the like, but
I have to admit, I don't get it. In order to call 'AddRef' on an
object, you need a pointer to it, and how could you possibly have
gotten that pointer without something that already had a reference?

The existence of a pointer should mean the existence of a reference --
otherwise how can you know that pointer remains valid, whether a call
for AddRef or for any other purpose?

DS
 
A

Alexander Terekhov

David said:
I agree. I would put it simply -- you cannot call 'AddRef' unless you
have an explicit or implicit reference to an object. The 'AddRef'
function is not special, it must be called with a reference just like
every other function.

The puzzle is this -- how did you get a pointer to object to call
'AddRef' on anyway?

I've heard a lot of talk about strong thread safety and the like, but
I have to admit, I don't get it. In order to call 'AddRef' on an
object, you need a pointer to it, and how could you possibly have
gotten that pointer without something that already had a reference?

The existence of a pointer should mean the existence of a reference --
otherwise how can you know that pointer remains valid, whether a call
for AddRef or for any other purpose?

See

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2297.html#atomic

regards,
alexander.
 
T

Tim H

I agree. I would put it simply -- you cannot call 'AddRef' unless you
have an explicit or implicit reference to an object. The 'AddRef'
function is not special, it must be called with a reference just like
every other function.

The puzzle is this -- how did you get a pointer to object to call
'AddRef' on anyway?

I've heard a lot of talk about strong thread safety and the like, but
I have to admit, I don't get it. In order to call 'AddRef' on an
object, you need a pointer to it, and how could you possibly have
gotten that pointer without something that already had a reference?

The existence of a pointer should mean the existence of a reference --
otherwise how can you know that pointer remains valid, whether a call
for AddRef or for any other purpose?

DS

The owner calls AddRef before it hands you a pointer.
 
S

Sam

gpderetta said:
Use a custom allocator. Or use a platform with a better default
allocator.

What a great idea. The implementation is braindead? No problem, just pile on
a heap of workarounds that adds another layer of overhead, that'll fix it!
Also the draft standard (and I think the next boost release) has
make_shared or something like that that will encapsulate 'new' and
will allocate the reference count and your object with a single
allocation; as a plus it will be binary compatible with the default
separate referece counted shared_ptrs.
Ditto!


If you do not want to pay for a separate heap allocated reference
count, use boost::intrusive_ptr. Which doesn't force you to derive
from some superclass.

I'm almost afraid to look.
Not requiring changes in user classes is good desing, not horrible
design.

"User classes" are, by definition, designed by the user. Now that's an
interesting approach: rather than designing a class hierarchy to properly
implement reference-counted objects, nooooooo!!! That makes too much sense.
Instead, forcibly rip out all the reference-counting logic, and use a pile
of duct tape and superglue to attach it to your class hierarchy.

That's a guaranteed recipe for efficiently getting things done, isn't it?
Also note that using a separate reference count gives you weak_ptr
(technically you can have it with an intrusive reference counted smart
pointer, but you would have to defer deallocation of the pointed
object untill the last weak pointer dies, which is Not What You Want
(TM) ).

/me looks at my reference-counted class hierarchy, that supports weak
references and virtual inheritance, refrains from any cockamamie logic like
deferred allocation, and benchmarks way ahead of Boost.

I must be imagining things. Hold on while I pop an aspirin, or two, and
reorient myself.
Most compilers? Maybe from the last century. I would be surprised if
any half-recent compiler did that.

Try gcc 4.1. CPU registers are at a premium. It's very unlikely that the
compiler will put anything into a register that's not a single, native type.
If there's enough registers for everything, then maybe, but this rarely
occurs in practice. A compiler is more likely to keep a pointer or a native
type, rather than clear out a bunch of registers to hold an entire struct.

Using a reference to a single object that's implemented as a single pointer,
versus two pointers, has a far better chance of resulting in faster code.
That's a no-brainer.
On most plattforms boost shared_ptr uses inline asm or compiler
intrinsics for reference count updates. Calls to platfrom specific
mutex lock/unlock are only a fall back. And you know that, as you said
the same thing in the previous paragraph. Also I would like to know
how it "gets it partially wrong".

Except that on x86 it uses a wrong builtin, for which there's no
corresponding CPU instruction, resulting in:

x86_64:
movq %rax, %rdi
movl $-1, %esi
call _ZN9__gnu_cxx18__exchange_and_addEPVii

i386:
movl $-1, 4(%esp)
movl %eax, (%esp)
call _ZN9__gnu_cxx18__exchange_and_addEPVii

That's for every reference decrement. Lovely. Ditto for increment.

Seriously, has anyone actually LOOKED at the code that comes out of Boost,
and checked it for sanity?
What is that supposed to mean?

Wrong gcc builtin!!!!!!

__exchange_and_add() does not get compiled by gcc to an equivalent CPU
instruction on either i386 or x86_64, for the simple reason that there is no
such CPU instruction, and it has to be emulated by libgcc! shared_ptr forces
a function call to libgcc every time you bounce shared_ptr from place A to
place B. Whoever did this, without verifying that the code gets compiled to
an actual atomic x86 instruction, should wear a paper bag on his head for at
least a month. All they had to do is use the correct builtin, and eliminate
all that overhead of setting up a stack frame, invoking a remote library
function, likely resulting in a trashed cacheline, and replace all of that
useless crap with one or two native CPU instructions!

What a joke.
15% is hardly impressive. It is unlikely that shared_ptr operations
are responsible of more than 1% of your application run time (if they
are, you are likely doing something wrong). Improving 15% on that
isn't a great accomplishment nor a good use of your time.

You must not be familiar with developing real world applications for the
financial industry, where every nanosecond of latency counts. Even a 1%
improvement translates to a non-trivial competitive advantage. I'll take it
any day.

Furthermore, this was a rather crude benchmark, which probably ended up
hitting the CPU cache most of the time. A more realistic mock-up will likely
show a greater difference due to doubling of heap allocation activity that
shared_ptr demands, not to mention all the unnecessary calls to libgcc.
Especially since, as a result of deferred allocation by shared_ptr, the
extra allocation won't have a 1:1 relationship with heap allocation of the
referenced objects themselves.
With a custom tailored shared ptr you can do much, much better (as,

I see no need for something heavily customized. Just basic operations,
typical for reference-counted objects, will suffice, as long as you do them
correctly. Really, this is not rocket science.

My impression is that many Boosts classes are what you get in response to a
homework assignment for a junior-level college course on programming
algorithms and concepts, with only a minimum of consideration paid to
practical, real-world scenarios.
I'm sure, Chris Thomasson is going to tell you ;) ). But of course you
lose most of the nice properties of shared_ptr: weak_ptr, support for
incomplete types, custom deleter, aliasing constructor etc, etc... and
of course a peer reviewed, widely tested design (even blessed by the
standard committee).

Gee, then I must've imagined knocking off all this, including weak
references, templated default allocators/deallocators, and supporting
multiply-inherited objects, in just a day or two, last year.

If I was on a committee that apparently accepted and blessed Boost's
shared_ptr, without even bothering to check at what code it generates, and
how, I'd probably be embarassed.

I'm afraid the answer is still yes.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQBH7CsPx9p3GYHlUOIRAqigAJsG2SMqpWOx6M1lPUYWxvmyyRJCxwCeJ5zo
ApL4IIvYRMZMR97bQ7VYqHg=
=EAVw
-----END PGP SIGNATURE-----
 
G

gpderetta

What a great idea. The implementation is braindead? No problem, just pile on
a heap of workarounds that adds another layer of overhead, that'll fix it!

If your platform allocator is slow is not shared_ptr fault.
[...]
If you do not want to pay for a separate heap allocated reference
count, use boost::intrusive_ptr. Which doesn't force you to derive
from some superclass.

I'm almost afraid to look.

Nobody is forcing you to.
"User classes" are, by definition, designed by the user. Now that's an
interesting approach: rather than designing a class hierarchy to properly
implement reference-counted objects, nooooooo!!! That makes too much sense.
Instead, forcibly rip out all the reference-counting logic, and use a pile
of duct tape and superglue to attach it to your class hierarchy.

No duct tape nor superglue. Just common C++ idioms.
I prefer to adapt non intrusively my classes to use them with third
party
libraries than design for them.
That's a guaranteed recipe for efficiently getting things done, isn't it?

Actually yes.
/me looks at my reference-counted class hierarchy, that supports weak
references and virtual inheritance, refrains from any cockamamie logic like
deferred allocation, and benchmarks way ahead of Boost.

15% faster doesn't sound that 'way ahead'.
I must be imagining things. Hold on while I pop an aspirin, or two, and
reorient myself.

Certainly not, but I would be curious to see how did you implement
weak pointers.
Try gcc 4.1.

That's what I'm using.
CPU registers are at a premium. It's very unlikely that the
compiler will put anything into a register that's not a single, native type.
If there's enough registers for everything, then maybe, but this rarely
occurs in practice. A compiler is more likely to keep a pointer or a native
type, rather than clear out a bunch of registers to hold an entire struct.

It will reserve registers according to what it is more useful in a
specific place.

Of course small objects means less register spills, but a compiler can
still keep the most used part of an object (in shared pointer case,
the pointer to the held object) and spill the less used part to the
stack (in shared_ptr case, the pointer to the shared count).
[...]
On most plattforms boost shared_ptr uses inline asm or compiler
intrinsics for reference count updates. Calls to platfrom specific
mutex lock/unlock are only a fall back. And you know that, as you said
the same thing in the previous paragraph. Also I would like to know
how it "gets it partially wrong".

Except that on x86 it uses a wrong builtin, for which there's no
corresponding CPU instruction, resulting in:

x86_64:
movq %rax, %rdi
movl $-1, %esi
call _ZN9__gnu_cxx18__exchange_and_addEPVii

i386:
movl $-1, 4(%esp)
movl %eax, (%esp)
call _ZN9__gnu_cxx18__exchange_and_addEPVii

That's for every reference decrement. Lovely. Ditto for increment.

What I see with gcc-4.1.2 and boost 1.33.1 is:

#APP
lock
xadd %eax, 8(%rbx)
#NO_APP

It doesn't use gcc builtins but custom inline assembler.
Maybe you are using a very old boost version?
Wrong gcc builtin!!!!!!

__exchange_and_add() does not get compiled by gcc to an equivalent CPU
instruction on either i386 or x86_64, for the simple reason that there is no
such CPU instruction,

So what? It can be implemented with CAS. It is not unreasonable to
expect that the compiler
would do that for you. Anyways, recent boost releases (at least since
December 2005) do not use the intrinsic.
and it has to be emulated by libgcc! shared_ptr forces
a function call to libgcc every time you bounce shared_ptr from place A to
place B.

Not in the disassembled code I'm looking at.
[...]
15% is hardly impressive. It is unlikely that shared_ptr operations
are responsible of more than 1% of your application run time (if they
are, you are likely doing something wrong). Improving 15% on that
isn't a great accomplishment nor a good use of your time.

You must not be familiar with developing real world applications for the
financial industry, where every nanosecond of latency counts. Even a 1%
improvement translates to a non-trivial competitive advantage. I'll take it
any day.

I develop applications where latency is important, but 1% is hardly
significative. Anyways in your case is more 15% of 1% ;).

If shared_ptr construction is the bottleneck you are likely going to
get a much bigger speed up by not using reference count at all (use a
real garbage collector or maybe an arena allocator).
Furthermore, this was a rather crude benchmark, which probably ended up
hitting the CPU cache most of the time. A more realistic mock-up will likely
show a greater difference due to doubling of heap allocation activity that
shared_ptr demands,

In a more realistic application you hardly do any shared_ptr
construction at all, at least not in the tightest loop of your
program.
not to mention all the unnecessary calls to libgcc.
Especially since, as a result of deferred allocation by shared_ptr, the
extra allocation won't have a 1:1 relationship with heap allocation of the
referenced objects themselves.


I see no need for something heavily customized. Just basic operations,
typical for reference-counted objects, will suffice, as long as you do them
correctly. Really, this is not rocket science.

Then do not complain about performance!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,965
Messages
2,570,148
Members
46,710
Latest member
FredricRen

Latest Threads

Top