why boost:shared_ptr so slower?

J

joseph cook

Even if it's three allocations on initial creation, that isn't changed
when you make additional copies of that object. You only have three heap
allocations, period. Regardless of how many intermediate objects are
sharing it.

OK, my intent isn't to beat up boost::shared_ptr because I use it
frequently, and find it very useful. Unfortunately, I could use it a
lot more if it didn't utilize heap allocations when being constructed,
or I could control those heap allocations better. Probably I can and
just am not aware of how.

I was mistaken about the two heap allocations. Perhaps this was true
in an older boost version?

What I meant about the intermediate objects is this. If I had a
singleton as follows, which self-destructs as soon as all users of it
go out of scope:

template<typename T>
class Single
{
static shared_ptr<T> instance()
{
if(!m_instance)
{
m_instance = shared_ptr<T>(new T);
return m_instance;
}
}
static shared_ptr<T> m_instance;
};

Every time I call "instance()", a shared_ptr is being created (slow
heap allocation).

If instead I had used plain pointers:
template<typename T>
class Single
{
static T* instance()
{
if(!m_instance)
{
m_instance = new T;
}
}
static T* m_instance; //initialize to 0
};

I would have the initial allocation, but could call "instance" an
unlimited amount of times without additional allocations. (I'm
obviously not getting the reference counting either here)

Joe Cook
 
J

James Kanze

And that, in turn, depends heavily on the heap manager that
you're using.

And a lot of other things. An invasive reference counted
pointer will generally be faster, but more because copying it is
only copying one pointer, rather than two, than because of the
allocations.

IIRC, at one point, very early in the development of
boost::shared_ptr, the author actually ran some benchmarks. And
found that the extra allocation wasn't very expensive; not
expensive enough, for example, to justify the added complexity
of implementing its own memory manager. (For various reasons,
most of the usual heap managers today are very efficient when it
comes to allocating a lot of small sized objects of the same
size.) In other words, the authors actually treated the project
as if it were a professional development, and behaved
professionally.
 
J

James Kanze

[...]
I would have expected them to pool these allocations somehow.

IIRC, the authors actually experimented with several
implementations, including one which did use a pooled allocator.
And found that pooling the allocations didn't measurably change
performance.

But of course, any time you do anything professionally, you'll
find a lot of amateurs ready to criticize, because you haven't
fallen victime to their prejudices.
 
J

James Kanze

First you claim that it's not thread-safe, and then you
point to the documentation which says that it is.
[/QUOTE]
No, I pointed to a document that says it is "as thread safe as
..." not that it is "thread safe".
No, you are fundamentally missing my point. Obviously I know
the level of "thread safety" that boost::smart_ptr provides
(since I posted the reference) so you should have at least
thought for a a moment about what my point is and at least
recognize the point (whether you agree or not) before
launching into an elaboration of what the document already
explains.
My point (since you missed it the first time) is that "as
thread safe as a built-in type" is not, in my opinion, what
one commonly thinks of when someone says a construct is
"thread safe".

So what do you think one "commonly thinks of" when one says a
construct is "thread safe". As far as I can tell,
boost::smart_ptr meets the definitions of Posix, and offers
the maximum thread safety which one can reasonably expect.
So when you posted this
it can easily be misunderstood or deceive someone without the
qualification "as a built-in type" which the boost document is
careful to include. To put it simply
"as thread-safe as a built-in type" != "thread safe"

In what way?

The basic definition of "thread safe", at least as I see it used
by the experts in the field, more or less corresponds to the
Posix definition: there is no problem using different instances
of the object in different threads, regardless of what you're
doing with them, there is no problem using the same instance in
different threads as long as none of the threads are modifying
the instance (from a logical point of view). Anything else
requires external synchronization. (I've not analysed
shared_ptr in detail, and it's possible that it doesn't meet
these guarantees. The implementation I have access to uses
__gnu_cxx::__atomic_add_dispatch, for example, and the last time
I looked, the implementation of that function could result in a
deadlock on 32 bit Sparcs under Solaris.)
 
C

Chris M. Thomasson

Alf P. Steinbach said:
* Pete Becker:

I'm with Noah in principle, but with you in practice. :) I've never felt
the need for doing const& for shared_ptr. It's just more to write for no
particular (significant) gain,

In a multiple processor system, avoiding mutations to the reference count
does give you significant gains indeed. Why should I suffer the penalty of a
memory barrier and an atomic RMW operation when I don't have to? The memory
barrier will have a negative effect in the form of blowing away cached data
and the atomic RMW might need to lock the damn bus, or do a cache snoop.
This has MAJOR negative effects. Passing smart pointers by reference is the
way to go. You only need to increment the counter when your passing the
pointer to another thread. Which procedure is going to be more efficient:


void func1(shared_ptr<foo> f) {

}


void func2(shared_ptr<foo>& f) {

}


?


Keep in mind that `func1' is going to have an atomic RMW + membar to
increment the count, and a membar + atomic RMW when the `f' goes out of
scope... That's four expensive operations, versus `func2' which has none of
that.



and it may also convey a misleading impression that this shared_ptr's
reference count or deleter can't be modified, but in principle I think
enforcing on oneself a convention of doing const& for all but basic types
could remove some inefficiencies -- it's just that it's not that
important.

In the context of avoiding membars and atomic RMW's, IMO, it's EXTREMELY
important!


Please read the following which explains why traditional read/write mutex
implementations can be expensive:

http://groups.google.com/group/comp.programming.threads/msg/fdc665e616176dce





;^o
 
C

Christof Donat

Hi,
There is one allocation off the heap when you create an object. There is
one allocation off the heap when you create a boost::shared_ptr object
to manage that object.

No, the shared pointer object will usually be allocated on the stack.
boost::shared_ptr will allocate an intermediate object with the raw pointer
and the refference counter, that is referenced by all instances of
boost::shared_ptr that point to the same object. That is the second
allocation.

The performance could be improved by using a pool of these intermediate
objects. On the other hand you buy that with more memory consumption, since
the pool will always have to be there.

Christof
 
C

Christof Donat

Hi,
This problem has been solved already. The reference count simply needs to
be a virtual superclass, and all reference-counted objects are derived
from the superclass.


Not a showstopper. Can be used with existing object types, and basic
types, by subclassing them.

boost::shared_ptr<std::string> sp = new std::string();

versus

boost::shared_ptr<std::string> sp =
new shared_ptrWrapper<std::string>(new std::string());

Ah, here we have the two allocations again. Just that the user has to do it
himself.

For most people std:: is not the only library they use. All of those
libraries would either have to be changed or their users would need to use
the second allocation and do that himself.

people will start to use the prtWrapper all the time, because it makes the
use of their classes easier in cases where no shared_ptr is needed.
* The shared pointer becomes just a native pointer.

No, it has to have some operators to make sure, the memory will be released
as soon as the reference count reaches 0.
* The additional memory allocation gets eliminated.

Not really, see above. The second allocation has just moved from the library
into the users code.

Christof
 
C

Chris M. Thomasson

Pete Becker said:
Um, there's another qualifier that got left out:

In a multiple processor system, *when shared_ptr objects
are shared between threads*, avoiding mutations to the
reference count does give you significant gains indeed.

Please correct me if I am wrong, but I believe that shared_ptr will be using
membars and atomic RMW on multi-processor systems regardless if they are
shared between threads or not. This can still have negative effects. One
simple example, think of false sharing in the reference count. I do not
believe that Boost eliminates false sharing between reference counts. In
other words, I don't think that Boost implementation pads the reference
count object to the size of a L2 cache line, and I don't think they align
said refcount object in memory on a L2 cache line boundary.



And therein lies a great debate about whether shared_ptr should be
sharable between threads.

In other words:

therein lies a great debate about whether shared_ptr should be thread-safe
at all right? Or am I totally misunderstanding you?
 
C

Chris M. Thomasson

Juha Nieminen said:
First you claim that it's not thread-safe, and then you point to the
documentation which says that it is.


You are confusing two different types of thread-safety.
[...]

FWIW, Boost shared_ptr only provides basic/normal thread-safety. It does NOT
provide strong thread-safety in any way shape or form. Read the entire
following thread:


http://groups.google.com/group/comp..._frm/thread/e5167941d32340c6/1b2e1c98fa9ad7c7


You simply cannot use shared_ptr in a scenario which demands strong thread
safety level. For example, this will NOT work:
_______________________________________________________________
static shared_ptr<foo> global_foo;


void writer_threads() {
for (;;) {
shared_ptr<foo> local_foo(new foo);
global_foo = local_foo;
}
}


void reader_threads() {
for (;;) {
shared_ptr<foo> local_foo(global_foo);
local_foo->read_only_operation()();
}
}
_______________________________________________________________





If you want to use shared_ptr this way, you need a mutex:
_______________________________________________________________
static shared_ptr<foo> global_foo;


void writer_threads() {
for (;;) {
shared_ptr<foo> local_foo(new foo);
// lock
global_foo = local_foo;
// unlock
}
}


void reader_threads() {
for (;;) {
// lock
shared_ptr<foo> local_foo(global_foo);
// unlock
local_foo->read_only_operation()();
}
}
_______________________________________________________________
 
C

Christof Donat

Hi,
I agree with everything in the preceding paragraph except the first
word. Unfortunately, you snipped the rest of what I said, which makes it
clear that I wasn't saying that the shared_ptr object was on the heap,
but that it allocated memory on the heap.

Sorry, actually I have not read your complete post, because after the
beginning I was sure, that you have a misconception there. I retract my first
word ;-)

Christof
 
C

Chris M. Thomasson

Chris M. Thomasson said:
Juha Nieminen said:
First you claim that it's not thread-safe, and then you point to the
documentation which says that it is.


You are confusing two different types of thread-safety.
[...]

FWIW, Boost shared_ptr only provides basic/normal thread-safety. It does
NOT provide strong thread-safety in any way shape or form. Read the entire
following thread:


http://groups.google.com/group/comp..._frm/thread/e5167941d32340c6/1b2e1c98fa9ad7c7


You simply cannot use shared_ptr in a scenario which demands strong thread
safety level. For example, this will NOT work:
_______________________________________________________________
static shared_ptr<foo> global_foo;


void writer_threads() {
for (;;) {
shared_ptr<foo> local_foo(new foo);
global_foo = local_foo;
}
}


void reader_threads() {
for (;;) {
shared_ptr<foo> local_foo(global_foo);
local_foo->read_only_operation()();
}
}
_______________________________________________________________

[...]

Here is a pre-alpha implementation of my experimental strongly thread-safe
reference counting algorithm:

http://webpages.charter.net/appcore/vzoom/refcount


Here is some very crude documentation of the C API:

http://webpages.charter.net/appcore/vzoom/refcount/doc




I proposed this algorithm for Boost a while ago, but I never really followed
up on it and did not make a formal proposal. Here are the relevant posts:

http://search.gmane.org/?query=&group=gmane.comp.lib.boost.devel&author=chris thomasson



Here is a fairly detailed description of exactly how the "strong
thread-safety aspect" of the algorithm works:

http://article.gmane.org/gmane.comp.lib.boost.devel/149803



Enjoy!
 
J

Juha Nieminen

Keith said:
No, you are fundamentally missing my point.

And you missed mine.

My original post was in reply to someone complaining that
boost::shared_ptr is slow. I asked if that person would have a better
solution for a reference-counting smart pointer which would have the
same features as boost::shared_ptr.

One of these features is that it's thread-safe: You can have
shared_ptr instances in different threads pointing to the same object,
and inside that thread you can copy, assign and destroy these instances
without malfunction (in other words, even if the other thread is doing
that as well).

This is in contrast to most naive smart pointer implementations out
there which are *not* thread safe and cannot be used to share an object
among several threads. The reason for this is simple: These naive smart
pointers don't lock accesses to the reference counter and thus will
invariably malfunction when used with objects shared among threads.
 
J

Juha Nieminen

Chris said:
FWIW, Boost shared_ptr only provides basic/normal thread-safety. It does
NOT provide strong thread-safety in any way shape or form.

I don't even understand the difference between "basic/normal" and
"strong".

boost::shared_ptr is thread-safe because instance of it can be used in
different threads even if these instances point to the same object.
boost::shared_ptr doesn't malfunction if two threads manipulate these
instances at the same time.

You seem to be talking about thread-safety of the shared object
itself, rather than the thread-safety of boost::shared_ptr. That's a
completely different issue. Whether the shared object is thread-safe is,
of course, up to the object. Why should boost::shared_ptr have anything
to do with that? That's exactly as silly as saying that if the shared
object contains a pointer to dynamically allocated memory, it's
boost::shared_ptr's duty to free that memory as well, rather than the
object's duty.
 
J

Juha Nieminen

Sam said:
Not a showstopper. Can be used with existing object types, and basic
types, by subclassing them.

Care to show me how you subclass a basic type?

Care to show me how you make an intrusive smart pointer work with
incomplete types?

For example, when the intrusive smart pointer needs to increment the
reference counter, how do you do it when the object type is incomplete?
* The shared pointer becomes just a native pointer.

It's not like non-intrusive reference-counting smart pointers cannot
be made the size of one single pointer.
 
C

Chris M. Thomasson

Juha Nieminen said:
I don't even understand the difference between "basic/normal" and
"strong".

Please read here:

http://www.boost.org/doc/libs/1_39_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety


Please take note of the first word in the following sentence:

"Different shared_ptr instances can be "written to" (accessed using mutable
operations such as operator= or reset) simultaneously by multiple threads
(even when these instances are copies, and share the same reference count
underneath.)"



See? The SAME instance of a shared pointer CANNOT be written to by multiple
threads. The same instance of a shared pointer cannot be written to by
thread A, and simultaneously read from by thread B. This describes
limitation of basic/normal thread-safety level. On the other hand, this is
perfectly legal with a strong thread-safety level.





Take note of example 5:

//--- Example 5 ---

// thread A
p3.reset(new int(1));

// thread B
p3.reset(new int(2)); // undefined, multiple writes




If shared_ptr was strongly thread-safe, then example 5 works fine. In fact,
if it honored strong thread-safety, then examples 3-5 would also work fine.
I dare you to try and get the following example pseudo-code to work without
using mutual exclusion:
_______________________________________________________________
static shared_ptr<foo> global_foo;


void writer_threads() {
for (;;) {
shared_ptr<foo> local_foo(new foo);
global_foo = local_foo;
}
}


void reader_threads() {
for (;;) {
shared_ptr<foo> local_foo(global_foo);
local_foo->read_only_operation()();
}
}
_______________________________________________________________






boost::shared_ptr is thread-safe because instance of it can be used in
different threads even if these instances point to the same object.
boost::shared_ptr doesn't malfunction if two threads manipulate these
instances at the same time.

NO! See, a single instance of boost::shared_ptr CANNOT be simultaneously
written to (e.g., operator = or reset()) by more than one thread. A single
instance of boost::shared_ptr CANNOT be written to and read from by more
than one thread simultaneously. This is due tot he fact that shared_ptr is
not strongly thread-safe.



You seem to be talking about thread-safety of the shared object
itself, rather than the thread-safety of boost::shared_ptr.

I am not writing about that in any way, shape or form.



That's a completely different issue.
Agreed.




Whether the shared object is thread-safe is,
of course, up to the object. Why should boost::shared_ptr have anything
to do with that?

It should not have anything to do with that; period.



That's exactly as silly as saying that if the shared
object contains a pointer to dynamically allocated memory, it's
boost::shared_ptr's duty to free that memory as well, rather than the
object's duty.

Agreed, that would be silly!
 
C

Chris M. Thomasson

Juha Nieminen said:
Care to show me how you subclass a basic type?


Care to show me how you make an intrusive smart pointer work with
incomplete types?

For example, when the intrusive smart pointer needs to increment the
reference counter, how do you do it when the object type is incomplete?


It's not like non-intrusive reference-counting smart pointers cannot
be made the size of one single pointer.

Do you mean that the overhead of a non-intrusive reference-counting smart
pointer will be sizeof(void*)? Are you stealing some bits in the pointer to
keep the reference count or something? I must be totally misunderstanding
you. I can think of a way in which the counters are located in a static
array and objects are queued for destruction when that count goes to zero.
But that might mean non-deterministic destruction properties.
 
C

Chris M. Thomasson

Chris M. Thomasson said:
Please correct me if I am wrong, but I believe that shared_ptr will be
using membars and atomic RMW on multi-processor systems regardless if they
are shared between threads or not.

`BOOST_SP_DISABLE_THREADS' aside for a moment.


[...]
 
H

Howard Hinnant

Replying to no one in particular:

For those concerned about needing two heap allocations to create a
shared_ptr, you might want to check out boost:: (soon to be std::)
make_shared:

struct A
{
A(int i, char c);
...
};

shared_ptr<A> p = make_shared<A>(i, c);

The A and the refcount are allocated within one heap allocation (just
like an intrusive implementation). If you still need more control
than that (e.g. allocate out of static buffer) there is:

shared_ptr<A> p = allocate_shared<A>(my_allocator<A>(), i, c);

The memory footprint is still larger than most homegrown reference
counted implementations, in order to support features such as weak_ptr
(to break cyclic ownership). But like earlier posters have implied;
shared_ptr is a good tool for when you need to share ownership. It is
not a "silver bullet" one should blindly substitute every raw pointer
for. There is no substitute for thoughtful design. Nor would I even
call shared_ptr the last reference counted pointer that should ever be
crafted. For special purpose applications, one can always create a
custom tool that will outperform a tool built for general use.

shared_ptr has (at least) 2 things going for it:

1. It is soon to be standard (std::shared_ptr). It will be the *one*
reference counted pointer people will become familiar with to share
ownership, especially across shared library boundaries. I.e. the fact
that it only has one flavor (one template parameter) is a strength.
You can customize the innards of your shared_ptr<A> however you like
(special allocators, special deallocators, allocate in one go, or in
two), but all your clients sees is shared_ptr<A>. They don't have to
know how you created this animal, or even how to destroy it. It will
be to your benefit to know how to construct and use one.

2. Reference counted pointers are notoriously difficult to get
right. They can look right for years before it gets used in a way
that will bite you. shared_ptr has this experience behind it. It is
fully vetted, and ready for you to use. I'm writing this as someone
who has written slightly buggy reference counted pointers in the past,
and as someone who has implemented the full C++0x shared_ptr spec.

All that being said, I should stress that I find unique-ownership
smart pointers just as useful (if not more so) than shared-ownership
smart pointers. They are a pain to implement and use correctly in C++
today, but that will change in C++0x (shameless plug for
std::unique_ptr - in a C++0x context - see Ion's boost intrusive
library for a high quality preview you can use today).

-Howard
 
R

Richard

[Please do not mail me a copy of your followup]

Thanks for that very informative reply, Howard!
 
J

Juha Nieminen

Chris said:
Do you mean that the overhead of a non-intrusive reference-counting
smart pointer will be sizeof(void*)?

I'm saying that sizeof(NonintrusiveSmartPointer) == sizeof(void*),
which is what the original poster seemed to be concerned about.

Of course the reference counter will still take its own memory, but it
will not make instances of the non-intrusive smart pointer any larger
than one single raw pointer.
Are you stealing some bits in the
pointer to keep the reference count or something?

No.
I must be totally
misunderstanding you. I can think of a way in which the counters are
located in a static array and objects are queued for destruction when
that count goes to zero. But that might mean non-deterministic
destruction properties.

The solution is much simpler than that. Rather than storing the
pointer to the managed object in the smart pointer, store it alongside
the reference counter. The smart pointer will only have a pointer to
this structure as member.

Yes, there will be an additional level of indirection when accessing
the managed object, but at least the size of the smart pointer will be
the size of one raw pointer. This might in some cases be advantageous if
the level of sharing is significant.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,159
Messages
2,570,879
Members
47,415
Latest member
PeggyCramp

Latest Threads

Top