SSO

J

James Kanze

If you are going to copy a string from one thread to another,
I believe you will need a locking mechanism regardless of how
the string has been implemented. I don't see how SSO would
avoid the need. (Unless member arrays are always assigned
atomically, which I don't think is the case.)

You need the lock when you pass the string (or a copy) from one
thread to the other. You don't need any sort of locking when
accessing the string within a thread. With CoW, the
implementation needs locking for each manipulation within the
thread, since the actual implementation data might be shared.
(That's somewhat of a simplification, and implemented correctly,
there are some accesses which don't need the lock. But anything
that reads or updates the use count needs a lock.)
 
J

James Kanze

Could you please elaborate more on *why*? I just can't see
that happening.
Let's assume we have two threads running, A and B. Thread A
owns a string which it wants to pass to thread B. How does it
do that?

It copies the string, and passes a copy to B. There's no need
to lock for the copy, unless the copy is part of a larger
transfer mechanism; it's the transfer mechanism which will
require the lock. More sigificantly, once B has the copy, there
is no need for a lock in order to use it, since nothing is
shared with the original. If CoW is being used, then some types
of access (most, given the semantics of std::string) must be
locked. Typically, internally in the implementation of
std::string, since the sharing is supposed to be transparent to
the client code.
I assume that thread A sees some string that is owned by B,
and then goes and assigns it to that string. Let's assume the
string uses a deep-copy.

If more than one thread is accessing the same string object,
and any thread is modifying it, all threads need to synchronize.
Nothing changes here---that's the same rule as for int, and
affects CoW and other implementations equally.

If two threads are accessing different string objects, however,
where one was originally a copy of the other, you don't expect
to have to lock. With CoW, however, the two string objects
actually share a single internal object containing the
representation, so some sort of locking is required in the
string class itself.
If *during* that deep-copying thread B starts reading the
string, it will get incorrect data (because it has only been
partially copied).
The other possibility I can think of is that thread B makes
the copy. In other words, it sees the string owned by thread
A, and assigns it to its own string.
Again: If thread A goes and modifies its string while this
deep-copying is happening, the data which is copied to B will
become corrupted.
I just can't see how deep-copying solves any synchronization
problems.

It means that all cases where a single object is accessed by
more than one thread, and modified by at least one thread, are
visible to the client (who is responsible for locking in such
cases). With CoW, there is a hidden object, which may be
accessed from different threads without the client code seeing
it, so the string class itself must assume the responsibility
for locking.
The only way to copy the string from A to B safely is to lock
access to it for the duration of the copying (regardless of
how the copying is done). I can't see any other way.

That's not the issue. The issue is what happens once you've
made the copy, when you're using one string object in thread A,
and a completely different string object in B.
 
J

Juha Nieminen

James said:
It copies the string, and passes a copy to B.

How is passing the copy to another thread typically done? How does B
"know" when it can safely read that copy?

Anyways, even if using CoW (at least in some of your own classes,
where you have full knowledge and control of how it works), you could
simply *force* a deep-copy to be done when needed. This is rather
trivial to do with CoW objects. It's not like (unlocked) CoW would be
fully incompatible with multi-threading.

(I wonder if it would be possible to implement some kind of automation
mechanism where each object somehow knows which thread it belongs to,
and performs deep-copying automatically if it's transferred to another
thread...)
 
B

Bo Persson

Juha said:
Most std::sort() implementations use introsort, which in most
situations triggers an insertion sort, which copies elements,
doesn't swap them. I checked the implementation used in gcc, and it
indeed uses straightforward assignments rather than any swapping.

That's an implementation detail.
Also in quicksort there is a pivot value used to partition the
array. How do you keep this pivot value with swapping only?

You can swap it into the right partition, and then keep a reference to
it.
I checked the implementation in gcc, and it seems that it uses the
median-of-three method for choosing the pivot value, and then
passes it to the partitioning function by value, hence triggering a
copy.

Passing by value seems expensive for non-COW types. Creating extra
copies of some objects seems a bit expensive, when sort is supposed to
swap them.


Bo Persson
 
J

Jerry Coffin

How is passing the copy to another thread typically done? How does B
"know" when it can safely read that copy?

Typically, the data will be passed via a queue. The queue will
typically have two mutexes: one that's used by producers that says
whether there's a slot free in the queue, and one that's used by
consumers that says where there's at least one item available.

B will wait on the latter mutex, and when it's signaled, it'll read
the next item from the queue, and process it accordingly.
Anyways, even if using CoW (at least in some of your own classes,
where you have full knowledge and control of how it works), you could
simply *force* a deep-copy to be done when needed. This is rather
trivial to do with CoW objects. It's not like (unlocked) CoW would be
fully incompatible with multi-threading.

(I wonder if it would be possible to implement some kind of automation
mechanism where each object somehow knows which thread it belongs to,
and performs deep-copying automatically if it's transferred to another
thread...)

Back when Herb Sutter wrote a GoTW on this subject, I had roughly the
same thought, and tried to implement it. It wasn't very successful.

The problem I ran into was pretty simple: at least if you try to
automate this, you can't really figure out that the object is going
to be transferred to another thread -- you can only detect _after_
the fact that it already has been transferred to the new thread. By
then, you already have shared access to the object, so to get any
kind of sensible results, you need to synchronize access just like
you would otherwise.

Even so, I do think that this might be a fairly useful technique
under some circumstances. Even though the string object still needs
synchronization to support shared access, it still fixes (or at least
ameliorates) one problem that can arise: if an object is shared
between a lot of threads, contention over access to the shared data
can become a serious problem. Using this technique, each thread has
to synchronize access to the shared object, but contention is still
reduced because a child thread only ever accesses the shared object
one time, to make its own copy of the data.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,159
Messages
2,570,879
Members
47,414
Latest member
GayleWedel

Latest Threads

Top