How to create a volatile std::string?

Juha Nieminen

I have some data inside a struct instance in a namespace and I want to
initialize that data before spawning a thread which will read that data.
In order to make sure that the data will indeed get written before the
other thread reads it (i.e. to make sure the compiler doesn't perform
some funny optimizations which will mess things up) I make that struct
instance volatile.

The problem is: The struct has some std::strings inside it. I can't
assign anything to these strings (at least not with gcc). I just get an
error that there's no matching operator=. Without the 'volatile' there's
no error.

Is it simply impossible to assign anything to a volatile std::string?
 
Chris Thomasson

Juha said:
I have some data inside a struct instance in a namespace and I want to
initialize that data before spawning a thread which will read that data.
[...]

For POSIX and Windows, as long as the data is prepared _before_ you create
a new thread, everything will be rendered visible and the thread will be
able to see a coherent view; pthread_create(...) acts as a release-membar.
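A minimal sketch of the "prepare first, then create the thread" pattern described above. It assumes a C++11 compiler and uses std::thread instead of pthread_create (completion of the std::thread constructor synchronizes with the start of the thread function, which is the same kind of guarantee). The `config::Settings` struct and its fields are hypothetical stand-ins for the struct in the question; note that nothing is declared volatile.

```cpp
#include <cassert>
#include <string>
#include <thread>

namespace config {
// Hypothetical stand-in for the struct in the question.
struct Settings {
    std::string name;
    int retries = 0;
};
Settings settings;  // plain object, not volatile
}  // namespace config

// Fills in the shared data, then starts a thread that reads it.
// Returns what the new thread saw.
std::string read_from_new_thread() {
    // All writes happen before the thread is created.
    config::settings.name = "worker";
    config::settings.retries = 3;

    std::string seen;
    std::thread reader([&seen] {
        // Read-only access to data that was complete before thread start.
        seen = config::settings.name + ":" +
               std::to_string(config::settings.retries);
    });
    reader.join();  // join() makes the reader's write to `seen` visible here
    assert(!seen.empty());
    return seen;
}
```

Calling `read_from_new_thread()` should give back the values written before the thread was started. With only one reader and no writes after thread creation, no lock is used in this sketch.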
 
Sam

Juha said:
I have some data inside a struct instance in a namespace and I want to
initialize that data before spawning a thread which will read that data.
In order to make sure that the data will indeed get written before the
other thread reads it (i.e. to make sure the compiler doesn't perform
some funny optimizations which will mess things up) I make that struct
instance volatile.

The problem is: The struct has some std::strings inside it. I can't
assign anything to these strings (at least not with gcc). I just get an
error that there's no matching operator=. Without the 'volatile' there's
no error.

Is it simply impossible to assign anything to a volatile std::string?

I presume that you're using POSIX threads, but the same should apply to
equivalent APIs.

Generally, unless the compiler can prove to itself that a called function
will not access a specific object, the compiler has to emit instructions
that commit any modified content to memory, before invoking the function. I
don't see any possible way for the compiler to prove that to itself, in
pthread_create()'s case.

This leaves CPU cache-specific issues. It's a fairly safe bet
that, as part of doing its business, pthread_create() is going to force the
CPU, in some CPU-specific way, to flush out anything that's cached.

So that, pretty much, covers all bases.

In other, less obvious cases, there might be some compiler-specific ways to
control some of these low-level details. gcc, for example, has
_GLIBCXX_READ_MEM_BARRIER and _GLIBCXX_WRITE_MEM_BARRIER.

Finally, even if you're not modifying an object, you should use a mutex to
protect all access to the object. Just because you're invoking some function
that's logically defined as not making any changes to the object, that
doesn't mean that the function won't monkey around with the object's
internal contents. Some particular implementation of std::string::c_str(),
for example, might go through the trouble of explicitly adding a trailing
'\0', before returning the resulting string. So, if there's a possibility
that multiple threads might access the object simultaneously, even for
supposed read-only purposes, the access should be protected by a mutex. And
the act of invoking pthread_mutex_lock/unlock should also take care of
flushing out any modified object content.


 
James Kanze

Juha said:
I have some data inside a struct instance in a namespace and I
want to initialize that data before spawning a thread which
will read that data. In order to make sure that the data will
indeed get written before the other thread reads it (i.e. to
make sure the compiler doesn't perform some funny
optimizations which will mess things up) I make that struct
instance volatile.

Which will change exactly nothing. If the compiler is Posix
compliant (or Windows compliant), and claims that it can be used
for multithreaded code, then none of its optimizations will move
writes across pthread_create (or its Windows equivalent).
Posix forbids it. If the compiler isn't compliant, or doesn't
claim usability in a multithreaded environment (e.g. g++
pre-3.0), then you can't use it, volatile or not. (In g++
2.95.2, for example, the constructor of std::string modified
static variables---without a lock, of course.)

Juha said:
The problem is: The struct has some std::strings inside it. I
can't assign anything to these strings (at least not with
gcc). I just get an error that there's no matching operator=.
Without the 'volatile' there's no error.
Is it simply impossible to assign anything to a volatile
std::string?

Yup. If volatile were to mean anything (it doesn't, really, in
g++, at least on a Sparc or an Intel platform), then the code
generated in operator= would have to be different. In general,
volatile isn't intended to have any real meaning for complex
types, and in practice, at least for the compilers I have access
to (Sun CC, g++, and VC++), it doesn't have any real meaning for
basic types either.
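To illustrate why the assignment is rejected, here is a small compile-time sketch, assuming a C++11 compiler (for `<type_traits>`). The member functions of std::string, including its operator= overloads, are not volatile-qualified, so none of them can be called on a volatile std::string. The `Settings` struct is a hypothetical stand-in for the struct in the question.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the struct described in the question.
struct Settings {
    std::string name;
    int retries;
};

// Type of the expression `s.name` when `s` is a volatile Settings:
// an lvalue of type volatile std::string.
using VolatileName = decltype((std::declval<volatile Settings&>().name));

// A plain std::string accepts a string literal.
constexpr bool plain_string_assignable =
    std::is_assignable<std::string&, const char*>::value;

// A volatile std::string does not: no operator= is volatile-qualified.
constexpr bool volatile_string_assignable =
    std::is_assignable<VolatileName, const char*>::value;

// Built-in types have no such restriction.
constexpr bool volatile_int_assignable =
    std::is_assignable<volatile int&, int>::value;

static_assert(plain_string_assignable, "std::string = const char* compiles");
static_assert(!volatile_string_assignable,
              "volatile std::string = const char* does not compile");
static_assert(volatile_int_assignable, "volatile int = int compiles");
```

The same reasoning applies to every other member function: even size() is only const-qualified, so it cannot be called on a volatile std::string either.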
 
James Kanze

Sam said:
I presume that you're using POSIX threads, but the same should
apply to equivalent APIs.

Generally, unless the compiler can prove to itself that a
called function will not access a specific object, the
compiler has to emit instructions that commit any modified
content to memory, before invoking the function. I don't see
any possible way for the compiler to prove that to itself, in
pthread_create()'s case.

pthread_create has Posix specified semantics. The compiler
knows very well that it doesn't access user defined objects,
except those which it accesses expressly. On the other hand,
the compiler also knows that it guarantees memory
synchronization, so won't generate code which would negate those
guarantees.

Sam said:
This leaves CPU cache-specific issues. It's a
fairly safe bet that, as part of doing its business,
pthread_create() is going to force the CPU, in some
CPU-specific way, to flush out anything that's cached.

There's a lot more to it than just caches. But the Posix
specification guarantees full memory synchronization with
pthread_create, so the implementation will do whatever is
necessary.

Sam said:
So that, pretty much, covers all bases.

In other, less obvious cases, there might be some
compiler-specific ways to control some of these low-level
details. gcc, for example, has _GLIBCXX_READ_MEM_BARRIER and
_GLIBCXX_WRITE_MEM_BARRIER.

Finally, even if you're not modifying an object, you should
use a mutex to protect all access to the object. Just because
you're invoking some function that's logically defined as not
making any changes to the object, that doesn't mean that the
function won't monkey around with the object's internal
contents.

The Posix standard explicitly says that you can access an object
from multiple threads without a lock as long as no thread
modifies the object.

Sam said:
Some particular implementation of std::string::c_str(), for
example, might go through the trouble of explicitly adding a
trailing '\0', before returning the resulting string.

If the implementation does so in a way that won't work in a
multiple threaded environment, then that implementation is not
Posix conformant. (Note that the last time I looked, the
implementation of std::string in g++ was not Posix conformant,
and that more generally, g++ did not offer the Posix guarantees
for its library. The cases where there is really a problem,
however, are very, very few.)

Sam said:
So, if there's a possibility that multiple threads might
access the object simultaneously, even for supposed read-only
purposes, the access should be protected by a mutex. And the
act of invoking pthread_mutex_lock/unlock should also take
care of flushing out any modified object content.

That's the guarantee that g++ gives for its library. Posix (and
Windows, I think) requires more. (I'm fairly sure that the next
version of the C++ standard will, as well.)

Sam said:
application_pgp-signature_part

(You really should get rid of the above. It's not allowed under
the relevant RFCs.)
 
James Kanze

Juha said:
I have some data inside a struct instance in a namespace and
I want to initialize that data before spawning a thread
which will read that data.

Chris said:
For POSIX and Windows, as long as the data is prepared
_before_ you create a new thread, everything will be rendered
visible and the thread will be able to see a coherent view;
pthread_create(...) acts as a release-membar.

Note that he was talking about std::string with g++. The last
time I looked, g++ did NOT give the Posix guarantees for
std::string, and you did need a lock for all accesses, even if
none modified the string.

See http://gcc.gnu.org/onlinedocs/libstdc++/faq/index.html#5_6,
and in particular, the last paragraph:

All library objects are safe to use in a multithreaded
program as long as each thread carefully locks out
access by any other thread while it uses any object
visible to another thread, i.e., treat library objects
like any other shared resource. In general, this
requirement includes both read and write access to
objects; unless otherwise documented as safe, do not
assume that two threads may access a shared standard
library object at the same time.

Also see bug report 21334.

According to the g++ documentation and guarantees, Juha will
have to acquire a lock each time he accesses his strings,
regardless of the fact that he doesn't modify them. In
practice, given the actual implementation, he will not need the
locks if all of the accesses are through lvalues with const
types, or if he simply takes the precaution of calling the
non-const operator[] once before calling pthread_create.
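A minimal sketch of the conservative approach that the g++ documentation quoted above asks for: every access to the shared string, reads included, goes through a lock. It assumes a C++11 compiler and uses std::mutex in place of pthread_mutex_lock/unlock; the `SharedName` class and the `count_matching_reads` helper are hypothetical names for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper: the string is only ever touched with the mutex held.
class SharedName {
public:
    void set(const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        name_ = value;
    }
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return name_;  // the copy is made while the lock is held
    }

private:
    mutable std::mutex mutex_;  // mutable so that get() can stay const
    std::string name_;
};

// Starts `readers` threads that each read the string under the lock and
// returns how many of them saw `expected`.
int count_matching_reads(const SharedName& shared,
                         const std::string& expected, int readers) {
    assert(readers >= 0);
    std::vector<int> matched(readers, 0);  // one slot per thread, no sharing
    std::vector<std::thread> threads;
    for (int i = 0; i < readers; ++i) {
        threads.emplace_back([&shared, &expected, &matched, i] {
            matched[i] = (shared.get() == expected) ? 1 : 0;
        });
    }
    for (auto& t : threads) t.join();

    int total = 0;
    for (int m : matched) total += m;
    return total;
}
```

If the string is set once before the readers start, every reader should see the same value. Whether the lock is strictly needed for read-only access depends on the guarantees of the library in use, as discussed above.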
 
Juha Nieminen

James said:
and in practice, at least for the compilers I have access
to (Sun CC, g++, and VC++), it doesn't have any real meaning for
basic types either.

Really? That would be quite strange.

I can think of situations where 'volatile' might make a big
difference. For example something like:

    percentage = 0;
    for (int i = 1; i <= 100; ++i)
    {
        // perform some lengthy calculations here, which do not
        // use the 'percentage' variable for anything

        percentage = i;
    }

I assume a compiler could look at that and think "hmm, the
'percentage' variable is never used for anything inside the loop,
so I might as well take it out of the loop and simply assign it its
final value of 100 after the loop has ended". Of course if another
thread was reading that variable at regular intervals and e.g. showing
it to the user, things would be messed up. And that's exactly what
'volatile' is for: To tell the compiler that it cannot optimize those
assignments away.

If those compilers completely ignore the 'volatile' keyword, how do
they know when they can optimize things away and when not to?
 
James Kanze

Juha said:
Really? That would be quite strange.

That was my reaction as well. The major motivation for
introducing the keyword in C was to support memory mapped I/O.
As currently implemented on all of the Sparc compilers I have
access to, however, it doesn't do even that; I think this is the
case on Intel as well, but I'm less familiar with the Intel
architecture memory model.

Juha said:
I can think of situations where 'volatile' might make a big
difference. For example something like:

    percentage = 0;
    for (int i = 1; i <= 100; ++i)
    {
        // perform some lengthy calculations here, which do not
        // use the 'percentage' variable for anything
        percentage = i;
    }

I assume a compiler could look at that and think "hmm, the
'percentage' variable is never used for anything inside the
loop, so I might as well take it out of the loop and simply
assign it its final value of 100 after the loop has ended".

That's the case. Most compilers will generate the machine
instructions to store to percentage if it is declared volatile,
but that doesn't help much.

Juha said:
Of course if another thread was reading that variable at
regular intervals and e.g. showing it to the user, things would
be messed up. And that's exactly what 'volatile' is for: To
tell the compiler that it cannot optimize those assignments
away.

But most compilers still only generate a store instruction of
some kind, which on modern hardware doesn't guarantee that the
value can be read in another thread. On a Sparc, you need an
additional membar instruction, and on an Intel, at the very
least, the instruction must be prefixed with a lock prefix. The
compilers I've seen do neither.

And of course, at least under Posix and Windows, if you have the
above code in one thread, accessing percentage in any other
thread causes undefined behavior. Volatile or not.

Juha said:
If those compilers completely ignore the 'volatile' keyword,
how do they know when they can optimize things away and when
not to?

They don't completely ignore it, but they don't do anything
useful with it either. All it does is turn off some
optimizations at the compiler level; that doesn't do anything to
ensure that the values propagate or are refreshed at the
hardware level.
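For comparison, a sketch of the progress-counter example written with std::atomic instead of volatile. This assumes a C++11 compiler, which postdates this thread; std::atomic is the facility that version of the standard provides for a variable written in one thread and read in another. The function name `poll_progress` is hypothetical, and the busy-wait loop is only there to keep the sketch short.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> percentage{0};  // shared progress counter

// Runs the loop from the post in a worker thread while the calling thread
// polls the counter. Returns true if the polled values never went backwards.
bool poll_progress() {
    percentage.store(0);
    std::thread worker([] {
        for (int i = 1; i <= 100; ++i) {
            // ... lengthy calculation that does not use `percentage` ...
            percentage.store(i);  // atomic store, visible to other threads
        }
    });

    bool never_decreased = true;
    int last = 0;
    while (last < 100) {  // busy-wait polling, for illustration only
        const int now = percentage.load();
        if (now < last) never_decreased = false;
        last = now;
    }
    worker.join();
    assert(percentage.load() == 100);
    return never_decreased;
}
```

A real program would poll at intervals rather than spin, but the point is the same: the stores and loads are well defined across threads without a mutex and without volatile.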
 
Juha Nieminen

James said:
But most compilers still only generate a store instruction of
some kind, which on modern hardware doesn't guarantee that the
value can be read in another thread.

If I'm not completely mistaken, processor caches are designed so that
if one processor modifies a value in its own local cache, the cache
logic will tell the other cache logics of the other processors that that
section of memory has been invalidated by a change. The value is then
propagated to any other processor which needs to read that section of
memory (I don't know exactly how, but probably by flushing that portion
of memory in the local cache to the highest-level cache which is common
to all processors or, if there isn't such cache, to RAM, from which the
other caches can read the value).

I even remember reading somewhere that at least Intel assures that
each such modification is an atomic operation and thus no two processors
will ever modify the same memory location exactly at the same time,
causing a conflict. I might be mistaken about this, though. (No, I have
no idea how they manage to do it if it's true.)
 
