single producer, single consumer


James Kanze

On Oct 6, 6:23 pm, goodfella <[email protected]> wrote:
I have read the following Wikipedia article about cache
coherency:

and it seems that cache coherency guarantees the following
behavior on reads and writes of memory in the same location:
A read made by processor P1 to location X that follows a write
by another processor P2 to X must return the written value
made by P2 if no other writes to X made by any processor occur
between the two accesses. This condition defines the concept
of coherent view of memory. If processors can read the same
old value after the write made by P2, we can say that the
memory is incoherent.

The article you cite is rather vague: what does it mean by "the
behavior of reads and writes to the same memory location"? Most
modern processors use not only cache in the classical sense, but
also pipelines. And none of the processors I'm familiar with
(Intel and Sparc) guarantee any sort of consistency between
processors with regards to what is in the read and write
pipelines; that would defeat the whole purpose of the pipelines.
You need special instructions (fence or memory barriers) to
guarantee synchronization.
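
For concreteness, here is a minimal sketch of that fence idiom, written
with the C++11 atomics that later standardized it (this thread predates
C++11; the names payload and ready are illustrative only):

#include <atomic>

int payload = 0;                 // plain data, written before the flag
std::atomic<bool> ready(false);  // the synchronization flag

void writer()
{
    payload = 42;                                         // 1. write the data
    std::atomic_thread_fence(std::memory_order_release);  // 2. release fence
    ready.store(true, std::memory_order_relaxed);         // 3. set the flag
}

void reader()
{
    while (!ready.load(std::memory_order_relaxed))        // 1. see the flag
        ;                                                 //    (spin)
    std::atomic_thread_fence(std::memory_order_acquire);  // 2. acquire fence
    int v = payload;                                      // 3. guaranteed 42
    (void)v;
}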
The above statements about cache coherency are enough to show
that my code works on systems that guarantee cache coherency.

Not knowing exactly what you mean by cache coherency, I can't
say. The code you posted does not work on Intel or Sparc
architectures.
 

James Kanze

[...]
So in general, the writes made by the producer are not
guaranteed to propagate to the consumer, and writes made by
the consumer are not guaranteed to propagate to the producer.

They are guaranteed to propagate sooner or later, but not
necessarily in the same order they occur in the code. Thus, for
example, when you write the data, then update the pointer,
the pointer update may be seen by the consumer thread before the
data write.
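
The classic fix is to make the pointer update itself carry the
ordering. A minimal sketch with C++11 release/acquire operations (Node
and ptr are illustrative names, not code from this thread):

#include <atomic>
#include <cstdio>

struct Node { int payload; };

std::atomic<Node*> ptr(NULL);    // shared pointer, initially null

void producer_thread()
{
    Node* n = new Node;
    n->payload = 42;                          // write the data first...
    ptr.store(n, std::memory_order_release);  // ...then publish the pointer;
                                              // the release store keeps the
                                              // data write from being seen
                                              // after the pointer update
}

void consumer_thread()
{
    Node* n = ptr.load(std::memory_order_acquire);
    if (n != NULL)
        std::printf("%d\n", n->payload);      // guaranteed to print 42
}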
A simpler demonstration of this problem is an example of how
to use the volatile keyword.

Be aware that volatile doesn't really have a defined meaning,
and that most compilers don't do anything special with it, which
makes it worthless for communicating between threads (and at
least on a Sparc, for memory mapped IO---the reason it was
originally introduced into the language).
You have a variable which gets set to 1 by one thread, called
the writer, while another thread, called the reader, is in a
loop waiting for the variable to be equal to one. The use of
volatile ensures that the compiler will not optimize away
checking the value of the variable because the value of the
variable may be changed in ways unknown by the compiler. So
in this example, the write performed by the writer thread is
not guaranteed to propagate to the reader thread. Or in other
words, the reader thread will never exit because it will never
see the variable equal to 1.

It will exit sooner or later, but... Why was the reader thread
waiting? If it was waiting for some other data to be written,
1) volatile on the flag doesn't ensure that the compiler hasn't
moved the writes to the other data, and 2) even if the compiler
hasn't moved them, the other writes may show up later than the
write to the flag in the consumer thread.
 

REH

So in general, the writes made by the producer are not guaranteed to
propagate to the consumer, and writes made by the consumer are not
guaranteed to propagate to the producer.  A simpler demonstration of
this problem is an example of how to use the volatile keyword.  You
have a variable which gets set to 1 by one thread, called the writer,
while another thread, called the reader, is in a loop waiting for the
variable to be equal to one.  The use of volatile ensures that the
compiler will not optimize away checking the value of the variable
because the value of the variable may be changed in ways unknown by
the compiler.  So in this example, the write performed by the writer
thread is not guaranteed to propagate to the reader thread.  Or in
other words, the reader thread will never exit because it will never
see the variable equal to 1.

Volatile simply isn't good enough. Volatile is necessary for atomic
operations, but it does not provide strong enough guarantees. Assume
for a moment that your logic is correct: that your pointers are
accessed atomically and their values are either NULL or correct. There
are still things that can go wrong:

1) Because of instruction reordering, either by the compiler or by the
processor, the pointer could get updated before the data. The reading
thread could then read old data, garbage, a mix, etc. Now, because you
are updating the data before calling the function that updates the
pointer, the chances of this are slim, but not zero.

2) Even if the pointer is always correct, what about the data? How are
you guaranteeing that you are not reading old, cached data?

3) Even though your code "works" now, what happens months or years
down the road when you or your successor need to port it to a new
compiler, OS, processor, etc.? It can take a lot of time to track down
the subtle errors that will manifest themselves. The little time you
save by taking short-cuts will be completely overwhelmed by the time
spent debugging later. I hear this all the time: "but it will never
have to be ported." At work, I have yet to hear that claim and have it
turn out to be true.
Tools and hardware go end-of-life. Requirements change. You can't
predict the future.

4) Even if the threads will eventually "see" the pointer update, why
introduce this non-determinism? It is unacceptable in a real-time
system, and is simply a waste of the processor's time in any case.

REH
 

Joshua Maurice

So in general, the writes made by the producer are not guaranteed to
propagate to the consumer, and writes made by the consumer are not
guaranteed to propagate to the producer.  [...]

goodfella, please actually read C++ And The Perils Of Double Checked
Locking.
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
It explains a lot of this most clearly.

A lot of people in here are still describing the problems in terms of
reordering. This is a bad way of thinking about it. Let me again
emphasize what I wrote earlier: Two writes made by one thread may be
seen to happen in different orders by two reader threads, on the exact
same execution, on the exact same writes. One cannot correctly reason
about threading by examining the possible interleavings of
instructions. It does not work when discussing portable threading.

Some people, yourself included goodfella, have noted that without
proper synchronization, yes it may take a long time for a write to be
visible to other threads. However, it's also possible that the write
will \never\ be visible without proper synchronization.

Just to add to what others have said, correct use of volatile is only
one of the following three cases:
1- Memory Mapped IO, like writing device drivers.
2- Signal handlers.
3- setjmp/longjmp.
If your code uses volatile in a context other than one of those 3, your
use of volatile is superfluous and/or wrong (or not portable).

Volatile is not defined as a threading construct by the C standard,
the C++ standard, the POSIX standard, nor basically any standard apart
from some newer versions of MSVC. (I don't have an exhaustive list of
standards in front of me, so maybe I'm missing some. The point is,
almost none give any threading guarantees.) There is no guarantee that
there is a global order of volatile reads and writes. The meaning of
volatile itself is implementation defined. Its intent is to allow
MMIO, but an implementation is free to ignore the volatile keyword
altogether, as long as it's documented that this is the behavior.
Most compilers do not actually guarantee a global order on volatile
reads and writes across threads. !! Two different threads may see two
volatile writes done by a third happen in different orders, even if
the reads are also volatile !!
 

Michael Doubez

So in general, the writes made by the producer are not guaranteed to
propagate to the consumer, and writes made by the consumer are not
guaranteed to propagate to the producer.  [...]

goodfella, please actually read C++ And The Perils Of Double Checked
Locking. http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
It explains a lot of this most clearly.

A lot of people in here are still describing the problems in terms of
reordering. This is a bad way of thinking about it.
[snip]
If your code has volatile in a context of not one of those 3, your use
of volatile is superfluous and/or wrong (or not portable).
[snip]

It is not so much the reordering of the concurrent execution of code as
the reordering of the generated code, i.e. without memory barriers (or
some language support) the compiler is free to remove, add, or move
operations as an optimisation, and the current standard is silent on
the subject.

You might want to look up the link I posted and some of Hans Boehm
papers on the subject:
http://www.hpl.hp.com/personal/Hans_Boehm/

In particular, "Threads Cannot be Implemented as a Library" on the
correctness issues:
http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf
 

James Kanze

On Oct 8, 12:43 am, goodfella <[email protected]> wrote:

Just a couple of nits...

[...]
goodfella, please actually read C++ And The Perils Of Double
Checked
Locking. http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
It explains a lot of this most clearly.
A lot of people in here are still describing the problems in
terms of reordering.

The issue is related to reordering---to the writes occurring or
being seen in a different order than what you wrote.
This is a bad way of thinking about it.
Let me again emphasize what I wrote earlier: Two writes made
by one thread may be seen to happen in different orders by two
reader threads, on the exact same execution, on the exact same
writes. One cannot correctly reason about threading by
examining the possible interleavings of instructions. It does
not work when discussing portable threading.

Exactly. The two threads see the writes in different orders.
Reordering.
Some people, yourself included goodfella, have noted that
without proper synchronization, yes it may take a long time
for a write to be visible to other threads. However, it's also
possible that the write will \never\ be visible without proper
synchronization.

In theory, perhaps. The time it takes for the write to become
visible is unspecified, and formally unbounded, but in practice,
it will become visible "fairly quickly".
Just to add to what others have said, correct use of volatile
is only one of the following three cases:
1- Memory Mapped IO, like writing device drivers.
2- Signal handlers.
3- setjmp/longjmp.
If your code uses volatile in a context other than one of those 3,
your use of volatile is superfluous and/or wrong (or not
portable).

The latter two are specified by the standard, and since they do
only concern "memory" in a single thread, should work. For the
first (and possible other uses), see the documentation of your
implementation. The standard actually says very little with
regards to volatile, and the implementation of volatile on some
compilers (Sun CC and g++ on a Sparc, for example) does not
support memory mapped IO (which is arguably a violation of the
intent of the standard, but that's the way it is).
 

Joshua Maurice

On Oct 8, 12:43 am, goodfella <[email protected]> wrote:

Just a couple of nits...

    [...]
goodfella, please actually read C++ And The Perils Of Double
Checked
Locking. http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
It explains a lot of this most clearly.
A lot of people in here are still describing the problems in
terms of reordering.

The issue is related to reordering---to the writes occurring or
being seen in a different order than what you wrote.
This is a bad way of thinking about it.
Let me again emphasize what I wrote earlier: Two writes made
by one thread may be seen to happen in different orders by two
reader threads, on the exact same execution, on the exact same
writes. One cannot correctly reason about threading by
examining the possible interleavings of instructions. It does
not work when discussing portable threading.

Exactly.  The two threads see the writes in different orders.
Reordering.

One can think about some cases in terms of reordering, but one cannot
prove correctness in the general case by considering all possible
reorderings. Again, take this example:

//Initially:
x = 0
y = 0

//Thread 1
x = 1
y = 2

//Thread 2, 3, 4, 5
print x, y

If threads 1 through 5 are started at the same time, on a single
execution you might end up with all 4 possible combinations printed,
(0, 0), (1, 0), (0, 2), (1, 2). One cannot describe this in terms of
reorderings. There is no global ordering, no global interleaving, of
instructions which can produce the result of all 4 combinations
printed. However, in practice, this can happen. This is why I again
emphasize that it is inherently broken to think about portable
threading by considering possible interleavings and reorderings.
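
For what it's worth, here is that example as compilable code, assuming
C++11 relaxed atomics (two reader threads shown instead of four; the
point is the same):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x(0), y(0);

void writer()
{
    x.store(1, std::memory_order_relaxed);
    y.store(2, std::memory_order_relaxed);
}

void reader(const char* name)
{
    int a = x.load(std::memory_order_relaxed);
    int b = y.load(std::memory_order_relaxed);
    std::printf("%s: (%d, %d)\n", name, a, b);  // any of the 4 combinations
}

int main()
{
    std::thread t1(writer);
    std::thread t2(reader, "reader 1");
    std::thread t3(reader, "reader 2");
    t1.join(); t2.join(); t3.join();
}

Nothing forbids reader 1 printing (1, 0) while reader 2 prints (0, 2)
on the same run: with relaxed ordering, the readers need not agree on
any single order of the two writes.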
In theory, perhaps.  The time it takes for the write to become
visible is unspecified, and formally unbounded, but in practice,
it will become visible "fairly quickly".

The compiler could optimize away a load, and keep a variable in a
register, like in a busy waiting loop on a flag. I would not rely upon
"eventually". For sufficiently complex code, such that it must reload
it from memory, you are correct. However, "sufficiently complex" is
entirely dependent on the machine on which it's running. I don't feel
comfortable looking at a piece of code and saying "Yeah, that's
sufficiently complex to require spilling the register holding that
value."
 

REH

The compiler could optimize away a load, and keep a variable in a
register, like in a busy waiting loop on a flag. I would not rely upon
"eventually". For sufficiently complex code, such that it must reload
it from memory, you are correct. However, "sufficiently complex" is
entirely dependent on the machine on which it's running. I don't feel
comfortable looking at a piece of code and saying "Yeah, that's
sufficiently complex to require spilling the register holding that
value."

What I also go back and forth on is: should the target variable of an
atomic operation be volatile? Every atomic API I've ever seen uses
volatile. For example:

bool atomic_flag_test_and_set(volatile atomic_flag*);

So, is it enough that the formal argument is volatile, or must the
actual argument be volatile also?
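
At least at the language level, the actual argument need not be
volatile: a pointer to non-volatile converts implicitly to a pointer to
volatile. A sketch (the function body here is a stand-in, not a real
atomic implementation):

struct atomic_flag { int bits; };

bool atomic_flag_test_and_set(volatile atomic_flag* f)
{
    int old = f->bits;  // stand-in body only; a real version would use a
    f->bits = 1;        // hardware atomic instruction, not two plain accesses
    return old != 0;
}

int main()
{
    atomic_flag plain = { 0 };            // object not declared volatile
    volatile atomic_flag marked = { 0 };  // object declared volatile

    atomic_flag_test_and_set(&plain);     // OK: atomic_flag* converts
                                          // implicitly to volatile atomic_flag*
    atomic_flag_test_and_set(&marked);    // OK: types match exactly
}

Whether the object itself must be declared volatile for the operation's
guarantees to hold is a separate question, one the API's documentation
has to answer.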

REH
 

goodfella

Two writes by one thread being visible to another thread in a
different order is not a problem with my code, because the conditions
in the push and pop functions at any given time rely on one shared
memory location and each thread does something different depending on
the value of the memory location. Also, both threads only write to
one shared memory location at any given time. If the producer thread
writes to memory location X and then writes to memory location Y, and
the consumer thread sees those writes in the opposite order that will
not cause a problem because if the consumer is reading memory location
Y he will do his work immediately, and if the consumer is reading
memory location X he will eventually see the write and do his work.
Likewise for the producer.

However, as some of you have pointed out, I may not have the guarantee
that writes by one processor to a memory location will eventually be
seen by another processor reading that location. In which case my
code will not work.  This problem seems unlikely to me, though, because
I don't understand why a particular architecture that supports multiple
processors or cores would fail to provide such a guarantee.  The only
thing I can think of, and I'm just throwing this out here because I'm
hoping someone could give me a good example, is if you know that a
particular thread is only going to run on one core and will only need
access to memory it alone will modify, then you don't need to suffer
the overhead of ensuring cache coherency with the other processors or
cores. In addition, since C++ does not have a notion of multiple
threads, I'm guessing that the compiler is at liberty to not use the
cache coherency mechanisms provided by the architecture? If someone
could clarify how cache coherency is guaranteed, i.e. by the compiler,
threading library or kernel, I would appreciate it.

I also agree with REH that I need to have a way to make sure the
pointers in the data array point to completely initialized data.
 

goodfella

The following paragraphs enumerate the possible situations and related
cases that could occur with my code. I make an argument in regards to
the validity of each situation and case. To sum things up, the only
problem is the possibility of the consumer seeing a null value in the
same memory location where the producer sees a non null value. The
cause of this problem is the cache coherency issues that have been
mentioned in this thread. Of course I'm assuming that reads and
writes to the memory locations mentioned are atomic. See below for
more detail.

There are two possible situations to consider when the consumer and
producer are reading from the same memory location. The first
situation is both threads see different values. So, if the consumer
sees a non null value at location X and the producer sees a null value
at location X, both threads could write to the memory location which
would be a problem. The opposite case where the consumer sees a null
value at location X and the producer sees a non null value at location
X would result in both threads doing no work which could lead to a
process that never exits. The situation where they see the same value
at a memory location works correctly. Either the consumer is going to
write to the memory location or the producer is going to write to the
memory location.

If both threads see different values at the same memory location such
that the consumer sees a non null pointer and the producer sees a null
pointer, the consumer would have to be able to read what the producer
wrote to the memory location before the producer can read that value
and the producer would have to be able to read what the consumer wrote
to the memory location before the consumer can read that value. This
is because only the producer writes a non null value to a memory
location and only the consumer writes a null value to a memory
location.  Consequently, this case would imply that a write followed by
a read from the same processor to the same location need not produce
the value that was written.  If a processor could not write to a memory
location and then read the value it just wrote, given that no other
processor has written to the same memory location between the write and
the read, you would not be able to write a for loop whose terminating
condition is based on the value of a counter that is incremented at
each iteration.  Therefore, this situation is invalid.
This of course is assuming that the initial configuration of the data
array is valid, i.e. the values of the elements are null pointers.
I'm assuming that this could be accomplished by initializing the
elements in a mutex lock and unlock block before starting the producer
and consumer threads.
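
A mutex is actually more than is needed for that: thread creation
itself synchronizes memory, so initialization done before the threads
are started is visible to them. A minimal sketch using std::thread
(pthread_create gives the same guarantee; all names are illustrative):

#include <cstddef>
#include <thread>

static const std::size_t N = 16;
int* data[N];                       // the shared array of pointers

void producer_main() { /* ... */ }  // stub thread bodies, illustrative only
void consumer_main() { /* ... */ }

int main()
{
    for (std::size_t i = 0; i != N; ++i)
        data[i] = 0;                // initialize while still single-threaded

    // Constructing a std::thread synchronizes-with the start of the new
    // thread, so both threads see the null slots without any locking.
    std::thread p(producer_main);
    std::thread c(consumer_main);
    p.join();
    c.join();
}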

The case where the consumer and producer read different values at the
same memory location such that the consumer sees a null pointer and
the producer sees a non null pointer corresponds to the cache
coherency issues mentioned in this thread. The writes by the consumer
may not be visible by the producer and vice versa. So if I cannot
guarantee that the producer will eventually read what the consumer
wrote to a memory location and vice versa, my code will eventually get
into a state where the producer always sees a non null value at the
memory location while the consumer always sees a null value at the
memory location. Consequently, the process encompassing the two
threads would never end.

Other possible situations involve the producer and consumer reading
from different memory locations. All these situations work correctly
because the fact that they are working on different memory locations
provides the necessary exclusiveness.

Finally, I would like to say thanks to everyone for the posts.  It
has been a good learning experience and I'm looking forward to going
over the code supplied as well as the articles mentioned.  I have
already read the "C++ And The Perils Of Double Checked Locking"
article.  Thank you, Joshua.
 

goodfella

With regard to volatile: if compilers are at liberty to ignore it, as
some of you have posted, then any use of volatile will lead to
non-portable code, because volatile might do what's expected on one
compiler and not on another.
 

Ian Collins

goodfella said:
With regard to volatile: if compilers are at liberty to ignore it, as
some of you have posted, then any use of volatile will lead to
non-portable code, because volatile might do what's expected on one
compiler and not on another.

Compilers are not at liberty to ignore volatile. As long as the
programmer does not make assumptions beyond the standard, there won't be
any portability problems.
 

REH

However, as some of you have pointed out, I may not have the guarantee
that writes by one processor to a memory location will eventually be
seen by another processor reading that location.  In which case my
code will not work.  This problem seems unlikely to me, though, because
I don't understand why a particular architecture that supports multiple
processors or cores would fail to provide such a guarantee.

Not all architectures that have more than one processor are of the
homogeneous "multi-core" variety. There are many variations. In multi-
core, you have a group of identical processors that share memory and
act as a "pool" of processors. For the most part, you can take
advantage of the additional processors without any extra work on your
part (i.e., the OS will schedule the work across the processors, load-
balancing the work). Other systems have homogeneous processors that may
or may not share memory (i.e., each has its own) and have to be
independently scheduled by the programmer. Some even have a number of
"hardware threads" built into the processor. Still others have various
types of processors, buses, etc. The variations are endless. If you
want to see just how varied they can get, just look at the differences
in architecture amongst the popular gaming consoles (XBox, PSP, etc.).
Their individual strategies for multi-threading are very different.

The systems I write code for have heterogeneous, independent
processors. Actually they are completely independent computers called
SBCs (single board computers) that communicate over a VME bus. Between
the CPU and the bus is a PCI bus. For me to use atomic operations for
implementing a test-and-set across shared memory, for example, I need
to ensure that the data is coherent and atomic across all the
processors, their caches, and both buses. There is absolutely no way
that the architecture can provide any guarantees. The processors don't
even know about each other; there is no way they will synchronize
their caches for me. It is completely within the realm of possibility
(because I've seen it) that a write done by one processor will never
be seen by the other, without taking proper steps to ensure it.

REH
 

goodfella

Not all architectures that have more than one processor are of the
homogeneous "multi-core" variety. There are many variations.
[...]

REH

Yeah, I can see where that could be a problem.  I also see a problem
with my code where the compiler can reorder it in such a way that it is
still correct for a single thread, but will not work with multiple
threads.  For example, if the compiler were to rewrite the push
function as follows, I would be screwed and not even know it:

bool push(T* el)
{
    T* old = data[write_idx];
    data[write_idx] = el;

    if (old == NULL)
    {
        ++write_idx;
        return true;
    }
    else
    {
        data[write_idx] = old;
        return false;
    }
}

The code above has the same effect as my original post, but will not
work with a single producer and a single consumer.  The compiler does
not know that this kind of change is going to break another thread
because the compiler, from what I understand, has no notion of
threads.
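
For reference, here is a sketch of how the push can be written so that
neither the compiler nor the hardware may apply such a rewrite, using
C++11 atomics. The names data and write_idx mirror this thread's code,
but the class itself is a reconstruction, not the original:

#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
struct slot_queue
{
    std::atomic<T*> data[N];  // NULL = slot empty, non-NULL = slot full
    std::size_t write_idx;    // touched only by the single producer

    slot_queue() : write_idx(0)
    {
        for (std::size_t i = 0; i != N; ++i)
            data[i].store(NULL, std::memory_order_relaxed);
    }

    bool push(T* el)
    {
        // Acquire load: has the consumer drained this slot yet?
        if (data[write_idx].load(std::memory_order_acquire) != NULL)
            return false;
        // Release store: everything written through el is published
        // no later than the pointer itself.
        data[write_idx].store(el, std::memory_order_release);
        write_idx = (write_idx + 1) % N;
        return true;
    }
    // pop is symmetric: acquire-load the slot, use the element,
    // release-store NULL back into it.
};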
 

James Kanze

[...]
One can think about some cases in terms of reordering, but one
cannot prove correctness in the general case by considering
all possible reorderings. Again, take this example:
//Initially:
x = 0
y = 0
//Thread 1
x = 1
y = 2
//Thread 2, 3, 4, 5
print x, y
If threads 1 through 5 are started at the same time, on a
single execution you might end up with all 4 possible
combinations printed, (0, 0), (1, 0), (0, 2), (1, 2). One
cannot describe this in terms of reorderings.

But that's exactly what it is. The different threads see the
writes in a different order (reordered) from one another.
There is no global ordering,

No. Thread 1 executes the writes in a specific order, say first
x, then y. The other threads see these writes in a different
order---reordered, with respect to the thread which actually did
the writing.
no global interleaving, of instructions which can produce the
result of all 4 combinations printed.

Who said anything about instructions?  It's the order in which the
different threads see the effects that is reordered, not the execution
order of the instructions (which is, obviously, invisible to
the other threads).
However, in practice, this can happen. This is why I again
emphasize that it is inherently broken to think about
portable threading by considering possible interleavings and
reorderings.

The compiler could optimize away a load, and keep a variable in a
register, like in a busy waiting loop on a flag.

We're discussing a level below what the compiler does. You can
verify manually what the compiler does, and once it has done it,
the order of the instructions won't change. Things like cache
handling and pipelining, on the other hand, are run-time
phenomena, and can easily result in different behavior each time
you execute the sequence.
I would not rely upon "eventually".

If I execute a store instruction, a write will be inserted into
the write pipeline. Sooner or later, it will come out at the
other end, and global memory will be written (assuming cache
write through, which is fairly universal).
For sufficiently complex code, such that it must reload it
from memory, you are correct. However, "sufficiently complex"
is entirely dependent on the machine on which it's running. I
don't feel comfortable looking at a piece of code and saying
"Yeah, that's sufficiently complex to require spilling the
register holding that value."

The write will always eventually end up in global memory. I'm
less sure that a read will always end up going to global memory;
I can easily imagine a processor keeping the results of a read
around, and in some exotic scenarios, never even going to cache.
(I'm talking here about the case where there are multiple
executions of a load instruction, not what the compiler might
do.)
 

James Kanze

With regard to volatile: if compilers are at liberty to ignore
it, as some of you have posted, then any use of volatile will
lead to non-portable code, because volatile might do what's
expected on one compiler and not on another.

With one or two very limited exceptions (e.g. when using
setjmp/longjmp), the effects of volatile are implementation
defined. Almost by definition, code using volatile is
non-portable.
 

James Kanze

What I also go back and forth on is: should the target
variable of an atomic operation be volatile? Every atomic API
I've ever seen uses volatile. For example:
bool atomic_flag_test_and_set(volatile atomic_flag*);
So, is it enough that the formal argument is volatile, or must
the actual argument be volatile also?

What does the documentation of the interface say?
 

Pallav singh

What does the documentation of the interface say?

The following could be variants of this problem:

single producer ... single consumer

1. single-thread producer ... single-thread consumer
2. single-process producer ... single-process consumer
3. producer and consumer running on different terminals

Thanks
Pallav
 

James Kanze

Compilers are not at liberty to ignore volatile. As long as
the programmer does not make assumptions beyond the standard,
there won't be any portability problems.

According to the standard, accessing through a volatile lvalue is
"observable behavior". The standard also says, however, that
what constitutes an access is implementation defined. Which
means that in practice, the only assumption you can make based
on the standard is that what constitutes an access will be
documented. Which is nice, but... it means that just about
anything you do which involves volatile is implementation
defined, and thus non-portable. (There's also the fact that
I've yet to find this documentation---required by the
standard---for any compiler, which means per se that none of the
compilers I'm familiar with are standard conformant.)

In practice, the general rule with regards to volatile is that
the compiler turns off all optimization with regards to that one
access (and doesn't change any other accesses). Which is about
as useful on a modern machine as simply saying that volatile is
a no-op.
 

Joshua Maurice

[...]
One can think about some cases in terms of reordering, but one
cannot prove correctness in the general case by considering
all possible reorderings. Again, take this example:
//Initially:
x = 0
y = 0
//Thread 1
x = 1
y = 2
//Thread 2, 3, 4, 5
print x, y
If threads 1 through 5 are started at the same time, on a
single execution you might end up with all 4 possible
combinations printed, (0, 0), (1, 0), (0, 2), (1, 2). One
cannot describe this in terms of reorderings.

But that's exactly what it is.  The different threads see the
writes in a different order (reordered) from one another.
There is no global ordering,

No.  Thread 1 executes the writes in a specific order, say first
x, then y.  The other threads see these writes in a different
order---reordered, with respect to the thread which actually did
the writing.

Specifically, my example was thread 1 does write X then write Y.
Thread 2 may see write X without seeing write Y, and thread 3 may see
write Y without seeing write X.
Who said anything about instructions?  It's the order in which the
different threads see the effects that is reordered, not the execution
order of the instructions (which is, obviously, invisible to
the other threads).

Useless semantic quibbling, though I shall try to be more precise in
the future.
We're discussing a level below what the compiler does.  You can
verify manually what the compiler does, and once it has done it,
the order of the instructions won't change.  Things like cache
handling and pipelining, on the other hand, are run-time
phenomena, and can easily result in different behavior each time
you execute the sequence.

To artificially limit the discussion to the "compiler", and not the
rest of the implementation, is... "un-good".

To repeat what you said earlier in your post:
But that's exactly what it is. The different threads see the
writes in a different order (reordered) from one another.

We're partially having an argument over semantics, over definitions,
and I do ever so hate such arguments. However, I will stick by my
claim on this one: I strongly dislike using the terms "order of writes
and reads" and "reordering of writes and reads" as you use them because it
suggests that one can correctly reason about threading by examining
the possible interleavings of reads and writes, and this is simply not
true for POSIX implementations, win32 threading, Java threading, and
all other threading known to me (which honestly isn't much). There are
cases where the algorithm is correct over all possible reorderings of
reads and writes, but not when you remove the assumption that writes
done by some thread will be visible to other threads in some unique,
specific order. Reader thread A may see some order, and reader thread
B may see an entirely different order. Sure, some hardware may make
that assumption true, but I won't rely on that, at least not for
supposedly portable code.

I reject so strongly because almost all of the threading mistakes I
see by my colleagues are because they are using this naive
understanding of threading. They attempt to look at the possible
interleavings of reads and writes in the absence of synchronization,
and then they claim that the algorithm is correct. I've had to fix
such code many times, and I've had to fix such bad understanding from
otherwise good programmers many times as well.

Unfortunately, this probably goes back to how they were taught threads
in college, or in their book. I'm sure it starts off with a simple
description of a machine, without a pipeline or anything crazy,
"faithfully" executing one instruction at a time, and then they just
add two executors to explain threading. This is how many people
understand threading in C++ and Java, and it is inherently broken.
When talking of pipelines, caching, hardware reordering, etc., they
attempt to account for it in their simple model, considering more
reorderings, but at the end of the day it still doesn't work. Talking
of reorderings only perpetuates this misunderstanding. One must
understand the guarantees made by your primitives, and POSIX, win32,
and Java guarantees are not made in terms of reorderings as you use
the term.
If I execute a store instruction, a write will be inserted into
the write pipeline.  Sooner or later, it will come out at the
other end, and global memory will be written (assuming cache
write through, which is fairly universal).


The write will always eventually end up in global memory.  I'm
less sure that a read will always end up going to global memory;
I can easily imagine a processor keeping the results of a read
around, and in some exotic scenarios, never even going to cache.
(I'm talking here about the case where there are multiple
executions of a load instruction, not what the compiler might
do.)

Except when you're on different processors, as mentioned by REH
in-thread.

Or except when you're in a tight busy loop:
while (!flag)
    thread_yield();
and the compiler or the hardware decides to not re-load the flag from
memory and keeps it in a register. Also, it doesn't have to be this
simple. For something slightly more complex, the compiler may optimize
away the memory loads and stores, and keeps the value in a register or
on the stack. I frankly do not want to decide what is "sufficiently
complex" that the optimizer will not break it by examining C++ source
code.
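
Written against C++11 atomics (which postdate this thread), the same
loop becomes safe, because an atomic load is a visible side effect that
cannot be hoisted into a register (names illustrative):

#include <atomic>
#include <thread>

std::atomic<bool> flag(false);

void waiter()
{
    // Each iteration performs a real load: the compiler may not cache
    // an atomic in a register across iterations the way it may a plain
    // bool.
    while (!flag.load(std::memory_order_acquire))
        std::this_thread::yield();
    // Anything written before a release-store to flag is visible here.
}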

Or your compiler decides that the write is dead code through some
(possibly "broken") analysis which doesn't take into account other
threads.

Or a variety of other situations which I don't know offhand. I will
program to the spec and not to any particular implementation, lest the
implementation changes, "breaking" my code.
 
