single producer, single consumer

goodfella

Here is some code I came up with for a lock-free single producer,
single consumer algorithm. I have done some tests using GCC with -O3
optimizations and all seems well. I would like some constructive
criticism of the code, as I am not yet an expert on concurrency.

I'm assuming that reading and writing the value of a pointer is
atomic. I have read, though, that on some architectures that is a bad
assumption, so to ensure correctness across all platforms I may have
to wait for the C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.


bool push_element(T* el);
T* pop_element();

const size_t Size = 1024;
T* volatile data[Size];

bool push_element(T* el)
{
    static int write_idx = 0;

    if( !data[write_idx] )           // slot is free: publish the element
    {
        data[write_idx] = el;
        write_idx = (write_idx + 1) % Size;
        return true;
    }
    else
        return false;                // queue full at this slot
}

T* pop_element()
{
    static int read_idx = 0;

    T* element;

    if( (element = data[read_idx]) ) // slot holds an element: take it
    {
        data[read_idx] = 0;          // mark the slot free again
        read_idx = (read_idx + 1) % Size;
        return element;
    }
    else
        return 0;                    // queue empty at this slot
}
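
For reference, here is roughly how I intend it to be used; the thread
loops below are just an illustration, not part of the queue itself:

    // Producer thread: spin until the current slot frees up, then publish.
    void producer_loop(T* item)
    {
        while( !push_element(item) )
            ;  // slot still holds an unconsumed element
    }

    // Consumer thread: spin until an element appears, then use it.
    void consumer_loop()
    {
        T* item;
        while( !(item = pop_element()) )
            ;  // nothing published yet
        // ... use *item ...
    }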
 

REH

Here is some code I came up with for a lock-free single producer,
single consumer algorithm. I have done some tests using GCC with -O3
optimizations and all seems well. I would like some constructive
criticism of the code, as I am not yet an expert on concurrency.

I'm assuming that reading and writing the value of a pointer is
atomic. I have read, though, that on some architectures that is a bad
assumption, so to ensure correctness across all platforms I may have
to wait for the C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.

bool push_element(T* el);
T* pop_element();

const size_t Size = 1024;
T* volatile data[Size];

bool push_element(T* el)
{
    static int write_idx = 0;

    if( !data[write_idx] )
    {
        data[write_idx] = el;
        write_idx = (write_idx + 1) % Size;
        return true;
    }
    else
        return false;

}

T* pop_element()
{
    static int read_idx = 0;

    T* element;

    if( (element = data[read_idx]) )
    {
        data[read_idx] = 0;
        read_idx = (read_idx + 1) % Size;
        return element;
    }
    else
        return 0;

}

You have a few potential issues:
1) Even if pointer accesses are atomic on your machine, your array
element accesses most certainly aren't.
2) If you run this on a multi-processor system, you will have cache-
coherency issues.

You really need to use proper atomic operations. Most systems provide
these. GCC 4.1 or higher has atomic built-ins. Windows has atomic
APIs. Most processors provide them (e.g., Intel has lock+cmpxchg).
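
For example, a sketch with the GCC built-ins (illustrative only; which
built-in you want depends on your platform):

    // __sync_bool_compare_and_swap atomically stores el into the slot
    // only if the slot is still null, and implies a full memory barrier.
    if( __sync_bool_compare_and_swap(&data[write_idx], (T*)0, el) )
    {
        write_idx = (write_idx + 1) % Size;
        return true;
    }
    return false;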

REH
 

REH

Here is some code I came up with for a lock-free single producer,
single consumer algorithm. I have done some tests using GCC with -O3
optimizations and all seems well. I would like some constructive
criticism of the code, as I am not yet an expert on concurrency.

I'm assuming that reading and writing the value of a pointer is
atomic. I have read, though, that on some architectures that is a bad
assumption, so to ensure correctness across all platforms I may have
to wait for the C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.

Incidentally, I've written a lock-free multi-read/multi-write queue
for multi-core machines that runs on both Windows and Linux, if you
need an example. It probably does more than you want/need, but you
could strip it down.

REH
 

goodfella

Incidentally, I've written a lock-free multi-read/multi-write queue
for multi-core machines that runs on both Windows and Linux, if you
need an example. It probably does more than you want/need, but you
could strip it down.

REH

REH, thanks for your reply, and yes, I would be interested in seeing
your lock-free multi-read/multi-write queue. However, I do not see
how the array element access would not be atomic. From my
understanding, the following line of code:

element = data[read_idx]

is converted to something like:

T** temp = data + read_idx;
element = *temp;

The first line might not be atomic, but if reading the value of a
pointer is atomic, my code is still valid because it's the second line
that counts. Neither data nor read_idx is going to be updated by
another thread. data is never pointed at another array of T* and
read_idx is only updated by the consumer thread.

Also, could you elaborate more on your second point? Thanks.
 

REH

REH, thanks for your reply, and yes, I would be interested in seeing
your lock-free multi-read/multi-write queue. However, I do not see
how the array element access would not be atomic. From my
understanding, the following line of code:

element = data[read_idx]

is converted to something like:

T** temp = data + read_idx;
element = *temp;

The first line might not be atomic, but if reading the value of a
pointer is atomic, my code is still valid because it's the second line
that counts. Neither data nor read_idx is going to be updated by
another thread. data is never pointed at another array of T* and
read_idx is only updated by the consumer thread.

I glossed over the part where you said you had single readers and
writers. In that limited case, and if you can guarantee that your
pointer accesses are atomic, I think you are OK.

Also, could you elaborate more on your second point? Thanks.

If your code is running on a system with more than one processor
(e.g., a multi-core processor), each processor will have its own
cache. If one thread updates a pointer, the other processor(s) may not
see it, because they are accessing a value stored in their own cache.
One of the things an atomic operation needs to do is ensure the value
it is working with is the most recent one, regardless of any caches
that may lie between the processor and memory. You normally don't give
this a second thought in multi-threaded code because your
synchronization objects (e.g., mutexes, semaphores, etc.) do this for
you. They also ensure that optimizations such as code reordering do
not change the logic: code that must occur before or after the atomic
operation must not be moved across that operation. You cannot provide
these guarantees in normal C++ (yet).
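
As a sketch of where the barriers would go in your push (using GCC's
full-barrier built-in; the placement is the important part):

    bool push_element(T* el)
    {
        static int write_idx = 0;

        if( !data[write_idx] )
        {
            __sync_synchronize();   // everything already written to *el
            data[write_idx] = el;   // becomes visible before the publish
            write_idx = (write_idx + 1) % Size;
            return true;
        }
        return false;
    }

The consumer needs the mirror image: a barrier after it loads the
pointer and before it dereferences it.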

REH
 

REH

REH, thanks for your replay, and yes I would be interested in seeing
your lock-free multi-read/multi-write queue.

I extracted the relevant code and wrote a README file to explain all
the pieces. I can email it to you tonight when I get home. Is the
email you have above the correct one to use?

REH
 

Joshua Maurice

Here is some code I came up with for a lock-free single producer,
single consumer algorithm. I have done some tests using GCC with -O3
optimizations and all seems well. I would like some constructive
criticism of the code, as I am not yet an expert on concurrency.

I'm assuming that reading and writing the value of a pointer is
atomic. I have read, though, that on some architectures that is a bad
assumption, so to ensure correctness across all platforms I may have
to wait for the C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.
[Snip code]

As already mentioned, multi-processor systems will probably not run
your code as you intended because of cache coherency. I'd strongly
suggest looking it up and getting some good references on threading.
The short version is that BOTH the reader and the writer threads must
execute some kind of synchronization primitive to have any guarantee
WHATSOEVER on the order that writes become visible to another thread
(where a synchronization primitive might be a full blown lock, acquire
and release memory barriers, read and write memory barriers, etc.).
Ex:

#include <string>
#include <fstream>
using namespace std;

//insert threading library
template <typename callable_t>
void startThread(callable_t callable);

int x = 0;
int y = 0;

struct foo
{   foo(string const& f) : filename(f) {}
    string filename;
    void operator() ()
    {   ofstream fout(filename.c_str());
        fout << x << " " << y << endl;
    }
};

int main()
{   startThread(foo("a"));
    startThread(foo("b"));
    startThread(foo("c"));
    startThread(foo("d"));
    x = 1;
    y = 2;
}

This program, on a *single* execution, can produce 4 files with the
contents (though highly unlikely for this contrived example):
    0 0
    0 2
    1 0
    1 2

This is the cache coherency problem. Let me emphasize again: two
different threads may see the same writes occur in different orders.
On modern machines, there is no single view of main memory. Everyone
has their own local copy because of their own independent caches. One
thread may execute the instructions "store to x" then "store to y",
but another thread may see them happen in the opposite order.

Synchronization instructions make the view consistent when used
properly. Atomicity alone is insufficient: the pointer might have been
updated, but what it points to has not. See "C++ and the Perils of
Double-Checked Locking"
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
for an excellent description of modern threading and common pitfalls.
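
A sketch of that hazard, reusing the names from your queue (use() is
just a placeholder):

    // Producer:
    T* p = new T;        // the stores that initialize *p ...
    data[i] = p;         // ... may become visible after this store does

    // Consumer:
    T* q = data[i];
    if (q)
        use(*q);         // may observe a partially-initialized T

Without barriers, nothing orders the stores to *p before the store of
p itself, as seen from the other thread.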

This is of course ignoring the other two big "culprits" of making non-
conformant code do what the user does not expect: compiler
optimizations and single core pipeline reorderings / optimizations.

Also, even on a single core single processor machine, you might still
hit related issues due to compiler optimizations and hardware pipeline
optimizations, especially with signal handlers.
 

goodfella

single consumer algorithm. I have done some tests using GCC with -O3
optimizations and all seems well. I would like some constructive
criticism of the code, as I am not yet an expert on concurrency.
I'm assuming that reading and writing the value of a pointer is
atomic. I have read, though, that on some architectures that is a bad
assumption, so to ensure correctness across all platforms I may have
to wait for the C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.

[Snip code]

As already mentioned, multi-processor systems will probably not run
your code as you intended because of cache coherency. I'd strongly
suggest looking it up and getting some good references on threading.
The short version is that BOTH the reader and the writer threads must
execute some kind of synchronization primitive to have any guarantee
WHATSOEVER on the order that writes become visible to another thread
(where a synchronization primitive might be a full blown lock, acquire
and release memory barriers, read and write memory barriers, etc.).
Ex:

#include <string>
#include <fstream>
using namespace std;

//insert threading library
template <typename callable_t>
void startThread(callable_t callable);

int x = 0;
int y = 0;

struct foo
{   foo(string const& f) : filename(f) {}
    string filename;
    void operator() ()
    {   ofstream fout(filename.c_str());
        fout << x << " " << y << endl;
    }

};

int main()
{   startThread(foo("a"));
    startThread(foo("b"));
    startThread(foo("c"));
    startThread(foo("d"));
    x = 1;
    y = 2;

}

This program, on a *single* execution, can produce 4 files with the
contents (though highly unlikely for this contrived example):
    0 0
    0 2
    1 0
    1 2

This is the cache coherency problem. Let me emphasize again: two
different threads may see the same writes occur in different orders.
On modern machines, there is no single view of main memory. Everyone
has their own local copy because of their own independent caches. One
thread may execute the instructions "store to x" then "store to y",
but another thread may see them happen in the opposite order.

Synchronization instructions make the view consistent when used
properly. Atomicity alone is insufficient: the pointer might have been
updated, but what it points to has not. See "C++ and the Perils of
Double-Checked Locking"
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
for an excellent description of modern threading and common pitfalls.

This is of course ignoring the other two big "culprits" of making non-
conformant code do what the user does not expect: compiler
optimizations and single core pipeline reorderings / optimizations.

Also, even on a single core single processor machine, you might still
hit related issues due to compiler optimizations and hardware pipeline
optimizations, especially with signal handlers.

The machine I'm testing this on has a dual core processor, and I have
yet to hit a problem like that. I would expect the process to hang
because the producer would constantly be getting a false value
returned from the push function and the consumer would constantly be
getting a null pointer from the pop function, due to the fact that the
pointer value would be stored in their respective caches and a write
to the pointer in one cache would not propagate to the pointer value
in the other cache. Are there any guarantees in regard to cache
coherency on certain platforms?
 

Joshua Maurice

[Snip code]
As already mentioned, multi-processor systems will probably not run
your code as you intended because of cache coherency. I'd strongly
suggest looking it up and getting some good references on threading.
The short version is that BOTH the reader and the writer threads must
execute some kind of synchronization primitive to have any guarantee
WHATSOEVER on the order that writes become visible to another thread
(where a synchronization primitive might be a full blown lock, acquire
and release memory barriers, read and write memory barriers, etc.).
Ex:
#include <string>
#include <fstream>
using namespace std;
//insert threading library
template <typename callable_t>
void startThread(callable_t callable);
int x = 0;
int y = 0;
struct foo
{   foo(string const& f) : filename(f) {}
    string filename;
    void operator() ()
    {   ofstream fout(filename.c_str());
        fout << x << " " << y << endl;
    }
};

int main()
{   startThread(foo("a"));
    startThread(foo("b"));
    startThread(foo("c"));
    startThread(foo("d"));
    x = 1;
    y = 2;
}

This program, on a *single* execution, can produce 4 files with the
contents (though highly unlikely for this contrived example):
    0 0
    0 2
    1 0
    1 2
This is the cache coherency problem. Let me emphasize again: two
different threads may see the same writes occur in different orders.
On modern machines, there is no single view of main memory. Everyone
has their own local copy because of their own independent caches. One
thread may execute the instructions "store to x" then "store to y",
but another thread may see them happen in the opposite order.
Synchronization instructions make the view consistent when used
properly. Atomicity alone is insufficient: the pointer might have been
updated, but what it points to has not. See "C++ and the Perils of
Double-Checked Locking"
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
for an excellent description of modern threading and common pitfalls.
This is of course ignoring the other two big "culprits" of making non-
conformant code do what the user does not expect: compiler
optimizations and single core pipeline reorderings / optimizations.
Also, even on a single core single processor machine, you might still
hit related issues due to compiler optimizations and hardware pipeline
optimizations, especially with signal handlers.

The machine I'm testing this on has a dual core processor, and I have
yet to hit a problem like that.

Probably just luck. No one said it's likely to fail; depending on the
application it could be quite rare. You don't want such a Heisen-bug
to show up only once a month on a customer's machine.

I would expect the process to hang
because the producer would constantly be getting a false value
returned from the push function and the consumer would constantly be
getting a null pointer from the pop function due to the fact that the
pointer value would be stored in their respective caches and a write
to the pointer in one cache would not propagate to the pointer value
in the other cache.

Hardware, from the perspective of a software programmer, is a black
art. You seem to have a very poor understanding of how modern
operating systems work, and how basic processors and caches work. Just
follow the rules and you should be OK. Here are some examples from
real machines.

Caches are not infinite in size. Eventually they get full and things
must be evicted, aka spilled, to main memory. Generally they use a
heuristic of least recently referenced, which means first in is not
guaranteed to be first out. (To answer your question, surely you have
other processes running on your machine, and swapping between these
processes many times a second means that the cache will be cleared
many times a second from this context switching.)

The reader's cache may contain a stale entry for one but not the
other, which means it'll fetch the new version from main memory for
one but not the other.

The infamous DEC Alpha had a split cache, meaning that (something
like ??) all even addresses went to one cache, and all odd addresses
went to the other cache. This is much more likely to produce effects
like above, as something much more recently referenced will be flushed
to main memory first because that subportion of the cache is full.
(The DEC Alpha also has crazy look ahead, where **x may be the newest
value in main memory, but *x may be a stale value. At least, that's
how it's been explained to me. I have no clue how or why you would
implement that.)

I'd assume there exist some caches which use other heuristics, like
locality heuristics. Maybe some caches only contain ~20 byte units of
main memory, in effect like a look-ahead optimization, pulling data
from main memory before it's required, resulting in possibly stale
data for some data if it's near other used data, but if it's far away
in main memory, then it will be much more likely to be the newest
version of main memory.

I'm sure there are even "weirder" optimizations which will mess with
you if you don't follow the rules. The programming language gives you
a contract, and that contract is both threads must execute the proper
synchronization primitives to give any guarantee of the order that
writes become visible from one thread to the other. The programming
language, hardware, and the operating system fulfill this contract,
and they will do their best to optimize within the confines of this
contract. If you do not follow it, you have violated assumptions made
by the implementation, and you're SOL.

Are there any guarantees in regard to cache
coherency on certain platforms?

Probably. I don't know offhand.
 

Chris M. Thomasson

goodfella said:
Here is some code I came up with for a lock-free single producer,
single consumer algorithm. I have done some tests using GCC with -O3
optimizations and all seems well. I would like some constructive
criticism of the code, as I am not yet an expert on concurrency.

I'm assuming that reading and writing the value of a pointer is
atomic. I have read, though, that on some architectures that is a bad
assumption, so to ensure correctness across all platforms I may have
to wait for the C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.

[...]

Check this example code out:

(IA-32/MSVC/PThread)

http://pastebin.com/f693b64b3


It's a wait-free, fast-pathed single-producer/single-consumer bounded
ring buffer that allows for efficient in-place writes and reads. A
producer can directly reserve and utilize buffer space, then atomically
commit it such that a consumer can read and process the buffer in
place. Nothing gets overwritten, everything works, and it has
conditional blocking ability.
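
The in-place usage pattern looks roughly like this; the names here are
illustrative, not the actual API in the pastebin:

    // Producer: claim buffer space, write into it directly, then publish.
    void* buf = ringbuf_try_reserve(rb, size);  // hypothetical call
    if (buf)
    {
        /* fill buf in place, no intermediate copy */
        ringbuf_commit(rb, size);               // atomic publish to the consumer
    }

    // Consumer: process the committed region in place, then release it.
    void* msg = ringbuf_try_read(rb, &size);    // hypothetical call
    if (msg)
    {
        /* read msg in place */
        ringbuf_release(rb, size);
    }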

The code has the ability to block because it makes efficient use of my
eventcount algorithm. The eventcount is as important to non-blocking
code as condition variables are to PThreads. This uses my original
algorithm, which I disclosed to the public here:


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380
 

goodfella

Here is some code I came up with for a lock-free single producer,
single consumer algorithm. I have done some tests using GCC with -O3
optimizations and all seems well. I would like some constructive
criticism of the code, as I am not yet an expert on concurrency.
I'm assuming that reading and writing the value of a pointer is
atomic. I have read, though, that on some architectures that is a bad
assumption, so to ensure correctness across all platforms I may have
to wait for the C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.

[...]

Check this example code out:

(IA-32/MSVC/PThread)

http://pastebin.com/f693b64b3

It's a wait-free, fast-pathed single-producer/single-consumer bounded
ring buffer that allows for efficient in-place writes and reads. A
producer can directly reserve and utilize buffer space, then atomically
commit it such that a consumer can read and process the buffer in
place. Nothing gets overwritten, everything works, and it has
conditional blocking ability.

The code has the ability to block because it makes efficient use of my
eventcount algorithm. The eventcount is as important to non-blocking
code as condition variables are to PThreads. This uses my original
algorithm, which I disclosed to the public here:

http://groups.google.com/group/comp.programming.threads/browse_frm/th...

Thanks, I'll take a look at that as well.
 

goodfella

I have read the following Wikipedia article about cache coherency:

http://en.wikipedia.org/wiki/Cache_coherence

and it seems that cache coherency guarantees the following behavior on
reads and writes of memory in the same location:
A read made by processor P1 to location X that follows a write by
another processor P2 to X must return the written value made by P2 if
no other writes to X made by any processor occur between the two
accesses. This condition defines the concept of coherent view of
memory. If processors can read the same old value after the write made
by P2, we can say that the memory is incoherent.


So, from what I have gathered, a read made by processor P1 of memory
location X, following the write of A by processor P2 to memory
location X, should, at some point in time, return the value A. In
addition, if P1 writes A to memory location X, P1 will read A from
memory location X until either it or another processor writes another
value to X.

The above statements about cache coherency are enough to show that my
code works on systems that guarantee cache coherency. The consumer and
producer at any given time are only concerned with the writing and
reading of one particular memory location. The producer is only going
to write to a memory location if the value of the memory location is
zero and the consumer is only going to read from a memory location if
the value of the memory location is not zero.

At any given time, the producer thread is either going to see a null
pointer or a valid pointer at memory location X. If it sees a valid
pointer at memory location X, it will do nothing. If it sees a null
pointer at memory location X it will write a valid memory address to
that memory location.

At any given time, the consumer thread, is either going to see a null
pointer or a valid pointer at memory location X. If it sees a null
pointer it will do nothing. If it sees a valid pointer at memory
location X it will read that value and write zero to the memory
location.

Only the producer writes a valid address to memory location X and only
the consumer writes zero to memory location X. The cache coherency
protocols ensure that when one processor writes to a memory location,
the new value of that memory location will eventually be available to
the other processors.

The ordering of reads and writes on different memory locations does
not apply in my case, because my code is not dependent on the values of
two different memory locations at the same time. The if statements in
the push and pop functions rely on one particular memory location at a
time, and assuming for the moment that reading the value of a pointer
is atomic, the read of the pointer is going to return either zero or a
valid memory address.

So unless the architecture my code is run on fails to guarantee cache
coherency, I don't see how my code could suffer from cache coherency
problems.
 

Chris M. Thomasson

I have read the following Wikipedia article about cache coherency:

http://en.wikipedia.org/wiki/Cache_coherence [snip]
So unless the architecture my code is run on fails to guarantee
cache coherency, I don't see how my code could suffer from cache
coherency problems.
True. Which processors are those again? Are you fine writing non-
portable code?

http://groups.google.com/group/comp.arch/browse_thread/thread/df6f520f7af13ea5
(read all...)
 

REH

I have read the following Wikipedia article about cache coherency:

http://en.wikipedia.org/wiki/Cache_coherence

and it seems that cache coherency guarantees the following behavior on
reads and writes of memory in the same location:

You seem to be confused. Cache coherency is not something that is
guaranteed by the architecture. It is something your implementation
must guarantee in order to be deterministic, reliable, and portable.
You can make all the attempts at a logical argument you want, and run
tests as many times as you want, but unless it has the proper memory
barriers, you simply cannot guarantee your code is correct.

I write real-time, embedded systems with safety critical requirements.
Without the proper atomic operations, your code would be rejected at
the first code review.

REH
 

Joshua Maurice

I have read the following Wikipedia article about cache coherency:
http://en.wikipedia.org/wiki/Cache_coherence [snip]
So unless the architecture my code is run on fails to guarantee
cache coherency, I don't see how my code could suffer from cache
coherency problems.
True. Which processors are those again? Are you fine writing non-
portable code?

http://groups.google.com/group/comp.arch/browse_thread/thread/df6f520...
(read all...)

That doesn't really answer the question. It gives a single example of
one processor which is not cache coherent, nowhere near the listing of
common (and uncommon) processors necessary to make an informed
decision.

"Which processors are those again?" was also intended as a rhetorical
question. The intent was "Are you even sure on which platforms this
will work?".
 

Chris M. Thomasson

On Oct 6, 6:23 pm, goodfella <[email protected]> wrote:
I have read the following Wikipedia article about cache coherency:
[snip]
So unless the architecture my code is run on fails to guarantee
cache coherency, I don't see how my code could suffer from cache
coherency problems.
True. Which processors are those again? Are you fine writing non-
portable code?

http://groups.google.com/group/comp.arch/browse_thread/thread/df6f520...
(read all...)
That doesn't really answer the question. It gives a single example of
one processor which is not cache coherent, nowhere near the listing of
common (and uncommon) processors necessary to make an informed
decision.

Here is what I take away from the thread:

Local cache coherency on a per-socket/node basis.

Non-cache-coherent socket/node network; basically, message passing.

BTW, you write:
It gives a single example of one processor which is not cache coherent

IMHO, the information contained within the linked thread in `comp.arch'
gives a simple example of per-socket/node cache coherency vs
inter-socket/node NUMA. What happens if the memory for the volatile variable
is on CPU E on SocketB, and the reader/mutator is on CPU X on SocketT?



"Which processors are those again?" was also intended as a rhetorical
question. The intent was "Are you even sure on which platforms this
will work?".

One simple example: SiCortex systems. You have to use message passing
(e.g., MPI) for inter-socket/node comms, and you can use cache-coherent
shared memory for intra-socket/node comms. This is a fairly standard
setup. Cray?
 

Chris M. Thomasson

Okay. I just wanted to mention that there are systems on which it
(e.g., a simple volatile flag with no data dependencies) will not work
without using MPI or something. BTW, this setup is not uncommon.
Advanced multi-core designs are getting more and more NUMA-like. IMVHO,
you have to use NUMA rules on advanced multi-core chips. Should CoreA
really be communicating with CoreZ? Perhaps CoreA and CoreB have access
to the same local memory bank.
 

goodfella

You seem to be confused. Cache coherency is not something that is
guaranteed by the architecture. It is something your implementation
must guarantee in order to be deterministic, reliable, and portable.
You can make all the attempts at a logical argument you want, and run
tests as many times as you want, but unless it has the proper memory
barriers, you simply cannot guarantee your code is correct.

I write real-time, embedded systems with safety critical requirements.
Without the proper atomic operations, your code would be rejected at
the first code review.

REH

So in general, the writes made by the producer are not guaranteed to
propagate to the consumer, and writes made by the consumer are not
guaranteed to propagate to the producer. A simpler demonstration of
this problem is an example of how to use the volatile keyword. You
have a variable which gets set to 1 by one thread, called the writer,
while another thread, called the reader, is in a loop waiting for the
variable to be equal to 1. The use of volatile ensures that the
compiler will not optimize away checking the value of the variable,
because the value may be changed in ways unknown to the compiler. So
in this example, the write performed by the writer thread is not
guaranteed to propagate to the reader thread; or in other words, the
reader thread will never exit because it will never see the variable
equal to 1.
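
A minimal sketch of that setup (assuming some thread library runs
writer() and reader() concurrently):

    volatile int flag = 0;

    void writer()
    {
        flag = 1;               // set once by the writer thread
    }

    void reader()
    {
        while( flag != 1 )
            ;                   // spin until the write becomes visible
    }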
 

Michael Doubez

So in general, the writes made by the producer are not guaranteed to
propagate to the consumer, and writes made by the consumer are not
guaranteed to propagate to the producer.  A simpler demonstration of
this problem is an example of how to use the volatile keyword.  You
have a variable which gets set to 1 by one thread, called the writer,
while another thread, called the reader, is in a loop waiting for the
variable to be equal to 1.  The use of volatile ensures that the
compiler will not optimize away checking the value of the variable
because the value of the variable may be changed in ways unknown by
the compiler.

Just nitpicking:
1. There is no formal guarantee from either C or C++: volatile
primarily exists for hardware locations; if a compiler can determine
that a pointer is in ordinary memory, it could in theory not apply the
constraints of volatile qualification.
2. In practice it may do what you say, but more important is the way
it constrains reorganization of the code: volatile operations should
be executed as they appear in the code.

 So in this example, the write performed by the writer
thread is not guaranteed to propagate to the reader thread.

Well, it should. A write to a volatile location should make a write to
the location (each time, without optimizing it away). It may cause
cache incoherency if there are multiple writes, but at some point it
will become visible.

The problem is more related to portability across compilers and how
they interpret 'volatile'.

 Or in
other words, the reader thread will never exit because it will never
see the variable equal to 1.

I thought this article may be of interest:
http://www.ddj.com/cpp/212701484
 
