Subtle difference between C++0x MM and other MMs

D

Dmitriy V'jukov

I can describe how I currently implement seq_cst fence in Relacy Race
Detector in Java/CLR mode:http://groups.google.com/group/relacy

I will try to formalize the rules.
Preconditions:
1. There is store A with memory_order_release (or stronger) on atomic
object M.
2. There is store B with memory_order_relaxed (or stronger) on atomic
object M.
3. A precedes B in modification order of M, A and B are adjacent in
modification order of M.
4. There is seq_cst fence C.
5. B is sequenced-before C.

Postcondition:
A synchronizes-with C.

My reasoning is following. Preconditions 2-5 effectively equivalent
to:
2*. There is load B with memory_order_acquire on atomic object M.
3*. B loads value stored by A.


Or put this way.
'synchronizes-with' definition contains something like '... acquire
operation which loads value stored by X ...'. Reasoning behind this
formulation is (as I understand it):
1. 'loads value stored by X' means that operation happens after X.
It's obvious, it's a casual relation.
2. 'acquire operation' means that subsequent operations can't hoist
above this operation.
Now consider 'store operation which follows X in modification order of
M, followed by full fence'. Here:
1. 'operation which follows X in modification order of M' means that
operation happens after X. It's a kind of casual relation too.
2. 'followed by full fence' means that subsequent operations can't
hoist above this fence.
So effectively those two definitions equal wrt 'synchronizes-with'
relation.

Anthony, Peter, Hans, what do you think?

Dmitriy V'jukov
 
A

Anthony Williams

If the store B in step 2 was a RMW operation, then yes. If it's just a
plain store then no. Peter has proposed that the definition of release
sequence now includes relaxed RMW operations too.
Or put this way.
'synchronizes-with' definition contains something like '... acquire
operation which loads value stored by X ...'. Reasoning behind this
formulation is (as I understand it):
1. 'loads value stored by X' means that operation happens after X.
It's obvious, it's a casual relation.
2. 'acquire operation' means that subsequent operations can't hoist
above this operation.
Now consider 'store operation which follows X in modification order of
M, followed by full fence'. Here:
1. 'operation which follows X in modification order of M' means that
operation happens after X. It's a kind of casual relation too.

Yes it's later in the modification order of M, but that doesn't mean
that other operations that happen-before the first store (A) also
happen-before the second store (B). I can imagine an architecture
where the stores are just issued to the memory, and one clobbers the
value written by the second, with no synchronization between the
processors at all.
2. 'followed by full fence' means that subsequent operations can't
hoist above this fence.
So effectively those two definitions equal wrt 'synchronizes-with'
relation.

I disagree.

Anthony
 
D

Dmitriy V'jukov

Yes it's later in the modification order of M, but that doesn't mean
that other operations that happen-before the first store (A) also
happen-before the second store (B).


Other operations can happen after store (B). It's perfectly Ok. They
must not happen after seq_cst fence!

I can imagine an architecture
where the stores are just issued to the memory, and one clobbers the
value written by the second, with no synchronization between the
processors at all.


I disagree.


Can you then point to error in my reasoning.


Dmitriy V'jukov
 
A

Anthony Williams

Dmitriy V'jukov said:
Other operations can happen after store (B). It's perfectly Ok. They
must not happen after seq_cst fence!

If they are stores to other memory locations that's not guaranteed.

std::atomic_int x,m;

void thread1()
{
x.store(1,std::memory_order_relaxed);
m.store(1,std::memory_order_release); // A
}

void thread2()
{
m.store(2,std::memory_order_relaxed); // B
std::atomic_thread_fence(std::memory_order_seq_cst); // C
int z=x.load(std::memory_order_relaxed);
}

First off, thread2() does no loads, so there is no way it can tell
whether A or B comes first in the modification order of m.

Since we don't know which comes first, we don't know whether z will
have the value 1 or 0.

Even if we read back the value of m and found that it had the value
"2", we wouldn't have any extra information, since the store A could
just not have become visible to thread2 yet.

Stores clobber release sequences. If you need the ordering
information, use a RMW operation such as exchange().

Anthony
 
D

Dmitriy V'jukov

Other operations can happen after store (B). It's perfectly Ok. They
must not happen after seq_cst fence!

First off, thread2() does no loads, so there is no way it can tell
whether A or B comes first in the modification order of m.

Since we don't know which comes first, we don't know whether z will
have the value 1 or 0.

It's irrelevant to the question. See original Peterson's algorithm
example. There are situation where you don't need to know which store
comes first.

Dmitriy V'jukov
 
A

Anthony Williams

Dmitriy V'jukov said:
Yes it's later in the modification order of M, but that doesn't mean
that other operations that happen-before the first store (A) also
happen-before the second store (B).
Other operations can happen after store (B). It's perfectly Ok. They
must not happen after seq_cst fence!

First off, thread2() does no loads, so there is no way it can tell
whether A or B comes first in the modification order of m.

Since we don't know which comes first, we don't know whether z will
have the value 1 or 0.

It's irrelevant to the question. See original Peterson's algorithm
example. There are situation where you don't need to know which store
comes first.

True, there are circumstances where it doesn't matter which came
first, but in order to get synchronization, the processor must
know.

You only get a release sequence with a release store and subsequent
RMW operations. A subsequent store destroys the original sequence
(though it may start its own). If the processor doesn't load the old
value it doesn't have to do any synchronization, and can just throw
the value at the memory hardware and leave the memory unit to pick an
arbitrary ordering of the stores.

There is no happens-before relationship between the store to x and the
load from x in this example. This use of a fence is not equivalent to
the use of load-acquire.

I'm not sure that even x86 CPUs guarantee the ordering in this case:
it isn't the normal transitive visibility because thread 2 doesn't
load the value of m.

Anthony
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,169
Messages
2,570,920
Members
47,462
Latest member
ChanaLipsc

Latest Threads

Top