Just a couple of nits...
[...]
goodfella, please actually read "C++ and the Perils of Double-Checked
Locking":
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
It explains a lot of this most clearly.
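For reference, the fix that paper anticipates did eventually arrive: C++11 atomics make a correct double-checked locking pattern expressible. A hedged sketch, where the Singleton class and its member names are illustrative, not from the paper:

```cpp
#include <atomic>
#include <mutex>

// Illustrative singleton; the names here are mine, not from the paper.
class Singleton {
public:
    static Singleton* instance() {
        // Fast path: the acquire load synchronizes with the release
        // store below, so a non-null pointer implies a fully
        // constructed object.
        Singleton* p = instance_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Re-check under the lock; the lock itself orders this load.
            p = instance_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Singleton;
                // Publish only after construction is complete.
                instance_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
    static std::atomic<Singleton*> instance_;
    static std::mutex mutex_;
};

std::atomic<Singleton*> Singleton::instance_{nullptr};
std::mutex Singleton::mutex_;
```

The pre-C++11 versions discussed in the paper cannot be written portably at all, which is rather the point of this thread.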
A lot of people in here are still describing the problems in
terms of reordering.
The issue is related to reordering---to the writes occurring or
being seen in a different order than the one in which you wrote them.
This is a bad way of thinking about it.
Let me again emphasize what I wrote earlier: Two writes
made by one thread may be seen to happen in different
orders by two reader threads, on the exact same execution,
on the exact same writes. One cannot correctly reason
about threading by examining the possible interleavings of
instructions. It does not work when discussing portable
threading.
Exactly. The two threads see the writes in different
orders. Reordering.
One can think about some cases in terms of reordering, but one
cannot prove correctness in the general case by considering
all possible reorderings. Again, take this example:
// Initially:
x = 0
y = 0

// Thread 1
x = 1
y = 2

// Threads 2, 3, 4, 5
print x, y
If threads 1 through 5 are started at the same time, then on a
single execution you might end up with all four possible
combinations printed: (0, 0), (1, 0), (0, 2), (1, 2). One
cannot describe this in terms of reorderings.
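For the concrete-minded: in today's C++11 terms (a later development than most of this discussion), the example can be written with relaxed atomics, which makes it data-race-free while still permitting all four outcomes. A sketch; the function names are mine:

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> x{0}, y{0};

// Thread 1's writes. Relaxed stores impose no ordering between them,
// so readers on other threads may observe them in either order.
void writer() {
    x.store(1, std::memory_order_relaxed);
    y.store(2, std::memory_order_relaxed);
}

// Each reader thread may observe any of (0,0), (1,0), (0,2), (1,2),
// and two readers on the same execution may disagree on the order.
std::pair<int, int> reader() {
    int a = x.load(std::memory_order_relaxed);
    int b = y.load(std::memory_order_relaxed);
    return {a, b};
}
```

Note that joining the writer thread does synchronize, so a read performed after the join sees both writes; the interesting behavior is only in concurrent readers.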
But that's exactly what it is. The different threads see the
writes in a different order (reordered) from one another.
There is no global ordering.
No. Thread 1 executes the writes in a specific order, say first
x, then y. The other threads see these writes in a different
order---reordered, with respect to the thread which actually did
the writing.
Specifically, my example was thread 1 does write X then write Y.
Thread 2 may see write X without seeing write Y, and thread 3 may see
write Y without seeing write X.
Who said anything about instructions? It's the order in which the
different threads see the effects that is reordered, not the
execution order of the instructions (which is, obviously, invisible
to the other threads).
Useless semantic quibbling, though I shall try to be more precise in
the future.
We're discussing a level below what the compiler does. You can
verify manually what the compiler does, and once it has done it,
the order of the instructions won't change. Things like cache
handling and pipelining, on the other hand, are run-time
phenomena, and can easily result in different behavior each time
you execute the sequence.
To artificially limit the discussion to the "compiler", and not the
rest of the implementation, is... "un-good".
To repeat what you said earlier in your post:
But that's exactly what it is. The different threads see the
writes in a different order (reordered) from one another.
We're partially having an argument over semantics, over definitions,
and I do ever so hate such arguments. However, I will stick by my
claim on this one: I strongly dislike using the terms "order of writes
and reads" and "reordering of writes and reads" as you use them because it
suggests that one can correctly reason about threading by examining
the possible interleavings of reads and writes, and this is simply not
true for POSIX implementations, win32 threading, Java threading, and
all other threading known to me (which honestly isn't much). There are
cases where the algorithm is correct over all possible reorderings of
reads and writes, but not when you remove the assumption that writes
done by some thread will be visible to other threads in some unique,
specific order. Reader thread A may see some order, and reader thread
B may see an entirely different order. Sure, some hardware may make
that assumption true, but I won't rely on that, at least not for
supposedly portable code.
I reject so strongly because almost all of the threading mistakes I
see by my colleagues are because they are using this naive
understanding of threading. They attempt to look at the possible
interleavings of reads and writes in the absence of synchronization,
and then they claim that the algorithm is correct. I've had to fix
such code many times, and I've had to fix such bad understanding from
otherwise good programmers many times as well.
Unfortunately, this probably goes back to how they were taught threads
in college or in their books. I'm sure it starts off with a simple
description of a machine, without a pipeline or anything crazy,
"faithfully" executing one instruction at a time, and then they simply
add a second executor to explain threading. This is how many people
understand threading in C++ and Java, and it is inherently broken.
When talking of pipelines, caching, hardware reordering, etc., they
attempt to account for it in their simple model, considering more
reorderings, but at the end of the day it still doesn't work. Talking
of reorderings only perpetuates this misunderstanding. One must
understand the guarantees made by your primitives, and POSIX, win32,
and Java guarantees are not made in terms of reorderings as you use
the term.
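Those guarantees are stated in terms of synchronization operations, not reorderings. For instance, protecting both variables with a single mutex gives every reader one consistent view of the writes. A C++11 sketch with names of my own invention (the same idea holds for pthread_mutex_t or a Java monitor):

```cpp
#include <mutex>
#include <utility>

std::mutex m;
int x = 0, y = 0;

// Both writes happen under one critical section, so they become
// visible to readers as a unit.
void writer() {
    std::lock_guard<std::mutex> lock(m);
    x = 1;
    y = 2;
}

// Every reader sees either (0, 0) or (1, 2) -- never a mixed state,
// and no two readers can disagree about the order of the writes.
std::pair<int, int> reader() {
    std::lock_guard<std::mutex> lock(m);
    return {x, y};
}
```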
If I execute a store instruction, a write will be inserted into
the write pipeline. Sooner or later, it will come out at the
other end, and global memory will be written (assuming cache
write through, which is fairly universal).
The write will always eventually end up in global memory. I'm
less sure that a read will always end up going to global memory;
I can easily imagine a processor keeping the results of a read
around, and in some exotic scenarios, never even going to cache.
(I'm talking here about the case where there are multiple
executions of a load instruction, not what the compiler might
do.)
Except when you're on different processors, as mentioned by REH
in-thread.
Or except when you're in a tight busy loop:
while (!flag)
thread_yield();
and the compiler or the hardware decides not to re-load the flag from
memory and keeps it in a register. It doesn't have to be this simple,
either: for something slightly more complex, the compiler may optimize
away the memory loads and stores and keep the value in a register or
on the stack. I frankly do not want to decide what is "sufficiently
complex" that the optimizer will not break it by examining C++ source
code.
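One portable fix, again in C++11 terms, is to make the flag atomic so the load cannot be cached in a register. A sketch, with std::this_thread::yield standing in for the thread_yield above:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> flag{false};

// The atomic load cannot be hoisted out of the loop: every iteration
// re-reads memory, and the acquire makes writes that happened before
// the setter's release store visible after the loop exits.
void waiter() {
    while (!flag.load(std::memory_order_acquire))
        std::this_thread::yield();
}

void setter() {
    flag.store(true, std::memory_order_release);
}
```

With a plain bool this loop is a data race and the compiler is entitled to "optimize" it into an infinite loop; with the atomic, the spec guarantees termination once the store is made.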
Or your compiler decides that the write is dead code through some
(possibly "broken") analysis which doesn't take into account other
threads.
Or a variety of other situations which I don't know offhand. I will
program to the spec and not to any particular implementation, lest the
implementation change, "breaking" my code.