Chris M. Thomasson
[...]
DCL is the most common application I see of this flawed reasoning. The
heart of DCL is that one relies on investigating the possible
interleavings of source code "instructions" to see if it will work
out. IMHO, as soon as you modify DCL to be correct in C or C++, it is
no longer DCL.
What modifications of the DCL algorithm itself are needed in order to get it
to work in C/C++? I don't have to modify the actual algorithm in order to
get it to work in C/C++. I only have to implement the algorithm using highly
non-portable methods.
You have fundamentally changed the basis of your belief
of its correctness when you add in proper synchronization. No longer
are you doing a single check outside of all synchronization,
the hallmark of DCL. It only superficially resembles DCL.
Unless I am misunderstanding you, I have to disagree here. The algorithm
needs proper synchronization on the fast-path and on the slow-path. On the
fast-path it needs a data-dependent load memory barrier, and on the slow
path it needs the mutex along with release semantics. There is no "single
check outside of all synchronization". AFAICT, that's a misunderstanding of
how the algorithm actually works. Skipping the mutex acquisition/release is
the only thing that DCL actually optimizes. Just because you can skip a call
to the mutex (i.e., on the fast-path) does not mean that you're somehow "outside of
all synchronization".
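
To make the fast-path/slow-path point concrete, here is a minimal sketch of DCL, assuming a compiler with C++11 atomics rather than the platform-specific barriers discussed in this thread. The names Widget and LazyWidget are made up for illustration. The fast-path load uses memory_order_acquire; memory_order_consume would be the closer match for a data-dependent load barrier, but compilers generally treat it as acquire anyway.

```cpp
#include <atomic>
#include <mutex>

struct Widget { int value = 42; };  // hypothetical payload, for illustration only

class LazyWidget {
public:
    Widget& get() {
        // Fast path: no mutex, but still synchronized. The acquire load
        // pairs with the release store below, so a non-null pointer
        // implies the Widget's construction is visible to this thread.
        Widget* p = ptr_.load(std::memory_order_acquire);
        if (p == nullptr) {
            // Slow path: the mutex serializes competing initializers.
            std::lock_guard<std::mutex> lock(mutex_);
            // Second check: the mutex already orders this load against
            // any earlier initializer, so relaxed is sufficient here.
            p = ptr_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Widget();
                // Release semantics: publish only after construction.
                ptr_.store(p, std::memory_order_release);
            }
        }
        return *p;
    }

    ~LazyWidget() { delete ptr_.load(std::memory_order_relaxed); }

private:
    std::atomic<Widget*> ptr_{nullptr};
    std::mutex mutex_;
};
```

The only thing the first check buys is skipping the lock/unlock pair once the object exists. Every call to get() still goes through the acquire load.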
Now, arguably, the disagreement between your post, Jerry, and mine is merely
one over the definition of terms. I hope I've generated more light than heat, but I don't think
it will be useful to get into a discussion of whether DCL is by
definition broken or merely usually incorrectly implemented. Define it
however you want. I still think that DCL by definition is incorrect in
C and C++, and modifications to make it correct render it no longer
DCL.
DCL is DCL. Even if I implement it in pure ASM, I still cannot get away from
using proper synchronization. If I implement DCL in C/C++ using
platform/compiler specific guarantees, well, it's still DCL, not something
different.
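
As a rough illustration of "it's still DCL", here is the same algorithm again with the barriers written as standalone fences, which is closer to how one would place them by hand in ASM or with compiler intrinsics. This is only a sketch assuming C++11 atomics; Config, g_config, g_config_mutex and get_config are hypothetical names. The two checks and the mutex are unchanged. Only the spelling of the barriers differs.

```cpp
#include <atomic>
#include <mutex>

struct Config { int port = 8080; };  // hypothetical payload, for illustration only

std::atomic<Config*> g_config{nullptr};
std::mutex g_config_mutex;

Config* get_config() {
    // First check: plain (relaxed) load followed by an explicit
    // acquire fence, standing in for the load-side barrier.
    Config* p = g_config.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_config_mutex);
        // Second check, under the mutex.
        p = g_config.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config();
            // Explicit release fence before publishing the pointer,
            // so construction is ordered before the store.
            std::atomic_thread_fence(std::memory_order_release);
            g_config.store(p, std::memory_order_relaxed);
        }
    }
    return p;  // intentionally never freed in this sketch
}
```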