FWIW, I was thinking of C++03 with pthread (GCC) or Windows (VC++)
threading models.
For most intents and purposes, the memory models of C++0x, POSIX
pthreads, win32, and Java 1.5 and higher are the same. There are some
important differences between them (like the Java "final" keyword and
the Java "out of thin air" guarantee), but for the basics, like
visibility guarantees w.r.t. mutexes, it's the same all around.
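To make the visibility guarantee concrete, here's a minimal sketch in
C++0x syntax (the variable names and the std::thread/std::mutex usage
are mine, purely for illustration):

#include <cassert>
#include <mutex>
#include <thread>

std::mutex m;
int data = 0;        // protected by m
bool ready = false;  // protected by m

void writer() {
    std::lock_guard<std::mutex> lock(m);
    data = 42;       // everything written while holding the lock...
    ready = true;
}

void reader() {
    std::lock_guard<std::mutex> lock(m);
    if (ready)
        assert(data == 42);  // ...is visible to a later lock holder
}

int main() {
    std::thread t1(writer);
    std::thread t2(reader);
    t1.join();
    t2.join();
}

The same guarantee holds, mutatis mutandis, with pthread_mutex_lock/
pthread_mutex_unlock or a win32 CRITICAL_SECTION.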
You said that "I was under the impression that [...] it is guaranteed
that <blah1> happens before <blah2>, [...]". Why is that? More
particularly, what does that mean?
Let me try to explain. Taking a step back, consider:
int a = 1;
int b = 2;
global_c = a + b;  // the compiler may perform this store...
global_d = a - b;  // ...after this one; no conforming program can tell
We have no guarantee that the assignment to global_c happens before
the assignment to global_d. Why? This is equivalent to saying that no
conforming program could tell the difference if the compiler in fact
switched the two. It's the basic "as-if" rule.
Understood and agreed -- without locks. I was under the impression that
locks change this, by adding "observable" points to the program.
With multi-threading, a program no longer has exactly one allowed
execution. (In practice, I would guess most single-threaded programs
already have more than one allowed execution, given things like
implementation-defined behavior and various QoI and system specifics
like stack size, heap space, and so on.) Instead of being guaranteed a
particular execution, you get a set of allowed executions.
The set of allowed executions is quite annoying and complex to define
formally, but you are right that locks restrict this set. Some
executions are not allowed because they violate ordering rules of the
memory model, such as the ordering rules required by locks. However,
you are still left with a rather large set of allowed executions, and
a conforming implementation is free to produce whichever one it wants.
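To illustrate what "a set of allowed executions" means, here's a small
sketch of mine, again in C++0x style:

#include <mutex>
#include <thread>

std::mutex m;
int counter = 0;  // protected by m

void bump() {
    std::lock_guard<std::mutex> lock(m);
    ++counter;  // the whole read-modify-write happens under the lock
}

int main() {
    std::thread t1(bump);
    std::thread t2(bump);
    t1.join();
    t2.join();
    // The lock rules out executions where the two increments
    // interleave, so counter == 2 in every allowed execution.
    // Which thread bumped first still varies between executions.
    return counter == 2 ? 0 : 1;
}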
How about the typical situation: short-running, locked synchronizing
code around long-running, unlocked working code? blah1 (retrieving
tasks from a shared queue) and blah3 (placing results into a shared
queue) each take a few milliseconds, blah2 (calculating the results)
takes a few hours or days, and a few threads do this in parallel. I'm
sure I can tell the difference between the two versions.
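In code, the situation I have in mind is roughly this (a sketch; Task,
Result, and compute() stand in for the real types and the real work):

#include <mutex>
#include <queue>
#include <thread>

struct Task   { int payload; };
struct Result { int value; };

// Stand-in for blah2; in the real program this runs for hours or days.
Result compute(const Task& t) { return Result{ t.payload * 2 }; }

std::mutex m;
std::queue<Task>   tasks;    // shared, protected by m
std::queue<Result> results;  // shared, protected by m

void worker() {
    for (;;) {
        Task t;
        {
            std::lock_guard<std::mutex> lock(m);  // blah1: a few ms
            if (tasks.empty()) return;
            t = tasks.front();
            tasks.pop();
        }  // <- the unlock whose placement is at issue
        Result r = compute(t);                    // blah2: must run unlocked
        {
            std::lock_guard<std::mutex> lock(m);  // blah3: a few ms
            results.push(r);
        }
    }
}

int main() {
    for (int i = 0; i < 8; ++i) tasks.push(Task{ i });
    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
}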
I was using an idiomatic meaning of "tell the difference". Of course
one could tell the difference in practice between a compiler that does
tail-call optimization and one that does not, but a conforming program
would not have its formal observable behavior changed. That's what I
meant by "you couldn't tell the difference": a conforming program,
with or without tail-call optimization, would still have its formal
observable behavior be one of the allowed behaviors.
Similarly, if the compiler does something "bad" like combine critical
sections when it shouldn't, you could tell the difference in practice,
just like you could tell the difference in practice between tail-call
optimization and no tail-call optimization. Whether the compiler does
or does not do those things is QoI. The formal behavior, with or
without combining the critical sections, is one of the allowed
behaviors.
I understand that this may be a QoI issue, but then again, if I can't
guarantee that the two days for which blah2 runs are in fact unlocked,
I really can't use this system for this task.
Then you have a QoI issue. By the same token, you would have QoI
issues if the compiler otherwise did something else stupid. However,
combining or not combining those sections results in a formally
allowed observable behavior, and thus the compiler is allowed to do
it.
The compiler's QoI here becomes the application's QoI, and if the
compiler is free to do this, I can't guarantee my application's spec
-- which I assume describes a typical threaded application.
Then harass your compiler writer to fix it, or the scheduler writer to
fix it, or use a different compiler or scheduler. There are so many
ways that the implementation can be malicious. This is just one of
them.
In my example, which is nothing strange, parallelism is required, and
to implement it I need a compiler whose threading/memory model
guarantees me that blah2 runs unlocked, so that it can run in
parallel.
You also need a compiler that properly does data flow analysis
optimizations, constant folding, constant propagation, common sub-
expression elimination, and so on. You also need a compiler that
doesn't simply produce a maliciously convoluted object file.
It's basically impossible for any general-purpose portable memory
model to do what you want. How could you possibly specify that to work
with different kinds of schedulers, unknown schedulers that aren't
even written yet? It depends too much on the vagaries of the scheduler
and other such things. I wouldn't worry about it. At least, I would
worry about it as much as I worry about stack sizes in programs. Most
of the time you don't need to. Sometimes you do, and then you need to
check the generated assembly.
Also, if you did do that, completely disallowed combining nearby
critical sections, then you would have removed some optimization
opportunities for, IMHO, no particularly good reason. I want the
compiler to be able to optimize the OP's problem by combining an empty
critical section with a nearby non-empty one.
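Concretely, the transformation in question looks something like this
(a sketch; mu and shared_data are placeholder names of mine):

#include <mutex>

std::mutex mu;
int shared_data = 0;

void before() {
    mu.lock();
    mu.unlock();       // the OP's empty critical section
    mu.lock();
    shared_data = 42;  // a nearby non-empty critical section
    mu.unlock();
}

// Under the "as-if" rule, the compiler may emit this instead,
// merging the two sections into one:
void after() {
    mu.lock();
    shared_data = 42;
    mu.unlock();
}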
This is of course fine if the scheduler allows it -- but since the
compiler doesn't control the scheduler, it can't really assume much
about it, and IMO it needs to unlock after blah1, even though it
doesn't know what the scheduler does with this. Whether it makes sense
to unlock and relock around blah2 is something /I/ need to decide when
writing the program, considering the particularities of the scheduler
I intend to run it on.
Again, perhaps it's bad QoI, and the compiler ought to fix it, but it
would result in an allowed formal observable behavior, an allowed
execution, and thus the compiler would be free to do it under the
"as-if" rule.
This doesn't sound good. It relates to my earlier comment that I
think I need to look into the guarantees that the individual
implementations provide -- if what you describe here reflects the
C++03 guarantees, IMO they alone are not very useful in practice.
C++03 doesn't have any relevant guarantees. AFAIK, this here reflects
all of the usual memory models (C++0x, POSIX pthreads, Java 1.5+,
win32).
To go back to the
example, I need to have the guarantee that blah2 runs unlocked (within
the overall scheduling granularity of the target system, of course).
As I said earlier, good luck attempting to write a portable
specification to that end, and you don't really "need" it. You need it
as much as you need more than 10 bytes of stack space, and as much as
you need a compiler that won't maliciously pessimize your programs. If
you need more than that, you need something more than what a portable
standard can give you, and you have to check / write the assembly
yourself, or maybe use some real-time POSIX extensions.