Out-of-order execution / reordering of instructions

junky_fellow · Dec 2, 2006

Hi,

I read in some articles that there are two types of instruction
"reordering": one at the compiler level and the other at the
processor/CPU level. What is the difference between the two?

How can a C programmer prevent the compiler and the CPU from
reordering instructions?

In what cases should the programmer care about these reorderings?

Can I suppress reordering by the compiler by using memory barriers?

For instance, I want to update two memory locations in order. Can
I do it as follows?

void func(void)
{
    int *ptr1;
    int *ptr2;

    /* code where ptr1 and ptr2 are initialized */

    *ptr1 = some_val;       /* *ptr1 should be updated before *ptr2 */

    mb();                   /* memory barrier routine */

    /* some code */

    *ptr2 = some_other_val;
}

I want to know whether calling mb() ensures that the two updates happen in order.

Richard Heathfield · Dec 2, 2006

(e-mail address removed) said:

How can a C programmer prevent the compiler and the CPU from
reordering instructions?

You can enforce the program logic you want by careful use of sequence
points. From a language perspective, there is no mechanism for dictating to
the compiler how it should translate your code.

In what cases should the programmer care about these reorderings?

As long as the program computes the result properly, why should we care what
order things happen in?

jacob navia · Dec 2, 2006

(e-mail address removed) a écrit :

Look junky_fellow, why is this a problem for you?
Why do you want to order the instructions your way?

You suspect a bug in the compiler?

Please explain.

junky_fellow · Dec 2, 2006

Richard said:
(e-mail address removed) said:

You can enforce the program logic you want by careful use of sequence
points. From a language perspective, there is no mechanism for dictating to
the compiler how it should translate your code.

As long as the program computes the result properly, why should we care what
order things happen in?

In device driver code, we need to write specific values to
the device registers in a specific sequence. If that sequence
is changed, the device may not work as desired. In such cases,
how can we prevent reordering?

In a multiprocessor environment, where multiple threads may
execute in parallel on different processors, the global variables
that multiple threads access must be modified in a particular
sequence by a given thread. How can this be achieved?

pete · Dec 2, 2006

In device driver code, we need to write specific values to
the device registers in a specific sequence. If that sequence
is changed, the device may not work as desired. In such cases,
how can we prevent reordering?

Like Richard said, "sequence points"

In a multiprocessor environment, where multiple threads may
execute in parallel on different processors, the global variables
that multiple threads access must be modified in a particular
sequence by a given thread. How can this be achieved?

That would probably be on topic in a newsgroup
that deals with threads.

junky_fellow · Dec 2, 2006

pete said:
Like Richard said, "sequence points"

I will try to explain the problem with an example.
There may be something wrong in my understanding.

Suppose we need to write two different values to two different
memory-mapped device registers. Let the address of register1 be
in ptr1 and the address of register2 be in ptr2.

int init_device(void)
{
    /* some code here for initialization of pointers */
    *ptr1 = some_value;
    *ptr2 = some_other_value;
    return 0;
}

Since the two stores do not depend on each other, the compiler
may reverse the order in which these registers are updated.
That is, register2 may be updated first and register1 later.
How can this be avoided?
Can inserting a memory barrier between these two statements
prevent this reordering?

Guest · Dec 2, 2006

In device driver code, we need to write specific values to
the device registers in a specific sequence. If that sequence
is changed, the device may not work as desired. In such cases,
how can we prevent reordering?

In a multiprocessor environment, where multiple threads may
execute in parallel on different processors, the global variables
that multiple threads access must be modified in a particular
sequence by a given thread. How can this be achieved?

In both cases, the volatile keyword can be used to say that accesses to
your registers and variables may have effects the compiler does not
know about.
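
A minimal sketch of that suggestion for the device-register case, assuming 32-bit registers. The names fake_reg1 and fake_reg2 are hypothetical, and ordinary variables stand in for the memory-mapped addresses so that the example can be compiled anywhere. It only addresses what the compiler does; it says nothing about what the CPU or the bus may do afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for two memory-mapped registers (hypothetical). On real
   hardware these would be fixed addresses taken from the data sheet. */
static volatile uint32_t fake_reg1;
static volatile uint32_t fake_reg2;

/* Write the registers in a fixed order. The volatile qualifier on the
   pointed-to type makes each store an observable access, so the compiler
   may neither drop the stores nor swap them. */
void init_device(volatile uint32_t *reg1, volatile uint32_t *reg2)
{
    assert(reg1 != NULL && reg2 != NULL);
    *reg1 = 0x1u;   /* must be stored first */
    *reg2 = 0x2u;   /* stored second */
}
```

A caller would pass the mapped register addresses, for example init_device(&fake_reg1, &fake_reg2).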

Eric Sosman · Dec 2, 2006

I will try to explain the problem with an example.
There may be something wrong in my understanding.

Suppose we need to write two different values to two different
memory-mapped device registers. Let the address of register1 be
in ptr1 and the address of register2 be in ptr2.

int init_device(void)
{
    /* some code here for initialization of pointers */
    *ptr1 = some_value;
    *ptr2 = some_other_value;
    return 0;
}

Since the two stores do not depend on each other, the compiler
may reverse the order in which these registers are updated.
That is, register2 may be updated first and register1 later.
How can this be avoided?

This is what the `volatile' keyword is for. However, even
though the keyword itself is portable, its precise meaning is
implementation-dependent. (Anybody who wants to dispute this
is encouraged to read the final sentence of 6.7.3/6 first.)

Can inserting a memory barrier between these two statements
prevent this reordering?

C doesn't have "memory barriers." <off-topic> But you may
(or may not) need them anyhow. If you do, they'll be entirely
platform-specific and not a topic for this forum. </off-topic>

jacob navia · Dec 2, 2006

(e-mail address removed) a écrit :

In device driver code, we need to write specific values to
the device registers in a specific sequence. If that sequence
is changed, the device may not work as desired. In such cases,
how can we prevent reordering?

In a multiprocessor environment, where multiple threads may
execute in parallel on different processors, the global variables
that multiple threads access must be modified in a particular
sequence by a given thread. How can this be achieved?

It would be highly surprising if your compiler didn't provide a
switch to turn instruction reordering OFF. Look again in your compiler's
documentation. Also, try turning optimizations OFF and see if this changes anything.

As a last resort, use the volatile keyword, as recommended by the other
posters.

junky_fellow · Dec 2, 2006

jacob said:
It would be highly surprising if your compiler didn't provide a
switch to turn instruction reordering OFF. Look again in your compiler's
documentation. Also, try turning optimizations OFF and see if this changes anything.

As a last resort, use the volatile keyword, as recommended by the other
posters.

I am using cygwin/gcc. I don't want to suppress all the other optimizations
done by the compiler; I just want to avoid any reordering.
Can you please suggest which optimization option I should pass to gcc?

Guest · Dec 2, 2006

I am using cygwin/gcc. I don't want to suppress all the other optimizations
done by the compiler; I just want to avoid any reordering.
Can you please suggest which optimization option I should pass to gcc?

Don't do that. I'm not sure why jacob navia discouraged the use of
volatile (perhaps he can explain), but from gcc's point of view, if you
don't use volatile, the code is broken, and even if you find the
command-line options that make it do what you want currently (I'm not
sure if there are any), there will be no effort to make sure that those
same command-line options continue to work the same way in future gcc
versions.

CBFalconer · Dec 2, 2006

(e-mail address removed) said:
I am using cygwin/gcc. I don't want to suppress all the other
optimizations done by the compiler; I just want to avoid any
reordering. Can you please suggest which optimization option
I should pass to gcc?

Isolate the accesses to one C file, and compile that file with
optimisation inhibited.

Please do NOT remove attribution lines for material you quote.
Attribution lines are those initial lines that say "Joe wrote:".

--
Some informative links:
<http://www.geocities.com/nnqweb/>
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/>

Kenny McCormack · Dec 2, 2006

(e-mail address removed) a écrit :

Look junky_fellow, why is this a problem for you?
Why do you want to order the instructions your way?

You suspect a bug in the compiler?

Please explain.

He's only explained it about 3 times now.

junky_fellow · Dec 2, 2006

Harald said:
from gcc's point of view, if you
don't use volatile, the code is broken, and even if you find the
command-line options that make it do what you want currently (I'm not
sure if there are any), there will be no effort to make sure that those
same command-line options continue to work the same way in future gcc
versions.

Sorry guys, I may be getting off topic, but any hints, help, or
pointers to a link would do.
I was looking at some code that is written for an SMP system.
My problem is again related to the reordering of instructions.

void some_function(struct some_struct *strptr)
{
    take_spin_lock(&global_variable);     /* line 1 */

    strptr->some_member = some_value;     /* line 2 */
    some_other_global_var++;              /* line 3 */

    release_spin_lock(&global_variable);  /* line 4 */
}

In the code, none of "global_variable", "strptr", or
"some_other_global_var" is volatile.
Also, any member of "strptr" and "some_other_global_var"
should only be changed while holding the spin lock.

Now, my question is: is it possible that the compiler
may reorder these instructions so that line 2 and line 3 are
executed before line 1? Note: from the compiler's point of view the
operations are independent.
In that case, taking the spin lock is of no use and the code
may not work as desired.
In all the other functions I find similar issues, but
nowhere is any variable declared volatile. Also,
I believe optimization is not switched off.

I also believe that there is no bug in this code, but I
don't know how it works.
Can anybody please tell me what I am missing?

Guest · Dec 2, 2006

Sorry guys, I may be getting off topic, but any hints, help, or
pointers to a link would do.
I was looking at some code that is written for an SMP system.
My problem is again related to the reordering of instructions.

void some_function(struct some_struct *strptr)
{
    take_spin_lock(&global_variable);     /* line 1 */

    strptr->some_member = some_value;     /* line 2 */
    some_other_global_var++;              /* line 3 */

    release_spin_lock(&global_variable);  /* line 4 */
}

In the code, none of "global_variable", "strptr", or
"some_other_global_var" is volatile.
Also, any member of "strptr" and "some_other_global_var"
should only be changed while holding the spin lock.

Now, my question is: is it possible that the compiler
may reorder these instructions so that line 2 and line 3 are
executed before line 1? Note: from the compiler's point of view the
operations are independent.

In the general case, if they are ordinary functions for which no
definition is available, the compiler must assume that they may read
and/or modify strptr's members and other global variables, so
reordering is not possible. If a definition is available, it depends on
that definition. For example, it's possible that
take_spin_lock(&global_variable); converts &global_variable from
pointer-to-unqualified-type to pointer-to-volatile-qualified-type, in
which case it has (almost) the same effect as declaring global_variable
volatile directly. For another example, take_spin_lock may make use of
implementation-specific extensions (which are OT here) which
effectively cause the same behaviour.
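
As an illustration of that last case, here is a rough sketch of how take_spin_lock and release_spin_lock could be built on GCC's __sync builtins. This is an assumption made for illustration, not the code being asked about; the spinlock_t typedef and the bump_counter function are made up. GCC documents __sync_lock_test_and_set as an acquire barrier and __sync_lock_release as a release barrier, which is what would keep lines 2 and 3 between lines 1 and 4.

```c
#include <assert.h>

typedef int spinlock_t;              /* 0 = free, 1 = held (hypothetical) */

static spinlock_t global_variable;   /* the lock from the example */
static int some_other_global_var;    /* data protected by the lock */

void take_spin_lock(spinlock_t *lock)
{
    /* Atomically store 1 and fetch the old value; keep spinning while
       the old value shows the lock was already held. Acquire barrier. */
    while (__sync_lock_test_and_set(lock, 1))
        ;
}

void release_spin_lock(spinlock_t *lock)
{
    /* Atomically store 0. Release barrier. */
    __sync_lock_release(lock);
}

/* Update the shared counter under the lock and return its new value. */
int bump_counter(void)
{
    take_spin_lock(&global_variable);
    assert(global_variable == 1);           /* we hold the lock here */
    int result = ++some_other_global_var;   /* protected update */
    release_spin_lock(&global_variable);
    return result;
}
```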

jacob navia · Dec 2, 2006

Kenny McCormack a écrit :

He's only explained it about 3 times now.

Yes, thanks, I can read...
When I sent the message he hadn't.

Dan Henry · Dec 2, 2006

On 2 Dec 2006 04:07:26 -0800, "(e-mail address removed)"

In device driver code, we need to write specific values to
the device registers in a specific sequence. If that sequence
is changed, the device may not work as desired. In such cases,
how can we prevent reordering?

CPUs having execution units that perform reordering or having
bus/memory controllers that perform write reordering also have
instructions for controlling and synchronizing such operations. You
may have to write your own C-callable functions or macros to access
these instructions.
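
With gcc, which the original poster said he is using, such macros might look like the sketch below. Both constructs are GCC extensions rather than standard C. The names compiler_barrier and update_in_order are made up, and mb is borrowed from the first post in this thread; whether a full barrier is enough for a particular device is a separate, platform-specific question.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier (GCC extension): emits no instruction, but the
   "memory" clobber stops the compiler from moving memory accesses
   across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Full hardware barrier (GCC builtin): asks the compiler to also emit
   the target's fence instruction, so the CPU orders its loads and
   stores around this point. */
#define mb() __sync_synchronize()

/* Store *ptr1 before *ptr2, as in the original question. */
void update_in_order(int *ptr1, int *ptr2, int some_val, int some_other_val)
{
    assert(ptr1 != NULL && ptr2 != NULL);
    *ptr1 = some_val;
    mb();                   /* neither compiler nor CPU may reorder across this */
    *ptr2 = some_other_val;
}
```

The compiler-only form is the cheaper one and is sufficient when only the compiler's reordering is the concern.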

In a multiprocessor environment, where multiple threads may
execute in parallel on different processors, the global variables
that multiple threads access must be modified in a particular
sequence by a given thread. How can this be achieved?

Same as above.

Jean-Marc Bourguet · Dec 2, 2006

There is something wrong in this sentence: it implies that there is an
ordering of instructions deducible from the C source. There is no such
thing. The nearest existing thing is a partial ordering on memory accesses.

In device driver code, we need to write specific values to
the device registers in a specific sequence. If that sequence
is changed, the device may not work as desired. In such cases,
how can we prevent reordering?

The first thing is to indicate to the compiler which objects you care
about. All accesses to those should be qualified as volatile (use a volatile
variable or a pointer to volatile) so that they are considered observable
behavior and so cannot be reordered or removed as easily.

The second thing is to understand the rules of the partial ordering. A
short version: use one access whose ordering you care about per statement.
For a long version, search for sequence points.

Even if the requirements of the standard are fuzzy (at the least, it leaves the
definition of what an access is implementation-defined), these two steps
should ensure that at the machine language level all accesses are present (i.e.
that the optimiser will not delete an access it considers redundant) and
in the correct order.
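
To make the "one access per statement" rule concrete, here is a small sketch. The function names and the status/data register pair are hypothetical. In the first function both volatile reads are operands of a single '+', so C leaves their relative order unspecified; in the second, each read sits in its own full expression and the sequence points between them fix the order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unspecified order: the two volatile reads are operands of one '+',
   so the compiler may perform either read first. */
uint32_t read_sum_unordered(volatile uint32_t *status, volatile uint32_t *data)
{
    return *status + *data;
}

/* One volatile access per statement: the sequence point at the end of
   each initializer orders the status read before the data read. */
uint32_t read_sum_ordered(volatile uint32_t *status, volatile uint32_t *data)
{
    assert(status != NULL && data != NULL);
    uint32_t s = *status;   /* first access */
    uint32_t d = *data;     /* second access */
    return s + d;
}
```

Both functions return the same value; they differ only in whether the order of the two reads is pinned down.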

The final step is ensuring that the processor will not play tricks with
your accesses. Using memory barriers will remove some of the tricks but
not others. Notifying the processor that the memory region is used for I/O
and should not be cached should be enough. If you are writing a device
driver, your OS should provide an API for that.

In a multiprocessor environment, where multiple threads may execute
in parallel on different processors, the global variables that
multiple threads access must be modified in a particular sequence
by a given thread. How can this be achieved?

Volatile is of no use in this context: for volatile to be effective, you
need non-cached memory, and OSes usually don't provide an API allowing
unprivileged applications to have some. But in this context you have weaker
ordering constraints, and while C does not provide anything that helps, OS
synchronization primitives and memory barriers provide what is needed.
Ask on comp.programming.threads.

Yours,

Eric Sosman · Dec 2, 2006

Sorry guys, I may be getting off topic, but any hints, help, or
pointers to a link would do.
I was looking at some code that is written for an SMP system.
My problem is again related to the reordering of instructions.

void some_function(struct some_struct *strptr)
{
    take_spin_lock(&global_variable);     /* line 1 */

    strptr->some_member = some_value;     /* line 2 */
    some_other_global_var++;              /* line 3 */

    release_spin_lock(&global_variable);  /* line 4 */
}

In the code, none of "global_variable", "strptr", or
"some_other_global_var" is volatile.
Also, any member of "strptr" and "some_other_global_var"
should only be changed while holding the spin lock.

Now, my question is: is it possible that the compiler
may reorder these instructions so that line 2 and line 3 are
executed before line 1? Note: from the compiler's point of view the
operations are independent.
In that case, taking the spin lock is of no use and the code
may not work as desired.

From C's point of view, the undesired reorderings are
permitted if and only if nothing in take_spin_lock() and/or
release_spin_lock() affects or is affected by or even "sees"
strptr->some_member and some_other_global_var.

From the compiler's point of view, it is very difficult
to know what tsl() and rsl() might do, and hence to "prove"
that they operate independently of s->sm and sogv. This
usually means that the compiler will not reorder operations
across function calls if they involve potentially non-local
variables. ("Potentially non-local" isn't a phrase you'll
find in the Standard, but I hope it makes sense anyhow.) The
conservative approach is pretty much obligatory when calling
an external function whose innards are not "visible."

From the point of view of other standards that apply to
parallel programs, certain functions carry special guarantees
and it is the compiler's duty not to violate them. For example,
the POSIX threading standard requires pthread_mutex_lock() to
behave as if everything initiated before the call completes before
the function returns, and as if nothing initiated after the
return begins before the call. (I'm speaking very loosely here;
go to comp.programming.threads if you want more detail.) POSIX
doesn't say how the implementation achieves this effect, just
that an implementation that fails to achieve it is broken.

It may be that take_spin_lock() and release_spin_lock() are
covered by similar guarantees on the system where they're being
used, or it may simply be that whoever invented them relied on
the compiler being "conservative" in the sense mentioned above.

In all the other functions I find similar issues, but
nowhere is any variable declared volatile. Also,
I believe optimization is not switched off.

On c.p.t. it's pretty much a FAQ that `volatile' is neither
necessary nor sufficient for thread synchronization. (It is
still necessary for some other purposes, though. For example,
you need it for the situation in your original question, which
does not seem related to the question you're asking now.)

I also believe that there is no bug in this code, but I
don't know how it works.
Can anybody please tell me what I am missing?

The code relies on guarantees that are not part of the C
language, that's all. It may come as a shock to some, but the
C Standard is not the only Standard in the universe, and the
universe holds many useful things not found in Standards.

Stephen Sprunk · Dec 2, 2006

In device driver code, we need to write specific values to
the device registers in a specific sequence. If that sequence
is changed, the device may not work as desired. In such cases,
how can we prevent reordering?

Declare the variables to be volatile and the compiler won't be allowed
to play games with the ordering.

The CPU is still allowed to reorder things, but the hardware folks spend
a lot of effort making sure that the result acts like they didn't.
Don't worry about CPU reordering.

In a multiprocessor environment, where multiple threads may
execute in parallel on different processors, the global variables
that multiple threads access must be modified in a particular
sequence by a given thread. How can this be achieved?

The C Standard doesn't know about threads. However, some other
standard, e.g. POSIX, may apply to your system and have functions that
provide such guarantees. For instance, any code that comes after
pthread_mutex_lock() will be guaranteed not to be executed before that
function call. In general, that guarantee is true for any function call
unless the compiler can prove that the reordering is safe (and most
compilers don't even try to prove it, since they rarely have the
information they need to do so).
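
A minimal sketch of the POSIX approach, assuming a system with pthreads. The names counter_lock, shared_counter, and increment_shared are made up for illustration. Everything between the lock and unlock calls stays between them, without any variable being declared volatile.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;          /* only touched while holding the lock */

/* Increment the shared counter under the mutex and return its new value. */
int increment_shared(void)
{
    int rc = pthread_mutex_lock(&counter_lock);
    assert(rc == 0);                        /* lock acquired */

    int result = ++shared_counter;          /* protected update */

    rc = pthread_mutex_unlock(&counter_lock);
    assert(rc == 0);                        /* lock released */
    return result;
}
```

On most systems this is built with gcc's -pthread option.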

You are right to worry about reordering and such, but there are much
simpler ways to achieve what you're trying to do. Use the tools that
the various standards provide for you, and examine the code of other
folks that have solved the same problems before. You're not the first
person to try to write multithreaded code or device drivers in C; study
what others have done before you.

S
