Optimizing compilers reordering statements

Boon

Hello,

I'm involved in a discussion on comp.programming.threads regarding optimizing
compilers reordering statements.

Consider the following code.

struct foo { int a, b, c; };

int f1(int);
int f2(int);
int f3(int);

void set_foo(struct foo *bar)
{
bar->a = f1(bar->a);
bar->b = f2(bar->b);
bar->c = f3(bar->c);
}

As far as I understand, an optimizing compiler may determine that each statement
modifies a different field in the structure, and said compiler may schedule each
operation in any order.

It would be conceivable for a compiler to generate code that executes f1(bar->a)
on one co-processor, f2(bar->b) on another, and f3(bar->c) on yet another, then
stores the results sequentially on a "master" CPU.

temp1 = f1(bar->a); /* on copro1 */
temp2 = f2(bar->b); /* on copro2 */
temp3 = f3(bar->c); /* on copro3 */
/* the three above statements are executed in parallel */
bar->c = temp3;
bar->b = temp2;
bar->a = temp1;
/* the three above statements are executed sequentially */

AFAICT, this would be valid, right?

This comes from a discussion regarding semaphores. I don't understand how
optimizing C compilers deal with semaphores.

Consider

struct foo
{
semaphore_t sem;
int i;
};

void increment(struct foo *bar)
{
sem_lock(&bar->sem);
++bar->i;
sem_unlock(&bar->sem);
}

Someone writing this code expects their compiler to output code with no
reordering allowed.

"i" must be incremented after sem_lock() is called, and before sem_unlock() is
called. How is an optimizing compiler made aware that the increment statement
may not be reordered? Where is the magic?

Regards.
 
Chris Dollin

Boon said:
I'm involved in a discussion on comp.programming.threads regarding optimizing
compilers reordering statements.

Consider the following code.

struct foo { int a, b, c; };

int f1(int);
int f2(int);
int f3(int);

void set_foo(struct foo *bar)
{
bar->a = f1(bar->a);
bar->b = f2(bar->b);
bar->c = f3(bar->c);
}

As far as I understand, an optimizing compiler may determine that each
statement modifies a different field in the structure, and said compiler may
schedule each operation in any order.

Only if it /knows/ that `f1` (etc) don't modify the struct
`bar` points to. There's not enough information in the
snippet above for it to make that deduction, so it can't
indulge in statement dancing without additional support.
 
Boon

Chris said:
Only if it /knows/ that `f1` (etc) don't modify the struct
`bar` points to. There's not enough information in the
snippet above for it to make that deduction, so it can't
indulge in statement dancing without additional support.

If I understand correctly, function calls "prevent" optimizations when the
compiler has no information about the semantics of said function (in other
words, most of the time).

Therefore

some_code1 (potentially a lot)
local_x = foob();
some_code2 (potentially a lot)

leaves less room for optimizations than

local_temp = foob();
some_code1
local_x = local_temp;
some_code2

because the compiler is "more free" to move code around in the second form than
in the first form? (Because code cannot be moved across the function call.)

Regards.
 
Ben Bacarisse

Boon said:
I'm involved in a discussion on comp.programming.threads regarding
optimizing compilers reordering statements.

Consider the following code.

struct foo { int a, b, c; };

int f1(int);
int f2(int);
int f3(int);

void set_foo(struct foo *bar)
{
bar->a = f1(bar->a);
bar->b = f2(bar->b);
bar->c = f3(bar->c);
}

As far as I understand, an optimizing compiler may determine that each
statement modifies a different field in the structure, and said
compiler may schedule each operation in any order.

Not without detailed knowledge of the three functions. The code must
behave as if the three assignments occur in the order given, but there
could be some re-arrangement if the compiler can prove that no
conforming program can detect the re-arrangement. In this case, it
can't because you don't show the function definitions.
It would be conceivable for a compiler to generate code that executes
f1(bar->a) on one co-processor, f2(bar->b) on another, and f3(bar->c)
on yet another, then stores the results sequentially on a "master"
CPU.

temp1 = f1(bar->a); /* on copro1 */
temp2 = f2(bar->b); /* on copro2 */
temp3 = f3(bar->c); /* on copro3 */
/* the three above statements are executed in parallel */
bar->c = temp3;
bar->b = temp2;
bar->a = temp1;
/* the three above statements are executed sequentially */

AFAICT, this would be valid, right?

Not in this specific case, no.
This comes from a discussion regarding semaphores. I don't understand
how optimizing C compilers deal with semaphores.

Consider

struct foo
{
semaphore_t sem;
int i;
};

void increment(struct foo *bar)
{
sem_lock(&bar->sem);
++bar->i;
sem_unlock(&bar->sem);
}

Someone writing this code expects their compiler to output code with
no reordering allowed.

Not exactly. They expect that all the effects of sem_lock will be
done by the time the call finishes and this is guaranteed at least to
the extent permitted by the "as if" rule. But the "as if" rule is
about the single-threaded abstract machine, and thus the compiler is
free (in theory) to do all sorts of things that mess up multi-threaded
programs. For example, as far as the C standard is concerned, it may
be able to prove that sem_lock and sem_unlock have no visible effect on
the rest of the program and that, therefore, they can be removed
altogether.
"i" must be incremented after sem_lock() is called, and before
sem_unlock() is called. How is an optimizing compiler made aware that
the increment statement may not be reordered? Where is the magic?

The magic comes from other standards that define the meaning of
programs like yours. The compiler writer is aware of a whole lot more
than just the C standard. Of course, it helps that the compiler can
almost never prove anything about external functions like
sem_lock and sem_unlock, so the calls are simply compiled without any
re-ordering.
 
Francis Moreau

Ben Bacarisse said:
Not exactly. They expect that all the effects of sem_lock will be
done by the time the call finishes and this is guaranteed at least to
the extent permitted by the "as if" rule.

What's the "as if" rule?

Thanks
 
Kaz Kylheku

Consider

struct foo
{
semaphore_t sem;
int i;
};

void increment(struct foo *bar)
{
sem_lock(&bar->sem);
++bar->i;
sem_unlock(&bar->sem);
}

Someone writing this code expects their compiler to output code with no
reordering allowed.

"i" must be incremented after sem_lock() is called, and before sem_unlock()
is called. How is an optimizing compiler made aware that the increment
statement may not be reordered? Where is the magic?

Nowhere. The standard C language has no notions of threads, and gives no
reliable way to write semaphore routines.

Synchronization primitives work in a machine and compiler specific way.

Behind the sem_lock and sem_unlock routines, there may be use of special
compiler-specific extensions (like gcc's __asm__ __volatile__ expressions).

These primitives have to ensure not only that the compiler will not
reorder the operations, but also that the memory hardware will not. On some
multiprocessors, store operations can complete out of order, and even
prefetched reads can be reordered. Special "memory barrier" instructions have
to be inserted to prevent such reordering.
 
Ben Bacarisse

Francis Moreau said:
On 23 June, 17:19, Ben Bacarisse <[email protected]> wrote:

What's "as if" rule ?

C programs must behave "as if" they were run on the abstract machine
defined in the standard; but they don't have to behave in exactly this
way if no program could ever tell the difference without straying into
undefined behaviour.

In the context of the quote, the standard says all the side effects of
sem_lock() must have happened by the sequence point at the end of the
function call statement, but the real program only has to behave "as
if" this is the case. If a well-defined program can't tell the
difference, the implementation can do something else -- it can delay
some of the side effects or remove the call altogether or do pretty
much anything else that is undetectable by the rest of the program.
 
jameskuyper

Francis said:
What's "as if" rule ?

The "as-if" rule is a nick-name for 5.1.2.3p5:
The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that previous accesses
are complete and subsequent accesses have not yet occurred.
— At program termination, all data written into files shall be identical to the result that
execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place as specified in
7.19.3. The intent of these requirements is that unbuffered or line-buffered output
appear as soon as possible, to ensure that prompting messages actually appear prior to
a program waiting for input.

It's called the "as-if" rule, because it means that an implementation
of C doesn't need to translate your program into code that works the
same way as the abstract semantics, so long as the code produces exactly
the same observable behavior AS-IF the abstract semantics were
obeyed. The C standard doesn't actually use the term "observable
behavior" anywhere except 6.7.3p7 in the description of the restrict
qualifier, and it never defines the term. What I mean by that phrase
is "accesses to volatile objects, data written to files, and i/o with
interactive devices". For example, the following C code:

a = b;
c = d;

is supposed to result in the first statement being executed
completely, before the second statement is executed. However, if 'a',
'b', 'c' and 'd' are all names of non-volatile objects, 5.1.2.3p5
allows a compiler to reorder the statements as

c = d;
a = b;

because the reordering has no effect on the observable behavior of the
code.

There's a similar rule in C++ (1.9p1), which says "... conforming
implementations are required to emulate (only) the observable behavior
of the abstract machine as explained below". It then defines "The
observable behavior of the abstract machine is its sequence of reads
and writes to volatile data and calls to library I/O functions." This
is close to the definition I used above for "observable behavior" when
describing the C rule, except that the C++ concept includes reading
from files, and not merely writing to them, and it concentrates on the
calls to the I/O functions, rather than the I/O that occurs as a
result of those calls.

I think that using the concept of "observable behavior" makes it
easier to understand the as-if rule, and I hope that the C committee
considers modifying 5.1.2.3p5 in some future version of the standard
to use a similar approach. The C++ standard contains a footnote which
identifies section 1.9p1 as the "as-if" rule, which is another idea I
think the C standard should borrow.
 
Francis Moreau

Hello James,

The "as-if" rule is a nick-name for 5.1.2.3p5: [...]

Thanks for your deep and useful explanation.
 
Francis Moreau

C programs must behave "as if" they were run on the abstract machine
defined in the standard; [...]

Thanks for your explanation.
 
Chris M. Thomasson

Boon said:
Hello,

I'm involved in a discussion on comp.programming.threads regarding
optimizing compilers reordering statements. [...]

This comes from a discussion regarding semaphores. I don't understand how
optimizing C compilers deal with semaphores.

Consider

struct foo
{
semaphore_t sem;
int i;
};

void increment(struct foo *bar)
{
sem_lock(&bar->sem);
++bar->i;
sem_unlock(&bar->sem);
}

Someone writing this code expects their compiler to output code with no
reordering allowed.

"i" must be incremented after sem_lock() is called, and before
sem_unlock() is called. How is an optimizing compiler made aware that the
increment statement may not be reordered? Where is the magic?

The magic comes into play when you use a POSIX-compliant compiler. However,
compilers are known to have bugs from time to time:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/63f6360d939612b3
(read all!)


especially:

http://groups.google.com/group/comp.programming.threads/msg/729f412608a8570d
 
Chris M. Thomasson

Boon said:
If I understand correctly, function calls "prevent" optimizations when the
compiler has no information about the semantics of said function (in other
words, most of the time).
[...]

Beware of link time optimizations. Here is a very small excerpt of some
caveats wrt a certain library of mine that uses advanced synchronization
techniques:



" In order to create a proper implementation of my library you must define
exactly how it solves two very important and fundamental problems:


1: Compiler reordering
2: Hardware reordering


The compiler reordering issue can "usually" be resolved by adhering to a
stringent design policy. It declares that any function that contains a
"critical-sequence" of instructions that must be executed in a precise order
has to be externally assembled and its declaration must be properly
decorated. This requirement exists because the current C/C++ Standard
doesn't even think threads exist. A typical C compiler is usually forced to
treat any call into an "unknown and external" function in a fairly
"pessimistic" manor, which in turn basically renders its behavior to
something that is analogous to a so-called "compiler barrier". However,
please note that some compilers are exploring link-time optimizations which
can, and probably will, turn out to be an annoying and potentially hazardous
scenario to any function that simply expects every instruction it's made up
of will be executed exactly as-is. Unfortunately, this definitely includes
basically all externally assembled functions that a lock/wait-free library
may export by default. However, all is not lost, because it does seem like
the compilers that do support link-time optimizations also provide some
method for turning it on/off. Usually, they allow you to decorate your
"critical-function" declarations with something that guarantees that they
will never be subjected to this particular type of optimization. Now you
should know a reason or two why I trust my synchronization to be handled by
an Assembler.


Hardware reordering is solved by placing the proper memory barrier
instructions in the correct places throughout your externally assembled
lock/wait-free library. Basically, Assembly Language is the only real
solution to the reordering issues attributed to the hardware itself. An
Assembler is in a completely different universe than C/C++ when it comes
down to this kind of stuff. This is because you need to have full access to
the instruction-set that is provided by the architecture you're targeting
and you need to have a guarantee that no critical sequence of assembly
statements will ever be reordered. C/C++ doesn't provide a solution to this;
however you can assemble a library and base its ABI on the C calling
convention for the architecture you're targeting. You then create a C header
file and properly decorate all of your library's API functions. Now, you can
use it with a C/C++ compiler in a safe and effective manner simply because
no critical aspect of your library will be exposed to them.

Therefore, it is my thesis that a safe method for ensuring that calls into
"critical-functions" will not be tampered with must include a combination of
solutions that directly resolve all of the reordering issues that are
attributed to both the hardware you're targeting and the C compiler you're
using."
 
Chris M. Thomasson

Chris M. Thomasson said:
Boon said:
Hello,

I'm involved in a discussion on comp.programming.threads regarding
optimizing compilers reordering statements. [...]

This comes from a discussion regarding semaphores. I don't understand how
optimizing C compilers deal with semaphores.

Consider

struct foo
{
semaphore_t sem;
int i;
};

void increment(struct foo *bar)
{
sem_lock(&bar->sem);
++bar->i;
sem_unlock(&bar->sem);
}

Someone writing this code expects their compiler to output code with no
reordering allowed.

"i" must be incremented after sem_lock() is called, and before
sem_unlock() is called. How is an optimizing compiler made aware that the
increment statement may not be reordered? Where is the magic?

The magic comes into play when you use a POSIX compliant compiler.
[...]

Well, the magic also comes into play if you're using a platform (POSIX
aside) that has threads and a compiler for said platform that understands
that. The compiler needs to explicitly document that it understands your
platform and can handle code which uses its threading APIs.

For instance, the way that MSVC explicitly understands and documents that
Windows supports threads, and acts accordingly in response to threading API
calls (e.g., reduce optimizations, whatever)...
 
Boon

Chris said:
A typical C compiler is usually forced to treat any call into an "unknown and
external" function in a fairly "pessimistic" manor [...]

ITYM manner ;-)

A manor is a large house (which gives your sentence an interesting angle)
 
Boon

Boon said:
If I understand correctly, function calls "prevent" optimizations when
the compiler has no information about the semantics of said function (in
other words, most of the time).

Therefore

some_code1 (potentially a lot)
local_x = foob();
some_code2 (potentially a lot)

leaves less room for optimizations than

local_temp = foob();
some_code1
local_x = local_temp;
some_code2

because the compiler is "more free" to move code around in the second
form than in the first form? (Because code cannot be moved across the
function call.)

Is this correct? (Ignoring link-time optimization)
 
Chris M. Thomasson

Boon said:
Chris said:
A typical C compiler is usually forced to treat any call into an "unknown
and
external" function in a fairly "pessimistic" manor [...]

ITYM manner ;-)

A manor is a large house (which gives your sentence an interesting angle)

LOL!

fuc%ing typos!


;^o
 
