Short-circuiting in C

S

Shao Miller

Hey CLC,

I was wondering if there is a way to avoid the C short circuiting for
boolean expressions such as the following:

if(a && b && c) {
/* Do stuff */
}

I am currently doing GPU programming and it would be better to evaluate
the entire condition, even if a is 0, since branches are slow.

#include <stdio.h>

#define f(x) ((x) ? 1 : 0)

int main(void) {
int a, b, c = b = a = 0;
if (1 << 2 >> f(a) >> f(b) >> f(c))
puts("Boo.");
else
puts("Yay.");
return 0;
}
 
S

Shao Miller

Sorry, that was a bad example as GCC was clever and actually produced a
branch. So a better example might be:

#include <stdio.h>

#define f(x) ((x) ? 1 : 0)

int main(void) {
volatile int x = 1, a, b, c = b = a = 2;
if (x << 2 >> f(a) >> f(b) >> f(c))
puts("Boo.");
else
puts("Yay.");
return 0;
}
 
S

Shao Miller

Though the strategy'd grow pretty quickly, maybe even:

#include <stdio.h>

const char * func(int a, int b, int c) {
static const char * const ret[] = {"Boo.", "Yay."};
static const int table[2][2][2] = {{{0,0},{0,0}},{{0,0},{0,1}}};
return table[!!a][!!b][!!c][ret];
}

int main(void) {
puts(func(0, 0, 0));
puts(func(0, 0, 3));
puts(func(0, 5, 0));
puts(func(0, 7, 11));
puts(func(13, 0, 0));
puts(func(17, 0, 1));
puts(func(19, 23, 0));
puts(func(29, 31, 37));
return 0;
}
 
S

Stefan Ram

Eric Sosman said:
You've chosen the wrong implementation language.

Or the wrong newsgroup.

In C, however, sometimes, we can replace branching code with
non-branching code, e.g., under certain circumstances:

a += b ? c : d;

by

a += b * c +( 1 - b )*d;
 
S

Seebs

C has no way to express the computation you desire. C's abstract
machine has only one flow of control, if we ignore the rather mysterious
effects of asynchronous signals and `volatile' variable. There is no
C construct that will evaluate a,b,c in parallel execution paths and
then somehow combine the three results into a single test. You've
chosen the wrong implementation language.

While this is true, I've seen some cleverness. One of the geo random
number generators replaced:
if (x) {
y += z;
}
with
y += (z * x);
(where x was necessarily 0 or 1) and got a VERY noticeable performance
improvement.

That is generally the path to take for stuff like this, if your compiler
isn't clever enough to do what you mean. (In particular, a compiler which
produces multiple branches in the absence of side effects is not thinking
as clearly as it should, on a system where branches are expensive.)

-s
 
K

Keith Thompson

Billy Mays said:
I believe this will work, thanks!

You do? The point was to avoid differing branches depending on
differing values of a, b, and c. In the if() form, the branches are,
in some sense, even more explicit than in the && form.

The problem is that the C language defines *behavior*, and you're trying
to control the manner in which that behavior is created, i.e., the
sequence of machine-level operations. There's no 100% reliable way to
do that.

Of course you're going to have *some* inconsistent branching; depending
on the values of a, b, and c, each thread will or will not execute the
body of the if statement. So you're trying to minimize the possible
differences in branching.

I think your best bet is probably to try a few possible solutions and
examine the generated assembly or machine language for each one. If the
compiler knows that branching is expensive, it might optimize
if (a && b && c)
into straight-line code. That's assuming b and c have no side effects,
which you can guarantee by storing their values in temporary variables.
 
N

Nobody

I'm using OpenCL's library which then invokes the GPU. The programming
is done in a subset of C.

OpenCL may be closer to C than e.g. GLSL, but it still isn't C. And even
if it was, C doesn't define the execution time of any given construct, nor
treat it as significant (i.e. a compiler make whatever optimisations it
likes so long as the code produces the correct result, regardless of the
effect upon execution time).

AFAICT, the only reliable way to prevent short-circuting is to either
ensure that the later expressions have side-effects such that
short-circuiting would change the observable behaviour, or to "hide" their
implementation in a different translation unit so that the compiler cannot
determine whether short-circuiting will change the observable behaviour.

Either approach is likely to impose an unnecessary performance penalty on
either or both implementations (CPU and GPU). I don't know whether the
latter approach is even possible for OpenCL.

Actually, with modern CPUs performing speculative execution, I'm not sure
that the "hiding" approach will work. If the CPU hasn't performed the
calculation by the time it discovers that the result isn't needed, it
will just cancel it.
 
S

Shao Miller

You do? The point was to avoid differing branches depending on
differing values of a, b, and c. In the if() form, the branches are,
in some sense, even more explicit than in the&& form.

Agreed. I don't understand exactly what the original poster wants to
achieve, but my impression is that they want "lock-steppedness." :)
The problem is that the C language defines *behavior*, and you're trying
to control the manner in which that behavior is created, i.e., the
sequence of machine-level operations. There's no 100% reliable way to
do that.

Agreed. But one can make some educated guesses about the timing of
operations. In the table-using code I posted else-thread, if the 'func'
function is called with different arguments simultaneously, those
invocations should return the result simultaneously, barring external
interruptions. Of course, such a table strategy becomes unwieldy as
more parameters are required; heh.
Of course you're going to have *some* inconsistent branching; depending
on the values of a, b, and c, each thread will or will not execute the
body of the if statement. So you're trying to minimize the possible
differences in branching.

I think your best bet is probably to try a few possible solutions and
examine the generated assembly or machine language for each one.

Agreed. The 'gcc -save-temps' is handy for examining resulting '.s'
files. This shows that the table-using code doesn't have any branches
(excluding the function calls). Of course, 'gcc' mightn't be applicable
to the original poster's OpenCL needs.

Loops change timing, of course.
 
K

Keith Thompson

Scott Fluhrer said:
Well, it has a different meaning:
a && b && c is true if the three variables are all non-zero
!(a | b | c) is true if all three variables are zero

!(!a | !b | !c)

But this:

!!a & !!b & !!c

is probably clearer (though still, IMHO, not clear enough).

[...]
 
N

Noob

Eric said:
C has no way to express the computation you desire. C's abstract
machine has only one flow of control, if we ignore the rather mysterious
effects of asynchronous signals and `volatile' variable. There is no
C construct that will evaluate a,b,c in parallel execution paths and
then somehow combine the three results into a single test. You've
chosen the wrong implementation language.

How about changing

if (a && b && c)

into

if ((a != 0) & (b != 0) & (c != 0))

This will always evaluate a, b, and c (in an unspecified order).
 
I

ImpalerCore

Hey CLC,

I was wondering if there is a way to avoid the C short circuiting for
boolean expressions such as the following:

if(a && b && c) {
     /* Do stuff */

}

I am currently doing GPU programming and it would be better to evaluate
the entire condition, even if a is 0, since branches are slow.

Not sure if it would work, but what about evaluating the condition
outside of the 'if' statement? It has the overhead of allocating an
additional variable.

bool my_condition_is_true = a && b && c;
if (my_condition_is_true) {
/* Do stuff */
}

Best regards,
John D.
 
W

Willem

ImpalerCore wrote:
)> I was wondering if there is a way to avoid the C short circuiting for
)> boolean expressions such as the following:
)>
)> if(a && b && c) {
)> ? ? ?/* Do stuff */
)>
)> }
)>
)> I am currently doing GPU programming and it would be better to evaluate
)> the entire condition, even if a is 0, since branches are slow.
)
) Not sure if it would work, but what about evaluating the condition
) outside of the 'if' statement?
)
) bool my_condition_is_true = a && b && c;
) if (my_condition_is_true) {
) /* Do stuff */
) }

Nope, that would make the exact same branches.
I wouldn't be surprised if it generated exactly the same opcodes, even.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
I

ImpalerCore

ImpalerCore wrote:

)> I was wondering if there is a way to avoid the C short circuiting for
)> boolean expressions such as the following:
)>
)> if(a && b && c) {
)> ? ? ?/* Do stuff */
)>
)> }
)>
)> I am currently doing GPU programming and it would be better to evaluate
)> the entire condition, even if a is 0, since branches are slow.
)
) Not sure if it would work, but what about evaluating the condition
) outside of the 'if' statement?
)
) bool my_condition_is_true = a && b && c;
) if (my_condition_is_true) {
)   /* Do stuff */
) }

Nope, that would make the exact same branches.
I wouldn't be surprised if it generated exactly the same opcodes, even.

Oh, I get it now. The branches referred to are in the '&&' operator.
Thanks.
 
E

Eric Sosman

How about changing

if (a&& b&& c)

into

if ((a != 0)& (b != 0)& (c != 0))

This will always evaluate a, b, and c (in an unspecified order).

Unspecified, yes. Simultaneously, no. Even if a particular
implementation finds it can run each `x != 0' (or each `x' and
each `0', for that matter) in parallel, I repeat: C has no way to
express a requirement for parallelism.

If you think C has such expressive power, pray exhibit some
fragment of C that *requires* parallel execution (other than hand-
waving about `volatile' variables and asynchronous signals, already
mentioned up-thread and dismissed as not germane to the O.P.'s wish).
 
G

Geoff

Well, it has a different meaning:
a && b && c is true if the three variables are all non-zero
!(a | b | c) is true if all three variables are zero

This looks suspiciously like De Morgan's theorem but the correct
expression of it would have been:

!(!a || !b || !c)

Why is everyone changing a logical operator into a bitwise operator
for this functionality without consideration for the values of the
operands?

If we take the OP at face value and say, "how to short-circuit the
evaluation" of:

if(a && b && c) {
/* Do stuff */
}

It seems to me the correct answer is:

{
/* Do stuff */
}

Because if multiple threads or parallel processors are expected to
operate unconditionally on chunks of data with the same chunk of code
regardless of the values of three input variables then there is no
reason to initially evaluate the variables because they won't
influence the order of execution.
 
K

Keith Thompson

Geoff said:
This looks suspiciously like De Morgan's theorem but the correct
expression of it would have been:

!(!a || !b || !c)

Why is everyone changing a logical operator into a bitwise operator
for this functionality without consideration for the values of the
operands?

Not everyone is; most of us are specifically allowing for the
possibility of values other than 0 and 1.
If we take the OP at face value and say, "how to short-circuit the
evaluation" of:

if(a && b && c) {
/* Do stuff */
}

It seems to me the correct answer is:

{
/* Do stuff */
}

Because if multiple threads or parallel processors are expected to
operate unconditionally on chunks of data with the same chunk of code
regardless of the values of three input variables then there is no
reason to initially evaluate the variables because they won't
influence the order of execution.

I think the OP is trying to minimize branching, not eliminate it
altogether. An ideal solution to his problem would be an operator
that acts like && except that it doesn't short-circuit (if C had
such an operator). The branching implied by the "if" is unavoidable.

Hmm.

static inline int and(x, y) { return !!x & !!y; }

if (and(and(a, b), c)) {
/* Do stuff */
}

(untested)

On the other hand, the OP had said this code is running on a GPU,
and he's using a specialized compiler for a language very similar to
C, but not quite C. It could be that others have run into similar
issue, and that the tools he's already using provide better solutions
than anything we C folks can come up with.
 
B

blmblm

On 05/26/2011 02:22 PM, Thomas Richter wrote:
On 26.05.2011 20:09, Ben Pfaff wrote:

I was wondering if there is a way to avoid the C short circuiting for
boolean expressions such as the following:

if(a&& b&& c) {
/* Do stuff */
}
[...]
The processor is SIMD so each thread runs the same code, but they modify
their own data. As long as each thread executes the same instruction,
the processor can simultaneously work on each thread's data.

However, if one of the threads branches then the processor degrades
(SISD?) and loses a lot of the parallelism.

The goal here is to ensure each thread does exactly the same thing, even
if it is slightly wasteful.

C has no way to express the computation you desire. C's abstract
machine has only one flow of control, if we ignore the rather mysterious
effects of asynchronous signals and `volatile' variable. There is no
C construct that will evaluate a,b,c in parallel execution paths and
then somehow combine the three results into a single test. You've
chosen the wrong implementation language.

Hesitant though I am to argue with someone who generally seems
far more expert than I am ....

I don't get the impression at all that the OP wants a, b, and
c evaluated in parallel. As I understand it, he just wants to
be sure that there is no branching involved in the evaluation
of the condition (a && b && c). I admit I don't have a clear
understanding of the platform he's targeting, but it sounds like
it somehow involves multiple processing elements [*] executing the
same compiled code, and under those circumstances I can vaguely
imagine that performance might be better if all these processing
elements execute exactly the same instructions (i.e., no branches).

[*] The ones that put the "MD" in "SIMD" is what I'm thinking here,
though admittedly not with as much clarity as I might like.
 
E

Eric Sosman

[...]
C has no way to express the computation you desire. C's abstract
machine has only one flow of control, if we ignore the rather mysterious
effects of asynchronous signals and `volatile' variable. There is no
C construct that will evaluate a,b,c in parallel execution paths and
then somehow combine the three results into a single test. You've
chosen the wrong implementation language.

Hesitant though I am to argue with someone who generally seems
far more expert than I am ....

"My blushes, Watson." (And don't be too awed by my expertise;
the only reason I'm in the top ten of this group -- on good days --
is that the top twenty have stopped posting.)
I don't get the impression at all that the OP wants a, b, and
c evaluated in parallel. As I understand it, he just wants to
be sure that there is no branching involved in the evaluation
of the condition (a&& b&& c). I admit I don't have a clear
understanding of the platform he's targeting, but it sounds like
it somehow involves multiple processing elements [*] executing the
same compiled code, and under those circumstances I can vaguely
imagine that performance might be better if all these processing
elements execute exactly the same instructions (i.e., no branches).

[*] The ones that put the "MD" in "SIMD" is what I'm thinking here,
though admittedly not with as much clarity as I might like.

He didn't say anything about parallel execution at first, which
is why I gave a rather different response on the initial go-around.
But later he wrote that "On the GPU I am using, threads execute
simultaneously [provided...]," and I understood that as a request for
a C construct that expresses the "provided" and guarantees parallel
execution. C has conditional constructs that may (*may*) involve no
branches, given smooth seas and a following wind, but has nothing that
can guarantee or even talk about parallel execution, leaving aside
signals and volatiles.

'course, it's possible I've misunderstood the O.P. or blundered
in some other way, and will be summarily drummed out of the top ten.
Again. (Anybody recall the Brain Trust of the TV series "Scrubs?"
Them's my guys ...)
 
N

Nobody

I don't get the impression at all that the OP wants a, b, and
c evaluated in parallel. As I understand it, he just wants to
be sure that there is no branching involved in the evaluation
of the condition (a && b && c). I admit I don't have a clear
understanding of the platform he's targeting, but it sounds like
it somehow involves multiple processing elements [*] executing the
same compiled code, and under those circumstances I can vaguely
imagine that performance might be better if all these processing
elements execute exactly the same instructions (i.e., no branches).

I was (initially) under the impression that he wanted consistency between
CPU and GPU performance, and GPUs can't do short-circuit evaluation.

On a SIMD architecture (e.g. a GPU), you only have one program counter for
all processing units, so a branch where the test result can vary between
units cannot be implemented in the conventional manner (by modifying the
program counter). An if-else construct involves executing both branches on
all units; each unit ignores the instructions in one of the branches
depending upon whether the test is true or false on that particular unit.

OTOH, the OP may have just misunderstood what "branches are slow" means in
the context of GPU programming. They're slow in the sense that branching
over a set of instructions takes just as long as executing them. That
doesn't mean that tricks which eliminate branches will necessarily be any
faster.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,078
Messages
2,570,570
Members
47,204
Latest member
MalorieSte

Latest Threads

Top