IF-free conception.

I

Ian Collins

BartC said:
That makes a change from the majority of open-source C code that appears to
be Unix-oriented, in its distribution format and tool-chains expected to be
used to build it, just for a start!

Sums up the platform cultures....
 
G

glen herrmannsfeldt

But *you're* the one making (rather vague) claims about increased
performance. It's up to you to support those claims. I, for one,
am not going to do that work for you.
I can believe that replacing conditional code with non-conditional
("if-free") code can sometimes result in performance improvements.
If so, modern optimizing compilers are sophisticated enough to
perform such tranformations themselves. Do they? (I don't know the
answer to that). And I think that speculative execution avoids at
least some of the performance problems of conditional code; have you
looked into whether that might make your if-less code unnecessary?

I believe that some now have conditional load, which allows for a
conditional without the execution path uncertainty of a branch.

IA32 has CMOVxx (where xx is the condition) which can replace some
branch instructions. It might not go all the way back to the 386,
and it might be that compilers generating 32 bit code still stay
with instructions from the 386 (at least by default).

Between branch prediction and speculative execution, much of the
cost of branches is reduced.

-- glen
 
G

glen herrmannsfeldt

(snip, I wrote)
CDC was the famous one. There is a story that when Seymour Cray
switched to 2's complement for the Cray-1 somebody asked him why he'd
used 1's before. He said he'd only just learned about 2's complement.

I did use some CDC machines in the 1980's, but I don't remember if
they had C compilers. About the time I learned C, I had enough other
computers to use.

Unisys produced ones complement machines more recently, but again
I don't know about C compilers.

-- glen
 
E

Evgeney Knyazhev

1st & foremost, my Friends, my dozen of Thanks for reply of Yours. at the 2nd moment, i'd like to put some things clear:

*my very goal to verify one idea or another.
*i'm too lazy to polish code (yes, i know it's extremely bad :D ).
*i'd like to move on Linux, but no time + too lazy again.
* branch prediction is very reason to slow processor down, so it's really good idea to avoid if-statements at some cases as much as possible. (heap sort has been one of such representative examples).
-----------------------
let's compare two type of codes in the heap sort.

IF-based:
------------------------------------
if(child+1<end){
if(a[child]<a[child+1])child++;
}
 
I

Ian Collins

Evgeney said:
1st & foremost, my Friends, my dozen of Thanks for reply of Yours. at the 2nd moment, i'd like to put some things clear:

*my very goal to verify one idea or another.
*i'm too lazy to polish code (yes, i know it's extremely bad :D ).
*i'd like to move on Linux, but no time + too lazy again.
* branch prediction is very reason to slow processor down, so it's really good idea to avoid if-statements at some cases as much as possible. (heap sort has been one of such representative examples).

If you are so sure, post two *working C* examples and your benchmark
results. Otherwise this looks like yet another attempt at a pointless
premature micro-optimisation.
 
B

Ben Bacarisse

Evgeney Knyazhev said:
1st & foremost, my Friends, my dozen of Thanks for reply of Yours. at
the 2nd moment, i'd like to put some things clear:

*my very goal to verify one idea or another.

But you don't do any verification! In fact when asked if you have any
evidence to support your claims you tell the asker to do it. When I did
that (me) you say that the example I was foolish enough to waste my time
on was just for fun.
*i'm too lazy to polish code (yes, i know it's extremely bad :D ).
*i'd like to move on Linux, but no time + too lazy again.
* branch prediction is very reason to slow processor down, so it's
really good idea to avoid if-statements at some cases as much as
possible.

Why, then, did it not work for the code of yours that I tried?
(heap sort has been one of such representative examples).
----------------------- let's compare two type of codes in the heap
sort.

IF-based:
------------------------------------
if(child+1<end){
if(a[child]<a[child+1])child++;
}

Simple answer: you might need it for the code to work: your re-write is
not the same as the original. It is possible (likely, I'd say) that the
child + 1 < end test is there to prevent accessing a[child + 1] when
that element is out-of-bounds.
 
J

James Kuyper

---------------------------------
Well, so far the only results you've gotten is corrections to your
mistakes about C, expressions of disgust, reported timings that
contradict the supposed speed advantages, and redirection to more
appropriate windows/C++ forums. I haven't seen any positive expressions
of interest in the idea.

True, but this doesn't even come close. The costs to the clarity and
maintainability of the code are enormous. You've ignored requests to
provide test results measuring the supposed benefits, and other people
have produced test results that seem to suggest that your approach
actually makes things worse. Even if there were actual benefits, this
would constitute micro-optimization of the worst sort.
 
E

Evgeney Knyazhev

----------------------------------
Simple answer: you might need it for the code to work: your re-write is
not the same as the original. It is possible (likely, I'd say) that the
child + 1 < end test is there to prevent accessing a[child + 1] when
that element is out-of-bounds.
 
I

Ian Collins

E

Evgeney Knyazhev

well, James, perhaps later i'll get gcc to re-write code to it, or shall doit on Java. about readability: high-performance code frequently ain't goodreadable. Automatic Optimization with such technique would be preferable, but AO is too closely related with ATP (automated theorem proving). to avoid IFs ain't easy-go Beast at many cases. :D
 
J

James Kuyper

----------------------------------
Simple answer: you might need it for the code to work: your re-write is
not the same as the original. It is possible (likely, I'd say) that the
child + 1 < end test is there to prevent accessing a[child + 1] when
that element is out-of-bounds.
----------------------------------
Ben, it's the code from original source:

-----------------------------
dv=(child+1<end);
dv=(a[child]<a[child+1])*dv;
child+=dv;
----------------------------

The original code never accesses a[child+1] when child+1>=end. There are
other possibilities, but one of the most common reasons for writing code
like that is that for larger values of 'child', the expression
a[child+1] has undefined behavior, because it refers to a non-existent
element of an array.

Your re-write always evaluates the expression a[child+1], regardless of
the value of 'child', before multiplying by 0. Thus, it fails to avoid
the undefined behavior that the original version successfully avoided.
 
J

James Kuyper

well, James, perhaps later i'll get gcc to re-write code to it, or
shall do it on Java. about readability: high-performance code
frequently ain't good readable. Automatic Optimization with such
technique would be preferable, but AO is too closely related with ATP
(automated theorem proving). to avoid IFs ain't easy-go Beast at many
cases. :D

So far. several people have reported timing results suggesting that the
compilers they were using took the simple, clear, if-based code, and
automatically optimized it to run as fast as, or even faster than, your
obscure and unreadable if-free code. Do you know of any compiler for
which this is NOT the case? Could you identify that compiler, the
platform you were using it on, and the results of your timing tests?
 
I

Ian Collins

Evgeney said:
Ian, what a kind of optimization ye're saying?

Optimising your quoting stile would be a good start!

I'm saying the (completely off topic C++) code you referred to ran just
a fast with or without the sort when compiled with gcc -O3.

Which just goes to show that micro-optimisations are best left to the
compiler.
 
G

glen herrmannsfeldt

(snip on avoiding branches)
True, but this doesn't even come close. The costs to the clarity and
maintainability of the code are enormous. You've ignored requests to
provide test results measuring the supposed benefits, and other people
have produced test results that seem to suggest that your approach
actually makes things worse. Even if there were actual benefits, this
would constitute micro-optimization of the worst sort.

You might look at:

http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

Written by Linus Torvalds, it seems to indicate that CMOV isn't
all that useful compared to branches.

Well, more specifically, on the P4 it isn't all that good, but
then mispredicted branches aren't either. On Core 2, both CMOV
and branches are better.

The discussion relates to out-of-order execution, which is done
by many of the more recent processors, and the effect of different
instructions on it.

Seems that for a truly unpredicable branch that CMOV might be
better. (A test on the result of a random number generator.)

Compilers might not be so good at knowing which ones are more
predicatable and which less.

-- glen
 
8

88888 Dihedral

Evgeney Knyazhevæ–¼ 2013å¹´2月16日星期六UTC+8上åˆ4時51分47秒寫é“:
Good Time to all, Amici(Friends).



Here, i'd like to discuss how to reduce conditional branches in the code.1st of the all, i would like to share some tricks. Their description is here (http://alg0z.blogspot.ru/2013/02/if-or-no-if.html).

I'll give an example that will work to reduce the if part.

a= a&1; // integer for reducing to the 0,1 system

Invoke functor[a] immediately.

a=a | 0xffffffff ;// 0, 0xffffffff system for 32 bits
a2 = a ^ 0xffffffff;
y=(a&z1)| ( a2&z2); // if a y=z1 else z2 ;;
// good for very slow branching HW only
 
B

Bart van Ingen Schenau

well, James, perhaps later i'll get gcc to re-write code to it, or shall
do it on Java. about readability: high-performance code frequently ain't
good readable. Automatic Optimization with such technique would be
preferable, but AO is too closely related with ATP (automated theorem
proving). to avoid IFs ain't easy-go Beast at many cases. :D

Hight performance code does not have to be unreadable.
In fact, with modern optimizing compilers, it is far from obvious to hand-
optimize code, because any optimization you make might inhibit the
compiler from performing further optimizations, in effect rendering the
result *less* optimized.

As a point in fact, here are some measurements of your if-less sort
algorithms:

Optimised version (-O3):
(C) 2013 Evgeney Knyazhev.

Start No-IF heapsort
No-IF heapsort: 100000 ticks (100.000000 msec)
Start qsort
qsort: 20000 ticks (20.000000 msec)
Start std::sort
std::sort: 20000 ticks (20.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 60000 ticks (60.000000 msec)
Start No-IF two-way tides
No-IF two-way tides: 122200000 ticks (122200.000000 msec)

Debug version (-O0):
(C) 2013 Evgeney Knyazhev.

Start No-IF heapsort
No-IF heapsort: 140000 ticks (140.000000 msec)
Start qsort
qsort: 20000 ticks (20.000000 msec)
Start std::sort
std::sort: 30000 ticks (30.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 100000 ticks (100.000000 msec)
Start No-IF two-way tides
No-IF two-way tides: 221790000 ticks (221790.000000 msec)

As you can see, the No-IF algorithms are the two slowest ones (the two-
way tides might even be outperformed by a bogo-sort).

The code that generated the results (removed most of the C++ specific
stuff to keep it on topic here):

// IF-free sorting.cpp : Defines the entry point for the console
application.
// (C) 2013 Evgeney Knyazhev
// app name: IF-free Alg0Z
// ver: I.0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <algorithm>

void modExp(uint64_t mod, uint64_t *seed, uint64_t exp)
{
int root = 1;
uint64_t lseed = *seed;

for(int i = 0; exp >> (i + 1) > 0; i++)
{
if(((exp >> i) & 1) == 1)
{
root *= lseed;
root %= mod;
}
lseed = (lseed * lseed) % mod;
}
*seed = root;
return;
}

void pseudoRND(int64_t *arr, int num0fItems, uint64_t mod, uint64_t seed,
uint64_t exp)
{
int size0fT1 = sizeof(uint64_t) * 8 - 1;
uint64_t sign = 0;

for(int i = 0; i < num0fItems; i++)
{
modExp(mod, &seed, exp);
sign = (seed >> size0fT1);
arr = -1 * seed * sign + seed * ((sign + 1) & 1);
}
}

#define chosenT uint64_t

/*********************** HeapSort ********************************/
inline void swp(int64_t *p0, int64_t *p1)
{
int64_t empty;
empty = *p0;
*p0 = *p1;
*p1 = empty;
return;
}

inline int64_t le(int64_t *p0, int64_t *p1)
{
int64_t i1, i2, empty, *pEmpty;
pEmpty = &empty;
i2 = (*p0 < *p1);
i1 = (i2 + 1) & 1;
p1 = (int64_t*)(i2 * (chosenT)p1 + i1 * (chosenT)pEmpty);
p0 = (int64_t*)(i2 * (chosenT)p0 + i1 * (chosenT)pEmpty);
empty = *p0;
*p0 = *p1;
*p1 = empty;
return i2;
}

inline int siftDown(int64_t *a, int root, int end)
{
int child;
int count = 0;
int64_t dv;
child = root << 1;
child++;
while(child < end)
{
dv = (child + 1 < end);
dv = (a[child] < a[child+1]) * dv;
child += dv;
dv = le(&a[root], &a[child]);
root = child;
end *= dv;
child = root << 1;
child++;
count++;
}
return count;
}

inline int Heapify(int64_t *a, int end)
{
int root = (end-1) >> 1;
int cnt = 0;
while(root > -1)
{
cnt += siftDown(a, root--, end);
}
return cnt;
}

inline int heapSort(int64_t *arr, int end)
{
int count = Heapify(arr, end);
while(end > 0)
{
swp(&arr[0], &arr[end - 1]);
count += siftDown(arr, 0, end-1);
end--;
}
return count;
}
/*********************** HeapSort ********************************/

int comp(const void* a, const void* b)
{
const int64_t* lhs = (const int64_t*)a;
const int64_t* rhs = (const int64_t*)b;
return lhs - rhs;
}

/****************** main ******************/

int main(int argc, char* argv[])
{
printf("(C) 2013 Evgeney Knyazhev.\n\n");

int size0fInt = sizeof(int64_t) * 8 - 1;
int N0fItems = 100000;
int bottom = 0;
int DownThe;
int UpThe = 1;
int i1;
int i2;
int64_t deltaV;
int64_t *array2sort;
int64_t *unsortedArr;
int64_t *p0;
int64_t *p1;
int64_t *pEmpty;
int64_t empty;
char StopThe = 0;
char GV; /*greater value*/
char unit = 0;
pEmpty = &empty;
chosenT seed = 613;
chosenT exp = 259;
chosenT mod = 91477361;
chosenT countIteration;

array2sort = (int64_t*)malloc(N0fItems * size0fInt);
unsortedArr = (int64_t*)malloc(N0fItems * size0fInt);

if(array2sort == NULL || unsortedArr == NULL)
{
printf("Not enough memory for array\n");
return 0;
}
pseudoRND(unsortedArr, N0fItems, mod, seed, exp);
for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

DownThe = N0fItems - 1;
countIteration = 0;
bottom = 0;
StopThe = 0;

printf("Start No-IF heapsort\n");
clock_t start = clock();
countIteration = heapSort(array2sort,N0fItems);
clock_t stop = clock();
printf("No-IF heapsort: %ld ticks (%f msec)\n", stop-start, (1000.0*
(stop-start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start qsort\n");
start = clock();
qsort(array2sort, N0fItems, sizeof(array2sort[0]), comp);
stop = clock();
printf("qsort: %ld ticks (%f msec)\n", stop-start, (1000.0*(stop-start)/
CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start std::sort\n");
start = clock();
std::sort(array2sort, array2sort+N0fItems);
stop = clock();
printf("std::sort: %ld ticks (%f msec)\n", stop-start, (1000.0*(stop-
start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start std::make_heap + std::sort_heap\n");
start = clock();
std::make_heap(array2sort, array2sort+N0fItems);
std::sort_heap(array2sort, array2sort+N0fItems);
stop = clock();
printf("std::make_heap + std::sort_heap: %ld ticks (%f msec)\n", stop-
start, (1000.0*(stop-start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start No-IF two-way tides\n");
start = clock();
/***********************************
IF-free Sorting two-way tides
************************************/
while(StopThe != 1)
{
p1 = &array2sort[DownThe];
p0 = &array2sort[DownThe - 1];
deltaV = *p1 - *p0;
GV = ((uint64_t)deltaV) >> size0fInt;
if(*p0 > *p1)
i2 = 1;
i1 = (GV + 1) & 1;
i2 = GV & 1;
p1 = (int64_t*)(i2 * (chosenT)p1 + i1 * (chosenT)pEmpty);
p0 = (int64_t*)(i2 * (chosenT)p0 + i1 * (chosenT)pEmpty);
empty = *p1;
*p1 = *p0;
*p0 = empty;
unit = i2 + i1 * unit;
p1 = &array2sort[UpThe];
p0 = &array2sort[UpThe - 1];
deltaV = *p1 - *p0;
GV = ((uint64_t)deltaV) >> size0fInt;
i1 = (GV + 1) & 1;
i2 = GV & 1;
p1 = (int64_t*)(i2 * (chosenT)p1 + i1 * (chosenT)pEmpty);
p0 = (int64_t*)(i2 * (chosenT)p0 + i1 * (chosenT)pEmpty);
empty = *p1;
*p1 = *p0;
*p0 = empty;
unit = i2 + unit * i1;
UpThe++;
deltaV = UpThe - N0fItems;
GV = ((uint64_t)deltaV) >> size0fInt;
i1 = (GV + 1) & 1;
i2 = GV & 1;
bottom += i1;
UpThe = i2 * UpThe + i1 * bottom;
DownThe--;
/*deltaV=bottom-DownThe;
GV=((unsigned int)deltaV)>>size0fInt;
i1=(GV+1)&1;
i2=GV&1;*/
N0fItems -= i1;
DownThe = i2 * DownThe + i1 * (N0fItems - 1);
StopThe = i1 & (unit + 1);
unit &= i2;
countIteration++;
}
/**************************************
IF-free Sorting two-way tides
***************************************/
stop = clock();
printf("No-IF two-way tides: %ld ticks (%f msec)\n", stop-start,
(1000.0*(stop-start)/CLOCKS_PER_SEC));

return 0;
}

Bart v Ingen Schenau
 
N

Noob

glen said:
IA32 has CMOVxx (where xx is the condition) which can replace some
branch instructions. It might not go all the way back to the 386,

Indeed not.

Intel introduced CMOVcc with the Pentium Pro (so -march=i686 in gcc).

The availability of CMOVcc is signaled by bit 15 of EDX after
executing CPUID with EAX=1.

https://en.wikipedia.org/wiki/X86_instruction_listings#Added_with_Pentium_Pro
https://en.wikipedia.org/wiki/Processor_supplementary_capability#IA-32
http://www.sandpile.org/x86/opc_2.htm
http://www.sandpile.org/x86/cpuid.htm

Regards.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,077
Messages
2,570,569
Members
47,205
Latest member
KelleM857

Latest Threads

Top