IF-free conception.

N

Noob

Evgeney said:
branch prediction is very reason to slow processor down, so it's
really good idea to avoid if-statements at some cases as much as
possible. (heap sort has been one of such representative examples).

I think you need to read a book or three on computer architecture.
In comp arch, there are no universal truths. Even if one considers
several processors implementing the SAME instruction set, different
micro-architectures have different performance characteristics.

So what is "true" for one micro-arch may no longer hold for the next!
Case in point: the characteristics of Intel's P4 were wildly different
from what came before and after "Netbust".

Please read this discussion:

http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

Regards.
 
E

Evgeney Knyazhev

----------------------------------
Start No-IF heapsort
No-IF heapsort: 100000 ticks (100.000000 msec)
Start qsort
qsort: 20000 ticks (20.000000 msec)
Start std::sort
std::sort: 20000 ticks (20.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 60000 ticks (60.000000 msec)
Start No-IF two-way tides
No-IF two-way tides: 122200000 ticks (122200.000000 msec)
----------------------------------
Bart, it was Great :) qsort can be faster than heap one (it has been known). i think No-IF heap sort is slower because this piece of code:
---------------------------
T i1, i2, empty, *pEmpty;
pEmpty=∅
i2=(*p0<*p1);
i1=(i2+1)&1;
p1=(T*)(i2*(adr)p1+i1*(adr)pEmpty);
p0=(T*)(i2*(adr)p0+i1*(adr)pEmpty);
empty=*p0;
*p0=*p1;
*p1=empty;
return i2;
---------------------------
superfluous casting definitely makes stuff slower.
we can use scheme w/o pointers:
----------------------
empty=a[root];
a[root]=i2*a[child]+i1*a[root];
a[child]=i2*empty+i1*a[child];
----------------------

+ yet another room to speed-up could be to avoid multiplication.

As Glen mentioned, modern processors allow the way with CMOV to gain comparable performance. But it ain't available for all processors & compilers.

P.S.

two-way tides is O(n^2) time sort, but if array have small number of wrong-placed items, it'd run faster than heap-sort.
 
E

Evgeney Knyazhev

----------------------
I think you need to read a book or three on computer architecture.
In comp arch, there are no universal truths. Even if one considers
several processors implementing the SAME instruction set, different
micro-architectures have different performance characteristics.

So what is "true" for one micro-arch may no longer hold for the next!
Case in point: the characteristics of Intel's P4 were wildly different
from what came before and after "Netbust".

Please read this discussion:

http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus
 
Ö

Öö Tiib

Optimising your quoting stile would be a good start!

I'm saying the (completely off topic C++) code you referred to
ran just a fast with or without the sort when compiled with
gcc -O3.

True. What Evgeney Knyazhev is doing here (removing branches)
is often done in massively parallel computing and there it
gives major benefits. That means code that should run on
thousands of cores (like in modern GPU). It does not indeed
matter for modern desktop CPU.
Which just goes to show that micro-optimisations are best left
to the compiler.

Yes. Or for code-generator that is specially for GPU. It is anyway
worthy idea to discuss ... even with C++ for Windows examples.
C is usually expected to be among the first languages optimized
for whatever particular purpose hardware. ;-)
 
E

Evgeney Knyazhev

------------------------
// app name: IF-free Alg0Z
// ver: I.0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <algorithm>

void modExp(uint64_t mod, uint64_t *seed, uint64_t exp)
{
int root = 1;
uint64_t lseed = *seed;

for(int i = 0; exp >> (i + 1) > 0; i++)
{
if(((exp >> i) & 1) == 1)
{
root *= lseed;
root %= mod;
}
lseed = (lseed * lseed) % mod;
}
*seed = root;
return;
}

void pseudoRND(int64_t *arr, int num0fItems, uint64_t mod, uint64_t seed,
uint64_t exp)
{
int size0fT1 = sizeof(uint64_t) * 8 - 1;
uint64_t sign = 0;

for(int i = 0; i < num0fItems; i++)
{
modExp(mod, &seed, exp);
sign = (seed >> size0fT1);
arr = -1 * seed * sign + seed * ((sign + 1) & 1);
}
}

#define chosenT uint64_t

/*********************** HeapSort ********************************/
inline void swp(int64_t *p0, int64_t *p1)
{
int64_t empty;
empty = *p0;
*p0 = *p1;
*p1 = empty;
return;
}

inline int64_t le(int64_t *p0, int64_t *p1)
{
int64_t i1, i2, empty;
i2 = (*p0 < *p1);
i1 = (i2 + 1) & 1;
empty = *p0;
*p0 = *p1*i2+*p0*i1;
*p1 = empty*i2+*p1*i1;
return i2;
}

inline int siftDown(int64_t *a, int root, int end)
{
int child;
int count = 0;
int64_t dv;
child = root << 1;
child++;
while(child < end)
{
dv = (child + 1 < end);
dv = (a[child] < a[child+1]) * dv;
child += dv;
dv = le(&a[root], &a[child]);
root = child;
end *= dv;
child = root << 1;
child++;
count++;
}
return count;
}

inline int Heapify(int64_t *a, int end)
{
int root = (end-1) >> 1;
int cnt = 0;
while(root > -1)
{
cnt += siftDown(a, root--, end);
}
return cnt;
}

inline int heapSort(int64_t *arr, int end)
{
int count = Heapify(arr, end);
while(end > 0)
{
swp(&arr[0], &arr[end - 1]);
count += siftDown(arr, 0, end-1);
end--;
}
return count;
}
/*********************** HeapSort ********************************/

int comp(const void* a, const void* b)
{
const int64_t* lhs = (const int64_t*)a;
const int64_t* rhs = (const int64_t*)b;
return lhs - rhs;
}

/****************** main ******************/

int main(int argc, char* argv[])
{
printf("(C) 2013 Evgeney Knyazhev.\n\n");

int size0fInt = sizeof(int64_t) * 8 - 1;
int N0fItems = 100000;
int bottom = 0;
int DownThe;
int UpThe = 1;
int i1;
int i2;
int64_t deltaV;
int64_t *array2sort;
int64_t *unsortedArr;
int64_t *p0;
int64_t *p1;
int64_t *pEmpty;
int64_t empty;
char StopThe = 0;
char GV; /*greater value*/
char unit = 0;
pEmpty = &empty;
chosenT seed = 613;
chosenT exp = 259;
chosenT mod = 91477361;
chosenT countIteration;

array2sort = (int64_t*)malloc(N0fItems * size0fInt);
unsortedArr = (int64_t*)malloc(N0fItems * size0fInt);

if(array2sort == NULL || unsortedArr == NULL)
{
printf("Not enough memory for array\n");
return 0;
}
pseudoRND(unsortedArr, N0fItems, mod, seed, exp);
for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

DownThe = N0fItems - 1;
countIteration = 0;
bottom = 0;
StopThe = 0;

printf("Start No-IF heapsort\n");
clock_t start = clock();
countIteration = heapSort(array2sort,N0fItems);
clock_t stop = clock();
printf("No-IF heapsort: %ld ticks (%f msec)\n", stop-start, (1000.0*
(stop-start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start qsort\n");
start = clock();
qsort(array2sort, N0fItems, sizeof(array2sort[0]), comp);
stop = clock();
printf("qsort: %ld ticks (%f msec)\n", stop-start, (1000.0*(stop-start)/
CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start std::sort\n");
start = clock();
std::sort(array2sort, array2sort+N0fItems);
stop = clock();
printf("std::sort: %ld ticks (%f msec)\n", stop-start, (1000.0*(stop-
start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start std::make_heap + std::sort_heap\n");
start = clock();
std::make_heap(array2sort, array2sort+N0fItems);
std::sort_heap(array2sort, array2sort+N0fItems);
stop = clock();
printf("std::make_heap + std::sort_heap: %ld ticks (%f msec)\n", stop-
start, (1000.0*(stop-start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];
return 0;
}
------------------------------

Results:
--------------------------------
Start No-IF heapsort
No-IF heapsort: 312 ticks (312.000000 msec)
Start qsort
qsort: 94 ticks (94.000000 msec)
Start std::sort
std::sort: 297 ticks (297.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 343 ticks (343.000000 msec)
 
E

Evgeney Knyazhev

i did change only "dv = (a[child] < a[child+1]) * dv;" to "dv = (a[child] < a[child+1]) & dv;"

gotten results:
--------------------
Start No-IF heapsort
No-IF heapsort: 297 ticks (297.000000 msec)
Start qsort
qsort: 110 ticks (110.000000 msec)
Start std::sort
std::sort: 343 ticks (343.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 375 ticks (375.000000 msec)
 
I

Ian Collins

Evgeney said:
i did change only "dv = (a[child] < a[child+1]) * dv;" to "dv = (a[child] < a[child+1]) & dv;"

In what context? You really should get to grips with Usenet quoting.
 
I

Ian Collins

------------------------------

Results:
--------------------------------
Start No-IF heapsort
No-IF heapsort: 312 ticks (312.000000 msec)
Start qsort
qsort: 94 ticks (94.000000 msec)
Start std::sort
std::sort: 297 ticks (297.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 343 ticks (343.000000 msec)

So you have proved your hackery is by far the slowest solution?

With optimisation enabled, your "No-IF heapsort" is off by almost an
order of magnitude...
 
E

Evgeney Knyazhev

So you have proved your hackery is by far the slowest solution?



With optimisation enabled, your "No-IF heapsort" is off by almost an

order of magnitude...

Ian, you say about qsort? :D qsort can be faster than heap-sort, but qsort ain't stable -- some data throws it to quadratic. i'm supporter of stable sorts :D
 
I

Ian Collins

Evgeney said:
Ian, you say about qsort? :D qsort can be faster than heap-sort, but qsort ain't stable -- some data throws it to quadratic. i'm supporter of stable sorts :D
I'm saying that with optimisation, your "No-IF heapsort" is almost an
order of magnitude slower than qsort or C++'s std::sort; I had to
increase the array sizes by 100 to get meaningful data:

g++ -m64 -O3 s.cc && ./a.out
Start No-IF heapsort
No-IF heapsort: 3940000 ticks (3940.000000 msec)
Start qsort
qsort: 590000 ticks (590.000000 msec)
Start std::sort
std::sort: 480000 ticks (480.000000 msec)
 
E

Evgeney Knyazhev

I'm saying that with optimisation, your "No-IF heapsort" is almost an

order of magnitude slower than qsort or C++'s std::sort; I had to

increase the array sizes by 100 to get meaningful data:



g++ -m64 -O3 s.cc && ./a.out

Start No-IF heapsort

No-IF heapsort: 3940000 ticks (3940.000000 msec)

Start qsort

qsort: 590000 ticks (590.000000 msec)

Start std::sort

std::sort: 480000 ticks (480.000000 msec)

what version did you use?????? with pointers?????
 
S

Shao Miller

Good Time to all, Amici(Friends).

Here, i'd like to discuss how to reduce conditional branches in the code. 1st of the all, i would like to share some tricks. Their description is here (http://alg0z.blogspot.ru/2013/02/if-or-no-if.html).

That web-page includes:
...at some cases) to gain practical benefits...

What practical benefits?
...Actually, IF-free code ain't panacea to speed-up...

Is speed one of the practical benefits? If so, C doesn't define the
speed of a program, so it'll always be relative to a particular instance
of a program, or sometimes even to particular input or even to the state
of system the program executes within.
... Here dead turns become faster than branching. ...

I'd suggest revising that statement to say something like "I measured
the performance of this program once and it took XXX amount of time,
whereas an 'if' counterpart took YYY amount of time. I tested with
such-and-such C compiler on such-and-such platform with such-and-such
compiler options." Otherwise, it looks like a universal claim, which I
find to be a turn-off.

I bet CHAR_BIT bits that you won't answer the following question: Why
did you choose to post to comp.lang.c?

What does this have to do with C? In C, 'if' can be used if you need to
conditionally execute a statement. If you only need expressions, then
you don't need 'if'. Are you familiar with the so-called "conditional"
operator?

expr1 ? expr2 : expr3

If 'expr1 == 0', then 'expr2' is evaluated, otherwise 'expr3' is evaluated.

Programming can be exciting. It's nice that you've found something
interesting. So what kind of discussion are you hoping for? :)
 
S

Shao Miller

What does this have to do with C? In C, 'if' can be used if you need to
conditionally execute a statement. If you only need expressions, then
you don't need 'if'. Are you familiar with the so-called "conditional"
operator?

expr1 ? expr2 : expr3

If 'expr1 == 0', then 'expr2' is evaluated, otherwise 'expr3' is evaluated.

Oops, I meant:
If 'expr1 == 0', then 'expr3' is evaluated, otherwise 'expr2' is evaluated.

Sorry about that.
 
Ö

Öö Tiib

what version did you use?????? with pointers?????

Whatever he uses, he is right. You apparently measure not optimized code.
Everywhere optimized std::sort tends to be tiny bit faster than qsort.
If you see it to be noticeably slower then that means you have
compiler optimizations turned off.
 
K

Keith Thompson

Evgeney Knyazhev said:
----------------------
I think you need to read a book or three on computer architecture.
In comp arch, there are no universal truths. Even if one considers
several processors implementing the SAME instruction set, different
micro-architectures have different performance characteristics.

So what is "true" for one micro-arch may no longer hold for the next!
Case in point: the characteristics of Intel's P4 were wildly different
from what came before and after "Netbust".

Please read this discussion:

http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

*Please* learn how to post here.

Quoted text is marked by a leading "> " on each line, not by
surrounding it with rows of '-' characters. Quoted text should
also be preceded by an attribution line, such as the "Evgeney
should be wrapped to about 72 colums so it's easy to read in an
80-column window, leaving room for a few "> " prefixes when it's
quoted in followups.

Any decent newsreader program will do this for you automatically.

For examples of the right way to quote and cite other articles, see the
vast majority of the articles in this newsgroup, including this one.
 
K

Keith Thompson

Öö Tiib said:
Yes. Or for code-generator that is specially for GPU. It is anyway
worthy idea to discuss ... even with C++ for Windows examples.

Examples using Windows-specific C++ code may well be worthy of
discussion -- but not here, please.
 
E

Evgeney Knyazhev

Shao, Thanks for reply. i've shared code for people to test solution in different environments. Actually, last variant of no-if heap-sort wins classic heap-sort in my case. qsort runs faster about 2X-4X.
 
N

Noob

Evgeney said:
superfluous casting definitely makes stuff slower.

Such peremptory (and incorrect) claims make baby Jesus cry.

How much C programming experience do you have?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,077
Messages
2,570,569
Members
47,205
Latest member
KelleM857

Latest Threads

Top