IF-free conception.

K

Keith Thompson

Evgeney Knyazhev said:
#include <algorithm> [...]
printf("Start std::sort\n");
start = clock();
std::sort(array2sort, array2sort+N0fItems);
stop = clock();
printf("std::sort: %ld ticks (%f msec)\n", stop-start, (1000.0*(stop-
start)/CLOCKS_PER_SEC));
[...]

Eveney, seriously, what part of the name "comp.lang.c" do you
not understand? There's an active comp.lang.c++ newsgroup.
Your articles might be considered off-topic there as well, for
different reasons, but why would you post C++ code in a newsgroup
dedicated to discussing C? They're two different languages.
 
Ö

Öö Tiib

[...]
Yes. Or for code-generator that is specially for GPU. It is anyway
worthy idea to discuss ... even with C++ for Windows examples.

Examples using Windows-specific C++ code may well be worthy of
discussion -- but not here, please.

Most of his code is written in C. C++ is used to compare timings with
C++ standard library. Parts of C++ standard library work more
efficiently than anything in standard C ... so dodging such benchmarks
equals hiding your head under sand.
 
K

Keith Thompson

Evgeney Knyazhev said:
Ian, you say about qsort? :D qsort can be faster than heap-sort, but
qsort ain't stable -- some data throws it to quadratic. i'm supporter
of stable sorts :D

C's qsort function doesn't necessarily use the QuickSort algorithm.
 
K

Keith Thompson

Öö Tiib said:
[...]
Yes. Or for code-generator that is specially for GPU. It is anyway
worthy idea to discuss ... even with C++ for Windows examples.

Examples using Windows-specific C++ code may well be worthy of
discussion -- but not here, please.

Most of his code is written in C.

All of his code is written in C++. (Most of it is written in the common
subset of the two languages.)
C++ is used to compare timings with
C++ standard library. Parts of C++ standard library work more
efficiently than anything in standard C ... so dodging such benchmarks
equals hiding your head under sand.

I'm not hiding my head anywhere. I'm not saying the comparisons aren't
useful or interesting, merely that if they're not about C, they belong
somewhere other than comp.lang.c.
 
Ö

Öö Tiib

Öö Tiib said:
[...]
Yes. Or for code-generator that is specially for GPU. It is anyway
worthy idea to discuss ... even with C++ for Windows examples.

Examples using Windows-specific C++ code may well be worthy of
discussion -- but not here, please.
Most of his code is written in C.

All of his code is written in C++. (Most of it is written in the common
subset of the two languages.)

We seem to say the same thing? The original work that he compares with C++
standard library is useful as C (being common subset between the languages)..
I'm not hiding my head anywhere.

I did not say that you hide. It was metaphor.
I'm not saying the comparisons aren't
useful or interesting, merely that if they're not about C, they belong
somewhere other than comp.lang.c.

He is about C. Performance is very important for C. Attempts to
write C that competes well with other well-performers belongs exactly
to comp.lang.c. The whole point of his work is to achieve well-performing
C code by reducing if() branches. Why it is wrong in comp.lang.c?
 
B

Ben Bacarisse

<snip rest of C++ program>

Before you post in comp.lang.c++ you might want to correct the program.
As I said before, it accesses the array outside of the array bounds.
Its behaviour is undefined so the timing is, for the moment, irrelevant.
 
B

Ben Bacarisse

Evgeney Knyazhev said:
Shao, Thanks for reply. i've shared code for people to test solution
in different environments. Actually, last variant of no-if heap-sort
wins classic heap-sort in my case. qsort runs faster about 2X-4X.

It still contains a bug, though.
 
E

Evgeney Knyazhev

<snip rest of C++ program>



Before you post in comp.lang.c++ you might want to correct the program.

As I said before, it accesses the array outside of the array bounds.

Its behaviour is undefined so the timing is, for the moment, irrelevant.

Ben, it cannot go out of bounds:

code:
dv = (child + 1 < end);
dv = (a[child] < a[child+1]) & dv;
child += dv;
 
E

Evgeney Knyazhev

Start No-IF heapsort
No-IF heapsort: 14969 ticks (14969.000000 msec)
Start qsort
qsort: 1703 ticks (1703.000000 msec)
Start std::sort
std::sort: 1250 ticks (1250.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 5765 ticks (5765.000000 msec)
 
B

Ben Bacarisse

Evgeney Knyazhev said:
<snip rest of C++ program>

Before you post in comp.lang.c++ you might want to correct the program.
As I said before, it accesses the array outside of the array bounds.
Its behaviour is undefined so the timing is, for the moment, irrelevant.

Ben, it cannot go out of bounds:

code:
dv = (child + 1 < end);
dv = (a[child] < a[child+1]) & dv;
child += dv;

There is so much wrong with that. 'end' is not the number of items in
the array -- the code you cite is in a loop that modifies 'end' so it
takes lots of values. However, that's not really the issue. The
problem is that 'child+1' can be equal to the size of the array.
 
E

Evgeney Knyazhev

There is so much wrong with that. 'end' is not the number of items in
the array -- the code you cite is in a loop that modifies 'end' so it

takes lots of values. However, that's not really the issue. The

problem is that 'child+1' can be equal to the size of the array.
Ben, end can be set to zero only when siftdown() must be finished.
 
B

Ben Bacarisse

Evgeney Knyazhev said:
Ben, end can be set to zero only when siftdown() must be finished.

You keep replying by stating things that have no bearing on the issue.

If you want to understand this bug you can (a) run the program under a
memory checker like valgrind; (b) use a debugger to trap when child + 1
is equal to the array size; or (c) put 'assert(child + 1 < 100000);'
above the code in question. I'm suggesting practical thing because you
don't seem to like thinking about the code.
 
J

James Kuyper

On 02/17/2013 01:50 PM, Evgeney Knyazhev wrote:
....
superfluous casting definitely makes stuff slower.

I wonder - what makes you so certain about that? Have you had the
misfortune of using a compiler so badly implemented that this is
actually true? If so, could you identify the compiler, and provide us
with evidence supporting that claim?
Alternatively, are you just making very bad guesses about what would
happen if you tried this?
 
J

James Kuyper

On 02/17/2013 01:50 PM, Evgeney Knyazhev wrote:
....
superfluous casting definitely makes stuff slower.

I wonder - what makes you so certain about that? Have you had the
misfortune of using a compiler so badly implemented that this is
actually true? If so, could you identify the compiler, and provide us
with evidence supporting that claim?
Alternatively, are you just making very bad guesses about what would
happen if you tried this?
 
E

Evgeney Knyazhev

You keep replying by stating things that have no bearing on the issue.



If you want to understand this bug you can (a) run the program under a

memory checker like valgrind; (b) use a debugger to trap when child + 1

is equal to the array size; or (c) put 'assert(child + 1 < 100000);'

above the code in question. I'm suggesting practical thing because you

don't seem to like thinking about the code.
Ben, i respect, & am thankful for your code inspection + your suggestion, but i did check it many times & everything has been ok. that logical scheme can fail, if you send bad value of 'end'. if 'end' equals length of array than stuff runs trouble - free.
-------

Well, my Friends, i got a really anecdotic situation :D i was dipping at the Asm level:

$LN6@siftDown:

; 82 : dv = (a[child] < a[child+1]) & dv;

00032 8b 44 fe 04 mov eax, DWORD PTR [esi+edi*8+4]
00036 3b 44 fe 0c cmp eax, DWORD PTR [esi+edi*8+12]
0003a 7f 12 jg SHORT $LN7@siftDown
0003c 7c 09 jl SHORT $LN15@siftDown
0003e 8b 04 fe mov eax, DWORD PTR [esi+edi*8]
00041 3b 44 fe 08 cmp eax, DWORD PTR [esi+edi*8+8]
00045 73 07 jae SHORT $LN7@siftDown
 
Ö

Öö Tiib

Ben, end can be set to zero only when siftdown() must be finished.

Ben is correct, the code really has to dereference one-past-end element
of array at least once. Also ... there is borken comparator for qsort:

/*********************** HeapSort ********************************/

int comp(const void* a, const void* b)
{
const int64_t* lhs = (const int64_t*)a;
const int64_t* rhs = (const int64_t*)b;
return lhs - rhs;
}

It should 'return *lhs - *rhs' otherwise it is comparing addresses
passed to it.

Overall ... do not worry, fix it, all code is full of bugs anyway.
 
I

Ian Collins

Evgeney said:
Ben, i respect, & am thankful for your code inspection + your suggestion, but i did check it many times & everything has been ok. that logical scheme can fail, if you send bad value of 'end'. if 'end' equals length of array than stuff runs trouble - free.

Please learn to quote correctly, it isn't rocket science!

Your code does use end == the length of the array!
 
Ö

Öö Tiib

such IF-fy cr@p :D

Then you have to try harder. Here's a bonus trick for you:

/// @return 1 if a is less than b, otherwise 0
/// uses the less than that possibly causes branches
/// do not use if you can not predict what it likely
/// returns.
int less64_bad( int64_t a, int64_t b )
{
return a < b;
}

/// @return 1 if a is less than b, otherwise 0
/// uses two's complement bit arithmetic
/// you likely get solid branch free code with it.
int less64_good( int64_t a, int64_t b )
{
return (a - b)>>63&1;
}

Latter works faster when it is unpredictable what it actually
returns. Not by much ... maybe 20% of ticks you get down.

Note that math is assuming specific binary layout of int64_t
that C standard does not guarantee, but hardware with ones
complement or sign + magnitude is *very* rare and twos
complement is almost everywhere. If unsure, add compile-time
asserts to it.
 
B

Ben Bacarisse

Ian Collins said:
Please learn to quote correctly, it isn't rocket science!

Your code does use end == the length of the array!

Yes it does and he thinks that's fine! The fact that end is (at some
times during the execution) equal to the length of the array does not
seem to be disputed by anyone (which is good because it's true!). What
Evgeney Knyazhev seems to be disputing is that child+1 is ever equal to
the length of the array just before it used as an array index.
 
B

Ben Bacarisse

Evgeney Knyazhev said:
Ben, i respect, & am thankful for your code inspection + your
suggestion, but i did check it many times & everything has been
ok. that logical scheme can fail, if you send bad value of 'end'. if
'end' equals length of array than stuff runs trouble - free.

How did you check it? Did you actually do any of (a), (b) or (c) that I
suggested or did you just assume that it must be right because you ran
the program a few times and you got the output you expected?

If you do nothing else, please add #include <assert.h> at the top of the
file and, just before the code that accesses 'a[child+1]' put:

assert(child + 1 < 100000);

'a' points to an array of length 100000 so if this assertion fails you
will know that the code has a bug. If I do that using exactly the code
you posted I see:

(C) 2013 Evgeney Knyazhev.

Start No-IF heapsort
e: e.cc:77: int siftDown(int64_t*, int, int): Assertion `child + 1 < 100000' failed.
Aborted

Of course you could just think about the program: main calls heapSort
with end==100000. That calls Heapify with end==100000 which in turn
calls siftDown with root==49999 and end==100000. siftDown starts:

int child;
int count = 0;
int64_t dv;
child = root << 1;
child++;

At this point child == 99999, so the loop is entered:

while(child < end)
{
dv = (child + 1 < end);
dv = (a[child] < a[child+1]) * dv;

At this point child+1 is not a valid index for the pointer 'a'.

<snip>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,077
Messages
2,570,566
Members
47,202
Latest member
misc.

Latest Threads

Top