comparison function for qsort() question

D

David Mathog

A piece of code I am working on uses qsort() to sort buffers containing
packed binary fixed length words, however the number of bytes in a word
will vary from one call of qsort() to another. We've discussed here
before how qsort() doesn't provide any nice way to pass in extra data,
so here the number of bytes is smuggled into the comparison function
using the global integer variable gbl_bws (gbl_bws == GloBaL Binary Word
Size). The word size varies but isn't likely to be huge in any case,
with a maximum of probably around 8 bytes. The current code is this:

static int compare_bw1(const void *p1, const void *p2){
unsigned char *uc1 = (unsigned char *)p1;
unsigned char *uc2 = (unsigned char *)p2;
int i;
for(i=0; i < gbl_bws; i++){
if( uc1 > uc2 ){ return 1; }
else if( uc1 < uc2 ){ return -1;}
}
return 0;
}

Another way of doing this is:

static int compare_bw2(const void *p1, const void *p2){
return(memcmp(p1,p2,gbl_bws);
}

Any thoughts on whether the second function form would usually be
faster, slower, or the same?

It seems to me to be pretty compiler dependent in that if the compiler
actually makes a function call to memcmp() that call overhead is going
to add up in a hurry when sorting many billions of words. On the other
hand, if the compiler expands memcmp() in place that code is likely to
be heavily optimized. I know that generally using memcpy() ends up
being faster than using my own code, is that going to be generally true
for memcmp() as well?

On a related note, what does the standard say memcmp() will return for
the following 4 cases? My guesses are shown in the last column.

p1 p2 Guess (why)
1 null null 0 (because they are identical, albeit undefined)
2 null !null -1 (nothing < something )
3 !null null 1 (something > nothing )
-----------------------------------------------
4 n == 0 0 {0 bytes of anything is always nothing)

Since memcmp() only returns a comparison value, and not an error value
which is really what's called for here, it presumably has these
"comparisons" defined in the standard. The man pages I have seen do
not address this point.

Thanks,

David Mathog
 
W

Walter Roberson

David Mathog said:
On a related note, what does the standard say memcmp() will return for
the following 4 cases? My guesses are shown in the last column.

p1 p2 Guess (why)
1 null null 0 (because they are identical, albeit undefined)
2 null !null -1 (nothing < something )
3 !null null 1 (something > nothing )
Since memcmp() only returns a comparison value, and not an error value
which is really what's called for here, it presumably has these
"comparisons" defined in the standard. The man pages I have seen do
not address this point.

The standard does not specifically say what happens for null pointers
or for n == 0.

I would think it likely that using a null pointer
would be considered to be outside the range of valid parameters
and so that the behaviour would be undefined.

Returning 0 for n == 0 sounds like proper behaviour to me, but on
different grounds: that all empty buffers are identical. Returning
non-zero would imply that there was a difference found between
the empty buffers, which could not be the case.
 
E

Eric Sosman

Walter said:
The standard does not specifically say what happens for null pointers
or for n == 0.

7.21.1p1 seems pretty specific:

Where an argument declared as size_t n specifies the
length of the array for a function, n can have the value
zero on a call to that function. Unless explicitly stated
otherwise in the description of a particular function in
this subclause, pointer arguments on such a call shall
still have valid values, as described in 7.1.4.

Over in 7.1.4p1, we find:

[...] unless explicitly stated otherwise [...] If an
argument to a function has an invalid value (such as a
value outside the domain of the function, or a pointer
outside the address space of the program, or a null
pointer, [...]) [...] the behavior is undefined.

So, since the description of memcmp() mentions no exceptions,
it's legal for the third argument to be zero but not legal for
either of the first two to be NULL.
 
P

Peter Nilsson

[email protected] (Walter Roberson) said:
The standard does not specifically say what happens for
null pointers or for n == 0.

7.21.1p2:

Where an argument declared as size_t n specifies the
length of the array for a function, n can have the value
zero on a call to that function. Unless explicitly
stated otherwise in the description of a particular
function in this subclause, pointer arguments on such a
call shall still have valid values, as described in
7.1.4. On such a call, a function that locates a
character finds no occurrence, a function that compares
two character sequences returns zero, and a function that
copies characters copies zero characters.

7.21.4.1p2

The memcmp function compares the first n characters of
the object pointed to by s1 to the first n characters of
the object pointed to by s2.

So, if either pointer is null, the behaviour is undefined.
If 0 is passed (and both pointers are valid,) 0 will be
returned.
 
D

David Mathog

David said:
Any thoughts on whether the second function form would usually be
faster, slower, or the same?

I forgot to mention, on gcc 4.1.2 on an opteron system the memcmp() form
runs about 25% faster on a test with a sort of 125M words of 5 bytes
each. So I knew that it worked better with _some_ compiler, I was just
curious how common that was likely to be.

Regards,

David Mathog
 
D

David Mathog

David said:
I forgot to mention, on gcc 4.1.2 on an opteron system the memcmp() form
runs about 25% faster on a test with a sort of 125M words of 5 bytes
each. So I knew that it worked better with _some_ compiler, I was just
curious how common that was likely to be.

Turns out the fastest variant so far is neither of the above (all tests
at gcc -03).

Using memcmp() helped, dropping the run time to 164 seconds. Still, the
one below ran in just 144 seconds. The program does a lot of IO,
unclear how much of the time was in qsort() and how much in IO, but the
IO sections were unchanged and should have run in exactly the same time.
In the tests gbl_bws was 5 and there were around 120M words in the
sort buffer.

/* based on the memcpm() implementation here:
http://mia.ece.uic.edu/cgi-bin/lxr/http/source/memcmp.c?v=openvpn-1.4.2
*/
int compare_bw(const void *s1, const void *s2){
register unsigned const char *p1 = s1, *p2 = s2;
register int d;
register int n;

if (gbl_bws > 0){
n = gbl_bws;
while (n-- > 0){
d = *p1++ - *p2++;
if (d != 0)return(d);
}
}
return 0;
}

Probably this program shouldn't be using the qsort() from the
dynamically linked library anyway, since that can't inline the
comparison function into the sort function. No big deal to pick
up a qsort source and put it into this code, still, why should
the programmer have to do this? Do any C compilers offer an
option to compile and statically link their qsort() in order
to achieve better optimization? In other words, rather than
the end user having to physically paste the qsort code
in somewhere, can some compilers do it automagically? This
pretty much only comes up with qsort() since it makes a lot of
calls to the comparison function, and so is ripe for some
optimization at compile time.

Regards,

David Mathog
 
R

Randy Howard

I forgot to mention, on gcc 4.1.2 on an opteron system the memcmp() form
runs about 25% faster on a test with a sort of 125M words of 5 bytes
each. So I knew that it worked better with _some_ compiler, I was just
curious how common that was likely to be.

memcmp() was hideously slow in some versions of Microsoft's library, it
may have improved recently, I don't use it anymore.
 
P

pete

David said:
Turns out the fastest variant
so far is neither of the above (all tests at gcc -03).

Using memcmp() helped, dropping the run time to 164 seconds.
Still, the
one below ran in just 144 seconds. The program does a lot of IO,
unclear how much of the time was in qsort() and how much in IO,
but the
IO sections were unchanged and
should have run in exactly the same time.
In the tests gbl_bws was 5 and there were around 120M words in the
sort buffer.

/* based on the memcpm() implementation here:
http://mia.ece.uic.edu/cgi-bin/lxr/http/source/memcmp.c?v=openvpn-1.4.2
*/
int compare_bw(const void *s1, const void *s2){
register unsigned const char *p1 = s1, *p2 = s2;
register int d;
register int n;

if (gbl_bws > 0){
n = gbl_bws;
while (n-- > 0){
d = *p1++ - *p2++;
if (d != 0)return(d);
}
}
return 0;
}

Probably this program shouldn't be using the qsort() from the
dynamically linked library anyway, since that can't inline the
comparison function into the sort function. No big deal to pick
up a qsort source and put it into this code, still, why should
the programmer have to do this?

I don't know.
If you're sorting arrays of pointers, then you could try ucptrsort.
If you're sorting arrays of arrays,
then I'd have to write a MOV(A,B) macro.

/* BEGIN ucptrsort.h */

#ifndef H_UCPTRSORT_H
#define H_UCPTRSORT_H

#include <stddef.h>

typedef unsigned char * e_type;

extern size_t gbl_bws;

void ucptrsort(e_type *array, size_t nmemb);
void hsortm(e_type *array, size_t nmemb);
void i_sort(e_type *array, size_t nmemb); /* unstable insertionsort */

#endif

/* END ucptrsort.h */


/* BEGIN ucptrsort.c */

#include <assert.h>
#include <string.h>

#include "ucptrsort.h"

#define GT(A, B) (memcmp(A, B, gbl_bws) > 0)
#define SMALL_PARTITION 50
/*
** Reduce SMALL_PARTITION for slower moving data types.
** for example: 50 for unsigned, 20 for long double.
** SMALL_PARTITION must be made to be greater than or equal to 4.
** assert(SMALL_PARTITION >= 4);
** Use i_sort instead of ucptrsort, for sorting small arrays.
*/
#define LU_RAND_SEED 123456789LU
#define LU_RAND(S) ((S) * 69069 + 362437)
/*
** Use:
** #define LU_RAND(S) ((S) * 69069 + 362437)
** for better sorting
** Use:
** #define LU_RAND(S) ((S) * 69069 + 362437 & 0XFFFFFFFF)
** to avoid implementation defined behavior
*/
#define SWAP(A, B, T) ((void)((T) = *(A), *(A) = *(B), *(B) = (T)))
#define SORT_3(A, B, C, T) \
if (GT((A), (B))) { \
(T) = *(A); \
if (GT((C), (B))) { \
*(A) = *(B); \
if (GT(&(T), (C))) { \
*(B) = *(C); \
*(C) = (T); \
} else { \
*(B) = (T); \
} \
} else { \
*(A) = *(C); \
*(C) = (T); \
} \
} else { \
if (GT((B), (C))) { \
if (GT((A), (C))) { \
(T) = *(A); \
*(A) = *(C); \
*(C) = *(B); \
*(B) = (T); \
} else { \
SWAP((B), (C), (T)); \
} \
} \
}
#define SIFTDOWNM(A, I, N, P, C, T) \
{ \
(P) = (I); \
(T) = (A)[(P)]; \
(C) = (P) * 2; \
while ((N) > (C)) { \
if (GT((A) + (C) + 1, (A) + (C))) { \
++(C); \
} \
if (GT((A) + (C), &(T))) { \
(A)[(P)] = (A)[(C)]; \
(P) = (C); \
(C) *= 2; \
} else { \
break; \
} \
} \
if ((N) == (C) && GT((A) + (C), &(T))) { \
(A)[(P)] = (A)[(C)]; \
(P) = (C); \
} \
(A)[(P)] = (T); \
}

static int sorted(e_type *array, size_t nmemb);
static int rev_sorted(e_type *array, size_t nmemb);
static void rev_array(e_type *array, size_t nmemb);
static long unsigned
ucptrloop(e_type *array, size_t nmemb, size_t d, long unsigned seed);

void ucptrsort(e_type *array, size_t nmemb)
{
size_t d, n;

if (nmemb > 1) {
if (!rev_sorted(array, nmemb)) {
d = 2;
for (n = nmemb / 4; n; n /= 2) {
++d;
}
ucptrloop(array, nmemb, d * 2, LU_RAND_SEED);
} else {
rev_array(array, nmemb);
}
}
}

void hsortm(e_type *array, size_t nmemb)
{
size_t i, child, parent;
e_type temp;

if (nmemb > 1) {
i = --nmemb / 2;
do {
SIFTDOWNM(array, i, nmemb, parent, child, temp);
} while (i-- != 0);
SWAP(array, array + nmemb, temp);
while (--nmemb != 0) {
SIFTDOWNM(array, 0, nmemb, parent, child, temp);
SWAP(array, array + nmemb, temp);
}
}
}

void i_sort(e_type *array, size_t nmemb)
{
e_type temp, *first, *middle;
size_t counter;

if (nmemb > 1) {
first = middle = 1 + array;
counter = --nmemb;
while (--counter != 0) {
++first;
if (GT(middle, first)) {
middle = first;
}
}
if (GT(array, middle)) {
SWAP(array, middle, temp);
}
++array;
while (--nmemb != 0) {
first = array++;
if (GT(first, array)) {
middle = array;
temp = *middle;
do {
*middle-- = *first--;
} while (GT(first, &temp));
*middle = temp;
}
}
}
}

static int sorted(e_type *array, size_t nmemb)
{
while (--nmemb != 0 && !GT(array, array + 1)) {
++array;
}
return !nmemb;
}

static int rev_sorted(e_type *array, size_t nmemb)
{
while (--nmemb != 0 && !GT(array + 1, array)) {
++array;
}
return !nmemb;
}

static void rev_array(e_type *array, size_t nmemb)
{
e_type temp, *end;

for (end = array + nmemb; --end > array; ++array) {
SWAP(array, end, temp);
}
}

static long unsigned
ucptrloop(e_type *array, size_t nmemb, size_t d,
long unsigned seed)
{
e_type temp, *first, *last;

assert(SMALL_PARTITION >= 4);
while (nmemb > SMALL_PARTITION) {
if (sorted(array, nmemb)) {
return seed;
}
if (!d--) {
hsortm(array, nmemb);
return seed;
}
seed = LU_RAND(seed);
first = seed % --nmemb + array;
SWAP(array, first, temp);
first = 1 + array;
last = nmemb + array;
SORT_3(first, array, last, temp);
do {
++first;
} while (GT(array, first));
do {
--last;
} while (GT(last, array));
while (last > first) {
SWAP(last, first, temp);
do {
++first;
} while (GT(array, first));
do {
--last;
} while (GT(last, array));
}
SWAP(array, last, temp);
seed = ucptrloop(last + 1, nmemb + array - last, d, seed);
nmemb = last - array;
}
i_sort(array, nmemb);
return seed;
}

/* END ucptrsort.c */
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top