On Thu, 23 Jun 2005 05:46:57 +1000, Netocrat wrote:
...
This makes big assumptions about the representation of integers on the
platform. It won't work if they contain any padding bits or use anything
other than big-endian byte order. And technically speaking, C's type
aliasing rules disallow this.
Where can I verify this in documentation? Not that I don't believe you;
I'd just like to know how to find these things out without asking on the
group. I know you can buy the standard, but I'd rather not spend money.
What are my options?
Since I posted I have realised that it is actually failing - my processor
is little endian...
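Here is a concrete illustration of what goes wrong on a little-endian
machine (my own example with made-up values; it assumes a 32-bit
unsigned int with no padding bits):

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char a[4] = { 0x01, 0x00, 0x00, 0x00 };
    unsigned char b[4] = { 0x00, 0x00, 0x00, 0x02 };
    unsigned int wa, wb;

    /* memcpy sidesteps the aliasing problem a pointer cast has */
    memcpy(&wa, a, sizeof wa);
    memcpy(&wb, b, sizeof wb);

    /* Byte-wise a > b (0x01 beats 0x00 at the first byte), but on
     * a little-endian machine wa == 0x00000001 and wb == 0x02000000,
     * so the word comparison reports the opposite ordering. */
    printf("memcmp sign: %d, word-compare sign: %d\n",
           memcmp(a, b, 4) > 0 ? 1 : -1, wa > wb ? 1 : -1);
    return 0;
}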
Were you comparing equal data?
I intended it to be random but didn't explicitly randomise it; and I
think the memory block malloc was returning was zeroed for these
tests, although it's impossible to go back and check... I am now
explicitly setting each byte to a random value. So for the data above,
yes, it probably was all equal.
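What I'm now doing to randomise is along these lines (a sketch; the
names are placeholders, not my actual test harness):

#include <stdlib.h>
#include <stddef.h>

/* Fill a test buffer with pseudo-random byte values. */
static void randomise(unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)(rand() % 256);
}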
The average length of the data being compared is not important for
non-equal data; what is important is the position of the first character
that differs. For uniformly random data the chance that the first pair
of characters matches is only 1/256, so the first character tested
settles the comparison the great majority of the time.
Well put.
<Off-topic (lots of it)>
It so happens that the data was probably equal for the post to which you
responded. I tested again, this time with equal data except for the final
compared byte which was alternately less than and greater than. I also
re-ordered the inner while loop of my naive memcmp to be:
    if (*ua == *ub) {
        ua++;
        ub++;
    } else if (*ua < *ub)
        return -1;
    else
        return 1;
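For context, the loop sits in a function along these lines (my
reconstruction of the surrounding code; the name and declarations are
assumed, not the code I actually ran):

#include <stddef.h>

/* Naive byte-by-byte memcmp; the equality test comes first since
 * matching bytes are the common case while scanning. */
int naive_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *ua = a;
    const unsigned char *ub = b;

    while (n--) {
        if (*ua == *ub) {
            ua++;
            ub++;
        } else if (*ua < *ub)
            return -1;
        else
            return 1;
    }
    return 0;
}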
The results I got are different to those I posted above - those above
were done pretty roughly, so I may have made an error translating from
times to relative performance. Anyhow, for all lengths less than or
equal to 10, both of my functions outperform the library when all data
bar the last byte is the same. Here are some very rough relative times:
Length     Library       Naive  Word-based
     0    2.782063    0.824878    1.026883
     1    3.839585    1.794128    1.536425
     2    3.979533    1.936510    1.711663
     3    4.487526    2.082990    2.091040
     4    4.911784    2.249875    4.031234
     5    5.433289    2.392460    2.443990
     6    5.142204    3.054199    2.226923
     7    5.226639    3.310050    2.500838
     8    5.122240    3.394408    4.416643
     9    5.741017    3.498679    2.579461
    10    5.832017    4.845331    3.007469
    11    6.545730    5.683157    3.090258
    12    6.340615    6.557857    4.964385
    13    6.434350    6.702367    3.512579
    14    6.485347    6.534230    3.369173
    15    6.735006    7.560720    4.631879
    16    8.304644    6.969153    4.455131
    17    6.899765    6.990281    3.269464
    18    7.499913    7.809836    4.871980
    19    7.754756    7.356656    6.513516
    20    8.369894    7.477097    5.325045
   ...
   300   61.002508   65.282563   34.535567
   301   61.211380   66.803313   33.170267
   302   67.950986   66.556912   33.923567
   303   66.034115   67.561750   32.898632
In all of these tests the word-based function far outperforms the library
function - although, as I have noted, the endianness causes incorrect
results. The naive approach is very similar in time to the library
version and sometimes faster, suggesting that no tricks/optimisations
are being used in the library version.
Intel chips (486 and later) have an instruction, bswap, that reverses
the byte order of a 32-bit register, which should let me compare words
in big-endian order. I'm going to inline that into my function with some
assembly in gcc and see if I can get both a correct result and also
maintain the performance boost that this function provides, which I
suspect is possible. If so it would be an appropriate way to optimise
the Intel version of glibc's memcmp function.
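Something like this sketch is what I have in mind (it assumes x86 with
gcc inline-assembly syntax, a 32-bit unsigned int with no padding bits,
and word-aligned buffers; the function names are mine):

#include <stddef.h>

/* Reverse the byte order of a 32-bit value with the x86 bswap
 * instruction (i486 and later).  "+r" tells gcc the operand is
 * both read and written in a register. */
static unsigned int swap32(unsigned int x)
{
    __asm__("bswap %0" : "+r"(x));
    return x;
}

/* Word-at-a-time compare: on a little-endian machine the words
 * only need byte-swapping when they differ, so the common equal
 * case costs no more than before. */
static int wordcmp(const unsigned int *wa, const unsigned int *wb,
                   size_t nwords)
{
    while (nwords--) {
        if (*wa != *wb)
            return swap32(*wa) < swap32(*wb) ? -1 : 1;
        wa++;
        wb++;
    }
    return 0;
}

Since the swap only happens on the first mismatching word, it shouldn't
cost anything on the fast path where the words are equal.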
These are techniques available for the implementation to use for
standard library functions if the implementor chooses. They are not
normally techniques a program should use.
That's what I wanted to know. Cheers.