string comparison


Lawrence Kirby

Ben Pfaff said:
Here's my favorite version. I got it from someone here on clc
but don't remember whom:
return *a < *b ? -1 : *a > *b;

I have vague recollections about that, but it was a long time ago,
probably in the previous millennium.

Lawrence
 

lawrence.jones

Ben Pfaff said:
Here's my favorite version. I got it from someone here on clc
but don't remember whom:
return *a < *b ? -1 : *a > *b;

My favorite is the conditionless form:

return (*a > *b) - (*a < *b);

-Larry Jones

I suppose if I had two X chromosomes, I'd feel hostile too. -- Calvin
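
Either one-liner drops straight into a qsort-style comparator. Here is a
minimal, hypothetical sketch (the function names and the choice of unsigned
char operands are mine, not from the thread):

#include <stdio.h>

/* Ternary form: yields -1 if *a < *b, otherwise (*a > *b) gives 1 or 0. */
static int cmp_ternary(const unsigned char *a, const unsigned char *b)
{
    return *a < *b ? -1 : *a > *b;
}

/* Conditionless form: each relational test yields 0 or 1, so the
   difference is -1, 0 or 1 with no branches at all. */
static int cmp_branchless(const unsigned char *a, const unsigned char *b)
{
    return (*a > *b) - (*a < *b);
}

int main(void)
{
    unsigned char x = 'a', y = 'b';
    printf("%d %d %d\n", cmp_ternary(&x, &y), cmp_branchless(&x, &y),
           cmp_branchless(&y, &x));   /* prints -1 -1 1 */
    return 0;
}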
 

Netocrat

On Thu, 23 Jun 2005 05:46:57 +1000, Netocrat wrote:

...


This makes big assumptions about the representation of integers on the
platform. It won't work if they contain any padding bits or use anything
other than big-endian byte order. And technically speaking, C's type
aliasing rules disallow this.

Where can I verify this from documentation - not that I don't believe
you, just that I'd like to know how to find these things out without
asking on the group? I know you can buy the standard, but I'd rather not
spend money. What are my options?

Since I posted I have realised that it is actually failing - my processor
is little endian...
Were you comparing equal data?

I intended that it was random but didn't explicitly randomise it; and I
think that the memory block malloc was returning was zeroed for these
tests, although it's impossible to go back and check... I am now
explicitly setting each byte to a random value. So for the data above,
yes, it probably was all equal.
The average length of data being compared is not important for non-equal
data; what is important is the position of the first character that
differs. For "random" data this will be the first character tested the
great majority of the time.

Well put.

<Off-topic (lots of it)>
It so happens that the data was probably equal for the post to which you
responded. I tested again, this time with equal data except for the final
compared byte which was alternately less than and greater than. I also
re-ordered the inner while loop of my naive memcmp to be:
if (*ua == *ub) {
    ua++;
    ub++;
} else if (*ua < *ub)
    return -1;
else
    return 1;
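
(For context, folded into a whole function that fragment looks something
like the sketch below; the declarations and outer loop are my assumptions
rather than the code that was actually benchmarked.)

#include <stddef.h>

/* Naive byte-by-byte compare; ua/ub walk the buffers as unsigned char. */
static int naive_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *ua = a;
    const unsigned char *ub = b;

    while (n--) {
        if (*ua == *ub) {
            ua++;
            ub++;
        } else if (*ua < *ub)
            return -1;
        else
            return 1;
    }
    return 0;
}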

The results that I got are different to those I posted above - those above
were done pretty roughly so I may have made an error translating from
times to relative performance. Anyhow, for all lengths less than or equal
to 10, both of my functions out-perform the library when all data bar
the last byte is the same. Here are some very rough relative times:

Length Library Naive Word-based
0 2.782063 0.824878 1.026883
1 3.839585 1.794128 1.536425
2 3.979533 1.936510 1.711663
3 4.487526 2.082990 2.091040
4 4.911784 2.249875 4.031234
5 5.433289 2.392460 2.443990
6 5.142204 3.054199 2.226923
7 5.226639 3.310050 2.500838
8 5.122240 3.394408 4.416643
9 5.741017 3.498679 2.579461
10 5.832017 4.845331 3.007469
11 6.545730 5.683157 3.090258
12 6.340615 6.557857 4.964385
13 6.434350 6.702367 3.512579
14 6.485347 6.534230 3.369173
15 6.735006 7.560720 4.631879
16 8.304644 6.969153 4.455131
17 6.899765 6.990281 3.269464
18 7.499913 7.809836 4.871980
19 7.754756 7.356656 6.513516
20 8.369894 7.477097 5.325045
....
300 61.002508 65.282563 34.535567
301 61.211380 66.803313 33.170267
302 67.950986 66.556912 33.923567
303 66.034115 67.561750 32.898632

In all of these tests the word-based function far outperforms the library
function - although, as I have noted, the endianness causes errors.
The naive approach is very similar in time to the library version and
sometimes faster, suggesting that no tricks/optimisations are being used
in the library version.

Intel chips seem to support an opcode that lets a 32-bit value be compared
in byte order, so I'm going to inline that into my function with some
assembly in gcc and see if I can get both a correct result and also
maintain the performance boost that this function provides, which I
suspect is possible. If so it would be an appropriate way to optimise the
Intel version of glibc's memcmp function.

These are techniques available for the implementation to use for
standard library functions if the implementor chooses. They are not
normally techniques a program should use.

That's what I wanted to know. Cheers.
 

Lawrence Kirby

Where can I verify this from documentation - not that I don't believe
you, just that I'd like to know how to find these things out without
asking on the group? I know you can buy the standard, but I'd rather not
spend money. What are my options?

You could download a draft version of the standard or buy a good book. Of
course buying a book implies spending money and isn't guaranteed to be
accurate or complete. Drafts are also not going to be 100% accurate but at
least are based on the actual text of the standard.

The aliasing rules I referred to are in C99 6.5p7:

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:73)
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the
object,
- a type that is the signed or unsigned type corresponding to the
effective type of the object,
- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
- a character type.

Specifically you can access any type of object through an lvalue of
character type, but the reverse is not true - accessing a character, or an
array of characters, through an lvalue of non-character type is not one of
the permitted options. C99 introduces the concept of "effective type" as
used above. This supports the use of non-character objects in malloc'd
memory.
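
A hypothetical illustration of that asymmetry (the names and values below
are mine):

#include <string.h>

void aliasing_example(void)
{
    int n = 42;
    unsigned char first = *(unsigned char *)&n;  /* allowed: a character
                                                    type may alias anything */
    (void)first;

    char buf[16] = {1, 2, 3, 4};
    /* int bad = *(int *)buf;   not among the options in 6.5p7 (and possibly
                                misaligned as well)                          */
    int ok;
    memcpy(&ok, buf, sizeof ok);   /* the portable way to reuse the bytes */
    (void)ok;
}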

....
In all of these tests the word-based function far outperforms the library
function - although as I have noted, the endian-ness causes errors.

You can make the code insensitive to byte order by using "word" ops to
find the first word that differs, then using byte ops to find the first
byte in the word that differs.
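
A minimal sketch of that scheme, assuming nothing about byte order (the
function name and the use of unsigned long words are my choices); it loads
each word with memcpy, which also keeps it within the aliasing and
alignment rules quoted above:

#include <stddef.h>
#include <string.h>

static int word_then_byte_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *ua = a;
    const unsigned char *ub = b;
    unsigned long wa, wb;

    /* Compare a word at a time until a difference (or the tail) is found. */
    while (n >= sizeof wa) {
        memcpy(&wa, ua, sizeof wa);
        memcpy(&wb, ub, sizeof wb);
        if (wa != wb)
            break;              /* fall through to the byte loop below */
        ua += sizeof wa;
        ub += sizeof wb;
        n -= sizeof wa;
    }

    /* Byte by byte: finds the first differing byte regardless of
       endianness, and also handles any tail shorter than a word. */
    while (n--) {
        if (*ua != *ub)
            return (*ua > *ub) - (*ua < *ub);
        ua++;
        ub++;
    }
    return 0;
}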

Lawrence
 

Netocrat

You could download a draft version of the standard or buy a good book. Of
course buying a book implies spending money and isn't guaranteed to be
accurate or complete. Drafts are also not going to be 100% accurate but at
least are based on the actual text of the standard.

I'm now using on-line drafts of both C90 and C99 as well as the ANSI
Rationale as posted in another thread by various people. So occasionally
it'll be inaccurate, but for the basics I should be fine, particularly
where the two drafts concur.
The aliasing rules I referred to are in C99 6.5p7:

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:73)
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the
object,
- a type that is the signed or unsigned type corresponding to the
effective type of the object,
- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
- a character type.

Specifically you can access any type of object through an lvalue of
character type, but the reverse is not true - accessing a character, or an
array of characters, through an lvalue of non-character type is not one of
the permitted options. C99 introduces the concept of "effective type" as
used above. This supports the use of non-character objects in malloc'd
memory.

What do you mean by your last statement - how do you define non-character
objects?
You can make the code insensitive to byte order by using "word" ops to
find the first word that differs, then using byte ops to find the first
byte in the word that differs.

It's always the simple solutions that are the most elusive. I had no need
to resort to assembly after all. Anyhow, the excursion was informative.

I now have code that does effectively what you suggested, except that
should it find a word that differs, it uses an assembly opcode to reverse
the byte order and test again. I haven't checked whether this is any
faster than simply reverting to byte ops, as you suggested, in that
case. It may even vary from processor to processor - the array of Intel
chips supporting this opcode is vast.
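
For what it's worth, the byte-swap step can also be written portably; here
is a hypothetical sketch (the shift-and-mask function below does in C what
an instruction such as Intel's bswap does in a single opcode):

#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t byte_reverse32(uint32_t w)
{
    return ((w & 0x000000ffu) << 24) |
           ((w & 0x0000ff00u) <<  8) |
           ((w & 0x00ff0000u) >>  8) |
           ((w & 0xff000000u) >> 24);
}

/* On a little-endian machine, comparing the reversed words orders them by
   their lowest-addressed byte first, which is what memcmp requires. */
static int compare_words_little_endian(uint32_t wa, uint32_t wb)
{
    uint32_t ra = byte_reverse32(wa);
    uint32_t rb = byte_reverse32(wb);
    return (ra > rb) - (ra < rb);
}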
 

Netocrat

Netocrat wrote:

[stats throughout 3 different posts demonstrating that glibc's memcmp
function on Intel is much slower than a 'word-based' compare function]

It's off-topic but I want to correct the record:

I have discovered that when optimisation is turned on in my version of
gcc, memcmp performance roughly halves. This can be avoided by
accessing memcmp through a function pointer - it's not enough to write
(&memcmp) - you actually have to use a separate variable. The
statistics I provided were based on an optimised compilation without
accessing memcmp through a function pointer, so glibc's memcmp
performance was adversely affected.
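
A minimal sketch of that function-pointer workaround, assuming gcc; the
volatile qualifier is my own extra precaution so the optimiser cannot see
through the pointer and substitute its builtin:

#include <string.h>

/* Calling through a separate (here volatile) pointer variable keeps the
   compiler from replacing the call with its inline builtin expansion, so
   the real library memcmp is what gets timed. */
static int (*volatile memcmp_ptr)(const void *, const void *, size_t) = memcmp;

int compare_via_library(const void *a, const void *b, size_t n)
{
    return memcmp_ptr(a, b, n);
}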

Without optimisations the word-based memcmp function is only faster for
the cases where the number of initially equal bytes is 1, 3, 5 or 7.

With optimisations the word-based memcmp function is always
significantly faster than glibc's memcmp accessed directly. When
glibc's memcmp is accessed through a function pointer though, the
word-based memcmp is faster only - and only very slightly - for most of
the cases where the number of initially equal bytes is less than 20.
Thereafter glibc's memcmp is increasingly faster.
 

Dave Thompson

On 11 Jul 2005 03:01:13 -0700, "Netocrat" <[email protected]>
wrote:
It's off-topic but I want to correct the record:

I have discovered that when optimisation is turned on in my version of
gcc, memcmp performance roughly halves. This can be avoided by
accessing memcmp through a function pointer - it's not enough to write
(&memcmp) - you actually have to use a separate variable. <snip>

Multiposted and answered in comp.lang.asm.x86 where it is ontopic --
at least in my opinion and it has been passing moderation.

The standard-C part is that an implementation (compiler) may generate
inline code instead of actually calling any standard library function,
and IME gcc-x86 does for memcmp() with -O >0 and not -fno-builtin.
Although the inlined version is AFAICT always repe cmpsb, which is the
same as in the (out-of-line) library (glibc) source Netocrat found.

- David.Thompson1 at worldnet.att.net
 
