comparing two strcasecmp (stricmp) implementations

Randy Howard

Ben Pfaff wrote:
[case-insensitive strcmp-like function]
I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

strcmp() isn't specified so strictly. You can't depend on it
returning exactly -1 or 1. Here's what the standard says:

3 The strcmp function returns an integer greater than, equal to,
or less than zero, accordingly as the string pointed to by
s1 is greater than, equal to, or less than the string
pointed to by s2.

And what 'harm' does it do, relative to the standard, to
restrict it in such a fashion? Are there programs that depend
upon larger and smaller values being returned?
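
As a rough illustration of the point under discussion: callers that look only at the sign of the result work with either convention. The helper names `sign_of` and `comes_before` below are made up for this sketch and appear nowhere in the thread.

```c
#include <string.h>

/* Hypothetical helper: collapse any strcmp-style result to exactly
   -1, 0, or 1. */
static int sign_of(int r)
{
    return (r > 0) - (r < 0);
}

/* Portable callers test only the sign of the result, never its size. */
static int comes_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

A comparison function that always returns -1 / 0 / 1 still satisfies such callers; code that compares the result to exactly 1 or -1 is what the quoted paragraph does not support for strcmp() itself.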
 
Chris McDonald

Randy Howard said:
Ben Pfaff wrote:
[case-insensitive strcmp-like function]
I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

strcmp() isn't specified so strictly. You can't depend on it
returning exactly -1 or 1. Here's what the standard says:

3 The strcmp function returns an integer greater than, equal to,
or less than zero, accordingly as the string pointed to by
s1 is greater than, equal to, or less than the string
pointed to by s2.

And what 'harm' does it do, relative to the standard, to
restrict it in such a fashion? Are there programs that depend
upon larger and smaller values being returned?


Surely it's not any standard's responsibility to protect programs which
make incorrect assumptions about their working within that standard's
environment?
 
pete

Jordan Abel wrote:
strcmp("foo","bar") returns 4 on my system, and probably yours. Only a
positive value is required by the standard in that case.

No.

It might be required according to every character set
that you ever heard of,
but that result isn't required by the standard.

The values of 'f' and 'b' are implementation defined,
and there's nothing in the standard requiring either
one to be greater.
 
Jordan Abel

No.

It might be required according to every character set
that you ever heard of,
but that result isn't required by the standard.

The values of 'f' and 'b' are implementation defined,
and there's nothing in the standard requiring either
one to be greater.

...that's nitpicking. I meant that nothing is required of the result but
the sign in general.
 
Eric Sosman

pete said:
I'm seeing "interpreted as unsigned char"
in the above quote from the standard.
*(unsigned char *)byte is what "interpreted as" means.
The standard isn't shy about using the word "converted".

I do not subscribe to the idea that the Standard's
description of a Standard library function should govern
the design of non-Standard functions.

Besides, you haven't addressed the issue of the
argument values for <ctype.h> functions. Shy, schmy.
 
pete

I do not subscribe to the idea that the Standard's
description of a Standard library function should govern
the design of non-Standard functions.

Then what difference does it make to you
whether or not the Standard is entirely clear
about what should be done with negative `char' values?

Besides, you haven't addressed the issue of the
argument values for <ctype.h> functions. Shy, schmy.

It's all here:

*(const unsigned char *)s1 is the value of the byte.
That value is the argument to toupper.
I don't see why anything else would be better.

#include <ctype.h>

int str_ccmp(const char *s1, const char *s2)
{
    /* Read each byte as unsigned char so toupper gets a valid argument. */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (toupper(*p1) == toupper(*p2)) {
        if (*p1 == '\0') {
            return 0;
        }
        ++p1;
        ++p2;
    }
    return toupper(*p2) > toupper(*p1) ? -1 : 1;
}
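
To isolate the one step being argued about, here is a minimal sketch of reading a byte through an unsigned char pointer before handing it to toupper. The name `upper_byte` is hypothetical and only serves this illustration.

```c
#include <ctype.h>

/* Hypothetical helper: upper-case the byte that p points at.
   The byte is read as unsigned char, so the argument is always in
   0..UCHAR_MAX, the range the <ctype.h> functions accept (besides EOF),
   even where plain char is signed. */
static int upper_byte(const char *p)
{
    return toupper(*(const unsigned char *)p);
}
```

Passing a plain char with a negative value straight to toupper would be undefined behaviour unless the value happens to equal EOF, which is why the pointer conversion (or an equivalent cast) comes first.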
 
pete

Jordan said:
...that's nitpicking.
I meant that nothing is required of the result but
the sign in general.

strcmp("foo","bar") is not required by
the standard to return a positive value.

You mean only that strcmp("foo","bar") should not return zero?
 
Jordan Abel

strcmp("foo","bar") is not required by
the standard to return a positive value.

You mean only that strcmp("foo","bar") should not return zero?

I meant "If 'f' > 'b', then strcmp("foo","bar") is required to return a
positive number." And I thought that was perfectly clear.
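
The conditional form of that statement can be written down as a check. This is only a sketch; the function name `foo_bar_consistent` is invented for the example.

```c
#include <string.h>

/* Returns 1 when strcmp("foo", "bar") has the sign implied by how this
   implementation orders 'f' and 'b'. Both are members of the basic
   execution character set, so their values are non-negative and compare
   the same way as char or as unsigned char. */
static int foo_bar_consistent(void)
{
    int r = strcmp("foo", "bar");

    return ('f' > 'b') ? (r > 0) : (r < 0);
}
```

Nothing here depends on the magnitude of the result, only on its sign relative to the character ordering.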
 
Randy Howard

Chris McDonald wrote:
Randy Howard said:
Ben Pfaff wrote:
[case-insensitive strcmp-like function]

I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

strcmp() isn't specified so strictly. You can't depend on it
returning exactly -1 or 1. Here's what the standard says:

3 The strcmp function returns an integer greater than, equal to,
or less than zero, accordingly as the string pointed to by
s1 is greater than, equal to, or less than the string
pointed to by s2.

And what 'harm' does it do, relative to the standard, to
restrict it in such a fashion? Are there programs that depend
upon larger and smaller values being returned?


Surely it's not any standard's responsibility to protect programs which
make incorrect assumptions about their working within that standard's
environment?

I didn't say or imply that it was. As such, I don't understand
your question. All I said was that returning only three values
instead of a much larger range didn't seem like a problem from
here.
 
Jack Klein

I'm currently evaluating two implementations of a case insensitive
string comparison function to replace the non-ANSI stricmp(). Both of
the implementations below seem to work fine but I'm wondering if one is
better than the other or if there is some sort of hybrid of the two
that would be superior.


IMPLEMENTATION 1:

#ifndef HAVE_STRCASECMP
#define ccmp(a,b) ((a) == (b) ? 0 : ((a) > (b) ? 1 : -1))
int strcasecmp(unsigned char *s1, unsigned char *s2)

[snip]

You have quite a few useful replies, but I notice one thing nobody has
pointed out.

Since you seem to be concerned with "ANSI-ness", you should not be
defining external or even file scope identifiers that start with "str"
followed by a lower case letter, as they are reserved for the
implementation.

Consider 'str_casecmp'.
 
Chris McDonald

Randy Howard said:
Chris McDonald wrote:
I didn't say or imply that it was. As such, I don't understand
your question. All I said was that returning only three values
instead of a much larger range didn't seem like a problem from
here.

OK, sorry, we both agree
(without intonation, flat ASCII replies can get interpreted either way).
 
Tim Rentsch

William Krick said:
One final revision. I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

int str_ccmp( const char *s1, const char *s2 )
{
    int c1, c2;
    for(;;)
    {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
        if (c1 == 0 || c1 != c2)
            return c1 == c2 ? 0 : c1 > c2 ? 1 : -1;
    }
}

If the tests are written differently, the return values
might be somewhat clearer:

int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;
        if( c1 == 0 ) return 0;
    }
}


Incidentally, please use spaces rather than tabs when
posting to this newsgroup.
 
Michael Mair

Jordan said:
I meant "If 'f' > 'b', then strcmp("foo","bar") is required to return a
positive number." And I thought that was perfectly clear.

<commonsense>Yes, of course</commonsense>
<clc pedantic="true">No, you did not state it that way</clc>

I suggest you try to be a little more precise even in the
obvious cases and ignore or welcome the nits in the others.
The nits then tend to come as additional comments and not
outright contradiction... :)

Cheers
Michael
 
Jordan Abel

I'm currently evaluating two implementations of a case insensitive
string comparison function to replace the non-ANSI stricmp(). Both of
the implementations below seem to work fine but I'm wondering if one is
better than the other or if there is some sort of hybrid of the two
that would be superior.


IMPLEMENTATION 1:

#ifndef HAVE_STRCASECMP
#define ccmp(a,b) ((a) == (b) ? 0 : ((a) > (b) ? 1 : -1))
int strcasecmp(unsigned char *s1, unsigned char *s2)

[snip]

You have quite a few useful replies, but I notice one thing nobody has
pointed out.

Since you seem to be concerned with "ANSI-ness", you should not be
defining external or even file scope identifiers that start with "str"
followed by a lower case letter, as they are reserved for the
implementation.

Consider 'str_casecmp'.

The HAVE_STRCASECMP macro presumably comes from a compile-time check on
whether the function exists. Thus his code will only be reached on an
environment which has already been determined to be non-conforming.
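
One way to combine the two points, offered only as a sketch: keep the HAVE_STRCASECMP check, but give the wrapper a name outside the reserved "str" + lower-case-letter space, as Jack Klein suggested. It assumes HAVE_STRCASECMP is set by a configure-style check and that, where it is set, the POSIX header <strings.h> declares strcasecmp.

```c
#include <ctype.h>

#ifdef HAVE_STRCASECMP
#include <strings.h>            /* POSIX declares strcasecmp here */
#define str_casecmp strcasecmp  /* use the platform's function */
#else
/* Fallback under a non-reserved name; returns -1, 0, or 1. */
static int str_casecmp(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        c1 = tolower((unsigned char)*s1++);
        c2 = tolower((unsigned char)*s2++);
    } while (c1 == c2 && c1 != 0);
    return (c1 > c2) - (c1 < c2);
}
#endif
```

Callers then use str_casecmp() everywhere and never define an identifier that the implementation might claim.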
 
Chris Dollin

Skarmander said:
I'd prefer

assert(s1 && s2);

Well, it's a preference. It's not a very good guard, because if you
compile with NDEBUG it evaporates. I'm also not keen on programs that
can do an unavoidable BOOM, but this is C, so we're sort of stuck with
it.

Or a (documented) redefinition of the semantics (e.g., treat 0 as "").

Yes, you can do that. But I'd be very wary, because I think that a null
pointer should not be accepted as a string, and that redefinition of
the semantics licences it.

assert( s1 && s2 );
if (s1 == 0 || s2 == 0) yourLogFunction
    ( "A null pointer was passed to strcasecmp. Don't /do/ that. "
      "It has been treated as the empty string. Don't rely on this." );
if (s1 == 0) s1 = "";
if (s2 == 0) s2 = "";

You could use WHATEVERYOUWANT if you document either it or the fact that
passing null pointers will yield an indeterminate value. Don't keep it
under the hood, in any case.

Oh, indeed; explicitness about the semantics (including "is undefined")
is desirable.
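
For concreteness, a sketch of the "documented redefinition" option as a complete function. The name `str_casecmp_nullsafe` is hypothetical, and a message on stderr stands in for yourLogFunction. The assert is left out so that the behaviour is the same with and without NDEBUG.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical name. Documented behaviour: a null pointer is reported on
   stderr and then treated as the empty string. Returns -1, 0, or 1. */
static int str_casecmp_nullsafe(const char *s1, const char *s2)
{
    int c1, c2;

    if (s1 == NULL || s2 == NULL)
        fputs("str_casecmp_nullsafe: null pointer treated as \"\"\n", stderr);
    if (s1 == NULL) s1 = "";
    if (s2 == NULL) s2 = "";

    do {
        c1 = tolower((unsigned char)*s1++);
        c2 = tolower((unsigned char)*s2++);
    } while (c1 == c2 && c1 != 0);
    return (c1 > c2) - (c1 < c2);
}
```

Whether accepting null pointers at all is a good idea is exactly the design question raised above; the sketch only shows what the explicit version looks like.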
 
pete

Peter said:
Some previous queries by myself on the issue...

For negative int value arguments,
simple conversion to unsigned char seems appropriate,
for ctype functions.

The output functions, like putchar,
output their int value argument,
as though it were converted to unsigned char.

If you have a negative int value like:

#define NEG_A ('A' - 1 - (unsigned char)-1)

then,

putchar(NEG_A);

is going to output the 'A' character and return a value of 'A'.

tolower((unsigned char)NEG_A);

will return 'a'.
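
The arithmetic can be checked mechanically. The sketch below assumes the usual case where unsigned char promotes to int and the "C" locale is in effect; the function name `neg_a_claims_hold` is made up for the example.

```c
#include <ctype.h>

#define NEG_A ('A' - 1 - (unsigned char)-1)

/* Returns 1 if NEG_A is negative, converts back to 'A' as unsigned char,
   and lower-cases to 'a' after that conversion. */
static int neg_a_claims_hold(void)
{
    return NEG_A < 0
        && (unsigned char)NEG_A == 'A'
        && tolower((unsigned char)NEG_A) == 'a';
}
```

NEG_A is 'A' minus (UCHAR_MAX + 1), so converting it to unsigned char adds UCHAR_MAX + 1 back and lands on 'A' again.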
 
William Krick

Tim said:
If the tests are written differently, the return values
might be somewhat clearer:

int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;
        if( c1 == 0 ) return 0;
    }
}


Incidentally, please use spaces rather than tabs when
posting to this newsgroup.


Actually, when you cleaned up the return conditions, you left out some
of the conditions and broke the code. I've added them back in but I'm
sure it could still be simplified...

int str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if (c1 != c2 || c1 == 0) {
            if( c1 > c2 ) return 1;
            if( c1 < c2 ) return -1;
            if( c1 == 0 ) return 0;
        }
    }
}
 
William Krick

Tim said:
If the tests are written differently, the return values
might be somewhat clearer:

int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;
        if( c1 == 0 ) return 0;
    }
}


Incidentally, please use spaces rather than tabs when
posting to this newsgroup.

Actually, when you cleaned up the return conditions, you left some of
them out and broke the code.

Here's an improved version using do-while as jokingly (I think)
suggested by Skarmander earlier in this thread...

int str_ccmp( const char *s1, const char *s2 ) {
    int c1, c2;
    do {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
    } while (c1 == c2 && c1 != 0);
    return c1 - c2;
}
 
pete

William Krick wrote:
Actually, when you cleaned up the return conditions, you left out some
of the conditions and broke the code. I've added them back in but I'm
sure it could still be simplified...

int str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if (c1 != c2 || c1 == 0) {
            if( c1 > c2 ) return 1;
            if( c1 < c2 ) return -1;
            if( c1 == 0 ) return 0;
        }
    }
}

int str_ccmp( const char *s1, const char *s2 )
{
    int c1, c2;

    do {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
    } while (c1 == c2 && c1 != 0);
    return c2 > c1 ? -1 : c1 > c2;
}
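
Since several variants have been posted, a small harness can run two of them over the same inputs and count disagreements. This is a sketch only: the function names and the word list are invented for the example, and the two bodies are copied from the three-test loop and the do-while form shown earlier in the thread.

```c
#include <ctype.h>
#include <stddef.h>

/* The three-test loop, as posted by Tim Rentsch. */
static int ccmp_tests(const char *s1, const char *s2)
{
    int c1, c2;

    for (;; s1++, s2++) {
        c1 = tolower((unsigned char)*s1);
        c2 = tolower((unsigned char)*s2);

        if (c1 > c2) return 1;
        if (c1 < c2) return -1;
        if (c1 == 0) return 0;
    }
}

/* The do-while form with the -1 / 0 / 1 return. */
static int ccmp_dowhile(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        c1 = tolower((unsigned char)*s1++);
        c2 = tolower((unsigned char)*s2++);
    } while (c1 == c2 && c1 != 0);
    return c2 > c1 ? -1 : c1 > c2;
}

/* Hypothetical harness: compare both functions on every ordered pair of
   sample words and return the number of pairs where they differ. */
static int count_mismatches(void)
{
    static const char *const words[] = {
        "", "a", "A", "ab", "AB", "abc", "b", "Foo", "foo", "bar"
    };
    const size_t n = sizeof words / sizeof words[0];
    size_t i, j;
    int bad = 0;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (ccmp_tests(words[i], words[j])
                    != ccmp_dowhile(words[i], words[j]))
                ++bad;
    return bad;
}
```

Calling count_mismatches() from a test program gives a quick signal on whether a rewrite changed behaviour; extending the word list with bytes above 127 would also exercise the unsigned char casts.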
 
