comparing two strcasecmp (stricmp) implementations

pete

William Krick wrote:
Here's an improved version using do-while as jokingly (I think)
suggested by Skarmander earlier in this thread...

#include <ctype.h>

/* Case-insensitive strcmp: returns <0, 0 or >0. */
int str_ccmp( const char *s1, const char *s2 ) {
    int c1, c2;
    do {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
    } while (c1 == c2 && c1 != 0);
    return c1 - c2;
}

I got another one for you.

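#include <ctype.h>
#include <stddef.h>

int str_cncmp(const char *s1, const char *s2, size_t n)
{
    int c1, c2;

    if (n != 0) {
        do {
            c1 = tolower( (unsigned char) *s1++ );
            c2 = tolower( (unsigned char) *s2++ );
        } while (c1 == c2 && c1 != '\0' && --n != 0);
    }
    /* n is nonzero here only if the loop stopped on a mismatch or at the
       terminating null; when n is zero, c1 and c2 are never read. */
    return n != 0 ? c1 - c2 : 0;
}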
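For comparison, here is a sketch of the same bounded comparison written in the for-loop style Tim uses further down the thread. The name `ci_ncmp` is hypothetical. Because `n` is tested before any character is read, there is no path on which c1 or c2 is used uninitialized, and the result is always -1, 0 or 1.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounded case-insensitive compare in the for-loop style:
   examines at most n characters and returns -1, 0 or 1. */
int ci_ncmp(const char *s1, const char *s2, size_t n)
{
    for ( ; n != 0; n--, s1++, s2++) {
        int c1 = tolower((unsigned char) *s1);
        int c2 = tolower((unsigned char) *s2);

        if (c1 > c2) return 1;
        if (c1 < c2) return -1;
        if (c1 == 0) return 0;   /* both strings ended together */
    }
    return 0;                    /* the first n characters matched */
}
```

With this sketch, `ci_ncmp("abcX", "ABCy", 3)` returns 0, while `ci_ncmp("abcX", "ABCy", 4)` returns -1 because 'x' sorts before 'y'.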
 
pete

Eric Sosman wrote:
I do not subscribe to the idea that the Standard's
description of a Standard library function should govern
the design of non-Standard functions.

Besides, you haven't addressed the issue of the
argument values for <ctype.h> functions. Shy, schmy.

Upon further consideration, I think you made some good points.
 
Tim Rentsch

William Krick said:
Tim said:
If the tests are written differently, the return values
might be somewhat clearer:

#include <ctype.h>

int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;
        if( c1 == 0 ) return 0;
    }
}

Actually, when you cleaned up the return conditions, you left out some
of the conditions and broke the code. [snip]

If you look again I think you'll see that the posted function
does indeed work properly. Here is the same function with some
assertions added -- see if you agree.


#include <assert.h>
#include <ctype.h>

int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;

        assert( c1 == c2 );

        if( c1 == 0 ) return 0;

        assert( c1 == c2 && c1 != 0 );
    }
}

Note that each of the 'return' statements is executed only if the
condition 'c1 != c2 || c1 == 0' is true, because of the 'if'
tests; it's not necessary to test for it separately.
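The same claim can be checked mechanically. This sketch places Tim's three-test loop beside the do-while version posted earlier and checks that the two agree in sign on a handful of sample pairs, since the sign is all they are meant to agree on. The names `ccmp_for`, `ccmp_do`, `sign_of` and `formulations_agree` are hypothetical, and a small sample set is evidence rather than proof.

```c
#include <ctype.h>
#include <stddef.h>

/* Tim's formulation: up to three tests per character. */
int ccmp_for(const char *s1, const char *s2)
{
    int c1, c2;

    for ( ; ; s1++, s2++) {
        c1 = tolower((unsigned char) *s1);
        c2 = tolower((unsigned char) *s2);

        if (c1 > c2) return 1;
        if (c1 < c2) return -1;
        if (c1 == 0) return 0;
    }
}

/* The do-while formulation from earlier in the thread (returns c1 - c2). */
int ccmp_do(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != 0);
    return c1 - c2;
}

/* Collapse any int to -1, 0 or 1. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 if the two functions agree in sign on every pair of samples. */
int formulations_agree(void)
{
    static const char *const samples[] = {
        "", "a", "A", "ab", "AB", "abc", "abd", "ABD", "Hello", "hELLO", "b"
    };
    const size_t count = sizeof samples / sizeof samples[0];
    size_t i, j;

    for (i = 0; i < count; i++)
        for (j = 0; j < count; j++)
            if (sign_of(ccmp_for(samples[i], samples[j]))
                    != sign_of(ccmp_do(samples[i], samples[j])))
                return 0;
    return 1;
}
```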
 
Michael Wojcik

I think Richard Tobin was right when he said...
"Since when were the str* functions supposed to handle NULL?"

I shouldn't be trying to handle null pointers.

That's debatable. The standard str* functions aren't required to do
anything sensible with null arguments (whether the actual parameter
is NULL or any other form of a null pointer value), but that doesn't
mean you must refrain from doing so as well. There may be
justifications that apply to the standard functions (such as pre-standard
implementations and implementation-specific performance tricks) which
don't apply to yours.

All the functions *I* write which take a pointer parameter (except
for a very few special cases) handle null pointers in some fashion,
either by explicitly permitting them in the interface or by returning
an error.

For a function like your "strcasecmp" (note that this name is
reserved to the implementation, and you should not use it), I can see
a few possibilities. One already suggested is to treat null pointers
as if they were pointers to empty strings. A small variation on that
is to consider null pointers equal, but less than any string,
including an empty string. Another is to treat them similarly to
NaNs: null pointers never compare equal to anything, including null
pointers, for the purpose of your function. You'll have to decide
what nonzero value to return in that case, of course. (Since you
have int as the return type, you might want to use something like
INT_MAX as the return value, to distinguish it from likely values for
any legitimate comparison.)
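Taking the second option above as an example (null pointers equal to each other and less than any string, including the empty string), a wrapper might look like the following. The name `ci_cmp_null` and the choice of policy are illustrative assumptions, not something the thread settled on.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrapper: two null pointers compare equal, and a null
   pointer compares less than any string, even "". Returns -1, 0 or 1. */
int ci_cmp_null(const char *s1, const char *s2)
{
    int c1, c2;

    if (s1 == NULL || s2 == NULL) {
        if (s1 == s2) return 0;        /* both null */
        return s1 == NULL ? -1 : 1;    /* null sorts first */
    }
    do {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != 0);
    return (c1 > c2) - (c1 < c2);
}
```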

Variations on this topic have been rehashed many times over the years
here, and there are those who feel that violating the interface of a
user-defined function ought to stop a program in its tracks. Even if
I agreed (and I generally do not, for programs of any complexity), I
wouldn't rely on undefined behavior to do that - because, being
undefined, it can't be trusted to do so. If you want to abort the
program if strcasecmp gets a null pointer, that's what abort() is
for.
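A sketch of that explicit approach, assuming the desired policy is to stop with a diagnostic; the name `ci_cmp_checked` is hypothetical. Unlike assert(), which disappears when NDEBUG is defined, the abort() call here is always compiled in.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: stop the program deliberately on a null argument instead
   of dereferencing it and relying on undefined behavior to crash. */
int ci_cmp_checked(const char *s1, const char *s2)
{
    int c1, c2;

    if (s1 == NULL || s2 == NULL) {
        fputs("ci_cmp_checked: null pointer argument\n", stderr);
        abort();
    }
    do {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != 0);
    return (c1 > c2) - (c1 < c2);
}
```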
 
Jordan Abel

That's debatable. The standard str* functions aren't required to do
anything sensible with null arguments (whether the actual parameter
is NULL or any other form of a null pointer value), but that doesn't
mean you must refrain from doing so as well. There may be
justifications that apply to the standard functions (such as pre-standard
implementations and implementation-specific performance tricks) which
don't apply to yours.

It appears that he's writing a substitute intended as a drop-in for
strcasecmp on systems that don't provide it.

All the functions *I* write which take a pointer parameter (except
for a very few special cases) handle null pointers in some fashion,
either by explicitly permitting them in the interface or by returning
an error.

For a function like your "strcasecmp" (note that this name is
reserved to the implementation, and you should not use it)

which is why it's #ifndef'd to only compile on systems that don't
provide it.
 
Keith Thompson

Ben Pfaff said:
[case-insensitive strcmp-like function]
I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

strcmp() isn't specified so strictly. You can't depend on it
returning exactly -1 or 1. Here's what the standard says:

3 The strcmp function returns an integer greater than, equal to,
or less than zero, accordingly as the string pointed to by
s1 is greater than, equal to, or less than the string
pointed to by s2.

Which means that restricting the returned values to -1 / 0 / 1 is both
perfectly legal and perfectly unnecessary. If you've gone to any
extra effort to restrict the result to those values, I suggest that
that effort is wasted.
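In practice this means callers should look only at the sign of the result. For example, a qsort() comparator can pass a c1 - c2 style result straight through, because qsort() only cares whether it is negative, zero or positive. A sketch, with hypothetical names `ci_cmp`, `cmp_ci_ptr` and `sort_ci`:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Case-insensitive compare returning c1 - c2: only its sign is meaningful. */
int ci_cmp(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != 0);
    return c1 - c2;
}

/* qsort() hands the comparator pointers to array elements (const char **). */
int cmp_ci_ptr(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;

    return ci_cmp(*pa, *pb);   /* passed through unchanged; qsort needs only the sign */
}

/* Sort an array of strings without regard to case. */
void sort_ci(const char **words, size_t count)
{
    qsort(words, count, sizeof words[0], cmp_ci_ptr);
}
```

Writing `ci_cmp(a, b) == -1` where `ci_cmp(a, b) < 0` is meant would be the mistake to avoid with a function like this.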

On the other hand, there might be some issues on systems where
sizeof(int)==1 (and, by implication, CHAR_BIT>=16). I'm too lazy to
track down the details.
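Keith doesn't spell the issue out. One plausible concern is the subtraction: when UCHAR_MAX exceeds INT_MAX, converting a large unsigned char value to int is implementation-defined and may yield a negative number, at which point c1 - c2 can overflow. The sketch below (hypothetical name `ci_cmp_sign`) avoids the subtraction by building the result from two comparisons. It does not settle how such values order, which still depends on the implementation's conversion, nor the other questions such systems raise for tolower().

```c
#include <ctype.h>

/* Hypothetical variant: the result comes from two comparisons, so no
   subtraction that could overflow is ever performed. */
int ci_cmp_sign(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != 0);
    return (c1 > c2) - (c1 < c2);   /* -1, 0 or 1 */
}
```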
 
Keith Thompson

Jordan Abel said:
The HAVE_STRCASECMP macro presumably comes from a compile-time check on
whether the function exists. Thus his code will only be reached in an
environment which has already been determined to be non-conforming.

Non-conforming to what? Standard C doesn't define a strcasecmp()
function <OT>(though POSIX does)</OT>.
 
William Krick

Tim said:
William Krick said:
Tim said:
One final revision. I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

#include <ctype.h>

int str_ccmp( const char *s1, const char *s2 )
{
    int c1, c2;
    for(;;)
    {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
        if (c1 == 0 || c1 != c2)
            return c1 == c2 ? 0 : c1 > c2 ? 1 : -1;
    }
}

If the tests are written differently, the return values
might be somewhat clearer:

int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;
        if( c1 == 0 ) return 0;
    }
}

Actually, when you cleaned up the return conditions, you left out some
of the conditions and broke the code. [snip]

If you look again I think you'll see that the posted function
does indeed work properly. Here is the same function with some
assertions added -- see if you agree.


int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;

        assert( c1 == c2 );

        if( c1 == 0 ) return 0;

        assert( c1 == c2 && c1 != 0 );
    }
}

Note that each of the 'return' statements is executed only if the
condition 'c1 != c2 || c1 == 0' is true, because of the 'if'
tests; it's not necessary to test for it separately.


I'll be damned. You're right. My bad.

Even though that is clearer, it would probably be a little slower,
since there are three comparisons on each pass through the loop versus
two in this version that Pete posted...

#include <ctype.h>

int str_ccmp( const char *s1, const char *s2 )
{
    int c1, c2;
    do {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
    } while (c1 == c2 && c1 != 0);
    return c2 > c1 ? -1 : c1 > c2;   /* -1, 0 or 1 */
}
 
pete

William said:
Tim said:
of the conditions and broke the code. [snip]

If you look again I think you'll see that the posted function
does indeed work properly. Here is the same function with some
assertions added -- see if you agree.


int
str_ccmp( const char *s1, const char *s2 ){
    int c1, c2;

    for( ; ; s1++, s2++ ){
        c1 = tolower( (unsigned char) *s1 );
        c2 = tolower( (unsigned char) *s2 );

        if( c1 > c2 ) return 1;
        if( c1 < c2 ) return -1;

        assert( c1 == c2 );

        if( c1 == 0 ) return 0;

        assert( c1 == c2 && c1 != 0 );
    }
}

Note that each of the 'return' statements is executed only if the
condition 'c1 != c2 || c1 == 0' is true, because of the 'if'
tests; it's not necessary to test for it separately.

I'll be damned. You're right. My bad.

Even though that is clearer,

Is it really clearer?
That's not the first thing that would pop into my head
after getting an explanation of how I read the code wrong.
 
Michael Wojcik

It appears that he's writing a substitute intended as a drop-in for
strcasecmp on systems that don't provide it.

Yes, but I don't see that that makes any difference to anything I wrote.

which is why it's #ifndef'd to only compile on systems that don't
provide it.

So what? That identifier is still reserved to the implementation.
Perhaps the next revision of the implementation for that platform
*will* include it. Perhaps that implementation uses it for some
other purpose - it's not required to document that.

In any event, there's little to be gained by adding a nonstandard
function with a reserved name only on platforms where it doesn't
already exist. Write the function, give it a legal name, and use it
everywhere. Then there's no need to check whether it's already
defined on each platform, and no need for conditional compilation.
The code is cleaner and portable.
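Following that advice, a sketch: one definition under a name outside the reserved space, compiled on every platform with no HAVE_STRCASECMP test. Judging by the future-library-directions text quoted later in the thread, which reserves names beginning with `str` followed by a lowercase letter, `str_ccmp` should fall outside that space because `_` is not a lowercase letter; treat that reading as an assumption to check against the standard you target.

```c
#include <ctype.h>

/* Declared once (e.g. in a project header) and used on every platform;
   no HAVE_STRCASECMP test or conditional compilation is involved. */
int str_ccmp(const char *s1, const char *s2);

int str_ccmp(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != 0);
    return c2 > c1 ? -1 : c1 > c2;   /* -1, 0 or 1 */
}
```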

The only possible justification for using a version supplied with the
implementation is performance, and if case-insensitive string
comparisons are in a performance-critical code section, that suggests
a review of the design might be in order.
 
Jordan Abel

Yes, but I don't see that makes any difference to anything I wrote.


So what? That identifier is still reserved to the implementation.

Only as function names in stdlib.h and string.h; they are not reserved
for any other purpose, and if it has been determined that the headers
in question don't contain that function name...

Cites: C89:
4.13.7
Function names that begin with str and a lower-case letter
(followed by any combination of digits, letters and underscore) may be
added to the declarations in the <stdlib.h> header.
4.13.8
Function names that begin with str, mem, or wcs and a lower-case
letter (followed by any combination of digits, letters and underscore)
may be added to the declarations in the <string.h> header.

Michael Wojcik said:
Perhaps the next revision of the implementation for that platform
*will* include it.

At which point HAVE_STRCASECMP will presumably test true, since these
are generated by seeing if a test program trying to use it successfully
translates.
Perhaps that implementation uses it for some other purpose - it's not
required to document that.

It's not allowed to do that. The language of the standard only permits
implementations to use it as a function name, and only in stdlib.h or
string.h. Nowhere else and for no other purpose.
 
pete

Jordan said:
and if it has been determined that the headers in
question don't contain that function name...

Where are you reading that that has anything to do with anything?
 
Jordan Abel

Where are you reading that that has anything to do with anything?

I'm saying that if the implementation does not use the identifier for
either of the two purposes it has been reserved for [as determined at
compile-time], then the program is free to use it.
 
pete

Jordan said:
Where are you reading that that has anything to do with anything?

I'm saying that if the implementation does not use the identifier for
either of the two purposes it has been reserved for [as determined at
compile-time], then the program is free to use it.

Yes.
I'm saying, how do you figure that's true?
 
Jordan Abel

Jordan said:
Jordan Abel wrote:

and if it has been determined that the headers in question don't
contain that function name...

Where are you reading that that has anything to do with anything?

I'm saying that if the implementation does not use the identifier for
either of the two purposes it has been reserved for [as determined at
compile-time], then the program is free to use it.

Yes.
I'm saying, how do you figure that's true?

Because it's not a generally reserved identifier, only one that an
implementation is free to use for a specific purpose. If it is
determined that it does not use it for that purpose, where does the
standard say the user is forbidden to use it?
 
pete

pete said:
Jordan said:
Jordan Abel wrote:

and if it has been determined that the headers in
question don't contain that function name...

Where are you reading that that has anything to do with anything?

I'm saying that if the implementation does not use the identifier for
either of the two purposes it has been reserved for [as determined at
compile-time], then the program is free to use it.

Yes.
I'm saying, how do you figure that's true?

7.26 Future library directions

1 The following names are grouped under individual headers for
convenience. All external names described below are reserved no
matter what headers are included by the program.
 
pete

Jordan said:
Jordan said:
Jordan Abel wrote:

and if it has been determined that the headers in question don't
contain that function name...

Where are you reading that that has anything to do with anything?

I'm saying that if the implementation does not use the identifier for
either of the two purposes it has been reserved for [as determined at
compile-time], then the program is free to use it.

Yes.
I'm saying, how do you figure that's true?

Because it's not a generally reserved identifier, only one that an
implementation is free to use for a specific purpose. If it is
determined that it does not use it for that purpose, where does the
standard say the user is forbidden to use it?

7.26 Future library directions

1 The following names are grouped under individual headers for
convenience. All external names described below are reserved no
matter what headers are included by the program.

7.26.11 String handling <string.h>
1 Function names that begin with str, mem,
or wcs and a lowercase letter may be added
to the declarations in the <string.h> header.
 
pete

pete said:
Jordan said:
Jordan Abel wrote:

Jordan Abel wrote:

and if it has been determined that the headers in question don't
contain that function name...

Where are you reading that that has anything to do with anything?

I'm saying that if the implementation does not use the identifier for
either of the two purposes it has been reserved for [as determined at
compile-time], then the program is free to use it.

Yes.
I'm saying, how do you figure that's true?

Because it's not a generally reserved identifier, only one that an
implementation is free to use for a specific purpose. If it is
determined that it does not use it for that purpose, where does the
standard say the user is forbidden to use it?

2 If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or
defines a reserved identifier as a macro name,
the behavior is undefined.
 
Jordan Abel

pete said:
Jordan said:
Jordan Abel wrote:

Jordan Abel wrote:

and if it has been determined that the headers in question don't
contain that function name...

Where are you reading that that has anything to do with anything?

I'm saying that if the implementation does not use the identifier for
either of the two purposes it has been reserved for [as determined at
compile-time], then the program is free to use it.

Yes.
I'm saying, how do you figure that's true?

Because it's not a generally reserved identifier, only one that an
implementation is free to use for a specific purpose. If it is
determined that it does not use it for that purpose, where does the
standard say the user is forbidden to use it?

2 If the program declares or defines an identifier in a context in
which it is reserved (other than as allowed by 7.1.4), or defines a
reserved identifier as a macro name, the behavior is undefined.

7.26 Future library directions

1 The following names are grouped under individual headers for
convenience. All external names described below are reserved no
matter what headers are included by the program.

7.26.11 String handling <string.h>
1 Function names that begin with str, mem, or wcs and a lowercase
letter may be added to the declarations in the <string.h> header.

It's bad form to follow up to yourself. And stop citing C99 at me; that's
not the standard in effect in 99% of implementations. I still see
"function names" [which refutes your 'could be using it for other
purposes' claim] and '_may_ be added' [which tells me that if they're
NOT added, they're ok to use].
 
