about STREQ

Peng Jian

#define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

I'm a beginner learning C. I can't see what *(a) == *(b) is for.
If either a or b is a null pointer, this may cause a crash.
And if neither is a null pointer, strcmp((a), (b)) == 0 alone can
do the job, so *(a) == *(b) seems unnecessary.
 
Leor Zolman

#define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

I'm a beginner learning C. I can't see what *(a) == *(b) is for.
If either a or b is a null pointer, this may cause crash.
And if neither is a null pointer, only using strcmp((a), (b)) == 0 can
do the job, so *(a) == *(b) seems unnecessary.

In the case where it is sufficiently uncommon for the strings being
compared to begin with the same character, this approach might get you
better performance than a simple strcmp.

Note that strcmp isn't permitted to take null pointers either, so you'd
have to test for that in either case. However, it is fairly
straightforward to programmatically constrain pointer-to-char variables to
ensure they remain valid before being used in such contexts.
-leor
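
A minimal sketch of two common ways to keep null pointers away from strcmp. The helper names streq_checked and streq_asserted are hypothetical, not from the thread: the first treats a null pointer as "not equal", the second states non-null as a precondition and lets assert() catch violations in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a null pointer never compares equal to anything,
   so strcmp is never handed one. */
static int streq_checked(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return 0;
    return strcmp(a, b) == 0;   /* 1 when equal, 0 otherwise */
}

/* Hypothetical helper: callers promise non-null pointers; assert()
   catches a broken promise unless NDEBUG is defined. */
static int streq_asserted(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Which one fits depends on whether a null pointer is a legitimate "no string" value in the program or a bug that should be caught early.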
 
Régis Troadec

"Leor Zolman" <[email protected]> wrote in message

Hi,
In the case where it is sufficiently uncommon for the strings being
compared to begin with the same character, this approach might get you
better performance than a simple strcmp.

On the other hand, it's rather funny if the strings being compared are
*very long* and differ only at, say, *(a+1) and *(b+1).
I assume that is not the case here, and that this macro is used in a context
where the strings being compared differ very often in their first character;
otherwise, it's completely useless :)

Regis
 
Leor Zolman

"Leor Zolman" <[email protected]> wrote in message

Hi,


On the other hand, it's rather funny if the strings being compared are
*very long* and differ only at, say, *(a+1) and *(b+1).
I assume that is not the case here, and that this macro is used in a context
where the strings being compared differ very often in their first character;
otherwise, it's completely useless :)

That's what I said: "uncommon to begin with the same character" == "differ
very often in their first character." To paraphrase, the more often they
differ in their first character, the more you gain by the "extra" test of
just that first character. If they /never/ or hardly ever differ in their
first character, the technique is worse than "completely useless"--it would
be a pessimization.

If it is blinding string comparison performance you're after, though,
there's no substitute for tuning the algorithm to the nature of the strings
involved. The more you "know" about the nature of the strings, the more
ammunition you'll have to think of clever ways to code something that runs
faster (on average) than strcmp.

But as usual, beware of premature optimization.
-leor
 
Thomas Matthews

Peng said:
#define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

I'm a beginner learning C. I can't see what *(a) == *(b) is for.
If either a or b is a null pointer, this may cause crash.
And if neither is a null pointer, only using strcmp((a), (b)) == 0 can
do the job, so *(a) == *(b) seems unnecessary.

Supposedly, the macro saves the cost of a function call if the
first letters differ. The logical AND (&&) operator has a
short circuit that says if the first expression is false, the
other is not evaluated.
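
A small sketch of that short circuit, assuming a hypothetical wrapper counted_strcmp and counter strcmp_calls (neither is from the thread) so the skipped call becomes visible:

```c
#include <assert.h>
#include <string.h>

static int strcmp_calls = 0;    /* hypothetical counter for the demo */

/* Hypothetical wrapper: behaves like strcmp but records each call. */
static int counted_strcmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    ++strcmp_calls;
    return strcmp(a, b);
}

/* Same shape as the macro under discussion, with the counted wrapper. */
#define STREQ_COUNTED(a, b) \
    (*(a) == *(b) && counted_strcmp((a), (b)) == 0)
```

Under these assumptions, STREQ_COUNTED("apple", "banana") leaves strcmp_calls untouched because 'a' != 'b' already decides the result, while STREQ_COUNTED("apple", "apricot") increments it once.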

On many systems, the execution time saved by this expression
is negligible compared to the overall running time of the
program, measured or perceived. It is called premature optimization.

I would treat readability as a more important criterion
and remove the macro. Just replace the macro with the
call to strcmp. After all, you would think that the strcmp
function would perform an optimal search or people wouldn't
use it.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 
Michael Wojcik

I would treat readability as a more important criterion
and remove the macro. Just replace the macro with the
call to strcmp.

This macro has different semantics than strcmp(). Most importantly, it
inverts the sense of the comparison. It can't just be replaced with a
call to strcmp, if you want the program logic preserved.
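
To illustrate the inverted sense, here is a sketch with two hypothetical callers (count_equal and count_equal_broken are made-up names). The second is what a careless "just call strcmp" edit would produce:

```c
#include <assert.h>
#include <string.h>

#define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

/* Hypothetical caller: counts the entries of items[] that equal key. */
static int count_equal(const char *const *items, int n, const char *key)
{
    int hits = 0;
    int i;

    assert(items != NULL && key != NULL);
    for (i = 0; i < n; i++) {
        if (STREQ(items[i], key))   /* nonzero when the strings match */
            hits++;
    }
    return hits;
}

/* The careless rewrite: strcmp() is nonzero when the strings DIFFER,
   so this counts the mismatches instead. */
static int count_equal_broken(const char *const *items, int n, const char *key)
{
    int hits = 0;
    int i;

    assert(items != NULL && key != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(items[i], key))
            hits++;
    }
    return hits;
}
```

For the list { "red", "green", "red" } and the key "red", the first function counts the two matches and the second counts the single mismatch.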

This is another fine example of why change for the sake of change is a
bad idea. I wouldn't recommend using a macro like this one, as it's
both obscuring and dangerous (the arguments are evaluated twice). But
changing it where it's used in existing code seems like a very poor
idea, since the maintainer might well make a mistake (such as simply
changing it to a call to strcmp).
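
A sketch of the double evaluation, assuming a hypothetical function fetch whose only side effect is bumping a counter. The function streq is likewise a made-up name, shown only as one way to get single evaluation:

```c
#include <assert.h>
#include <string.h>

#define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

static int fetch_calls = 0;     /* hypothetical counter for the demo */

/* Hypothetical argument expression with a side effect. */
static const char *fetch(const char *s)
{
    assert(s != NULL);
    ++fetch_calls;
    return s;
}

/* A function evaluates each of its arguments exactly once. */
static int streq(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With these definitions, STREQ(fetch("abc"), "abc") calls fetch twice (once for the first-character test, once for strcmp), STREQ(fetch("abc"), "xyz") calls it once, and streq(fetch("abc"), "abc") always calls it once.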

It *might* arguably be a good idea to change the macro definition to

#define STREQ(a, b) (strcmp((a), (b)) == 0)

and get rid of the probably-pointless check of the initial characters,
to avoid any future maintenance introducing an argument with side
effects into a use of STREQ. (That's assuming that there aren't any
current uses of it where an argument with side effects needs to be
evaluated twice. I certainly hope that's the case.)

After all, you would think that the strcmp
function would perform an optimal search or people wouldn't
use it.

Most people use the standard library functions regardless of their
quality. I've seen plenty of little card games that use insufficient
rand implementations to shuffle the deck. But I'll agree that it's
poor practice to worry about the performance of the standard library
until you know it's important for your application. Few programs
need to worry about the performance of strcmp.
 
Eric Sosman

Michael said:
Most people use the standard library functions regardless of their
quality. I've seen plenty of little card games that use insufficient
rand implementations to shuffle the deck. But I'll agree that it's
poor practice to worry about the performance of the standard library
until you know it's important for your application. Few programs
need to worry about the performance of strcmp.

In two and a half decades of using C, I've encountered
exactly *one* machine on which strcmp's speed was a concern.
The machine has long since gone the way of the dinosaur. So
has the company that built it. So, too, has the company that
bought what was left of the first company. It's nice to
imagine that all this bloodshed was the result of poor QoI,
but there may have been one or two other factors ...

Just use strcmp().
 
kal

Régis Troadec said:
I think it might not be the case and that this macro is used in a context
where strings being compared differ very often in their first character,
otherwise, it's completely useless :)

In purely random cases where 7-bit ASCII is used, the probability that
the first characters of two strings will be the same is 1/128.

Even in alpha strings the probability is 1/26 or even only 1/52.

Analysis of the English language may raise this probability somewhat
but, IMO, not above 1/10.

So the code is likely to be efficient in all except a few rare cases.
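
As a rough cost model (the symbols are assumptions introduced here, not from the thread): let p be the probability that the first characters match, c_t the cost of the inline first-character test, and c_s the cost of a strcmp call, treated as a constant.

```latex
\begin{align*}
E[\text{cost with macro}] &= c_t + p\,c_s \\
E[\text{cost with plain strcmp}] &= c_s \\
\text{macro wins} &\iff c_t + p\,c_s < c_s \iff c_t < (1 - p)\,c_s
\end{align*}
```

With p = 1/26 the condition is c_t < (25/26) c_s, and with p = 1/10 it is c_t < (9/10) c_s. As p approaches 1 the right-hand side approaches zero and the extra test becomes pure overhead.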

To say that one should eschew such obtuse coding methods is, on the
other hand, an entirely different matter.
 
Mabden

kal said:
Even in alpha strings the probability is 1/26 or even only 1/52.
So the code is likely to be efficient in all except a few rare cases.

To say that one should eschew such obtuse coding methods is, on the
other hand, an entirely different matter.

Macro: #define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

Assuming alphanumeric strings (1/62), comparing "this" with "that2" or "the other
thing3" would double the comparisons, since strcmp() is going to do it
as well. Is this a saving over the cost of a function call? Also, is it an
improvement worth all the programmer time talking about it?! You would have
to have a lot of very different strings being compared.

At least make the macro: #define STREQ(a, b) (*(a) == *(b) && strcmp((a+1),
(b+1)) == 0)

;-)
 
Chris Torek

At least make the macro: #define STREQ(a, b) (*(a) == *(b) && strcmp((a+1),
(b+1)) == 0)

;-)

Ignoring the line-wrap issue :) the problem is that this version
of the macro misbehaves:

    a = "hello";
    if (somecond())
        a = "";
    ...
    if (STREQ(a, ""))

What can we say about the values now passed to strcmp(), if strcmp()
is called? (Note that strcmp() is called only when *a == *b, and *b
is '\0'. Hence strcmp() is called only if *a == '\0', and the question
is about a+1 and ""+1.)
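
One way to see the problem, sketched as a function (streq_skip_first is a hypothetical name): when a is "" and b is "", both a + 1 and "" + 1 point one past the terminating null character, so strcmp would read outside both arrays. Skipping the first character is only safe when that character is not the terminator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compares first characters inline and hands the
   rest to strcmp only when there is a "rest" to look at. */
static int streq_skip_first(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (*a != *b)
        return 0;               /* first characters differ */
    if (*a == '\0')
        return 1;               /* both empty: equal, do not touch a + 1 */
    return strcmp(a + 1, b + 1) == 0;
}
```

Here streq_skip_first("", "") returns 1 without ever forming an argument for strcmp, and streq_skip_first("hello", "") returns 0 at the first test.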
 
Arthur J. O'Dwyer

Macro: #define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

Assuming alphanumeric (1/62) comparing "this", with "that2", or "the other

s/62/26/
You've got things backwards in a couple of places below, too. Must be
one of those days. :)

thing3" would be doubling the comparisons since strcmp() is going to do it
as well.

In 1/26 of the cases, yes. In 25/26 of the cases, no, strcmp will
never get called, because the initial characters will differ. Thus
we are paying for (26 comparisons and one call to strcmp) instead of
(26 calls to strcmp). It's likely that this is a good
trade, I think, although as the cost of a function call gets cheaper,
it becomes less and less of a good trade.

At least make the macro: #define STREQ(a, b) (*(a) == *(b) && strcmp((a+1),
(b+1)) == 0)

Unfortunately, this code is broken. What happens during STREQ("","")?
[Rhetorical question. Answer: the code breaks.]

-Arthur
 
Sam Dennis

Mabden said:
Macro: #define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)

At least make the macro: #define STREQ(a, b) (*(a) == *(b) && strcmp((a+1),
(b+1)) == 0)

UB if both arguments are empty strings. If one must perform such a
de-optimisation, this seems preferable:

#define STREQ(a, b) (*(a) == *(b) && strcmp((a) + !!*(a), (b) + !!*(a)) == 0)
 
Mabden

Sam Dennis said:
UB if both arguments are empty strings. If one must perform such a
de-optimisation, this seems preferable:

#define STREQ(a, b) (*(a) == *(b) && strcmp((a) + !!*(a), (b) + !!*(a)) == 0)

Not familiar with "UB". I'll assume it means, "You're wrong!"

If the strings are both empty, they are both \0 and the final part of the
macro won't be tried.

If they are random pointers (i.e. uninitialized) then I guess it would be
"UB", you bet.


Maybe someone could tell me what this means in English:

strcmp((a) + !!*(a), (b) + !!*(a))

My try: compare the pointer to a added to not not the value of a with the
pointer to b added to the not not value of a again?

I don't get it.
 
Leor Zolman

Not familiar with "UB". I'll assume it means, "You're wrong!"

Sort of...it means "undefined behavior". You really want to avoid doing
anything that invokes UB, because once you've done that, "all bets are
off". You've given the compiler carte blanche to generate code to do just
about anything at all, and not be out of compliance with the language
standard. UB may manifest as "doing the expected thing", which is actually
your worst nightmare...it means you'll end up shipping code that seems to
work for you but segfaults for your customer (or your customer's client,
or...)

If the strings are both empty, they are both \0 and the final part of the
macro won't be tried.

If both pointers point to the same value, whether that's a NUL or not, then
the expression *(a) == *(b) is true, and it will go on to evaluate the
strcmp call.

If they are random pointers (i.e. uninitialized) then I guess it would be
"UB", you bet.

No, you get UB when you've advanced the pointers (formerly pointers to NUL
bytes) past the NUL, in the case where the NUL is actually the end of
whatever memory was allocated (which would typically be the case for
dynamically allocated strings). As soon as you make a pointer invalid, it
may as well just be a "random pointer" and you've got UB...before you even
attempt to dereference it. That was actually the subject of a Moby Thread
around here of late.

Maybe someone could tell me what this means in English:

strcmp((a) + !!*(a), (b) + !!*(a))

It's a rather cute trick. !!x produces 0 for any non-zero x, and 1 when x
is zero. Thus, the value of the first expression above (I'll dispense with
the macro-motivated parens for now) is either a (if *a is NUL) or a + 1 (if
*a is not NUL).
-leor
 
Leor Zolman

It's a rather cute trick. !!x produces 0 for any non-zero x, and 1 when x
is zero.

I got it backwards. !!x produces 0 for x equal to 0, and 1 for non-zero x.
The part below I got right ;-)
-leor
 
CBFalconer

Mabden said:
.... snip ...

Maybe someone could tell me what this means in English:

strcmp((a) + !!*(a), (b) + !!*(a))

if (a) points to a non-empty string (i.e. *(a) != '\0') then
compare the string pointed to by (a + 1) against that pointed to
by (b). Else compare the strings (a) and (b).

The key is that the operator !! converts a non-zero value to 1,
and a zero value is left alone.

It would seem to make more sense if the operator was plain !
 
Allin Cottrell

CBFalconer said:
Mabden wrote:

... snip ...



if (a) points to a non-empty string (i.e. *(a) != '\0') then
compare the string pointed to by (a + 1) against that pointed to
by (b). Else compare the strings (a) and (b).

I think you forgot to advance b in case *a != '\0'. In non-
obfuscated C the original translates to (wrapping it in a
function for completeness):

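#include <string.h>

/* Skips the first character of both strings unless a is empty,
   then compares what is left. */
int weird_strcmp (const char *a, const char *b)
{
    if (*a != '\0') {
        a++;
        b++;
    }
    return strcmp(a, b);
}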

Allin Cottrell
 
Sam Dennis

CBFalconer said:
if (a) points to a non-empty string (i.e. *(a) != '\0') then
compare the string pointed to by (a + 1) against that pointed to
by (b)

.... + 1
It would seem to make more sense if [the check was inverted]

So you think that reading beyond the end of both strings makes sense?
For the example I gave (a = "", b = "")?
 
J. J. Farrell

Mabden said:
Not familiar with "UB". I'll assume it means, "You're wrong!"

It means Undefined Behaviour.

If the strings are both empty, they are both \0 and the final part of the
macro won't be tried.

If that were the case then STREQ("Hello", "Harry") would give 1.
strcmp() is only called if the first characters of the strings are
equal, as in STREQ("", "").
 
