Improving String.equals() implementation

  • Thread starter softwarepearls_com
  • Start date
J

John W Kennedy

Lew said:
J. Stoever said:
You must be using some buggy broken OS then. When normal people set up
Eclipse on a working, professional OS, it works just find without me
having to tell it where anything is.

Cute. Irrelevant and misleading, but rhetorically quite cute.

The OSes we use are Windows XP and Sun Solaris. They are probably the
same versions that you call "working [and] professional". The OS has
nothing whatsoever to do with it, and it's strange to focus on that.

It has to do with the fact that the projects require certain versions of
Java, perhaps obsolete ones or ones that are newer than the what comes
with Eclipse, or ones with a specific extensions configuration. Since
those versions do not come with Eclipse, and Eclipse, despite your
apparent impression of its godlike superpowers, is not telepathic and
cannot infer without input where those JDKs are.

It can do all that -- choose one of multiple JDKs, and set the internal
compiler to one of multiple Java versions. And, of course, the choice is
set-and-forget per project.

And, at least on Windows and MacOS, the only time I ever had to fuss
with telling Eclipse where a JDK is was when the JDK was installed after
Eclipse was.
--
John W. Kennedy
"Those in the seat of power oft forget their failings and seek only
the obeisance of others! Thus is bad government born! Hold in your
heart that you and the people are one, human beings all, and good
government shall arise of its own accord! Such is the path of virtue!"
-- Kazuo Koike. "Lone Wolf and Cub: Thirteen Strings" (tr. Dana Lewis)
 
Z

zerg

Tom said:
Ah, ok. That's what I get for posting before checking the docs....

But:

public boolean equals(Object obj) {
try {
if (this == obj) return true ; // controversial!
String that = (String)obj ;
if (this.length != that.length) return false ;
for (int i = 0 ; i < this.length ; ++i)
if (this.chars != that.chars) return false ;
return true ;
} catch (ClassCastException e) {
return false ;
} catch (NullPointerException e) {
return false ;
}
}

In the case of comparing to an actual String, this should be quite fast
- i don't think there's any per-execution overhead to setting up a try
block. When comparing to null or a non-String, though, there will be a
throw-catch, which is disproportionately expensive.


Pardon me, but ...

Yuck.

This isn't any kind of savings at all, since the compiler still inserts
the equivalent bytecode to the instanceof test into the method before
the cast. In fact, there are separate tests for null and for not
instanceof String, as if you wrote:

if (this == null) throw new NullPointerException();
if (!this instanceof String) throw new ClassCastException();
String that = (String) obj;

which may actually be worse than

if (this == null) return false;
if (!this instanceof String) return false;
String that = (String) obj;

since in the latter case, the compiler may be able to optimize away the
first statement, with static analysis telling it that the second
statement would return false anyway in that case and that the first
statement has no side effects. I very much doubt it will figure out that
the semantics are the same removing the null test in the
exception-throwing case, even though the catch blocks are in the same
method, unless the optimizer's coders were downright heroic in their
skills and efforts.

Besides the lack of actual efficiency improvement, there's also the
aesthetic/style/clunkiness/readability/maintainability matter of the
gross misuse of exceptions.

I believe I've already mentioned, in another thread, my company's
employment contract's "early termination" clause for anyone who writes
code like that. :)
 
Z

zerg

Lew said:
This might be related, perhaps:
(JLS 4.1)

I do not know if the JVM has a way to use that for its 'instanceof'
comparison.

There's an obvious way to make instanceof efficient, if writing a JVM.

The null reference is presumably a particular pointer value. It does not
have to be zero, or an illegal address to the underlying OS and machine
architecture. So it could be a memory address within the JVM's address
space. Now make the object layout start with a class ID of some sort,
and allocate at the "null" address four bytes at which an exceptional
class ID value, otherwise unused, is written.

Now instanceof probably looks something like:

if (object == null) return false;
if (object.getClass() == targetClass) return true;
return object.getClass().extends(targetClass);

in terms of semantics and structure. The above allows this rearrangement:

if (object.classID == targetClass.id) return true;
if (object == null) return false;
return object.getClass().extends(targetClass);

In the important case that the object IS an instance of the target
class, and of that exact class, this returns true immediately -- the
null test doesn't happen first. It happens next, though, since it's
quicker to check for null and return false immediately than to go
chasing down the inheritance tree determining that null's bogus classID
is not a subclass of targetClass.

The classID would tend to be a pointer to a data structure representing
information such as superclass and inherited interfaces, among other
things. Making the null classID a legitimate pointer to such a structure
indicating it to lack a superclass or any inherited interfaces lets you
pare it down to:

if (object.classID == targetClass.id) return true;
return object.getClass().extends(targetClass);

Now the "extends" test boils down to if targetClass is an interface,
look at the class structure pointed to by object.classID and see if it's
in the interfaces list, then walk the superclass chain looking at their
interfaces lists; for efficiency, the latter might be avoided at the
cost of more memory used per loaded class by storing a list of ALL
implemented interfaces with each class. If targetClass is not an
interface, only the superclass, its superclass, and so on until the
chain terminates need be tested.

The interfaces list is empty and the superclass is null for "null" in
this scenario, so the null reference produces a "false" quite quickly.
Not quite as quickly as with an explicit null test, but quickly enough
that the loss is outweighed by the benefit of increased speed in the
frequent case that the instanceof test will return false and the object
reference isn't null.

On a side note, it's interesting that "x instanceof y == false" and "y z
= (y)x throws a ClassCastException" are not always either both true or
both false. If x is null, the first is true and the second is false.
 
Z

zerg

RedGrittyBrick said:
AFAIK Sun's javac does not support incremental compilation. At least not
in the way that Eclipse does.


Where did that conjecture come from? Perhaps more reliable!


As you know, you can set the language version in Eclipse.


As you know, Eclipse supports all that.


Unless you consider incremental compilation an idiosyncracy, I don't
know what you mean. Please elucidate.

Eclipse versus NetBeans: the new editor wars.

Tune in next week for Episode II: Attack of the Beans.
(And, of course, The Phantom Reference is currently in preproduction and
Revenge of the Generics, A New Bug, Eclipse Strikes Back, and Return of
the Ant Scripts are promised in the very near future.)
 
M

Mark Thornton

zerg said:
The null reference is presumably a particular pointer value. It does not
have to be zero, or an illegal address to the underlying OS and machine
architecture. So it could be a memory address within the JVM's address
space. Now make the object layout start with a class ID of some sort,
and allocate at the "null" address four bytes at which an exceptional
class ID value, otherwise unused, is written.

JVM's gain some efficiency by zeroing memory in bulk before it is
allocated. This means that the required 'initialisation' of an object's
fields doesn't have to be done individually. Using a non zero value for
null would require reference fields to be initialised individually at
greater cost.

Mark Thornton
 
T

Tom Anderson

Tom said:
Eric Sosman wrote:
Mark Space wrote:
John W Kennedy wrote:

And don't forget the null check, class check, and length check that all
have to come before the data check.

Why check for null? Throw a NPE....

Violates the equals() contract, that's why: thing.equals(null)
should yield false and throw no exception.

Ah, ok. That's what I get for posting before checking the docs....

But:

public boolean equals(Object obj) {
try {
if (this == obj) return true ; // controversial!
String that = (String)obj ;
if (this.length != that.length) return false ;
for (int i = 0 ; i < this.length ; ++i)
if (this.chars != that.chars) return false ;
return true ;
} catch (ClassCastException e) {
return false ;
} catch (NullPointerException e) {
return false ;
}
}

In the case of comparing to an actual String, this should be quite fast - i
don't think there's any per-execution overhead to setting up a try block.
When comparing to null or a non-String, though, there will be a
throw-catch, which is disproportionately expensive.


Pardon me, but ...

Yuck.


Oh, absolutely! I was merely trying to show that a NullPointerException
could be used to handle the null condition.
This isn't any kind of savings at all, since the compiler still inserts
the equivalent bytecode to the instanceof test into the method before
the cast.

I wasn't claiming it was faster than the sensible way of doing it - just
that in the common (hopefully) case, it wouldn't be significantly slower.
In fact, there are separate tests for null and for not instanceof
String, as if you wrote:

if (this == null) throw new NullPointerException();
if (!this instanceof String) throw new ClassCastException();
String that = (String) obj;

which may actually be worse than

if (this == null) return false;
if (!this instanceof String) return false;
String that = (String) obj;

since in the latter case, the compiler may be able to optimize away the
first statement, with static analysis telling it that the second
statement would return false anyway in that case and that the first
statement has no side effects. I very much doubt it will figure out that
the semantics are the same removing the null test in the
exception-throwing case, even though the catch blocks are in the same
method, unless the optimizer's coders were downright heroic in their
skills and efforts.

On the other hand, it might be able to remove the null check from the
instanceof implementation, if it inlines enough of it, which would boil
down to much the same thing.

The real problem would be the overhead of allocating and throwing the
exceptions - as you say, i don't think the compiler will optimise those
away.
Besides the lack of actual efficiency improvement, there's also the
aesthetic/style/clunkiness/readability/maintainability matter of the
gross misuse of exceptions.

I don't think i'd call it gross misuse. It might not be to your taste, or
indeed to the vast majority of people's taste, but there's a difference
between that and 'gross misuse'.

I think there is a perfectly valid philosophy of coding which says you
should write straight-line code on the assumption that everything will
work, and then write exception handlers to deal with the case where it
doesn't. This approach leads to very concise, simple code describing the
common case, and well-defined blocks of code describing how to deal with
the exceptions. In this case, the expectation is for a String as an
argument, and the exceptions are nulls and things which aren't Strings.

You could argue that in this case, nulls and non-strings aren't
exceptional, they're part of the normal input that the method handles, in
which case this style would be inappropriate by virtue of asymmetry. I
don't think i'd agree with that myself.

Anyway, i actually think this approach is stylistically preferable to the
nested if approach:

if (obj == this) {
return true ;
} else {
if (obj != null) {
if (obj instanceof String) {
String that = (String)obj ;
if (this.length == that.length) {
for (int i = 0 ; i < this.length ; ++i)
if (this.chars != that.chars) return false ;
return true ;
} else {
return false ;
}
} else {
return false ;
}
} else {
return false
}
}

Although of course that's a bit of a strawman, because it could be written
more simply with else-ifs and some swapping-round:

if (obj == this) {
return true ;
} else if (obj == null) {
return false ;
} else if (!obj instanceof String) {
return false ;
} else {
String that = (String)obj ;
if (this.length != that.length) {
return false ;
} else {
for (int i = 0 ; i < this.length ; ++i)
if (this.chars != that.chars) return false ;
return true ;
}
}

Or, better yet, guard clauses:

if (obj == this) return true ;
if (obj == null) return false ;
if (!obj instanceof String) return false ;
String that = (String)obj ;
if (this.length != that.length) return false ;
for (int i = 0 ; i < this.length ; ++i)
if (this.chars != that.chars) return false ;
return true ;
I believe I've already mentioned, in another thread, my company's
employment contract's "early termination" clause for anyone who writes
code like that. :)

Then i'm rather glad i work for a different company!

Or rather, i hope i do ...

tom
 
S

softwarepearls_com

In short: it's a cheap test that allows one to bypass a very expensive test.

Like I showed with the benchmark program (elsewhere in this thread),
this argument is entirely data dependent. It all depends on your
Strings whether it's valid to want to have the special case test
there, or not. If you're trying to match a String in a large body of
Strings, the test is counterproductive (see benchmark results).

I'm just wondering whether Sun engineers go to the trouble of
profiling their API usage, say, like Intel or AMD profile tons of
real-life applications to see how their instructions are being used
(statistically). It would be nice to think that Sun has done such
profiling for core, critical classes like String.

BTW, the Strings that I'm having to pass through equals() are
generated by a third party, and are unfortunately not interned.
 
Z

zerg

softwarepearls_com said:
If you're trying to match a String in a large body of
Strings, the test is counterproductive (see benchmark results).

If you're trying to match a String to a large body of strings, not using
a HashMap/HashSet is counterproductive.

Using one results in there being very few String equals() call --
typically none for a non-matching String and one for a matching one, in
the latter case one that returns "true", so compares every character if
the references aren't equal. Then the references being equal is of
greater likelihood and the reference equality test consumes
proportionately less of the total time used than would otherwise be the
case.
BTW, the Strings that I'm having to pass through equals() are
generated by a third party, and are unfortunately not interned.

Once your app gets ahold of a String reference, does that reference tend
to be used for many comparisons, and mainly with constants or with other
references that tend to be used for many comparisons? If so, and if
there won't typically be an endless variety of different String values,
then interning them when you get them makes sense. If the incoming
references tend to be used for only one comparison each, though, or a
huge and bewildering array of interned Strings would accumulate, then it
doesn't.
 
J

John W Kennedy

Mark said:
JVM's gain some efficiency by zeroing memory in bulk before it is
allocated.

Especially when the operating system has already it automatically.
This means that the required 'initialisation' of an object's
fields doesn't have to be done individually. Using a non zero value for
null would require reference fields to be initialised individually at
greater cost.






--
John W. Kennedy
"Give up vows and dogmas, and fixed things, and you may grow like
That. ...you may come to think a blow bad, because it hurts, and not
because it humiliates. You may come to think murder wrong, because it
is violent, and not because it is unjust."
-- G. K. Chesterton. "The Ball and the Cross"
 
L

Lothar Kimmeringer

Tom said:
But:

public boolean equals(Object obj) {
try {
if (this == obj) return true ; // controversial!
String that = (String)obj ;
if (this.length != that.length) return false ;
for (int i = 0 ; i < this.length ; ++i)
if (this.chars != that.chars) return false ;
return true ;
} catch (ClassCastException e) {
return false ;
} catch (NullPointerException e) {
return false ;
}
}

In the case of comparing to an actual String, this should be quite fast -
i don't think there's any per-execution overhead to setting up a try
block.


There is none.
When comparing to null or a non-String, though, there will be a
throw-catch, which is disproportionately expensive.

In most cases, the programmer should already be sure that two
Strings are compared. If you would build something like

if (!(obj instanceof String)) return false;
String cmp = (String) obj;

you have in fact two checks of instanceof, which will let you
lose time for all the other calls of equals. It's the same
pattern you use when working with maps. Instead of

if (!map.containsKey(thekey)){
return false;
}
Object value = map.get(thekey);
.... work with value

you should use

Object value = map.get(thekey);
if (value == null){
return false;
}
.... work with value

to avoid the call of get two times (with corresponding calls
of equals of the key-class which might be expensive as well).


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
L

Lothar Kimmeringer

Lew said:
And I'm still puzzled about Eclipse's compilation. When I set up
Eclipse, I have to tell it where my JDKs are, and I can have it use
any one installed on my system.

Actually you don't need an istalled JDK, a JRE is doing fine
as well. I assume that "softwarepearls" is simply copying the
classes/bin-directory and calls that deploying.

Your (and my) mileague might differ, but it is a working way.
No way that can be called stable but it's working. As soon as
you want to introduce automatic production tools like Ant,
you need a JDK.


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
L

Lothar Kimmeringer

Eric said:
Just for fun, I tried to measure the slowdown with a simple
test. On my machine, ten thousand calls to a method that used
instanceof-plus-cast took about 0.104 ms longer than calling a
method that just cast (successfully) without testing.

Let's see: 0.104 ms per ten thousand calls, that's a potential
savings of 10.4 ns per instanceof test ...

That might be the reason why the equals-method of String in
the sources of Java 1.6 is actually doing the instance-of-check
(I had a look after my previous posting).


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
T

Tom Anderson

Lothar said:
In most cases, the programmer should already be sure that two
Strings are compared. If you would build something like

if (!(obj instanceof String)) return false;
String cmp = (String) obj;

you have in fact two checks of instanceof, which will let you
lose time for all the other calls of equals. [...]

Just for fun, I tried to measure the slowdown with a simple
test. On my machine, ten thousand calls to a method that used
instanceof-plus-cast took about 0.104 ms longer than calling a
method that just cast (successfully) without testing.

Let's see: 0.104 ms per ten thousand calls, that's a potential
savings of 10.4 ns per instanceof test ...

I'm really surprised by that. I'd assumed that the compiler would optimise
away the second test completely.

You did all the right magic to let HotSpot do its thing? Putting in its
own method (yes, from youe description), doing some warm-up runs first?
I'm sure you did, but i thought i'd check.

What did the rest of the method do? Did the variable then get used? Could
the test-cast version be doing a null check that got optimised away in the
cast-only version?

tom
 
L

Lew

Eric said:
     /** A trivial class with a "safe" equals() method */
     private static class Safe {
         private final int value;
         private Safe(int value) {
             this.value = value;
         }

         public boolean equals(Object obj) {
             return (obj instanceof Safe) && (value == ((Safe)obj).value);
         }
     }

     /** A trivial class with a "fast" equals() method. */
     private static class Fast {
         private final int value;
         private Fast(int value) {
             this.value = value;
         }

         public boolean equals(Object obj) {
             return (value == ((Fast)obj).value);
         }
     }

}
I think I caught the obvious gotchas.

The behaviors of the two 'equals()' implementations are not
equivalent.

A fair benchmark would compare methods with the same behaviors.
 
L

Lothar Kimmeringer

Lew said:
The behaviors of the two 'equals()' implementations are not
equivalent.

A fair benchmark would compare methods with the same behaviors.

Only if you use null as value for testing. This was not the
case (and wasn't the topic that lead to this testcase).


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
L

Lothar Kimmeringer

Lew said:
That is not correct. A comparison with a non-null object in the
'Fast#equals()' can throw an exception.

OK, but again: The testcase was only to find out the difference
between a check with an instanceof and without under the assumtion,
that the object to be checked is of correct type and not null.

So the methods are OK for that specific case.


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
A

Arne Vajhøj

Lew said:
And I'm still puzzled about Eclipse's compilation. When I set up
Eclipse, I have to tell it where my JDKs are, and I can have it use
any one installed on my system. I routinely have it use Sun's at my
work site. Are you saying that Eclipse still doesn't use the JDK's
compiler in that circumstance?

Yes.

It needs an rt.jar but it uses its own compiler instead of
javac.exe.
If so, yet another reason to detest Eclipse.

Why ?

I don't care particular much about what Java compiler is being used.

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,818
Latest member
Brigette36

Latest Threads

Top