Catching invalid string comparisons

  • Thread starter Ulrich Eckhardt
  • Start date
U

Ulrich Eckhardt

Greetings!

I guess many of you have already seen code (especially from C++ programmers)
like the following:

String s = someFunction();
if(s == "0")
doThis();
else
doThat();

I wonder, is there a way to catch such comparisons where a non-fundamental
type is compared to a constant? I'm actually looking for a way to avoid
errors that occur due to this, maybe a warning when a comparison is done
for identity where one of the two sides required boxing (Is that the term?)
first.

Further, I wonder about this case:

Integer n = otherFunction();
if(n == 42)
dontPanic();

Other than the above, I haven't actually tried it, but I would expect this
code to also not do what a naive programmer would expect.


Just for the record, in case you didn't spot the problem in above code, this
is how the above should be written:

if(s.equals("0")) ...
if(n.equals(42)) ...
// Note: I'm not sure about the latter.

The reason is that '==' determines identity for objects while it determines
equality for builtin types like 'int' (please correct me if I'm wrong or if
this explanation is misleading in any way).

thanks

Uli
 
U

Ulrich Eckhardt

Lew said:
There is no boxing with Strings, and String is a pretty fundamental type,
so I'm not quite sure I get these comments.

Bear with me, I probably didn't express myself correctly because I'm not
that fluent in Java.

The problem I'm facing is this:

String s1 = "foo";
String s2 = s1.substring(1,1); // yields empty string!
if(s2 == "")
System.out.println("s2 is empty");
else
System.out.println("s2 is not empty");

This code actually checks if s2 is identical to an empty string literal, but
the programmer expected it to check if it is equal to an empty string.
Anyhow, it's usually a mistake to compare String values with ==, except
when comparing against null. It's such a common mistake that IDEs
include an option to warn you of it.


What would a naive programmer expect? What do you expect this snippet to
do?

Actually, I would expect this to behave just like the code using a string
above, i.e. that it checks if 'n' is identical to the integer literal '42'
and not if they are equal, but that is because I know that Java makes this
distinction between identity and equality.

Note that I find this behaviour a bit confusing, because if I had written it
like this:

int n = yetAnotherFunction();
if(n == 42)
doPanic();

really checks if 'n' is equal to '42', right?

The 's' case is correct, but the 'n' case is just silly.

Why silly? I don't get it. To me it's the same principle in both cases, in
order to compare an object (like String or Integer) for equality with a
constant, you use the equals() function and _not_ ==, which would check for
identity.


Uli
 
B

blmblm

[ snip ]
The problem I'm facing is this:

String s1 = "foo";
String s2 = s1.substring(1,1); // yields empty string!
if(s2 == "")
System.out.println("s2 is empty");
else
System.out.println("s2 is not empty");

This code actually checks if s2 is identical to an empty string literal, but
the programmer expected it to check if it is equal to an empty string.


Actually, I would expect this to behave just like the code using a string
above, i.e. that it checks if 'n' is identical to the integer literal '42'
and not if they are equal, but that is because I know that Java makes this
distinction between identity and equality.

Note that I find this behaviour a bit confusing, because if I had written it
like this:

int n = yetAnotherFunction();
if(n == 42)
doPanic();

really checks if 'n' is equal to '42', right?



Why silly? I don't get it. To me it's the same principle in both cases, in
order to compare an object (like String or Integer) for equality with a
constant, you use the equals() function and _not_ ==, which would check for
identity.

To me it makes better sense if you realize that the variable s is
not actually a String object, but a *reference to* a String object.
"==" works on such variables exactly the way it works on primitives --
compares for equality -- but two (non-null) references are equal
if, well, they're equal -- which for references means that they
point to the same object. Comparing the objects themselves,
rather than the references, requires use of the equals() method.

Does that help? If you're coming from C++ this behavior will
likely seem baffling at first, but IMO it makes good sense if
you realize that non-primitive variables in Java are references
to objects rather than objects.
 
R

Roedy Green

I wonder, is there a way to catch such comparisons where a non-fundamental
type is compared to a constant? I

I'm sure others will give you all sorts of suggestions, but the way I
do it is using the code inspector (lint) that comes with Intellij
Idea. That is just one of thousands of errors or suspect codings it
catches.
 
R

Roedy Green

Integer n = otherFunction();
if(n == 42)
dontPanic();

Other than the above, I haven't actually tried it, but I would expect this
code to also not do what a naive programmer would expect.

If you need boxing and you are using an old Java, the compiler will
complain. If you need boxing and you are using a new compiler it will
autobox for you. So don't worry about that.
 
M

Mark Space

Matt said:
I think what you're saying is that (for the benefit of those less
experienced with Java) you would like the IDE to highlight cases where == is

I think what he's say is that he'd like the Java compiler to do this,
emit warnings when there's questionable use of == operator. If the IDE
wants to do it also, that's great.
 
M

Mark Space

To me it makes better sense if you realize that the variable s is
not actually a String object, but a *reference to* a String object.

I agree. 'if ( s == "" )' checks to see if s is the same object as the
null string object that the compiler/runtime has created to represent
your literal. This could actually work, if s has been interned.

String s1 = "foo";
String s2 = s1.substring(1,1); // yields empty string!
s2 = s2.intern();
if(s2 == "")
System.out.println("s2 is empty");
else
System.out.println("s2 is not empty");

I believe this will work (I didn't test it). == compares object
references. It doesn't check to see if two objects are "identical" it
checks to see if they are in fact the same object.

Basically, == does a bit-wise check for identity on any primitive. (Or
at least that's how I think of it, it might actually work differently
internally.) In this respect, comparing references in Java is exactly
like comparing pointers in C.
 
M

Mark Jeffcoat

Ulrich Eckhardt said:
I guess many of you have already seen code (especially from C++ programmers)
like the following:

String s = someFunction();
if(s == "0")
doThis();
else
doThat();

This (anti-)pattern is common enough that most static
analyzers will check for it. I've had great experiences
with FindBugs (findbugs.sourceforge.net) and PMD
(pmd.sourceforge.net).
 
U

Ulrich Eckhardt

To me it makes better sense if you realize that the variable s is
not actually a String object, but a *reference to* a String object.

Yes, this is a better way to express it.
"==" works on such variables exactly the way it works on primitives --
compares for equality -- but two (non-null) references are equal
if, well, they're equal -- which for references means that they
point to the same object. Comparing the objects themselves,
rather than the references, requires use of the equals() method.

Right, I understood that.
If you're coming from C++ this behavior will likely seem baffling
at first, but IMO it makes good sense if you realize that non-primitive
variables in Java are references to objects rather than objects.

Yes, I had to wrap my head around that first, but it makes sense in Java way
of doing things.

However, there is one thing which I don't understand yet:

Integer n = new Integer(42);

'n' is now a reference to an Integer object. So, if I do this:

if(n==42) ...

it should check if 'n' refers to the same object as '42' here. However, it
doesn't, it actually checks if the Integer that 'n' refers to has the same
value, which - to me - seems inconsistent with the String case. Is there a
way to explain this inconsistency so that it makes sense again? Or do I
just have to learn the rules for each type?

thanks

Uli
 
U

Ulrich Eckhardt

Mark said:
I agree. 'if ( s == "" )' checks to see if s is the same object as the
null string object that the compiler/runtime has created to represent
your literal. This could actually work, if s has been interned.

String s1 = "foo";
String s2 = s1.substring(1,1); // yields empty string!
s2 = s2.intern();
if(s2 == "")
System.out.println("s2 is empty");
else
System.out.println("s2 is not empty");

I believe this will work (I didn't test it).

In a test, this printed "s2 is not empty", because 's2' doesn't refer to the
same object as '""' does. However, I could well imagine that a smarter
compiler replaces all references to an empty string with references to the
same string, which could then result in the other output.

Uli
 
A

Andreas Leitgeb

Ulrich Eckhardt said:
However, there is one thing which I don't understand yet:
Integer n = new Integer(42);
'n' is now a reference to an Integer object. So, if I do this:
if(n==42) ...

If you compiled this, and ran javap on it, you'd see
that n is really unboxed for the comparison, so you
get a comparison of two int-primitives.

If you had two Integer objects, then the comparison would
be strictly on the references. So you will get "true" or
"false" from the comparison, basically depending on whether
the references just point to the same object anyway (due to
previous copying of the reference), or if their "box" was
autogenerated *and* the value was within byte-range.

Integer n=42, m=42; // ==> m==n
Integer n=new Integer(42), m=<whatever>(42); // ==> m!=n
Integer n=4200, m=4200; // ==> m!=n

For those not wishing to play lottery, I recommend .equals(),
or explicit cast to primitives.
 
A

Andreas Leitgeb

If you compiled this, and ran javap on it, you'd see
that n is really unboxed for the comparison, so you
get a comparison of two int-primitives.

I forgot the other question...
Since Strings are not just boxes wrapped around primitives,
this logic of course does not apply at all to Strings.

The logic of comparing two String objects does share some
of the peculiarities of comparing numeric box-objects, by
which I mean, that references "sometimes" are equal for
equal Stings, even if one is not a direct copy of the other
reference. For String, however, this is more explicit
and (imo) easier understandable and even somewhat controllable
through it's method intern() - in contrast you can't enforce
a canonical Integer-object for values outside the byte-range.
 
B

blmblm

Yes, this is a better way to express it.


Right, I understood that.

Just for the record -- I wasn't meaning to imply that you didn't,
just restating it as part of an attempt to suggest a perspective
that might help it make (more) sense.
Yes, I had to wrap my head around that first, but it makes sense in Java way
of doing things.

However, there is one thing which I don't understand yet:

Integer n = new Integer(42);

'n' is now a reference to an Integer object. So, if I do this:

if(n==42) ...

it should check if 'n' refers to the same object as '42' here. However, it
doesn't, it actually checks if the Integer that 'n' refers to has the same
value, which - to me - seems inconsistent with the String case. Is there a
way to explain this inconsistency so that it makes sense again? Or do I
just have to learn the rules for each type?

Others have explained about what's actually happening in that case
(autoboxing).

Maybe this one is easier to understand if you think "well, there's
no sensible way to directly compare n, which is a reference to
an Integer, with the integer primitive 42, so the compiler puts
in code to perform appropriate conversions" ?

Very much off the top of my head, but I feel sure someone will correct
me if I've misspoken. :)
 
M

Mark Space

Ulrich said:
In a test, this printed "s2 is not empty", because 's2' doesn't refer to the
same object as '""' does. However, I could well imagine that a smarter
compiler replaces all references to an empty string with references to the
same string, which could then result in the other output.

Um, ok so I tested it myself.

init:
deps-jar:
compile:
run:
s2 is empty
BUILD SUCCESSFUL (total time: 1 second)


What the heck are you using to test? Which compiler and runtime? This is
runtime thing however, not a compile thing btw.

Do you mean my code snippet exactly or another test? Are there any
ClassLoaders involved besides the bootstrap one? I'm still somewhat
surprised. Could you test my snippet exactly? Or maybe your runtime is
buggered-up.
 
U

Ulrich Eckhardt

Mark said:
s2 is empty
BUILD SUCCESSFUL (total time: 1 second)


What the heck are you using to test? Which compiler and runtime? This is
runtime thing however, not a compile thing btw.

Argh, I didn't actually run your snippet but rather mine, which lacks the
call

s2 = s2.intern();

and that makes a big difference! ;)

sorry for the confusion

Uli
 
M

Mark Space

Ulrich said:
sorry for the confusion

Ah, ok. To be fair, intern() does use a pool of strings, which I assume
is limited in capacity. So a call to intern() could possibly fail, or
bump an older string out of the pool. Something to be wary of ....
 
R

Roedy Green

It's such
a common mistake that IDEs include an option to warn you of it.

You can use == with interned strings, but strings conceivably might
become uninterned via serialisation. It is safe to use equals all the
time, even with interned Strings, though a little slower. You still
get a quick == match when the strings are interned as part of the
equals check.
 
A

Andreas Leitgeb

Lew said:
One thing that irritates me about the IDE feature to check for == (or != ) in
lieu of equals() comparisons is that I've seen it flag
if ( s == null )
as a violation.

I'd consider it a bug in the particular IDE.
It renders the whole feature almost useless.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top