Infinite loops in hashCode() and equals()

M

Mike Schilling

What do you think the following fragment prints?:

List l = new ArrayList();
l.add(l);
System.out.println(l.hashCode());

Trick question. The result is an infinite recursion (i.e. a stack
overflow), since the hash code for a List is calculated from the hash codes
of its members.

Likewise, the following causes an infinite recursion, since lists are
defined to be equal when they have the same members.

List l1 = new ArrayList();
List l2 = new ArrayList();

l1.add(l2);
l2.add(l1);

System.out.println(l1.equals(l2));

These cases are simple to spot, but the same situation could occur with any
cyclic graph of objects that compute hash code and/or equality based on the
values of their members. Also, equals() and hashCode() are often called
implicitly, for instance when adding an object to a HashSet or HashMap.

I don't see any way this could be addressed other than by keeping track of
which objects have already been processed by the current hashCode() or
equals() call, and there's no way to do that without changing the method
signatures, to, say, add a Set:

in Object:

public final int hashCode() {
return hashCode(new HashSet());
}

public final int hashCode(Set processed)
{
if (processed.contains(this)) {
return System.identityHashCode(this);
}
else {
processed.add(this);
return this.calculateHashCode(processed);
}
}

public int calculateHashCode(processed) {
return return System.identityHashCode(this);
}

with calculateHashCode(Set) being the method that classes can override and
hashCode(Set) the method List et al. call on their members. And that is a
pipe dream, since it isn't possible to make this sort of incompatible change
at this late date.
 
J

Jesper Nordenberg

Mike Schilling said:
What do you think the following fragment prints?:

List l = new ArrayList();
l.add(l);
System.out.println(l.hashCode());

Trick question. The result is an infinite recursion (i.e. a stack
overflow), since the hash code for a List is calculated from the hash codes
of its members.

Likewise, the following causes an infinite recursion, since lists are
defined to be equal when they have the same members.

List l1 = new ArrayList();
List l2 = new ArrayList();

l1.add(l2);
l2.add(l1);

System.out.println(l1.equals(l2));

These cases are simple to spot, but the same situation could occur with any
cyclic graph of objects that compute hash code and/or equality based on the
values of their members. Also, equals() and hashCode() are often called
implicitly, for instance when adding an object to a HashSet or HashMap.

Hmmm, I can't think of a case where this problem would occur, so I
don't think there is a need for a solution. If object A reference
another object B you don't include B in A.equals() (or A.hashCode()),
unless B is part of A (composition). And in that case B is clearly not
a part of A, so there is no cycle. Can you provide a real world
example where this problem occurs?

/Jesper Nordenberg
 
A

Andy Fish

Jesper Nordenberg said:
"Mike Schilling" <[email protected]> wrote in message

Hmmm, I can't think of a case where this problem would occur, so I
don't think there is a need for a solution. If object A reference
another object B you don't include B in A.equals() (or A.hashCode()),
unless B is part of A (composition). And in that case B is clearly not
a part of A, so there is no cycle. Can you provide a real world
example where this problem occurs?

Although mike's examples aren't necessarily real-world ones, I don't see any
reason they couldn't appear in real life

having an arrayList contain itself seems a perfectly reasonable thing to do
when manipulating complex data structures (though the only example I can
think of off hand is trying to compute an empirical solution to russell's
paradox ;-). And computing the hashcode of an ArrayList also seems entirely
reasonable.

I would say it's a bug in the implementation of ArrayList.HashCode()
 
G

Guest

Andy said:
I would say it's a bug in the implementation of ArrayList.HashCode()

The Java List specification says:
Note: While it is permissible for lists to contain
themselves as elements, extreme caution is advised:
the equals and hashCode methods are no longer
well defined on a such a list.

- Dario
 
J

John C. Bollinger

Andy said:
Although mike's examples aren't necessarily real-world ones, I don't see any
reason they couldn't appear in real life

having an arrayList contain itself seems a perfectly reasonable thing to do
when manipulating complex data structures (though the only example I can
think of off hand is trying to compute an empirical solution to russell's
paradox ;-). And computing the hashcode of an ArrayList also seems entirely
reasonable.

I would say it's a bug in the implementation of ArrayList.HashCode()

I would say that the following excerpt from List's class-level Javadocs
is precisely to the point:

====
Note: While it is permissible for lists to contain themselves as
elements, extreme caution is advised: the equals and hashCode methods
are no longer well defined on a such a list.
====

List defines the algorithms for List implementations' hashCode() and
equals() methods in terms of their elements, which significantly eases
some applications. This is purposeful. I think it is in general _not_
a very reasonable thing to do to add a List to itself, and I would be
very interested in even a hypothetical scenario where it would make
sense. I would be especially interested in such a scenario where in
addition there is no good workaround. Until I see such I'll agree with
Jesper that this looks like a theoretical problem that need not occur in
practice.


John Bollinger
(e-mail address removed)
 
M

Mike Schilling

List defines the algorithms for List implementations' hashCode() and
equals() methods in terms of their elements, which significantly eases
some applications. This is purposeful. I think it is in general _not_
a very reasonable thing to do to add a List to itself, and I would be
very interested in even a hypothetical scenario where it would make
sense. I would be especially interested in such a scenario where in
addition there is no good workaround. Until I see such I'll agree with
Jesper that this looks like a theoretical problem that need not occur in
practice.

First, note that this behavior doesn't depend upon a List being a member of
itself. Create a graph of objects that have value-like semantics for
equality (and thus for hash code). This includes all Lists, Sets and Maps.
If there are any cycles in the graph, the result of hashCode() is an
infinite loop. Arguing that there will never be cycles in a well-designed
application is a bit like arguing that reference counting works as well as a
true GC, since cycles, which is where reference counting breaks down, should
be avoided.

Anyway, this came up while investigating automated generation of classes to
represent types in SOAP messages. These classes would have no notion of
identity, so value semantics are indicated: two objects received in
different messages are equal if all of their values are equal. hashcode()
and equals() should be generated in the obvious way, by iterating over the
members of the class. But what happens if there's a cycle (which is
possible in encoded SOAP messages, though not in literal ones)? How do you
avoid the infinite loop? Well, let's see how containers avoid it...
 
M

Mike Schilling

Dario (drinking coï¬?ee in the oï¬fceâ?¦) said:
The Java List specification says:
Note: While it is permissible for lists to contain
themselves as elements, extreme caution is advised:
the equals and hashCode methods are no longer
well defined on a such a list.

Less weaselly wording would have been: "Doing so can lead to unavaoidable
infinite recursion."
 
J

John C. Bollinger

Mike said:
First, note that this behavior doesn't depend upon a List being a member of
itself. Create a graph of objects that have value-like semantics for
equality (and thus for hash code). This includes all Lists, Sets and Maps.
If there are any cycles in the graph, the result of hashCode() is an
infinite loop.
Acknowledged.

Arguing that there will never be cycles in a well-designed
application is a bit like arguing that reference counting works as well as a
true GC, since cycles, which is where reference counting breaks down, should
be avoided.

Hold on a moment there. I certainly wouldn't argue for a minute that a
well-designed application wouldn't have cycles of references. I would,
however, claim that I currently see no use for such a cycle in which all
members have the kind of value-like semantics that we're discussing.
Whether a List is directly or indirectly a member of itself makes little
difference -- neither makes sense in any practical application as far as
I can see.
Anyway, this came up while investigating automated generation of classes to
represent types in SOAP messages. These classes would have no notion of
identity, so value semantics are indicated: two objects received in
different messages are equal if all of their values are equal. hashcode()
and equals() should be generated in the obvious way, by iterating over the
members of the class. But what happens if there's a cycle (which is
possible in encoded SOAP messages, though not in literal ones)?

I know enough about SOAP to find that confusing, but not enough to be
certain whether I really ought to be confused. Are you talking about
types defined via XML Schema, to be used in schemas for SOAP message
bodies? IIRC, type _definitions_ can be recursive. Type
_implementations_, on the other hand, cannot be recursive because XML
documents are flat. (Or they're tree-shaped, if you prefer; you still
can't get recursion.)
How do you
avoid the infinite loop?

If I understand correctly (and quite possibly I don't) then you don't
need to worry about it.
Well, let's see how containers avoid it...

Gotcha. The point that we have now beaten to death being that they
don't. My further point being that they probably don't need to do for
any sensible application I can imagine. The same applies to your
analogous XML type classes as far as I can tell from your brief description.


John Bollinger
(e-mail address removed)
 
X

xarax

Mike Schilling said:
What do you think the following fragment prints?:

List l = new ArrayList();
l.add(l);
System.out.println(l.hashCode());

Trick question. The result is an infinite recursion (i.e. a stack
overflow), since the hash code for a List is calculated from the hash codes
of its members.

Likewise, the following causes an infinite recursion, since lists are
defined to be equal when they have the same members.

List l1 = new ArrayList();
List l2 = new ArrayList();

l1.add(l2);
l2.add(l1);

System.out.println(l1.equals(l2));

These cases are simple to spot, but the same situation could occur with any
cyclic graph of objects that compute hash code and/or equality based on the
values of their members. Also, equals() and hashCode() are often called
implicitly, for instance when adding an object to a HashSet or HashMap.

I don't see any way this could be addressed other than by keeping track of
which objects have already been processed by the current hashCode() or
equals() call, and there's no way to do that without changing the method
signatures, to, say, add a Set:

in Object:

public final int hashCode() {
return hashCode(new HashSet());
}

public final int hashCode(Set processed)
{
if (processed.contains(this)) {
return System.identityHashCode(this);
}
else {
processed.add(this);
return this.calculateHashCode(processed);
}
}

public int calculateHashCode(processed) {
return return System.identityHashCode(this);
}

with calculateHashCode(Set) being the method that classes can override and
hashCode(Set) the method List et al. call on their members. And that is a
pipe dream, since it isn't possible to make this sort of incompatible change
at this late date.


Not infinite recursion, because you don't have
infinite memory. You will elicit a stack overflow
exception.
 
J

Jesper Nordenberg

Andy Fish said:
Although mike's examples aren't necessarily real-world ones, I don't see any
reason they couldn't appear in real life

having an arrayList contain itself seems a perfectly reasonable thing to do
when manipulating complex data structures (though the only example I can
think of off hand is trying to compute an empirical solution to russell's
paradox ;-). And computing the hashcode of an ArrayList also seems entirely
reasonable.

I would say it's a bug in the implementation of ArrayList.HashCode()

Although ArrayList could be designed to solve this problem by using
thread local variables and sets, it's not worth the effort (and
performance hit) because I can't think of, and you haven't presented,
a case where cyclic hash code calculation would be useful. It's better
to document that object cycles are not supported in the current
implementation.

/Jesper Nordenberg
 
J

Jesper Nordenberg

Mike Schilling said:
First, note that this behavior doesn't depend upon a List being a member of
itself. Create a graph of objects that have value-like semantics for
equality (and thus for hash code). This includes all Lists, Sets and Maps.
If there are any cycles in the graph, the result of hashCode() is an
infinite loop. Arguing that there will never be cycles in a well-designed
application is a bit like arguing that reference counting works as well as a
true GC, since cycles, which is where reference counting breaks down, should
be avoided.

That is not a valid comparison, I can easily give an example in which
a object reference cycle occurs naturally (parent-child) and a
reference counting GC would fail. Can you give an example where a
cyclic hash code calculation would occur?
Anyway, this came up while investigating automated generation of classes to
represent types in SOAP messages. These classes would have no notion of
identity, so value semantics are indicated: two objects received in
different messages are equal if all of their values are equal. hashcode()
and equals() should be generated in the obvious way, by iterating over the
members of the class.

Not all class members should be included in the calculation of a hash
code, some "weak" references are created for traversal and
optimization (for example a parent reference in a child). Your cycle
can probably be broken by removing some member that shouldn't be
included in your hash code calculation. You should ask yourself if the
members you include in your hash code calculation for an object really
is a part of the object.

/Jesper Nordenberg
 
M

Mike Schilling

John C. Bollinger said:
Hold on a moment there. I certainly wouldn't argue for a minute that a
well-designed application wouldn't have cycles of references. I would,
however, claim that I currently see no use for such a cycle in which all
members have the kind of value-like semantics that we're discussing.
Whether a List is directly or indirectly a member of itself makes little
difference -- neither makes sense in any practical application as far as
I can see.


I know enough about SOAP to find that confusing, but not enough to be
certain whether I really ought to be confused. Are you talking about
types defined via XML Schema, to be used in schemas for SOAP message
bodies? IIRC, type _definitions_ can be recursive. Type
_implementations_, on the other hand, cannot be recursive because XML
documents are flat. (Or they're tree-shaped, if you prefer; you still
can't get recursion.)

No, encoded SOAP messages can express graphs with cycles. Attributes are
used to express directed references from one object to another, and the
graph can be arbitrarily complex.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,968
Messages
2,570,154
Members
46,702
Latest member
LukasConde

Latest Threads

Top