hashCode

B

bob smith

From: bob smith <[email protected]>

Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: bob smith
From: Arne Vajhoj <[email protected]>

Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

It meets the minimum contractual obligation for that method.

Would it be potentially better if that was Object's implementation?

It has a different behavior that may not be valid if you override equals.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
E

Eric Sosman

To: bob smith
From: Eric Sosman <[email protected]>

Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?

Define "better."

--
Eric Sosman
(e-mail address removed)

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
M

markspace

To: Arne Vajhøj
From: markspace <-@.>

It meets the minimum contractual obligation for that method.



It has a different behavior that may not be valid if you override equals.


I think at this point we are doing Bob's homework for him.

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: markspace
From: Arne Vajhoj <[email protected]>

I think at this point we are doing Bob's homework for him.

Could be.

But I think the question whether returning a constant from hashCode is a valid
Java question for understanding the contract for that method.

And I am pretty sure that I have seen other similar examples (just with 42 as
constant).

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
R

Roedy Green

To: bob smith
From: Roedy Green <[email protected]>

@Override
public int hashCode() {
return 1;
}

that's about the worst possible hashCode function.

See http://mindprod.com/jgloss/hashcode.html for tips on how to write good
ones.
--
Roedy Green Canadian Mind Products http://mindprod.com A new scientific truth
does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up
that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
L

Lew

To: Roedy Green
From: Lew <[email protected]>

Roedy said:
bob smith wrote, quoted or indirectly quoted someone
who said :


that's about the worst possible hashCode function.

Normally that's correct, but it's conceivable that one might do it for some
hackish reason. In most situations where one might do such an override as this,
one would do better not to override hashCode().
See http://mindprod.com/jgloss/hashcode.html for tips on how to write
good ones.

The default of assembling it via the mix-in of attribute hash codes using the
Knuth constants is usually good enough.

h(object) = Sum(i rnn 0.. n-1) of ( attribute * pow(31, n - 1 - i) );

or

public static int calculateHash(Foo arg) {
int h = 0;

for ( each attribute of Foo that contributes to 'equals()' )
{
h = 31 * h + attribute.hashCode();
}
return h;
}

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Java_hashCode()
http://en.wikipedia.org/wiki/Java_hashCode()#Java

--
Lew

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
B

bob smith

To: Eric Sosman
From: bob smith <[email protected]>

Define "better."



--

Eric Sosman

(e-mail address removed)

Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very
broken.

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
L

Lew

To: bob smith
From: Lew <[email protected]>


It complies with the contract for 'hashCode()'. Is that all it takes to be
correct?

Would what be better if what were Object's implementation of what?
Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very
broken.

No.

No matter what 'Object''s 'hashCode()' implementation were, it would need to be
overridden when you want value equality instead of object identity for
'equals()'.

See Joshua Bloch's seminal work /Effective Java/, which has items that pertain
to this.

Bottom line: 'hashCode()', 'equals()', and when present, 'compareTo()' must be
consistent.

'toString()' should be consistent with those.

As long as 'hashCode()' fulfills the contract, your code will work -
functionally. But a bad
'hashCode()' could and likely will noticeably affect performance. There is more
to correctness
than mere functional conformance.

--
Lew

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Roedy Green
From: Arne Vajhoj <[email protected]>

that's about the worst possible hashCode function.

Yes, but the posted asked "Is it always technically correct to ..." not whether
is was "best possible".

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Lew
From: Arne Vajhoj <[email protected]>

broken.

As long as 'hashCode()' fulfills the contract, your code will work - functionally. But a bad
'hashCode()' could and likely will noticeably affect performance. There is more to correctness
than mere functional conformance.

If the code per specs is guaranteed to work then it is correct.

Good (or just decent) performance is not necessary for code to be correct.

At least not in the traditional programming terminology.

In plain English maybe.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: bob smith
From: Arne Vajhoj <[email protected]>

Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very
broken.

It is not broken.

It will perform poorly in many cases.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
R

rossum

To: Arne Vajhøj
From: rossum <[email protected]>

And I am pretty sure that I have seen other similar
examples (just with 42 as constant).
The magic word is "Bloch".

rossum

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
R

Roedy Green

To: Lew
From: Roedy Green <[email protected]>

h =3D 31 * h + attribute.hashCode();
}
In my essay I recommend XOR which is an inherentely faster operation than
multiply. I wonder which actually works out better. If you had a large number
of fields, the multiply effect could fall off the left hand end. It is the
algorithm used for String which could have very long strings, so Sun must have
thought of that.
--
Roedy Green Canadian Mind Products http://mindprod.com A new scientific truth
does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up
that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
E

Eric Sosman

To: bob smith
From: Eric Sosman <[email protected]>

On 8/10/2012 6:22 PM, bob smith wrote: [... many blank lines removed for
legibility's sake ...]
Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very
broken.

I cannot think of a case where you HAVE to override hashCode(),
except as a consequence of other choices that you didn't HAVE to make. You
don't HAVE to invent classes where distinct instances are considered equal, and
even if you do you don't HAVE to put those instances in HashMaps or HashSets or
whatever.

But that's a bit specious: All it says is that you don't HAVE
to override hashCode() because you don't HAVE to use things that call it. It's
like "You don't HAVE to pay taxes, because you don't HAVE to live outside
prison." So, let's take it as a given that you will often need to write
classes that override equals() and hashCode() -- I imagine you understand that
they go together.

Okay: Then returning a constant 1 (or 42 or 0 or whatever)
would in fact satisfy the letter of the law regarding hashCode(): Whenever
x.equals(y) is true, x.hashCode() == y.hashCode(). In your example this would
be trivially true because x,y,z,... all have the same hashCode() value, whether
they're equal or not -- You have lived up to the letter of the law.

Of course, such a hashCode() would make all those hash-based
containers pretty much useless: They would work in the sense that they would
get the Right Answer, but they'd be abominably slow, with expected performance
of O(N) instead of O(1). See
<http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
for a survey of some denial-of-service attacks that work by driving hash tables
from O(1) to O(N), resulting in catastrophic failure of the attacked system.

In other words, the letter of the law on hashCode() is a bare
minimum that guarantees correct functioning, but it is not enough to guarantee
usability. Why isn't the law more specific? Because nobody knows how to write
"hashCode() must be correct *and* usable" in terms that would cover all the
classes all the Java programmers have dreamed up and will dream up. Your
hashCode() meets the bare minimum requirement, but is not "usable." The actual
hashCode() provided by Object also meets the bare minimum requirement, and *is*
usable as it stands, until (and unless; you don't HAVE to) you choose to
implement other equals() semantics, and a hashCode() to match them.


--
Eric Sosman
(e-mail address removed)

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
E

Eric Sosman

To: Arne Vajhøj
From: Eric Sosman <[email protected]>

Yes, but the posted asked "Is it always technically correct to ..."
not whether is was "best possible".

He also asked whether it would "be potentially better."

--
Eric Sosman
(e-mail address removed)

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
J

Jan Burse

To: bob smith
From: Jan Burse <[email protected]>

bob said:
Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?

Maybe it would make sense to spell out what the contract for hashCode() is.
Well the contract is simply, the following invariant should hold:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

It should be noted that this does not imply:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy the later, although
the closer it comes to the later, the better it works for HashMap, etc..

The default objects implementation of hashCode() matches the default objects
impementation of equals(). The default objcts implementation of equals() is ==.
And the default objects implementation of hashCode() is
System.identityHashCode().

The System identity hash code is stored in the object and generated by the
system. It does not change during GC although the internal object address might
change during GC. It is only 32bit although internal object addresses might by
64bit with a corresponding JVM.

Returning a constant, any constant c not only 1, would be technically correct
correct for the default implementation of the class object. Since it trivially
satisfies the invariant:

if a.equals(b) then c==c

is trivially true, since c==c is true. But it is not better. Since you would
get very degenerated HashMaps, etc..

You need to override the hashhCode() when there is danger that the invariant is
not anymore satisified. This is not the case when equals() is not overridden.
So overriding hashCode() just for fun when equals() is not overriden, usually
doesn't make sense. It will probably only slow down the hashCode() calculation.
So the following:

hashCode() = sum attr_i * c^i

Is not necessary. But it would be a possible way to go when equals() were
overriden in the following way:

equals(other) = and_i attr_i.equals(other.attr_i)

The above happens when you turn your object into a container of other objects
irrespective of the own object identity. But beware if the container contains
itself somewhere. This is why we find in the code for Hashtable the following
complication:

public synchronized int hashCode() {
/*
* This code detects the recursion caused by computing the hash code
* of a self-referential hash table and prevents the stack overflow
* that would otherwise result. This allows certain 1.1-era
* applets with self-referential hash tables to work. This code
* abuses the loadFactor field to do double-duty as a hashCode
* in progress flag, so as not to worsen the space performance.
* A negative load factor indicates that hash code computation is
* in progress.
*/

Interestingly it will return a constant for the object when it detects a loop.
Maybe one could do better... Dunno

Bye

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Eric Sosman
From: Arne Vajhoj <[email protected]>

He also asked whether it would "be potentially better."

"better to use Object hashCode" which again should bring the correctness
question before the performance question.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
J

Joerg Meier

To: Roedy Green
From: Joerg Meier <[email protected]>

In my essay I recommend XOR which is an inherentely faster operation
than multiply.

Hasn't that been wrong since about the invention of the 80386 processor family
? Pretty sure by now MUL and XOR both take one cycle and that's it.


Liebe Gruesse,
Joerg

--
Ich lese meine Emails nicht, replies to Email bleiben also leider ungelesen.

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
P

Peter Duniho

To: Roedy Green
From: Peter Duniho <[email protected]>

In my essay I recommend XOR which is an inherentely faster operation
than multiply. I wonder which actually works out better. If you had a
large number of fields, the multiply effect could fall off the left
hand end. It is the algorithm used for String which could have very
long strings, so Sun must have thought of that.

Lew's example is better than XOR. I think you should fix your essay.

There are of course even better approaches if one knows something specific
about the input data. But the "multiply previous by prime, add next value" is a
reasonably standard, "pretty good" composition function.

XOR a well-known problems in the context of hash codes: in many situations,
large numbers of combinations of values wind up mapping to the same result.
Consider a pair of integers: any time those values are identical, the result of
XOR is 0. On the other hand, with the multiply-and-add approach, it's much less
common for "typical" integer values to combine to a hash value of 0.

(Integers are an important scenario, because a 32-bit integer naturally is its
own hash value...there's no reason to try to mix the bits any further as is
done with other data types, so they don't).

Now (unless I've done my math wrong) if your component hash codes are actually
well-distributed across the entire range of a 32-bit integer (i.e. are
completely uncorrelated to each other and have entirely random distribution),
the XOR should work fine. It's just that it's very susceptible to creating
hash value collisions when the input values aren't themselves well-distributed
and so as a general technique it's not nearly as good as multiply-and-add.

As far as performance goes, even if the multiply-and-add costs more (and as
Joerg points out, this may only be a 1-cycle vs 2-cycle difference anyway on
that specific operation), your code should not be spending a lot of time
computing hash values in the first place. If that's a significant bottleneck,
there are better ways to improve performance than worrying about XOR vs
multiply-and-add, especially since a poor hash function can hurt performance
much worse than using the wrong math operation would.

Pete

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

multiple inheritance 5
multiple inheritance 14
Bluetooth programming 0
multiple inheritance 6
multiple inheritance 5
Bluetooth programming 0
Bluetooth programming 0
Swing 7

Members online

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,739
Latest member
Clint8040

Latest Threads

Top