hash values and equality

E

Ethan Furman

Several folk have said that objects that compare equal must hash equal,
and the docs also state this
http://docs.python.org/dev/reference/datamodel.html#object.__hash__

I'm hoping somebody can tell me what horrible thing will happen if this
isn't the case? Here's a toy example of a class I'm thinking of writing
that will compare equal with int's, but hash differently:

--> class Wierd():
.... def __init__(self, value):
.... self.value = value
.... def __eq__(self, other):
.... return self.value == other
.... def __hash__(self):
.... return hash((self.value + 13) ** 3)
....
--> one = Wierd(1)
--> two = Wierd(2)
--> three = Wierd(3)
--> one
<Wierd object at 0x00BFE710>
--> one == 1
True
--> one == 2
False
--> two == 2
True
--> three == 3
True
--> d = dict()
--> d[one] = '1'
--> d[two] = '2'
--> d[three] = '3'
--> d
{<Wierd object at 0x00BFE710>: '1',
<Wierd object at 0x00BFE870>: '3',
<Wierd object at 0x00BFE830>: '2'}
--> d[1] = '1.0'
--> d[2] = '2.0'
--> d[3] = '3.0'
--> d
{<Wierd object at 0x00BFE870>: '3',
1: '1.0',
2: '2.0',
3: '3.0',
<Wierd object at 0x00BFE830>: '2',
<Wierd object at 0x00BFE710>: '1'}
--> d[2]
'2.0'
--> d[two]
'2'

All information greatly appreciated!

~Ethan~
 
U

Ulrich Eckhardt

Ethan said:
Several folk have said that objects that compare equal must hash equal,
and the docs also state this
http://docs.python.org/dev/reference/datamodel.html#object.__hash__

I'm hoping somebody can tell me what horrible thing will happen if this
isn't the case?

If you were familiar with what a hash map is, you wouldn't ask. The thing is
that the hash is used to look up the place in the map where the thing is
stored. If two equal objects have different hashes, they will be stored in
different places in the hash map. Looking for object1 will then not turn up
with object2, even though they are equal. If this is something you don't
care about, and all you care about is identity, then I'd derive the hash
from each object's ID.

This ID has another property which is something that is assumed for hashes,
and your code seems a bit to get that wrong, too, and that is that the hash
must not change. Again, the reason is that if the hash changes, the position
in the hash map changes, too. If you then try to look up the changed object,
it will look for it in the new place, but it won't be found because it is in
the old place.

For that reason, it is generally useful to use immutable types like
integers, floats, strings and tuples thereof as keys. Since you can't change
them, you basically have the guarantee that they hash the same.

Uli
 
P

Peter Otten

Ethan said:
Several folk have said that objects that compare equal must hash equal,
and the docs also state this
http://docs.python.org/dev/reference/datamodel.html#object.__hash__

I'm hoping somebody can tell me what horrible thing will happen if this
isn't the case? Here's a toy example of a class I'm thinking of writing
that will compare equal with int's, but hash differently:

--> class Wierd():
... def __init__(self, value):
... self.value = value
... def __eq__(self, other):
... return self.value == other
... def __hash__(self):
... return hash((self.value + 13) ** 3)
...

Try this:
True
 
M

MRAB

If you were familiar with what a hash map is, you wouldn't ask. The thing is
that the hash is used to look up the place in the map where the thing is
stored. If two equal objects have different hashes, they will be stored in
different places in the hash map.
[snip]
Is this strictly true? I thought that the hash value, an integer, is
moduloed (Is that how you spell it? Looks weird!) with the number of
array elements to give an index into the array, so different hashes
could give the same index, and objects with different hashes could be
stored in the same 'bucket'.
 
C

Chris Angelico

[snip]
Is this strictly true? I thought that the hash value, an integer, is
moduloed (Is that how you spell it? Looks weird!) with the number of
array elements to give an index into the array, so different hashes
could give the same index, and objects with different hashes could be
stored in the same 'bucket'.

There can always be hash collisions between different objects, but the
assumption is that two identical objects will _always_ "collide".

Chris Angelico
 
E

Ethan Furman

Peter said:
Try this:

True

My apologies -- I'm trying this in Python3:

--> two in d
True
--> two in d.keys()
True
-->
--> 2 in d
True
--> 2 in d.keys()
True

~Ethan~
 
I

Ian Kelly

I think the question was: can this dummy code ever produce a set containing
less then itemCount items (for 0 < itemCount < 2**32)?

In CPython, no. Even when you get a hash collision, the code checks
to see whether the hashes are actually equal before it calls the rich
comparison, the former check being a much faster operation since the
hash values are cached.

I'm not sure whether this can be counted on for all Python
implementations, though.
 
E

Ethan Furman

Ulrich said:
If you were familiar with what a hash map is, you wouldn't ask. The thing is
that the hash is used to look up the place in the map where the thing is
stored. If two equal objects have different hashes, they will be stored in
different places in the hash map. Looking for object1 will then not turn up
with object2, even though they are equal.

In this case this is the behavior I want.
If this is something you don't
care about, and all you care about is identity, then I'd derive the hash
from each object's ID.

This won't work, as objects of the same type that compare equal should
(and do, in my code) hash equal.
This ID has another property which is something that is assumed for hashes,
and your code seems a bit to get that wrong, too, and that is that the hash
must not change.

The hash does not change on the instances, and is the same for all
instances of my type that compare equal.

~Ethan~
 
P

Peter Otten

Ethan said:
My apologies -- I'm trying this in Python3:

Then you have to convert the keys to a list explicitly:
.... def __init__(self, value):
.... self.value = value
.... def __eq__(self, other):
.... return self.value == other
.... def __hash__(self):
.... return hash((self.value + 13) **3)
....True
 
E

Ethan Furman

Peter said:
Then you have to convert the keys to a list explicitly:

... def __init__(self, value):
... self.value = value
... def __eq__(self, other):
... return self.value == other
... def __hash__(self):
... return hash((self.value + 13) **3)
...
True

Ah!! The light finally dawned! Many thanks for everyone's input.

So if Wierd has a need to compare equal to some other type, it should
implement a .equals() method. Gotcha.

Likewise, if two different type's instances can compare equal, then for
the most part they should be interchangeable.

~Ethan~
 
S

Steven D'Aprano

A brief search on the web found a use of the word in 1982.

All that means is that two people, three decades apart, used the same non-
word :)

I think you are treating "modulo" as a verb, equivalent to division,
hence:

a/b => a is divided by b
a%b => a is "moduloed" by b

But modulo is not a verb. It is a preposition, a modifier word. Just as
you might say "the cat sat on the mat" (cat on mat) or "the Princess
found a pea underneath her mattress" (pea underneath mattress) so
mathematicians will say "a is taken modulo b" (a modulo b).

English verbs nouns at the drop of a hat, but I've never heard of it
verbing propositions:

"The princess underneathed the pea."

No, I don't think so.

English does use "remainder" as a verb, although not in the mathematical
sense; I think that:

a%b => a is remaindered by b

is at least grammatical, although still ugly and awkward. I'm afraid that
in English, the best way to say what you are trying to say is moderately
verbose:

"the hash value, an integer, is taken modulo ..."
 
S

Steven D'Aprano

All that means is that two people, three decades apart, used the same
non- word :)
[snip]
There were other uses. That's just the earliest one I found.


Nevertheless, it is still ungrammatical and incorrect usage. I'm not a
prescriptivist, but not everything people write down is a word, otherwise
we'd be forcefied to say evert typlo and mystake wsa an actul wrd.
 
G

Gregory Ewing

Ethan said:
In this case this is the behavior I want.

You can't rely on it, though. The hash value gets reduced
modulo the size of the dict, so even if two objects have
different hashes, in some cases they will land on the same
dict slot anyway.

So an object such as you're postulating would behave
unpredictably when used as a dict key. Sometimes a lookup
using a different but equal object would find it, and
sometimes not, seemingly at random.
 
P

Peter Otten

Gregory said:
You can't rely on it, though. The hash value gets reduced
modulo the size of the dict, so even if two objects have
different hashes, in some cases they will land on the same
dict slot anyway.

So an object such as you're postulating would behave
unpredictably when used as a dict key. Sometimes a lookup
using a different but equal object would find it, and
sometimes not, seemingly at random.

I think for every potential match the current dict implementation checks
identity, then hashes -- and equality only if the hash values are equal. The
relevant function is lookdict() in dictobject.c.

While that means that the problem you describe cannot occur a simple
approach that avoids relying on an implementation detail (?) would be to use
(hash(obj), obj) tuples instead of just obj as the key.
 
J

John Nagle

For that reason, it is generally useful to use immutable types like
integers, floats, strings and tuples thereof as keys. Since you can't change
them, you basically have the guarantee that they hash the same.

Right. It's something of a lack that Python doesn't
have user-defined immutable objects.

John Nagle
 
I

Irmen de Jong

Right. It's something of a lack that Python doesn't
have user-defined immutable objects.

John Nagle

collections.namedtuple is rather powerful though... and can be used as key AFAIK.

Irmen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,962
Messages
2,570,134
Members
46,690
Latest member
MacGyver

Latest Threads

Top