booleans, hash() and objects having the same value

Ryszard Szopa

Hi all,

I've just read PEP 285 so I understand why bool inherits from int and
why, for example, ((False - True)*True)**False==1. This was necessary
for backwards compatibility and to give the beast some ability to do
moral reasoning. For example, Python knows to value the whole truth
more than just a half-truth:

In [95]: True > 0.5*True
Out[95]: True

Anyway, the thing that bothers me is the behavior of booleans when
passed as an argument to the hash() function... That is, hash(True) ==
hash(1) and hash(False) == hash(0). This leads to a rather
counterintuitive interaction with dicts:

In [124]: d = {}

In [125]: d[True] = repr(True)

In [126]: d[1] = repr(1)

In [127]: d[True]
Out[127]: '1'

In [128]: d
Out[128]: {True: '1'}

You may argue that this is a rather strange use case... However, you
may imagine that somebody would want a dict mapping from objects to
their representations, with 0, 1 and booleans among the objects, like
in:

In [123]: dict((el, repr(el)) for el in [0, 1, True, False])
Out[123]: {0: 'False', 1: 'True'}

In both cases, the result is rather unexpected, though after some
thinking, understandable (`==' tests the equality of values of
objects, True==1, and (from the documentation of hash) "Two objects
with the same value have the same hash value"). However, is this
approach really sound? Wouldn't it be more sensible for bool to have
its own __hash__?

PEP 285 doesn't mention anything about hashing (in fact, it doesn't
contain the string `hash' at all). Is it that nobody has noticed the
problem, that it is a well-known fact usually classified as a
non-problem, or that there are some serious reasons to keep 1 and
True having the same hash value?

(Also, it is not completely clear what it means for two Python objects
to "have the same value". My first intuition would be that variables
may have a value, which usually is some Python object. The second
intuition would be that objects with compatible (i.e. one inherits
from the other) types and ==-equal dicts have the same value. However,
this is only _sometimes_ true. On one hand:

In [173]: True == 1
Out[173]: True

In [165]: class MyDict(dict): pass
.....:

In [175]: x = {'a': 1}

In [176]: w = MyDict(a=1)

In [177]: w == x
Out[177]: True

But, on the other hand:

In [178]: class Foo(object): pass
.....:

In [179]: class Bar(Foo): pass
.....:

In [180]: foo = Foo()

In [181]: bar = Bar()

In [182]: bar.__dict__==foo.__dict__
Out[182]: True

In [183]: bar == foo
Out[183]: False

*Truly* puzzling, I must say.)

Best regards,

-- Richard
 
Steven D'Aprano

> Hi all,
>
> I've just read PEP 285 so I understand why bool inherits from int and
> why, for example, ((False - True)*True)**False==1.

And don't think that the choice was uncontroversial.

> This was necessary for backwards compatibility

"Necessary" is perhaps a little strong, but otherwise yes.

> and to give the beast some ability to do moral reasoning.
> For example, Python knows to value the whole truth more
> than just a half-truth:
>
> In [95]: True > 0.5*True
> Out[95]: True

You're trying to be funny, yes?


> Anyway, the thing that bothers me is the behavior of booleans when
> passed as an argument to the hash() function... That is, hash(True) ==
> hash(1) and hash(False) == hash(0).

How do you feel about this?
True

It's the same thing: True is actually 1, just as 1.0 is, and so
hash(True) is the same as hash(1), hash(1.0) and hash(1L).
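For readers on Python 3, where the `1L` literal no longer exists, the same point can be checked directly. This is only a small sketch added for illustration, not something from the original exchange:

```python
# Python 3 sketch: bool is a subclass of int, so True compares equal to 1
# and 1.0, and objects that compare equal are required to hash equally.
values = [True, 1, 1.0]

all_equal = all(a == b for a in values for b in values)
same_hash = len({hash(v) for v in values}) == 1

# A dict treats all three as one key. The first key object inserted is
# kept, while the value is overwritten by each later assignment.
d = {}
for v in values:
    d[v] = repr(v)

print(all_equal, same_hash)  # True True
print(d)                     # {True: '1.0'}
```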

> This leads to a rather
> counterintuitive interaction with dicts:

[...]

> Out[128]: {True: '1'}

Yes, that's one of the disadvantages of having bools actually be ints: in
the rare cases where you want bools to hash differently from ints, they
don't. But that's no different from longs and ints hashing the same, or
strings and unicode strings.

> You may argue that this is a rather strange use case... However, you may
> imagine that somebody would want a dict mapping from objects to their
> representations, with 0, 1 and booleans among the objects, like in:
>
> In [123]: dict((el, repr(el)) for el in [0, 1, True, False])
> Out[123]: {0: 'False', 1: 'True'}

Why bother with such a mapping? It already exists, and it is called
repr(). Best of all, repr() shouldn't give a KeyError, and it can take
mutable arguments.


> In both cases, the result is rather unexpected, though after some
> thinking, understandable (`==' tests the equality of values of objects,
> True==1, and (from the documentation of hash) "Two objects with the same
> value have the same hash value"). However, is this approach really
> sound?

Absolutely. As a general data type, the most sensible behaviour for hash
tables is for dict[X] and dict[Y] to give the same result if X and Y are
equal.
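The same invariant applies to user-defined classes: if a class defines __eq__, its __hash__ should be computed from the same data, so that equal objects land on the same dict entry. A minimal Python 3 sketch, using a hypothetical Celsius class that is not from the thread:

```python
class Celsius:
    """Hypothetical value type: two instances are equal when their
    temperatures are equal."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        if not isinstance(other, Celsius):
            return NotImplemented
        return self.degrees == other.degrees

    def __hash__(self):
        # Hash the same data that __eq__ compares, so that x == y
        # implies hash(x) == hash(y).
        return hash(self.degrees)


x = Celsius(20)
y = Celsius(20.0)   # a distinct object that compares equal to x

table = {x: "room temperature"}
print(x == y, hash(x) == hash(y))  # True True
print(table[y])                    # room temperature
```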

> Wouldn't it be more sensible for bool to have its own __hash__?

Who cares what bools hash to? The real question is, should True be equal
to 1 (and 1.0 and 1L) or not?

The decision that it should was made a long time ago. It may or may not
have been the best decision, but it's a decision and I doubt that it will
be changed before Python 4000. Or possibly Python 5000.

> PEP 285 doesn't mention anything about hashing (in fact, it doesn't
> contain the string `hash' at all). Is it that nobody has noticed the
> problem, it is a well known fact usually classified as a non-problem, or
> maybe there are some serious reasons to keep 1 and True having the same
> hash value?

It's a non-problem in general. There might be highly specialized
situations where you want 1.0 and 1 to map to different items, or 'xyz'
and u'xyz', but being specialist they belong in your application code and
not the language.

Here's a start in implementing such a thing:

class MyDict(dict):
    def __getitem__(self, key):
        # Store and look up keys as (type, key) pairs so that equal
        # keys of different types stay separate.
        key = (type(key), key)
        return super(MyDict, self).__getitem__(key)
    def __setitem__(self, key, value):
        key = (type(key), key)
        super(MyDict, self).__setitem__(key, value)

>>> D = MyDict(); D[1] = "one"; D[1.0] = "one point oh"
>>> D[1L] = "long one"; D[True] = "True"
>>> D[1]
'one'
>>> D[True]
'True'



(I leave implementing the other necessary methods as an exercise.)
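One possible way to do that exercise in Python 3 is to build on collections.abc.MutableMapping, which derives methods such as `in`, get(), update() and items() from five core methods. The class name TypedKeyDict and the _wrap helper below are made up for this sketch; it is one approach under those assumptions, not the poster's implementation:

```python
from collections.abc import MutableMapping


class TypedKeyDict(MutableMapping):
    """Hypothetical mapping that keeps keys of different types apart,
    even when they compare equal (1, 1.0 and True)."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)   # update() comes from MutableMapping

    @staticmethod
    def _wrap(key):
        # Pair each key with its exact type before storing it.
        return (type(key), key)

    def __getitem__(self, key):
        return self._data[self._wrap(key)]

    def __setitem__(self, key, value):
        self._data[self._wrap(key)] = value

    def __delitem__(self, key):
        del self._data[self._wrap(key)]

    def __iter__(self):
        # Yield the original keys, not the (type, key) pairs.
        return (key for _type, key in self._data)

    def __len__(self):
        return len(self._data)


D = TypedKeyDict()
D[1] = "one"
D[1.0] = "one point oh"
D[True] = "True"
print(D[1], D[1.0], D[True])     # one one point oh True
print(len(D), 1.0 in D, 2 in D)  # 3 True False
```

Because the wrapping happens in one place, every derived method sees the type-aware behaviour without being overridden individually.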


> (Also, it is not completely clear what it means for two Python objects
> to "have the same value". My first intuition would be that variables may
> have a value, which usually is some Python object.

I think that value should be interpreted rather fuzzily. I don't believe
it is strongly defined: the concept of the value of an object depends on
whatever the object wants it to be. For example, given an instance x with
an attribute "foo", is x.foo part of the value of x, or is it something
extra? Only the object x can make that decision.

However, for standard objects like strings, ints, floats, etc. the value
of the object corresponds to the intuitive ideas about strings, ints,
floats etc. The value of the int 5 is 5, the value of the string "xyz" is
"xyz", and so forth.

For "well-behaved" objects, x and y have the same value when x == y
returns True. Leave it to the objects to decide what their value is.
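To make the "x.foo" question concrete, here is a small Python 3 sketch in which the class itself decides that one attribute is part of its value and another is not. The Point class and its label attribute are hypothetical names chosen for illustration:

```python
class Point:
    """Hypothetical class whose 'value' is its coordinates; the label
    attribute is deliberately left out of the comparison."""

    def __init__(self, x, y, label=""):
        self.x = x
        self.y = y
        self.label = label

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        # Only the coordinates count towards equality.
        return (self.x, self.y) == (other.x, other.y)


a = Point(1, 2, label="start")
b = Point(1, 2, label="finish")
print(a == b)                    # True: same coordinates
print(a.__dict__ == b.__dict__)  # False: the labels differ
```

So equal instance dicts are neither necessary nor sufficient for two objects to "have the same value"; what counts is whatever __eq__ says.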


It's easy to create ill-behaved objects:

class Weird:
    def __eq__(self, other):
        if other is self:
            return False
        elif other is True:
            return True
        elif other == 1:
            return False
        else:
            import time
            return int(time.time()) % 2 == 0

but in general, you don't need to worry about such nonsense objects.

> The second intuition
> would be that objects with compatible (i.e. one inherits from the other)
> types and ==-equal dicts have the same value. However, this is only
> _sometimes_ true.

Python rarely cares about the type of objects, only the behaviour.
Inheritance doesn't come into it, except as one possible way to get that
behaviour:

class MyInt:  # DON'T inherit from int
    def __init__(self, value):
        if value == 'one': a, b = 0, 1
        elif value == 'two': a, b = 1, 1
        elif value == 'three': a, b = 1, 2
        else:
            raise ValueError("can't count that high")
        self.data = (a, b)
    def __eq__(self, other):
        return other == sum(self.data)

Instances of MyInt have the same value as the ints 1, 2, or 3 as
appropriate. In all other ways though, MyInt and int behave very
differently: for example, you can't add MyInts.
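Tying this back to the hashing question: in Python 3, a class that defines __eq__ without __hash__ becomes unhashable, so a MyInt like the one above could not be used as a dict key at all. The variant below is a sketch that adds a __hash__ matching the int it compares equal to; the _PARTS table is a rearrangement made for brevity and is not part of the original class:

```python
class MyInt:  # as in the post: does not inherit from int
    # Hypothetical lookup table replacing the if/elif chain.
    _PARTS = {'one': (0, 1), 'two': (1, 1), 'three': (1, 2)}

    def __init__(self, value):
        if value not in self._PARTS:
            raise ValueError("can't count that high")
        self.data = self._PARTS[value]

    def __eq__(self, other):
        return other == sum(self.data)

    def __hash__(self):
        # Assumption for this sketch: hash like the int we equal.
        return hash(sum(self.data))


names = {1: "uno", 2: "dos"}
two = MyInt('two')
print(two == 2, hash(two) == hash(2))  # True True
print(names[two])                      # dos
print(isinstance(two, int))            # False
```

With equality and hashing aligned, the dict finds the entry stored under the int 2 even though MyInt is unrelated to int by inheritance.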
 
