hashability

James Stroud · Aug 12, 2009

Hello All,

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

py> class C(object): pass
...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences), I came
up with this test for hashability:

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)]))
True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Can anyone think of boundary cases I might be missing with this approach?

James

Carl Banks · Aug 12, 2009

Hello All,

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

py> class C(object): pass
...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

That's arguably the right thing to do.

Personally I've found that being able to use class instances as
hashable objects to be terribly useful (these objects are hashed and
compared by identity, of course), so I don't mind it. But I can
definitely see how this straddles the line between "practicality" and
"face of ambiguity". And if Python didn't do it by default, it would
be little trouble to add the appropriate __eq__ and __hash__ methods.

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences),

Even so, I would consider whether some of your users might, like me,
also find this terribly useful, and if so (probably a few will unless
they are all novices), allow them to disable or disregard this check.

I came
up with this test for hashability:

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

I wouldn't call the function "hashable". Class instances like C() are
hashable whether you approve or not. Something like
"deliberately_hashable" would be a better name.

It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)]))
True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Can anyone think of boundary cases I might be missing with this approach?

It is possible to redefine == operator by defining __ne__ instead of
__eq__, at least on Python 2.5, so you should keep that in mind.

Carl Banks

Asun Friere · Aug 12, 2009

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

Arbitrary, or anonymous objects and some uses or some users? I'm
can't see why anyone would expect different instance of a class to be
equivalent keys.

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated.

Perhaps the best solution would be for the unitiated to correct their
misaprehensions? If you don't understand that you are instantiating a
number of anonymous instances of a class you are missing something
very fundamental.

Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

Then you couldn't to this:

d = {C():1, C():2, C():3}
a,b,c = d.keys()
d[c]

Anonymous instances are a GoodThing(tm) and they can usually be de-
anonymised if need be.

James Stroud · Aug 12, 2009

Carl said:
Even so, I would consider whether some of your users might, like me,
also find this terribly useful, and if so (probably a few will unless
they are all novices), allow them to disable or disregard this check.

I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

However, I also want to avoid isinstance() and/or type checking in order
to allow maximal flexibility so that users don't have to work too hard
to make their keys.

I wouldn't call the function "hashable". Class instances like C() are
hashable whether you approve or not. Something like
"deliberately_hashable" would be a better name.

I chose "keyable".

It is possible to redefine == operator by defining __ne__ instead of
__eq__, at least on Python 2.5, so you should keep that in mind.

Thanks for this hint. I've already put it in.

James

Asun Friere · Aug 12, 2009

I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

What am I missing here? How, in terms of persistence, is {C():4, C():
4} conceptually different from {'spam':4, 'ham':4}?

Chris Rebert · Aug 12, 2009

2009/8/11 Asun Friere said:
I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

Click to expand...

What am I missing here? Â How, in terms of persistence, is {C():4, C():
4} conceptually different from {'spam':4, 'ham':4}?

Thought Experiment:
Consider, for each dict, the case of pickling it twice to 2 separate files.
When loaded from both files into the same program, the spam-ham dicts
will work as expected between each other.
The dicts with C()s will not. For example, assuming the unpickled
dicts are `cs1` and `cs2`, cs1[cs2.keys()[0]] will raise KeyError.

Cheers,
Chris

Asun Friere · Aug 12, 2009

You should be more imaginative.

I'm by no means discounting that there might be some actual problem
you're trying to solve here, but I honestly can't see it.

There really is no need to get personal about this, so rather than
asking for a level of imagination from me, (which I apparently lack),
please just explain to me how {one_instance_of_a_hashable_class : val,
another_instance_of_a_hashable_class :val} is conceptually different
{one_instance_of_class_str: val, another_instance_of_class_str: val},
in terms of persistence.

If I am missing something here, I would actually like to know. If on
the other hand, I'm not, then rather at taking umbrage, you might want
instead to save yourself the effort of solving a non-existent problem?

Can you come give a class to my users?

No.

However, I think it's fairly central to the notion of a class that it
is a template for creating different instances which themselves have a
unique identity. And that subsequent calls to a class' constructor
ought to create unique instances of that class (subject perhaps to
implementation tricks such as interning). If it is not obvious that {C
():4}[C()] invovles subsequent calls to C's constructor, then that
very example is itself didactic.

Chris Rebert · Aug 12, 2009

2009/8/11 Asun Friere said:
I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

Click to expand...

What am I missing here? Â How, in terms of persistence, is {C():4, C():
4} conceptually different from {'spam':4, 'ham':4}?

Consider the case of pickling the data twice to 2 separate files.
When loaded from both files into the same program, the spam-ham dicts
will work as expected.
The C()s will not. For cs1[cs2.keys()[0]] will raise KeyError.

Cheers,
Chris

Chris Rebert · Aug 12, 2009

Apologies for the possible repeated post. Gmail failed to mark the
draft as sent for some reason.

- Chris

Asun Friere · Aug 12, 2009

Thought Experiment:
Consider, for each dict, the case of pickling it twice to 2 separate files.
When loaded from both files into the same program, the spam-ham dicts
will work as expected between each other.
The dicts with C()s will not. For example, assuming the unpickled
dicts are `cs1` and `cs2`, cs1[cs2.keys()[0]] will raise KeyError.

Aha!

cheers.

James Stroud · Aug 12, 2009

Asun said:
I'm by no means discounting that there might be some actual problem
you're trying to solve here, but I honestly can't see it.

There really is no need to get personal about this, so rather than
asking for a level of imagination from me, (which I apparently lack),
please just explain to me how {one_instance_of_a_hashable_class : val,
another_instance_of_a_hashable_class :val} is conceptually different
{one_instance_of_class_str: val, another_instance_of_class_str: val},
in terms of persistence.

Sorry for being a twit. This list used to be quite nice but some people
on this list got pretty abrasive. I couldn't tell you weren't one of
these abrasive people from your post. I stopped giving the benefit of
the doubt many moons ago and promptly enter attack mode at the slightest
hint of abrasiveness. My attitude probably exacerbates the problem--but
it sure does make me feel better.

Anyway, I think the problem has been described better than I'm able, but
once an object goes to the file system, one can not take its hash value
for granted. It is not possible to rely on the ability to recreate any
arbitrary object de-novo and use the recreation as a key in proxy for
the original object. I'm after this "je ne sais pas que l'appeler"
quality of objects to use as keys (or to generate keys) for persistent
dictionaries. Carl Banks suggested that this quality should not be
called "hashable", so I'm calling it "keyable".

Asun Friere · Aug 12, 2009

Sorry for being a twit.

Don't be ridiculous. You haven't been a twit, I have!

I've just had a complete "blonde" moment here (with apologies to any
real blondes out there. My only excuse is that I've been up to 02:30
for the last three nights running (or maybe it's the ageing process, a
cerebrovascular accident or something).

I just checked a class I recently wrote specifically for the purposes
of hashing a dict (in case I had made this error IRL). Wouldn't you
know it, it's subclassed to tuple, and defines both __eq__ and
__cmp__. Luckily when I write production code the guy who knows what
he's doing takes over. And this in an app which compares keys from
different pickled files (representing DB holdings)?! Of all things.

I can't believe how unfathomably stupid I've been. I'm extremely
embarassed. I think I'll just go out and shoot myself now. Or get
some sleep.

Dennis Lee Bieber · Aug 12, 2009

...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

"A" C()... How would you expect

c1 = C()
c2 = C()

{c1:4}[c2]

to behave? That IS the equivalent of your statement -- two instances are
two distinctly different entities...

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences), I came
up with this test for hashability:

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)]))
True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Can anyone think of boundary cases I might be missing with this approach?

James

Asun Friere · Aug 12, 2009

c1 = C()
c2 = C()

{c1:4}[c2]

to behave? That IS the equivalent of your statement -- two instances are
two distinctly different entities...

Thankyou, that is EXACTLY the mistake I made that sent me off into
lunacy.

At the risk of further embarassment, the answer is that one would
expect it to behave analogously to:

c1 = int(x)
c2 = int(x)

{c1:4}[c2]

Or is my brain still on vacation?

James Stroud · Aug 12, 2009

Dennis said:
...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

Click to expand...

"A" C()... How would you expect

c1 = C()
c2 = C()

{c1:4}[c2]

to behave?

This seems like a subjective question. I think I demonstrated how it
*does* behave, which is purely objective--and I know I can't do anything
about that. But the subjective answer to the subjective question is that
I would like an exception to be raised when the dictionary is
constructed. I know an exception doesn't get thrown in the current
incarnation of the python language. That is the objective reality of the
situation which affronts my subjective notions of how reality should be.

That IS the equivalent of your statement -- two instances are

two distinctly different entities...

Tell that to two different machines on two different days. Then bet the
life of yourself and your nearest and dearest family on that fact and
see whether you really want to take a hash value for granted. If a
property of the python language fails the "bet the lives of your nearest
and dearest on a consistent result" test, I call it "ill defined" and,
subjectively speaking, I prefer exceptions to be thrown--And, by damned,
I'll throw them myself if I have to.

"If it saves one life, it's worth it all."

James

Steven D'Aprano · Aug 12, 2009

Hello All,

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

No it doesn't.

hash([])

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable

Python successfully hashes *non*-arbitrary objects, including those that
inherit from object.

py> class C(object): pass
...
py> {C():4}[C()]

Why would you expect that to succeed? The object you are using as the key
is a different instance than the object you are looking up. Since your
instances don't even compare equal, why would you expect them to hash
equal?

False

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated.

Yes, well, the ignorant and uninitiated frequently make mistakes. I feel
your pain.

Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

It's not ill-defined, it's a perfectly reasonable design you just don't
happen to like. It's quite useful for some.

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences), I came
up with this test for hashability:

That's not a test for hashability. That's a test for hashability and
equality testing together.

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

The test for __hash__ is redundant, given that hash() has already
succeeded.

It will wrongly reject classes that deliberately don't define equality,
for whatever reason, and that *expect* the behaviour you are treating as
an error.

It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)])) True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Well there you go -- why on earth would you prohibit None as a dictionary
key??? That's a serious failure.

Steven D'Aprano · Aug 12, 2009

Tell that to two different machines on two different days. Then bet the
life of yourself and your nearest and dearest family on that fact and
see whether you really want to take a hash value for granted.

As far as I know, Python doesn't make any guarantees about hashes being
repeatable on different machines, different versions, or even different
runs of the interpreter.

If a
property of the python language fails the "bet the lives of your nearest
and dearest on a consistent result" test, I call it "ill defined" and,
subjectively speaking, I prefer exceptions to be thrown--And, by damned,
I'll throw them myself if I have to.

"If it saves one life, it's worth it all."

Depends on the life, and the cost. Would you pay a million dollars from
your own pocket to save the life of a 119 year old with advanced lung
cancer, a bad heart, and a raging infection of Ebola, from choking on a
fish bone?

What if the effort of saving that one life kills two lives? Opportunity
costs are costs too.

James Stroud · Aug 12, 2009

Steven said:
Well there you go -- why on earth would you prohibit None as a dictionary
key??? That's a serious failure.

roentgen 1% python
Python 2.5 (r25:51908, Sep 20 2006, 17:36:21)
[GCC 3.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
135543872

mbi136-176 98% /usr/bin/python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
2030240

That's why. Read the whole thread. You are one of the abrasive ones.

Chris Rebert · Aug 12, 2009

Steven said:
Steven said:

Well there you go -- why on earth would you prohibit None as a dictionary
key??? That's a serious failure.

Click to expand...

roentgen 1% python
Python 2.5 (r25:51908, Sep 20 2006, 17:36:21) [GCC 3.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
135543872

mbi136-176 98% /usr/bin/python
Python 2.5.1 (r251:54863, Feb Â 6 2009, 19:02:12) [GCC 4.0.1 (Apple Inc.
build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
2030240

Actually, None is a special-case as a built-in singleton value --
there's only ever *exactly one* instance of it in a given interpreter
session. And I'm reasonably sure dict pickles don't store the hash
code of items (the dict gets recreated from scratch), so there's no
problem.

Cheers,
Chris

Steven D'Aprano · Aug 12, 2009

Steven said:
Steven said:

Well there you go -- why on earth would you prohibit None as a
dictionary key??? That's a serious failure.

Click to expand...

roentgen 1% python
Python 2.5 (r25:51908, Sep 20 2006, 17:36:21) [GCC 3.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
135543872

mbi136-176 98% /usr/bin/python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12) [GCC 4.0.1 (Apple Inc.
build 5465)] on darwin Type "help", "copyright", "credits" or "license"
for more information. py> hash(None)
2030240

That's why. Read the whole thread. You are one of the abrasive ones.

I've read the whole thread. Pay close attention:

[steve@ando ~]$ python
Python 2.4.3 (#1, Mar 14 2007, 18:51:08)
[GCC 4.1.1 20070105 (Red Hat 4.1.1-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.[steve@ando ~]$ ssh sylar
steve@sylar's password:
Last login: Wed Aug 12 21:44:47 2009
[steve@sylar ~]$ python2.6
Python 2.6.1 (r261:67515, Dec 24 2008, 00:33:13)
[GCC 4.1.2 20070502 (Red Hat 4.1.2-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.{None: 42}

I have successfully pickled a dict using None as a key on one machine
using Python 2.4, and unpickled it on a completely different machine
running Python 2.6.

Still think that pickling None is a problem?

Unexpected comparisons in dict lookup	1	Mar 18, 2014
Python Unicode handling wins again -- mostly	67	Nov 30, 2013
Attack a sacred Python Cow	273	Jul 24, 2008
Retrieving an object from a set	1	Jan 25, 2013
generating unique set of dicts from a list of dicts	0	Jan 10, 2012
GET NEIL DEGRASSES TYSON, I ripped a hole with this one...	0	Nov 10, 2022
School Project	1	Dec 8, 2022
[ANN] map-1.5.0	3	Oct 29, 2010

hashability

James Stroud

Carl Banks

Asun Friere

James Stroud

Asun Friere

Chris Rebert

Asun Friere

Chris Rebert

Chris Rebert

Asun Friere

James Stroud

Asun Friere

Dennis Lee Bieber

Asun Friere

James Stroud

Steven D'Aprano

Steven D'Aprano

James Stroud

Chris Rebert

Steven D'Aprano

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads