Modules are hashable?!

L

Leif K-Brooks

I was just playing around, and noticed that modules seem to be hashable.
Can anyone explain that, especially given the fact that they're mutable?

Python 2.3.3 (#1, May 7 2004, 10:31:40)
[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> hash(sys) -150589324
>>> sys.x = 42
>>> hash(sys) -150589324
>>> foo = {sys:'bar'}
>>> foo[sys]
'bar'
 
J

Jason Lai

Leif said:
I was just playing around, and noticed that modules seem to be hashable.
Can anyone explain that, especially given the fact that they're mutable?

Most objects in Python are hashable, and also mutable. If you define a
class Foo and create x = Foo(), x will be hashable. But you can also do
x.stuff = 3. So modules appear to be pretty much regular objects, which
are usually hashed by ID -- which will be unique for any two objects.
Try hash(sys) == id(sys) to see for yourself.

Lists, tuples, dicts, strings, and a few other things are odd in that
they're hashed/compared by their contents rather than ID. So you can
have two different objects that are equivalent with regards to hashing
or comparing.

Due to the way hashing works, the hash value is not allowed to change
while it's in a dict. IDs never change, so regular objects have an
unchanging hash value even if you change their contents. For the special
objects, changing the contents would change the hash value, and that's
not allowed.

I think that's how it works, anyway :p

- Jason Lai
 
A

Alex Martelli

Leif K-Brooks said:
I was just playing around, and noticed that modules seem to be hashable.
Can anyone explain that, especially given the fact that they're mutable?

Any object x is hashable if type(x) does not expose __eq__ nor __cmp__.
In that case, the meaning of x==y for that object is 'x is y', that is,
id(x)==id(y), so having hash(x) return id(x) is perfectly functional.
Mutation is not a problem if it doesn't affect equality comparisons.


Alex
 
A

Alex Martelli

Jason Lai said:
Lists, tuples, dicts, strings, and a few other things are odd in that
they're hashed/compared by their contents rather than ID. So you can
have two different objects that are equivalent with regards to hashing
or comparing.

Two _distinct_ objects that are not different, actually (this also
applies to numbers -- their values get compared, not their identities).
I think you have the right ideas, just trying to perfect your choice of
words here -- distinct vs different. I think the best usage is:

"A is different from B" -> "A != B"
"A is distinct from B" -> "A is not B"
I think that's how it works, anyway :p

Pretty much, yes.


Alex
 
M

Maurice LING

Alex said:
Any object x is hashable if type(x) does not expose __eq__ nor __cmp__.
In that case, the meaning of x==y for that object is 'x is y', that is,
id(x)==id(y), so having hash(x) return id(x) is perfectly functional.
Mutation is not a problem if it doesn't affect equality comparisons.


Alex

The idea that I get from reading this thread is that objects that can be
type compared (comparing the contents) are not hashable, and they are
list, strings, tuples and dictionary. Is there any others that fall into
this category? Is there any way to make them hashable?

Hashable objects, on the other hand, are hashed based on say, the
pointer address pointing the object or an identifier in the Python VM
symbol table or something. It's like to say that when you hash a human,
you get the name and not the physical dimensions of the person. The
person can grow fat or slim down, but the name is still the same.

Maurice
 
J

Jason Lai

Maurice said:
The idea that I get from reading this thread is that objects that can be
type compared (comparing the contents) are not hashable, and they are
list, strings, tuples and dictionary. Is there any others that fall into
this category? Is there any way to make them hashable?

Well, strings and tuples are immutable, so they provide a hash function
(since it's safe to hash them by contents; the contents are pointers
that don't change.) Anything that provides a hash function can be
hashed. You could theoretically create a new list type that works
exactly like a normal list, but hashes based on ID.
Hashable objects, on the other hand, are hashed based on say, the
pointer address pointing the object or an identifier in the Python VM
symbol table or something. It's like to say that when you hash a human,
you get the name and not the physical dimensions of the person. The
person can grow fat or slim down, but the name is still the same.

Maurice

Heh, like giving every person a unique govt-assigned ID number. You
could have two people who are exactly alike -- cloned atom-by-atom, or
something -- but they wouldn't have the same ID. And the term "hashing"
originates from chopping things up, so hashing humans wouldn't be a good
idea :p

- Jason Lai
 
S

Sam Holden

The idea that I get from reading this thread is that objects that can be
type compared (comparing the contents) are not hashable, and they are
list, strings, tuples and dictionary. Is there any others that fall into
this category? Is there any way to make them hashable?

Define the __hash__ method for the class.

Strings for example, are hashable even though they comparisons are done
with respect to the contents and not the id. Tuples are also:

; python
Python 2.3.4 (#2, Aug 18 2004, 13:18:19)
[GCC 3.3.4 (Debian 1:3.3.4-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.;
 
M

Maurice LING

is there actually a practical reason to hash modules? can I call a
module using the hash key?

maurice
 
A

Alex Martelli

Maurice LING said:
The idea that I get from reading this thread is that objects that can be
type compared (comparing the contents) are not hashable, and they are

Never said that! I said the reverse: if objects are compared by id
they're also hashable (in the same way). "All cats are mammals" does
not imply "all mammals are cats". Objects can perfectly well be
hashable, AND compared by contents at the same time -- that's where
immutability (of those contents which affect comparison and thus
hashing) is a practical necessity.

"Practical", not theoretical: "def __hash__(self): return 23" will in
fact ensure correct semantics. Unfortunately, it will do so at the
expense of intolerable performance hits any time a number of objects of
this type are used as dictionary keys... if you know anything about
hashing you can see why this can't fail to be so.
list, strings, tuples and dictionary. Is there any others that fall into
this category? Is there any way to make them hashable?

strings are hashable, and so are tuples if all their items are hashable.
That is because they define proper __hash__ methods (or rather the C API
equivalent, of course) that cooperate properly with their __cmp__
methods, basically ensuring that x==y implies hash(x)==hash(y). But
that's hard to ensure, together with the fact that hash(x) must always
give the same result for a given x, for general mutable containers such
as lists and dicts.

Hashable objects, on the other hand, are hashed based on say, the
pointer address pointing the object or an identifier in the Python VM
symbol table or something. It's like to say that when you hash a human,
you get the name and not the physical dimensions of the person. The
person can grow fat or slim down, but the name is still the same.

As long as your equality-comparison only relies on immutable
characteristics you're fine. Unfortunately in most countries it IS
legal to change name at least in some circumstances (e.g. it used to
happen almost automatically to a woman when she married, in many
countries, some time ago...) so this wouldn't work in this case;-).


Alex
 
A

Alex Martelli

Jason Lai said:
Well, strings and tuples are immutable, so they provide a hash function
(since it's safe to hash them by contents; the contents are pointers
that don't change.) Anything that provides a hash function can be

Yes, a tuple's "pointers" don't change, but they may refer to objects
that do, and in this case the tuple need not be hashable. E.g.:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: dict objects are unhashable

Since the tuple has (as its only item) a dict, the tuple itself isn't
hashable -- the error message indicates the type of the item that fails
to be hashable.
hashed. You could theoretically create a new list type that works
exactly like a normal list, but hashes based on ID.

No, if a==b and hash(a)!=hash(b) all hell would break loose. Don't go
there.


Alex
 
A

Alex Martelli

Maurice LING said:
is there actually a practical reason to hash modules? can I call a
module using the hash key?

You cannot call a module: a module does not have a __call__ method.
This has nothing to do with hashing.

A practical reason to hash modules would be to associate to each module
some value or group of values -- without sticking those values in the
module itself -- or keep track of a set of modules having some special
characteristic whereby you want to "logically group them together" as
the set of keys into a certain dict (or in 2.4 the elements of a certain
set, since set is now a builtin type).

Consider for example a module that starts:

import foo, fee, fie, fo, fum, bar, baz, bat
yet_untested_modules = dict.fromkeys([ fee, bar, baz ])

later on you might have code like

if somemodule in yet_untested_modules:
somemodule.perform_all_tests()
del yet_untested_modules[somemodule]

for "just in time testing" of a certain set of modules. OK, weird-ish,
but it's what I could come up in 20 seconds;-). To have modules as dict
keys or set members they must be hashable, and since there's no reason
to make them unhashable, why not?


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,204
Messages
2,571,065
Members
47,672
Latest member
svaraho

Latest Threads

Top