You answered it yourself later. For a mapping service, hash is just one
way to do things. What you need is for each item in the collection, a
unique key.
And does the mapping get access to that unique key (the key's key)? It
can't keep a mapping of object:key, because if it could do that, it
wouldn't need the key, it could just keep object
ayload.
Since the mapping can't know the key's key, it has to ask the key, and
that's what the __hash__ method does.
How you go from the key to the value is not something a programmer needs
to know.
You are correct. All you need to know is that in Python, you can't use
lists and sets as keys directly.
You only need to know about the details of the way dicts look up keys if
you are writing your own class, and you aren't happy with Python's
default hash for instance objects. It isn't a common task.
Your mind is set on thinking on hash alone and hence you don't see
beyond it.
No, these issues exist regardless of the implementation of the mapping.
Whether you use a hash table or a binary tree of some sort, or a linear
linked list, or a database, or folders in a filing cabinet.
The behaviour of mutable objects in a mapping is always problematic,
regardless of the mapping implementation.
Oh yes. If the keys are all integers (or any set of items that can be
ordered), why not an avl. It has guaranteed O(log N) while a hash in
worst case is O(N).
But both the best and average cases are O(1), which beats AVL trees by a
lot. Since dicts are used for Python's internals, they are HIGHLY
optimized and VERY fast. Their O(1) will beat the O(log N) of AVL trees
easily.
Hash tables can also use keys even if they can't be ordered: {1+2j: None}
Why you want to tie yourself to the drawbacks of one
datastructure? Understand your goal is not to provide a hash; but to
provide a mapping service.
No, the goal of a good language designer is to provide the fastest, most
lightweight, simplest, most flexible mapping as a built-in. Any old slow,
heavyweight, complicated, inflexible mapping will not do the job. That
goal is best provided by a hash table.
If people want additional mappings as well, they can be added as
libraries. If those libraries are very useful, they can be added to the
standard library. If they are HUGELY useful, they will become built-ins.
AVL trees are not as flexible or fast as hash tables, and even if they
were, you would *still* need to resolve the difficulties that occur if
the keys mutate.
Yes, if you keep thinking hash is the only tool you got.
Fine, let me re-word it in terms of an AVL.
Suppose you have two lists in an AVL:
L1 = [1, 2, 3]
L2 = [4, 5, 6]
M = avl((L1, True), (L2, False))
The tree M (for mapping) has L1 at the top of the tree, and L2 as the
right node.
But now, EVERY time ANY mutable object changes, Python has to check
whether it is a key in EVERY avl, and if so, re-built the tree. Otherwise
the tree can become corrupt because the AVL invariants are no longer true.
(Consider what happens if we say L1[0] = 999. Now L1 > L2. If you don't
reorder the avl, M[L2] cannot be reached except by an exhaustive search
of every node. That means it is no longer an AVL tree, just an inordered
tree. Might as well save memory and use a linear linked list.)
Needless to say, this will slow down Python just a tad.
Look, you can go back to the archives of 1993 when this was first
discussed. Nothing has changed since then. Mutable keys are still
problematic, regardless of the implementation, and the simplest solution
is to simply prohibit them.
http://www.python.org/search/hypermail/python-1993/0044.html
If you want to use a mutable object as a key, by object identity rather
than value, then the easy way is to wrap it in an instance:
class Wrapper: # don't even need new-style classes
pass # or even an __init__
key = Wrapper()
key.payload = [1, 2, 3]
D = {key: "value"}
But of course you can't look up the dict by value, only by identity. But
that's what you wanted.
Another way is to use this class:
class HashableList(list):
def __hash__(self):
return hash(tuple(self))
Have fun, and remember us when you're debugging your code, because you'll
be doing a lot of it.