Case-insensitive dict, non-destructive, fast, anyone?


Ville Vainio

I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).

>>> d = CiDict([('Hi', 12), ('hoho', 13)])
>>> d['hi']
12
>>> d.keys()
['Hi', 'hoho']

Note that 'Hi' preserved the case. I imagine that 'Hi' and 'hi' would
need to share the same hash value in order for the lookup to be fast.

Anyone have an implementation that I could use? Quick googling only
produced implementations that coerce all keys to lowercase.
 

Daniel Dittmar

Ville said:
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).

Store the original key together with the value and use a lowercase key
for lookup.

only a sketch:

class MyDict:
    def __init__(self):
        # lowercased key -> (original key, value)
        self.inner = {}

    def __setitem__(self, key, value):
        self.inner[key.lower()] = (key, value)

    def __getitem__(self, key):
        realkey, realvalue = self.inner[key.lower()]
        return realvalue

    def get(self, key, default=None):
        # or: return self.inner.get(key.lower(), (None, default))[1]
        try:
            return self[key]
        except KeyError:
            return default

    def keys(self):
        # keys with their original case
        return [realkey for realkey, realval in self.inner.values()]

    def values(self):
        return [realval for realkey, realval in self.inner.values()]

    def items(self):
        # the stored (original key, value) pairs
        return list(self.inner.values())

# iterators are left as an exercise
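One way to do the "exercise" in modern Python 3 (an assumption: `collections.abc` did not exist when this thread was written) is to subclass `collections.abc.MutableMapping`, which derives `keys()`, `items()`, `get()`, `setdefault()`, `update()`, `__contains__` and iteration support from five core methods. The storage scheme is Daniel's (lowercased key mapped to an (original key, value) tuple); the class name `CiDict` is borrowed from Ville's example, and the rest is an illustrative sketch.

```python
from collections.abc import MutableMapping


class CiDict(MutableMapping):
    """Case-insensitive mapping that remembers the case of the last key set.

    Internally stores key.lower() -> (original_key, value), as in Daniel's sketch.
    """

    def __init__(self, items=()):
        self._inner = {}
        # update() comes from MutableMapping and calls __setitem__ per pair.
        self.update(items)

    def __setitem__(self, key, value):
        self._inner[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._inner[key.lower()][1]

    def __delitem__(self, key):
        del self._inner[key.lower()]

    def __iter__(self):
        # Yield keys with their original case.
        for realkey, _ in self._inner.values():
            yield realkey

    def __len__(self):
        return len(self._inner)

    def __repr__(self):
        return f"{type(self).__name__}({dict(self.items())!r})"


d = CiDict([('Hi', 12), ('hoho', 13)])
print(d['hi'])      # 12
print(list(d))      # ['Hi', 'hoho']
print('HOHO' in d)  # True
```

Because the original key is stored next to the value, assigning `d['HI'] = 1` afterwards replaces the remembered spelling with 'HI'.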
 

Ville Vainio

Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in the dict, and I didn't feel like doubling the size.

It would be the "simplest thing that could possibly work", though.
 

Daniel Dittmar

Ville said:
Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in the dict, and I didn't feel like doubling the size.

You could write a string wrapper that changes comparison and hashing.
I'm not sure that this (one extra Python object plus its instance
dictionary per key) would use less memory than the other proposal (one
extra tuple plus one extra string per key).
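In current Python the instance-dictionary cost Daniel mentions can be avoided: a `str` subclass that declares `__slots__ = ()` has no per-instance `__dict__`, so the wrapper is just the string itself with different hashing and comparison. A minimal sketch; the name `CIStr` is hypothetical. `__ne__` is defined as well because otherwise the inherited `str.__ne__` would still compare case-sensitively. Actual byte counts depend on the Python version and build, so none are shown.

```python
class CIStr(str):
    """A str that hashes and compares case-insensitively but keeps its case."""

    # Empty __slots__: instances get no per-instance __dict__.
    __slots__ = ()

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, str):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        # Without this, str.__ne__ would be inherited and stay case-sensitive.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


d = {CIStr('Hi'): 12}
print(d[CIStr('HI')])                   # 12
print(list(d))                          # ['Hi']
print(hasattr(CIStr('x'), '__dict__'))  # False
```

Lookups still need a wrapped probe key: a plain `'HI'` hashes differently from `'hi'` and will not find the entry.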

Daniel
 

Ron_Adam

Daniel> Store the original key together with the value and use a
Daniel> lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in the dict, and I didn't feel like doubling the size.

It would be the "simplest thing that could possibly work", though.

Try accessing the keys indirectly through another dictionary that maps
each lowercased key to the original key. That way you don't have to
change the original.

Lkeys = {}
for k in d.keys():
    Lkeys[k.lower()] = k

Then use:

value = d[Lkeys[key.lower()]]

to get your value from the original dictionary d.

Watch out for duplicate keys in differing case in the original dict.
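Ron's warning can be made explicit when the lookup table is built. A minimal sketch (the function name `build_lower_index` is hypothetical) that raises instead of silently keeping whichever of two case-variant keys came last:

```python
def build_lower_index(d):
    """Map key.lower() -> original key for every key in d.

    Raises ValueError if two keys differ only in case, since the index
    could then point at only one of them.
    """
    index = {}
    for k in d:
        lk = k.lower()
        if lk in index:
            raise ValueError(f"keys {index[lk]!r} and {k!r} differ only in case")
        index[lk] = k
    return index


original = {'Hi': 12, 'hoho': 13}
index = build_lower_index(original)
print(original[index['HI'.lower()]])  # 12
```

The index is a snapshot: it has to be rebuilt (or updated alongside) whenever keys are added to or removed from the original dict.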


Ron
 

Raymond Hettinger

[Ville Vainio]
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).

>>> class S(str):
...     def __hash__(self):
...         return hash(self.lower())
...     def __eq__(self, other):
...         return self.lower() == other.lower()
...
>>> d = {}
>>> d[S('ThE')] = 'quick'
>>> d[S('the')]
'quick'
>>> d
{'ThE': 'quick'}


Raymond Hettinger
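Raymond's `S` works as long as every lookup key is wrapped too. The Python 3 sketch below copies `S` unchanged and probes its edges: a plain lowercase string happens to hit because it hashes like the stored key, an uppercase plain string misses, `!=` is still inherited from `str` and so stays case-sensitive, and comparing with a non-string raises because `int` has no `.lower()`. These gaps are what the wrapping of every key in Bengt's follow-up addresses.

```python
class S(str):
    # Raymond's wrapper, copied unchanged.
    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        return self.lower() == other.lower()


d = {S('ThE'): 'quick'}

print(d[S('THE')])    # quick  (probe key wrapped too)
print('the' in d)     # True   (hash('the') equals the stored hash)
print('THE' in d)     # False  (plain 'THE' hashes differently)
print(S('a') != 'A')  # True   (__ne__ still comes from str)

try:
    S('a') == 1
except AttributeError as exc:
    print('AttributeError:', exc)
```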
 

Bengt Richter

[Ville Vainio]
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).

>>> class S(str):
...     def __hash__(self):
...         return hash(self.lower())
...     def __eq__(self, other):
...         return self.lower() == other.lower()
...
>>> d = {}
>>> d[S('ThE')] = 'quick'
>>> d[S('the')]
'quick'
>>> d
{'ThE': 'quick'}

Building on your S to sneak in a generalized idea ;-)

>>> class idict(dict):
...     def __setitem__(self, key, value):
...         dict.__setitem__(self, S(key), value)
...     def __getitem__(self, key):
...         return dict.__getitem__(self, S(key))
...
>>> d = idict()
>>> d['ThE'] = 'quick'
>>> d['the']
'quick'
>>> d
{'ThE': 'quick'}

Ok, the idea:
I wonder if a dict with a general override hook for hashing all keys would be useful.
E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__()
but that you could override. Seems like this could potentially be more efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)
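To see how many methods a wrapper-based idict has to chase: with only `__setitem__` and `__getitem__` overridden, `d.get('THE')` and `'THE' in d` go straight to dict's own lookup with an unwrapped key and miss. A Python 3 sketch of a fuller version; this `S` is a variant of Raymond's (with `__slots__`, a `NotImplemented` fallback and a matching `__ne__`), and the helper `_wrap` is hypothetical. It is still not exhaustive: `copy()`, `fromkeys()` and the `|` operator are not overridden, and `copy()` returns a plain dict.

```python
class S(str):
    """Variant of Raymond's S: case-insensitive hash/eq, original case kept."""
    __slots__ = ()

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, str):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


def _wrap(key):
    # Hypothetical helper: wrap plain strings, leave other keys untouched.
    if isinstance(key, str) and not isinstance(key, S):
        return S(key)
    return key


class idict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(_wrap(key), value)

    def __getitem__(self, key):
        return super().__getitem__(_wrap(key))

    def __delitem__(self, key):
        super().__delitem__(_wrap(key))

    def __contains__(self, key):
        return super().__contains__(_wrap(key))

    def get(self, key, default=None):
        return super().get(_wrap(key), default)

    def setdefault(self, key, default=None):
        return super().setdefault(_wrap(key), default)

    def pop(self, key, *default):
        return super().pop(_wrap(key), *default)

    def update(self, *args, **kwargs):
        # dict.update() does not call __setitem__, so route every pair through it.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


d = idict([('ThE', 'quick')])
print(d['the'], 'THE' in d, d.get('tHe'))  # quick True quick
```

Unlike Daniel's tuple approach, a later `d['THE'] = ...` keeps the stored spelling 'ThE': when an equal key already exists, dict replaces the value and keeps the existing key object.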

Regards,
Bengt Richter
 

Raymond Hettinger

[Bengt Richter]
I wonder if a dict with a general override hook for hashing all keys would be useful.
E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__()
but that you could override. Seems like this could potentially be more efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)

There has also been a discussion of adding a seqdict that maintains its keys in
insertion order. Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building). Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas. Essentially, the thought is to put all the bells and
whistles in one place. If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative of DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1 # case insensitive dictionary
assert t['Abc'] == 1

t = TrickyDict()
t.setdefaultvalue(0)
t['x'] += 1 # very bag like
assert t['x'] == 1

t = TrickyDict()
t.setdefaultfunction(list)
t['x'] += ['first'] # rather like: t.setdefault('x', []).append('first')
t['x'] += ['second']
assert t == {'x': ['first', 'second']}

t = TrickyDict()
t.sortedkeys = True
t['x'] = 1
t['y'] = 2
assert t.keys() == ['x', 'y'] # order is guaranteed
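For readers on later Pythons: two of these TrickyDict behaviors did reach the standard library, though not as one type. `collections.defaultdict` (added in Python 2.5) covers the default-value and default-function cases by taking a zero-argument factory, and insertion order is provided by `collections.OrderedDict` (2.7) and, since Python 3.7, guaranteed for plain `dict`. The keytransform idea has no stdlib counterpart. A sketch mirroring the asserts in Raymond's examples:

```python
from collections import defaultdict

# Like setdefaultvalue(0): missing keys start at int() == 0.
t = defaultdict(int)
t['x'] += 1
assert t['x'] == 1

# Like setdefaultfunction(list): missing keys start as a fresh empty list.
t = defaultdict(list)
t['x'] += ['first']
t['x'] += ['second']
assert t == {'x': ['first', 'second']}

# Insertion order: guaranteed for plain dict since Python 3.7.
t = {}
t['x'] = 1
t['y'] = 2
assert list(t.keys()) == ['x', 'y']
```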

This universal dictionary type could also incorporate some methods for other
recurrent themes such as a inverting a dictionary:

def invdict(self):
    return dict((v, k) for k, v in self.iteritems())
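A caveat on inverting a dictionary this way: the values must be hashable, and when two keys share a value only the last one survives. A Python 3 sketch of a variant that keeps every key by collecting them in lists; the name `invert_multi` is hypothetical.

```python
from collections import defaultdict


def invert_multi(d):
    """Return {value: [keys...]} so that duplicate values lose nothing."""
    inverse = defaultdict(list)
    for k, v in d.items():
        inverse[v].append(k)
    return dict(inverse)


d = {'a': 1, 'b': 1, 'c': 2}
print(dict((v, k) for k, v in d.items()))  # {1: 'b', 2: 'c'}  ('a' is lost)
print(invert_multi(d))                     # {1: ['a', 'b'], 2: ['c']}
```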

For the performance minded, there could also be a control for dictionary
speed/space efficiency:

t = TrickyDict()
t.maxkeydensity = 40 # resize whenever the dictionary is more than 40% full

Taken together, these six attributes/methods could cover many wished-for
features in the 10% of cases where a regular dictionary doesn't provide the
best solution.


Raymond
 

Bengt Richter

[Bengt Richter]
I wonder if a dict with a general override hook for hashing all keys would be useful.
E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__()
but that you could override. Seems like this could potentially be more efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)

There has also been a discussion of adding a seqdict that maintains its keys in
insertion order. Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building). Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas. Essentially, the thought is to put all the bells and
whistles in one place. If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative of DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1 # case insensitive dictionary
assert t['Abc'] == 1
assert t.keys() == ['abC'] # actual keys unchanged (but transformed for hash and cmp)
[...]
Taken together, these six attributes/methods could cover many wished-for
features in the 10% of cases where a regular dictionary doesn't provide the
best solution.
You think as much as 10%?

Regards,
Bengt Richter
 

Raymond Hettinger

Taken together, these six attributes/methods could cover many wished for
You think as much as 10%?

Rounded up from 9.6 ;-)

More important than the percentage is the clarity of the resulting code and the
avoidance of continuous reinvention of workarounds.

Separating tool features into a basic and an advanced version is a common solution
to managing option/customization complexity.


Raymond
 

Terry Reedy

Raymond Hettinger said:
More important than the percentage is the clarity of the resulting code and the
avoidance of continuous reinvention of workarounds.

Separating tool features into a basic and an advanced version is a common solution
to managing option/customization complexity.

A super bells & whistles dict would certainly pre-answer a lot of queries
and save much newsgroup/list response time.

tjr
 

Ville Vainio

Bengt> I wonder if a dict with a general override hook for hashing
Bengt> all keys would be useful. E.g., a dict.__keyhash__ that
Bengt> would take key as arg and default as now returning
Bengt> key.__hash__() but that you could override. Seems like this

There would need to be an override hook for key comparison as well (I
suppose it always uses the == operation now?). But yes, I think it would
be *much* more useful than any 'keytransform' feature - is there any
use for 'keytransform' feature apart from the uses that would be
better covered by hash/comparison hooks?

It would be lovely to have something like this in the stdlib (or
anywhere, for that matter). Think about the use cases for hashing via
os.path.normcase, str.lower...
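Separate hash and comparison hooks, as Ville asks for, can be prototyped in pure Python by wrapping keys internally. A minimal Python 3 sketch; `HookDict`, `keyhash`, `keyeq` and `_HookedKey` are hypothetical names, not a stdlib API. The hooks must agree: whenever `keyeq(a, b)` is true, `keyhash(a) == keyhash(b)` must hold, or lookups silently miss. A function such as `os.path.normcase` could be plugged into both hooks the same way; the example uses `str.casefold` so it behaves identically on every platform.

```python
from collections.abc import MutableMapping


class _HookedKey:
    """Wraps a key so that dict hashing and equality go through the hooks."""
    __slots__ = ('key', 'owner')

    def __init__(self, key, owner):
        self.key = key
        self.owner = owner

    def __hash__(self):
        return self.owner.keyhash(self.key)

    def __eq__(self, other):
        if not isinstance(other, _HookedKey):
            return NotImplemented
        return self.owner.keyeq(self.key, other.key)


class HookDict(MutableMapping):
    def __init__(self, keyhash, keyeq, items=()):
        self.keyhash = keyhash
        self.keyeq = keyeq
        self._inner = {}  # _HookedKey -> value
        self.update(items)

    def _k(self, key):
        return _HookedKey(key, self)

    def __getitem__(self, key):
        return self._inner[self._k(key)]

    def __setitem__(self, key, value):
        self._inner[self._k(key)] = value

    def __delitem__(self, key):
        del self._inner[self._k(key)]

    def __iter__(self):
        # Yield the keys as originally inserted.
        return (hk.key for hk in self._inner)

    def __len__(self):
        return len(self._inner)


d = HookDict(keyhash=lambda k: hash(k.casefold()),
             keyeq=lambda a, b: a.casefold() == b.casefold(),
             items=[('Hi', 12)])
print(d['HI'], list(d))  # 12 ['Hi']
```

Each wrapper carries a reference to its owning mapping, which costs memory per key; a C-level hook in dict itself, as Bengt suggests, would avoid that.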
 
