Subclassing Hash to enforce value uniqueness ala key uniqueness.

A

Adam Gardner

First of all, hello to everyone. This is my first message to this list.

I find myself needing something very much like a Hash, but with unique
one-to-one mapping in both directions, both key-to-value, and
value-to-key. Now, I've been looking around a bit on Google, then
Rubyforge, the RAA, and the archives of this list, and I'm a bit
surprised I wasn't able to find a prior implimentation of this. Time to
impliment my own, right?

Well, before I got too far into the process, I though I would ask for
people's thoughts here. First off, here is the effect I would like to
see. I'll call my subclass OneToOne for now, though I'm open to better
names.
oto = OneToOne[:a=>1, :b=>2, :c=>3] => {:a=>1, :b=>2, :c=>3} # or on 1.8.x, {:b=>2, :c=>3, :a=>1}
oto[:d] = 1 => 1
oto => {:b=>2,:c=>3,:d=>1}
OneToOne[:a=>1,:b=>1,:c=>1]
=> {:c=>1}
...
etc

Here is some code which begins to implement such a class:

class OneToOne < Hash

def self.[](*args)
super(*args).invert.invert
end

def initialize(*args)
super(*args)
self.fix
end

def []=(key,val)
self.delete_if {|k,v| v == val}
super(key,val)
end

def store(key,val)
self.delete_if {|k,v| v == val}
super(key,val)
end

def key(val) #this is just a nicety to arrange for code to run in 1.9
and 1.8
if Hash.method_defined?:)key)
super(val)
else
self.index(val)
end
end

def fix
self.replace(self.invert.invert)
end
end

This code works to a certain degree, but I see two problems going
forward. First of all, Hash is a built-in in ruby which is written (as
far as I am aware) entirely in C. As such, there is no 'core' method
which can be overridden to affect change in all the methods which build
on it. At least, implementing []= did not give me store, and
implimenting store did not give me []=. Neither gives me merge. How many
other methods will I therefore have to override to ensure value
uniqueness? It seems to me, every method which could modify the contents
of the hash in some way (other than simply removing entries). Even then,
could I guarantee no two keys would have the same value, with the same
level of certainty I have that no two values have the same key?

The second problem is efficiency: It seems to me that this could
probably be done much more efficiently, especially if implemented in C.
It doesn't affect my much for what I need it for right now, but if I
want to reuse it later (or if someone else wants to use it), it could
matter a great deal.

(Incidentally: searching for information about this has been somewhat
hampered by the fact that the phrase 'hash value' is very ambiguous
between the a value contained in a hash and paired with a key, or value
returned by a hashing algorithm such as Object#hash or MD5. It doesn't
help that most of other terms I can think of to search for, such as
bidirectional or unique or one-to-one, are more common when talking
about the latter than the former. Now I understand why several other
languages choose the term 'Dictionary'.)

Anyway, if someone has either a) a prior implementation of this, or b)
some suggestions on improving my implimentation, I'd love to hear it.

Thanks for your time,
- Adam Gardner
 
T

Trans

Anyway, if someone has either a) a prior implementation of this, or b)
some suggestions on improving my implimentation, I'd love to hear it.

You may be able to save yourself some time with a little meta
programming. Eg.

class HashSet < Hash

Hash.instance_methods(false).each do |m|
define_method(m, *a, &b)
r =3D super
fix
r
end
end

def fix
self.replace(self.invert.invert)
end

end

You'll want to improve upon that code of course, probably limit it to
only certain methods. But you get the idea.

T.
 
S

Sebastian Hungerecker

Adam said:
The second problem is efficiency: It seems to me that this could
probably be done much more efficiently, especially if implemented in C.

Every time you add a value you iterate over all the other values to check
whether the value is already there. This makes adding an element O(n). Having
adding to a datastructure be an O(n) operation is usually a bad idea. Here's
how I'd probably do it (untested):

class OneToOne
def initialize()
@key_value = {}
@value_key = {}
end

def [](k)
@key_value[k]
end

def []=(k,v)
if @key_value.has_key?(k)
@value_key.delete(@key_value[k])
end
if @value_key.has_key(?v)
@key_value.delete(@value_key[v])
end
@key_value[k] = v
@value_key[v] = k
end

def to_hash
@key_value
end
def invert
@value_key
end
...
end


HTH,
Sebastian
 
R

Robert Klemme

Every time you add a value you iterate over all the other values to check
whether the value is already there. This makes adding an element O(n). Having
adding to a datastructure be an O(n) operation is usually a bad idea. Here's
how I'd probably do it (untested):

<snip>good example</snip>

Right, my main point would be: why subclass? You inherit a lot of
methods that you need to watch out for. Rather, as Basti showed,
implement a new class with all the necessary methods.

Kind regards

robert


PS: Sorry for the nicknaming. :)
 
J

James Coglan

[Note: parts of this message were removed to make it a legal post.]
Every time you add a value you iterate over all the other values to check
whether the value is already there. This makes adding an element O(n).
Having
adding to a datastructure be an O(n) operation is usually a bad idea.
Here's
how I'd probably do it (untested):

# example (elided)



This seems like a good idea, I'd have suggested the same. This is exactly
what Set in the standard library does to ensure uniqueness, it puts the
members as keys in a hash and lets Ruby sort out the hashtable as
efficiently as possible. So a pair of mutually inverted hash tables is
probably the way to go.
 
S

Sebastian Hungerecker

Sebastian said:
=A0 def to_hash
=A0 =A0 @key_value
=A0 end

This should return @key_value.dup=20
=A0 def invert
=A0 =A0 @value_key
=A0 end

And this should either return @value_key.dup or another instance of OneToOn=
e=20
where the both hashes are switched.

HTH,
Sebastian
=2D-=20
Jabber: (e-mail address removed)
ICQ: 205544826
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,149
Members
46,695
Latest member
StanleyDri

Latest Threads

Top