Whats so different about a Hash?

A

Andrew Walrond

Consider:

irb(main):001:0> a=1 => 1
irb(main):002:0> b=1 => 1
irb(main):003:0> c={} => {}
irb(main):004:0> c[a]=1 => 1
irb(main):005:0> c=1 => 1
irb(main):006:0> c.inspect => "{1=>1}"

and

irb(main):007:0> a="K" => "K"
irb(main):008:0> b="K" => "K"
irb(main):009:0> c={} => {}
irb(main):010:0> c[a]=1 => 1
irb(main):011:0> c=1 => 1
irb(main):012:0> c.inspect => "{\"K\"=>1}"

and

irb(main):013:0> a=[1,2] => [1, 2]
irb(main):014:0> b=[1,2] => [1, 2]
irb(main):015:0> c={}=> {}
irb(main):016:0> c[a]=1 => 1
irb(main):017:0> c=1 => 1
irb(main):018:0> c.inspect => "{[1, 2]=>1}"

But the hash behaves differently:

irb(main):019:0> a={1=>2} => {1=>2}
irb(main):020:0> b={1=>2} => {1=>2}
irb(main):021:0> c={} => {}
irb(main):022:0> c[a]=1 => 1
irb(main):023:0> c=1 => 1
irb(main):024:0> c.inspect => "{{1=>2}=>1, {1=>2}=>1}"

Why?

I'm going for another coffee ;)

Andrew Walrond
 
B

Brian Schröder

Consider:

irb(main):001:0> a=1 => 1
irb(main):002:0> b=1 => 1
irb(main):003:0> c={} => {}
irb(main):004:0> c[a]=1 => 1
irb(main):005:0> c=1 => 1
irb(main):006:0> c.inspect => "{1=>1}"

and

irb(main):007:0> a="K" => "K"
irb(main):008:0> b="K" => "K"
irb(main):009:0> c={} => {}
irb(main):010:0> c[a]=1 => 1
irb(main):011:0> c=1 => 1
irb(main):012:0> c.inspect => "{\"K\"=>1}"

and

irb(main):013:0> a=[1,2] => [1, 2]
irb(main):014:0> b=[1,2] => [1, 2]
irb(main):015:0> c={}=> {}
irb(main):016:0> c[a]=1 => 1
irb(main):017:0> c=1 => 1
irb(main):018:0> c.inspect => "{[1, 2]=>1}"

But the hash behaves differently:

irb(main):019:0> a={1=>2} => {1=>2}
irb(main):020:0> b={1=>2} => {1=>2}
irb(main):021:0> c={} => {}
irb(main):022:0> c[a]=1 => 1
irb(main):023:0> c=1 => 1
irb(main):024:0> c.inspect => "{{1=>2}=>1, {1=>2}=>1}"

Why?

I'm going for another coffee ;)

Andrew Walrond


I think the difference is:

irb(main):005:0> [1,2].hash === [1,2].hash
=> true
irb(main):006:0> {1=>2}.hash === {1=>2}.hash
=> false

regards,

Brian
 
A

Andrew Walrond

Hi Brian,

I think the difference is:

irb(main):005:0> [1,2].hash === [1,2].hash
=> true
irb(main):006:0> {1=>2}.hash === {1=>2}.hash
=> false

ri Object.hash says

------------------------------------------------------------ Object#hash
obj.hash => fixnum
------------------------------------------------------------------------
Generates a +Fixnum+ hash value for this object. This function must
have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
hash value is used by class +Hash+. Any hash value that exceeds the
capacity of a +Fixnum+ will be truncated before being used.

So if a.eql?(b), then a.hash MUST EQUAL b.hash.

Lets see:

irb(main):038:0> a={1=>2}
=> {1=>2}
irb(main):039:0> b={1=>2}
=> {1=>2}
irb(main):040:0> a.eql?(b)
=> true
irb(main):041:0> a.hash == b.hash
=> false

So what gives?

irb(main):034:0> {1=>2}.hash => -605137920
irb(main):035:0> {1=>2}.hash => -605147880
irb(main):036:0> {1=>2}.hash => -604944118

Confused...

Andrew
 
F

Florian Groß

Andrew said:
So if a.eql?(b), then a.hash MUST EQUAL b.hash.

Lets see:

irb(main):038:0> a={1=>2}
=> {1=>2}
irb(main):039:0> b={1=>2}
=> {1=>2}
irb(main):040:0> a.eql?(b)
=> true
irb(main):041:0> a.hash == b.hash
=> false

So what gives?

irb(main):034:0> {1=>2}.hash => -605137920
irb(main):035:0> {1=>2}.hash => -605147880
irb(main):036:0> {1=>2}.hash => -604944118

Odd, I thought Hash#hash had already been added to Ruby itself recently.

I guess you can workaround it like this:

class Hash
def hash()
[self.to_a, self.class].hash
end
end

Though I am not sure if that is the correct semantics. (It could be the
case that Hash#eql?() also compares default values.)
 
A

Adriano Ferreira

So if a.eql?(b), then a.hash MUST EQUAL b.hash.

Only if the object implementation (in case, Hash) agreeds on the
general contract for objects that try to be useful as hash keys.
irb(main):040:0> a.eql?(b)
=> true

When I do this, I get:

irb(main):003:0> {1=>2}.eql?({1=>2})
=> false


For numbers, strings and arrays you will find .hash and .eql? methods
which are consistent, meaning they play fair as hash keys.

This is not the same with Hash objects. Probably for performance
reasons and because in some ocasions one expects hashes to be equal
only if they are the same objects, .hash and .eql? are not defined
such that hashes are equal if they have the same pairs. I think they
actually inherit Object methods such that
hash1.eql?(hash2) is the same as hash1===hash2
and
hash1.hash calls Object.hash which is based on object ID

But you can do that. Try:

class Hash
def hash
return self.to_a.hash
end

def eql?(b)
return self.to_a.eql?(b.to_a)
end
end


a={1=>2}
b={1=>2}
c={}
c[a]=1
c=1
c.inspect
 
A

Andrew Walrond

When I do this, I get:

irb(main):003:0> {1=>2}.eql?({1=>2})
=> false

I've been discussing this on ruby-core aswell. It turns out that this was
'fixed' in then latest stable snapshot.
For numbers, strings and arrays you will find .hash and .eql? methods
which are consistent, meaning they play fair as hash keys.

This is not the same with Hash objects. Probably for performance
reasons and because in some ocasions one expects hashes to be equal
only if they are the same objects, .hash and .eql? are not defined
such that hashes are equal if they have the same pairs. I think they
actually inherit Object methods such that
hash1.eql?(hash2) is the same as hash1===hash2
and
hash1.hash calls Object.hash which is based on object ID

Understood. See my forwarded message...

Andrew Walrond
 
A

Andrew Walrond

I guess you can workaround it like this:

class Hash
def hash()
[self.to_a, self.class].hash
end
end

Though I am not sure if that is the correct semantics. (It could be the
case that Hash#eql?() also compares default values.)

Is Hash#to_a guaranteed to sort the elements? My tests show that it does, but
perhaps .to_a.sort might be safer?


So I propose

class Hash
def hash()
[self.to_s.sort,self.class].hash
end
def eql?(other)
self.hash === other.hash
end
end

to consolidate behaviour across ruby versions. Does that look right to you?
(I'm not worked about the default value)

Andrew
 
F

Florian Groß

Andrew said:
Is Hash#to_a guaranteed to sort the elements? My tests show that it does, but
perhaps .to_a.sort might be safer?

I think that two hashes with the same elements will also have the same
order. This might however not be true anymore in the future. (There were
talks about retaining insert order.)

So calling .sort might indeed be a good idea.
So I propose

class Hash
def hash()
[self.to_s.sort,self.class].hash
end
def eql?(other)
self.hash === other.hash
end
end

Be careful about defining eql?() in terms of hash(). hash values will
and can produce collisions and Ruby will always use eql?() even after
the hash codes match to check if the elements are indeed the same.

If you implement eql?() in terms of hash() you will however get false
positives in case of collisions.
 
A

Andrew Walrond

Andrew said:
Is Hash#to_a guaranteed to sort the elements? My tests show that it does,
but perhaps .to_a.sort might be safer?

I think that two hashes with the same elements will also have the same
order. This might however not be true anymore in the future. (There were
talks about retaining insert order.)

So calling .sort might indeed be a good idea.
So I propose

class Hash
def hash()
[self.to_s.sort,self.class].hash
end
def eql?(other)
self.hash === other.hash
end
end

Be careful about defining eql?() in terms of hash(). hash values will
and can produce collisions and Ruby will always use eql?() even after
the hash codes match to check if the elements are indeed the same.

If you implement eql?() in terms of hash() you will however get false
positives in case of collisions.

What does 1.9 do for Hash#eql? ?

Andrew
 
A

Andrew Walrond

Actually,

self == other

is the obvious choice I suppose. I was momentarily confused by something in
Adriano's earlier reply :)

Andrew
 
A

Adriano Ferreira

What does 1.9 do for Hash#eql? ?

Probably some equivalent to:

class Hash
def eql?(b)
h1 = (self.length >= b.length) ? self : b
h2 = (self.length >= b.length) ? b : self
return h1.all? { | k, v | v.eql?(h2[k]) }
end
end

where setting h1, h2 is a trick for granting there will be no false positives.
 
F

Florian Groß

Adriano said:
What does 1.9 do for Hash#eql? ?

Probably some equivalent to:

class Hash
def eql?(b)
h1 = (self.length >= b.length) ? self : b
h2 = (self.length >= b.length) ? b : self
return h1.all? { | k, v | v.eql?(h2[k]) }
end
end

where setting h1, h2 is a trick for granting there will be no false positives.

Hm, I'm okay with the last part (checking if all pairs are shared), but
I'd replace the h1, h2 stuff with self.size == other.size.
 
A

Adriano Ferreira

Hm, I'm okay with the last part (checking if all pairs are shared), but
I'd replace the h1, h2 stuff with self.size == other.size.

Agreed. That was unnecessarily complicated. That should do it:

class Hash
def eql?(b)
return (self.size==b.size) && self.all? { | k, v | v.eql?(b[k]) }
end

def hash
return self.inject(0) { | h, k, v | h ^ k.hash ^ v.hash }
end
end

The Hash#hash is a suggestion for a hash method which obeys
h1.hash==h2.hash if h1.eql?(h2). None of the methods above rely on
sorting and other expensive operations, even though traversing the
hash and possibly recursion (if there are hashes inside hashes) is
expensive enough.

Regards,
Adriano.
 
A

Andrew Walrond

Adriano said:
What does 1.9 do for Hash#eql? ?

Probably some equivalent to:

class Hash
def eql?(b)
h1 = (self.length >= b.length) ? self : b
h2 = (self.length >= b.length) ? b : self
return h1.all? { | k, v | v.eql?(h2[k]) }
end
end

where setting h1, h2 is a trick for granting there will be no false
positives.

Hm, I'm okay with the last part (checking if all pairs are shared), but
I'd replace the h1, h2 stuff with self.size == other.size.

Whats wrong with just using

cdef eql?(other)
self == other
end

It seems to work. I assume == does something similar anyway?. Am I missing
something?

Andrew
 
A

Adriano Ferreira

Whats wrong with just using
cdef eql?(other)
self == other
end

You're right. Hash#== is already there. But, at least in Ruby 1.8,
you're still missing a Hash#hash consistent with that Hash#eql?

Adriano.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,738
Latest member
JinaMacvit

Latest Threads

Top