2010/6/11 Shot (Piotr Szotkowski) said:
Rein Henrichs:
#hash makes sense for Hash#[] and etc. #eql? makes more
sense for Array#&. I too find it odd that both are necessary.
Both are necessary because #eql? says whether two objects are surely
the same, while #hash says whether they're surely different - which,
perhaps counterintuitively, is not the same problem.
The difference is that in many, many cases it's much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they're surely the same (#eql? can be quite slow).
This is not necessarily true. Any reasonable implementation of #eql?
will bail out as soon as it sees a difference. On the contrary, you
always need to look at the complete state of an instance to calculate
#hash. I can easily construct an example where #eql? beats #hash:
14:40:54 Temp$ ruby19 eql-test.rb
same
0.110000 0.000000 0.110000 ( 0.098000)
0.093000 0.000000 0.093000 ( 0.099000)
0.157000 0.000000 0.157000 ( 0.151000)
different early
0.093000 0.000000 0.093000 ( 0.101000)
0.094000 0.000000 0.094000 ( 0.096000)
0.000000 0.000000 0.000000 ( 0.000000)
different late
0.109000 0.000000 0.109000 ( 0.105000)
0.094000 0.000000 0.094000 ( 0.098000)
0.156000 0.000000 0.156000 ( 0.149000)
14:40:56 Temp$ cat eql-test.rb
require 'benchmark'
a1 = Array.new 1_000_000
a2 = Array.new 1_000_000
puts "same"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a1[0] = 1
a2[0] = 2
puts "different early"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a2[0] = a1[0]
a2[999_999] = 1
puts "different late"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
14:40:58 Temp$
Notice also how #eql? with equal arrays is not much slower than #hash.
The main difference between #eql? and #hash is that #hash can return the
same value for objects that are not #eql? (but if two objects are #eql?
then #hash must return the same value).
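To make that contract concrete, here is a minimal sketch. The class name Collider and its deliberately bad constant #hash are made up for illustration only; the point is that colliding hash values cost speed, not correctness, as long as #eql? tells the objects apart.

```ruby
# Hypothetical class, for illustration only.
class Collider
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Deliberately poor: every instance reports the same hash value,
  # so all keys collide and Hash has to fall back on #eql?.
  def hash
    42
  end

  # Objects that are #eql? must also have equal #hash values;
  # the constant above trivially satisfies that.
  def eql?(other)
    other.is_a?(Collider) && name == other.name
  end
  alias_method :==, :eql?
end

a = Collider.new("a")
b = Collider.new("b")
table = { a => 1, b => 2 }

puts a.hash == b.hash            # the hash values collide
puts a.eql?(b)                   # yet the objects are not #eql?
puts table[Collider.new("a")]    # lookup still finds the right entry
puts table[Collider.new("b")]
```

The reverse is not allowed: if two #eql? objects returned different #hash values, Hash could treat them as two separate keys.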
An untested, and definitely not optimal
(but hopefully simple) example follows.
Imagine that you want to implement a new immutable string class, one
which caches the string length (for performance reasons). Imagine also
that the vast majority of such strings you use are of different lengths,
and that you want to use them as Hash keys.
class ImmutableString
  def initialize string
    @string = string.dup.freeze
    @length = string.length
  end
end
Given the above assumptions, it might make sense for #hash to
return the @length, while #eql? makes the 'proper' comparison:
class ImmutableString
  def hash
    @length
Bad hash implementation. Why don't you use String#hash?
  end
  alias eql? ==
end
end
This way in the vast majority of cases, when your ImmutableStrings will
be considered for Hash keys, the check whether a given key exists will
be very quick; only when two objects #hash to the same value (i.e.,
when they're not surely different) the #eql? is called to tell whether
they're surely the same.
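A self-contained sketch of that two-step lookup follows. Note that the quoted class never defines #==, so `alias eql? ==` would pick up the inherited identity comparison; the version below adds a content-based #== as an assumption about what was intended. The eql_calls counter is a hypothetical addition, there only to make the #eql? calls visible.

```ruby
class ImmutableString
  attr_reader :string, :length

  # Demo-only counter (not part of the quoted example).
  class << self
    attr_accessor :eql_calls
  end
  self.eql_calls = 0

  def initialize(string)
    @string = string.dup.freeze
    @length = string.length
  end

  # Cheap "surely different" test, as in the quoted example.
  def hash
    @length
  end

  # Assumed content comparison; the quoted code relied on the
  # inherited #==, which only compares object identity.
  def ==(other)
    other.is_a?(ImmutableString) && string == other.string
  end

  # Expensive "surely the same" test, counted for the demo.
  def eql?(other)
    self.class.eql_calls += 1
    self == other
  end
end

index = {
  ImmutableString.new("a")   => :one,
  ImmutableString.new("bb")  => :two,
  ImmutableString.new("ccc") => :three
}

ImmutableString.eql_calls = 0
puts index[ImmutableString.new("bb")].inspect   # same length, same content: found
puts index[ImmutableString.new("zz")].inspect   # same length, other content: nil
puts ImmutableString.eql_calls                  # both lookups had to consult #eql?
```

How often #eql? is skipped for keys of different lengths depends on the Hash implementation, so the sketch does not claim an exact count for that case. Delegating to @string.hash, as the interjection in the quoted code suggests, would spread keys better when many strings share a length.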
If the set of attributes to be used for the specific comparison needed
in this thread is not the same as the set that we identify as keyish
for class User in general, one cannot use User#eql? and User#hash for
quick set intersection. That's why I suggested using a Struct for the
key fields (which has proper #hash and #eql? built in).
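A rough sketch of the Struct approach, under stated assumptions: the User class, its fields (login, domain, last_seen) and the helper common_user_keys are all hypothetical names chosen for illustration. Only login and domain are treated as the key for this particular comparison, and User#eql? and User#hash are left alone.

```ruby
# Key fields for this one comparison; Struct supplies member-wise
# #hash, #eql? and #== out of the box.
UserKey = Struct.new(:login, :domain)

# Hypothetical User class with one attribute (last_seen) that must
# not take part in the comparison.
class User
  attr_reader :login, :domain, :last_seen

  def initialize(login, domain, last_seen)
    @login = login
    @domain = domain
    @last_seen = last_seen
  end

  # Build the comparison key on demand.
  def key
    UserKey.new(login, domain)
  end
end

# Array#& relies on #hash and #eql? of the elements, which the
# Struct keys provide.
def common_user_keys(left, right)
  left.map(&:key) & right.map(&:key)
end

old_users = [User.new("alice", "example.org", 2009),
             User.new("bob",   "example.org", 2009)]
new_users = [User.new("bob",   "example.org", 2010),
             User.new("carol", "example.org", 2010)]

common_user_keys(old_users, new_users).each do |key|
  puts "#{key.login}@#{key.domain}"
end
```

If the User objects themselves are needed rather than the keys, one option is to index one side with a Hash keyed by UserKey and look the other side up in it.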
Kind regards
robert
-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/