comparing objects


Mark Abramov

Benoit said:
I searched a bit and concluded this:

Array methods using comparison
- with #hash and #eql?
&, |, uniq(!), -
- with #==
include?, (r)assoc, count, delete, (r,find_)index
(please tell me if I forgot one)
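
A minimal sketch of the split listed above, assuming a hypothetical Point class (not from this thread) that defines only #==. The #== group finds the equal object, while the #hash/#eql? group falls back to identity and treats the two points as distinct:

```ruby
# Hypothetical class for illustration: defines #== but not #eql? or #hash.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

a = [Point.new(1, 2)]
b = [Point.new(1, 2)]

# Methods that compare with #==
by_equality = [a.include?(b.first), a.index(b.first), a.count(b.first)]

# Methods that compare with #hash and #eql? (identity by default)
by_hash = [(a & b).size, (a | b).size, (a + b).uniq.size, (a - b).size]

p by_equality # => [true, 0, 1]
p by_hash     # => [0, 2, 2, 1]
```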

I think Array methods should never have to look at #hash and #eql?
methods.
I suppose this is done for performance.

I think this should change, because:
- it violates POLS
- it can cause unexpected behavior, because you have to define #hash and
#eql? for objects which should not need them (when you manage objects in
an Array, you do not expect to need to think about Hash keys).
- it is not consistent with Array's other methods

For me it doesn't work anyway.
Unsure how to paste code here, you could see an example here:
http://pastie.org/999353
I am still like "WTF?"

$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin9.8.0]
 

Mark Abramov

Mark said:

Sorry, guys, didn't notice how I used eql instead of eql?
Btw, without #hash it won't work anyways which I consider *weird* at the
very least.
 

Marcin Wolski

Rein said:
Mark said:

Sorry, guys, didn't notice how I used eql instead of eql?
Btw, without #hash it won't work anyways which I consider *weird* at the
very least.

#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.

If two objects are set to be eql?, their hash methods must also return
the same value. More details in The Ruby Programming Language book.

Thus, when you redefine eql?, the hash methods also should be redefined.
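
As a hedged sketch of that rule, assume a hypothetical User class whose identity is its name and email (both the class and its fields are made up for illustration). Defining #eql? and #hash over the same fields keeps them consistent, and Array#& and Hash#[] then behave as the OP expected:

```ruby
# Hypothetical class: two users are equal when name and email match.
class User
  attr_reader :name, :email

  def initialize(name, email)
    @name = name
    @email = email
  end

  def ==(other)
    other.class == self.class && name == other.name && email == other.email
  end
  alias eql? ==

  # Built from the same fields as #eql?, so eql? objects share a hash value.
  def hash
    [name, email].hash
  end
end

admins  = [User.new("ann", "ann@example.com"), User.new("bob", "bob@example.com")]
signups = [User.new("bob", "bob@example.com"), User.new("cy", "cy@example.com")]

common = admins & signups
lookup = { User.new("ann", "ann@example.com") => :admin }
role   = lookup[User.new("ann", "ann@example.com")]

p common.map(&:name) # => ["bob"]
p role               # => :admin
```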
 

Mark Abramov

Marcin said:
Rein said:
Mark Abramov wrote:
[tl;dr]

Sorry, guys, didn't notice how I used eql instead of eql?
Btw, without #hash it won't work anyways which I consider *weird* at the
very least.

#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.

If two objects are set to be eql?, their hash methods must also return
the same value. More details in The Ruby Programming Language book.

Thus, when you redefine eql?, the hash methods also should be redefined.

http://ruby-doc.org/core-1.8.7/classes/Object.html#M000617
Well, it doesn't say much in core api :(
 

Robert Klemme

Even if
Hmm? Would you care to show an example where overloading those methods
(#eql? and #hash) is needed to ensure proper behavior? I am willing to
learn. But I am not willing to accept this statement as such.
Cheers
R.

You have been presented with one in this very thread. The OP wants
objects of his class to have the correct semantics for Array#& and
Hash#[], etc. The correct answer is to implement #hash and #eql?, just
as implementing <=> provides objects of his class with the correct
semantics for Array#sort.
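
To illustrate the <=> half of that analogy, here is a small sketch with a hypothetical Version class (not from this thread). Array#sort only needs #<=>; including Comparable is optional and is shown here only because it is the usual companion:

```ruby
# Hypothetical class: ordered by major, then minor.
class Version
  include Comparable

  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Array#sort calls this to order two elements.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end

  def to_s
    "#{major}.#{minor}"
  end
end

versions = [Version.new(1, 10), Version.new(1, 2), Version.new(0, 9)]
sorted = versions.sort.map(&:to_s)

p sorted # => ["0.9", "1.2", "1.10"]
```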

See also
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html

Cheers

robert
 

Robert Dober

On 2010-06-10 06:59:40 -0700, Robert Dober said:
You have been presented with one in this very thread. The OP wants objects
of his class to have the correct semantics for Array#& and Hash#[], etc. The
correct answer is to implement #hash and #eql?, just as implementing <=>
provides objects of his class with the correct semantics for Array#sort.
I guess you really do not know what I was talking about? Or do you
just repeat the same stuff over and over again in order to convince
me?
Overriding #hash and #eql? breaks Hash! Why the heck should the OP's
use case justify this?
And it does not answer my question: where would I want Hash to
behave according to the redefined #eql? and #hash? And BTW I asked
Wilson, did I not?
Cheers
Robert
 

Robert Dober

That's not true, I think.
Judge for yourself

require "forwardable"

def count klass
  ObjectSpace.each_object( klass ).to_a.size
end

class N
  extend Forwardable
  attr_reader :n
  def_delegators :n, :hash
  def eql? otha
    n == otha.n
  end
  private
  def initialize n
    @n = n
  end
end # class N


h = { N.new( 42 ) => true }
h[ N.new( 42 ) ] = 42
p h
GC.start
p count(N)

Cheers
R.
 

Robert Klemme

2010/6/11 Shot (Piotr Szotkowski) said:
Rein Henrichs:
#hash makes sense for Hash#[] and etc. #eql? makes more
sense for Array#&. I too find it odd that both are necessary.

Both are necessary because #eql? says whether two objects are surely
the same, while #hash says whether they're surely different, which,
perhaps counterintuitively, is not the same problem.

The difference is that in many, many cases it's much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they're surely the same (#eql? can be quite slow).

This is not necessarily true. Any reasonable implementation of #eql?
will bail out as soon as it sees a difference. On the contrary, you
always need to look at the complete state of an instance to calculate
#hash. I can easily construct an example where #eql? beats #hash:

14:40:54 Temp$ ruby19 eql-test.rb
same
0.110000 0.000000 0.110000 ( 0.098000)
0.093000 0.000000 0.093000 ( 0.099000)
0.157000 0.000000 0.157000 ( 0.151000)
different early
0.093000 0.000000 0.093000 ( 0.101000)
0.094000 0.000000 0.094000 ( 0.096000)
0.000000 0.000000 0.000000 ( 0.000000)
different late
0.109000 0.000000 0.109000 ( 0.105000)
0.094000 0.000000 0.094000 ( 0.098000)
0.156000 0.000000 0.156000 ( 0.149000)
14:40:56 Temp$ cat eql-test.rb
require 'benchmark'
a1 = Array.new 1_000_000
a2 = Array.new 1_000_000
puts "same"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a1[0] = 1
a2[0] = 2
puts "different early"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a2[0] = a1[0]
a2[999_999] = 1
puts "different late"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
14:40:58 Temp$

Notice also how #eql? with equal arrays is not much slower than #hash.
The main difference between #eql? and #hash is that #hash can return the
same value for objects that are not #eql? (but if two objects are #eql?
then #hash must return the same value).
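
A deliberately extreme sketch of that contract, using a hypothetical Key class (not from this thread) whose #hash always returns 0. Every key collides, yet Hash still gives correct answers because #eql? settles each collision; only the speed suffers:

```ruby
# Hypothetical class: all instances share one hash value on purpose.
class Key
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def eql?(other)
    other.is_a?(Key) && id == other.id
  end

  # Legal but slow: equal hash values do not imply eql? objects.
  def hash
    0
  end
end

h = { Key.new(1) => :one, Key.new(2) => :two }
found   = h[Key.new(1)]
missing = h[Key.new(3)]

p h.size  # => 2
p found   # => :one
p missing # => nil
```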

An untested, and definitely not optimal
(but hopefully simple) example follows. :)

Imagine that you want to implement a new immutable string class, one
which caches the string length (for performance reasons). Imagine also
that the vast majority of such strings you use are of different lengths,
and that you want to use them as Hash keys.


class ImmutableString

  def initialize string
    @string = string.dup.freeze
    @length = string.length
  end

end



Given the above assumptions, it might make sense for #hash to
return the @length, while #eql? makes the 'proper' comparison:



class ImmutableString

  def hash
    @length

Bad hash implementation. Why don't you use String#hash?
  end

  alias eql? ==

end



This way in the vast majority of cases, when your ImmutableStrings will
be considered for Hash keys, the check whether a given key exists will
be very quick; only when two objects #hash to the same value (i.e.,
when they're not surely different) the #eql? is called to tell whether
they're surely the same.
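
A runnable variant of the quoted sketch, under one added assumption: #== is defined explicitly and compares contents. Without it, `alias eql? ==` would alias Object#==, which compares by identity. The readers `string` and `length` are also additions for illustration:

```ruby
class ImmutableString
  attr_reader :string, :length

  def initialize(string)
    @string = string.dup.freeze
    @length = string.length
  end

  # Assumed definition: cheap length check first, then the full comparison.
  def ==(other)
    other.is_a?(ImmutableString) &&
      length == other.length &&
      string == other.string
  end
  alias eql? ==

  # As in the quoted sketch: cheap, but strings of equal length collide.
  def hash
    @length
  end
end

h = {
  ImmutableString.new("ruby")   => 1,
  ImmutableString.new("perl")   => 2, # same length as "ruby", so same hash
  ImmutableString.new("python") => 3
}

p h.size                         # => 3
p h[ImmutableString.new("perl")] # => 2
```

The two four-letter keys share a hash value, so #eql? is what keeps them apart.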

If the set of attributes to be used for the specific comparison needed
in this thread is not the same as the set that we identify as keyish
for class User in general one cannot use User#eql? and User#hash for
quick set intersection. That's why I suggested to use a Struct for
key fields (which has proper #hash and #eql? built in).
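
A sketch of the Struct suggestion, with hypothetical names throughout (Account, AccountKey, `common_accounts` and the `last_login` field are not from this thread). The class itself keeps the default #eql? and #hash; only the small key Struct is compared, and Struct supplies #eql? and #hash over its members:

```ruby
require "set"

# Hypothetical key type: Struct compares and hashes by its members.
AccountKey = Struct.new(:name, :email)

class Account
  attr_reader :name, :email, :last_login

  def initialize(name, email, last_login)
    @name = name
    @email = email
    @last_login = last_login
  end

  # Only the fields relevant to this particular comparison.
  def key
    AccountKey.new(name, email)
  end
end

# Intersection by key, without touching Account#eql? or Account#hash.
def common_accounts(left, right)
  keys = Set.new(right.map(&:key))
  left.select { |account| keys.include?(account.key) }
end

left  = [Account.new("ann", "ann@example.com", 1), Account.new("bob", "bob@example.com", 2)]
right = [Account.new("bob", "bob@example.com", 99)]

p common_accounts(left, right).map(&:name) # => ["bob"]
```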

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 

Robert Klemme

I
You define #eql? and #hash for your convenience. So good, so bad. My
question simply was: Show me why *not* redefining #hash and #eql? will
cause problems, because that was Wilson's statement. I am still
waiting :(.

The advice to implement #eql? and #hash really only makes sense if
equivalence can reasonably be defined for a class and if instances of
that class should be used as Hash keys or in Set. If not at least
equivalence can be defined other than via identity (which is the
default) then it is perfectly reasonable to not override both methods
and go with the default implementation.
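
For comparison, a short sketch of that default behavior, using a hypothetical Session class (not from this thread) that overrides neither method. Two sessions with the same token are still different Hash keys, because the default #eql? and #hash are identity-based:

```ruby
# Hypothetical class: no #eql? or #hash override, so identity applies.
class Session
  attr_reader :token

  def initialize(token)
    @token = token
  end
end

s1 = Session.new("abc")
s2 = Session.new("abc")

h = { s1 => :active }

p s1.eql?(s1) # => true
p s1.eql?(s2) # => false
p h[s1]       # => :active
p h[s2]       # => nil
```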

Kind regards

robert
 

Robert Dober

The advice to implement #eql? and #hash really only makes sense if
equivalence can reasonably be defined for a class and if instances of that
class should be used as Hash keys or in Set. If not at least equivalence
can be defined other than via identity (which is the default) then it is
perfectly reasonable to not override both methods and go with the default
implementation.
But that was *exactly* my point.

OP wanted to use Array#&, and Array#&, for a reason not too clear to
me, uses Object#eql? instead of Object#== I did discourage the
overloading of Object#eql? and Object#hash for *that purpose*.

If you want to change Hash then it is the right thing to do.
Now I might strongly disagree about if one should do that, but that is
rather OT and I would never have made such strong statements about
that issue.
However the technique you suggest is not to be put into non expert
hands as I tried to show with the memory leaking code above.

Cheers
Robert
Kind regards

robert



--
The best way to predict the future is to invent it.
-- Alan Kay
 

Caleb Clausen

OP wanted to use Array#&, and Array#&, for a reason not too clear to
me, uses Object#eql? instead of Object#== I did discourage the
overloading of Object#eql? and Object#hash for *that purpose*.

Array#& uses eql? instead of == because internally, it works something
like this:

class Array
  def &(other)
    h1={}
    other.each{|x| h1[x]=true}
    select{|x| h1[x] }
  end
end

In other words, it creates a (hash) index to get a speedup. (From
O(M*N) to O(M+N).)
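
The trade-off can be sketched side by side. Both helper names below are hypothetical, and neither is the actual Array#& implementation. Mixing Integer and Float makes the difference visible, since 1 == 1.0 is true but 1.eql?(1.0) is false:

```ruby
# Hash-indexed intersection: O(M+N), compares with #hash and #eql?.
def intersect_by_hash(a, b)
  index = {}
  b.each { |x| index[x] = true }
  # Hash#delete returns the stored value once, which also drops duplicates.
  a.select { |x| index.delete(x) }
end

# Nested-scan intersection: O(M*N), compares with #==.
def intersect_by_equality(a, b)
  result = []
  a.each do |x|
    result << x if b.include?(x) && !result.include?(x)
  end
  result
end

ints  = [1, 2, 3, 2]
mixed = [1.0, 2]

p intersect_by_hash(ints, mixed)     # => [2]
p intersect_by_equality(ints, mixed) # => [1, 2]
p ints & mixed                       # => [2]
```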
 

Robert Klemme

But that was *exactly* my point.

I don't think we disagree, nor do I argue with you. I just posted blog
links as illustration to Rein's point about how to implement those methods.

Kind regards

robert
 

Robert Dober

I don't think we disagree, nor do I argue with you. I just posted blog
links as illustration to Rein's point about how to implement those methods.

Forgive my confusion then.
Cheers
Robert

--
The best way to predict the future is to invent it.
-- Alan Kay
 
