Sets, uniqueness not unique.


Hugh Sasse

I have been splitting a comma-separated values file and putting
some of the values into a Student class, simply a collection of strings,
so that I can build a database table from them:

require 'set'

class Student
attr_accessor :forename, :surname, :birth_dt,
:picture, :coll_status
def initialize(forename0, surname0, birth_dt0,
picture0, coll_status0)
@forename = forename0
@surname = surname0
@birth_dt = birth_dt0
@picture = picture0
puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
@coll_status = coll_status0
end

def eql?(other)
# if self.forename == "John" and other.forename == "John"
debug = true
# end
res = [:forename, :surname, :birth_dt, :picture, :coll_status].all? do |msg|
print "#{self.send(msg)} == #{(other.send(msg))} gives #{self.send(msg) == (other.send(msg))}" if debug
self.send(msg) == (other.send(msg))
end
return res
end

def to_s
"#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
end
end

And in the body of my program I read the records in from the csv and
add the students if they are new. They tend to be clustered in the
input, hence the last_student test.

class TableMaker
INPUT = "hugh.csv"

ACCEPTED_MODULES = /^\"TECH(100[1-7]|200\d|201[01]|300\d|301[0-2])/

# Read in the database and populate the tables.
def initialize(input=INPUT)

@students = Set.new()
# [...]
open(input, 'r') do |infp|
while record = infp.gets
record.chomp!
puts "record is #{record}"
forename, surname, birth_dt, institution_id, aos_code,
various, other, fields,
picture, coll_status, full_desc = record.split(/\s*\|\s*/)

next unless aos_code =~ ACCEPTED_MODULES

puts "from record, picture is [#{picture.inspect}]." if $debug
# Structures for student
student = Student.new(forename, surname, birth_dt, picture, coll_status)
if student == last_student
student = last_student
else
student.freeze

# Avoid duplicates
unless @students.include? student
@students.add student
end
last_student = student
end
# [...]
end
end
end

# [...]

end


This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

2. I end up with duplicate students.

Sets *can't* hold duplicates, and include? depends on eql? for Sets.
So what's going on? I have checked, and the duplicate students seem to
have identical strings, so I wrote the eql? to be sure.

I bet this will be a self.kick(self) reason, but I can't see it yet.

Thank you,
Hugh
 

David A. Black

Hi --

I have been splitting a comma separated values file, and putting
some of the values into an Student class, simply a collection of strings,
so that I can build a database table from them: [...]
picture, coll_status, full_desc = record.split(/\s*\|\s*/)

I notice you mentioned comma separation but you're splitting on a
pipe. I don't know if this is related to the problem, but I thought
I'd flag it just in case.

Can you provide a couple of sample lines of data?


David
 

Ara.T.Howard

require 'set'

class Student
attr_accessor :forename, :surname, :birth_dt,
:picture, :coll_status
def initialize(forename0, surname0, birth_dt0,
picture0, coll_status0)
@forename = forename0
@surname = surname0
@birth_dt = birth_dt0
@picture = picture0
puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
@coll_status = coll_status0
end

def eql?(other)
# if self.forename == "John" and other.forename == "John"
debug = true
# end
res = [:forename, :surname, :birth_dt, :picture, :coll_status].all? do |msg|
print "#{self.send(msg)} == #{(other.send(msg))} gives #{self.send(msg) == (other.send(msg))}" if debug
self.send(msg) == (other.send(msg))
end
return res
end

def to_s
"#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
end
end

well this works:

s0 = Student::new 'a', 'b', 'c', 'd', 'e'
s1 = Student::new 'a', 'b', 'c', 'd', 'e'
p(s0.eql?(s1)) #=> true

but this doesn't

p s0 == s1 #=> false
And in the body of my program I read the records in from the csv and
add the students if they are new. They tend to be clustered in the
input, hence the last_student test.

class TableMaker
INPUT = "hugh.csv"

ACCEPTED_MODULES = /^\"TECH(100[1-7]|200\d|201[01]|300\d|301[0-2])/

# Read in the database and populate the tables.
def initialize(input=INPUT)

@students = Set.new()
# [...]
open(input, 'r') do |infp|
while record = infp.gets
record.chomp!

try : record.strip!
puts "record is #{record}"
forename, surname, birth_dt, institution_id, aos_code,
various, other, fields,
picture, coll_status, full_desc = record.split(/\s*\|\s*/)

or
fields = record.split(%r/\|/).map{|field| field.strip}
forename, surname, birth_dt, institution_id, aos_code,
various, other, fields,
picture, coll_status, full_desc = fields


if you don't do one of these two things then either

- forename may have leading space
- full_desc may have trailing space

that's because chomp! only blows away trailing newline - not extraneous
spaces and leading space on record is never dealt with.
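
To make the whitespace point concrete, here is a minimal sketch. The sample record and the variable names are invented for illustration; they are not taken from the real data.

```ruby
# Hypothetical record with stray spaces at both ends and a trailing newline.
record = "  John | Smith | 1980-01-01 \n"

# chomp removes only the trailing newline; the outer spaces survive the split.
naive = record.chomp.split(/\s*\|\s*/)

# Splitting on the bare pipe and stripping each field cleans every value.
clean = record.split('|').map { |field| field.strip }

p naive  # first and last fields keep their outer whitespace
p clean  # every field is trimmed
```

Two students built from these two arrays would not compare equal, even though they print almost identically.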
next unless aos_code =~ ACCEPTED_MODULES

puts "from record, picture is [#{picture.inspect}]." if $debug
# Structures for student
student = Student.new(forename, surname, birth_dt, picture,
coll_status)
if student == last_student

so, as shown above, this (==) does not work
student = last_student
else
student.freeze

# Avoid duplicates
unless @students.include? student
@students.add student
end
last_student = student
end
# [...]
end
end
end

# [...]

end


This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

set uses Object#hash - so maybe something like (untested)

class Student
def hash
%w( forename surname birth_dt picture coll_status).inject(0){|n,m| n += send(m).hash}
end
end

i dunno if this will wrap and cause issues though...

if so maybe something like

class Student
def hash
%w( forename surname birth_dt picture coll_status).map{|m| send(m)}.join.hash
end
end

or, perhaps simple something like:

class Student < ::Hash
FIELDS = %w( forename surname birth_dt picture coll_status )
def initialize(*fs)
FIELDS.each do |f|
self[f] = (fs.shift || raise(ArgumentError, "no #{ f }!"))
end
end
def eql? other
values == other.values
end
alias == eql?
def keys
FIELDS
end
def values
values_at(*FIELDS)
end
def hash
FIELDS.map{|m| self[m]}.join.hash
end
end

s0 = Student::new 'a', 'b', 'c', 'd', 'e'
s1 = Student::new 'a', 'b', 'c', 'd', 'e'

require 'set'
set = Set::new
set.add s0
set.add s1
p set #=> #<Set: {{"forename"=>"a", "coll_status"=>"e", "birth_dt"=>"c", "picture"=>"d", "surname"=>"b"}}>

the FIELDS const can be used to do ordered prints, etc.

it sure seems odd that set doesn't use 'eql?' or '==' up front though doesn't
it?

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| Your life dwells amoung the causes of death
| Like a lamp standing in a strong breeze. --Nagarjuna
===============================================================================
 

Hugh Sasse

Hi --

I have been splitting a comma separated values file, and putting
some of the values into an Student class, simply a collection of strings,
so that I can build a database table from them: [...]
picture, coll_status, full_desc = record.split(/\s*\|\s*/)

I notice you mentioned comma separation but you're splitting on a
pipe. I don't know if this is related to the problem, but I thought
I'd flag it just in case.

Yes, sorry, I was using the generic term for this, to facilitate
explaining the concept of what I was doing. The data I
get is pipe(|) separated.
Can you provide a couple of sample lines of data?

Not really, this is data about real people, and data protection law
means I can't. But I can tell you that the splitting works fine, the
selection of fields for the student works correctly, the students
don't end up with data from other fields, and the nature of the
split command means that we can be sure they are all Strings.

Therefore I think it boils down to:

How can two collections of strings appear to be the same and yet
both of them end up in the Set structure? Whitespace is always
white, be it tab or space, so that's one way, but I still think that
should look different to == or to eql?
Thank you,
Hugh
 

David A. Black

Hi --

This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

2. I end up with duplicate students.

Sets *can't* hold duplicates, and include depends on eql? for Sets.

Are you sure about that latter point? In set.rb:

def include?(o)
@hash.include?(o)
end

and in hash.c:

if (st_lookup(RHASH(hash)->tbl, key, 0)) {
return Qtrue;
... }

I haven't followed the trail beyond that... but I think any two
student objects will count as different hash keys, even if they have
similar string data.
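
A minimal sketch of that behaviour, assuming a cut-down class (PlainStudent is a made-up name) that defines neither hash nor eql?:

```ruby
require 'set'

# Plain class: inherits the identity-based #hash and #eql? from Object.
class PlainStudent
  attr_reader :forename, :surname

  def initialize(forename, surname)
    @forename, @surname = forename, surname
  end
end

plain = Set.new
plain << PlainStudent.new('a', 'b')
plain << PlainStudent.new('a', 'b')

# Same strings, but two distinct objects, so the Set keeps both.
puts plain.size
```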


David
 

Hugh Sasse

require 'set'

class Student
attr_accessor :forename, :surname, :birth_dt,
:picture, :coll_status
def initialize(forename0, surname0, birth_dt0, [...]
end

def eql?(other) [...]
end

def to_s
"#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
end
end

well this works:

s0 = Student::new 'a', 'b', 'c', 'd', 'e'
s1 = Student::new 'a', 'b', 'c', 'd', 'e'
p(s0.eql?(s1)) #=> true

but this doesn't

p s0 == s1 #=> false

Hmmm. Yes, I should have more unit tests!
(well. pipe separated -- see other reply :))
add the students if they are new. They tend to be clustered in the
input, hence the last_student test.

class TableMaker [...]
def initialize(input=INPUT) [...]
open(input, 'r') do |infp|
while record = infp.gets
record.chomp!

try : record.strip!
puts "record is #{record}"
forename, surname, birth_dt, institution_id, aos_code,
various, other, fields,
picture, coll_status, full_desc = record.split(/\s*\|\s*/)

or
fields = record.split(%r/\|/).map{|field| field.strip}
forename, surname, birth_dt, institution_id, aos_code,
various, other, fields,
picture, coll_status, full_desc = fields

I think the former may be faster, but I'll look into these, thanks.
if you don't do one of these two things then either

- forename may have leading space
- full_desc may have trailing space

Yes, I'd missed that.
that's because chomp! only blows away trailing newline - not extraneous
spaces and leading space on record is never dealt with.
next unless aos_code =~ ACCEPTED_MODULES

puts "from record, picture is [#{picture.inspect}]." if $debug
# Structures for student
student = Student.new(forename, surname, birth_dt, picture,
coll_status)
if student == last_student

so, as shown above, this (==) does not work

OK, I'll just lose optimisation, but thanks.
student = last_student
else
student.freeze

# Avoid duplicates
unless @students.include? student
@students.add student
end
last_student = student [...]

This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

set uses Object#hash - so maybe something like (untested)

class Student
def hash
%w( forename surname birth_dt picture coll_status).inject(0){|n,m| n += send(m).hash}
end
end

i dunno if this will wrap and cause issues though...

Nor me.
if so maybe something like

class Student
def hash
%w( forename surname birth_dt picture coll_status).map{|m| send(m)}.join.hash
end
end

Yes, that seems safer
or, perhaps simple something like:

class Student < ::Hash
FIELDS = %w( forename surname birth_dt picture coll_status ) [...]
end

s0 = Student::new 'a', 'b', 'c', 'd', 'e'
s1 = Student::new 'a', 'b', 'c', 'd', 'e'

require 'set'
set = Set::new
set.add s0
set.add s1
p set #=> #<Set: {{"forename"=>"a", "coll_status"=>"e", "birth_dt"=>"c", "picture"=>"d", "surname"=>"b"}}>

the FIELDS const can be used to do ordered prints, etc.

Yes, I might factor that in to my current solution. I didn't want
to allow just any keys, so that's why I didn't subclass Hash, but
it's an interesting approach.
it sure seems odd that set doesn't use 'eql?' or '==' up front though doesn't
it?

Probably a reason I don't know about. The Pickaxe II says it uses
eql? and hash (p731) but doesn't say where.

Thank you for such a full response,
Hugh.
 

Hugh Sasse

Hi --



Are you sure about that latter point? In set.rb:

Yes I was, but it turns out that it was with the certainty that comes
before falling flat on one's face. I remembered seeing it in the ri
docs, and sure enough, it isn't there!

[What was that fidonet .sig? "Open mouth, insert foot, echo
internationally"? :)]
def include?(o)
@hash.include?(o)
end

and in hash.c:

if (st_lookup(RHASH(hash)->tbl, key, 0)) {
return Qtrue;
... }

I haven't followed the trail beyond that... but I think any two
student objects will count as different hash keys, even if they have
similar string data.

Which would explain a lot. Thank you. Ara's hash function should
fix this for me.
Thank you,
Hugh
 

Hugh Sasse

Hi --



Are you sure about that latter point? In set.rb:

def include?(o)
@hash.include?(o)
end

and in hash.c:

if (st_lookup(RHASH(hash)->tbl, key, 0)) {
return Qtrue;
... }

I haven't followed the trail beyond that... but I think any two
student objects will count as different hash keys, even if they have
similar string data.


David

Right, there is some definite weirdness going on here. I removed
the definition of eql? and set the hash to use MD5 sums. I still
didn't get unique entries in my set. Now I have

require 'md5'

class Student
# [...]
FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
def initialize(forename0, surname0, birth_dt0,
picture0, coll_status0)
# [...]
@hash = FIELDS.inject(MD5.new()) do |d,m|
d << send(m)
end.hexdigest.hex
end

def hash
@hash
end

def eql?(other)
self.hash == other.hash
end

end

And this works. Remove the definition of eql? and include? always
gives untrue (I've not checked to see if it is nil or false).


This is in accordance with the entry in Pickaxe2 (page 570,
Object#hash) and ri, that:
------------------------------------------------------------ Object#hash
obj.hash => fixnum
------------------------------------------------------------------------
Generates a +Fixnum+ hash value for this object. This function must
have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
hash value is used by class +Hash+. Any hash value that exceeds the
capacity of a +Fixnum+ will be truncated before being used.

(I'm not sure if my digests are too big)

What I don't really know is what the sufficient conditions are for
this? Is it *necessary* to change hash and eql together? What are the
defaults for Set?

I suspect that my eql? ought to be

def eql?(other)
FIELDS.inject(true) do |b,v|
b && (self.send(v) == other.send(v))
end
end

for that matter

Hugh
 

Mauricio Fernández

object.c:

VALUE
rb_obj_id(VALUE obj)
{
if (SPECIAL_CONST_P(obj)) {
return LONG2NUM((long)obj);
}
return (VALUE)((long)obj|FIXNUM_FLAG);
}
[...]
rb_define_method(rb_mKernel, "hash", rb_obj_id, 0);

[...]
What i don't really know is what the sufficient conditions are for
this? Is it *necessary* to change hash and eql together? What are the
defaults for Set?

The defaults are actually those of Hash. You can follow the call chain
starting from

static struct st_hash_type objhash = {
rb_any_cmp,
rb_any_hash,
};

in hash.c. For user-defined classes, it will end up using #hash and #eql?
defined in Kernel. [rb_any_cmp and rb_any_hash have some extra logic for
Symbol, Fixnum and String values, and some core classes redefine the
associated methods].

Given the above definition of Kernel#hash, if you redefine it, you'll
most probably want to change #eql? too (see below). As far as Hash
objects (and hence Sets) are concerned, modifying #eql? while keeping
#hash unchanged would have no effect (unless you restrict it further so
that obj.eql?(obj) is false, which doesn't seem quite right).
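
As a rough illustration of why the two methods need to change together, the sketch below compares three made-up classes (Base, EqlOnly, HashOnly and Both are names invented here): one overrides only eql?, one only hash, and one both.

```ruby
require 'set'

class Base
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Overrides only eql?; the identity-based hash is kept.
class EqlOnly < Base
  def eql?(other)
    name == other.name
  end
end

# Overrides only hash; eql? still compares object identity.
class HashOnly < Base
  def hash
    name.hash
  end
end

# Overrides both, consistently.
class Both < Base
  def hash
    name.hash
  end

  def eql?(other)
    other.is_a?(Both) && name == other.name
  end
end

# Number of members left after adding two equal-looking objects to a Set.
def distinct_count(klass)
  Set.new([klass.new('x'), klass.new('x')]).size
end

[EqlOnly, HashOnly, Both].each do |klass|
  puts "#{klass}: #{distinct_count(klass)}"
end
```

Only Both collapses the two objects into one member. HashOnly puts them in the same bucket but the identity eql? keeps them apart, and EqlOnly will in practice never have its eql? consulted because the default hash values differ.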


static VALUE
rb_obj_equal(VALUE obj1, VALUE obj2)
{
if (obj1 == obj2) return Qtrue;
return Qfalse;
}

[...]
rb_define_method(rb_mKernel, "eql?", rb_obj_equal, 1);
 

Hugh Sasse


The defaults are actually those of Hash. You can follow the call chain
starting from

static struct st_hash_type objhash = {
rb_any_cmp,
rb_any_hash,
};

in hash.c. For user-defined classes, it will end up using #hash and #eql?
defined in Kernel. [rb_any_cmp and rb_any_hash have some extra logic for
Symbol, Fixnum and String values, and some core classes redefine the
associated methods].
OK, it seems I'm thinking along the right lines now. Here is what I
did in the end:

--- /tmp/T0oTa4V2 Wed Sep 14 15:05:35 2005
+++ populate_tables.rb Wed Sep 14 15:01:24 2005
@@ -9,10 +9,31 @@

$debug = true

+module StringCollection
+
+ def hash
+ (self.class)::FIELDS.inject(MD5.new()) do |d,m|
+ d << send(m)
+ end.hexdigest.hex
+ end
+
+ def eql?(other)
+ (self.class)::FIELDS.inject(true) do |b,v|
+ begin
+ b && (self.send(v) == other.send(v))
+ rescue
+ b = false
+ end
+ end
+ end
+
+end
+
class Student
- attr_accessor :forename, :surname, :birth_dt,
- :picture, :coll_status
+ include StringCollection
+
FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
+ FIELDS.each{|f| attr_accessor f }

def initialize(forename0, surname0, birth_dt0,
picture0, coll_status0)
@@ -22,28 +43,22 @@
@picture = picture0
puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
@coll_status = coll_status0
- @hash =3D FIELDS.inject(MD5.new()) do |d,m|
- d << send(m)
- end.hexdigest.hex
end

- def hash
- @hash
- end

- def eql?(other)
- self.hash == other.hash
- end
-
def to_s
- "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}, #{@hash}"
+ "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}, #{hash}"
end

end

class CourseModule
- attr_accessor :aos_code, :dept_code, :aos_type, :full_desc

+ include StringCollection
+
+ FIELDS = [:aos_code, :dept_code, :aos_type, :full_desc]
+ FIELDS.each{|f| attr_accessor f }
+
def initialize( aos_code, dept_code, aos_type, full_desc)
@aos_code = aos_code
@dept_code = dept_code



I was particularly pleased to be able not to repeat the FIELDS, by
means of attr_accessor, and that the idea of doing
(self.class)::FIELDS
actually worked.
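
For readers on a newer Ruby, where the old 'md5' library is no longer shipped, the same idea can be sketched with Digest::MD5 from the standard library. This is an assumption-laden rewrite rather than the code from the patch above: the two-field Student is cut down for brevity, and to_s is added so non-string fields would not raise.

```ruby
require 'digest/md5'
require 'set'

module StringCollection
  # Fold every field into one MD5 digest and use it as an integer hash.
  def hash
    digest = self.class::FIELDS.inject(Digest::MD5.new) { |d, f| d << send(f).to_s }
    digest.hexdigest.hex
  end

  # Field-by-field comparison; FIELDS is looked up on the including class.
  def eql?(other)
    other.class == self.class &&
      self.class::FIELDS.all? { |f| send(f) == other.send(f) }
  end
end

class Student
  include StringCollection

  FIELDS = [:forename, :surname]
  attr_accessor(*FIELDS)

  def initialize(forename, surname)
    @forename, @surname = forename, surname
  end
end

students = Set.new
students << Student.new('a', 'b')
students << Student.new('a', 'b')
students << Student.new('c', 'd')
puts students.size
```

Because self.class::FIELDS is resolved when the method runs, it does not matter that FIELDS is defined after the include line.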

In the hope that this helps someone else, and thank you,
Hugh

 

Ara.T.Howard

OK, it seems I'm thinking along the right lines now. Here is what I did in
the end:

I was particularly pleased to be able not to repeat the FIELDS, by means of
attr_accessor, and that the idea of doing (self.class)::FIELDS actually
worked.

i do a lot of that type of thing and use my traits lib a lot for it - it can
make it pretty compact. for instance:

harp:~ > cat a.rb
require 'md5'
require 'traits'

module TraitCollection
def initialize(*list)
list = [ list ].flatten
wt.each_with_index do |t,i|
v = list[i] or
raise ArgumentError, "no <#{ t }> given in <#{ list.inspect }>!"
send t, v
end
end
def to_s
(rt.map{|t| [t, send(t)].join '='}. << "hash=#{ hash }").inspect
end
alias inspect to_s
def hash
rt.inject(::MD5::new()){|d,m| d << send(m)}.hexdigest.hex
end
def eql?(other)
rt.inject(true){|b,v| b && (send(v) == other.send(v)) rescue false}
end
def wt; self::class::writer_traits; end
def rt; self::class::reader_traits; end
def self::included other
super; class << other; class << self; alias [] new; end; end
end
end

class Student
include TraitCollection
traits *%w( forename surname birth_dt picture coll_status )
end
class Course
include TraitCollection
traits *%w( aos_code dept_code aos_type full_desc )
end

require 'set'

sset = Set::new
s0, s1 = Student[%w( a b c d e )], Student[%w( f g h i j )]
sset.add s0
42.times{ sset.add s1 }
p sset

cset = Set::new
c0, c1 = Course[%w( a b c d )], Course[%w( e f g h )]
cset.add c0
42.times{ cset.add c1 }
p cset

harp:~ > ruby a.rb
#<Set: {["forename=a", "coll_status=b", "birth_dt=c", "picture=d", "surname=e", "hash=227748192848680293725464448333830731654"], ["forename=f", "coll_status=g", "birth_dt=h", "picture=i", "surname=j", "hash=116663401890982171087417074910604104991"]}>

#<Set: {["dept_code=a", "full_desc=b", "aos_code=c", "aos_type=d", "hash=301716283811389038011477436469853762335"], ["dept_code=e", "full_desc=f", "aos_code=g", "aos_type=h", "hash=41821698252824551223787888325781077799"]}>


cheers.

-a
 

Mauricio Fernández

+ def eql?(other)
+ (self.class)::FIELDS.inject(true) do |b,v|
+ begin
+ b && (self.send(v) == other.send(v))
+ rescue
+ b = false
+ end
+ end
+ end

Just one minor comment:

batsman@tux-chan:~$ cat /tmp/fdsfdsdsd.rb
class Foo
FIELDS = %w[name stuff foo bar]
attr_reader(*FIELDS)

def initialize(name, stuff, foo, bar)
@name, @stuff, @foo, @bar = name, stuff, foo, bar
end

def eql1?(other)
(self.class)::FIELDS.inject(true) do |b,v|
begin
b && (self.send(v) == other.send(v))
rescue
b = false
end
end
end

def eql2?(other)
# maybe add self.class::FIELDS == other.class::FIELDS test plus rescue NameError ?
self.class::FIELDS.each{|m| break false if self.send(m) != other.send(m) } && true
rescue NoMethodError
false
end
end

require 'benchmark'
a = Foo.new("a", "b", "c", "d")
b = Foo.new("e", "b", "c", "d")
c = Foo.new("a", "b", "c", "e")

TIMES = 100000
%w[a b c].each{|x| puts "#{x} = #{eval(x).inspect}"}
Benchmark.bmbm do |x|
%w[a b c].each do |o|
%w[eql1? eql2?].each do |m|
s = "a.#{m}(#{o})"
x.report("#{s}: #{eval(s)}") { eval("TIMES.times{#{s}}") }
end
end
end
batsman@tux-chan:~$ ruby -v /tmp/fdsfdsdsd.rb
ruby 1.8.3 (2005-05-22) [i686-linux]
a = #<Foo:0xb7dc9c98 @name="a", @bar="d", @foo="c", @stuff="b">
b = #<Foo:0xb7dc9c20 @name="e", @bar="d", @foo="c", @stuff="b">
c = #<Foo:0xb7dc9ba8 @name="a", @bar="e", @foo="c", @stuff="b">
Rehearsal -----------------------------------------------------
a.eql1?(a): true 1.520000 0.000000 1.520000 ( 1.658224)
a.eql2?(a): true 0.880000 0.000000 0.880000 ( 0.970675)
a.eql1?(b): false 1.070000 0.000000 1.070000 ( 1.156081)
a.eql2?(b): false 0.360000 0.010000 0.370000 ( 0.410011)
a.eql1?(c): false 1.570000 0.000000 1.570000 ( 1.734145)
a.eql2?(c): false 0.910000 0.000000 0.910000 ( 1.003833)
-------------------------------------------- total: 6.320000sec

user system total real
a.eql1?(a): true 1.510000 0.010000 1.520000 ( 1.679369)
a.eql2?(a): true 0.890000 0.000000 0.890000 ( 0.950153)
a.eql1?(b): false 1.100000 0.010000 1.110000 ( 1.200057)
a.eql2?(b): false 0.360000 0.000000 0.360000 ( 0.383755)
a.eql1?(c): false 1.560000 0.010000 1.570000 ( 1.739114)
a.eql2?(c): false 0.920000 0.000000 0.920000 ( 0.978109)
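
The early exit that makes eql2? faster can also be written with Enumerable#all?, which stops at the first block result that is false. A minimal sketch follows; eql3? is a name made up here and it was not part of the benchmark above, so no timing claim is implied.

```ruby
class Foo
  FIELDS = %w[name stuff foo bar]
  attr_reader(*FIELDS)

  def initialize(name, stuff, foo, bar)
    @name, @stuff, @foo, @bar = name, stuff, foo, bar
  end

  # all? returns as soon as one field differs, so later fields are skipped.
  def eql3?(other)
    self.class::FIELDS.all? { |m| send(m) == other.send(m) }
  rescue NoMethodError
    # other does not respond to one of the fields, so it cannot be equal.
    false
  end
end

a = Foo.new("a", "b", "c", "d")
b = Foo.new("e", "b", "c", "d")

puts a.eql3?(Foo.new("a", "b", "c", "d"))
puts a.eql3?(b)
```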
 

Hugh Sasse


Just one minor comment:

batsman@tux-chan:~$ cat /tmp/fdsfdsdsd.rb
class Foo
FIELDS = %w[name stuff foo bar]
attr_reader(*FIELDS)

That's rather nice :)
[...]
def eql2?(other)
# maybe add self.class::FIELDS == other.class::FIELDS test plus rescue NameError ?

Good point.
self.class::FIELDS.each{|m| break false if self.send(m) != other.send(m) } && true

Nice optimisation! I was having enough of a job keeping my head
around inject to think of that!
[...]
Rehearsal -----------------------------------------------------
a.eql1?(a): true 1.520000 0.000000 1.520000 ( 1.658224)
a.eql2?(a): true 0.880000 0.000000 0.880000 ( 0.970675)

[and similar]

That makes quite a difference. Thank you.
--
Mauricio Fernandez

Hugh

 

Hugh Sasse

This seems to be the canonical way to define compound hashes:

class Student
def hash
[@forename, @surname, @birth_dt, @picture, @coll_status].hash
end
end

That does seem to preserve the properties I need for strings, and is
probably cheaper than MD5sums.
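
Putting the array-based hash together with a matching eql? might look like the sketch below. The protected key helper and the alias for == are additions made here for illustration; they are not from the quoted post.

```ruby
require 'set'

class Student
  def initialize(forename, surname, birth_dt, picture, coll_status)
    @forename, @surname, @birth_dt = forename, surname, birth_dt
    @picture, @coll_status = picture, coll_status
  end

  # Array#hash combines the element hashes, and equal arrays hash equally.
  def hash
    key.hash
  end

  # Must agree with hash: equal keys imply equal hash values.
  def eql?(other)
    other.is_a?(Student) && key.eql?(other.key)
  end
  alias_method :==, :eql?

  protected

  # The values that define a student's identity (hypothetical helper).
  def key
    [@forename, @surname, @birth_dt, @picture, @coll_status]
  end
end

s0 = Student.new('a', 'b', 'c', 'd', 'e')
s1 = Student.new('a', 'b', 'c', 'd', 'e')
set = Set.new([s0, s1])
puts set.size
```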
[...]

Set uses a Hash to store the objects.

That said, I think it would be nice to have something along this in
the stdlib:

class Student
equal_compares :@forename, :@surname, :@birth_dt, :@picture, :@coll_status
end

Above call should result in appropriate definitions of ==, eql? and

I don't know how it could know how to create the different
definitions correctly given a completely open spec as to what the
vars are.
hash. (Something like "ordered_by" would be pretty useful too.)

I think that could be tricky too.
Thank you.
Hugh
 

Robert Klemme

Christian said:
Hugh Sasse said:
This seems to be the canonical way to define compound hashes:

class Student
def hash
[@forename, @surname, @birth_dt, @picture, @coll_status].hash
end
end

That does seem to preserve the properties I need for strings, and is
probably cheaper than MD5sums.
[...]

Set uses a Hash to store the objects.

That said, I think it would be nice to have something along this in
the stdlib:

class Student
equal_compares :@forename, :@surname, :@birth_dt, :@picture,
:@coll_status
end

Above call should result in appropriate definitions of ==, eql? and

I don't know how it could know how to create the different
definitions correctly given a completely open spec as to what the
vars are.

Well, you just list all instance variables that define the
object... if they are the same, the objects are eql?.
I think that could be tricky too.

In the end, [*fields] <=> [*other.fields] does the job.
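
One possible shape for the proposed equal_compares, sketched under the assumption that it is a class-level macro taking instance variable names. Neither the EqualCompares module nor the comparison_key helper exists in the standard library, and this is not the RCR 293 code.

```ruby
# Hypothetical helper module; extend a class with it to get equal_compares.
module EqualCompares
  def equal_compares(*ivars)
    # The list of values that decides equality, in declaration order.
    define_method(:comparison_key) do
      ivars.map { |iv| instance_variable_get(iv) }
    end
    protected :comparison_key

    define_method(:eql?) do |other|
      other.class == self.class && comparison_key.eql?(other.comparison_key)
    end
    alias_method :==, :eql?

    # Derived from the same key, so eql? objects always share a hash.
    define_method(:hash) { comparison_key.hash }
  end
end

class Student
  extend EqualCompares
  equal_compares :@forename, :@surname

  def initialize(forename, surname)
    @forename, @surname = forename, surname
  end
end

puts Student.new('a', 'b') == Student.new('a', 'b')
```

An ordered_by macro could be built the same way by defining <=> on the key arrays and including Comparable.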

You can also steal the code from RCR 293 for a general solution:
http://rcrchive.net/rcr/show/293

Kind regards

robert
 

Hugh Sasse

You can also steal the code from RCR 293 for a general solution:
http://rcrchive.net/rcr/show/293

Hmm, that's interesting, but I don't get:

code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

Shouldn't hash return a Fixnum?

------------------------------------------------------------ Object#hash
obj.hash => fixnum
------------------------------------------------------------------------
Generates a +Fixnum+ hash value for this object. This function must
have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
hash value is used by class +Hash+. Any hash value that exceeds the
capacity of a +Fixnum+ will be truncated before being used.

The function above appears to return a string with numbers separated
by " ^ ".
Kind regards

robert
Thank you,
Hugh
 

Robert Klemme

Hugh said:
Hmm, that's interesting, but I don't get:

code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

Shouldn't hash return a Fixnum?
Definitely!

------------------------------------------------------------ Object#hash
obj.hash => fixnum
------------------------------------------------------------------------
Generates a +Fixnum+ hash value for this object. This function
must have the property that +a.eql?(b)+ implies +a.hash ==
b.hash+. The hash value is used by class +Hash+. Any hash value
that exceeds the capacity of a +Fixnum+ will be truncated
before being used.

The function above appears to return a string with numbers separated
by " ^ ".

Nope. The join appears during code generation and not during evaluation
of the method. You can easily verify this by printing code after it's
completed. :)
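
A small sketch of that check, assuming a two-field class (Person and its fields are made up here; the line that builds the string follows the one quoted above):

```ruby
fields = [:forename, :surname]

# join runs now, while the method source is being assembled as text.
code = String.new("def hash() ")
code << fields.map { |f| "self.#{f}.hash" }.join(" ^ ") << " end\n"
puts code

class Person
  attr_reader :forename, :surname

  def initialize(forename, surname)
    @forename, @surname = forename, surname
  end
end

# Evaluating the text defines a method whose body XORs the field hashes.
Person.class_eval(code)
puts Person.new('a', 'b').hash.class
```

The printed source shows the " ^ " sitting between two method calls, so the generated method returns an Integer and not a String.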

Kind regards

robert
 

Hugh Sasse

Hugh Sasse wrote:
Hmm, that's interesting, but I don't get:

code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

Shouldn't hash return a Fixnum?
Definitely!
[...]
The function above appears to return a string with numbers separated
by " ^ ".

Nope. The join appears during code generation and not during evaluation
of the method. You can easily verify this by printing code after it's
completed. :)

Oh, then it's exclusive or. I'm clearly being as sharp as a sponge
today.

While my brain is behaving like cottage cheese, it's probably not
the time to ask how one might guarantee that you don't stomp on the
hashes of other objects in the system. If you have an even number of
elements, all the same Fixnum, like [1,1,1,1] then they would hash
to 0, as would [2,2], I "think".
irb(main):004:0> [1,1].inject(0) { |a,b| a ^= b.hash}
=> 0
irb(main):005:0> [2,1,1,2].inject(0) { |a,b| a ^= b.hash}
=> 0
irb(main):006:0>
Kind regards

robert
Hugh
 

Robert Klemme

Hugh said:
Hugh Sasse wrote:
Hmm, that's interesting, but I don't get:

code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

Shouldn't hash return a Fixnum?
Definitely!
[...]
The function above appears to return a string with numbers separated
by " ^ ".

Nope. The join appears during code generation and not during
evaluation of the method. You can easily verify this by printing
code after it's completed. :)

Oh, then it's exclusive or. I'm clearly being as sharp as a sponge
today.

I'll have to remember that phrase - I could use it myself from time to
time. :)
While my brain is behaving like cottage cheese, it's probably not
the time to ask how one might guarantee that you don't stomp on the
hashes of other objects in the system. If you have an even number of
elements, all the same Fixnum, like [1,1,1,1] then they would hash
to 0, as would [2,2], I "think".
irb(main):004:0> [1,1].inject(0) { |a,b| a ^= b.hash}
=> 0
irb(main):005:0> [2,1,1,2].inject(0) { |a,b| a ^= b.hash}
=> 0

Btw, the assignment is superfluous. The result of a^b.hash is the next
iteration's a.
irb(main):006:0>

Yes. The algorithm can certainly be improved on. Typically you rather do
something similar to

(a.hash ^ (b.hash << 3) ^ (c.hash << 7)) & MAX_HASH

09:53:59 [~]: irbs
[2,1,1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF} => 2781
[1,2, 1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF} => 1885

i.e. by shifting you make sure that order matters etc.
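
The difference between the two combining schemes can be sketched with plain integers standing in for the per-field hash values. Raw integers are used here on purpose, since the value of Integer#hash may differ between Ruby versions and runs; the method names and MASK are made up for this example.

```ruby
MASK = 0xFFFF_FFFF # keep the result within 32 bits

# Plain XOR: order is lost and repeated values cancel each other out.
def xor_combine(values)
  values.inject(0) { |acc, v| acc ^ v }
end

# Shift the accumulator before mixing in each value so that position matters.
def shift_combine(values)
  values.inject(0) { |acc, v| ((acc << 3) ^ v) & MASK }
end

p xor_combine([2, 1, 1, 2])    # => 0
p xor_combine([1, 2, 1, 2])    # => 0
p shift_combine([2, 1, 1, 2])  # => 1098
p shift_combine([1, 2, 1, 2])  # => 650
```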

Kind regards

robert
 

Hugh Sasse

Hugh said:
While my brain is behaving like cottage cheese, it's probably not
the time to ask how one might guarantee that you don't stomp on the
hashes of other objects in the system. If you have an even number of
elements, all the same Fixnum, like [1,1,1,1] then they would hash
to 0, as would [2,2], I "think".
irb(main):004:0> [1,1].inject(0) { |a,b| a ^= b.hash}
=> 0
irb(main):005:0> [2,1,1,2].inject(0) { |a,b| a ^= b.hash}
=> 0

Btw, the assignment is superfluous. The result of a^b.hash is the next
iteration's a.

Yes, good point, the result of the block....

------------------------------------------------------ Enumerable#inject
enum.inject(initial) {| memo, obj | block } => obj
enum.inject {| memo, obj | block } => obj
------------------------------------------------------------------------
Combines the elements of _enum_ by applying the block to an
accumulator value (_memo_) and each element in turn. At each step,
_memo_ is set to the value returned by the block. The first form
================================================
[...]
irb(main):006:0>

Yes. The algorithm can certainly be improved on. Typically you rather do
something similar to

(a.hash ^ (b.hash << 3) ^ (c.hash << 7)) & MAX_HASH

09:53:59 [~]: irbs
[2,1,1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF}
=> 2781
[1,2, 1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF}

Ah, so that's what MAX_HASH is, I couldn't remember how big Fixnums
were.

I was thinking about something like a linear congruential
random number generator like:
brains hgs 22 %> irb
irb(main):001:0> [2,1,1,2].inject(0) {|a,b| ((a * 31)+b.hash) % 4093082899}
=> 151936
irb(main):002:0> [2,2,1,1].inject(0) {|a,b| ((a * 31)+b.hash) % 4093082899}
=> 153856
irb(main):003:0> [2,1,2,1].inject(0) {|a,b| ((a * 31)+b.hash) % 4093082899}
=> 151996
irb(main):004:0>

like
http://www.cs.bell-labs.com/cm/cs/pearls/markovhash.c

from "Programming Pearls".

with a largish prime grabbed from
http://primes.utm.edu/lists/small/small.html#10

being the biggest I could see less than 0XFFFF_FFFF (4294967295)


------------------------------------------------ Class: Fixnum < Integer
A +Fixnum+ holds +Integer+ values that can be represented in a
native machine word (minus 1 bit). If any operation on a +Fixnum+

Should that be 0x7FFF_FFFF? (2147483647)
According to
http://www.rsok.com/~jrm/printprimes.html
this would seem to be a prime number, so could be used as the
modulus anyway.
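
A minimal sketch of the multiply-and-add scheme with 0x7FFF_FFFF as the modulus. It feeds in the raw integers rather than their hash values, so the numbers differ from the irb session above, and the constant and method names are invented for this example.

```ruby
MULTIPLIER = 31
MODULUS    = 2_147_483_647 # 0x7FFF_FFFF, i.e. 2**31 - 1, which is prime

# Multiply the running value, add the next element, and reduce.
def poly_combine(values)
  values.inject(0) { |acc, v| (acc * MULTIPLIER + v) % MODULUS }
end

p poly_combine([2, 1, 1, 2])  # => 60576
p poly_combine([2, 2, 1, 1])  # => 61536
p poly_combine([2, 1, 2, 1])  # => 60606
```

All three orderings of the same elements give different results, which is the property plain XOR lacks.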
i.e. by shifting you make sure that order matters etc.

Kind regards

robert
Thank you,
Hugh
 
