Hash.merge_add! extension - how does this code look?

G

Greg Hauptmann

[Note: parts of this message were removed to make it a legal post.]

Hi,
I wanted a way to be able to "add" values to a Hash such that it keeps the
old and the new values. For examples example adding an item to a hash for
which the keys are dates. Here's a first cut. Any feedback on coding
style etc?

================================
class Hash
# Merges a hash with only one item into your hash. If there is already a
# hash entry for that key the both existing value(s) & new value are kept
via
# use of an Array
def merge_add!(h)
raise "Parameter passed in not a hash" if !h.instance_of?(Hash)
raise "Can only pass hash with one item to be added" if h.length > 1
h_key = h.to_a[0][0]
h_value = h.to_a[0][1]
if self.has_key?(h_key)
existing_value = self[h_key]
existing_value_as_array = existing_value.instance_of?(Array) ?
existing_value : [existing_value]
new_value = existing_value_as_array << h_value
self[h_key] = new_value
else
h_value.instance_of?(Array) ? self.merge!(h) : self.merge!( {h_key =>
[h_value]} )
end
end
end
================================
require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')

describe "merge_add!" do
before:)each) do
@empty_hash = {}
@hash_with_number = {100 => 1}
@hash_with_array = {100 => [1]}
end

it "should raise an error if there is no object passed to the method" do
lambda{ @empty_hash.merge_add!() }.should raise_error(ArgumentError)
end
it "should raise an error if the object passed is not a Hash" do
lambda{@empty_hash.merge_add!(123)}.should raise_error(RuntimeError,
"Parameter passed in not a hash")
end
it "should raise an error if the number of items in the hash in not 1" do
lambda{@empty_hash.merge_add!({101 => 2, 102 => 3})}.should
raise_error(RuntimeError, "Can only pass hash with one item to be added")
end
it "should create a new Array & wrap new value as an array" do
@empty_hash.merge_add!( {101 => 2} )
@empty_hash[101].should eql([2])
end
it "should create a new Array & use the array passed as the value" do
@empty_hash.merge_add!( {101 => [2]} )
@empty_hash[101].should eql([2])
end
it "should migrate existing value to an Array, then add the new value, if
existing value is NOT an array" do
@hash_with_number.merge_add!( {100 => 2} )
@hash_with_number[100].should eql([1, 2])
end
it "should migrate add the new value to existing, if existing value IS an
array" do
@hash_with_array.merge_add!( {100 => 2} )
@hash_with_array[100].should eql([1, 2])
end

end



================================


tks
 
D

David A. Black

Hi --

Hi,
I wanted a way to be able to "add" values to a Hash such that it keeps the
old and the new values. For examples example adding an item to a hash for
which the keys are dates. Here's a first cut. Any feedback on coding
style etc?

Just a quick (1/2-cup of coffee) point:
================================
class Hash
# Merges a hash with only one item into your hash. If there is already a
# hash entry for that key the both existing value(s) & new value are kept
via
# use of an Array
def merge_add!(h)

That's not a good name for it, because there's no non-bang merge_add
method. The ! in method names only makes sense (with very few,
very specialized exceptions, like the way Jim W. uses them in Builder)
if the methods come in pairs: one regular method and one "dangerous"
method ("dangerous" being Matz's term to describe the meaning of the
!).

The reasoning is as follows.

Bang method names never have, and never will, coincide completely with
receiver-altering methods in Ruby. That's just not a possibility, and
has never been envisioned. (Consider Array#pop, String#replace, and
many, many more.) So adding a bang just because the receiver is being
changed doesn't make sense, and dilutes the convention of the
regular/dangerous pairing.

Putting a bang on an unpaired method name just because the method
seems "dangerous", in a unary way, doesn't make sense either. Rails
does this, and it's about my least favorite thing about the Rails
source code. If ! just means "This method does something dramatic!",
we're back to a very fuzzy, uninformative, impressionistic use of !.

Matz had it right: the regular/dangerous pairing of otherwise
same-named methods is an incredibly creative and useful application of
the bang. It actually tells you something. Every time people complain
because gsub! returns nil if the receiver doesn't change, all I can
think is: "You were warned! There's a bang in the name; that means it's
similar to gsub but it's dangerous. Heads up! Go read the docs! Don't
complain!" :)

In Ruby, unpaired methods always have names that already incorporate
what they do and whether or not they change their receivers. Yes,
there are some glitches (like Array#delete and String#delete being
different as to receiver-changing). But Array#pop isn't Array#pop!
because there's no possible non-destructive "pop"; that would just be
array[-1]. So if you've got an unpaired method name and you feel like
it's not communicating enough, it's best to change the name rather
than add a !, for the reasons above.

Anyway, that's the gist. Sorry to pounce on one thing. I'll have more
coffee and look at the code itself :)


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!
 
T

Trans

Hi,
I wanted a way to be able to "add" values to a Hash such that it keeps th= e
old and the new values. =A0For examples example adding an item to a hash = for
which the keys are dates. =A0 Here's a first cut. =A0Any feedback on codi= ng
style etc?

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D
class Hash
=A0 # Merges a hash with only one item into your hash. =A0If there is alr= eady a
=A0 # hash entry for that key the both existing value(s) & new value are = kept
via
=A0 # use of an Array
=A0 def merge_add!(h)
=A0 =A0 raise "Parameter passed in not a hash" if !h.instance_of?(Hash)

You don't need to do this. It is "anti-ducktyping". h could be a hash-
*like* object and that would be okay.
=A0 =A0 raise "Can only pass hash with one item to be added" if h.length =
1

Why artificially limit it to one entry hashes? Supporting any hash
would make it more flexible.
=A0 =A0 h_key =3D h.to_a[0][0]
=A0 =A0 h_value =3D h.to_a[0][1]
=A0 =A0 if self.has_key?(h_key)
=A0 =A0 =A0 existing_value =3D self[h_key]
=A0 =A0 =A0 existing_value_as_array =3D existing_value.instance_of?(Array= ) ?
existing_value : [existing_value]
=A0 =A0 =A0 new_value =3D existing_value_as_array << h_value
=A0 =A0 =A0 self[h_key] =3D new_value
=A0 =A0 else
=A0 =A0 =A0 h_value.instance_of?(Array) ? self.merge!(h) : self.merge!( {= h_key =3D>
[h_value]} )
=A0 =A0 end
=A0 end
end

The idea is a good one. I think implementation could use some
improvement. For comparison have a look at Facets' Hash#collate.

http://facets.rubyforge.org/doc/api/core/classes/Hash.html#M000112

t
 
S

Stefan Rusterholz

Greg said:
Hi, I wanted a way to be able to "add" values to a Hash such that it
keeps the old and the new values. For examples example adding
an item to a hash for which the keys are dates. Here's a first cut.
Any feedback on coding style etc?

Reading your problem and code I think you missed that Hash#merge
optionally takes a block which seems to solve your issue just fine but
in a more generic manner:

hash = {'foo' => 'bar1'}
hash.merge! 'foo' => 'bar2' do |key, existing, new|
Array(existing) + [new]
end
p hash # -> {"foo"=>["bar1", "bar2"]}

Additionally I would normalize all values to arrays upfront here, makes
dealing with the hash a lot easier.

Regards
Stefan
 
T

Trans

Reading your problem and code I think you missed that Hash#merge
optionally takes a block which seems to solve your issue just fine but
in a more generic manner:

hash =3D {'foo' =3D> 'bar1'}
hash.merge! 'foo' =3D> 'bar2' do |key, existing, new|
=A0 Array(existing) + [new]
end
p hash # -> {"foo"=3D>["bar1", "bar2"]}

Nice. I didn't know it could take a block either. Has that been there
all along?

T.
 
G

Greg Hauptmann

[Note: parts of this message were removed to make it a legal post.]

thanks for highlighting this! Is the quickest way to normalise to Array via
the following?
hashitem.instance_of?(Array) ? hashitem : [hashitem]



Greg said:
Hi, I wanted a way to be able to "add" values to a Hash such that it
keeps the old and the new values. For examples example adding
an item to a hash for which the keys are dates. Here's a first cut.
Any feedback on coding style etc?

Reading your problem and code I think you missed that Hash#merge
optionally takes a block which seems to solve your issue just fine but
in a more generic manner:

hash = {'foo' => 'bar1'}
hash.merge! 'foo' => 'bar2' do |key, existing, new|
Array(existing) + [new]
end
p hash # -> {"foo"=>["bar1", "bar2"]}

Additionally I would normalize all values to arrays upfront here, makes
dealing with the hash a lot easier.

Regards
Stefan
 
S

Stefan Rusterholz

Greg said:
thanks for highlighting this! Is the quickest way to normalise to Array
via
the following?
hashitem.instance_of?(Array) ? hashitem : [hashitem]

If you're building up the hash yourself then I'd use the following
idiom:
collector = Hash.new { |h,k| h[k] = [] }
collector['key1'] << 'value1'
collector['key2'] << 'value1'
collector['key1'] << 'value2'
p collector

Simplifies adding to and reading from the hash. But be aware that with
the hash having a different default value than nil, you can't use 'if
collector[key] then' to test for existence of a key anymore, you have to
use has_key?.

If you're not building the hash up yourself, then yes, I'd use what you
wroten in an each loop and override the existing value.

Regards
Stefan
 
G

Greg Hauptmann

I just noticed the code suggested might have a problem when the
original hash is empty. That is using the ability to pass a block to
'merge!" is good, but when the source hash is empty it does not seem
to trigger use of the code in the block to merge.

-----code----
def merge_add!(h)
self.merge!(h) do |key, existing, new|
new = new.instance_of?(Array) ? new : [new]
existing = existing.instance_of?(Array) ? existing : [existing]
existing + new
end
end
-------------
---in console---
?> {}.merge_add!({1 => 100})
=> {1=>100} <<== DID NOT PUT THE '100' IN AN ARRAY!!
=> {1=>[300, 100]}
--------------

regards
Greg

Greg said:
thanks for highlighting this! Is the quickest way to normalise to Array
via
the following?
hashitem.instance_of?(Array) ? hashitem : [hashitem]

If you're building up the hash yourself then I'd use the following
idiom:
collector = Hash.new { |h,k| h[k] = [] }
collector['key1'] << 'value1'
collector['key2'] << 'value1'
collector['key1'] << 'value2'
p collector

Simplifies adding to and reading from the hash. But be aware that with
the hash having a different default value than nil, you can't use 'if
collector[key] then' to test for existence of a key anymore, you have to
use has_key?.

If you're not building the hash up yourself, then yes, I'd use what you
wroten in an each loop and override the existing value.

Regards
Stefan
 
G

Greg Hauptmann

thanks all for feedback to date - here's my latest take

------------------------------------
def merge_add!(h)
raise "Parameter passed in not a hash" if !h.instance_of?(Hash)

# normalise input hash to contain arrays
h.each { |key, value| if !value.instance_of?(Array) then h[key] =
[value] end }

self.merge!(h) do |key, existing, new|
existing = existing.instance_of?(Array) ? existing : [existing]
existing + new
end
end
------------------------------------
Macintosh-2:myequity greg$ cat spec/lib/hash_extensions_spec.rb
require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')

describe "merge_add!" do
before:)each) do
@empty_hash = {}
@hash_with_number = {100 => 1}
@hash_with_array = {100 => [1]}
end

it "should raise an error if there is no object passed to the method" do
lambda{ @empty_hash.merge_add!() }.should raise_error(ArgumentError)
end
it "should raise an error if the object passed is not a Hash" do
lambda{@empty_hash.merge_add!(123)}.should
raise_error(RuntimeError, "Parameter passed in not a hash")
end
it "should raise an error if the number of items in the hash in not 1" do
lambda{@empty_hash.merge_add!({101 => 2, 102 => 3})}.should_not raise_error
end
it "should create a new Array & wrap new value as an array" do
@empty_hash.merge_add!( {101 => 2} )
@empty_hash[101].should eql([2])
end
it "should create a new Array & use the array passed as the value" do
@empty_hash.merge_add!( {101 => [2]} )
@empty_hash[101].should eql([2])
end
it "should migrate existing value to an Array, then add the new
value, if existing value is NOT an array" do
@hash_with_number.merge_add!( {100 => 2} )
@hash_with_number[100].should eql([1, 2])
end
it "should migrate add the new value to existing, if existing value
IS an array" do
@hash_with_array.merge_add!( {100 => 2} )
@hash_with_array[100].should eql([1, 2])
end

end



I just noticed the code suggested might have a problem when the
original hash is empty. That is using the ability to pass a block to
'merge!" is good, but when the source hash is empty it does not seem
to trigger use of the code in the block to merge.

-----code----
def merge_add!(h)
self.merge!(h) do |key, existing, new|
new = new.instance_of?(Array) ? new : [new]
existing = existing.instance_of?(Array) ? existing : [existing]
existing + new
end
end
-------------
---in console---
?> {}.merge_add!({1 => 100})
=> {1=>100} <<== DID NOT PUT THE '100' IN AN ARRAY!!
=> {1=>[300, 100]}
--------------

regards
Greg

Greg said:
thanks for highlighting this! Is the quickest way to normalise to Array
via
the following?
hashitem.instance_of?(Array) ? hashitem : [hashitem]

If you're building up the hash yourself then I'd use the following
idiom:
collector = Hash.new { |h,k| h[k] = [] }
collector['key1'] << 'value1'
collector['key2'] << 'value1'
collector['key1'] << 'value2'
p collector

Simplifies adding to and reading from the hash. But be aware that with
the hash having a different default value than nil, you can't use 'if
collector[key] then' to test for existence of a key anymore, you have to
use has_key?.

If you're not building the hash up yourself, then yes, I'd use what you
wroten in an each loop and override the existing value.

Regards
Stefan
 
T

Trans

thanks all for feedback to date - here's my latest take

Again, this is considered poor form. The reason is, if it isn't a Hash
it will blow up in the next couple of statements anyway, but more
importantly something other a Hash might emulate one. And there's no
reason not to allow it to work.
=A0 =A0 # normalise input hash to contain arrays
=A0 =A0 h.each { |key, value| if !value.instance_of?(Array) then h[key] = =3D
[value] end }

=A0 =A0 self.merge!(h) do |key, existing, new|
=A0 =A0 =A0 existing =3D existing.instance_of?(Array) ? existing : [exist= ing]
=A0 =A0 =A0 existing + new
=A0 =A0 end
=A0 end


def merge_add!(h)
q =3D {}
(keys | h.keys).each do |k|
q[k] =3D Array(self[k]) + Array(h[k])
end
replace(q)
end


trans.
 
G

Greg Hauptmann

interesting - is it really Ruby approach to let things 'break' during
a method so to speak as opposed to defensive coding and doing some
form of validation at the beginning of the method? I noted when I run
my spec's with your code i get the error

"<NoMethodError: undefined method `keys' for 123:Fixnum>"

which is quite a bit more cryptic than my

"RuntimeError - "Parameter passed in not a hash"

Wouldn't it be harder to debug code if one were getting the former
error message rather than the later?


thanks all for feedback to date - here's my latest take

Again, this is considered poor form. The reason is, if it isn't a Hash
it will blow up in the next couple of statements anyway, but more
importantly something other a Hash might emulate one. And there's no
reason not to allow it to work.
# normalise input hash to contain arrays
h.each { |key, value| if !value.instance_of?(Array) then h[key] =
[value] end }

self.merge!(h) do |key, existing, new|
existing = existing.instance_of?(Array) ? existing : [existing]
existing + new
end
end


def merge_add!(h)
q = {}
(keys | h.keys).each do |k|
q[k] = Array(self[k]) + Array(h[k])
end
replace(q)
end


trans.
 
B

Brian Candler

Greg said:
interesting - is it really Ruby approach to let things 'break' during
a method so to speak

Generally speaking, yes it is.

Depending on your implementation, you probably only make use of a one or
two Hash methods - says 'keys' and '[]'

By not testing the class, you allow other objects which also implement
these methods to work, which typically makes your code more useful.

If you wanted to be really anal, you could do:

raise "Unsuitable object (doesn't implement 'keys')" unless
hash.respond_to?:)keys)
raise "Unsuitable object (doesn't implement '[]')" unless
hash.respond_to?:)[])

However the error message you would then generate would be almost
identical to the default Ruby one:
"<NoMethodError: undefined method `keys' for 123:Fixnum>"

Now, in this particular case, I'd say an ideal API would only depend on
'each', and then it could be used for any enumerable object which yields
two values. For example:

class Hash
def merge_add!(h)
h.each do |k,v|
if has_key?(k)
self[k] = Array[self[k]] + Array[v]
else
self[k] = v
end
end
end
end

tmp = {"one"=>1, "two"=>2}
tmp.merge_add!([["three",3], ["one",999]])
p tmp

I should add that I don't like the fact that the resulting hash will
have a mixture of array and non-array values, as it makes the values
harder to use later. If it were me I'd want all the values to be arrays.

A similar case I have come across before is a multi-valued Hash#invert,
where different keys map to the same value.

# input: {"foo"=>1, "bar"=>1, "baz"=>2}

In that case, what I would want is

# output: {1=>["foo","bar"], 2=>["baz"]}

rather than

# output: {1=>["foo","bar"], 2=>"baz"}
 
T

Trans

Greg said:
interesting - is it really Ruby approach to let things 'break' during
a method so to speak

Generally speaking, yes it is.

Depending on your implementation, you probably only make use of a one or
two Hash methods - says 'keys' and '[]'

By not testing the class, you allow other objects which also implement
these methods to work, which typically makes your code more useful.

If you wanted to be really anal, you could do:

=A0 raise "Unsuitable object (doesn't implement 'keys')" unless
=A0 =A0 =A0 =A0 hash.respond_to?:)keys)
=A0 raise "Unsuitable object (doesn't implement '[]')" unless
=A0 =A0 =A0 =A0 hash.respond_to?:)[])

However the error message you would then generate would be almost
identical to the default Ruby one:
=A0 =A0 =A0"<NoMethodError: undefined method `keys' for 123:Fixnum>"

Nice explanation.
Now, in this particular case, I'd say an ideal API would only depend on
'each', and then it could be used for any enumerable object which yields
two values. For example:

I agree. I should have done this myself, but I started playing golf ;)
=A0 class Hash
=A0 =A0 def merge_add!(h)
=A0 =A0 =A0 h.each do |k,v|
=A0 =A0 =A0 =A0 if has_key?(k)
=A0 =A0 =A0 =A0 =A0 self[k] =3D Array[self[k]] + Array[v]
=A0 =A0 =A0 =A0 else
=A0 =A0 =A0 =A0 =A0 self[k] =3D v

self[k] =3D Array(v)
=A0 =A0 =A0 =A0 end
=A0 =A0 =A0 end
=A0 =A0 end
=A0 end

=A0 tmp =3D {"one"=3D>1, "two"=3D>2}
=A0 tmp.merge_add!([["three",3], ["one",999]])
=A0 p tmp

I should add that I don't like the fact that the resulting hash will
have a mixture of array and non-array values, as it makes the values
harder to use later. If it were me I'd want all the values to be arrays.

I think that was the intent. The above noted change might fix (?).
A similar case I have come across before is a multi-valued Hash#invert,
where different keys map to the same value.

=A0 # input: {"foo"=3D>1, "bar"=3D>1, "baz"=3D>2}

In that case, what I would want is

=A0 # output: {1=3D>["foo","bar"], 2=3D>["baz"]}

rather than

=A0 # output: {1=3D>["foo","bar"], 2=3D>"baz"}
{"foo"=3D>1, "bar"=3D>1, "baz"=3D>2}.invert
=3D> {1=3D>"bar", 2=3D>"baz"}

t
 
D

David A. Black

Hi --

interesting - is it really Ruby approach to let things 'break' during
a method so to speak as opposed to defensive coding and doing some
form of validation at the beginning of the method? I noted when I run
my spec's with your code i get the error

"<NoMethodError: undefined method `keys' for 123:Fixnum>"

which is quite a bit more cryptic than my

"RuntimeError - "Parameter passed in not a hash"

Wouldn't it be harder to debug code if one were getting the former
error message rather than the later?

Not really. The first message actually gives you more information
about what happened, which is likely to help you pinpoint it better.

You'll see a fair amount of class-checking in the Ruby source code,
but mostly in the C code, which after all is a C program and not a
Ruby program, and which has the job of bootstrapping the actual Ruby
environment into being. In Ruby itself, it's relatively unusual to
check class membership directly (or at least it's unusual to have any
compelling reason to).


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!
 
D

David A. Black

Hi --

Greg said:
interesting - is it really Ruby approach to let things 'break' during
a method so to speak

Generally speaking, yes it is.

Depending on your implementation, you probably only make use of a one or
two Hash methods - says 'keys' and '[]'

By not testing the class, you allow other objects which also implement
these methods to work, which typically makes your code more useful.

If you wanted to be really anal, you could do:

raise "Unsuitable object (doesn't implement 'keys')" unless
hash.respond_to?:)keys)
raise "Unsuitable object (doesn't implement '[]')" unless
hash.respond_to?:)[])

However the error message you would then generate would be almost
identical to the default Ruby one:
"<NoMethodError: undefined method `keys' for 123:Fixnum>"

Now, in this particular case, I'd say an ideal API would only depend on
'each', and then it could be used for any enumerable object which yields
two values. For example:

class Hash
def merge_add!(h)
h.each do |k,v|
if has_key?(k)
self[k] = Array[self[k]] + Array[v]
else
self[k] = v
end
end
end
end

And to fix the name problem (the rogue "!"), you could also do:

class Hash
def merge_add(h)
dup.merge_add!(h)
end
end

which would give the ! a reason to be there and follows the semantics
of merge/merge!.


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!
 
B

Brian Candler

self[k] = Array[self[k]] + Array[v]

Oops, I meant to write:

self[k] = Array(self[k]) + Array(v)

Otherwise, if the value is already an array, it will be wrapped in
another one.

This does highlight a limitation of this API thought. What if the values
themselves are arrays?

h = { "foo" => [1,2] }
h.merge_add!({ "foo" => [3,4] })
# do you want h = { "foo" => [[1,2], [3,4]] } ?

In that case, it would be better to insist that the hash values are
arrays in the first place, as shown by Stefan Rusterholz's collector
example earlier.

Also, are you happy for duplicate values to be inserted?

h = { "foo" => "bar" }
h.merge_add!({ "foo" => "bar" })
# h = { "foo" => ["bar", "bar"] } ?
 
G

Greg Hauptmann

thanks for the great feedback guys - will take all this on board

self[k] = Array[self[k]] + Array[v]

Oops, I meant to write:

self[k] = Array(self[k]) + Array(v)

Otherwise, if the value is already an array, it will be wrapped in
another one.

This does highlight a limitation of this API thought. What if the values
themselves are arrays?

h = { "foo" => [1,2] }
h.merge_add!({ "foo" => [3,4] })
# do you want h = { "foo" => [[1,2], [3,4]] } ?

In that case, it would be better to insist that the hash values are
arrays in the first place, as shown by Stefan Rusterholz's collector
example earlier.

Also, are you happy for duplicate values to be inserted?

h = { "foo" => "bar" }
h.merge_add!({ "foo" => "bar" })
# h = { "foo" => ["bar", "bar"] } ?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,197
Messages
2,571,040
Members
47,634
Latest member
RonnyBoelk

Latest Threads

Top