Modifying a hash key

Alex Shulgin · Sep 16, 2007

Hi,

I'm totally new to Ruby, so please forgive me if I'm asking a trivial
question.

I've written this simple program to demonstrate my first major
astonishment with Ruby:

$ cat test-hash.rb
hash={}
key=[1]
hash[key]=1
p hash
key << 2
hash[key]=2
p hash
puts
$ ruby ./test-hash.rb
{[1]=>1}
{[1, 2]=>2, [1, 2]=>1}

$

This suggests to me that the keys are actually stored in the hash by
reference, not by value as I would expect from my prior experience
with other languages.

I know I can use key.clone() to overcome this, but the question is:
what is the reason for hash to store its keys by reference?

My Ruby and system versions are as follows:

$ ruby --version
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
$ uname -a
Linux zrbite 2.6.21-2-686 #1 SMP Wed Jul 11 03:53:02 UTC 2007 i686 GNU/Linux

Cheers,
Alex

Thibaut Barrère · Sep 16, 2007

Hi,

I'd also be interested in renaming one or more hash keys. My use case:
ActiveWarehouse-ETL (http://activewarehouse.rubyforge.org/) is an ETL
tool handling rows of data as Hash, and it's very common to have to
rename a field in that case. So far here's what I do:

# Rename a bunch of fields in place.
def rename_fields!(row, fields_to_rename, fields_new_names)
  # raise, not throw: throw is meant for non-local exits paired with catch
  raise ArgumentError, "Array size mismatch" unless fields_new_names.size == fields_to_rename.size
  # Pair each old name with its new name, then move the value across.
  fields_to_rename.zip(fields_new_names).each do |old_name, new_name|
    row[new_name] = row.delete(old_name)
  end
  row
end

Does anyone know a built-in way of achieving something similar?

best

Thibaut
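
For comparison, here is a minimal sketch of the same renaming driven by a single old-name-to-new-name mapping and built on Hash#delete, which returns the value it removes. The helper name rename_keys! and the sample row are made up for illustration; this is not an ActiveWarehouse-ETL API. Pairs are applied one after another, so a mapping that swaps two names would need extra care.

```ruby
# Sketch: rename keys in place according to a mapping of old name => new name.
# rename_keys! is a hypothetical helper, not a built-in method.
def rename_keys!(row, mapping)
  mapping.each do |old_name, new_name|
    next unless row.key?(old_name)        # skip fields the row does not have
    row[new_name] = row.delete(old_name)  # delete returns the removed value
  end
  row
end

row = { "fname" => "Ada", "lname" => "Lovelace", "id" => 7 }
rename_keys!(row, "fname" => "first_name", "lname" => "last_name")
p row
```

Ruby 2.5 and later also provide Hash#transform_keys, which builds a new hash by applying a block to every key.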

Alex Shulgin · Sep 17, 2007

Hi,

In message "Re: Modifying a hash key"

|This suggests me that the keys are actually stored in the hash by
|reference, not by value as I would expect from my prior experience
|with other languages.

Hash stores its values, as its name suggests, by hash values computed from
the keys. So, if you modify the key, and subsequently the hash value of
the key, it screws up. As a general rule, don't modify the keys, or
if you really need to modify the key for some unknown reason, call
the rehash method on the hash.
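
To make that advice concrete, here is a small sketch using the same array key as the test program at the top of the thread. What a lookup under the mutated key returns before rehash depends on interpreter internals, so the comment on that line is deliberately hedged.

```ruby
# Sketch: what happens to lookups when an Array key is mutated in place,
# and how Hash#rehash repairs the index.
hash = {}
key  = [1]
hash[key] = 1

key << 2          # mutates the stored key, so its hash value changes

p hash[[1]]       # nil: the stored key no longer equals [1]
p hash[[1, 2]]    # usually nil: the entry is still filed under the old hash value

hash.rehash       # recompute the index from the keys' current contents
p hash[[1, 2]]    # 1
```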

So the real answer to my question (what is the reason for hash to
store its keys by reference?) might be: trading a rule of thumb for
speed and memory? If no one is ever going to modify the hash key,
there is no reason to copy it...

OK, I think I got it.

If anyone wonders how I came to be modifying hash keys: this is my
second program in Ruby, and I was coding a Ruby version of the
speech generator example from Kernighan & Pike's "The Practice of
Programming". Here is the code snippet:

state={}
prefix=[]
while $stdin.eof?
w=... # read a word
state[prefix] << w
...
prefix << w # <-- Bang! The hash key is changed...

end

Cheers,
Alex

Robert Klemme · Sep 17, 2007

2007/9/17, Alex Shulgin said:
So the real answer to my question (what is the reason for hash to
store its keys by reference?) might be: trading a rule of thumb for
speed and memory?

No, the reason is that you can only store by reference in Ruby (there
are some internal optimizations for Fixnum and the like but the code
still basically behaves the same).

If no one is ever going to modify the hash key,
there is no reason to copy it...

There is also the issue that a key's hash value will likely change if
you change a Hash key. So it is *never* a good idea to modify a Hash
key. Note though that there is an exception: Strings are cloned if
they are not frozen to avoid nasty effects.
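
A short sketch of both points, assuming Ruby 1.9 or later so that Hash#keys returns the keys in insertion order. The variable names are illustrative only.

```ruby
# Sketch: an Array key is stored as the very same object,
# while an unfrozen String key is copied and the copy is frozen.
array_key  = [1]
string_key = "abc".dup            # dup guarantees an unfrozen String

hash = {}
hash[array_key]  = :array
hash[string_key] = :string

stored_array, stored_string = hash.keys

p stored_array.equal?(array_key)    # true:  same object, no copy was made
p stored_string.equal?(string_key)  # false: the hash holds its own copy
p stored_string.frozen?             # true:  the copy cannot be modified

string_key << "def"                 # changes only the caller's String
p hash["abc"]                       # :string, the entry is still reachable
```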

OK, I think I got it.

If anyone wonders how I came to be modifying hash keys: this is my
second program in Ruby, and I was coding a Ruby version of the
speech generator example from Kernighan & Pike's "The Practice of
Programming". Here is the code snippet:

state={}
prefix=[]
while $stdin.eof?

Are you serious about the line above? I'd rather have expected "until" there.

I'd do this, note all the freezing in order to make errors with
changing keys obvious.

state = Hash.new {|h,k| h[k]=[]}
prefix = [].freeze

ARGF.each do |line|
  line.scan(/\w+/) do |word|
    state[prefix] << word.freeze
    (prefix += [word]).freeze
    # or: prefix = (prefix.dup << word).freeze
  end
end
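
As a rough illustration of what the freezing buys, the sketch below tries an in-place append on a frozen key and then extends the prefix the non-mutating way. The exception class differs between Ruby versions, so the example rescues StandardError rather than naming one.

```ruby
# Sketch: a frozen key turns accidental in-place modification into an error.
state  = Hash.new { |h, k| h[k] = [] }
prefix = [].freeze

state[prefix] << "hello"

begin
  prefix << "hello"          # in-place append on a frozen Array
rescue StandardError => e    # FrozenError on current Rubies, other classes on old ones
  puts "refused: #{e.class}"
end

prefix = (prefix + ["hello"]).freeze  # builds a new Array; the stored key is untouched
p state.keys                          # the only key so far is still []
```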

Btw, there's probably a more efficient way of storing this if you
introduce a specialized class for prefix chaining. Probably like
this:

Prefix = Struct.new :word, :previous

w=... # read a word
state[prefix] << w
...
prefix << w # <-- Bang! The hash key is changed...

You can #dup the prefix or use +:
prefix += [w]

This will implicitly create a new Array.

Kind regards

robert
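
The Struct idea is only hinted at in the post above, so the following is a guess at how such prefix chaining might look, not a reconstruction of the intended design. It relies on Struct instances with equal members being equal as hash keys; using nil for the empty prefix and the sample words are assumptions made for the example.

```ruby
# Sketch: a linked prefix chain. Each Prefix holds one word and a reference
# to the prefix it extends, so extending never copies or mutates earlier keys.
Prefix = Struct.new(:word, :previous)

state  = Hash.new { |h, k| h[k] = [] }
prefix = nil                      # nil stands for the empty prefix (an assumption)

%w[the quick brown].each do |word|
  state[prefix] << word
  prefix = Prefix.new(word, prefix).freeze
end

# A freshly built chain with the same words finds the same entry.
lookup = Prefix.new("quick", Prefix.new("the", nil))
p state[lookup]                   # ["brown"]
```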

Alex Shulgin · Sep 17, 2007

No, the reason is that you can only store by reference in Ruby (there
are some internal optimizations for Fixnum and the like but the code
still basically behaves the same).

Yes, but the hash implementation could make a copy of the key when it
inserts new elements. Of course, this would slow down the code in a
significant share of programs, so there is the trade-off I talked
about.

state={}
prefix=[]
while $stdin.eof?

Are you serious about the line above? I'd rather have expected "until" there.

Sorry, of course not. It was pulled off the top of my head, since no
real code was at hand (I've posted that from work, and the Ruby code
is something I keep at home (-: ).

I'd do this, note all the freezing in order to make errors with
changing keys obvious.

state = Hash.new {|h,k| h[k]=[]}
prefix = [].freeze

ARGF.each do |line|
line.scan /\w+/ do |word|
state[prefix] << word.freeze
(prefix += [word]).freeze
# or: prefix = (prefix.dup << word).freeze
end
end

Uh-oh... this freeze stuff seems overly complicated to me.

Btw, there's probably a more efficient way of storing this if you
introduce a specialized class for prefix chaining. Probably like
this:

Prefix = Struct.new :word, :previous

w=... # read a word
state[prefix] << w
...
prefix << w # <-- Bang! The hash key is changed...

You can #dup the prefix or use +:
prefix += [w]

My real code is as follows:

require 'scanf'

NPREFIX = 2

$nwords = ARGV[0] ? ARGV[0].to_i() : 1000

#
# acquire knowledge
#
state = {}
prefix = []
while not $stdin.eof? do
  # w, = scanf("%s")
  words = $stdin.gets().scan(/[^\s]+/)
  words.each do |w|
    suf = state[prefix]
    if not suf
      suf = state[prefix.clone()] = []
    end
    suf << w
    if prefix.length >= NPREFIX
      prefix.shift
    end
    prefix << w
  end
end
state[prefix] = []

#
# generate pseudo-text
#
prefix = []
count = 0
while count < $nwords do
  suf = state[prefix]
  if suf.empty?
    break
  end
  w = suf[rand(suf.length)]
  print w + " "
  if prefix.length >= NPREFIX
    prefix.shift
  end
  prefix << w
  count += 1
end

puts

Maybe the eye of an experienced programmer could catch some more odd
places in my code? See, I'm just a Ruby newbie... Please do not
waste more of your time than really necessary on this.

Cheers,
Alex
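
One way the table-building half of the program above might avoid mutating keys altogether is sketched here. The helper build_state and its nprefix parameter are hypothetical names that mirror NPREFIX; the sketch leaves out the end-of-input marker and the text generation step.

```ruby
# Sketch: build the prefix => suffixes table without mutating a stored key.
# build_state is a hypothetical helper; nprefix plays the role of NPREFIX.
def build_state(words, nprefix = 2)
  state  = Hash.new { |h, k| h[k] = [] }
  prefix = []
  words.each do |w|
    state[prefix] << w
    # Array#+ returns a new Array and last(nprefix) keeps only the newest
    # words, so the Array already stored as a key is never touched.
    prefix = (prefix + [w]).last(nprefix)
  end
  state
end

state = build_state(%w[a b a b c])
p state[["a", "b"]]   # ["a", "c"]
p state[["b", "a"]]   # ["b"]
```

Rebinding prefix to a fresh Array on every word costs a small allocation, but it removes the need for the clone call and for remembering which Array is currently in use as a key.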

Robert Klemme · Sep 17, 2007

Yes, but the hash implementation could make a copy of the key when it
inserts new elements. Of course, this would slow down the code in a
significant share of programs, so there is the trade-off I talked
about.

That's the exact reason why this optimization was chosen for Strings only.

I'd do this, note all the freezing in order to make errors with
changing keys obvious.

state = Hash.new {|h,k| h[k]=[]}
prefix = [].freeze

ARGF.each do |line|
line.scan /\w+/ do |word|
state[prefix] << word.freeze
(prefix += [word]).freeze
# or: prefix = (prefix.dup << word).freeze
end
end

Uh-oh... this freeze stuff seems overly complicated to me.

Well, it's not necessary - I just put it there in order to find bugs.

My real code is as follows:

[snip]

You find my code "overly complicated"? Amazing...

Cheers

robert

Alex Shulgin · Sep 18, 2007

I'd do this, note all the freezing in order to make errors with
changing keys obvious.
state = Hash.new {|h,k| h[k]=[]}
prefix = [].freeze
ARGF.each do |line|
line.scan /\w+/ do |word|
state[prefix] << word.freeze
(prefix += [word]).freeze
# or: prefix = (prefix.dup << word).freeze
end
end

Uh-oh... this freeze stuff seems overly complicated to me.

Well, it's not necessary - I just put it there in order to find bugs.
[snip]

You find my code "overly complicated"? Amazing...

Oh, sorry, I didn't want to hurt anyone...

First of all, your code and mine do different things, and most
importantly that freeze stuff _really_ scared me. I thought it was
some kind of garbage-collection voodoo. ;-)

Now I see it may be safely removed after debugging the code. This way
your code looks much better, thanks!

Alex

Robert Klemme · Sep 18, 2007

I'd do this, note all the freezing in order to make errors with
changing keys obvious.
state = Hash.new {|h,k| h[k]=[]}
prefix = [].freeze
ARGF.each do |line|
line.scan /\w+/ do |word|
state[prefix] << word.freeze
(prefix += [word]).freeze
# or: prefix = (prefix.dup << word).freeze
end
end
Uh-oh... this freeze stuff seems overly complicated to me.

Well, it's not necessary - I just put it there in order to find bugs.
[snip]
You find my code "overly complicated"? Amazing...

Oh, sorry, I didn't want to hurt anyone...

Not hurt, just astonished.

First of all, your code and mine do different things

Yes and no: your code does more but as far as I can see the gathering
does basically the same in different ways.

, and most
importantly that freeze stuff _really_ scared me. I thought it was
some kind of garbage-collection voodoo. ;-)

No, it just prevents changing an instance. #freeze has nothing to do
with GC (unless you count not being able to overwrite a reference to an
instance with a reference to nil).

Now I see it may be safely removed after debugging the code. This way
your code looks much better, thanks!

Cheers

robert
