Unexpected problem: hash[key] << value

Joey Zhou

# ruby 1.9.2p180 (2011-02-18) [i386-mingw32]

get_page_hash = {}
get_page_hash.default = []

File.foreach("page.txt") do |line|
  word, page = line.chomp.split(':')
  get_page_hash[word] << page # the problem is here
end

p get_page_hash['Aword'] # => ["1", "2", "3", "4", "5"]
p get_page_hash['Bword'] # => ["1", "2", "3", "4", "5"]
p get_page_hash.default # => ["1", "2", "3", "4", "5"]

__END__

content of page.txt:
Aword:1
Bword:2
Cword:3
Aword:4
Dword:5


Simple program, clear purpose. I don't know why get_page_hash.default
becomes ["1", "2", "3", "4", "5"]; it seems ridiculous.

Only if I change that line to:

get_page_hash[word] += [page]

I get what I want:

p get_page_hash['Aword'] # => ["1", "4"]
p get_page_hash['Bword'] # => ["2"]
p get_page_hash.default # => []

I thought using "<<" would be the intuitive choice, but the result is
unexpected. What's wrong with it?

Thank you!

Joey
 
Mark Beek

I just stumbled across this surprising behavior myself. It's the first
counter-intuitive mechanism I have come across in my short sweet
experience with Ruby.

Check this thread for an elaborate discussion (in English) of this
behavior:

http://www.ruby-forum.com/topic/134424#new

Here's my take:

Before we look at your case, let's look at a case that actually works as
you'd expect: initializing a hash with a Fixnum:

Code
h = Hash.new(0)
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"
h['key1'] += 1
puts "after updating key1"
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"

Result:
h['key1']: 0
h['key2']: 0
after updating key1
h['key1']: 1
h['key2']: 0

Perfect! Mighty handy for word count programs and all sorts of other use
cases.

Which would lead you to expect the following behavior when you
initialize a hash with an empty array, then append:

Code
h = Hash.new([])
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"
h['key1'] << 1
puts "after updating key1"
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"

Result
h['key1']: []
h['key2']: []
after updating key1
h['key1']: [1]
h['key2']: [] #<-- what you'd expect, but NOT what you get

The actual result is the following:

h['key1']: []
h['key2']: []
after updating key1
h['key1']: [1]
h['key2']: [1]
... and so on

The problem is that when you initialize a hash with a mutable default
value, all of the defaults are actually references to THE SAME OBJECT.
So when you append to the default array in one hash value, you're
actually changing them all. Witness:

puts "#{h['key1'].object_id}"
puts "#{h['key2'].object_id}"
puts "#{h['key3'].object_id}"

Result:
116528
116528
116528

By contrast, when you update a value with the += construction rather
than <<, you're actually creating a new array object for that value. So
that particular one is no longer referring to the default value.
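To make the difference concrete, here is a minimal sketch (the variable names are illustrative and not from the original post). It suggests that << mutates the shared default without ever storing a key, while += assigns a newly built array to the key:

```ruby
shared = Hash.new([])

shared['a'] << 1     # mutates the default array itself; no key is stored
p shared.keys        # => []
p shared['b']        # => [1]  (every missing key now shows the mutated default)
p shared.default     # => [1]

fresh = Hash.new([])

fresh['a'] += [1]    # same as fresh['a'] = fresh['a'] + [1]; Array#+ builds a new array
p fresh.keys         # => ["a"]
p fresh['b']         # => []
p fresh.default      # => []
```

This also explains the output in the original question: the hash there never gained any keys, so every lookup simply returned the one default array.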

The thread referred to above mentions other ways to get what you'd
expect with a default empty array. Still, I gotta admit that I simply
don't understand why Hash.new([]) works the way it does. Who would want
to create a Hash table where changing a single value can potentially
change all other values, past, present, and to come? Talk about side
effects gone wild!

If anyone can explain the rationale for this behavior, I'd really
appreciate it. I'm probably just missing something.
 
Brian Candler

Mark Beek wrote in post #986380:
Which would lead you to expect the following behavior when you
initialize a hash with an empty array, then append:

Code
h = Hash.new([])
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"
h['key1'] << 1
puts "after updating key1"
puts "h['key1']: #{h['key1']}"
puts "h['key2']: #{h['key2']}"

Result
h['key1']: []
h['key2']: []
after updating key1
h['key1']: [1]
h['key2']: [] #<-- what you'd expect, but NOT what you get

To get that behaviour, you need the Hash to create a *new* empty array
for every unknown element. What I do is:

h = Hash.new { |o,k| o[k] = [] }

The problem is that when you initialize a hash with a mutable default
value, all of the defaults are actually references to THE SAME OBJECT. ...
If anyone can explain the rationale for this behavior, I'd really
appreciate it. I'm probably just missing something.

The question is, how else could it work in the general case?

Perhaps you pass a prototype object, and the Hash constructor would call
dup on that object every time it needs a new distinct instance? No,
that doesn't work, because .dup is only a shallow copy. Check out:

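a = [[1,2],[3,4]]
b = a.dup    # copies the outer array only; the inner arrays are shared
b[0] << 3
p a          # => [[1, 2, 3], [3, 4]]
p b          # => [[1, 2, 3], [3, 4]]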

Perhaps you could pass a Class, and then Hash would call your class's
new method every time it wanted an instance? Sure, you could pass Array
in this case, but it's quite restrictive. And the simple case of
Hash.new(0) wouldn't work.

So to work in the general case you have to give it some code to execute
to create a new object every time one is needed - a factory block.
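As a rough sketch of the factory-block approach applied to the original question, the following uses an inline list in place of page.txt (the `lines` and `pages` names are made up for this example):

```ruby
# Sample lines standing in for page.txt from the original question.
lines = ["Aword:1", "Bword:2", "Cword:3", "Aword:4", "Dword:5"]

# The block runs once per missing key and stores a new array under that key.
pages = Hash.new { |hash, key| hash[key] = [] }

lines.each do |line|
  word, page = line.chomp.split(':')
  pages[word] << page
end

p pages['Aword']   # => ["1", "4"]
p pages['Bword']   # => ["2"]
p pages.default    # => nil
```

One thing to keep in mind with this form: because the block assigns, merely looking up an unknown key also stores an empty array under it.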

The same applies with arrays: compare

a = Array.new(5, [])
b = Array.new(5) { [] }
puts a.map { |x| x.object_id }
puts b.map { |x| x.object_id }
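A small follow-up sketch of the same comparison, this time appending to the first element so the sharing becomes visible (the :x symbol is just a placeholder value):

```ruby
a = Array.new(5, [])      # five references to one array
b = Array.new(5) { [] }   # the block is called five times, giving five arrays

a[0] << :x
b[0] << :x

p a   # => [[:x], [:x], [:x], [:x], [:x]]
p b   # => [[:x], [], [], [], []]

p a.map(&:object_id).uniq.size   # => 1
p b.map(&:object_id).uniq.size   # => 5
```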

Regards,

Brian.
 
Robert Klemme

By contrast, when you update a value with the += construction rather
than <<, you're actually creating a new array object for that value. So
that particular one is no longer referring to the default value.

Well, in this case actually the better idiom is this:

h = Hash.new {|h,k| h[k] = []}
...

h[key] << something

Reason: Array#+ will create a new object every time you add something
while the idiom presented above only ever creates one Array per key.

The thread referred to above mentions other ways to get what you'd
expect with a default empty array. Still, I gotta admit that I simply
don't understand why Hash.new([]) works the way it does. Who would want
to create a Hash table where changing a single value can potentially
change all other values, past, present, and to come? Talk about side
effects gone wild!

Well, first of all this is the default return value. This does not
necessarily mean that it will be modified. You might do something
like

h = Hash.new("missing".freeze)
...

puts h[key]
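A brief sketch of why freezing the default can be useful: an accidental in-place modification of the shared default raises instead of silently affecting every missing key. The :anything key is arbitrary, and the exact exception class depends on the Ruby version, so the example rescues RuntimeError, which covers both older and newer releases:

```ruby
h = Hash.new("missing".freeze)

p h[:anything]          # => "missing"

begin
  h[:anything] << "!"   # tries to mutate the shared, frozen default
rescue RuntimeError => e
  # Newer Rubies raise FrozenError, which is a subclass of RuntimeError.
  puts "refused: #{e.class}"
end

p h.default             # => "missing"
```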

And then of course there is a very common idiom

counters = Hash.new 0
...
counters[key] += 1

If anyone can explain the rationale for this behavior, I'd really
appreciate it. I'm probably just missing something.

Hopefully that explanation helps.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Joey Zhou

Robert Klemme wrote in post #986401:
Well, in this case actually the better idiom is this:

h = Hash.new {|h,k| h[k] = []}

This is actually what I need. Thank you.
 
