Performance of ruby hashes

U

unni.tallman

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.
 
R

Robert Klemme

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

Hashes *are* pretty fast. Please state what actual performance problem
you have. Otherwise nobody will really be able to help you.

One reason for a slow hash can be a slow hash function or a hash
function that creates badly distributed values. That cannot be
attributed to the Hash implementation but to the implementation of your
hash keys.

robert
 
T

Tomasz Wegrzanowski

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

Indexing them by symbols instead of strings helps a bit.
But hashes are so fast that it's hard to even make a meaningful
benchmark. For example in the following meaningless benchmark
symbol-hashes are faster than string-hashes:

$ time ./meaningless_hash_benchmark.rb --symbols
real 0m16.117s
user 0m14.009s
sys 0m0.485s
$ time ./meaningless_hash_benchmark.rb
real 0m18.525s
user 0m16.306s
sys 0m0.475s
$ cat ./meaningless_hash_benchmark.rb
#!/usr/bin/ruby

N = 2000000

# Lame
if ARGV.empty?
keys = File.readlines("H").map{|x| x.chomp}
else
keys = File.readlines("H").map{|x| x.chomp.to_sym}
end

h = {}
keys.each{|k| h[k] = rand }

k0 = keys[0]
k1 = keys[1]
k2 = keys[2]

# A meaningless loop
N.times{|i|
h[k0] += h[k1]
h[k1] += h[k2]
h[k2] += h[k0]
}
$

15% is pretty impressive for a benchmark dominated by method calls.

That's related to how hashes work. To get a_hash[a_key]
* a_key is converted to a number (hashed)
* the number is looked up in a_hash's internal array
** we can get 0 answer - then a_key is definitely not there
** we can get 1 answer - then verify it by a_real_key == a_key
** we can get 2+ answers - then verify them all until match is found
(this case is rare, and high numbers here are even more rare)
* So if we found a_real_key, such that a_real_key == a_key, then
return a_value_at_a_real_key
* == can actually cost a lot. If you use Strings, their contents need
to be compared. If you use Symbols, you only check whether they're the
same object. Each Symbol object is different, so this is enough.
 
A

ara.t.howard

Indexing them by symbols instead of strings helps a bit.
But hashes are so fast that it's hard to even make a meaningful
benchmark. For example in the following meaningless benchmark
symbol-hashes are faster than string-hashes:

bad idea for big hashes - symbols are never gc'd and swapping is never fast!

;-)

-a
 
T

Tomasz Wegrzanowski

bad idea for big hashes - symbols are never gc'd and swapping is never fast!

It would have to be totally extreme for that to become a problem,
like with millions different keys in your program. Then you're
absolutely correct, it can seriously hurt.

But in most normal cases, using symbols is a win.

And GC'ing them would require major changes to internal representation,
as they are direct values now, and as such not tracked at all.
Maybe in Ruby 3. :)
 
A

ara.t.howard

It would have to be totally extreme for that to become a problem,
like with millions different keys in your program. Then you're
absolutely correct, it can seriously hurt.

more like hundreds of thousands on a 4gb box: i've notice a few times with big
directory indexes, but yes - it must be large.

that being said, a web app which dynamically generates symbol keys and which
runs for long periods 'leaks'.

-a
 
T

Tomasz Wegrzanowski

You're behind on the news. ;)

Symbols just became a String subclass in Ruby 1.9.

Darn, and I've been using such an ancient snapshot of 1.9. :)

Still, garbage collecting Symbols probably won't be doable before Ruby 3.
All over the source IDs (integers basically) are used as symbols, not
Symbol objects.

You can see it even within Ruby:

foo = :some_weird_symbol.to_i #=> 10897
# :some_weird_symbol can be garbage collected now, right ?
# Or not quite ...
foo.to_sym #=> :some_weird_symbol

These numbers are used as method names and for all other internal
purposes. Symbols only pretend to be nice instances of Symbol class,
when nobody's looking, they're just numbers :)
 
J

Jeremy Henty

You're behind on the news. ;)

Symbols just became a String subclass in Ruby 1.9.

That doesn't preclude them being immediate values; an object's class
is independent of its internal TYPE. Check out "eigenclass - Changes
in Ruby 1.9" at
<URL:http://eigenclass.org/hiki.rb?Changes+in+Ruby+1.9> and see the
discussion under the heading "Hash#_compare_by_identity and
Hash#compare_by_identity?". This states that :c and :c are still
always the same object, whereas "c" and "c" are still in general
different. I think this shows that Symbols remain immediate. Unless
I've misunderstood something, of course.

Regards,

Jeremy Henty
 
R

Robert Klemme

The basic idea is that hash speed degrades as the size of each single-level
hash increases, and after a certain size has been reached, breaking the
hash into groups (layers if you prefer) improves performance.

What is the basis for this statement? The pure lookup speed certainly
does not degrade unless there is an increasing number of collisions
(which should be prevented by the load factor controlling the table's size).
Give us the details of the problem -- the nature of the keys, the size of
the data set, and so forth.

Definitively!

Kind regards

robert
 
H

Hugh Sasse

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.


Try freezing the keys.

http://www.eng.cse.dmu.ac.uk/~hgs/ruby/performance/

Don't forget all the usual caveats about profiling to find where
the expense is really. ["Programming Pearls", for one example citation]

Hugh
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,215
Messages
2,571,113
Members
47,713
Latest member
LeliaB1379

Latest Threads

Top