hash key performance

  • Thread starter Matthias Wächter
  • Start date
M

Matthias Wächter

I am just wondering why different hash keys result in such a
significantly different performance.

I have written 11 simple test cases, all using the following template:

num=ARGV[0].to_i

var=:key

a={:key => "test"}

num.times {
b=a[:key][0]
}

Beside this test_symbol.rb variant, I replace :key in the example
above with false (test_false.rb), an integer value 1 (test_int.rb),
with nil (test_nil.rb), with "id" (test_string.rb). In six other
variants I replace the key with var, and set var to a symbol
(test_var_symbol.rb), false (test_var_false.rb), an integer value
(test_var_int.rb), nil (test_var_nil.rb), "id" (test_var_string.rb),
and "id".frozen (test_var_frozen.rb).


This is my performance chart using time (dual-core CPU, called with
num=30_000_000). I ran it once (dropped timing values), then three
consecutive times and added the times (which is the sort key as well):

test_int.rb 37.029
test_var_int.rb 37.659
test_symbol.rb 37.773
test_var_symbol.rb 37.785
test_var_frozen.rb 37.852
test_var_string.rb 40.515
test_var_false.rb 45.468
test_false.rb 45.687
test_var_nil.rb 45.894
test_nil.rb 46.068
test_string.rb 53.032


No surprise is that repeated lookup of a string as a key results in
the worst performance, and using a Fixnum gives best performance.

More interesting is a bulk of tests that relate to using symbols and
variables for int, symbol and frozen string, all within timing
accuracy. A noticable, but still small gap can be seen for a
nonfrozen string in a variable.

But here it comes: Using false and, even worse, nil (in either
literal or variable form) is way slower (a good 20%) than using
integers and symbols, even if they follow the same "immutable
object" philosophy of fixed object_ids.

Why is that? What's so special about false and nil to have such a
degraded performance as a hash key?

- Matthias
 
M

Marcin Mielżyński

Matthias said:
But here it comes: Using false and, even worse, nil (in either
literal or variable form) is way slower (a good 20%) than using
integers and symbols, even if they follow the same "immutable
object" philosophy of fixed object_ids.

Why is that? What's so special about false and nil to have such a
degraded performance as a hash key?

hash.c:rb_any_hash is the answer to your question (funcall('hash') is
discarded only for Fixnums, Symbols and Strings. Note that String
numbers would be way better if it's hash was cached until first
destructive operation.

lopex
 
M

Matthias Wächter

hash.c:rb_any_hash is the answer to your question (funcall('hash') is
discarded only for Fixnums, Symbols and Strings.

So the first two cases of "switch (TYPE(a))" should be enhanced with
checks for nil, true and false like this:

switch (TYPE(a)) {
case T_FIXNUM:
case T_SYMBOL:
+ case T_NIL:
+ case T_FALSE:
+ case T_TRUE:
return (int)a;
break;


Objections?

- Matthias
 
R

Ryan Davis

So the first two cases of "switch (TYPE(a))" should be enhanced with
checks for nil, true and false like this:

switch (TYPE(a)) {
case T_FIXNUM:
case T_SYMBOL:
+ case T_NIL:
+ case T_FALSE:
+ case T_TRUE:
return (int)a;
break;

Send that and your original timings to ruby-core@ and/or file a bug =20
on rubyforge in the ruby project.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,001
Messages
2,570,254
Members
46,849
Latest member
Fira

Latest Threads

Top