hash value inconsistencies


Tim Pease

ruby -e "puts 'test'.hash"

Should this output the same integer value on all platforms where Ruby
can run?

* Windows
ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]
ruby -e "puts 'test'.hash" #=> -914358341

* Mac 10.5
ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.10.3]
ruby -e "puts 'test'.hash" #=> -914358341

* Linux 2.6 Kernel
ruby --version #=> ruby 1.8.6 (2007-06-07 patchlevel 36) [x86_64-linux]
ruby -e "puts 'test'.hash" #=> 1233125307

ruby --version #=> ruby 1.8.5 (2006-12-04 patchlevel 2) [x86_64-linux]
ruby -e "puts 'test'.hash" #=> 1233125307


It appears not! So, any suggestions on generating an ID number for an
object that is unique yet consistent across different platforms? I'd
like to have some method that I could call on an object that would
return a reproducible value that would uniquely identify that object.

Thoughts?

TwP
 

Gerry Ford

Tim said:
ruby -e "puts 'test'.hash"

Should this output the same integer value on all platforms where Ruby
can run?

* Windows
ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]
ruby -e "puts 'test'.hash" #=> -914358341

I can't speak to the larger problem you ask about, but I did verify that
-914358341 is the output on Windows. Maybe we could alter the test
slightly to figure out what Ruby is doing.

Cheers.
 

Clifford Heath

Tim said:
Should this output the same integer value on all platforms where Ruby
can run?

Perhaps, but if you read the below, you'll see why you should never rely
on it.
It appears not! So, any suggestions on generating an ID number for an
object that is unique yet consistent across different platforms? I'd
like to have some method that I could call on an object that would
return a reproducible value that would uniquely identify that object.

That's not possible. There is more entropy in an arbitrary object than
can be represented in a Fixnum. Basic coding theory stuff. If it were
possible, then you could encode all the data in all the databases in the
world into a single Fixnum :).

If you want a fixed-length code that's sufficiently likely to be unique
that you can be almost certain that you'll never see a false duplicate,
you need to use a cryptographic hash function. I recommend SHA-256, but
you might survive with a weaker one like MD5 or SHA-1. They take a lot
more work to calculate than is justified for Ruby's hash keys though!
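As a minimal sketch of the cryptographic-hash approach, Ruby's standard `Digest` library can produce a SHA-256 digest that depends only on the input bytes, not on the platform's word size. The helper names `stable_digest` and `stable_digest_int` are hypothetical, and the sketch assumes the strings being hashed are encoded consistently (e.g. always UTF-8), since the digest is computed over bytes.

```ruby
require 'digest/sha2'

# Hypothetical helper: SHA-256 of the string's bytes as a 64-character hex
# string. The result is identical on 32-bit and 64-bit platforms.
def stable_digest(str)
  Digest::SHA256.hexdigest(str)
end

# Hypothetical helper: the same digest as a (256-bit) Integer.
def stable_digest_int(str)
  stable_digest(str).to_i(16)
end
```

Calling `stable_digest('test')` should give the same hex string on Windows, Mac and Linux alike, unlike `'test'.hash`.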

With these functions, the probability of a population containing a false
duplicate reaches roughly 50% when the population contains about sqrt(2^N)
(that is, 2^(N/2)) distinct items, where N is the number of bits in the
checksum. For SHA-256, that means you need around 2^128 items before you have
a reasonable chance of a collision. All of the programs you'll ever write,
running for your entire life, will only create a tiny fraction of this
many objects, so the chance of you ever seeing a collision is tiny.
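The birthday-bound estimate above can be checked numerically with the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(N+1)) for n random N-bit values. By that formula, exactly 2^(N/2) items give about a 39% chance of a collision, and 50% is reached near 1.18 times 2^(N/2), so "roughly 50%" is the right order of magnitude. The function name `collision_probability` is a hypothetical helper, and the model assumes hash outputs behave like uniformly random values.

```ruby
# Approximate birthday-bound probability that n uniformly random N-bit
# values contain at least one collision:
#   p ~ 1 - exp(-n(n-1) / 2^(N+1))
def collision_probability(n, bits)
  x = (n * (n - 1)).fdiv(2**(bits + 1))
  # For tiny x, 1 - exp(-x) rounds to 0.0 in floating point; there,
  # x itself is an excellent approximation.
  x < 1e-10 ? x : 1 - Math.exp(-x)
end
```

For example, `collision_probability(100, 32)` is roughly 1.2e-6 (about one in a million for 100 objects and a 32-bit hash), while `collision_probability(100, 256)` is on the order of 1e-74.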

That might sound risky still, but all of e-commerce is built on the
principle. If it's good enough for that, it's good enough for you :)

Clifford Heath.
 

Tim Pease

Perhaps, but if you read the below, you'll see why you should never rely
on it.


That's not possible. There is more entropy in an arbitrary object than
can be represented in a Fixnum. Basic coding theory stuff. If it were
possible, then you could encode all the data in all the databases in the
world into a single Fixnum :).


Darn information theory! I just need a fixnum. The number of objects
we are creating is pretty tiny -- maybe 100.

I was quite surprised that the Ruby "hash" method is not consistent
across platforms. The solution is to roll my own hash function that
produces consistent results. Just wondering about the more general
questions regarding the built-in hash function.

Blessings,
TwP
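Rolling your own consistent hash, as suggested above, can be as small as a 32-bit FNV-1a. This is a swapped-in, well-known non-cryptographic hash, not anything Ruby uses internally; the name `fnv1a_32` is hypothetical. The key point of the sketch is that every step is masked to 32 bits, so the result cannot depend on whether the platform is 32-bit or 64-bit. Collisions remain possible, though they are unlikely for a population of about 100 objects.

```ruby
# 32-bit FNV-1a over the string's bytes. Masking each step to 32 bits keeps
# the result identical regardless of the platform's native word size.
FNV1A_32_OFFSET_BASIS = 0x811c9dc5
FNV1A_32_PRIME        = 0x01000193

def fnv1a_32(str)
  str.each_byte.inject(FNV1A_32_OFFSET_BASIS) do |hash, byte|
    ((hash ^ byte) * FNV1A_32_PRIME) & 0xffffffff
  end
end
```

Because the output is always a non-negative integer below 2^32, it fits the "I just need a fixnum" requirement on 64-bit Rubies; as with any byte-level hash, the strings should be encoded consistently.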
 

Gerry Ford

Tim said:
Darn information theory! I just need a fixnum. The number of objects
we are creating is pretty tiny -- maybe 100.

I was quite surprised that the Ruby "hash" method is not consistent
across platforms. The solution is to roll my own hash function that
produces consistent results. Just wondering about the more general
questions regarding the built-in hash function.
What in particular are you going to hash? Under what circumstances do
you want to bomb out?
 

Robert Klemme

On 2008/2/6, Tim said:
Darn information theory! I just need a fixnum. The number of objects
we are creating is pretty tiny -- maybe 100.

I was quite surprised that the Ruby "hash" method is not consistent
across platforms. The solution is to roll my own hash function that
produces consistent results.

A regular hash function is a bad candidate for a unique id anyway.
I'd rather use MD5 or something like that. If your strings are
reasonably short you might as well convert them to Fixnums, but then
again: why bother, and not directly use the string?
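Both of the options mentioned above can be sketched briefly. The helper names are hypothetical: `md5_id32` keeps only the first 32 bits of an MD5 digest (stable across platforms, collisions possible but unlikely for small populations), and `string_to_int` packs a short string's bytes directly into an Integer. The packing is reversible, so distinct strings never collide, but the number grows by 8 bits per byte; a leading `1` nibble is prepended as an assumption of this sketch so that leading zero bytes are not lost.

```ruby
require 'digest/md5'

# Hypothetical helper: first 32 bits of the MD5 digest as an Integer.
def md5_id32(str)
  Digest::MD5.hexdigest(str)[0, 8].to_i(16)
end

# Hypothetical helper: pack the string's bytes into one Integer. The
# sentinel "1" nibble keeps "\0a" and "a" distinct.
def string_to_int(str)
  ('1' + str.unpack('H*').first).to_i(16)
end
```

For a handful of short identifiers, using the string itself as the key, as suggested, avoids both the digest cost and the size growth.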
Just wondering about the more general
questions regarding the built-in hash function.

There is no need for a hash function to be consistent across
platforms. Why should it?

Kind regards

robert
 

Mark Brassard


The problem is that some of your test machines are 64-bit and some are
32-bit. I ran the same tests on Macs running Snow Leopard (64-bit) and
Leopard (32-bit), and on 64-bit and 32-bit Linux, and the results were
consistent across operating systems with the same word size.
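To illustrate the mechanism described above, here is a hypothetical multiply-and-add string hash (a classic C-style pattern; it is not claimed to be Ruby 1.8's actual `String#hash`) run with its arithmetic wrapped at 32 bits and at 64 bits, the way a C integer of each width would overflow. The name `toy_hash` and the multiplier 65599 are assumptions made for illustration only.

```ruby
# Hypothetical toy hash: multiply-and-add over the bytes, with every step
# wrapped to `bits` bits, then reinterpreted as a signed integer of that width.
def toy_hash(str, bits)
  mask = (1 << bits) - 1
  key = 0
  str.each_byte { |b| key = (key * 65599 + b) & mask }
  key = (key + (key >> 5)) & mask
  key >= (1 << (bits - 1)) ? key - (1 << bits) : key
end
```

A one-byte string such as `'a'` never overflows, so `toy_hash('a', 32)` and `toy_hash('a', 64)` agree, but for `'test'` the intermediate value exceeds 32 bits and the two results diverge, matching the 32-bit vs 64-bit split in the original test results.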
 
