Richard, thanks for the perspective!
The next step is to reduce the number of bits you are encoding. You
said in another post that "1 collision in 10 million hashes would be
tolerable". So you need log2(10**7) = 23.25349666421154, which rounds
up to 24 bits worth of key.
Base64 encoded, that's only 4 characters.
Actually, I probably just proved that I don't really understand how
probabilities work, so maybe what you really need is 32 or 48 or 64
bits.
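For what it's worth, the arithmetic above checks out; here's a quick sketch of it (the 6-bits-per-character figure is just how Base64 packs data):

```python
import math

# 1 tolerable collision in 10 million hashes -> need log2(10**7) bits
bits = math.log2(10**7)
print(bits)             # 23.25349666421154
print(math.ceil(bits))  # 24

# Base64 encodes 6 bits per character, so 24 bits fit in 4 characters
print(math.ceil(24 / 6))  # 4
```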
<snip>
When doing these calculations, it's important to keep the birthday
paradox in mind (this is kind of counter-intuitive): the chance of a
collision rises tremendously when we're looking for *any* arbitrary
two hashes colliding within a certain namespace. The probability you've
calculated is the pre-image probability (which you also need to
multiply by a factor of two, because when trying to collide one given
hash, in the mean case you'll only have to search *half* the namespace
before finding a collision).
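To put a number on the birthday effect: the usual exponential approximation gives the probability that *any* two of n items collide in a 2**bits space. The item count below is just an illustrative figure, not something from the thread:

```python
import math

def birthday_collision_prob(n, bits):
    """Approximate probability that any two of n items collide in a
    2**bits hash space: 1 - exp(-n*(n-1) / (2 * 2**bits))."""
    space = 2 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

# With only 24-bit hashes, around 4800 items already give
# roughly a 50% chance of at least one collision somewhere:
print(birthday_collision_prob(4800, 24))
```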
<snip>
The birthday paradox could have been important had the OP stated his
goal differently. What he said was:
"""Ideally I would want to avoid collisions altogether. But if that means significant extra CPU time then 1 collision in 10 million hashes would be tolerable."""
That means he's willing to incur the overhead of collision
resolution once in every 10 million lookups. That's not the same as
saying that he wants only one chance in 10 million of having ANY
collisions among his data items.
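Under that reading, what matters is the per-lookup collision rate, roughly n_items / 2**bits, so the bit count scales with the table size rather than with the birthday bound. A rough sketch (the table size of a million items is an assumed figure for illustration):

```python
import math

def bits_for_per_lookup_rate(n_items, tolerable_rate):
    """Bits needed so that a fresh lookup collides with one of n_items
    stored hashes at most tolerable_rate of the time
    (per-lookup collision probability is roughly n_items / 2**bits)."""
    return math.ceil(math.log2(n_items / tolerable_rate))

# e.g. a table of 1 million items, 1 collision per 10 million lookups:
print(bits_for_per_lookup_rate(10**6, 1e-7))  # 44
```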