Phil Carmody
Juha Nieminen said: You would have a hard time proving that.
He shouldn't. It should be pretty clear. No hash table needs
to examine each element more than once.
Phil
Juha Nieminen said: You would have a hard time proving that.
Nick said: CAREFUL ALGORITHM SELECTION CAN CREAM MICRO-OPTIMISATION
Phil Carmody said: He shouldn't. It should be pretty clear. No hash table needs
to examine each element more than once.
user923005 said: Cuckoo hashing has guaranteed O(1) lookup and delete and amortized
O(1) insert.
[ ... ] Just as a note, the common claim that hash tables are O(1)
per access whereas search trees are O(log n) per access is
misleading, and is arrived at by an apples and oranges
comparison. The number of probes in a hash table is O(1)
whereas the number of probes in a search tree is O(log n).
However the cost of computing the hash code for independent
keys is O(m) where m is the average key length, which is
necessarily greater than log2(n). In comparison based trees
the cost of the comparison is O(m), i.e., the probe cost is
O(m). In radix based trees the probe cost is O(1). If O(m)
= O(n) the execution costs per probe (ignoring access
issues) are: [ ... ]
Keeping in mind, however, that m and n will usually be quite a
bit different, so any similarity between O(m) and O(n) is
purely coincidental. In reality, the performance of hash
tables tends to be rather different in general than the
performance of tree-based systems (radix or comparison).
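(Working out the per-lookup totals those probe counts imply -- a sketch
only; the assumption that a radix tree makes one probe per character is
mine, and m >= log2(n) is the bound quoted above:)

    % per-lookup cost = (number of probes) x (cost per probe)
    \begin{align*}
    \text{hash table}      &: O(1)\cdot O(m)      = O(m)\\
    \text{comparison tree} &: O(\log n)\cdot O(m) = O(m\log n)\\
    \text{radix tree}      &: O(m)\cdot O(1)      = O(m)
    \end{align*}
    % Since m >= log2(n) for n distinct keys, the hash table's O(m)
    % is never asymptotically better than O(log n) per lookup.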
Funny that no one has mentioned it, but the performance of a
hash table is very, very dependent on the quality of the hashing
function. With a perfect hashing function, the access time is
O(1), but with a poor hashing function, it can be significantly
worse---even to the point of O(n). Unless you're sure that you
can define a good hashing function, you're probably better off
with a balanced tree. (The radix tree is an interesting
suggestion, often ignored. Given the sparseness of the entries,
it may result in a significant lack of locality and too many
cache misses. But given typical word length, this isn't
certain, and it has other advantages, like not requiring the
word to be actually assembled in memory.)
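(To make "good" versus "poor" concrete, here is a minimal sketch in C++.
FNV-1a, mentioned further down in the thread, stands in for a good
general-purpose string hash; the constant function below is the
degenerate extreme that collapses every lookup into a scan of one
bucket. The bucket handling itself is omitted.)

    #include <cstdint>
    #include <string>

    // FNV-1a, 32-bit variant, using the published FNV constants.
    std::uint32_t fnv1a(const std::string& key)
    {
        std::uint32_t h = 2166136261u;      // FNV offset basis
        for (unsigned char c : key) {
            h ^= c;
            h *= 16777619u;                 // FNV prime
        }
        return h;
    }

    // The degenerate extreme: every key hashes to the same value, so
    // every element lands in one bucket and access degrades to O(n).
    std::uint32_t poor_hash(const std::string&) { return 0u; }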
Note, however, that there are factors that don't show up in a
big-O comparison that can still be quite substantial,
especially for collections of reasonable sizes. For one obvious
one, consider the common case (like this one) where the keys
are strings. Hashing the string requires looking at the whole
string, but comparing two strings will often look at _far_
fewer -- for the first few probes, it'll typically only
involve looking at one or two characters.
From a theoretical viewpoint, this effect is negligible -- but
given the number of words in a typical vocabulary, it can be
quite substantial.
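(A sketch of that point, with hypothetical helper names: a comparison
stops at the first mismatching character, while a hash has to walk the
entire key every time.)

    #include <cstddef>
    #include <string>

    // Tree probe: std::string::compare stops at the first differing
    // character, so "apple" vs "zebra" costs one character comparison.
    bool probe_less(const std::string& a, const std::string& b)
    {
        return a.compare(b) < 0;
    }

    // Hash probe: the hash must read every character of the key,
    // however different the key is from everything in the table.
    std::size_t probe_hash(const std::string& key)
    {
        std::size_t h = 0;
        for (unsigned char c : key)
            h = h * 31 + c;                 // illustrative multiplier
        return h;
    }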
I've done some actual measures (some time back), see
http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. In
this case, I only measured access, not insertion, but for the
list of words in the ispell French dictionary, I found that a
hash table with a good hashing function (FNV or my Mersenne
prime algorithm) was more than three times faster; using a
list of actual program symbols, the difference was still more
than a factor of 2.5 (the French word list contained 94883 different words,
the program symbols 15047).
I've since added some additional hashing algorithms, and tested
on different machines (which does make a difference), but I've
not gotten the results written up. I think I'll add an
implementation using a radix tree, just for good measure.
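(For reference, the sort of radix structure being considered -- here an
uncompressed trie over lower-case ASCII, ignoring accents and case for
brevity; the names are illustrative, not taken from the write-up above.
Lookup makes one O(1) probe per character, and the key never has to be
hashed or even assembled in memory.)

    #include <array>
    #include <memory>
    #include <string>

    // Minimal trie (uncompressed radix tree) over 'a'..'z' words.
    struct TrieNode {
        std::array<std::unique_ptr<TrieNode>, 26> child;
        bool is_word = false;
    };

    void insert(TrieNode& root, const std::string& word) {
        TrieNode* node = &root;
        for (char c : word) {
            auto& next = node->child[c - 'a'];
            if (!next) next = std::make_unique<TrieNode>();
            node = next.get();              // one O(1) probe per character
        }
        node->is_word = true;
    }

    bool contains(const TrieNode& root, const std::string& word) {
        const TrieNode* node = &root;
        for (char c : word) {
            node = node->child[c - 'a'].get();
            if (!node) return false;        // early exit on first mismatch
        }
        return node->is_word;
    }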
That's so true. However, we shouldn't completely dismiss
"micro-optimization" and "hacker-optimization" once we have
come up with the fastest possible overall algorithm.
Even though two implementations might both be e.g. O(n lg n)
with a rather small constant factor, one of them might still
be 10 times faster than the other if it performs clever
low-level tricks inside the innermost loop (which might be run
e.g. millions of times).
Just as a note, the common claim that hash tables are O(1) per
access whereas search trees are O(log n) per access is
misleading, ... computing the hash code ... is
necessarily greater than log2(n) ...
hash table - O(log n)
James said: I've never seen a factor of 10 here. Even a factor of 2 is
exceptional (and usually indicative of a very poor optimizer in
the compiler).
Juha said: [...]
Personally I prefer to write 100 lines of clear, readable and
maintainable code, even if it's 1% slower than 1000 lines of code
worthy of the IOCCC.
It's always easier to make a bug-free program fast than to
bug-fix a fast program.
Juha Nieminen said: Wikipedia says: "Lookup requires inspection of just two locations in
the hash table, which takes constant time in the worst case".
I don't understand how that is possible. Suppose an element is
inserted into the table. Then later a new element displaces this first
element to a second location. Then later another new element, to be put
in this second location, displaces this first element to a third
location. How exactly do you find this first element anymore? It's not
in either of the two alternative locations defined by the two hashing
functions for that element.
The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).
[ ... hash table operations ]
The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).
That depends on how the hash table is designed. It's entirely possible
(for one example) to create a hash table that uses balanced trees for
collision resolution. This gives a worst case of O(log N), which isn't
really all that awful.
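(A sketch of that design, with std::map playing the balanced tree in
each bucket; the class name and the fixed bucket count are just for
illustration.)

    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    // Separate chaining where each bucket is a balanced tree, so even a
    // bucket that receives every key still gives O(log N) per operation.
    class TreeBucketTable {
    public:
        explicit TreeBucketTable(std::size_t buckets = 1024)
            : buckets_(buckets) {}

        void insert(const std::string& key, int value) {
            buckets_[index(key)][key] = value;      // O(log bucket_size)
        }

        const int* find(const std::string& key) const {
            const auto& bucket = buckets_[index(key)];
            auto it = bucket.find(key);             // O(log bucket_size)
            return it == bucket.end() ? nullptr : &it->second;
        }

    private:
        std::size_t index(const std::string& key) const {
            return std::hash<std::string>()(key) % buckets_.size();
        }
        std::vector<std::map<std::string, int>> buckets_;
    };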
Richard said: It doesn't work like that, though you had me perplexed for a
moment. The first element goes back to its original location,
displacing the second. Thus, suppose that h0, h1, etc are
locations and the first three elements are e0, e1, e2, and that
their address pairs are:
e0: h0,h1
e1: h0,h2
e2: h1,h3
Initially e0 is inserted in location h0. When e1 is inserted, e0
is moved to h1, and e1 is inserted in h0. The contents of the
table look like this:
h0: e1
h1: e0
h2: -
h3: -
Now e2 is inserted; it goes into location h1, displacing e0. In
turn e0 goes back to its alternate location h0, displacing e1,
which has to go to its alternate location, h2. The table now
looks like this:
h0: e0
h1: e2
h2: e1
h3: -
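(The same displacement logic, condensed into a sketch: two hash
functions, at most two candidate slots per key, and a displacement cap
after which a real implementation would grow and rehash. The hash
functions and the cap of 32 are arbitrary illustrative choices, not
taken from any particular implementation.)

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>

    // Cuckoo hashing sketch: every key lives in one of exactly two
    // slots, so lookup inspects at most two locations.  Insertion may
    // displace ("kick out") existing keys; after too many displacements
    // a real table would grow and rehash -- here we just throw.
    class CuckooSet {
    public:
        explicit CuckooSet(std::size_t n = 64) : slots_(n) {}

        bool contains(const std::string& key) const {
            return slots_[h1(key)] == key || slots_[h2(key)] == key;
        }

        void insert(std::string key) {
            if (contains(key)) return;
            std::size_t pos = h1(key);
            for (int kicks = 0; kicks < 32; ++kicks) {  // displacement cap
                if (!slots_[pos]) { slots_[pos] = std::move(key); return; }
                std::swap(key, *slots_[pos]);           // evict the occupant
                // the evicted key goes to its *other* candidate location
                pos = (pos == h1(key)) ? h2(key) : h1(key);
            }
            throw std::runtime_error("too many displacements: rehash needed");
        }

    private:
        // A real table needs two independent hash functions; deriving
        // both from std::hash here is only for brevity.
        std::size_t h1(const std::string& k) const {
            return std::hash<std::string>()(k) % slots_.size();
        }
        std::size_t h2(const std::string& k) const {
            return (std::hash<std::string>()(k) / slots_.size() + 1) % slots_.size();
        }
        std::vector<std::optional<std::string>> slots_;
    };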
Phil said: Jerry Coffin said:
[ ... hash table operations ]
uhuh - specifically cuckoo hash operations.
That depends on how the hash table is designed. It's entirely
possible ...
[... to not use a cuckoo hash ...]
Yes, but then it wouldn't be the hash design whose Big-Ohs were
being debated.
I'm not 100% sure I fully understand it, but anyway, it sounds to me
like insertion is O(n) like that (each displaced element could
potentially displace an existing element, which could potentially
displace an existing element, etc, potentially going through O(n)
elements without it going into an infinite loop).
I don't even see how insertion is amortized constant-time. It sounds
to me like insertion is not only O(n), but also could potentially be
amortized linear-time, especially if poor hashing functions are chosen.
There was no conditional ("with properly designed hashing functions")
in the description given by wikipedia for the amortized constant time
insertion, but it made it sound like it's *always* amortized constant
time, regardless of the chosen hashing functions... I can't understand
how it's possible.
Heck, in fact I think it's rather trivial to demonstrate that
insertion can be O(infinite), by simply using these two hashing functions:
f1(x) = 0
f2(x) = 1
With these hashing functions it's impossible to insert more than two
elements into the hash table: an attempt to insert a third can never succeed.
Of course these two functions are absolutely extreme, but I think they
demonstrate that the hash table can *not* have constant-time insertion
(or even linear-time, for that matter) for all possible hashing
functions. At most this may be possible for *some* hashing functions.
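(Concretely, plugging those two constant functions into a
displacement-bounded sketch like the one further up:)

    #include <cstddef>
    #include <string>

    // Hypothetical constant "hash" functions from the post above: every
    // key maps to slots {0, 1}.  Two keys fill both slots, and a third
    // insert just bounces the occupants back and forth between them.
    // A displacement-bounded implementation gives up and asks for a
    // rehash instead of looping forever -- which cannot help here,
    // because these functions ignore their input entirely.
    std::size_t f1(const std::string&) { return 0; }
    std::size_t f2(const std::string&) { return 1; }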
user923005 said: Big-O analysis is an examination of average case behavior.
Juha said: Not true. Big-O denotes the asymptotic worst case behavior.
One can say "the algorithm is O(n lg n) *in average*", but AFAIK
that's a rather shaky definition. The big-O notation always denotes the
upper bound of the asymptotic behavior of the algorithm. In other words,
any asymptotic behavior of the algorithm with any given data will always
fall below the big-O function curve. If with some data the algorithm
behaves asymptotically slower than the given function, then that's not
the true big-O function for that algorithm.
For example quicksort is O(n^2). (Don't believe what people say. It
*is* O(n^2), period.)
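(For reference, the formal definition behind the notation: big-O is an
upper bound on whichever cost function it is applied to.)

    % f(n) = O(g(n)) iff f is eventually bounded above by a constant
    % multiple of g:
    f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \ge 0 :\;
        0 \le f(n) \le c\,g(n) \ \text{for all } n \ge n_0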