Dann said:
Dann Corbit wrote:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the
value
based on a given key value.
I am quite conversent in C++ but this needs be implemented in C.
Would appriciate any idea how to achieve this in C?
Use a binary search tree. [...]
Huh? Doesn't this assume you have a lexical ordering on keys?
<shrug> Any key can be expressed as a bit pattern. Bit patterns can
be
compared in a consistent manner. As long as a key is unique, in the
worst
case you can compare them using memcmp.
Why
don't we go for a hash table instead?
I like trees.
A hash table is much faster than a tree for all operations.
This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.
A tree structure requires more memory than a hash table.
But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.
I guess that the hash computation will be as fast as comparison. Since
it
happens N times instead of N*log(N) times, the hash computation will be
faster:
You mean on initial inserts. That's fine, but a common scenario is to
be performance limited by searching.
Searching a well designed hash table is O(1). See the article referenced
elsethread.
Lol! Did you just seriously quote that web page to me in an effort to
educate me about hash functions? Try this page:
http://burtleburtle.net/bob/hash/doobs.html .
It was not an attempt to educate you about hash functions. It was an
attempt to show you hash functions which are very fast to compute and which
have excellent dispersion. I am aware of the hash functions on that page,
and use them in a commercial product I wrote, along with FNV hash, UMAC and
several others.
It is? Yes.
Why do you think this? Experience.
What if the programmer is not in
control of the data set?
You mean like a database? The programmer will still have access to some
kind of estimate of the set size (cardinality). In the rare case of
streaming data with unknown start and end a hash table will still be better,
given the right design.
If you are paranoid about it, you can make a hash table of skiplists.
I know -- but many of them are a complete waste of time because they
are actually unusually *SLOW* because of the way they have been
implemented (which defeats the purpose, of course -- hashes are
supposed to be faster than trees in the best case scenario). Of the
ones I looked at, James' at least does not make this mistake.
If it is memory based, hash tables are going to be faster unless they are
badly designed. If the data is permanent, then it is a lot harder to get
hashes to be faster, but the advanced database systems do manage it.