Dann said:
Richard Heathfield said:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.
I am quite conversent in C++ but this needs be implemented in C.
Would appriciate any idea how to achieve this in C?
Use a binary search tree. [...]
Huh? Doesn't this assume you have a lexical ordering on keys?
<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.
Why
don't we go for a hash table instead?
I like trees.
A hash table is much faster than a tree for all operations.
This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.
But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.
Furthermore, if your hash table has been improperly sized, then you
will pay unusually high reprobe penalities, or may thrash your memory
when you don't need to. A hash table usually never has 100% density
efficiency -- on average its half filled. If you make an
auto-resizable hash table, and assume that the number of entries is
generally monotonically increasing, then it means that periodically,
operations will cost O(tree-size) which can be extremely expensive, and
will not fly in real-time systems.
That all being said -- *usually* hash tables are faster. It assumes
you can compute the hash function very quickly, and that your table is
not encumbered by extra overhead for each operation. James Dow Allen
has written a hash table that satisfies these contraints (though I
personally dislike his programming style.)
The only reason not to use a hash table is if you need to do range searches.
Well, another case, not described above, is if you are implementing
something like a file system.
If all searches are equality searches, then the right answer is a hash
table.
A hash table is more general in the case where you *have* to use
equality searches. Otherwise you just have to understand your case to
know which is the best strategy.