STL map kinda functionality in C

Dann Corbit

Richard Heathfield said:
Dann Corbit said:



Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.
http://citeseer.ist.psu.edu/fotakis03space.html

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
 
Tak-Shing Chan

Dann Corbit said:



Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.

I think Dann has perfect hashing in mind (though it is
unclear whether this is suitable for the OP or not).

Tak-Shing
 
Dann Corbit

Tak-Shing Chan said:
I think Dann has perfect hashing in mind (though it is
unclear whether this is suitable for the OP or not).

The OP said:
"I would like to store a key-value pair and then retrieve the value based on
a given key value."

If that is what they really meant, then a hash table is the answer to the
question.

If they need additional functionality, like range searches, then a tree
structure of some sort is probably superior.

You can get O(1) performance in the general case. See my other post on the
topic.

At any rate, for the OP's stated requirement, a hash table is better than a
tree.

Probably, comp.programming is the right place for this thread, since
(other than the fact that the target language is C) it is not about the C
language, but about the implementation of an algorithm.
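(Editorial aside: a minimal sketch of the kind of key-value hash table being recommended here. The names `map_put`/`map_get`, the fixed bucket count, and the djb2-style hash are assumptions for illustration, not code from the thread; a production version would also resize and support deletion.)

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256            /* fixed table size: sketch only, no resizing */

struct node {
    char *key;
    char *value;
    struct node *next;
};

static struct node *buckets[NBUCKETS];

/* Portable replacement for the non-standard strdup. */
static char *dup_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

static unsigned hash_str(const char *s)
{
    unsigned h = 5381u;          /* djb2-style multiplicative string hash */
    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Insert or overwrite a key/value pair; 0 on success, -1 on failure. */
int map_put(const char *key, const char *value)
{
    unsigned b = hash_str(key);
    struct node *n;

    for (n = buckets[b]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) {      /* key exists: replace value */
            char *v = dup_str(value);
            if (v == NULL)
                return -1;
            free(n->value);
            n->value = v;
            return 0;
        }
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = dup_str(key);
    n->value = dup_str(value);
    if (n->key == NULL || n->value == NULL) {
        free(n->key);
        free(n->value);
        free(n);
        return -1;
    }
    n->next = buckets[b];                    /* push onto the chain */
    buckets[b] = n;
    return 0;
}

/* Look up a key; returns the stored value, or NULL if absent. */
const char *map_get(const char *key)
{
    struct node *n;

    for (n = buckets[hash_str(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->value;
    return NULL;
}
```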
 
Dann Corbit

Richard Heathfield said:
Dann Corbit said:

"The server encountered an internal error and was unable to complete your
request. Error message:
The server encountered an internal error and was unable to complete your
request. Either the server is overloaded or there was an error in a CGI
script."

[sic]

So - that was an interesting counter-argument. :)

Article:

Space Efficient Hash Tables With Worst Case Constant Access Time (2003)

Dimitris Fotakis, Rasmus Pagh, Peter Sanders, Paul Spirakis
 
websnarf

Dann said:
Dann said:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversant in C++ but this needs to be implemented in C.

Would appreciate any ideas on how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.
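(Editorial aside: a sketch of the memcmp idea above, a binary search tree keyed on raw bit patterns. The fixed `KEY_SIZE` and the `tree_insert`/`tree_find` names are assumptions for illustration; the ordering memcmp imposes is arbitrary, but consistent, which is all exact-match lookup requires.)

```c
#include <stdlib.h>
#include <string.h>

#define KEY_SIZE 16   /* assumed fixed key width for this sketch */

struct tnode {
    unsigned char key[KEY_SIZE];
    void *value;
    struct tnode *left, *right;
};

/* Insert a key/value pair; keys are ordered by memcmp on their bit
   patterns, so no "natural" ordering on the key type is needed. */
struct tnode *tree_insert(struct tnode *root, const void *key, void *value)
{
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        if (n != NULL) {
            memcpy(n->key, key, KEY_SIZE);
            n->value = value;
            n->left = n->right = NULL;
        }
        return n;
    }
    int c = memcmp(key, root->key, KEY_SIZE);
    if (c < 0)
        root->left = tree_insert(root->left, key, value);
    else if (c > 0)
        root->right = tree_insert(root->right, key, value);
    else
        root->value = value;   /* duplicate key: overwrite the value */
    return root;
}

/* Find a key; returns the stored value, or NULL if absent. */
void *tree_find(const struct tnode *root, const void *key)
{
    while (root != NULL) {
        int c = memcmp(key, root->key, KEY_SIZE);
        if (c == 0)
            return root->value;
        root = (c < 0) ? root->left : root->right;
    }
    return NULL;
}
```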

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.
But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be about as fast as a comparison.
Since it happens N times instead of N*log(N) times, the hash computation
will be faster:

You mean on initial inserts. That's fine, but a common scenario is to
be performance limited by searching.

Lol! Did you just seriously quote that web page to me in an effort to
educate me about hash functions? Try this page:
http://burtleburtle.net/bob/hash/doobs.html.
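(Editorial aside: for concreteness, here is FNV-1a, one of the simple, fast hash functions named later in the thread. Its cost is O(key length), which is exactly the trade-off under discussion: one pass over the key versus O(log n) key comparisons in a tree.)

```c
#include <stdint.h>

/* 32-bit FNV-1a string hash: xor in each byte, then multiply
   by the FNV prime.  Constants are from the FNV specification. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```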
It's pretty unusual to have no idea how large your data set is.

It is? Why do you think this? What if the programmer is not in
control of the data set?
There are heaps and loads of hash functions with liberal licenses for use.

I know -- but many of them are a complete waste of time because they
are actually unusually *SLOW* because of the way they have been
implemented (which defeats the purpose, of course -- hashes are
supposed to be faster than trees in the best case scenario). Of the
ones I looked at, James' at least does not make this mistake.
Here, you may be onto something. I have seen real-world studies, where disk
storage is required, in which B+ trees are faster than hash tables.

Yeah, this is because B+ trees recover a lot of locality, and the
log(n) is divided by a reasonably high constant. B+ trees also reduce
the memory overhead to the point where you might expect them to take less
memory than a hash table.
 
Dann Corbit

Dann said:
Dann Corbit wrote:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversant in C++ but this needs to be implemented in C.

Would appreciate any ideas on how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.
But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be about as fast as a comparison.
Since it happens N times instead of N*log(N) times, the hash computation
will be faster:

You mean on initial inserts. That's fine, but a common scenario is to
be performance limited by searching.

Searching a well-designed hash table is O(1). See the article referenced
elsethread.
Lol! Did you just seriously quote that web page to me in an effort to
educate me about hash functions? Try this page:
http://burtleburtle.net/bob/hash/doobs.html .

It was not an attempt to educate you about hash functions. It was an
attempt to show you hash functions which are very fast to compute and which
have excellent dispersion. I am aware of the hash functions on that page,
and use them in a commercial product I wrote, along with FNV hash, UMAC and
several others.
It is?

Yes.

Why do you think this?

Experience.

What if the programmer is not in
control of the data set?

You mean like a database? The programmer will still have access to some
kind of estimate of the set size (cardinality). In the rare case of
streaming data with unknown start and end, a hash table will still be
better, given the right design.

If you are paranoid about it, you can make a hash table of skiplists.
I know -- but many of them are a complete waste of time because they
are actually unusually *SLOW* because of the way they have been
implemented (which defeats the purpose, of course -- hashes are
supposed to be faster than trees in the best case scenario). Of the
ones I looked at, James' at least does not make this mistake.

If it is memory-based, hash tables are going to be faster unless they are
badly designed. If the data is persistent, then it is a lot harder to get
hashes to be faster, but the advanced database systems do manage it.
 
Richard Bos

Eric Sosman said:
Dann Corbit wrote on 07/10/06 15:12:

Usually, yes -- but as Pitti-Sing says, "Bless you, it all depends!"


I'd extend this just a tiny bit: If range *operations*
are important, ordered data structures (trees, skip lists,
what-have-you) might be the answer. Even if all the searches
are for exact equality, it is sometimes important to be able
to access the "neighbors" of a located item, or to move
"forward" and/or "backward" from it.
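(Editorial aside: a small illustration of the point above, using a sorted array as the ordered structure. An exact-match lookup also yields the item's position, and hence its "neighbors", for free, which a hash table cannot provide. The data and the `find_index` name are assumed for illustration.)

```c
#include <stdlib.h>

/* Comparison function for bsearch over an int array. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Locate key in a sorted array and report its index, or -1 if absent.
   From the index, the "neighbors" are just positions i-1 and i+1. */
long find_index(const int *arr, size_t n, int key)
{
    const int *p = bsearch(&key, arr, n, sizeof *arr, cmp_int);
    return (p != NULL) ? (long)(p - arr) : -1;
}
```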

Item: database programming. Indexing, in particular.

Richard
 
