STL map kinda functionality in C

Dann Corbit

Richard Heathfield said:
Dann Corbit said:



Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.
http://citeseer.ist.psu.edu/fotakis03space.html

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
 
Tak-Shing Chan

Dann Corbit said:



Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.

I think Dann has perfect hashing in mind (though it is
unclear whether this is suitable for the OP or not).

Tak-Shing
 
Dann Corbit

Tak-Shing Chan said:
I think Dann has perfect hashing in mind (though it is
unclear whether this is suitable for the OP or not).

The OP said:
"I would like to store a key-value pair and then retrieve the value based on
a given key value."

If that is what they really meant, then a hash table is the answer to the
question.

If they need additional functionality, like range searches, then a tree
structure of some sort is probably superior.

You can get O(1) performance in the general case. See my other post on the
topic.

At any rate, for the OP's stated requirement, a hash table is better than a
tree.

Probably, comp.programming is the right place for this thread, since
(other than the fact that the target language is C) it is not about the C
language, but about the implementation of an algorithm.
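(Editorial aside: a minimal sketch of the kind of key-value hash table being recommended here. The names `map_put`/`map_get`, the fixed bucket count, and the djb2-style hash are assumptions for illustration, not code from the thread; a production version would also resize and support deletion.)

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256            /* fixed table size: sketch only, no resizing */

struct node {
    char *key;
    char *value;
    struct node *next;
};

static struct node *buckets[NBUCKETS];

/* Portable replacement for the non-standard strdup. */
static char *dup_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

static unsigned hash_str(const char *s)
{
    unsigned h = 5381u;          /* djb2-style multiplicative string hash */
    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Insert or overwrite a key/value pair; 0 on success, -1 on failure. */
int map_put(const char *key, const char *value)
{
    unsigned b = hash_str(key);
    struct node *n;

    for (n = buckets[b]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) {      /* key exists: replace value */
            char *v = dup_str(value);
            if (v == NULL)
                return -1;
            free(n->value);
            n->value = v;
            return 0;
        }
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = dup_str(key);
    n->value = dup_str(value);
    if (n->key == NULL || n->value == NULL) {
        free(n->key);
        free(n->value);
        free(n);
        return -1;
    }
    n->next = buckets[b];                    /* push onto the chain */
    buckets[b] = n;
    return 0;
}

/* Look up a key; returns the stored value, or NULL if absent. */
const char *map_get(const char *key)
{
    struct node *n;

    for (n = buckets[hash_str(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->value;
    return NULL;
}
```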
 
Dann Corbit

Richard Heathfield said:
Dann Corbit said:

"The server encountered an internal error and was unable to complete your
request. Error message:
The server encountered an internal error and was unable to complete your
request. Either the server is overloaded or there was an error in a CGI
script."

[sic]

So - that was an interesting counter-argument. :)

Article:

Space Efficient Hash Tables With Worst Case Constant Access Time (2003)

Dimitris Fotakis, Rasmus Pagh, Peter Sanders, Paul Spirakis
 
websnarf

Dann said:
Dann said:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversant in C++ but this needs to be implemented in C.

Would appreciate any ideas on how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.
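(Editorial aside: a sketch of the memcmp idea above, a binary search tree keyed on raw bit patterns. The fixed `KEY_SIZE` and the `tree_insert`/`tree_find` names are assumptions for illustration; the ordering memcmp imposes is arbitrary, but consistent, which is all exact-match lookup requires.)

```c
#include <stdlib.h>
#include <string.h>

#define KEY_SIZE 16   /* assumed fixed key width for this sketch */

struct tnode {
    unsigned char key[KEY_SIZE];
    void *value;
    struct tnode *left, *right;
};

/* Insert a key/value pair; keys are ordered by memcmp on their bit
   patterns, so no "natural" ordering on the key type is needed. */
struct tnode *tree_insert(struct tnode *root, const void *key, void *value)
{
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        if (n != NULL) {
            memcpy(n->key, key, KEY_SIZE);
            n->value = value;
            n->left = n->right = NULL;
        }
        return n;
    }
    int c = memcmp(key, root->key, KEY_SIZE);
    if (c < 0)
        root->left = tree_insert(root->left, key, value);
    else if (c > 0)
        root->right = tree_insert(root->right, key, value);
    else
        root->value = value;   /* duplicate key: overwrite the value */
    return root;
}

/* Find a key; returns the stored value, or NULL if absent. */
void *tree_find(const struct tnode *root, const void *key)
{
    while (root != NULL) {
        int c = memcmp(key, root->key, KEY_SIZE);
        if (c == 0)
            return root->value;
        root = (c < 0) ? root->left : root->right;
    }
    return NULL;
}
```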

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.
But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be about as fast as a comparison.
Since it happens N times instead of N*log(N) times, the hash computation
will be faster:

You mean on initial inserts. That's fine, but a common scenario is to
be performance limited by searching.

Lol! Did you just seriously quote that web page to me in an effort to
educate me about hash functions? Try this page:
http://burtleburtle.net/bob/hash/doobs.html.
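(Editorial aside: for concreteness, here is FNV-1a, one of the simple, fast hash functions named later in the thread. Its cost is O(key length), which is exactly the trade-off under discussion: one pass over the key versus O(log n) key comparisons in a tree.)

```c
#include <stdint.h>

/* 32-bit FNV-1a string hash: xor in each byte, then multiply
   by the FNV prime.  Constants are from the FNV specification. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```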
It's pretty unusual to have no idea how large your data set is.

It is? Why do you think this? What if the programmer is not in
control of the data set?
There are heaps and loads of hash functions with liberal licenses for use.

I know -- but many of them are a complete waste of time because they
are actually unusually *SLOW* because of the way they have been
implemented (which defeats the purpose, of course -- hashes are
supposed to be faster than trees in the best case scenario). Of the
ones I looked at, James' at least does not make this mistake.
Here, you may be onto something. I have seen real-world studies, where disk
storage is required, in which B+ trees are faster than hash tables.

Yeah, this is because B+ trees recover a lot of locality, and the
log(n) is divided by a reasonably high constant. B+ trees also reduce
the memory overhead to the point where you might expect them to take less
memory than a hash table.
 
Dann Corbit

Dann said:
Dann Corbit wrote:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversant in C++ but this needs to be implemented in C.

Would appreciate any ideas on how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.
But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be about as fast as a comparison.
Since it happens N times instead of N*log(N) times, the hash computation
will be faster:

You mean on initial inserts. That's fine, but a common scenario is to
be performance limited by searching.

Searching a well-designed hash table is O(1). See the article referenced
elsethread.
Lol! Did you just seriously quote that web page to me in an effort to
educate me about hash functions? Try this page:
http://burtleburtle.net/bob/hash/doobs.html .

It was not an attempt to educate you about hash functions. It was an
attempt to show you hash functions which are very fast to compute and which
have excellent dispersion. I am aware of the hash functions on that page,
and use them in a commercial product I wrote, along with FNV hash, UMAC and
several others.
It is?

Yes.

Why do you think this?

Experience.

What if the programmer is not in
control of the data set?

You mean like a database? The programmer will still have access to some
kind of estimate of the set size (cardinality). In the rare case of
streaming data with unknown start and end, a hash table will still be
better, given the right design.

If you are paranoid about it, you can make a hash table of skiplists.
I know -- but many of them are a complete waste of time because they
are actually unusually *SLOW* because of the way they have been
implemented (which defeats the purpose, of course -- hashes are
supposed to be faster than trees in the best case scenario). Of the
ones I looked at, James' at least does not make this mistake.

If it is memory-based, hash tables are going to be faster unless they are
badly designed. If the data is persistent, then it is a lot harder to get
hashes to be faster, but the advanced database systems do manage it.
 
Richard Bos

Eric Sosman said:
Dann Corbit wrote on 07/10/06 15:12:

Usually, yes -- but as Pitti-Sing says, "Bless you, it all depends!"


I'd extend this just a tiny bit: If range *operations*
are important, ordered data structures (trees, skip lists,
what-have-you) might be the answer. Even if all the searches
are for exact equality, it is sometimes important to be able
to access the "neighbors" of a located item, or to move
"forward" and/or "backward" from it.
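(Editorial aside: a small illustration of the point above, using a sorted array as the ordered structure. An exact-match lookup also yields the item's position, and hence its "neighbors", for free, which a hash table cannot provide. The data and the `find_index` name are assumed for illustration.)

```c
#include <stdlib.h>

/* Comparison function for bsearch over an int array. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Locate key in a sorted array and report its index, or -1 if absent.
   From the index, the "neighbors" are just positions i-1 and i+1. */
long find_index(const int *arr, size_t n, int key)
{
    const int *p = bsearch(&key, arr, n, sizeof *arr, cmp_int);
    return (p != NULL) ? (long)(p - arr) : -1;
}
```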

Item: database programming. Indexing, in particular.

Richard
 
