STL map kinda functionality in C

C

comp_novice

Hi,

I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Thanks in advance
DKRS.
 
R

Richard Heathfield

comp_novice said:
Hi,

I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. It is trivial to write a bst that can handle any
data type. If you wish to separate the key from the rest of the value,
that's up to you, but C doesn't really mind either way.

For information on binary search trees in C, check out http://adtinfo.org
which documents Ben Pfaff's GNU libavl library. IIRC Ben normally builds
his trees out of ints rather than the void *s you'll be needing, but the
techniques are basically the same.
 
B

Ben Pfaff

Richard Heathfield said:
For information on binary search trees in C, check out http://adtinfo.org
which documents Ben Pfaff's GNU libavl library. IIRC Ben normally builds
his trees out of ints rather than the void *s you'll be needing, but the
techniques are basically the same.

I think that the C Unleashed book implementation might have used
ints, but GNU libavl uses void *s.
 
M

Marc Boyer

Le 10-07-2006 said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

You could have a look at BPL, it offers a STL-like map
http://www.enseeiht.fr/~boyer/Tools.html

Any feedback welcome,
Marc Boyer
 
R

Richard Heathfield

Ben Pfaff said:
I think that the C Unleashed book implementation might have used
ints,
Indeed.

but GNU libavl uses void *s.

Oh, that's good - I wish I'd known; I could have saved myself some work. :)
 
W

websnarf

Richard said:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys? Why
don't we go for a hash table instead? Or at least ask if the keys are
lexical or not.
 
R

Richard Heathfield

(e-mail address removed) said:
Richard said:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.
Why
don't we go for a hash table instead?

I like trees.
 
W

websnarf

Richard said:
(e-mail address removed) said:
Richard said:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

Except if the key is a pointer to the *real* contents of the key. I.e.
"Hello" and strdup("Hello") might compare to be equal even though their
pointers do not compare as equal. (Of course string are lexical, but
obviously this idea extends to more abstract keys.)
 
R

Richard Heathfield

(e-mail address removed) said:
Except if the key is a pointer to the *real* contents of the key.

<shrug> The key that is a pointer to the real key is not itself the real key
(unless it points to itself, which isn't terribly useful).

Comparison functions deal with this in a trivially simple manner.
 
D

Dann Corbit

Richard Heathfield said:
(e-mail address removed) said:
Richard said:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.
Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

The only reason not to use a hash table is if you need to do range searches.
If all searches are equality searches, then the right answer is a hash
table.
 
B

Ben Pfaff

Dann Corbit said:
A hash table is much faster than a tree for all operations.

The only reason not to use a hash table is if you need to do range searches.
If all searches are equality searches, then the right answer is a hash
table.

I wish someone had pointed this out to me before I spent so much
time on carefully implementing binary trees. I probably would
never have done so if I'd realized that, actually, for all the
problems I was interested in solving at the time, hash tables
were a much better solution. (However, I was only 14 or so at
the time.)

These days, I'm better at thinking before I code.
 
D

Dann Corbit

Ben Pfaff said:
I wish someone had pointed this out to me before I spent so much
time on carefully implementing binary trees. I probably would
never have done so if I'd realized that, actually, for all the
problems I was interested in solving at the time, hash tables
were a much better solution. (However, I was only 14 or so at
the time.)

That would have been a terrible tragedy. Your AVL tree project is the
finest discussion on the topic that I have ever seen. And sometimes, we
want to search for ranges of things instead of exact matches. In such a
case, a tree is the right answer and a hash table is the wrong answer.
These days, I'm better at thinking before I code.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long
b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return
0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case
1:putchar(a[i&15]);break;}}}
 
E

Eric Sosman

Dann Corbit wrote On 07/10/06 15:12,:
A hash table is much faster than a tree for all operations.

Usually, yes -- but as Pitti-Sing says, "Bless you,
it all depends!"
The only reason not to use a hash table is if you need to do range searches.
If all searches are equality searches, then the right answer is a hash
table.

I'd extend this just a tiny bit: If range *operations*
are important, ordered data structures (trees, skip lists,
what-have-you) might be the answer. Even if all the searches
are for exact equality, it is sometimes important to be able
to access the "neighbors" of a located item, or to move
"forward" and/or "backward" from it.

But if all that's needed is a plain vanilla key-and-value
mapping, I'd agree with Dann: the hash table is the weapon of
choice.
 
M

Morris Dovey

Eric Sosman (in 1152562315.492306@news1nwk) said:

| Dann Corbit wrote On 07/10/06 15:12,:
||
|| A hash table is much faster than a tree for all operations.
|
| Usually, yes -- but as Pitti-Sing says, "Bless you,
| it all depends!"

I'm inclined to agree with Eric on this one - that it all depends on
the context. For relatively small key sizes (e.g. stock symbols), I've
been able to achieve my best lookup speeds using trie indexes.
 
W

websnarf

Dann said:
Richard Heathfield said:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.
Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

Furthermore, if your hash table has been improperly sized, then you
will pay unusually high reprobe penalities, or may thrash your memory
when you don't need to. A hash table usually never has 100% density
efficiency -- on average its half filled. If you make an
auto-resizable hash table, and assume that the number of entries is
generally monotonically increasing, then it means that periodically,
operations will cost O(tree-size) which can be extremely expensive, and
will not fly in real-time systems.

That all being said -- *usually* hash tables are faster. It assumes
you can compute the hash function very quickly, and that your table is
not encumbered by extra overhead for each operation. James Dow Allen
has written a hash table that satisfies these contraints (though I
personally dislike his programming style.)
The only reason not to use a hash table is if you need to do range searches.

Well, another case, not described above, is if you are implementing
something like a file system.
If all searches are equality searches, then the right answer is a hash
table.

A hash table is more general in the case where you *have* to use
equality searches. Otherwise you just have to understand your case to
know which is the best strategy.
 
D

Dann Corbit

Dann said:
Richard Heathfield said:
(e-mail address removed) said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrug> Any key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the
worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.
But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be as fast as comparison. Since it
happens N times instead of N*log(N) times, the hash computation will be
faster:
http://www.cse.yorku.ca/~oz/hash.html
Furthermore, if your hash table has been improperly sized, then you
will pay unusually high reprobe penalities, or may thrash your memory
when you don't need to. A hash table usually never has 100% density
efficiency -- on average its half filled. If you make an
auto-resizable hash table, and assume that the number of entries is
generally monotonically increasing, then it means that periodically,
operations will cost O(tree-size) which can be extremely expensive, and
will not fly in real-time systems.

It's pretty unusual to have no idea how large your data set is.
That all being said -- *usually* hash tables are faster. It assumes
you can compute the hash function very quickly, and that your table is
not encumbered by extra overhead for each operation. James Dow Allen
has written a hash table that satisfies these contraints (though I
personally dislike his programming style.)

There are heaps and loads of hash functions with liberal licenses for use.
Well, another case, not described above, is if you are implementing
something like a file system.

Here, you may be onto something. I have seen real-world studies where disk
storage is required where b+trees are faster than hash tables.
A hash table is more general in the case where you *have* to use
equality searches. Otherwise you just have to understand your case to
know which is the best strategy.

If you don't have equality searches, then hash tables are totally useless.
 
R

Richard Heathfield

Dann Corbit said:
A hash table is much faster than a tree for all operations.

Consider writing out the keys and values of all nodes, in key order (i.e.
print a dictionary, phone list, or whatever). Can you really do that faster
with a hash table than with a tree? I'm surprised.
 
D

Dann Corbit

Richard Heathfield said:
Dann Corbit said:


Consider writing out the keys and values of all nodes, in key order (i.e.
print a dictionary, phone list, or whatever). Can you really do that
faster
with a hash table than with a tree? I'm surprised.

No. Hash tables are not useful for ordered lists.
By all operations I meant "insert, update, delete, find." which are all
O(1).
Elsethread, I pointed out that hash tables are not useful for range
comparisons.
 
R

Richard Heathfield

Dann Corbit said:

By all operations I meant "insert, update, delete, find." which are all
O(1).

Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,181
Messages
2,570,970
Members
47,537
Latest member
BellCorone

Latest Threads

Top