No std::hash_map


saneman

I have read that hash_map is not part of the C++ standard and therefore
std::hash_map does not exist (even though it can be found in the SGI
STL).

But isn't hash_map just a std::map that is more efficient, implemented
with a hash table and a hash function instead of an underlying balanced
search tree?
 

Victor Bazarov

saneman said:
I have read that hash_map is not part of the C++ standard

....yet...

and therefore std::hash_map does not exist (even though it can be
found in the SGI STL). But isn't hash_map just a std::map that is more
efficient, implemented with a hash table and a hash function instead
of an underlying balanced search tree?

Not sure what you mean. 'std::map' does not use a hash function.

V
 

Erik Wikström

I have read that hash_map is not part of the C++ standard and therefore
std::hash_map does not exist (even though it can be found in the SGI
STL).

But isn't hash_map just a std::map that is more efficient, implemented
with a hash table and a hash function instead of an underlying balanced
search tree?

No, in the next version of the standard (and in TR1) there is something
called unordered_map, which is a hash map. From the name we can see one
important difference between std::map and a hash map: std::map is
ordered, which means that when iterating through the elements they are
sorted by their key, but when iterating over a hash map the elements
will be in no particular order*.

* Actually they will be ordered by the hash of the key, but for most
purposes that is the same thing as not ordered.
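
To make the difference concrete, here is a minimal sketch. It assumes a
compiler that already ships unordered_map in namespace std; with a pure
TR1 implementation the header is <tr1/unordered_map> and the class lives
in std::tr1 instead.

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main()
{
    std::map<std::string, int> ordered;
    std::unordered_map<std::string, int> hashed;
    const char* keys[] = { "pear", "apple", "mango" };
    for (int i = 0; i < 3; ++i) {
        ordered[keys[i]] = i;
        hashed[keys[i]] = i;
    }

    // Always prints "apple mango pear": std::map iterates in key order.
    for (std::map<std::string, int>::iterator it = ordered.begin();
         it != ordered.end(); ++it)
        std::cout << it->first << ' ';
    std::cout << '\n';

    // Prints the same keys in whatever order the hash table stores them.
    for (std::unordered_map<std::string, int>::iterator it = hashed.begin();
         it != hashed.end(); ++it)
        std::cout << it->first << ' ';
    std::cout << '\n';
}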
 

Juha Nieminen

Erik said:
No, in the next version of the standard (and in TR1) there is something
called unordered_map, which is a hash map. From the name we can see one
important difference between std::map and a hash map: std::map is
ordered, which means that when iterating through the elements they are
sorted by their key, but when iterating over a hash map the elements
will be in no particular order*.

Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.
 

Jerry Coffin

[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.
 

Jeff Schwab

Jerry said:
[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.

Could you please point to an example? I'd love to see both the theory
(explained in English) and the implementation.
 

Jerry Coffin

Jeff said:
Jerry said:
[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.

Could you please point to an example? I'd love to see both the theory
(explained in English) and the implementation.

I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.
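
To make the idea concrete, here is a rough sketch of that structure. It
is not a complete container; the class name, the fixed default bucket
count and the use of std::map as the per-bucket balanced tree are all
choices made up for the illustration, and the hash functor is left as a
template parameter (e.g. std::tr1::hash<Key> where available).

#include <cstddef>
#include <map>
#include <vector>

// Sketch only: a hash map whose buckets are balanced trees (std::map is
// typically a red-black tree). If every key lands in the same bucket,
// operations are still O(log N) -- the same bound std::map gives --
// instead of the O(N) of linked-list buckets.
template <typename Key, typename Value, typename Hash>
class tree_bucket_map {
public:
    explicit tree_bucket_map(std::size_t buckets = 101)
        : table_(buckets) {}

    void insert(const Key& key, const Value& value) {
        bucket_for(key)[key] = value;              // O(log M), M = bucket size
    }

    const Value* find(const Key& key) const {
        const std::map<Key, Value>& b = bucket_for(key);
        typename std::map<Key, Value>::const_iterator it = b.find(key);
        return it == b.end() ? 0 : &it->second;    // O(log M), not O(M)
    }

private:
    std::map<Key, Value>& bucket_for(const Key& key) {
        return table_[hash_(key) % table_.size()];
    }
    const std::map<Key, Value>& bucket_for(const Key& key) const {
        return table_[hash_(key) % table_.size()];
    }

    std::vector<std::map<Key, Value> > table_;
    Hash hash_;
};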
 

Jeff Schwab

Jerry said:
Jeff said:
Jerry said:
[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.

Could you please point to an example? I'd love to see both the theory
(explained in English) and the implementation.

I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.

Thanks, that's fair enough. It must be a good idea, since it seems so
simple in retrospect. :) I've never actually seen the buckets
implemented as anything other than linked lists.
 

Juha Nieminen

Jerry said:
I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.

How is this different from simply using one single balanced tree?

Since you are paying the overhead of a balanced tree anyway, adding
some hash map properties on top probably won't buy you any significant
performance improvement.
 

Erik Wikström

How is this different from simply using one single balanced tree?

Since you are paying the overhead of a balanced tree anyway, adding
some hash map properties on top probably won't buy you any significant
performance improvement.

If you are lucky with your hash algorithm and the values inserted, you
will only have one element per tree/bucket, giving you O(1) insertion,
removal, and retrieval. In a normal hash map (where the buckets are
linked lists) these operations are O(M), where M is the number of
elements in the bucket (again, if you are lucky, M == 1), but in the
worst case they are O(N), when all elements end up in the same bucket.
If the buckets are trees you get O(log N) instead.
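
A quick sketch of that worst case, using a deliberately degenerate hash
so every element ends up in the same bucket (the linked-list bucket
behaviour is what common unordered_map implementations do, not something
the draft standard mandates; again this assumes unordered_map is
available in namespace std):

#include <cstddef>
#include <cstdio>
#include <string>
#include <unordered_map>

// A deliberately terrible hash: every key lands in bucket 0.
struct degenerate_hash {
    std::size_t operator()(const std::string&) const { return 0; }
};

int main()
{
    std::unordered_map<std::string, int, degenerate_hash> m;
    char buf[16];
    for (int i = 0; i < 1000; ++i) {
        std::sprintf(buf, "key%d", i);
        m[buf] = i;          // each insert scans the one long bucket: O(N)
    }

    // With tree buckets this lookup would be O(log N); with list buckets
    // it walks a chain of up to 1000 entries.
    m.find("key999");
}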
 

Jerry Coffin

How is this different from simply using one single balanced tree?

Since you are paying the overhead of a balanced tree anyway, adding
some hash map properties on top probably won't buy you any significant
performance improvement.

Quite the contrary -- you only get treelike performance in the _worst_
case where your hashes all collide. Given a halfway reasonable hash
function, that's not going to happen -- each of your trees is going to
be nearly empty, so instead of O(log N), you get O(log (N/M)), where M
is the number of hash buckets in the table.

The other effect of this design is that it makes the size of the hash
table far less critical. In a typical hash table, you get really great
performance up to some load factor, then as you add more data, the
performance gets much worse very quickly. When you use linked lists for
collision resolution, you basically have two terms involved: one that's
(roughly) constant, and another that's linear. To keep reasonable
performance at all, you need to ensure that the table is large enough
that the constant term dominates, and the linear one stays linear on
such a tiny number that it has no practical effect.

By using trees for collision resolution, you turn that linear term into
a logarithmic term instead. For example, if your hash table is too small
for the data by a factor of 1000, with a conventional hash table you get
performance that's about 1000 times worse than with a large enough hash
table. With the logarithmic growth, it's only log2(1000) = ~10 times
worse instead.

That may not seem very significant at first (10 times worse performance
still might not seem very acceptable), but the effect is really much
larger than that: since the effect on a conventional hash table is SO
bad, nearly all conventional hash tables _must_ include some provision
for increasing the size of the table as it gets (nearly) full. These
fall into two categories: some basically rehash all the items in the
table when it gets close to full. Some do gradual resizing instead, to
distribute that high cost over a number of insertions. While this
distributes the cost more evenly, I believe it usually actually
increases the overall cost. OTOH, since it's virtually certain that such
resizing WILL be necessary, and most people aren't willing to allow a
few insertions to become drastically more expensive, you nearly always
end up paying that higher cost.

By using trees for collision resolution, you can nearly always avoid
that cost completely -- unless you start with a _tiny_ table, the
chances of overfilling by a large factor become nearly negligible. Since
resizing becomes more of a remote possibility instead of a near
certainty, you can even consider using rehashing instead of gradual
resizing, to gain some overall speed at the expense of possibly having a
very small number of insertions that might be extremely expensive.
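
For what it's worth, the TR1/C++0x unordered_map already exposes the
knobs this trade-off revolves around; a minimal sketch (same assumption
as before about unordered_map living in namespace std):

#include <iostream>
#include <unordered_map>

int main()
{
    std::unordered_map<int, int> m;
    std::cout << "initial buckets: " << m.bucket_count() << '\n';

    // Grow the table and redistribute every element as soon as there are
    // more elements than buckets -- the periodic rehashing cost described
    // above.
    m.max_load_factor(1.0f);
    for (int i = 0; i < 10000; ++i)
        m[i] = i;                               // several rehashes on the way
    std::cout << "buckets after 10000 inserts: " << m.bucket_count() << '\n';

    // rehash() can prepay that cost up front when the final size is known,
    // so rehashing is not needed during the inserts themselves.
    std::unordered_map<int, int> prepared;
    prepared.rehash(10000);
    for (int i = 0; i < 10000; ++i)
        prepared[i] = i;
}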
 

James Kanze

Jeff said:
Jerry said:
[ ... ]
I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.

Thanks, that's fair enough. It must be a good idea, since it seems so
simple in retrospect. :) I've never actually seen the buckets
implemented as anything other than linked lists.

The idea is well known, but little used in practice, because if
you have a good hash function, and a low enough fill ratio, it
results in slower accesses, not faster. There will usually be
only one or two elements in a bucket, and the higher constant
factor for a balanced tree will offset any gain when the trees
are so small.
 
