No std::hash_map


saneman

I have read that hash_map is not part of the C++ standard and therefore
std::hash_map does not exist (even though it can be found in the SGI
STL).

But isn't hash_map just a std::map that is more efficient, implemented
with a hash table and a hash function instead of an underlying balanced
search tree?
 

Victor Bazarov

saneman said:
I have read that hash_map is not part of the C++ standard

....yet...

and therefore std::hash_map does not exist (even though it can be
found in the SGI STL). But isn't hash_map just a std::map that is more
efficient, implemented with a hash table and a hash function instead
of an underlying balanced search tree?

Not sure what you mean. 'std::map' does not use a hash function.

V
 

Erik Wikström

I have read that hash_map is not part of the C++ standard and therefore
std::hash_map does not exist (even though it can be found in the SGI
STL).

But isn't hash_map just a std::map that is more efficient, implemented
with a hash table and a hash function instead of an underlying balanced
search tree?

No, in the next version of the standard (and in TR1) there is something
called unordered_map, which is a hash map. From the name we can see one
important difference between std::map and a hash map: std::map is
ordered, which means that when iterating through the elements they are
sorted by their key, but when iterating over a hash map the elements
will be in no particular order*.

* Actually they will be ordered by the hash of the key, but for most
purposes that is the same thing as not ordered.
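
To make the difference concrete, here is a minimal sketch. It assumes a
compiler that already ships unordered_map in namespace std; with a pure
TR1 implementation the header is <tr1/unordered_map> and the class lives
in std::tr1 instead.

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main()
{
    std::map<std::string, int> ordered;
    std::unordered_map<std::string, int> hashed;
    const char* keys[] = { "pear", "apple", "mango" };
    for (int i = 0; i < 3; ++i) {
        ordered[keys[i]] = i;
        hashed[keys[i]] = i;
    }

    // Always prints "apple mango pear": std::map iterates in key order.
    for (std::map<std::string, int>::iterator it = ordered.begin();
         it != ordered.end(); ++it)
        std::cout << it->first << ' ';
    std::cout << '\n';

    // Prints the same keys in whatever order the hash table stores them.
    for (std::unordered_map<std::string, int>::iterator it = hashed.begin();
         it != hashed.end(); ++it)
        std::cout << it->first << ' ';
    std::cout << '\n';
}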
 

Juha Nieminen

Erik said:
No, in the next version of the standard (and in TR1) there is something
called unordered_map, which is a hash map. From the name we can see one
important difference between std::map and a hash map: std::map is
ordered, which means that when iterating through the elements they are
sorted by their key, but when iterating over a hash map the elements
will be in no particular order*.

Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.
 

Jerry Coffin

[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.
 

Jeff Schwab

Jerry said:
[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.

Could you please point to an example? I'd love to see both the theory
(explained in English) and the implementation.
 

Jerry Coffin

Jeff said:
Jerry said:
[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.

Could you please point to an example? I'd love to see both the theory
(explained in English) and the implementation.

I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.
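
To make the idea concrete, here is a rough sketch of that structure. It
is not a complete container; the class name, the fixed default bucket
count and the use of std::map as the per-bucket balanced tree are all
choices made up for the illustration, and the hash functor is left as a
template parameter (e.g. std::tr1::hash<Key> where available).

#include <cstddef>
#include <map>
#include <vector>

// Sketch only: a hash map whose buckets are balanced trees (std::map is
// typically a red-black tree). If every key lands in the same bucket,
// operations are still O(log N) -- the same bound std::map gives --
// instead of the O(N) of linked-list buckets.
template <typename Key, typename Value, typename Hash>
class tree_bucket_map {
public:
    explicit tree_bucket_map(std::size_t buckets = 101)
        : table_(buckets) {}

    void insert(const Key& key, const Value& value) {
        bucket_for(key)[key] = value;              // O(log M), M = bucket size
    }

    const Value* find(const Key& key) const {
        const std::map<Key, Value>& b = bucket_for(key);
        typename std::map<Key, Value>::const_iterator it = b.find(key);
        return it == b.end() ? 0 : &it->second;    // O(log M), not O(M)
    }

private:
    std::map<Key, Value>& bucket_for(const Key& key) {
        return table_[hash_(key) % table_.size()];
    }
    const std::map<Key, Value>& bucket_for(const Key& key) const {
        return table_[hash_(key) % table_.size()];
    }

    std::vector<std::map<Key, Value> > table_;
    Hash hash_;
};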
 

Jeff Schwab

Jerry said:
Jeff said:
Jerry said:
[ ... ]
Another relevant difference is that O(log n) operations can be
guaranteed for a std::map, but they cannot be guaranteed for a hash map.

A hash-based map can be designed to guarantee O(log N) operations for
insertions, deletions, etc. Offhand I'm not sure whether TR1 and/or
C++0x make that requirement, but if they did it could be met.

Could you please point to an example? I'd love to see both the theory
(explained in English) and the implementation.

I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.

Thanks, that's fair enough. It must be a good idea, since it seems so
simple in retrospect. :) I've never actually seen the buckets
implemented as anything other than linked lists.
 

Juha Nieminen

Jerry said:
I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.

How is this different from simply using one single balanced tree?

Since you are paying the overhead of a balanced tree anyway, adding
some hash map properties on top probably won't buy you any significant
performance improvement.
 

Erik Wikström

How is this different from simply using one single balanced tree?

Since you are paying the overhead of a balanced tree anyway, adding
some hash map properties on top probably won't buy you any significant
performance improvement.

If you are lucky with your hash algorithm and the values inserted, you
will only have one element per tree/bucket, giving you O(1) insertion,
removal, and retrieval. In a normal hash map (where the buckets are
linked lists) these operations are O(M), where M is the number of
elements in the bucket (again, if you are lucky, M == 1), but in the
worst case they are O(N), when all elements end up in the same bucket.
If the buckets are trees you get O(log N) instead.
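
A quick sketch of that worst case, using a deliberately degenerate hash
so every element ends up in the same bucket (the linked-list bucket
behaviour is what common unordered_map implementations do, not something
the draft standard mandates; again this assumes unordered_map is
available in namespace std):

#include <cstddef>
#include <cstdio>
#include <string>
#include <unordered_map>

// A deliberately terrible hash: every key lands in bucket 0.
struct degenerate_hash {
    std::size_t operator()(const std::string&) const { return 0; }
};

int main()
{
    std::unordered_map<std::string, int, degenerate_hash> m;
    char buf[16];
    for (int i = 0; i < 1000; ++i) {
        std::sprintf(buf, "key%d", i);
        m[buf] = i;          // each insert scans the one long bucket: O(N)
    }

    // With tree buckets this lookup would be O(log N); with list buckets
    // it walks a chain of up to 1000 entries.
    m.find("key999");
}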
 

Jerry Coffin

How is this different from simply using one single balanced tree?

Since you are paying the overhead of a balanced tree anyway, adding
some hash map properties on top probably won't buy you any significant
performance improvement.

Quite the contrary -- you only get treelike performance in the _worst_
case where your hashes all collide. Given a halfway reasonable hash
function, that's not going to happen -- each of your trees is going to
be nearly empty, so instead of O(log N), you get O(log (N/M)), where M
is the number of hash buckets in the table.

The other effect of this design is that it makes the size of the hash
table far less critical. In a typical hash table, you get really great
performance up to some load factor, then as you add more data, the
performance gets much worse very quickly. When you use linked lists for
collision resolution, you basically have two terms involved: one that's
(roughly) constant, and another that's linear. To keep reasonable
performance at all, you need to ensure that the table is large enough
that the constant term dominates, and the linear one stays linear on
such a tiny number that it has no practical effect.

By using trees for collision resolution, you turn that linear term into
a logarithmic term instead. For example, if your hash table is too small
for the data by a factor of 1000, with a conventional hash table you get
performance that's about 1000 times worse than with a large enough hash
table. With the logarithmic growth, it's only log2(1000) = ~10 times
worse instead.

That may not seem very significant at first (10 times worse performance
still might not seem very acceptable), but the effect is really much
larger than that: since the effect on a conventional hash table is SO
bad, nearly all conventional hash tables _must_ include some provision
for increasing the size of the table as it gets (nearly) full. These
fall into two categories: some basically rehash all the items in the
table when it gets close to full. Some do gradual resizing instead, to
distribute that high cost over a number of insertions. While this
distributes the cost more evenly, I believe it usually actually
increases the overall cost. OTOH, since it's virtually certain that such
resizing WILL be necessary, and most people aren't willing to allow a
few insertions to become drastically more expensive, you nearly always
end up paying that higher cost.

By using trees for collision resolution, you can nearly always avoid
that cost completely -- unless you start with a _tiny_ table, the
chances of overfilling by a large factor become nearly negligible. Since
resizing becomes more of a remote possibility instead of a near
certainty, you can even consider using rehashing instead of gradual
resizing, to gain some overall speed at the expense of possibly having a
very small number of insertions that might be extremely expensive.
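
For what it's worth, the TR1/C++0x unordered_map already exposes the
knobs this trade-off revolves around; a minimal sketch (same assumption
as before about unordered_map living in namespace std):

#include <iostream>
#include <unordered_map>

int main()
{
    std::unordered_map<int, int> m;
    std::cout << "initial buckets: " << m.bucket_count() << '\n';

    // Grow the table and redistribute every element as soon as there are
    // more elements than buckets -- the periodic rehashing cost described
    // above.
    m.max_load_factor(1.0f);
    for (int i = 0; i < 10000; ++i)
        m[i] = i;                               // several rehashes on the way
    std::cout << "buckets after 10000 inserts: " << m.bucket_count() << '\n';

    // rehash() can prepay that cost up front when the final size is known,
    // so rehashing is not needed during the inserts themselves.
    std::unordered_map<int, int> prepared;
    prepared.rehash(10000);
    for (int i = 0; i < 10000; ++i)
        prepared[i] = i;
}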
 

James Kanze

Jeff said:
Jerry said:
[ ... ]
I don't know of an existing implementation, but the idea is pretty
simple. You create the hash map as a table of balanced trees (e.g. AVL
or R-B trees). In the worst case, if all your hashes collide, you're
working with a single balanced tree, which is exactly what you have with
std::set, std::map, std::multiset and std::multimap, so you get
precisely the same characteristics as they provide.

Thanks, that's fair enough. It must be a good idea, since it seems so
simple in retrospect. :) I've never actually seen the buckets
implemented as anything other than linked lists.

The idea is well known, but little used in practice, because if
you have a good hash function, and a low enough fill ratio, it
results in slower accesses, not faster. There will usually be
only one or two elements in a bucket, and the higher constant
factor for a balanced tree will offset any gain when the trees
are so small.
 
