unordered_set

desktop · Apr 5, 2008

I have read that the upcoming C++0x standard will include unordered_set
which is based on a hash_table.

If I would like to make an unordered_set containing my own objects:
myObj I assume I would do something like this:

unordered_set<myObj> u_set;
myObj m;
u_set.insert(m);

But how is a hash value calculated on some user specified object like
myObj? I have only read about hash functions working on integers or strings.

Erik WikstrÃ¶m · Apr 5, 2008

I have read that the upcoming C++0x standard will include unordered_set
which is based on a hash_table.

If I would like to make an unordered_set containing my own objects:
myObj I assume I would do something like this:

unordered_set<myObj> u_set;
myObj m;
u_set.insert(m);

But how is a hash value calculated on some user specified object like
myObj? I have only read about hash functions working on integers or strings.

You will have to supply a hash-function/functor yourself, suppose you
create a functor class called hashMyObj then you would use the
unordered_set like this:

std::unordered_set<myObj, hashMyObj> u_set;
myObj m;
u_set.insert(m);

desktop · Apr 6, 2008

Erik said:
You will have to supply a hash-function/functor yourself, suppose you
create a functor class called hashMyObj then you would use the
unordered_set like this:

std::unordered_set<myObj, hashMyObj> u_set;
myObj m;
u_set.insert(m);

Ok so I specify my own hash-function that works my own object (each
object could contain a unique string or integer).

Where do I find info on what kind of objects the default hash-function
supports?

Erik WikstrÃ¶m · Apr 6, 2008

Ok so I specify my own hash-function that works my own object (each
object could contain a unique string or integer).

Where do I find info on what kind of objects the default hash-function
supports?

I look in the latest draft of the next version*, which specifies that
specialisations of std::hash should be available for integers, floating
point types, std::string (and variants thereof), std::error_code, and
std::thread::id. Note however that std::unordered_* were included in TR1
and I do not know what specialisations were specified in TR1, or how
much of it is implemented in your version of the standard library (it
should be documented somewhere though).

* The latest draft of the next C++ standard can be found at the C++
committee's site, search the following page for working draft and select
the latest. http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/

Triple-DES · Apr 7, 2008

How come C++ library won't add something like this...

System.identityHashCode(theObject);

which returns internal address of the object by converting it into
an integer.

The user would have no need to specify a hashing function.

reinterpret_cast<size_t>(&theObject) does that. No reason for it to be
a library function.

DP

Kai-Uwe Bux · Apr 7, 2008

Triple-DES said:
reinterpret_cast<size_t>(&theObject) does that. No reason for it to be
a library function.

More of a problem is that the address of an object will generally not
satisfy the requirements of a hash function: objects that compare equal can
have different addresses but need to have identical hash values.

Best

Kai-Uwe Bux

Martin York · Apr 7, 2008

It's good enough for vast majority of cases (except Strings probably).

It is good for practically none, but if you must have it is trivial to
implement:

template<typename T>
size_t silly_hash_function(T const& object)
{
return reinterpret_cast<size_t>(&object);
}

James Kanze · Apr 7, 2008

On 2008-04-06 17:43:20 -0400, Razii <[email protected]> said:

Because it's not a good hash function.

It could be a good hash function, if that's what you want.

First, the function doesn't return the internal address,
converted to an integer, but rather a hash code (undefined how)
based on the identity of the object. Since the garbage
collector can move objects, using the address of the object as a
hash code would cause more than a few problems.

Second, if I understand correctly, the next version of the C++
standard will provide the equivalent. I think, in fact, that it
was you that told it to me: an implementation is required to
specialize std::hash for pointer types. And since C++ doesn't
allow objects to change addresses, generating a hash code from
the object's address is a perfectly valid (and in fact probably
the best) way to generate a hash code based on identity.

The real problem, of course, is that typically, you don't want a
hash code based on identity; you want one based on value. There
are exceptions, of course, but one frequence error I've seen in
Java is overriding the equals() function, and forgetting to
override hashCode() as well. In my experience, it's very, very
rare to use anything but a value type as a key.

James Kanze · Apr 8, 2008

"[G]enerating a hash code from the object's address" is not
the same as "return[ing the] address of the object".

Isn't that more or less what I said in the first paragraph of my
response?

The Java function cited above basically generates a hash code
somehow from the identity of the object. Since C++ doesn't
allow moving objects around in memory, the identity is basically
the object's address, and generating a hash code from the
object's address in C++ does more or less what
System.identityHashCode( object ) does in Java.

And hashing on pointer types when someone is putting pointers
into a container is not the same as hashing on object
addresses when someone hasn't written their own hash function.

Now that seems obvious. But the only use of
System.identityHashCode() in Java is to if you've written a
class which uses == (and not Object.equals()) for equality on
some of its subobjects.

As I said (somewhere, I think), one frequent error in Java is
overriding Object.equals(), but not Object.hashCode(). To be
fair, the Java documentation is very, very clear about the need
to override Object.hashCode() whenever you override
Object.equals(). A more robust implementation of
Object.hashCode() in the base class would have been something
along the lines of:

if ( equals is overridden ) {
return 0 ;
} else {
return System.identityHashCode( this ) ;
}

But robustness has never been a particular characteristic of
Java.

(BTW: maybe I am misinterpreting your original statements. With
regards to hash codes, I tend to consider good/bad as a
different distinction than correct/incorrect, with good/bad only
applying to correct hash codes. Thus, "return 0" is a correct
hash code, but it certainly cannot be considered a good hash
code. If by "not good", you meant "incorrect given the usual
equivalence function used", and not "correct, but with a bad
distribution", then I agree.)

TF-IDF	2	Aug 19, 2021
Universal hashing	5	Apr 7, 2008
Problem with references	1	Nov 3, 2009
Error when using std::less	2	Jun 10, 2007
how to use a pool	4	Nov 26, 2008
std::set versus tr1::unordered_set	1	Jun 3, 2005
Tasks	1	Nov 29, 2022
Map and memory issues	3	Dec 13, 2007

unordered_set

desktop

Erik WikstrÃ¶m

desktop

Erik WikstrÃ¶m

Triple-DES

Kai-Uwe Bux

Martin York

James Kanze

James Kanze

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads