unordered_set

D

desktop

I have read that the upcoming C++0x standard will include unordered_set
which is based on a hash_table.

If I would like to make an unordered_set containing my own objects:
myObj I assume I would do something like this:

unordered_set<myObj> u_set;
myObj m;
u_set.insert(m);

But how is a hash value calculated on some user specified object like
myObj? I have only read about hash functions working on integers or strings.
 
E

Erik Wikström

I have read that the upcoming C++0x standard will include unordered_set
which is based on a hash_table.

If I would like to make an unordered_set containing my own objects:
myObj I assume I would do something like this:

unordered_set<myObj> u_set;
myObj m;
u_set.insert(m);

But how is a hash value calculated on some user specified object like
myObj? I have only read about hash functions working on integers or strings.

You will have to supply a hash-function/functor yourself, suppose you
create a functor class called hashMyObj then you would use the
unordered_set like this:

std::unordered_set<myObj, hashMyObj> u_set;
myObj m;
u_set.insert(m);
 
D

desktop

Erik said:
You will have to supply a hash-function/functor yourself, suppose you
create a functor class called hashMyObj then you would use the
unordered_set like this:

std::unordered_set<myObj, hashMyObj> u_set;
myObj m;
u_set.insert(m);

Ok so I specify my own hash-function that works my own object (each
object could contain a unique string or integer).

Where do I find info on what kind of objects the default hash-function
supports?
 
E

Erik Wikström

Ok so I specify my own hash-function that works my own object (each
object could contain a unique string or integer).

Where do I find info on what kind of objects the default hash-function
supports?

I look in the latest draft of the next version*, which specifies that
specialisations of std::hash should be available for integers, floating
point types, std::string (and variants thereof), std::error_code, and
std::thread::id. Note however that std::unordered_* were included in TR1
and I do not know what specialisations were specified in TR1, or how
much of it is implemented in your version of the standard library (it
should be documented somewhere though).

* The latest draft of the next C++ standard can be found at the C++
committee's site, search the following page for working draft and select
the latest. http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/
 
T

Triple-DES

How come C++ library won't add something like this...

System.identityHashCode(theObject);

which returns internal address of the object by converting it into
an integer.

The user would have no need to specify a hashing function.

reinterpret_cast<size_t>(&theObject) does that. No reason for it to be
a library function.

DP
 
K

Kai-Uwe Bux

Triple-DES said:
reinterpret_cast<size_t>(&theObject) does that. No reason for it to be
a library function.

More of a problem is that the address of an object will generally not
satisfy the requirements of a hash function: objects that compare equal can
have different addresses but need to have identical hash values.


Best

Kai-Uwe Bux
 
M

Martin York

It's good enough for vast majority of cases (except Strings probably).

It is good for practically none, but if you must have it is trivial to
implement:

template<typename T>
size_t silly_hash_function(T const& object)
{
return reinterpret_cast<size_t>(&object);
}
 
J

James Kanze

On 2008-04-06 17:43:20 -0400, Razii <[email protected]> said:
Because it's not a good hash function.

It could be a good hash function, if that's what you want.

First, the function doesn't return the internal address,
converted to an integer, but rather a hash code (undefined how)
based on the identity of the object. Since the garbage
collector can move objects, using the address of the object as a
hash code would cause more than a few problems.

Second, if I understand correctly, the next version of the C++
standard will provide the equivalent. I think, in fact, that it
was you that told it to me: an implementation is required to
specialize std::hash for pointer types. And since C++ doesn't
allow objects to change addresses, generating a hash code from
the object's address is a perfectly valid (and in fact probably
the best) way to generate a hash code based on identity.

The real problem, of course, is that typically, you don't want a
hash code based on identity; you want one based on value. There
are exceptions, of course, but one frequence error I've seen in
Java is overriding the equals() function, and forgetting to
override hashCode() as well. In my experience, it's very, very
rare to use anything but a value type as a key.
 
J

James Kanze

"[G]enerating a hash code from the object's address" is not
the same as "return[ing the] address of the object".

Isn't that more or less what I said in the first paragraph of my
response?

The Java function cited above basically generates a hash code
somehow from the identity of the object. Since C++ doesn't
allow moving objects around in memory, the identity is basically
the object's address, and generating a hash code from the
object's address in C++ does more or less what
System.identityHashCode( object ) does in Java.
And hashing on pointer types when someone is putting pointers
into a container is not the same as hashing on object
addresses when someone hasn't written their own hash function.

Now that seems obvious. But the only use of
System.identityHashCode() in Java is to if you've written a
class which uses == (and not Object.equals()) for equality on
some of its subobjects.

As I said (somewhere, I think), one frequent error in Java is
overriding Object.equals(), but not Object.hashCode(). To be
fair, the Java documentation is very, very clear about the need
to override Object.hashCode() whenever you override
Object.equals(). A more robust implementation of
Object.hashCode() in the base class would have been something
along the lines of:

if ( equals is overridden ) {
return 0 ;
} else {
return System.identityHashCode( this ) ;
}

But robustness has never been a particular characteristic of
Java.

(BTW: maybe I am misinterpreting your original statements. With
regards to hash codes, I tend to consider good/bad as a
different distinction than correct/incorrect, with good/bad only
applying to correct hash codes. Thus, "return 0" is a correct
hash code, but it certainly cannot be considered a good hash
code. If by "not good", you meant "incorrect given the usual
equivalence function used", and not "correct, but with a bad
distribution", then I agree.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

TF-IDF 2
Universal hashing 5
Problem with references 1
Error when using std::less 2
how to use a pool 4
std::set versus tr1::unordered_set 1
Tasks 1
Map and memory issues 3

Members online

No members online now.

Forum statistics

Threads
474,176
Messages
2,570,949
Members
47,500
Latest member
ArianneJsb

Latest Threads

Top