c-style string vs std::string

B

BGB

Actually implementing a good hash table is not trivial, for two reasons.

Firstly, the decision of which kind of hash table. Unlike eg. red-black
trees, there are many different possible hash table implementations, and
there's no one single that would be optimal. What type of hash table is
best may actually depend on what kind of data is inserted into it.

fair enough, one has to decide on the type of hash table, ...
but they are not all that complicated code-wise, and are typically not
all that large (WRT code), albeit a factor worth consideration is the
size (too small will not be very effective or will fail, depending on
variety, and too large may waste memory or actually perform worse than a
smaller hash).

however, "rules of thumb" are generally reasonably effective.

Secondly, a naive hashing function may have hidden surprises. It might
seem to work like a charm in all testcases... but then there may be some
pathological input (which is not even necessarily deliberately chosen to
break the hash table, but could be natural input from somewhere) that will
trigger the weakness in the hashing function (which usually presents itself
by the vast majority of the elements ending up in only a fraction of the
hash table positions, in other words, many elements having the same hash
values). This pathological situation might not be discovered in testing,
only when a client somewhere uses it with some unexpected input.

I have had generally good success with polynomial hashes (generally
better collision behavior than the more traditional shifts+xors hashes).

there are also some nifty (almost magical) properties of prime-numbers
when used in hash-functions and similar (note: mersenne and non-mersenne
primes exhibit different properties).


found this:
http://en.wikipedia.org/wiki/Universal_hashing

one of the algos mentioned is similar to one I often use.

however, I developed my variant on my own, initially mostly by fiddling
and similar...

The major inefficiency in std::set and std::map is the memory allocation
that happens with each element. However, this inefficiency can be greatly
alleviated by using a fast memory allocator (which all STL containers
support). Such an allocator can make those containers faster by even an
order of magnitude or so.

Of course using a ready-made implementation of a data container (be it
std::set, std::map or their unordered variants in the new standard) also
reduces the amount of bug-hunting in the program, which is always a great
asset.

possibly, but I have generally not had too many problems with bugs WRT
hashing, but OTOH, this is also one of those things that one may end up
internalizing due to frequent use.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,142
Messages
2,570,818
Members
47,362
Latest member
eitamoro

Latest Threads

Top