How is this different from simply using a single balanced tree?
Since you have the overhead of a balanced tree anyway, adding some
properties of hashmaps to the mix probably won't yield any significant
performance improvement.
Quite the contrary -- you only get tree-like performance in the _worst_
case, where all your hashes collide. Given a halfway reasonable hash
function, that's not going to happen -- each of your trees is going to
be nearly empty, so instead of O(log N), you get O(log (N/M)), where M
is the number of buckets in the hash table.
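To make that concrete, here's a minimal sketch (in Java, with the keys
required to be Comparable; the class and its names are mine for
illustration, not any particular library's implementation) of a
fixed-size table whose buckets are balanced trees rather than linked
lists:

    import java.util.TreeMap;

    // Toy table: a fixed array of M buckets, each one a balanced tree.
    // With N items spread over M buckets, each tree holds roughly N/M
    // entries, so a lookup costs O(log(N/M)) comparisons, not O(log N).
    class TreeBucketTable<K extends Comparable<K>, V> {
        private final TreeMap<K, V>[] buckets;

        @SuppressWarnings("unchecked")
        TreeBucketTable(int bucketCount) {
            buckets = new TreeMap[bucketCount];
            for (int i = 0; i < bucketCount; i++) {
                buckets[i] = new TreeMap<>();
            }
        }

        private TreeMap<K, V> bucketFor(K key) {
            return buckets[Math.floorMod(key.hashCode(), buckets.length)];
        }

        void put(K key, V value) { bucketFor(key).put(key, value); }

        V get(K key) { return bucketFor(key).get(key); }
    }

With a decent hash function and a halfway sensible bucket count, each of
those TreeMaps stays close to empty, which is exactly the "nearly empty
trees" case described above.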
The other effect of this design is that it makes the size of the hash
table far less critical. In a typical hash table, you get really great
performance up to some load factor, then as you add more data, the
performance gets much worse very quickly. When you use linked lists for
collision resolution, the lookup cost basically has two terms: one
that's (roughly) constant -- hashing the key and indexing into the
table -- and another that's linear in the average chain length, N/M. To
keep reasonable performance at all, you need to ensure that the table
is large enough that the constant term dominates, and the linear term
applies to such a tiny number that it has no practical effect.
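As a back-of-the-envelope sketch of those two terms (the numbers and
names here are my own illustration, not measurements of any real
implementation):

    public class ChainCost {
        // The (roughly) constant term: hashing the key and indexing a bucket.
        static final double HASH_AND_INDEX = 1.0;

        // The linear term: on average, a chain holds about N/M nodes to walk.
        static double expectedLookupSteps(long n, long m) {
            return HASH_AND_INDEX + (double) n / m;
        }

        public static void main(String[] args) {
            long m = 1_000; // bucket count, chosen arbitrarily
            for (long n : new long[] { 500, 1_000, 10_000, 1_000_000 }) {
                System.out.printf("N=%,d M=%,d -> ~%.1f steps per lookup%n",
                                  n, m, expectedLookupSteps(n, m));
            }
        }
    }

As long as N/M stays around 1 or below, the constant term dominates;
once N/M climbs into the hundreds or thousands, the linear term is
essentially the whole cost.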
By using trees for collision resolution, you turn that linear term into
a logarithmic term instead. For example, if your hash table is too small
for the data by a factor of 1000, with a conventional hash table you get
performance that's about 1000 times worse than with a large enough hash
table. With the logarithmic growth, it's only log2(1000) = ~10 times
worse instead.
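Spelling out that factor-of-1000 comparison (again, just illustrative
arithmetic, not a benchmark):

    public class Degradation {
        public static void main(String[] args) {
            // Table undersized by a factor of 1000: each bucket ends up
            // holding roughly 1000 items.
            double itemsPerBucket = 1_000.0;

            // Linked-list buckets: walk ~1000 nodes per lookup.
            double listSteps = itemsPerBucket;

            // Tree buckets: descend a balanced tree of ~1000 items.
            double treeSteps = Math.log(itemsPerBucket) / Math.log(2);

            System.out.printf("list bucket: ~%.0f steps, tree bucket: ~%.1f steps%n",
                              listSteps, treeSteps);
        }
    }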
That may not seem very significant at first (10 times worse performance
still might not seem very acceptable), but the effect is really much
larger than that: since the effect on a conventional hash table is SO
bad, nearly all conventional hash tables _must_ include some provision
for increasing the size of the table as it gets (nearly) full. These
provisions fall into two categories: some basically rehash all the
items in the table when it gets close to full; others do gradual
resizing instead, to spread that high cost over a number of
insertions. While this
distributes the cost more evenly, I believe it usually actually
increases the overall cost. OTOH, since it's virtually certain that such
resizing WILL be necessary, and most people aren't willing to allow a
few insertions to become drastically more expensive, you nearly always
end up paying that higher cost.
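For reference, here's a minimal sketch of the first category -- rehash
everything at once when the table gets too full. The class, the 3/4
threshold, and the doubling policy are assumptions I've picked for
illustration, not a description of any particular implementation:

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    // Conventional chained table that pays the whole resizing cost in one
    // insertion: when the load factor passes a threshold, every item gets
    // rehashed into a larger bucket array.
    public class RehashAllTable<K, V> {
        private static final class Entry<K, V> {
            final K key;
            final V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        private List<LinkedList<Entry<K, V>>> buckets;
        private int size;

        public RehashAllTable(int initialBuckets) {
            buckets = newBuckets(initialBuckets);
        }

        private List<LinkedList<Entry<K, V>>> newBuckets(int count) {
            List<LinkedList<Entry<K, V>>> b = new ArrayList<>(count);
            for (int i = 0; i < count; i++) b.add(new LinkedList<>());
            return b;
        }

        public void put(K key, V value) {
            // The expensive moment: one unlucky insertion triggers a full
            // rehash of everything already in the table.
            if (size + 1 > buckets.size() * 3 / 4) {
                rehash(buckets.size() * 2);
            }
            LinkedList<Entry<K, V>> chain = bucketFor(key);
            boolean replaced = chain.removeIf(e -> e.key.equals(key));
            chain.add(new Entry<>(key, value));
            if (!replaced) size++;
        }

        private LinkedList<Entry<K, V>> bucketFor(K key) {
            return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
        }

        private void rehash(int newBucketCount) {
            List<LinkedList<Entry<K, V>>> old = buckets;
            buckets = newBuckets(newBucketCount);
            for (LinkedList<Entry<K, V>> chain : old) {
                for (Entry<K, V> e : chain) {
                    bucketFor(e.key).add(e);
                }
            }
        }
    }

Gradual resizing does the same migration work; it's just smeared across
many put calls instead of landing on one of them.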
By using trees for collision resolution, you can nearly always avoid
that cost completely -- unless you start with a _tiny_ table, the
chances of overfilling by a large factor become nearly negligible. Since
resizing becomes a remote possibility rather than a near certainty, you
can even consider using all-at-once rehashing instead of gradual
resizing, gaining some overall speed at the expense of a very small
number of insertions that might be extremely expensive.