R
Roy Smith
Marko Rauhamaa said:The O(f(n)) notation has no meaning when n is limited.
This thing is not just pedantry. The discussion was how a balanced tree
fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
presented as a proof that balanced trees don't have a chance in practice
or theory. I wasn't so easily convinced.
Marko
On the other hand, log n, for n = 1 million, is just 20. It's not hard
to imagine a hash function which costs 20x what a node traversal does,
in which case, the log n lookup is ahead for all n < 1 million.
Looking at the Songza source, I see we have one user-defined hash
function:
def __hash__(self):
return hash((self.song,
self.user,
self.add_time,
self.delete_time,
self.play_first))
where those things are 2 bson.ObjectId's, 2 datetimes, and a boolean. I
wouldn't be surprised if that falls into the 20x node traversal bucket.
In this case, convenience was more important than speed, so it doesn't
matter.
The hash vs. tree argument can get very complicated. For example, if
your tree is not completely resident in memory, the cost of paging in a
node will swamp everything else, and improving lookup speed will boil
down to reducing the number of I/O operations you need. An expensive
hash plus a linear walk through a collision chain which was resident in
a single memory block will beat traversing two nodes, each of which had
to be paged in separately.