Giovanni said:
You are wrong here, in general HashSet can end up worst case O(n) as in this
You mean O(m), right?
particular case being discussed, unless you make the wrong assumption that
all words in a dictionary fall each under a separate HashMap bucket and this
is NOT possible, there is no such hash function. This is why Matthews
explicitly mentioned and chose the binary search which is always worst case
O(log n).
There will be a small List or similar structure at each bucket of the
Set, but generally speaking those lists will be very small relative to
m. That is why the Javadocs for HashSet claim constant-time
performance for HashSet#contains(). Are you saying the Javadocs are
wrong?
It is not common to do binary searches on HashSets.
HashSet lookups tend to be much faster than binary searches because
the hash lookup takes one to the correct bucket in O(1) time, then
there is an O(x) search through the list at that bucket, where x is
some very small number. The nature of the hashing alogrithm should
keep x more-or-less constant with respect to m, thus the claim of
constant-time complexity for 'contains()' is not invalidated.
Again, this is the claim that the Javadocs make. I feel very
comfortable agreeing with the Javadocs on this matter.
Another excerpt from the HashMap javadoc "This class makes no guarantees as
to the order of the map; in particular, it does not guarantee that the order
will remain constant over time."
For this very reason the binary search is the right choice and not a
HashMap.
Except that a HashMap gives O(1) performance and the complexity
measure of a binary search is much worse.
Order of the HashMap is not relevant; one finds the correct entry
directly via the search-term hash and a short linear search through
the matching bucket. The size of each bucket does not depend on m for
typical dictionaries.
You are wrong again, the constant time is defined as O(c) and not as O(1)
Wikipedia agrees with me:
<
http://en.wikipedia.org/wiki/Big-O_notation>
Note the first table entry of
<
http://en.wikipedia.org/wiki/Big-
O_notation#Orders_of_common_functions>
Note that one of the algorithms given as having O(1) complexity in
that table is "using a constant-size lookup table or hash table".
even a HashMap lookup involves a small number of operations and that is not
1. In general constant time is denoted using a constant e.g. c
Not according to my math professors or any source I've read on big-O
notation. They all use "O(1)". See the Wikipedia reference that I
cited.
Generally yes, but in this particular problem you assume that searching in a
dictionary is constant time using a HashMap and you are sadly mistaken.
I am not mistaken, nor happily nor sadly, if Wikipedia and Sun's
Javadocs are to be believed. I've quoted Wikipedia's assertion that
hash table lookups are O(1). The Javadocs for HashMap state
explicitly, "This implementation provides constant-time performance
for the basic operations (get and put) ...".
I think I will believe the Javadocs. This belief is supported by
understanding the algorithm at the heart of the HashMap#get()
operation.
I agree that their analysis does not account for the time it takes to
sort the 'n' characters of the search term and the O(n) calculation of
the hash code for the search term. Since n is far less than m,
typically no more than ten and nearly never above a hundred for most
human languages, we can consider that the search term length is not as
severe a factor.