J
James Kanze
On Nov 21, 6:29 pm, (e-mail address removed)-graz.ac.at (GJ Woeginger)
wrote:
I was posed the following question in a technical
interview for a Software Engineering position by a
major multinational NASDAQ company:
[Paraphrasing] "You are given an array of 1,000,000
32-bit integers. One int value x occurs 500,001 times
or more in the array. Specify an algorithm to
determine x."
The assumption being extra points for efficiency.
There is an old analysis of this problem by Fischer and Salzberg.
M.J. Fisher and S.L. Salzberg (1982)
Finding a majority among n votes.
Journal of Algorithms 3, pp 143-152.
If 2k elements contain a majority element (= an element
that occurs at least k+1 times), then you can find it
with 3k-2 element comparisons (and some small overhead).
The basic idea in their algorithm is that whenever you
find two distinct elements, then you can destroy both
without changing the majority element among the
remaining elements. This yields:
Run once through the array, and maintain a
majority-candidate and a counter. The
majority-candidate is initialized as the first element,
and the counter (counting how often you have seen the
candidate) is initialized at 1.
If you hit the current candidate again, increment the counter.
If you hit another element, decrement the counter.
If the counter becomes 0, drop the current candidate and start from
scratch with the remaining part of the array.
Fischer and Salzberg also show that this bound 3k-2 is
best possible in the worst case (and that's the main
part of their paper).
If I understand your description than it would look like:
int findmode(int* p, int n)
{
int x = p[0];
int c = 1;
for (int i = 1; i < n; i++)
{
if (c == 0)
{
x = p;
c = 1;
}
else if (p == x)
c++;
else
c--;
}
return x;
}
It seems that this could only produce at maximum (2k-1)
comparisions in the worst case, not (3k-2) as you claim?
Oh wait... the (c==0) counts as a comparison. I see it now.
Ignore me.
I don't think the (c==0) counts as a comparison.
Of course it does. Why wouldn't it?
I think the other k-1 comparisons are for the 2nd
part of the algorithm, when you make one more pass
through the array verifying the candidate found in
the 1st part. [You already know the index i for
one occurrence of the candidate, so only the other
k-1 indices have to be checked.]
What second part? There is no second pass.