Using Java Classes to Sort a Small Array Quickly

Eric Sosman

Arne said:
Wanja Gayk wrote:
(e-mail address removed) says...
Wanja Gayk wrote:
Here's the proof: Sorting an array of positive integers in O(n) time: ...
Very nice! Would you care to try this approach on a shorter
input array, like

data = new int[] { Integer.MAX_VALUE };

This case should be quite simple, since the array is already sorted.
Let us know how you make out, will you?

I didn't say it works for any array out there, did I?

By using big-O notation, you said something about the behavior of the
algorithm as the size of the data set tends to infinity, so yes, you did.
No.

But a solution is a bit more practical if it does.

Wanja was making a joke. The performance of an algorithm on one
particular data set is irrelevant to a discussion of asymptotic
performance.

What about correctness? Try his code on the one-element array
I offered: I guarantee that performance will not be an issue. (In
particular, I guarantee you will not get a correct result *and* you
will not run out of memory.)
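
[Editorial sketch, for readers following along: here is a minimal
counting-sort in the spirit of the elided "O(n)" code. This is my
reconstruction, not Wanja's actual posting, but it makes Eric's
objection concrete: the bucket array is sized by the largest value
present, so Integer.MAX_VALUE demands roughly 2^31 counters, about
8 GiB of int, before a single element is placed.]

// Counting sort, assuming non-negative ints: O(n + k), k = largest value.
static void countingSort(int[] data) {
    int max = 0;
    for (int v : data) {
        if (v > max) max = v;
    }
    // One counter per possible value. For Integer.MAX_VALUE, max + 1
    // overflows to a negative array size (NegativeArraySizeException),
    // and even without the overflow the counters would need 2^31 ints.
    int[] counts = new int[max + 1];
    for (int v : data) counts[v]++;
    int i = 0;
    for (int v = 0; v <= max; v++) {
        for (int c = counts[v]; c > 0; c--) data[i++] = v;
    }
}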
Bucket sort is O(n) in n.

It depends on what you choose to count, and how. Question: How
much time does it take to select between K alternatives? You need
at least lg(K) decisions. Even if those decisions are being made at
the level of individual transistors you need more time to coordinate
a thousand transistors than to coordinate three. In hardware, this
shows up as a slower clock speed, and in other ways as well, so it
is an error to consider "time to decision" as a constant.

Depending on how you choose to count. Many things can be proven
if you select your counting schemes with due care ...

There's a famous algorithm for finding the largest of an unlimited
quantity of positive numbers: You represent the numbers as chunks of
uncooked spaghetti, each of a length proportional to the number
represented. That's your input. Algorithm: Pick up the whole bunch,
turn it upright, plonk one end down on the kitchen counter, and pluck
out the tallest spaghetto (spaghettus?). Voila! An O(1) maximum-
finding algorithm!

Discerning the flaws in this method is left as an exercise for
the astute reader.
Big O for execution speed traditionally does not look at
memory restrictions and data type restrictions.

Big O for proton density traditionally does not look at execution
time. Big O for the error in Stirling's approximation traditionally
does not look at proton density. Big O for X traditionally does not
look at Y. So?
 

Eric Sosman

Very nice! Would you care to try this approach on a shorter
input array, like

data = new int[] { Integer.MAX_VALUE };

This case should be quite simple, since the array is already sorted.
Let us know how you make out, will you?

I didn't say it works for any array out there, did I?

You did. Claiming an algorithm is O(n) means claiming that that
is the upper bound.

No - big O for algorithms is usually average case, not worst case.

Big O is for whatever quantity it is measuring.
I am sure you can find plenty of quotes for quicksort being
O(nlogn) and not O(n^2).

Quicksort takes O(n*log n) *average* time, O(n*n) *worst case*
time. Neither of the Big O's is in any way "preferable" to the
other: They are describing different aspects of the algorithm.
There is no reason to *expect* "average time" and "worst case time"
and "time off for good behavior" to have similar Big-O expressions.
 

markspace

I'm pretty sure that is not correct. Quoting Wikipedia:

"Informally, especially in computer science, the Big O notation often is
permitted to be somewhat abused to describe an asymptotic tight bound
where using Big Theta Θ notation might be more factually appropriate in
a given context."

That is to say, Big O is the upper bound. Big Theta is the average case
or the expected case.


Quicksort takes O(n*log n) *average* time, O(n*n) *worst case*
time.


Both of these are also incorrect, I think. The sort time for quick sort
is O(n log n) for RANDOM data. This can be proven. The worst case is
O(n^2) and that's for already sorted data, a not unreasonable input. So
the n log n number for quicksort is for a particular arrangement of
data, not the actual upper bound.
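
[Editorial sketch: to make the sorted-input worst case concrete, here
is a bare-bones quicksort with the first element as pivot. This is a
deliberately naive illustration, not any library's implementation. On
already-sorted input every partition peels off a single element, so the
recursion depth reaches n and the total work is quadratic.]

// Naive quicksort: first element as pivot (hole-style partition).
static void quicksort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[lo];
    int i = lo, j = hi;
    while (i < j) {
        while (i < j && a[j] >= pivot) j--;
        a[i] = a[j];
        while (i < j && a[i] <= pivot) i++;
        a[j] = a[i];
    }
    a[i] = pivot;
    quicksort(a, lo, i - 1);   // on sorted input: always empty
    quicksort(a, i + 1, hi);   // on sorted input: always n - 1 elements
}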

The random data argument is a bit like the spaghetti sort. It works
well for certain types of input, like spaghetti, not so well with other
types of data.
 

Lew

I'm pretty sure that is not correct. Quoting Wikipedia:

"Informally, especially in computer science, the Big O notation often is
permitted to be somewhat abused to describe an asymptotic tight bound
where using Big Theta Θ notation might be more factually appropriate in
a given context."

That is to say, Big O is the upper bound. Big Theta is the average case
or the expected case.

O is upper bound. Omega is lower bound. Theta is both upper and lower, or "tight" bound. Any of the three can describe average case, best case, expected case or worst case, according to what is being bounded.

As is explained by the very site you cite:

<http://en.wikipedia.org/wiki/Big_O_notation>
 

Joshua Cranmer

That is to say, Big O is the upper bound. Big Theta is the average case
or the expected case.

The first part of that statement I might accept as a simplification,
while the second sentence I would definitely not.

f(x) = Theta(g(x)) if and only if lim x->inf f(x)/g(x) is neither 0 nor
infinity; in other words, f(x) and g(x) have the same asymptotic
behavior up to a constant.
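
[Editorial note, textbook material added for completeness: the standard
two-constant definition does not depend on the limit existing:

    f(x) = Theta(g(x)) iff there are constants c1 > 0, c2 > 0 and x0
    such that c1*g(x) <= f(x) <= c2*g(x) for all x >= x0.

For example, f(x) = (2 + sin x)*x is Theta(x) even though f(x)/x
oscillates between 1 and 3 and never converges, so the limit test is
sufficient but not necessary.]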

Big-O really just means that the function will grow no faster than
another; in a way, it's a pessimistic bound: you can prove that your
function is O(n^3), but it could still be O(n^2) and you can't prove
that it is. If a function is Theta(n^3), however, there is no way in
hell that it could be O(n^2); you have proven that it is exactly cubic
in all cases.

You can use any of these for best-case, average-case, or worst-case
performance, although using big-O for the best-case performance is
somewhat counterintuitive.
Both of these are also incorrect, I think. The sort time for quick sort
is O(n log n) for RANDOM data. This can be proven. The worst case is
O(n^2) and that's for already sorted data, a not unreasonable input. So
the n log n number for quicksort is for a particular arrangement of
data, not the actual upper bound.

There are ways to touch up quicksort to achieve Theta(n lg n) time for
sorted data, and the implementation in the JVM does so.
 

Gene Wirchenko

[snip]
You did. Claiming an algorithm is O(n) means claiming that that
is the upper bound.

No - big O for algorithms is usually average case, not worst case.

Nope. It is abused for that.
I am sure you can find plenty of quotes for quicksort being
O(nlogn) and not O(n^2).

And that is an example of the abuse. Quicksort is nearly O(n log
n), but technically, it is O(n^2).

I am sure that I can find plenty of quotes for the moon being
made of green cheese, too.

Sincerely,

Gene Wirchenko
 

Arne Vajhøj

I'm pretty sure that is not correct. Quoting Wikipedia:

"Informally, especially in computer science, the Big O notation often is
permitted to be somewhat abused to describe an asymptotic tight bound
where using Big Theta Θ notation might be more factually appropriate in
a given context."

That is to say, Big O is the upper bound. Big Theta is the average case
or the expected case.

<http://en.wikipedia.org/wiki/Big_O_notation>

I don't think upper bound in this case means worst case.

See:

http://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations
Both of these are also incorrect, I think. The sort time for quick sort
is O(n log n) for RANDOM data. This can be proven. The worst case is
O(n^2) and that's for already sorted data, a not unreasonable input. So
the n log n number for quicksort is for a particular arrangement of
data, not the actual upper bound.

The random data argument is a bit like the spaghetti sort. It works well
for certain types of input, like spaghetti, not so well with other types
of data.

O(n*log n) is by far the most likely, which is why it is usually given
as the big O for QS when average and worst case are not both mentioned
separately.

But it is possible to construct data that exhibits O(n^2) behavior.

Note that this worst case is only sorted data when using the
first or last element as pivot element. If you use the middle
element as pivot element, then QS works perfectly for sorted data.

(there are the same number of worst cases, but sorted is
just not one of them)

Arne
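
[Editorial sketch of Arne's point, using the same naive hole-partition
as the earlier sketch rather than any particular library's code: the
only change is taking the pivot from the middle, and sorted input now
splits evenly instead of degenerating. As Arne notes, quadratic inputs
still exist; they are just no longer the sorted ones.]

// Quicksort with the middle element as pivot: sorted input now
// partitions evenly, giving O(n log n) recursion depth.
static void quicksortMid(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    int t = a[lo]; a[lo] = a[mid]; a[mid] = t;  // move pivot to front
    int pivot = a[lo];
    int i = lo, j = hi;
    while (i < j) {
        while (i < j && a[j] >= pivot) j--;
        a[i] = a[j];
        while (i < j && a[i] <= pivot) i++;
        a[j] = a[i];
    }
    a[i] = pivot;
    quicksortMid(a, lo, i - 1);
    quicksortMid(a, i + 1, hi);
}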
 

Gene Wirchenko

I don't think upper bound in this case means worst case.

Well, yes, it is. That is why it is the upper bound. If a worse
case were possible, then that limit would not be the upper bound.

[snip]

Sincerely,

Gene Wirchenko
 

Eric Sosman

Very nice! Would you care to try this approach on a shorter
input array, like

data = new int[] { Integer.MAX_VALUE };

This case should be quite simple, since the array is already sorted.
Let us know how you make out, will you?

I didn't say it works for any array out there, did I?

Ah. Then I claim I can sort an array of integers in O(0) time.
(And my claim is O(as worthwhile) as yours.)

Those constraints would be pretty useless though. On the other hand:
Sorting numbers of a limited range is pretty common.
Either way I would argue that sorting an empty or one-element array is
no sorting at all.

Fine. Then sort

data = new int[] { Integer.MAX_VALUE, 1, 3, 1, 4, 1, 5, 9 };
 

Eric Sosman

(e-mail address removed) says...
Lew wrote:
Arne Vajhøj wrote:
Wanja Gayk wrote:
(e-mail address removed) says...
Wanja Gayk wrote:
Here's the proof: Sorting an array of positive integers in O(n) time:
...
Very nice! Would you care to try this approach on a shorter
input array, like

data = new int[] { Integer.MAX_VALUE };

This case should be quite simple, since the array is already sorted.
Let us know how you make out, will you?

I didn't say it works for any array out there, did I?

By using big-O notation, you said something about the behavior of the algorithm as the size of the data set tends to infinity, so yes, you did.

No.

But a solution is a bit more practical if it does.

Wanja was making a joke. The performance of an algorithm on one particular data set is irrelevant to a discussion of asymptotic performance. The hub of the joke was his reference to "O(n)" and calling it "proof" when he was making no statement about asymptotic performance.

I get it.

????

Bucket sort is O(n) in n.

It's O(n+k) on average, O(n^2) worst-case, but that has nothing to do with the joke as I read it. It looked like he was talking about one specific array, and then he challenged us with "I didn't say it works for any array out there, did I?"

It looks like I did misread his post. I didn't realize he was speaking of the general case given the presence of only one input.

The claim was that _any_ sort would be at least O(n log n). I've proven
him wrong by sorting something in O(n) using a kind of bucket sort that
trades number range for runtime complexity.

And for correctness.
 

Eric Sosman

Those constraints would be pretty useless though. On the other hand:
Sorting numbers of a limited range is pretty common.
Either way I would argue that sorting an empty or one-element array is
no sorting at all.

Fine. Then sort

data = new int[] { Integer.MAX_VALUE, 1, 3, 1, 4, 1, 5, 9 };

You don't get it, do you?
Which part of "limited range" was too hard to understand for you?

My apologies. I should have asked you to sort

new int[] { Integer.MAX_VALUE,
Integer.MAX_VALUE - 1,
};

... an array whose range spans only two values.
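
[Editorial sketch: taking the "limited range" framing at face value, a
counting sort keyed to the range rather than to absolute values handles
exactly this array. This is a sketch under that assumption, not Wanja's
posted code: count relative to the minimum, so the bucket array needs
max - min + 1 entries, two in this case, regardless of magnitude.]

// Counting sort over the span [min, max]: bucket count is the range
// width, so { MAX_VALUE - 1, MAX_VALUE } needs only two buckets.
static void rangeCountingSort(int[] data) {
    if (data.length == 0) return;
    int min = data[0], max = data[0];
    for (int v : data) {
        if (v < min) min = v;
        if (v > max) max = v;
    }
    long span = (long) max - min + 1;   // long math avoids int overflow
    if (span > Integer.MAX_VALUE)
        throw new IllegalArgumentException("range too wide to bucket");
    int[] counts = new int[(int) span];
    for (int v : data) counts[v - min]++;
    int i = 0;
    for (int b = 0; b < counts.length; b++) {
        for (int c = counts[b]; c > 0; c--) data[i++] = min + b;
    }
}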
 

Lew

Wanja said:
(e-mail address removed) says...
My apologies. I should have asked you to sort

new int[] { Integer.MAX_VALUE,
Integer.MAX_VALUE - 1,
};

... an array whose range spans only two values.

Which are both out of range.

http://r3dux.org/wp-content/uploads/2010/09/Trolling-Is-A-Art.jpg

Eric isn't trolling, he's making a technical point. The only thing you might find trollish about his remarks is that they completely contradict your points on technical merit, but rejecting technically valid counterpoint by calling it trolling is itself a trollish ploy.

You kept changing the ground of your assertions, which is itself trolling behavior. You first claim that you aren't talking about all data sets, a point which contradicts your use of big-O notation. Then you tell Eric, "On the other hand: Sorting numbers of a limited range is pretty common." So Eric provides TWO examples of sorting numbers of a limited range, and you reject both scenarios. This time you refer to some mythical predetermined range out of which you claim Eric's values fall. Huh? What range is that, and who made you the Range Czar?

You cannot make a big-O assertion about one data set. The concept is meaningless. Asymptotic notation is about algorithmic convergence as problem size grows.

Eric's answer and example were perfectly valid technical challenges to your technical assertions. Instead of answering him fairly and showing how your general statements apply to his particular counterexample, you wave your hands vaguely, refer to some non-existent criteria not previously stated, and accuse him of trolling when he wasn't. This actually renders your points invalid.

You can't make algorithmic claims about one particular data set! Especially when you don't reveal what that data set is.
 
