Did the sort do anything?

L

Lew

Lawrence said:
Gee, I wonder what language that came from ...

Tim Peters wrote it in C.
http://svn.python.org/projects/python/trunk/Objects/listobject.c

And the algorithm itself came from neither C nor any other computer language;
it came from English.

<http://www.kiwidoc.com/java/l/x/android/android/5/p/java.util/c/TimSort>
"The underlying techniques are described in this paper (and may have even
earlier origins): "Optimistic Sorting and Information Theoretic Complexity"
Peter McIlroy SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms),
pp 467-474, Austin, Texas, 25-27 January 1993."
 
L

Lew

Is 32 bits enough for a hash code? Birthday paradox and all that...

I have no idea what the Birthday Paradox has to do with this, but yes, for the
purposes intended the 32-bit hash code supported natively in Java is
sufficient. Why wouldn't it be?
 
T

Tom Anderson

Tim Peters wrote it in C.
http://svn.python.org/projects/python/trunk/Objects/listobject.c

And the algorithm itself came from neither C nor any other computer language;
it came from English.

<http://www.kiwidoc.com/java/l/x/android/android/5/p/java.util/c/TimSort>
"The underlying techniques are described in this paper (and may have
even earlier origins): "Optimistic Sorting and Information Theoretic
Complexity" Peter McIlroy SODA (Fourth Annual ACM-SIAM Symposium on
Discrete Algorithms), pp 467-474, Austin, Texas, 25-27 January 1993."

I still prefer Smoothsort. Smoothsort may be brain-freezingly arcane (i've
never managed to implement it correctly!), but is simple, consistent, and
elegant. Timsort is basically a pile of hacks, special cases, and
judiciously-chosen arbitrary constants. Don't get me wrong, i think it's a
brilliant piece of work, and it is clearly very effective - but it's a
work of software engineering rather than computer science, and i like my
sorts to be scientific!

What would shut me up would be a performance comparison of two
comparable-quality implementations of the algorithms, showing that Timsort
wipes the floor with Smoothsort. Is anyone aware of one?

tom
 
L

Lew

Tom said:
I still prefer Smoothsort. Smoothsort may be brain-freezingly arcane (i've
never managed to implement it correctly!), but is simple, consistent, and
elegant. Timsort is basically a pile of hacks, special cases, and
judiciously-chosen arbitrary constants. Don't get me wrong, i think it's a
brilliant piece of work, and it is clearly very effective - but it's a work of
software engineering rather than computer science, and i like my sorts to be
scientific!

How is "brain-freezingly arcane" compatible with "simple"?
 
M

markspace

What would shut me up would be a performance comparison of two
comparable-quality implementations of the algorithms, showing that
Timsort wipes the floor with Smoothsort. Is anyone aware of one?


I'm not sure, but I believe the advantage of Timsort over Smoothsort is
that Timsort has O(n) performance for both ascending and descending
sequences, where Smoothsort appears to only have O(n) for ascending
sequences.

There was a recent "shoot out" among various sort algorithms, and
Timsort won. I don't recall exactly which algorithms were included, but
I'm sure if Smoothsort were a serious contender it would have been examined.
 
T

Tom Anderson

How is "brain-freezingly arcane" compatible with "simple"?

Edsger Dijkstra wrote the formal definition in a language of his own
invention (which has dynamic scoping) using a notation you can't even
completely write in ASCII, and accompanied it with a description in
English couched in terms of a mix of mathematical abstractions and
whimsical terminology he made up on the fly:

http://www.cs.utexas.edu/users/EWD/transcriptions/EWD07xx/EWD796a.html

It is possible to rephrase it more comprehensibly:

http://www.keithschwarz.com/smoothsort/

As well as that initial barrier, there is the fact that although it's
fairly easy to see what the algorithm is doing, it's much harder to
understand why it works. Keith Schwarz has done a bang-up job of
explaining it, though.

Basically, Dijkstra was a brilliant guy, but a bit of a bastard.

tom
 
T

Tom Anderson

I'm not sure, but I believe the advantage of Timsort over Smoothsort is
that Timsort has O(n) performance for both ascending and descending
sequences, where Smoothsort appears to only have O(n) for ascending
sequences.

Interesting point. You could probably get round that by having Smoothsort
make an initial guess at the direction of the sequence, and then sort
backwards (forwards but with an inverted comparison) if appropriate, but
that's the first step down the path of hackery which Timsort has already
so victoriously travelled.
There was a recent "shoot out" among various sort algorithms, and
Timsort won. I don't recall exactly which algorithms were included, but
I'm sure if Smoothsort were a serious contender it would have been
examined.

I'd be very interested to see that shootout - do you remember where it
was?

tom
 
L

Lawrence D'Oliveiro

Basically, Dijkstra was a brilliant guy, but a bit of a bastard.

He was right about gotos being harmful, but I’m still not sure I understand
his remark that “computer science is no more about computers than astronomy
is about telescopesâ€.
 
L

Lawrence D'Oliveiro

In all my years, I had never noticed that stability is irrelevant for
primitives.

It’s only irrelevant for types where the key is the entire value.

If you were sorting integers based on, say, the units digit, then stability
would most certainly become relevant, even thought integers are a
“primitive†type in Java.

If, on the other hand, you were sorting immutable objects of a Java
“reference†type where the key was the entire object state, then stability
would indeed be irrelevant, notwithstanding such types are not considered
“primitiveâ€.
 
J

Joshua Cranmer

He was right about gotos being harmful, but I’m still not sure I understand
his remark that “computer science is no more about computers than astronomy
is about telescopesâ€.

In Dijkstra's day, I would say that the greater focus on computer
science was on the algorithmic research. At the very least, I am
guessing that he did not approach computer science from a systems
perspective, and instead approached it from the theory end.
 
L

Lew

He was right about gotos being harmful, but I’m still not sure I understand

He was stretching a point to make a point, of course, and the considered
reader will factor that in.

You cannot have a useful computer program without gotos. It was the
willy-nilly, undisciplined, wild goto that Dijkstra deprecated. He
specifically states, "The unbridled use of the go to statement has as an
immediate consequence that it becomes terribly hard to find a meaningful set
of coordinates in which to describe the process progress." Note that he's
speaking of "unbridled" use.

Great word.

The solution, of course, is to harness the goto as one would a horse - confine
it to the traces of an 'if', 'for', 'while', 'do', 'switch', or method.
his remark that “computer science is no more about computers than astronomy
is about telescopesâ€.

He's saying the science is about the science, not the instrument used. He's
also being intentionally coy. Surely you recognize the pattern.
 
L

Lawrence D'Oliveiro

Do you consider the result of System.identityHashCode(x) to be part of
the state of the object referenced by x?

It is computed from the state, is it not?
 
L

Lawrence D'Oliveiro

In Dijkstra's day, I would say that the greater focus on computer
science was on the algorithmic research.

Ah, the old “algorithm†versus “program†meaningless dichotomy.
At the very least, I am guessing that he did not approach computer science
from a systems perspective, and instead approached it from the theory end.

Even from a theory end, I had it drummed into me that everything we build is
an “abstract machineâ€. The fact that some machines are “hardware†and others
“software†is neither here nor there. Even in the real world, the
distinction is often blurred, if not turned entirely on its head.
 
L

Lew

It is computed from the state, is it not?

No. It most certainly is not. The identity hash code is computed once for
each object irrespective of and independently of anything else you might
consider the object's state.

As a cursory reading of the Javadocs makes clear:
<http://download.oracle.com/javase/6...ystem.html#identityHashCode(java.lang.Object)>

Even without the rather intelligent step of checking the documentation, you
can reason out that the answer must be no. Either the identity hash code is
part of the state, in which case it is not computed from the state but *is*
part of the state, or it is not, in which case it is not computed from the
state. Which it is depends on your definition of "the state", i.e., whether
you include the identity hash code as part of the state. Yes, it's
tautological, but that's the result of the way you framed the question.
 
L

Lew

Ah, the old “algorithm†versus “program†meaningless dichotomy.

Ah, the old imitate Maxwell Smart in a meaningless attempt to sound sage trick.
 
L

Lawrence D'Oliveiro

Distinct objects probably have different identity hash codes regardless of
whether their fields all have the same values.

“By contract, any two objects for which equals(Object) returns true must
return the same hash code value.â€

<http://developer.android.com/reference/java/lang/Object.html#hashCode()>

“Distinct†would presumably mean “not equalâ€.
Its existence ensures that program behavior can be affected by object
sort stability or instability, even if the sort key includes all fields.
The sort key would have to also include the identity hash code to make
stability irrelevant.

The sort key would have to be consistent with the definition of “equalityâ€,
would it not?
 
J

javax.swing.JSnarker

“By contract, any two objects for which equals(Object) returns true must
return the same hash code value.â€

<http://developer.android.com/reference/java/lang/Object.html#hashCode()>

“Distinct†would presumably mean “not equalâ€.

First, that applies to equals and hashCode and to == and
identityHashCode, but not to equals and identityHashCode.

Second, it means that objects that are the same must have the same hash
code, but not the other way around. Different objects are allowed to
have the same hash code.
The sort key would have to be consistent with the definition of “equalityâ€,
would it not?

If it included the identity hash code, it would have to be consistent
with ==.
 
L

Lew

“By contract, any two objects for which equals(Object) returns true must
return the same hash code value.â€

<http://developer.android.com/reference/java/lang/Object.html#hashCode()>

That's the wrong reference, as that is not the method under discussion.

The discussion is about a different method in a different class:
“Distinct†would presumably mean “not equalâ€.

except for the little matter of that documentation being not relevant.
The sort key would have to be consistent with the definition of “equalityâ€,
would it not?

Not in the case of identity hash code, no.

I gave the correct API reference upthread. Note that there is no requirement
or hint for consistency with 'equals()'.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,968
Messages
2,570,153
Members
46,699
Latest member
AnneRosen

Latest Threads

Top