Did the sort do anything?

R

Roedy Green

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

Some suggestions:

1. do a pairwise compare of the times before the sort, and if all is
in order, bypass the sort.

2. back a copy of the unsorted list of items. After the sort, do a
pairwise compare for identity. If all are identical, the sort did not
do anything.

3. write your own sort that has a boolean function you can ask if it
moved anything.

4. do some sort of checksum before and after.

--
Roedy Green Canadian Mind Products
http://mindprod.com
How long did it take after the car was invented before owners understood
cars would not work unless you regularly changed the oil and the tires?
We have gone 33 years and still it is rare to uncover a user who
understands computers don't work without regular backups.
 
M

markspace

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

Some suggestions:

1. do a pairwise compare of the times before the sort, and if all is
in order, bypass the sort.

This one here is probably the best idea of those you list. Note that
you'll save a bit of time avoiding sorts with the current merge sort
that Java uses, but a Tim sort (soon to be the standard sort in Java)
does this already, so in the long run you'll just add an O(n) to your sorts.

If possible, you should get on the Java developer list (it requires a
full release though) and make some suggestions. Returning a boolean for
"I did something" is not a terrible idea. I also see occasionally
people ask for a way to keep to disparate lists in sync when sorting.
This might be useful too. I.e., provide a way to override the "swap"
function in the sort.
 
A

Andreas Leitgeb

Roedy Green said:
Often you sort things when they are already sorted.
I am interested in simple algorithms to detect whether the sort
actually did anything.
Some suggestions:
1. do a pairwise compare of the times before the sort, and if all is
in order, bypass the sort.
2. back a copy of the unsorted list of items. After the sort, do a
pairwise compare for identity. If all are identical, the sort did not
do anything.
3. write your own sort that has a boolean function you can ask if it
moved anything.
4. do some sort of checksum before and after.

5. wrap the Collection such, that certain modifications set an "unsorted"-
flag, e.g. if an element is inserted/replaced/appended that doesn't compare
in the intended way with its neighbours. Then shortcut the sort, if the said
flag is clear.
 
D

Daniele Futtorovic

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

Some suggestions:

1. do a pairwise compare of the times before the sort, and if all is
in order, bypass the sort.

2. back a copy of the unsorted list of items. After the sort, do a
pairwise compare for identity. If all are identical, the sort did not
do anything.

3. write your own sort that has a boolean function you can ask if it
moved anything.

4. do some sort of checksum before and after.

I find the checksum idea quite intriguing, actually. It would have to do
more operations than some kind of perhaps binary seep through, (although
just as many if the list actually /is/ sorted). But the average greater
efficiency of e.g. #hashCode() vs. #compare() might well offset that,
meponders.

Is there any existing research on this?
 
R

Robert Klemme

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

What would the benefit of this be? When you learn the fact the current
sorting run has been done already. And the data tells you nothing about
what future sorts will do. OK, you know, they won't do anything if you
do not modify the collection / array. But you did know that before,
didn't you? And if you insert in sort order then you also know that
there is no additional sort necessary. If you pull the data from some
external source you also do not know whether next time round you need to
sort. So what is this information useful for?

Btw, couldn't option 3 be tricky? Depending on the sorting algorithm
things might be moved around but the final order might be the same as
before. Not that this would be desirable but I am pretty sure there are
sorting algorithms which have this property (probably quicksort). Now,
how then do you interpret your boolean flag? Basically you only know
sequences are identical if there was no movement whatsoever, but if
there was movement then you do not know. Which means you would need to
apply one of the other strategies additionally...

Kind regards

robert
 
T

Tom Anderson

I am interested in simple algorithms to detect whether the sort actually
did anything.

Write a Comparator that keeps track of how many times it returns >0, and
keep your fingers crossed that the sort algorithm always passes it
parameters in index order? A quick try seems to indicate that this does
work with Arrays.sort, although i don't believe the spec makes it
something you could rely on.

You could do something with a decorator that associates each element with
its index in the list, then use that information to do the order check in
the comparator. I'm not explaining this well, but it's a pretty bad idea,
so i won't.

If you wrote your own implementation, you could simply count the number of
times you swap elements, and if it's nonzero, you know the list was out of
order. This would be a good opportunity to implement a good sorting
algorithm - something that is, unlike mergesort, adaptive and in-place
(although not necessarily stable), perhaps Smoothsort:

http://www.keithschwarz.com/smoothsort/

You could also half-implement your own algorithm; do a linear scan through
your list, checking that each consecutive pair of elements is in order. As
soon as you hit a pair that isn't, stop, ask Collections.sort to sort the
remainder of the list, then go back and merge that half with the sorted
prefix you scanned through. Then return false. If you made it to the end
of the list without finding anything out of order, return true.

tom
 
T

Tom Anderson

I find the checksum idea quite intriguing, actually. It would have to do
more operations than some kind of perhaps binary seep through,

A binary seep? What's that, then?
(although just as many if the list actually /is/ sorted). But the
average greater efficiency of e.g. #hashCode() vs. #compare() might well
offset that, meponders.

Do you think hashCode is generally more efficient than compare? I would
have thought the opposite, because for hashCode, you have to examine all
an object's fields, whereas for a compareTo, you only have to examine as
many as it takes to find some which are different. Unless the hash is
cached, as in String.
Is there any existing research on this?

The only thing i can think of which is remotely similar is the Rabin-Karp
string searching algorithm, which computes a rolling hash over a string to
search it. And the rsync algorithm, which does something similar. And
those really aren't even remotely similar.

tom
 
D

Daniele Futtorovic

A binary seep? What's that, then?

Uuhh... well... kinda like a binary sort, except you'd return (const
bool NOT_ORDERED) instead of swapping.

Do you think hashCode is generally more efficient than compare? I would
have thought the opposite, because for hashCode, you have to examine all
an object's fields, whereas for a compareTo, you only have to examine as
many as it takes to find some which are different. Unless the hash is
cached, as in String.

Yes, I thought so because of the caching. Any instance could cache the
hashCode() at least between mutation of its data (theoretically,
assuming it can track mutation), whereas a comparison cannot, because it
compares against something different every time.
In other words, between mutations, each invocation of hashCode() yields
the same result, but each invocation of compare() potentially a
different one (depending on what's compared against).
 
J

Joshua Cranmer

Btw, couldn't option 3 be tricky? Depending on the sorting algorithm
things might be moved around but the final order might be the same as
before. Not that this would be desirable but I am pretty sure there are
sorting algorithms which have this property (probably quicksort).

The term you are looking for is the stability of the sort. A stable sort
preserves the order of equally-compared object; unstable sorts do not.
Of the major sorts, heapsort and quicksort are the only sorts which are
unstable, although a bad of implementation of any stable sort could be
unstable. Wikipedia claims that quicksort can be implemented stably, but
that comes at an O(n) space price.

Java uses quicksort for primitive types in Arrays.sort and mergesort for
reference types: since you can't distinguish two equal primitive types,
you can't tell that quicksort isn't stable.
 
R

Robert Klemme

The term you are looking for is the stability of the sort.

No, not exactly. Stability only refers to the _output_ of the
complete sort operation. At least in theory a sort could move things
around even though it is stable.

The question what that bit of information is useful for still remains
unanswered.

Cheers

robert
 
R

Roedy Green

Do you think hashCode is generally more efficient than compare? I would
have thought the opposite, because for hashCode, you have to examine all
an object's fields, whereas for a compareTo, you only have to examine as
many as it takes to find some which are different. Unless the hash is
cached, as in String.

The object as a whole has a hashCode. that is just a 32 bit
operation. The compare might end up comparing a number of long
strings char by char. HashCodes are cached. The speed comes when it
is already computed.

In some cases you can tolerate a small rate of error. I would think
often mistaking a no-change for a change would not be catastrophic,
though the opposite normally would be. I don't think any
hashCode/checksum method could guarantee 100% accuracy.
--
Roedy Green Canadian Mind Products
http://mindprod.com
How long did it take after the car was invented before owners understood
cars would not work unless you regularly changed the oil and the tires?
We have gone 33 years and still it is rare to uncover a user who
understands computers don't work without regular backups.
 
R

Roedy Green

What would the benefit of this be?

A very simple example would be a sort utility that reads data into RAM
and sorts it. If the sort did not change the order, there is no need
to write it back.
--
Roedy Green Canadian Mind Products
http://mindprod.com
How long did it take after the car was invented before owners understood
cars would not work unless you regularly changed the oil and the tires?
We have gone 33 years and still it is rare to uncover a user who
understands computers don't work without regular backups.
 
R

Roedy Green

Java uses quicksort for primitive types in Arrays.sort and mergesort for
reference types: since you can't distinguish two equal primitive types,
you can't tell that quicksort isn't stable.

Java's sort on objects is stable IIRC. There you can tell apart
objects that compare equal by their addresses if not other fields in
the object not used in the compare.

In all my years, I had never noticed that stability is irrelevant for
primitives. It is like a pleasant little backscratch to have that
brought to my attention.
--
Roedy Green Canadian Mind Products
http://mindprod.com
How long did it take after the car was invented before owners understood
cars would not work unless you regularly changed the oil and the tires?
We have gone 33 years and still it is rare to uncover a user who
understands computers don't work without regular backups.
 
R

Robert Klemme

A very simple example would be a sort utility that reads data into RAM
and sorts it. If the sort did not change the order, there is no need
to write it back.

Sounds plausible - at first. OTOH in such a situation I would probably
rather remember the last modification time and only sort if it has
changed after the last sorted writing. That way you are even more
efficient because you save the effort of read IO as well.

Maybe it's just that I didn't have such a use case but I am still not
really convinced that what you are proposing is so useful.

Kind regards

robert
 
S

Spock

In some cases you can tolerate a small rate of error. I would think
often mistaking a no-change for a change would not be catastrophic,
though the opposite normally would be. I don't think any
hashCode/checksum method could guarantee 100% accuracy.

Unfortunately, hashCode/checksum methods will never detect change where
there is none, but may sometimes detect no change when there is one,
which is precisely the type of error that would be more serious.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,968
Messages
2,570,152
Members
46,697
Latest member
AugustNabo

Latest Threads

Top