Sorting with only a partial order definition

  • Thread starter =?ISO-8859-1?Q?Lasse_V=E5gs=E6ther_Karlsen?=
  • Start date
?

=?ISO-8859-1?Q?Lasse_V=E5gs=E6ther_Karlsen?=

I have a list of items and a "rule" for ordering them.

Unfortunately, the rule is not complete so it won't define the correct
order for any two items in that list.

In other words, if I pick two random items from the list I may or may
not have a rule that dictates the order of those two items. The rule
could be "implicit" in that I got rules for other items, for instance:

[a, b, c, d]
a < b
b < c

If I now pick a and c out of the list I would not know wether a or c
comes first, unless I grab the rules for a<b and b<c and imply that a<c
from those two.

As such, there could be potentially many correct results from ordering
such a list. (d is unspecified above so any position for d is actually
legal)

Is there anything in Python that will help me sort a list with something
like this ?

For instance, given the following list of items:

items = ["a", "b", "c", "d"]

and the following two rules:

"a" comes before "d"
"d" comes before "b"

then the following is a list of correct results (could be more though):

["a", "d", "b", "c"]
["a", "c", "d", "b"]
["a", "d", "c", "b"]
....

Note that this is an arbitrary example. The real list is a list of lists
containing database rows, ie. something like this:

[[10, "Column 2", "Column 3"],
[15, "Column 2", "Column 3"],
...]

If there isn't anything built in, does anyone have an idea about how to
go about creating such an ordering function ? Tweaking a cmp-like
function to return 0 for "undefined" didn't seem to work so there must
be a slightly more intelligent solution than that. Perhaps the rules
would have to be checked in a specific order.

Doing a combinatorial solution and checking the rules aren't really an
option as on last count there was close to 20K rows.

There's also a lot of rules (also coming from a database). I was
thinking I could do a sort based on one of the rules and use a stable
sort for the following rules but doing that sometimes rearranged the
list so that it was now incorrect going by the previous rules.

Here's my initial test for doing a cmp-like solution:

order = set([("a", "d"), ("d", "b")])
def my_cmp(a, b):
if (a, b) in order:
print a, "<", b
return -1
elif (b, a) in order:
print a, "<", b
return +1
else:
print a, "?", b
return 0

items = ["c", "b", "a", "d"]
items.sort(cmp=my_cmp)
print items

This prints out:

b ? c
a ? b
d < a
['c', 'b', 'a', 'd']

Which is obviously incorrect.

Any help or pointers would be most welcome.
 
B

Bryan Olson

Lasse said:
> I have a list of items and a "rule" for ordering them.
>
> Unfortunately, the rule is not complete so it won't define the correct
> order for any two items in that list.

This is called a "partial ordering".

[...]
> If there isn't anything built in, does anyone have an idea about how to
> go about creating such an ordering function ? Tweaking a cmp-like
> function to return 0 for "undefined" didn't seem to work so there must
> be a slightly more intelligent solution than that. Perhaps the rules
> would have to be checked in a specific order.

The usual tools to deal with partial orderings are directed acyclic graphs,
and "topological sorting". Try Googling the terms along with "Python".
 
P

Paul Rubin

Lasse Vågsæther Karlsen said:
I have a list of items and a "rule" for ordering them.

Unfortunately, the rule is not complete so it won't define the correct
order for any two items in that list.

In other words, if I pick two random items from the list I may or may
not have a rule that dictates the order of those two items. The rule
could be "implicit" in that I got rules for other items, for instance:

That's called topological sorting and any reference on graph
algorithms will describe how to do it. I don't know of Python code
offhand but it's easy to write.

http://en.wikipedia.org/wiki/Topological_sorting

gives a straightforward linear-time algorithm.
 
?

=?ISO-8859-1?Q?Lasse_V=E5gs=E6ther_Karlsen?=

<snip>

Ok, managed to implement the algorithm. Might not be the optimal
solution (memory and speed-wise) but it worked and doesn't take too long
to run either so I'm going to stick with it.

I have a different question though, along the same lines, but one that
doesn't need a solution, just want to know if something like it exists.

A while back I had an application that had a lot of items that it needed
to order. The problem, however, was that the rules was not defined at
all. Instead it was opted for a visual solution where the user would be
presented with images and had to rearrange them in the order that was
necessary. The application was one I helped build for a friend which
combined images from several cameras and allowed him to sort them
according to the contents so that he could get a timeline formed.

The date/time stamps on the cameras was not directly usable for various
reasons so the visual ordering was what we ended up on. For instance,
two of the cameras was not digital ones so they had no timestamp except
for the one provided by the scanning software.

In that application we talked about presenting the user with two and two
images and he just had to click on the image that came first. The
problem with this was to try to present the "right" images to the user
so that he had to minimize the number of clicks.

In other words, try to pick nodes in the graph and ask the user to
provide the direction of the edge, and the "picking algorithm" would
work in such a way that the number of edges would be minimized.

Not sure if I'm explaining it correctly.

The solution we ended up with was to present the user with all the
images in one big timeline and just let him drag them around. This worked.

What I was wondering about is if there is an algorithm that would do
what I want? Ie. help me pick the nodes so as to minimize the number of
edges. Obviously the answer to the first pair of nodes will influence
which nodes will be subsequently picked, so each answer would stear the
algorithm in a way, not just go into the final problem.

If anyone got the name of such an algorithm or something I would like to
look at it at least. Application is built and deployed so I'm not
looking for a solution to implement.
 
P

Paul Rubin

Lasse Vågsæther Karlsen said:
In that application we talked about presenting the user with two and
two images and he just had to click on the image that came first. The
problem with this was to try to present the "right" images to the user
so that he had to minimize the number of clicks.

If you're trying to sort those pictures chronologically, there is a
total ordering and the best you can do in general is a standard
O(N log N) sorting algorithm. If you've got good timestamps on some
of the pics, so you want to minimize comparisons involving pics with
no timestamps, hmm, maybe some optimizations are possible.
 
T

Toby Dickenson

What I was wondering about is if there is an algorithm that would do
what I want? Ie. help me pick the nodes so as to minimize the number of
edges.

To rephrase your question, you want a sorting algorithm that minimises the
number of comparisons (because a comparison involves asking a human), and
which takes advantage of any pre-existing rough ordering.

You need timsort - the algorithm behind python lists sort() method.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,269
Messages
2,571,338
Members
48,026
Latest member
DannieKeeg

Latest Threads

Top