Would there be support for a more general cmp/__cmp__


Christopher Subich

Antoon said:
It *is* a definition of an ordering.

For something to be an ordering it has to be antisymmetric and transitive.

The subset relationship on sets conforms to these conditions, so it is a (partial)
ordering. Check your mathematics books. Why you would think this is abuse is beyond me.

Which is exactly why a < b on sets returns True xor False, but cmp(a,b)
throws an exception.

a <COMPARE> b is a local comparison, asking only for the relationship
between two elements. In some spaces, like the complex numbers, some
comparisons are ill-defined; in others, like sets, they're well-defined
but don't give a total ordering.

cmp(a,b) asks for their relative rankings in some total ordering. For a
space that does not have a total ordering, cmp(a,b) is meaningless at
best and dangerous at worst. It /should/ throw an exception when the
results of cmp aren't well-defined, consistent, antisymmetric, and
transitive.
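In Python 3 terms (cmp is gone there, so this sketch uses the rich comparison operators instead), the distinction looks like this: the subset relation answers every local question, but some pairs are simply incomparable.

```python
# Subset comparisons on sets form a partial order: a local question
# always has an answer, but some pairs are incomparable, and for those
# every ordering operator returns False.
print({1} < {1, 2})               # True: {1} is a proper subset of {1, 2}

a, b = {1}, {2}
print(a < b, a > b, a == b)       # False False False: incomparable sets
```

Since neither a < b, a > b, nor a == b holds, there is no honest -1/0/1 answer for a cmp-style ranking of {1} and {2}.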
 

Christopher Subich

Antoon said:
I also think there is the problem that people aren't used to partial
ordering. There is an ordering over sets, it is just not a total
ordering. But that some pairs are incomparable (meaning that neither
one is smaller nor greater) doesn't imply that comparing them is
ill-defined.

It depends on your definition of "comparison." Admittedly, <, =, !=,
and > can be defined for a partial ordering, but classical mathematics,
as understood by most people (even programmers), assumes that unless a
== b, a > b or a < b.

The comparisons, as defined this way, are done on a totally ordered set,
yes. But since most comparisons ever done -are- done on a totally
ordered set (namely the integers or reals), it's entirely fair to say
that "a programmer's expectation" is that comparisons should work more
or less like a totally ordered list.


With that caveat in mind, comparison on a partially ordered domain /is/
ill-defined; it can give inconsistent results from the "a < b or a > b
or a == b" rule.

Well, it is a wrong assumption in general. There is nothing impure about
partial ordering.

Impure? Probably not. Useless from many perspectives when "comparisons"
are needed, to the point where it's probably safer to throw an exception
than to define results.
That is IMO irrelevant. The subset relationship is an ordering and as
such has all the characteristics of other orderings like "less than",
except that the subset relationship is partial and the less-than
relationship is total. Whether it is called "subset" or "less than" is
IMO irrelevant. It is about mathematical characteristics.

Still accident. < wouldn't be used for sets if we had a subset symbol
on the standard keyboard, APL fans notwithstanding. Since programmers
familiar with Python understand that < on sets isn't a "real" comparison
(i.e. one that provides a total order), they don't expect unique, consistent
results from something like sort (or a tree, for that matter).
Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.

Yes, it does. Consider "in" as a mathematical operator:

For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.
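The three properties claimed for "in" can be checked mechanically. A minimal Python 3 sketch (the function name and the encoding of the relation as a set of pairs are invented for illustration):

```python
def is_strict_partial_order(pairs, elements):
    """Check irreflexivity, asymmetry, and transitivity of a finite
    relation given as (x, y) pairs meaning 'x R y'."""
    rel = set(pairs)
    irreflexive = all((x, x) not in rel for x in elements)
    asymmetric = all((y, x) not in rel for (x, y) in rel)
    transitive = all(
        (x, z) in rel
        for (x, y) in rel
        for (y2, z) in rel
        if y == y2
    )
    return irreflexive and asymmetric and transitive

# The cat-in-box "in" relation from the example above:
print(is_strict_partial_order([("cat-in-box", "box")],
                              ["box", "cat-in-box"]))   # True
```

Dropping a needed pair breaks it: a relation with a R b and b R c but no a R c fails the transitivity check.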

Notice that L1 and L2 contain the same elements, just in different orders.
Sorting those two lists should give the same order, correct?


No. Since we don't have a total ordering, sorting doesn't have to give
the same result. As far as sorting is defined on this kind of
object, the only thing I would expect is that after a sort the
following condition holds:

for all i, j: if i < j then not L[i] > L[j]


Highly formal, aren't you? Again, common programming allows for "not a
> b" == "a <= b", so your condition becomes the more familiar:

for all i, j in range(len(L)): if i < j then L[i] <= L[j]

This condition is also assumed implicitly by many other assumed "rules"
of sorted lists, namely their uniqueness:

"If L is a list of unique keys, then sort(L) is a unique list of keys in
sorted order."

Yes, this assumes that L has a total order. Big whoop -- this is a
matter of programming practicality rather than mathematical purity.
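The order-dependence Antoon concedes is easy to see directly in Python 3, where list.sort compares elements only with <: incomparable sets always answer False, so the sort never swaps them and the "sorted" result depends on the input order.

```python
L1 = [{1}, {2}]
L2 = [{2}, {1}]

# {1} < {2} and {2} < {1} are both False, so sort() sees no reason to swap.
print(sorted(L1))   # [{1}, {2}]
print(sorted(L2))   # [{2}, {1}] -- same elements, different "sorted" order

# Antoon's weaker condition still holds for both results:
for L in (sorted(L1), sorted(L2)):
    assert all(not L[i] > L[j]
               for i in range(len(L)) for j in range(i + 1, len(L)))
```

So uniqueness of the sorted order really does hinge on totality, exactly as claimed.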

To reply to the grandparent poster here: that's not always easy to do.
A "sensible order" isn't always easy to define on a set where a partial
order already exists, if the total order must extend that partial order. Adding
comparison-pairs to a partially ordered set (where incomparable elements
throw an exception, rather than just return False, which is confusing
from sort's perspective) can easily result in a graph that isn't
actually transitive and antisymmetric, i.e.:

A > B > C, but
C > D > A for some operator ">"

With this in mind, the only sensible thing for .sort to do when it
encounters an exception is to fall back to its "backup" comparator (id,
for example), and re-sort /the entire list/ using that comparator. The
results returned will then be valid by sort's comparison, but a subset
of that list containing only "good" objects (like integers) may not be
(and probably won't be) sorted itself using the objects' comparison.

The difficulty arises because while we may consider a single type to be
fully ordered, it may also have some comparisons with other types;
sorting the list by type, then value seems like it would work (and
that's what the library does), but since we can sometimes compare across
types, this may not give a valid comparison. Consider:

[apple, wheat, cow]
apple < cow (defined, totally ordered)
Mammal > Grain > Fruit (accident of type id numbers)
apple < wheat throws TypeError, as does wheat < cow

Since we encounter a TypeError when sorting this list, we first sort by
type:
[cow, wheat, apple]
and would then sort-by-comparison within types, if we had more than one.

But since apple < cow, the sublist of this list [cow, apple] is not
itself sorted. This is paradoxical under a single comparison, because a
sublist of any sorted list is itself supposed to be sorted.

The ordering that .sort used on the full list is well-defined and total,
but it's also useless.
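Python 3 eventually resolved this by making cross-type ordering raise outright. A sketch of both that behaviour and the "sort by type, then by value" fallback described above (the key function here is hypothetical, invented for illustration):

```python
# In Python 3, ordering comparisons between unrelated types raise:
try:
    sorted([1, "wheat"])
except TypeError:
    print("unorderable types")

# A hypothetical "type first, value second" fallback, like the one
# described above: total and well-defined, but largely arbitrary.
def type_then_value(x):
    return (type(x).__name__, x)

items = [3, "apple", 1, "wheat"]
print(sorted(items, key=type_then_value))   # [1, 3, 'apple', 'wheat']
```

Within each type the sublist is sorted, but nothing guarantees the full order agrees with whatever cross-type comparisons happen to be defined -- which is exactly the cow/apple paradox.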

But will all libraries sort the shelves in the same way? As far as I
understand, a book gets some kind of code in a library and books are
sorted according to these codes. Now everybody sorts these codes the
same way, but the attribution of the code may differ from library to
library.

Irrelevant; if you swap the order of any two (nonidentical) books in a
single library, then the library is unsorted. The library has imposed a
total order on books in its collection.

The library example is irrelevant, anyway; two books can always be
compared by the keys (author last, author first/middle, [other authors],
title, editor, year published, publisher, number of pages, colour of the
brightest line in the cover's emission spectrum at 2000 K in spectral
order [within visible spectrum]), and for the most part this is a useful
keyspace. Libraries just choose to impose a category/numbering at the
head of the key.
 

Antoon Pardon

Op 2005-10-25 said:
Which is exactly why a < b on sets returns True xor False, but cmp(a,b)
throws an exception.

I don't see the connection.

The documentation states that cmp(a,b) will return a negative value if
a < b. So why should it throw an exception?
a <COMPARE> b is a local comparison, asking only for the relationship
between two elements. In some spaces, like the complex numbers, some
comparisons are ill-defined; in others, like sets, they're well-defined
but don't give a total ordering.

cmp(a,b) asks for their relative rankings in some total ordering.

The documentation doesn't state that. I also didn't find anything in
the documentation on how the programmer should code in order to
enforce this.

So someone who just uses the rich comparisons to implement his
partial order will screw up this total order that cmp is somehow
providing.

And cmp doesn't provide a total ordering anyway. Sets are clearly
incomparable using cmp, so cmp doesn't provide a total order.

Maybe the original idea was that cmp provided some total ordering,
maybe the general idea is that cmp should provide a total ordering,
but that is not documented, nor is there any documentation
helping the programmer with this.

And even if we leave sets aside it still isn't true.
-1


For a
space that does not have a total ordering, cmp(a,b) is meaningless at
best and dangerous at worst.

The current specs and implementation are.

I see nothing wrong with a function that would provide four kinds of
results when given two elements. The same results as cmp gives now
when they apply, and None, some other appropriate value, or a raised
exception when they don't.

Given how little this functionality differs from the current cmp,
I don't see why it couldn't replace the current cmp.
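Such a four-outcome comparison is easy to sketch on top of the rich comparisons. A Python 3 sketch (the name partial_cmp is invented here; this is the None-returning variant of the proposal):

```python
def partial_cmp(a, b):
    """Return -1, 0, or 1 like the classic cmp, or None when the
    objects are incomparable under their partial order."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    return None   # neither <, ==, nor >: incomparable

print(partial_cmp({1}, {1, 2}))   # -1: {1} is a proper subset
print(partial_cmp({1}, {2}))      # None: incomparable sets
print(partial_cmp(2, 1))          # 1, as with ordinary cmp
```

On a totally ordered type the None branch is unreachable, which is the heart of Antoon's claim that such a change would not disturb existing total-order uses.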
It /should/ throw an exception when the
results of cmp aren't well-defined, consistent, antisymmetric, and
transitive.

That an order is not total in no way implies any of the above.
The order on sets is well-defined, consistent, antisymmetric
and transitive.

Python may not be able to handle it in a well-defined, consistent
manner, but that is IMO a problem of python not of partial orders.
 

Steven D'Aprano

Yes, it does. Consider "in" as a mathematical operator:

For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.

What do you mean "in" is a useless ordering? It makes a huge difference
whether "nuclear bomb in New York" is true or not.

In fact, I'm quite surprised that Antoon should object to "in" as "this
doesn't define a mathematical ordering, the subset relationship does" when
"subset" is just "in" for sets: set S is a subset of set T if for all
elements x in S, x is also in T. Informally, subset S is in set T.

Can somebody remind me, what is the problem Antoon is trying to solve here?
 

Christopher Subich

Antoon said:
Op 2005-10-25, Christopher Subich schreef <[email protected]>:



I don't see the conection.

The documentation states that cmp(a,b) will return a negative value if
a < b. So why should it throw an exception?

Because it's useful that cmp(a1, a2) should either (return a value) or
(throw an exception) for any element a1, a2 within type(a1) cross
type(a2). If cmp sometimes is okay and sometimes throws an exception,
then it leads to weird borkage in things like trees.

With that in mind, not all sets are comparable. {a} and {b} have no
comparison relationship, as you've pointed out, aside from not-equal.
I'll get to why "not-equal" is a bad idea below.
The documentation doesn't state that. I also didn't find anything in
the documentation on how the programmer should code in order to
enforce this.

Most programmers are simply programmers; they haven't had the benefit of
a couple years' pure-math education, so the distinction between "partial
order" and "total order" is esoteric at best.

With that in mind, compare should conform, as best as possible, to
"intuitive" behavior of comparison. Since comparisons are mostly done
on numbers, an extension of comparisons should behave "as much like
numbers" as possible.
So someone who just uses the rich comparisons to implement his
partial order will screw up this total order that cmp is somehow
providing.

And cmp doesn't provide a total ordering anyway. Sets are clearly
incomparable using cmp, so cmp doesn't provide a total order.

Cmp isn't supposed to "provide" a total order, it's supposed to reflect
relative standing in one if one already exists for P1 x P2. If one
doesn't exist, I'd argue that it's the best course of action to throw an
exception.

After all, rich comparisons were put in precisely to allow support of
limited comparisons when a total order (or indeed full comparison) isn't
appropriate.
Maybe the original idea was that cmp provided some total ordering,
maybe the general idea is that cmp should provide a total ordering,
but that is not documented, nor is there any documentation
helping the programmer with this.

I doubt that anyone was thinking about it in such depth. My bet is that
the thought process goes this way:

Og compare numbers. Numbers good, compare good. Grunt grunt.
<some aeons pass>
Language designers: Wouldn't it be nice if we could allow user-defined
objects, such as numbers with units, to compare properly with each
other? This would let us say (1 m) < (.5 mile) pretty easily, eh?

Guido: Let's let classes override a __cmp__ function for comparisons.

In programming language theory, comparisons were firstly about numbers,
and their leading-order behaviour has always stayed about numbers.
Comparing entities which are best represented in an... interesting
formal mathematical way (i.e. partial orders, objects for which some
comparisons are Just Plain Weird) works only as a side-effect of
number-like behavior.

The lesson to take home from this: the less a custom class behaves like
a number, the less intuitively meaningful (or even valid) comparisons
will be on it.
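The "numbers with units" motivation can be sketched like this (class name, units, and conversion factor are invented for illustration; real code would use a units library):

```python
class Length:
    """Toy quantity class: compare by converting to a common unit."""
    _to_metres = {"m": 1.0, "mile": 1609.344}

    def __init__(self, value, unit):
        self.metres = value * self._to_metres[unit]

    def __lt__(self, other):
        return self.metres < other.metres

    def __eq__(self, other):
        return self.metres == other.metres

print(Length(1, "m") < Length(0.5, "mile"))   # True: 1 m < ~804.7 m
```

Comparison works "intuitively" here precisely because lengths embed into the totally ordered reals -- the number-like case the thread keeps coming back to.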
And even if we leave sets aside it still isn't true.



-1

I'd argue that the wart here is that cmp doesn't throw an exception, not
that it returns inconsistent results. This is a classic case of
incomparable objects, and saying that 1 < an empty tuple is bordering on
meaningless.

The current specs and implementation are.

I see nothing wrong with a function that would provide four kinds of
results when given two elements. The same results as cmp gives now
when they apply, and None, some other appropriate value, or a raised
exception when they don't.

Given how little this functionality differs from the current cmp,
I don't see why it couldn't replace the current cmp.

My biggest complaint here is about returning None or IncomparableValue;
if that happens, then all code that relies on cmp returning a numeric
result will have to be rewritten.

Comparing incomparables is an exceptional case, and hence it should
raise an exception.

As for saying that cmp should sometimes return and sometimes raise an
exception, I just find it squicky. Admittedly, this is entirely up to
the class designer, and your proposed guideline wouldn't change cmp's
behavior for classes that /are/ totally ordered.

Then again, sets excepted, your guideline doesn't seem very applicable
in standard library code.
That an order is not total in no way implies any of the above.
The order on sets is well-defined, consistent, antisymmetric
and transitive.

A partial order is by definition consistent, antisymmetric, and
transitive. A total order is furthermore universal, such that if a !=
b, aRb xor bRa.

This matches the normal use of cmp, and comparing two incomparable
objects is an exceptional case -- hence the exception. Although again,
you're free to provide a cmp for your classes that operates only on a
subset of type x type.
 

Steve Holden

Steven said:
What do you mean "in" is a useless ordering? It makes a huge difference
whether "nuclear bomb in New York" is true or not.

In fact, I'm quite surprised that Antoon should object to "in" as "this
doesn't define a mathematical ordering, the subset relationship does" when
"subset" is just "in" for sets: set S is a subset of set T if for all
elements x in S, x is also in T. Informally, subset S is in set T.

Can somebody remind me, what is the problem Antoon is trying to solve here?
Being Belgian, I suspect.

regards
Steve
 

Antoon Pardon

Op 2005-10-25 said:
Because it's useful that cmp(a1, a2) should either (return a value) or
(throw an exception) for any element a1, a2 within type(a1) cross
type(a2). If cmp sometimes is okay and sometimes throws an exception,
then it leads to weird borkage in things like trees.

Oh no, not the "we have to protect the programmer" argument again.

This kind of behaviour will manifest itself soon enough in testing.
Almost each time some addition is suggested that could help
with finding bugs, it gets rejected because such bugs will be
found soon enough with proper testing.

And in order to protect one specific case, you limit programmers'
use in all the other cases where it could be useful.
With that in mind, not all sets are comparable. {a} and {b} have no
comparison relationship, as you've pointed out, aside from not-equal.
I'll get to why "not-equal" is a bad idea below.


Most programmers are simply programmers; they haven't had the benefit of
a couple years' pure-math education, so the distinction between "partial
order" and "total order" is esoteric at best.

So, these people are not familiar with higher-order functions either.
Do you think we should make Python a language only for the not too
educated?
With that in mind, compare should conform, as best as possible, to
"intuitive" behavior of comparison. Since comparisons are mostly done
on numbers, an extension of comparisons should behave "as much like
numbers" as possible.

IMO cmp(set([0]), set([0,1])) giving a result of -1 is much more
number-like behaviour than throwing an exception. With numbers
we have that a < b implies cmp(a,b) < 0. So the most number-like
behaviour for sets would also be that a < b implies cmp(a,b) < 0.
I doubt that anyone was thinking about it in such depth. My bet is that
the thought process goes this way:

Which seems to be my biggest frustration with python. Things are
often enough not thought out in depth.
I'd argue that the wart here is that cmp doesn't throw an exception, not
that it returns inconsistent results. This is a classic case of
incomparable objects, and saying that 1 < an empty tuple is bordering on
meaningless.

I won't argue with you here, but it is what Python gives us now.
Changing this behaviour is not going to happen.
My biggest complaint here is about returning None or IncomparableValue;
if that happens, then all code that relies on cmp returning a numeric
result will have to be rewritten.

I don't know. There are two possibilities.

1) The user is working with a total order. In that case the None
or IncomparableValues won't be returned anyway so nothing about
his code has to change.

2) The user is working with a partial order. In that case cmp
doesn't provide consistent results so the use of cmp in this
case was a bug anyway.
Comparing incomparables is an exceptional case, and hence it should
raise an exception.

As for saying that cmp should return some times and raise an exception
other times, I just find it squicky.

But cmp already behaves this way. The only difference would be that
the decision would be made based on the values of the objects instead
of only their class.
Admittedly, this is entirely up to
the class designer, and your proposed guideline wouldn't change cmp's
behavior for classes that /are/ totally ordered.

Then again, sets excepted, your guideline doesn't seem very applicable
in standard library code.

Well AFAIAC this isn't specific about library code.
 

Antoon Pardon

Op 2005-10-25 said:
It depends on your definition of "comparison." Admittedly, <, =, !=,
and > can be defined for a partial ordering, but classical mathematics,
as understood by most people (even programmers), assumes that unless a
== b, a > b or a < b.

The comparisons, as defined this way, are done on a totally ordered set,
yes. But since most comparisons ever done -are- done on a totally
ordered set (namely the integers or reals), it's entirely fair to say
that "a programmer's expectation" is that comparisons should work more
or less like a totally ordered list.


With that caveat in mind, comparison on a partially ordered domain /is/
ill-defined; it can give inconsistent results from the "a < b or a > b
or a == b" rule.

Against people's expectations is not the same as inconsistent.
Impure? Probably not. Useless from many perspectives when "comparisons"
are needed, to the point where it's probably safer to throw an exception
than to define results.


Still accident. < wouldn't be used for sets if we had a subset symbol
on the standard keyboard, APL fans notwithstanding.

That is only because people usually work with numbers and sets at the
same time, and using different symbols helps making the distinction.

But using <= for subset is mathematically completely correct. If you
want to go purely formal, numbers are just a particular family of
sets where the less-than-or-equal relation on numbers coincides with
the subset relation of sets in that family.

People use the multiplication symbol for multiplying matrices although
matrix multiplication is something different from number multiplication.

If the multiplications of matrices is enough like the multiplication of
numbers to use the same symbol, I see nothing wrong in using the same
symbol for the (strict) subset relation as is used for the less than
(or equal) relation on numbers.
Since programmers
familiar with Python understand that < on sets isn't a "real" comparison
(i.e. one that provides a total order), they don't expect unique, consistent
results from something like sort (or a tree, for that matter).

Yes, it does. Consider "in" as a mathematical operator:

My apologies, I thought you were going in the "I can in my coat,
my coat can in the box" kind of direction.
For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and
transitive (extend to mouse-in-cat if you really care about transitive),
so "in" is a partial order sans equality. A useless one, but a partial
order nonetheless.

I don't know if it is useless. This kind of relationship seems very
common in political geography.

This village is in this county, this county is in this state, this state
is in this nation.

Also magazines that test certain equipment to give advice on the best
buy usually work with methods that result in partial orderings.
Notice that L1 and L2 contain the same elements, just in different orders.
Sorting those two lists should give the same order, correct?


No. Since we don't have a total ordering, sorting doesn't have to give
the same result. As far as sorting is defined on this kind of
object, the only thing I would expect is that after a sort the
following condition holds:

for all i, j: if i < j then not L[i] > L[j]


Highly formal, aren't you?


I like the precision it provides.
Again, common programming allows for "not a > b"
== "a <= b", so your condition becomes the more familiar:

for all i, j in range(len(L)): if i < j then L[i] <= L[j]


So, should we limit our programming to the common cases,
or should a language also provide help for the uncommon
case?
This condition is also assumed implicitly by many other assumed "rules"
of sorted lists, namely their uniqueness:

"If L is a list of unique keys, then sort(L) is a unique list of keys in
sorted order."

Yes, this assumes that L has a total order. Big whoop -- this is a
matter of programming practicality rather than mathematical purity.

I don't find it so practical if the language limits the programmer
to the common cases. I can understand that implementation issues
impose limits on what can practically be realised, and often enough that
is the more common practice. So in that case it is about programming
practicality. But you seem to think it is programming practicality
if you limit yourself to the common case from the beginning, even
if implementing the more general case wouldn't be much harder
than implementing only the common case.
 

Antoon Pardon

Op 2005-10-25 said:
What do you mean "in" is a useless ordering? It makes a huge difference
whether "nuclear bomb in New York" is true or not.

In fact, I'm quite surprised that Antoon should object to "in" as "this
doesn't define a mathematical ordering, the subset relationship does" when
"subset" is just "in" for sets: set S is a subset of set T if for all
elements x in S, x is also in T. Informally, subset S is in set T.

I was misunderstanding where Christopher was heading.
Can somebody remind me, what is the problem Antoon is trying to solve here?

Well there are two issues: one about correct behaviour and one about
practicality.

The first problem is cmp. This is what the documentation states:

cmp(x, y)
Compare the two objects x and y and return an integer according to
the outcome. The return value is negative if x < y, zero if x == y
and strictly positive if x > y.

The problem is that if someone implements a partially ordered class
with the rich comparisons, cmp doesn't behave correctly. It can't
behave correctly, because it will always give a number as a result,
and such a number implies one of three relations between two
elements; but with partially ordered classes it is possible that none
of these relations holds between those two elements. So IMO cmp
should be changed to reflect this. This can be done either by cmp
returning None or UnequalValues, or by cmp raising UnequalValues
in such a case.

The second part is one of practicality. Once you have accepted that cmp
should reflect this possibility, it makes sense that the __cmp__
method should get the same possibilities, so that where it is more
practical to do so, the programmer can implement his partial order
through one __cmp__ method instead of having to write six.


In my opinion such a change would break no existing code. This
is because there are two possibilities.

1) The user is using cmp on objects that form a total order.
In such circumstances the behaviour of cmp doesn't change.

2) The user is using cmp on objects that don't form a total
order. In such circumstances cmp doesn't give consistent
results, so the code is already broken.
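The "one cmp-like method instead of six" idea can be sketched in Python 3 terms (UnequalValues is Antoon's name from the proposal; the wrapper class and the method wiring below are invented for illustration):

```python
class UnequalValues(Exception):
    """Raised when two values are incomparable under the partial order."""

class PSet:
    """Toy set wrapper: the whole partial order is driven by one
    cmp-like method, as proposed above."""
    def __init__(self, items):
        self.items = frozenset(items)

    def _comp(self, other):
        if self.items == other.items:
            return 0
        if self.items < other.items:
            return -1
        if self.items > other.items:
            return 1
        raise UnequalValues(self.items, other.items)

    def __lt__(self, other):
        return self._comp(other) < 0

    def __le__(self, other):
        return self._comp(other) <= 0

print(PSet([1]) < PSet([1, 2]))   # True
try:
    PSet([1]) < PSet([2])
except UnequalValues:
    print("incomparable")
```

For comparable pairs the behaviour matches the classic cmp contract; only the genuinely incomparable case raises, which is why existing total-order code would be untouched.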
 

Ron Adam

Antoon said:
Op 2005-10-25, Steven D'Aprano schreef <[email protected]>:


Well there are two issues: one about correct behaviour and one about
practicality.

The first problem is cmp. This is what the documentation states:

cmp(x, y)
Compare the two objects x and y and return an integer according to
the outcome. The return value is negative if x < y, zero if x == y
and strictly positive if x > y.

The problem is that if someone implements a partially ordered class
with the rich comparisons, cmp doesn't behave correctly. It can't
behave correctly, because it will always give a number as a result,
and such a number implies one of three relations between two
elements; but with partially ordered classes it is possible that none
of these relations holds between those two elements. So IMO cmp
should be changed to reflect this. This can be done either by cmp
returning None or UnequalValues, or by cmp raising UnequalValues
in such a case.

Instead of changing cmp, why not just have several common versions to
choose from instead of trying to make one tool do it all?
The second part is one of practicality. Once you have accepted that cmp
should reflect this possibility, it makes sense that the __cmp__
method should get the same possibilities, so that where it is more
practical to do so, the programmer can implement his partial order
through one __cmp__ method instead of having to write six.


In my opinion such a change would break no existing code. This
is because there are two possibilities.
1) The user is using cmp on objects that form a total order.
In such circumstances the behaviour of cmp doesn't change.

2) The user is using cmp on objects that don't form a total
order. In such circumstances cmp doesn't give consistent
results, so the code is already broken.


Adding complexity to cmp may not break code, but it could probably slow
down sorting in general. So I would think whatever improvements or
alternatives come along need to be careful not to slow down existing sorting cases.

Cheers,
Ron
 

Christopher Subich

Antoon said:
I don't know. There are two possibilities.

1) The user is working with a total order. In that case the None
or IncomparableValues won't be returned anyway so nothing about
his code has to change.

2) The user is working with a partial order. In that case cmp
doesn't provide consistent results so the use of cmp in this
case was a bug anyway.

Case 3) The user is working with an unknown class, using duck typing,
and expects a total order. If cmp suddenly returned Incomparable or
None, the code would break in Unexpected Ways, with Undefined Behavior.

This is a classic "exception versus error code" argument; in this case,
returning None would be the error flag. It's almost always better to
just throw the exception, because then this allows error-checking at a
more natural level.
But cmp already behaves this way. The only difference would be that
the decision would be made based on the values of the objects instead
of only their class.




Well AFAIAC this isn't specific about library code.

A change that cmp return a 4th possible "value" (None or Incomparable)
is a fundamental language change. Because this would break large
amounts of existing code, it's a bad idea.

A change that cmp throw an exception iff the two objects, rather than
the two classes, were incomparable (allowing comparisons of (1+0j and
2+0j) and ({a} and {a,b}) but not (1+1j and 2+0j) or ({a} and {b})) is
a stylistic guideline, since it's already possible to write your own
classes this way. The only place this change would matter is in the
standard library code, and in just a few places at that.
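The value-based rule Christopher describes can be sketched for complex numbers (the class name is invented; Python's built-in complex simply refuses all ordering comparisons):

```python
class OrderedComplex(complex):
    """Order complex numbers only when both lie on the real axis,
    per the value-based (not class-based) rule described above."""
    def __lt__(self, other):
        if self.imag == 0 and other.imag == 0:
            return self.real < other.real
        raise TypeError("no order off the real axis")

print(OrderedComplex(1) < OrderedComplex(2))   # True: both real
try:
    OrderedComplex(1 + 1j) < OrderedComplex(2)
except TypeError:
    print("incomparable")
```

The decision whether to raise depends on the values (1+0j versus 1+1j), not on the classes involved -- which is exactly the distinction being drawn.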
 

Steven Bethard

Antoon said:
Christopher Subich schreef :


I won't argue with you here, but it is what Python gives us now.
Changing this behaviour is not going to happen.

FWIW, Guido has said a few times that in Python 3.0 we should "Raise an
exception when making comparisons (other than equality and inequality)
between two incongruent types."[1] But yes, the behavior won't change
in the Python 2.X series due to backwards compatibility concerns.

STeVe

[1] http://wiki.python.org/moin/Python3.0
 

Antoon Pardon

Op 2005-10-26 said:
Instead of changing cmp, why not just have several common versions to
choose from instead of trying to make one tool do it all?

That would be an option. Let's call such an alternative comp.
Would that also mean having a __comp__ method for customization?
Adding complexity to cmp may not break code, but it could probably slow
down sorting in general.

The evidence suggests that cmp is not used in sorting. If you have a
list of sets, sort will happily try to sort it, while calling cmp
with a set as an argument throws an exception.

I have a feeling that adding comp (and its brother __comp__) would
have a detrimental effect on sorting, because using '<' would then
mean one extra method to look for that could be used in determining
if a < b or not.
So I would think whatever improvements or
alternatives come along need to be careful not to slow down existing sorting cases.

I have done some tests and have come to the conclusion that other
factors will have a greater influence on sorting times. I have written
the following program to estimate effects:

from random import shuffle
from time import time

class UnequalValues(Exception):
    pass

class cla:
    def __init__(self, i):
        self.value = int(i)

    def __cmp__(self, other):
        return self.value - other.value

class clb:
    def __init__(self, i):
        self.value = int(i)

    def __lt__(self, other):
        return self.value < other.value

class clc:
    def __init__(self, i):
        self.value = int(i)

    def __comp__(self, other):
        return self.value - other.value

    def __lt__(self, other):
        try:
            return self.__comp__(other) < 0
        except UnequalValues:
            return False

def test(lng, rep):
    for cl in cla, clb, clc:
        total = 0.0
        for _ in xrange(rep):
            lst = [cl(i) for i in xrange(lng)]
            shuffle(lst)
            start = time()
            lst.sort()
            stop = time()
            total += stop - start
        print "%s: %d repeats, %d long, %9.6f secs" % (cl, rep, lng, total)

test(1000, 1000)


This is the result I get:

__main__.cla: 1000 repeats, 1000 long, 31.986618 secs
__main__.clb: 1000 repeats, 1000 long, 8.963896 secs
__main__.clc: 1000 repeats, 1000 long, 16.893321 secs


I find it very odd that clc sorts about twice as fast as cla.

That means that every class that has a __cmp__ method can be
sped up for sorting by writing a similar __lt__ method,
as in clc. I do wonder what is causing this.
 
A

Antoon Pardon

Op 2005-10-26 said:
Case 3) The user is working with an unknown class, using duck typing,
and expects a total order. If cmp suddenly returned Incomparable or
None, the code would break in Unexpected Ways, with Undefined Behavior.

This is case 2. Under the current circumstances working with cmp with
a partial order will give inconsistent behaviour. So your code is
already broken if it relies on cmp to give consistent results in
circumstances where it can't.
This is a classic "exception versus error code" argument; in this case,
returning None would be the error flag. It's almost always better to
just throw the exception, because then this allows error-checking at a
more natural level.

If you prefer a raised exception, I could live with that.
A change so that cmp returns a fourth possible "value" (None or
Incomparable) is a fundamental language change. Because this would
break large amounts of existing code, it's a bad idea.

I have always included the option that cmp would raise an exception
instead of returning a fourth value.
 
B

Bengt Richter

On 27 Oct 2005 08:12:15 GMT said:
The evidence suggests that cmp is not used in sorting. If you have a
list of sets, sort will happily try to sort it, while calling cmp
with a set as an argument throws an exception.
A data point re evidence:
... def __getattr__(self, attr): print attr; raise AttributeError
...
__lt__
__gt__
__gt__
__lt__
__coerce__
__coerce__
__cmp__
__cmp__
[__repr__
<__main__.C instance at 0x02EF388C>, __repr__
<__main__.C instance at 0x02EF38CC>]

I think it will be slightly different if you define those methods
in a new-style class -- oh, heck, why not do it:
... def __lt__(*ignore): print '__lt__'; return NotImplemented
... def __gt__(*ignore): print '__gt__'; return NotImplemented
... def __coerce__(*ignore): print '__coerce__'; return NotImplemented
... def __cmp__(*ignore): print '__cmp__'; return NotImplemented
... __lt__
__gt__
__cmp__
__cmp__
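Another data point in the same spirit, sketched with a hypothetical Probe class (Python 3, where list.sort() uses __lt__ exclusively):

```python
# Probe class recording which comparison method list.sort() calls.
calls = []

class Probe:
    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        calls.append("__lt__")
        return self.v < other.v

lst = [Probe(3), Probe(1), Probe(2)]
lst.sort()
print([p.v for p in lst])   # [1, 2, 3]
print(set(calls))           # {'__lt__'}: no other method is consulted
```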

(I haven't followed the thread much, so please excuse if irrelevant ;-)

Regards,
Bengt Richter
 
A

Antoon Pardon

Op 2005-10-26 said:
Adding complexity to cmp may not break code, but it could probably slow
down sorting in general. So I would think whatever improvements or
alternatives need to be careful not to slow down existing sorting cases.

As a result of Bengt's post, I rewrote my test program, this is the
program now.

from random import shuffle
from time import time

class UnequalValues(Exception):
    pass

__metaclass__ = type

class cla:
    def __init__(self, i):
        self.value = int(i)

    def __cmp__(self, other):
        return self.value - other.value

class clb:
    def __init__(self, i):
        self.value = int(i)

    def __lt__(self, other):
        return self.value < other.value

class clc:
    def __init__(self, i):
        self.value = int(i)

    def __gt__(self, other):
        return self.value > other.value

class cld:
    def __init__(self, i):
        self.value = int(i)

    def __comp__(self, other):
        return self.value - other.value

    def __lt__(self, other):
        return self.__comp__(other) < 0

class cle:
    def __init__(self, i):
        self.value = int(i)

    def __comp__(self, other):
        return self.value - other.value

    def __lt__(self, other):
        try:
            return self.__comp__(other) < 0
        except UnequalValues:
            return False

def test(lng, rep):
    for cl in cla, clb, clc, cld, cle:
        total = 0.0
        for _ in xrange(rep):
            lst = [cl(i) for i in xrange(lng)]
            shuffle(lst)
            start = time()
            lst.sort()
            stop = time()
            total += stop - start
        for i in xrange(1, rep):
            assert lst[i - 1] < lst[i]
        print "%s: %d repeats, %d long, %9.6f secs" % (cl, rep, lng, total)

test(1000, 1000)

---

These are the results.

<class '__main__.cla'>: 1000 repeats, 1000 long, 10.061425 secs
<class '__main__.clb'>: 1000 repeats, 1000 long, 9.544035 secs
<class '__main__.clc'>: 1000 repeats, 1000 long, 10.450864 secs
<class '__main__.cld'>: 1000 repeats, 1000 long, 15.566061 secs
<class '__main__.cle'>: 1000 repeats, 1000 long, 15.776443 secs

Results on a longer sequence were:

<class '__main__.cla'>: 1000 repeats, 10000 long, 146.722443 secs
<class '__main__.clb'>: 1000 repeats, 10000 long, 139.480863 secs
<class '__main__.clc'>: 1000 repeats, 10000 long, 152.623424 secs
<class '__main__.cld'>: 1000 repeats, 10000 long, 224.630926 secs
<class '__main__.cle'>: 1000 repeats, 10000 long, 228.663825 secs

The most interesting result of this test is the difference between
cld and cle. IMO this difference is an indication of the cost
that my idea would have on sorting, should it be implemented.
That would be around 2%. I think that is bearable, especially
since these were very simple routines. The moment the comparison
calculations become heavier, the relative contribution of the
try/except will diminish.

If you are concerned about sorting times, I think you should
be more concerned about Guido's idea of doing away with __cmp__.
Sure __lt__ is faster. But in a number of cases writing __cmp__
is of the same complexity as writing __lt__. So if you then
need a __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ it
would be a lot easier to write a __cmp__ and have all rich
comparison methods call this instead of duplicating the code
about six times. So you would be more or less forced to write
your class as class cld or cle. This would have a bigger
impact on sorting times than my suggestion.
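The approach described here, writing one three-way comparison and deriving the six rich comparison methods from it, can be sketched as follows (Python 3; the names Comparable, __comp__ and Num are illustrative, not standard library):

```python
# Derive all six rich comparison methods from a single three-way
# comparison __comp__ that returns <0, 0, or >0.
class Comparable:
    def __comp__(self, other):
        raise NotImplementedError

    def __lt__(self, other): return self.__comp__(other) < 0
    def __le__(self, other): return self.__comp__(other) <= 0
    def __eq__(self, other): return self.__comp__(other) == 0
    def __ne__(self, other): return self.__comp__(other) != 0
    def __gt__(self, other): return self.__comp__(other) > 0
    def __ge__(self, other): return self.__comp__(other) >= 0

class Num(Comparable):
    def __init__(self, v):
        self.v = v

    def __comp__(self, other):
        return self.v - other.v

print(Num(1) < Num(2))    # True
print(Num(2) >= Num(2))   # True
```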
 
A

Antoon Pardon

Op 2005-10-28 said:
[timing results and remarks on __cmp__ quoted from the previous post]

And as an afterthought: adding __slots__ increased the
sorting speed by between 7.5 and 9.0%.
 
R

Ron Adam

Antoon said:
As a result of Bengt's post, I rewrote my test program, this is the
program now.
....

[timing results and cost-estimate remarks quoted]


class cle:
    def __init__(self, i):
        self.value = int(i)

    def __comp__(self, other):
        return self.value - other.value

    def __lt__(self, other):
        try:
            return self.__comp__(other) < 0
        except UnequalValues:
            return False


This would only work with numeric types. I believe that cmp is closer to
the following.

    return ( self.value < other.value and -1
             or self.value > other.value and 1
             or 0 )

I don't think it affects the speed a great deal however.

If you are concerned about sorting times, I think you should
be more concerned about Guido's idea of doing away with __cmp__.
Sure __lt__ is faster. But in a number of cases writing __cmp__
is of the same complexity as writing __lt__. So if you then
need a __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ it
would be a lot easier to write a __cmp__ and have all rich
comparisons methods call this instead of duplicating the code
about six times. So you would be more or less forced to write
your class as class cld or cle. This would have a bigger
impact on sorting times than my suggestion.

I haven't heard he was removing __cmp__, but I would think the sort or
sorted functions would just use the available comparison methods or
equivalent C code for base types. So I expect it would only matter if
you need a custom or modified sort.

Although it is a thought that these cases could be improved by making
the sort value available to the underlying C sort function. Something
on the order of:

__sortvalue__ == self.value.

Cheers,
Ron
 
C

Christopher Subich

Antoon said:
If you are concerned about sorting times, I think you should
be more concerned about Guido's idea of doing away with __cmp__.
Sure __lt__ is faster. But in a number of cases writing __cmp__
is of the same complexity as writing __lt__. So if you then
need a __lt__, __le__, __eq__, __ne__, __gt__ and __ge__ it
would be a lot easier to write a __cmp__ and have all rich
comparisons methods call this instead of duplicating the code
about six times. So you would be more or less forced to write
your class as class cld or cle. This would have a bigger
impact on sorting times than my suggestion.

Honestly, I don't really mind the idea of __cmp__ going away; for
classes that behave Normally with respect to a single __cmp__ value,
it's easily possible to write a CompareMixin that defines __lt__,
__gt__, etc. for suitable __cmp__ values.

Much like DictMixin is part of the standard library.
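For what it's worth, such a mixin did later land in the standard library as functools.total_ordering (Python 2.7/3.2): supply __eq__ and one ordering method, and the decorator fills in the rest. A quick sketch:

```python
# total_ordering generates the remaining rich comparison methods
# from __eq__ plus one ordering method (__lt__ here).
from functools import total_ordering

@total_ordering
class Value:
    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return self.v == other.v

    def __lt__(self, other):
        return self.v < other.v

print(Value(1) <= Value(2))   # True, derived from __lt__ and __eq__
print(Value(3) > Value(2))    # True
```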
 
A

Antoon Pardon

Op 2005-10-28 said:
I haven't heard he was removing __cmp__,

I read somewhere he was considering it.
but I would think the sort or
sorted functions would just use the available comparison methods or
equivalent C code for base types. So I expect it would only matter if
you need a custom or modified sort.

Although It is a thought that these cases could be improved by making
the sort value available to the underlying C sort function. Something
on the order of:

__sortvalue__ == self.value.

I doubt that will be possible in general. You can't even calculate such
a __sortvalue__ for lists.
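For what it's worth, the spirit of the __sortvalue__ idea did survive as the key= argument to sort()/sorted(), which computes a sort value once per element and does not require that value to be a scalar (Python 3 sketch):

```python
# key= computes one sort value per element up front; the value can be
# anything orderable, so it works for lists of lists as well.
data = [[3, 0], [1, 9], [2, 5]]
print(sorted(data, key=lambda pair: pair[0]))   # [[1, 9], [2, 5], [3, 0]]

words = ["banana", "Apple", "cherry"]
print(sorted(words, key=str.lower))             # ['Apple', 'banana', 'cherry']
```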
 
