Guido rethinking removal of cmp from sort method

S

Steven D'Aprano

The removal of cmp from the sort method of lists is probably the most
disliked change in Python 3. On the python-dev mailing list at the
moment, Guido is considering whether or not it was a mistake.

If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.
 
D

Dave Abrahams

Steven D'Aprano said:
If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.

I think it's probably provable that there are no cases in the first category,
provided you're willing to do something sufficiently contorted. However,
it also seems self-evident to me that many programmers will rightly chafe
at the idea of creating and tearing down a bunch of objects just to
compare things for sorting. Think of the heap churn! Even if it turns out
that Python 3 contains some magic implementation detail that makes it
efficient most of the time, it goes against a natural understanding of the
computation model

2p for y'all.
-Dave
 
S

Stefan Behnel

Steven D'Aprano, 13.03.2011 13:59:
The removal of cmp from the sort method of lists is probably the most
disliked change in Python 3. On the python-dev mailing list at the
moment, Guido is considering whether or not it was a mistake.

If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.

As Raymond Hettinger and Daniel Stutzbach pointed out in that thread, the
memory overhead is much lower in Python 3.2 and also depends on the usage.
If memory is a problem, it can still be traded for time by sorting in
multiple (stable) passes with smaller keys. It rarely is a problem in
practice, though, and Guido's initial post seems to suggest that as well.

Stefan
 
J

Jean-Michel Pichavant

Steven said:
The removal of cmp from the sort method of lists is probably the most
disliked change in Python 3. On the python-dev mailing list at the
moment, Guido is considering whether or not it was a mistake.

If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.
You seem concerned by this removal, do you have any use-case ?

JM
 
S

Steven D'Aprano

You seem concerned by this removal, do you have any use-case ?

You seem concerned by my concern. Why do you think I am concerned?

(1) I'm not concerned, but many people are. If you search the archives of
this newsgroup (mailing list), you'll see that I have defended the
removal of cmp from sort, e.g. this post:

http://www.mail-archive.com/[email protected]/msg261728.html


(2) If I had a good use-case for keeping cmp, I wouldn't need to ask
others if they had a good use-case.

As it is, Guido himself has mentioned one such good use for a comparison
function when sorting. Use of a key function trades off memory for time,
while sorting with a comparison function makes the opposite trade off,
using more time for the sake of saving memory.
 
J

Jean-Michel Pichavant

Steven said:
You seem concerned by my concern. Why do you think I am concerned?

(1) I'm not concerned, but many people are. If you search the archives of
this newsgroup (mailing list), you'll see that I have defended the
removal of cmp from sort, e.g. this post:

http://www.mail-archive.com/[email protected]/msg261728.html


(2) If I had a good use-case for keeping cmp, I wouldn't need to ask
others if they had a good use-case.

As it is, Guido himself has mentioned one such good use for a comparison
function when sorting. Use of a key function trades off memory for time,
while sorting with a comparison function makes the opposite trade off,
using more time for the sake of saving memory.
Just to know if currently the use-case count is zero. Nothing evil
hidden behind my question :p

JM
 
P

Paul Rubin

Steven D'Aprano said:
If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.

We've had this discussion a couple times before. I remember an example
involving comparing tree structures, that I thought couldn't be done
with key= in a reasonable way, and that this was a reasonable real-world
example. Raymond H then gave a way of encoding it with key= which
looked ok at the time, but I decided sometime later that I think we both
missed an error in that encoding, so I've been wanting to dig it out
again and look more closely.

key= can of course burn a lot more memory because of the DSU pattern
(say you are sorting file handles according to the contents of the file,
so with key= you might have to read N multi-megabyte files where with
cmp you just pairwise-compare them until they differ, which is probably
in the first few bytes).

Finally I concocted an "infinite" example which we agreed is artificial:
you are given a list of generators denoting real numbers, for example
pi generates the infinite sequence 3,1,4,1,5,9... while e generates
2,7,1,8,... You can sort these with cmp but not with key.
 
N

Nobody

Finally I concocted an "infinite" example which we agreed is artificial:
you are given a list of generators denoting real numbers, for example
pi generates the infinite sequence 3,1,4,1,5,9... while e generates
2,7,1,8,... You can sort these with cmp but not with key.

Is there some restriction on the return type of the key= function? If not,
you can define a class with appropriate comparison methods and have key=
return instances of that class.
 
P

Paul Rubin

Nobody said:
Is there some restriction on the return type of the key= function? If not,
you can define a class with appropriate comparison methods and have key=
return instances of that class.

Sorry, yeah, of course you can do it that way, but it's bloody ugly and
you're no longer gaining the supposed benefit of the DSU pattern.
 
I

Ian Kelly

Finally I concocted an "infinite" example which we agreed is artificial:
you are given a list of generators denoting real numbers, for example
pi generates the infinite sequence 3,1,4,1,5,9... while e generates
2,7,1,8,...  You can sort these with cmp but not with key.

I would think that you can sort them with key as long as none of the
sequences are equal (which would result in an infinite loop using
either method). Why not this?

_MISSING = object()

@functools.total_ordering
class IteratorKey(object):

def __init__(self, iterable):
self._iterator = iter(iterable)
self._cache = []

def __iter__(self):
# This is not reentrant, but it's
# good enough for this purpose.
for x in self._cache:
yield x
for x in self._iterator:
self._cache.append(x)
yield x

def __lt__(self, other):
for x, y in itertools.izip_longest(self, other, fillvalue=_MISSING):
if x is _MISSING or y is _MISSING:
return x is _MISSING
elif x < y or y < x:
return x < y
return False

def __eq__(self, other):
for x, y in itertools.izip_longest(self, other, fillvalue=_MISSING):
if x is _MISSING or y is _MISSING or x != y:
return False
return True
 
P

Paul Rubin

Ian Kelly said:
I would think that you can sort them with key as long as none of the
sequences are equal (which would result in an infinite loop using
either method). Why not this?

Yes you can do something like that, but look how ugly it is compared
with using cmp.
 
S

Stefan Behnel

Paul Rubin, 15.03.2011 09:00:
Yes you can do something like that, but look how ugly it is compared
with using cmp.

I would argue that it's a rare use case, though. In most cases, memory
doesn't really matter, and the overhead of a decorated sort is acceptable.
In some cases, where space matters, an external sort is a better choice or
even the only thing that will work, so all that's left are the cases where
speed really needs to be traded for space *and* an external sort is not
applicable, and those where things can be expressed more easily using cmp
than using a key. For the latter, there's support in the standard library,
as Steven pointed out. Now, the question that remains is: are the few
remaining cases worth being supported directly, thus complicating the
internal source code of the Python runtime?

Stefan
 
P

Paul Rubin

Stefan Behnel said:
the question that remains is: are the few remaining cases worth being
supported directly, thus complicating the internal source code of the
Python runtime?

The other cases were supported perfectly well already, and the support
was removed. That's just ludicrous in my opinion.
 
A

Antoon Pardon

The removal of cmp from the sort method of lists is probably the most
disliked change in Python 3. On the python-dev mailing list at the
moment, Guido is considering whether or not it was a mistake.

If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.

How about a list of tuples where you want them sorted first item in ascending
order en second item in descending order.
 
S

Stefan Behnel

Antoon Pardon, 23.03.2011 14:53:
How about a list of tuples where you want them sorted first item in ascending
order en second item in descending order.

You can use a stable sort in two steps for that.

Stefan
 
A

Antoon Pardon

Antoon Pardon, 23.03.2011 14:53:

You can use a stable sort in two steps for that.

Which isn't helpfull if where you decide how they have to be sorted is
not the place where they are actually sorted.

I have a class that is a priority queue. Elements are added at random but
are removed highest priority first. The priority queue can have a key or
a cmp function for deciding which item is the highest priority. It
can also take a list as an initializor, which will then be sorted.

So this list is sorted within the class but how it is sorted is decided
outside the class. So I can't do the sort in multiple steps.
 
I

Ian Kelly

Which isn't helpfull if where you decide how they have to be sorted is
not the place where they are actually sorted.

I have a class that is a priority queue. Elements are added at random but
are removed highest priority first. The priority queue can have a key or
a cmp function for deciding which item is the highest priority. It
can also take a list as an initializor, which will then be sorted.

So this list is sorted within the class but how it is sorted is decided
outside the class. So I can't do the sort in multiple steps.

You can't do this?

for (key, reversed) in self.get_multiple_sort_keys():
self.pq.sort(key=key, reversed=reversed)
 
S

Stefan Behnel

Antoon Pardon, 23.03.2011 16:14:
Which isn't helpfull if where you decide how they have to be sorted is
not the place where they are actually sorted.

I have a class that is a priority queue. Elements are added at random but
are removed highest priority first. The priority queue can have a key or
a cmp function for deciding which item is the highest priority. It
can also take a list as an initializor, which will then be sorted.

So this list is sorted within the class but how it is sorted is decided
outside the class. So I can't do the sort in multiple steps.

That sounds more like a use case for heap sort than for Python's builtin
list sort. See the heapq module.

Stefan
 
C

Carl Banks

Antoon Pardon, 23.03.2011 14:53:



You can use a stable sort in two steps for that.

How about this one: you have are given an obscure string collating
function implented in a C library you don't have the source to.

Or how about this: I'm sitting at an interactive session and I have a
convenient cmp function but no convenient key, and I care more about
the four minutes it'd take to whip up a clever key function or an
adapter class than the 0.2 seconds I'd save to on sorting time.

Removing cmp from sort was a mistake; it's the most straightforward
and natural way to sort in many cases. Reason enough for me to keep
it.


Carl Banks
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top