Guido rethinking removal of cmp from sort method

S

Stefan Behnel

Carl Banks, 23.03.2011 18:23:
How about this one: you have are given an obscure string collating
function implented in a C library you don't have the source to.

Or how about this: I'm sitting at an interactive session and I have a
convenient cmp function but no convenient key, and I care more about
the four minutes it'd take to whip up a clever key function or an
adapter class than the 0.2 seconds I'd save to on sorting time.

As usual with Python, it's just an import away:

http://docs.python.org/library/functools.html#functools.cmp_to_key

I think this is a rare enough use case to merit an import rather than being
a language feature.

Stefan
 
C

Carl Banks

Carl Banks, 23.03.2011 18:23:








As usual with Python, it's just an import away:

http://docs.python.org/library/functools.html#functools.cmp_to_key

I think this is a rare enough use case to merit an import rather than being
a language feature.

The original question posted here was, "Is there a use case for cmp?"
There is, and your excuse-making doesn't change the fact. It's the
most natural way to sort sometimes; that's a use case. We already
knew it could be worked around.

It's kind of ridiculous to claim that cmp adds much complexity (it's
maybe ten lines of extra C code), so the only reason not to include it
is that it's much slower than using key. Not including it for that
reason would be akin to the special-casing of sum to prevent strings
from being concatenated, although omitting cmp would not be as drastic
since it's not a special case.

Do we omit something that's useful but potentially slow? I say no.


Carl Banks
 
P

Paul Rubin

Carl Banks said:
It's kind of ridiculous to claim that cmp adds much complexity (it's
maybe ten lines of extra C code), so the only reason not to include it
is that it's much slower than using key.

Well, I thought it was also to get rid of 3-way cmp in general, in favor
of rich comparison.
 
C

Carl Banks

Well, I thought it was also to get rid of 3-way cmp in general, in favor
of rich comparison.

Supporting both __cmp__ and rich comparison methods of a class does
add a lot of complexity. The cmp argument of sort doesn't.

The cmp argument doesn't depend in any way on an object's __cmp__
method, so getting rid of __cmp__ wasn't any good readon to also get
rid of the cmp argument; their only relationship is that they're
spelled the same. Nor is there any reason why cmp being a useful
argument of sort should indicate that __cmp__ should be retained in
classes.


Carl Banks
 
D

Dennis Lee Bieber

Antoon Pardon, 23.03.2011 16:14:

That sounds more like a use case for heap sort than for Python's builtin
list sort. See the heapq module.

The first half of the problem description -- "Elements are added at
random" seems more suited to an in-place insertion sort method. As such,
I probably wouldn't inflict Python's built-in sort on the initialization
list -- I'd have the __init__ method take the finish with something
like:

for entry in initialList:
self.addEntry(entry)

and let addEntry worry about where that one belongs -- the same process
that would be used when an ad-hoc entry is made later.
 
P

Paul Rubin

Dennis Lee Bieber said:
The first half of the problem description -- "Elements are added at
random" seems more suited to an in-place insertion sort method.

This is precisely what a priority queue is for. Insertions take
O(log n) time and there's very little space overhead in heapq's
list-based implementation.
 
A

Antoon Pardon

You can't do this?

for (key, reversed) in self.get_multiple_sort_keys():
self.pq.sort(key=key, reversed=reversed)

Sure I can do that. I can do lots of things like writing a CMP class
that I will use as a key and where I can implement the logic for
comparing the original objects, which I otherwise would have put in a
cmp function. I thought this wasn't about how one can get by without
the cmp argument. This was about cases where the cmp argument was the
more beautiful or natural way to handle this case.

I think I have provided such a case. If you don't agree then
don't just give examples of what I can do, argue how your solution
would be a less ugly or more natural way to handle this.
 
A

Antoon Pardon

That sounds more like a use case for heap sort than for Python's
builtin list sort. See the heapq module.

No the heapq module is not usefull. The heapq functions don't have a
cmp, or a key argument. So you can't use it with priorities that differ
from the normal order of the items.

For the rest is solving this particular problem beside the point. It
was just an illustration of how, sorting a list can be done in a place
differently from where one decides the order-relation of the items in
the list. That fact makes it less obvious to use multiple steps in the
place where you actually sort.
 
M

Mel

Carl said:
Supporting both __cmp__ and rich comparison methods of a class does
add a lot of complexity. The cmp argument of sort doesn't.

The cmp argument doesn't depend in any way on an object's __cmp__
method, so getting rid of __cmp__ wasn't any good readon to also get
rid of the cmp argument; their only relationship is that they're
spelled the same. Nor is there any reason why cmp being a useful
argument of sort should indicate that __cmp__ should be retained in
classes.

I would have thought that the upper limit of cost of supporting cmp= and
key= would be two different internal front-ends to the internal
internal sort.

Mel.
 
I

Ian Kelly

Sure I can do that. I can do lots of things like writing a CMP class
that I will use as a key and where I can implement the logic for
comparing the original objects, which I otherwise would have put in a
cmp function. I thought this wasn't about how one can get by without
the cmp argument. This was about cases where the cmp argument was the
more beautiful or natural way to handle this case.

That's not what you wrote before. You wrote "I can't do the sort in
multiple steps." I was just responding to what you wrote.
I think I have provided such a case. If you don't agree then
don't just give examples of what I can do, argue how your solution
would be a less ugly or more natural way to handle this.

Well, the alternative is to generate the cmp function from the
externally selected keys dynamically at runtime, is it not? Something
like this:

def get_cmp(keys):
def cmp(a, b):
for (key, reversed) in keys:
if reversed:
result = cmp(key(b), key(a))
else:
result = cmp(key(a), key(b))
if result != 0:
return result
return result
return cmp

I fail to see how that is any more natural than my solution, unless
you have another way of doing it. And it's certainly not going to be
any faster.
 
A

Antoon Pardon

That's not what you wrote before. You wrote "I can't do the sort in
multiple steps." I was just responding to what you wrote.

That is because I tend to assume some intelligence with those I
communicate with, so that I don't need to be pedantly clear and
spell everything out.

However since that seems to be a problem for you I will be more
detailed. The original poster didn't ask for cases in which cmp
was necessary, he asked for cases in which not using cmp was
cumbersome. I gave a case where not using cmp was cumbersome.
a tuple where you want it sorted with first item in descending
order and second item ascending.

You then responded, that you could solve that by sorting in multiple
steps. The problem with this response is that how items are to be
compared can be decided in a different module from where they are
actually sorted. So if we would accept this MO, this would mean
that any module that needed to sort something according to a provided
key, would need to provide for the possibility that the sorting would
have to be done in multiple steps. So, in order to avoid complexity
in the internal python sort, we would increase the complexity of
all library code that would need to sort things in a by the user
provided way and didn't want to barf in such an event.

So when I wrote I couldn't do that, I implicitly meant if you
want a less cumbersome solution than using cmp, because your
solution would make all library code more cumbersome.
Well, the alternative is to generate the cmp function from the
externally selected keys dynamically at runtime, is it not? Something
like this:

def get_cmp(keys):
def cmp(a, b):
for (key, reversed) in keys:
if reversed:
result = cmp(key(b), key(a))
else:
result = cmp(key(a), key(b))
if result != 0:
return result
return result
return cmp

I fail to see how that is any more natural than my solution, unless
you have another way of doing it. And it's certainly not going to be
any faster.

First of all, there is no need for a dynamical generated cmp function in
my provided case. My cmp fumction would simply look like this:

def cmp(tp1, tp2):
return cmp(tp2[0], tp1[0]) or cmp(tp1[1], tp2[1])

Second, there is not only the question of what is more natural, but
there is also the question of how you are spreading the complexity.
Your first solution would mean spreading the complexity to all library
code where the user could provide the order-relation of the items that
would then be sorted within the library. This alternative of your, even
if too complicated than needed would mean that libary code could just
use the sort method and the complicated code is limited to those
specific items that need it.
 
I

Ian Kelly

That is because I tend to assume some intelligence with those I
communicate with, so that I don't need to be pedantly clear and
spell everything out.

However since that seems to be a problem for you I will be more
detailed. The original poster didn't ask for cases in which cmp
was necessary, he asked for cases in which not using cmp was
cumbersome.

False. He asked for either sort of case:
If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.

You then responded, that you could solve that by sorting in multiple
steps.

No, I did not.
The problem with this response is that how items are to be
compared can be decided in a different module from where they are
actually sorted. So if we would accept this MO, this would mean
that any module that needed to sort something according to a provided
key, would need to provide for the possibility that the sorting would
have to be done in multiple steps. So, in order to avoid complexity
in the internal python sort, we would increase the complexity of
all library code that would need to sort things in a by the user
provided way and didn't want to barf in such an event.

That is a perfectly reasonable argument that you apparently were too
lazy to even mention before. It is completely different from and not
at all implied by the patently absurd statement "So this list is
sorted within the class but how it is sorted is decided outside the
class. So I can't do the sort in multiple steps."

I would not say "The sky is green" when what I mean is "The color of
the sky is the result of the atmosphere scattering the white light
from the sun in the smaller wavelengths, including some green light."
Would you?
So when I wrote I couldn't do that, I implicitly meant if you
want a less cumbersome solution than using cmp, because your
solution would make all library code more cumbersome.
Well, the alternative is to generate the cmp function from the
externally selected keys dynamically at runtime, is it not?  Something
like this:

def get_cmp(keys):
    def cmp(a, b):
        for (key, reversed) in keys:
            if reversed:
                result = cmp(key(b), key(a))
            else:
                result = cmp(key(a), key(b))
            if result != 0:
                return result
        return result
    return cmp

I fail to see how that is any more natural than my solution, unless
you have another way of doing it.  And it's certainly not going to be
any faster.

First of all, there is no need for a dynamical generated cmp function in
my provided case. My cmp fumction would simply look like this:

def cmp(tp1, tp2):
 return cmp(tp2[0], tp1[0]) or cmp(tp1[1], tp2[1])

I misunderstood your use case, then. When you wrote that "how it is
sorted is decided outside of the class" in the context of sorting
based on multiple keys, I took that to imply that the external code
was deciding the sorting at runtime, e.g. that perhaps your priority
queue was being displayed in a GUI table that can be sorted on
multiple columns simultaneously. But evidently you expect that this
key-based sort order will always be hard-coded?
Second, there is not only the question of what is more natural, but
there is also the question of how you are spreading the complexity.
Your first solution would mean spreading the complexity to all library
code where the user could provide the order-relation of the items that
would then be sorted within the library. This alternative of your, even
if too complicated than needed would mean that libary code could just
use the sort method and the complicated code is limited to those
specific items that need it.

On the other hand, if at some point your library really does need to
accept a dynamically generated sort order, then you are pushing the
complexity onto the client code by forcing it to generate a cmp
function like the one I posted above. But perhaps that will never be
the case for your purpose, in which case I take your point.
 
S

Steven D'Aprano

However since that seems to be a problem for you I will be more
detailed. The original poster didn't ask for cases in which cmp was
necessary, he asked for cases in which not using cmp was cumbersome.

I'm the original poster, and that's not what I said. I said:

"If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up."

You'll notice that I said nothing about whether writing the code was easy
or cumbersome, and nothing about readability.

I
gave a case where not using cmp was cumbersome. a tuple where you want
it sorted with first item in descending order and second item ascending.

No, I'm sorry, calling sort twice is not cumbersome. In fact, if somebody
gave me code that sorted tuples in that way using a comparison function,
I would immediately rip it out and replace it with two calls to sort: not
only is it usually faster and more efficient, but it's easier to read,
easier to reason about, and easier to write.


from operator import itemgetter
data.sort(key=itemgetter(1))
data.sort(key=itemgetter(0), reverse=True)


A cmp function for this task may have been justified back in the Dark
Ages of Python 2.2, before Python's sort was guaranteed to be stable, but
there's no need for it now.

You then responded, that you could solve that by sorting in multiple
steps. The problem with this response is that how items are to be
compared can be decided in a different module from where they are
actually sorted.

So what? Instead of this:

# Module A decides how to sort, module B actually does the sort.
# In module A:
B.do_sorting(cmp=some_comparison_function)

it will become this:

# In module A:
func = functools.cmp_to_key(some_comparison_function)
B.do_sorting(key=func)

That's hardly cumbersome.

I'm looking for cases where using cmp_to_key doesn't work at all, or
performs badly, if such cases exist at all.


First of all, there is no need for a dynamical generated cmp function in
my provided case. My cmp fumction would simply look like this:

def cmp(tp1, tp2):
return cmp(tp2[0], tp1[0]) or cmp(tp1[1], tp2[1])

"Simply"?

Perhaps that works, I don't know -- but I do know that I can't tell
whether it works by just looking at it. That makes it complex enough that
it should have a test suite, to ensure it works the way you expect.

But regardless, nobody argues that a comparison function must always be
complicated. Or even that a comparison function is never less complicated
than a key function. That's not the point. The point is to find an
example where a comparison function will work where no key function is
either possible or practical.
 
M

Martin v. Loewis

The cmp argument doesn't depend in any way on an object's __cmp__
method, so getting rid of __cmp__ wasn't any good readon to also get
rid of the cmp argument

So what do you think about the cmp() builtin? Should have stayed,
or was it ok to remove it?

If it should have stayed: how should it's implementation have looked like?

If it was ok to remove it: how are people supposed to fill out the cmp=
argument in cases where they use the cmp() builtin in 2.x?

Regards,
Martin
 
M

MRAB

I'm the original poster, and that's not what I said. I said:

"If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up."

You'll notice that I said nothing about whether writing the code was easy
or cumbersome, and nothing about readability.



No, I'm sorry, calling sort twice is not cumbersome. In fact, if somebody
gave me code that sorted tuples in that way using a comparison function,
I would immediately rip it out and replace it with two calls to sort: not
only is it usually faster and more efficient, but it's easier to read,
easier to reason about, and easier to write.


from operator import itemgetter
data.sort(key=itemgetter(1))
data.sort(key=itemgetter(0), reverse=True)
If there are two, it isn't too bad, but if there are several (some
ascending, some descending), then it gets a little cumbersome.
A cmp function for this task may have been justified back in the Dark
Ages of Python 2.2, before Python's sort was guaranteed to be stable, but
there's no need for it now.
[snip]
It's one of those things I wouldn't have voted to remove, but now it's
gone, I'm unsure how much I'd like it back! I wouldn't object to it
returning...
 
C

Carl Banks

So what do you think about the cmp() builtin? Should have stayed,
or was it ok to remove it?

Since it's trivial to implement by hand, there's no point for it to be
a builtin. There wasn't any point before rich comparisons, either.
I'd vote not merely ok to remove, but probably a slight improvement.
It's probably the least justified builtin other than pow.

If it should have stayed: how should it's implementation have looked like?

Here is how cmp is documented: "The return value is negative if x < y,
zero if x == y and strictly positive if x > y."

So if it were returned as a built-in, the above documentation suggests
the following implementation:

def cmp(x,y):
if x < y: return -1
if x == y: return 0
if x > y: return 1
raise ValueError('arguments to cmp are not well-ordered')

(Another, maybe better, option would be to implement it so as to have
the same expectations as list.sort, which I believe only requires
__eq__ and __gt__.)

If it was ok to remove it: how are people supposed to fill out the cmp=
argument in cases where they use the cmp() builtin in 2.x?

Since it's trivial to implement, they can just write their own cmp
function, and as an added bonus they can work around any peculiarities
with an incomplete comparison set.


Carl Banks
 
S

Steven D'Aprano

It's probably the least justified builtin other than pow.

I don't know about that. Correctly, efficiently and *quickly*
implementing the three-argument version of pow is exactly the sort of
thing that should be in the built-ins, or at least the standard library.

[steve@sylar ~]$ python3 -m timeit -s "n=2" "(n**30000)%123456789"
1000 loops, best of 3: 633 usec per loop
[steve@sylar obfuscate]$ python3 -m timeit -s "n=2" "pow(n, 30000,
123456789)"
100000 loops, best of 3: 6.03 usec per loop

That's two orders of magnitude faster.

(You need the n=2 to defeat Python's compile-time constant folding,
otherwise timing measurements will be misleading.)
 
P

Paul Rubin

Steven D'Aprano said:
I don't know about that. Correctly, efficiently and *quickly*
implementing the three-argument version of pow is exactly the sort of
thing that should be in the built-ins, or at least the standard library.

There is also math.pow which works slightly differently from built-in
pow, sigh.
 
S

Stefan Behnel

Steven D'Aprano, 25.03.2011 06:46:
I don't know about that. Correctly, efficiently and *quickly*
implementing the three-argument version of pow is exactly the sort of
thing that should be in the built-ins, or at least the standard library.

I think that touches it already. We have a "math" module, so why is there a
*builtin* for doing special math? How much code is there really that only
uses pow() and does not import "math"?

Stefan
 
A

Antoon Pardon

I'm the original poster, and that's not what I said. I said:

"If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up."

You'll notice that I said nothing about whether writing the code was easy
or cumbersome, and nothing about readability.

Well fine. I should have realised the question was just a pretense and that
there really never was any intention to consider the reactions, because
the answer is already fixed. Of course a key function can always be written,
it may just need a specific class to implement the specific order. Likewise
there is no reason to expect the order-functions to preform worse when
implemented in a class, rather than in a function.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,164
Messages
2,570,898
Members
47,439
Latest member
shasuze

Latest Threads

Top