Guido rethinking removal of cmp from sort method

S

Stefan Behnel

Antoon Pardon, 25.03.2011 10:21:
Well fine. I should have realised the question was just a pretense and that
there really never was any intention to consider the reactions, because
the answer is already fixed.

I don't think it is. It's just not as simple as "see, I have an artificial
use case here, so you *must* put the feature back in the language".

Stefan
 
A

Antoon Pardon

Antoon Pardon, 25.03.2011 10:21:

I don't think it is. It's just not as simple as "see, I have an
artificial use case here, so you *must* put the feature back in the
language".

Look we are provided with the cmp_to_key function. If something doesn't work
with that function or performs realy bad with that function, it means either
the function has a bug in it or something the function relies on, has a bug
in it. In either case you just fix the bug.

My guess is that cmp_to_key is implemented similarly as my example below.
So we simply have a constant overhead before we call the user provided
cmp function on the items to be sorted. So if this would fail to sort the
items or started to perform really badly (more than can be expected
by the overhead illustrated below) it would indicate an internal python
problem, either in the implementation of cmp_to_key or elsewhere.

Well if python has an internal problem, my expectation is that they will
fix the problem, instead of providing a cmp argument to sort as a work around.
So the outcome was fixed from the beginning.


def cmp_to_key(func):

class cmpkey:
def __init__(self, arg):
self.data = arg
self.cmp = func

def __lt__(self, other):
return self.cmp(self.data, other.data) < 0

def __le__(self, other):
return self.cmp(self.data, other.data) <= 0

...

return cmpkey
 
W

Westley Martínez

Steven D'Aprano, 25.03.2011 06:46:

I think that touches it already. We have a "math" module, so why is there a
*builtin* for doing special math? How much code is there really that only
uses pow() and does not import "math"?

Stefan

pow() and math.pow() are different.
 
S

Stefan Behnel

Westley Martínez, 25.03.2011 14:39:
pow() and math.pow() are different.

I don't find that a good excuse for pow() being a builtin. Why not just
rename it to "math.powmod" instead?

Stefan
 
S

Steven D'Aprano

Well fine. I should have realised the question was just a pretense and
that there really never was any intention to consider the reactions,
because the answer is already fixed. Of course a key function can always
be written, it may just need a specific class to implement the specific
order. Likewise there is no reason to expect the order-functions to
preform worse when implemented in a class, rather than in a function.

The reason Guido is considering re-introducing cmp is that somebody at
Google approached him with a use-case where a key-based sort did not
work. The use-case was that the user had masses of data, too much data
for the added overhead of Decorate-Sort-Undecorate (which is what key
does), but didn't care if it took a day or two to sort.

So there is at least one use-case for preferring slowly sorting with a
comparison function over key-based sorting. I asked if there any others.
It seems not.
 
S

Steven D'Aprano

Look we are provided with the cmp_to_key function. If something doesn't
work with that function or performs realy bad with that function, it
means either the function has a bug in it or something the function
relies on, has a bug in it. In either case you just fix the bug.

No, it means that the trade-offs made by cmp_to_key, or by sorting with a
key function, do not interact well with what you are trying to do.

An analogy: Python lists are contiguous arrays of pointers. When the
array is full, Python automatically doubles the size of the list. This is
a good trade-off of space (memory) for time, because it means that list
access is O(1), searching is O(N), and appending is O(1) on average. At
worst, you waste 50% of the memory used, which is pretty good.

But if you're programming for a device with 8 K of memory (if such a
thing even exists these days!), this would be a *terrible* strategy.
(Python wouldn't even fit in 8K, but never mind that...) In this case,
memory is more precious than programmer convenience, or even execution
speed. It's not enough to hand-wave and say "Python lists have a bug that
needs fixing", because that's simply not the case. It's that the design
of Python lists doesn't suit the use-case, and if you try it, it will
perform badly or not at all: by design, Python lists assume memory will
be plentiful and time will be short.

That is exactly the same trade-off key-based sorting makes: it uses more
memory to speed up sorting. This is a good strategy for Python, because
Python already requires lots of memory (so no major loss there), and is a
rather slow language (so speed-ups help).

So the question is, being completely general, as there any *real-world*
(not artificial, made-up) uses where sorting with a comparison function
works but a key function doesn't? I'm not limiting you to cases where you
have a shortage of memory but lots of time available. There may be other
design choices where key-based sorting sucks. We already know that some
people just prefer the look and feel of writing and using cmp functions.
Is there anything else?
 
C

Carl Banks

The reason Guido is considering re-introducing cmp is that somebody at
Google approached him with a use-case where a key-based sort did not
work. The use-case was that the user had masses of data, too much data
for the added overhead of Decorate-Sort-Undecorate (which is what key
does), but didn't care if it took a day or two to sort.

So there is at least one use-case for preferring slowly sorting with a
comparison function over key-based sorting. I asked if there any others.
It seems not.

1. You asked for a specific kind of use case. Antoon gave you a use
case, you told him that wasn't the kind of use case you were asking
for, then you turn around and say "I guess there are no use
cases" (without the mentioning qualification).


2. I posted two use cases in this thread that fit your criteria, and
you followed up to that subthread so you most likely read them. Here
they are again so you won't overlook them this time:

"You have are given an obscure string collating function implented in
a C library you don't have the source to." (Fits your criterion
"can't be done with key=".)

"I'm sitting at an interactive session and I have a
convenient cmp function but no convenient key, and I care more about
the four minutes it'd take to whip up a clever key function or an
adapter class than the 0.2 seconds I'd save to on sorting
time." (Fits your criterion "performs really badly when done so".)


3. You evidently also overlooked the use-case example posted on Python-
dev that you followed up to.


Call me crazy, but you seem to be overlooking a lot of things in your
zeal to prove your point.


Carl Banks
 
O

OKB (not okblacke)

Steven said:
On Fri, 25 Mar 2011 10:21:35 +0100, Antoon Pardon wrote:
The reason Guido is considering re-introducing cmp is that somebody
at Google approached him with a use-case where a key-based sort did
not work. The use-case was that the user had masses of data, too
much data for the added overhead of Decorate-Sort-Undecorate (which
is what key does), but didn't care if it took a day or two to sort.

So there is at least one use-case for preferring slowly sorting
with a comparison function over key-based sorting. I asked if there
any others. It seems not.

That is one use case for preferring cmp. Another use case is "I
can easily think of how to write the cmp function and not the key
function and I don't care about either speed or memory usage". There's
more to practicality than code execution speed.

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown
 
S

Steven D'Aprano

1. You asked for a specific kind of use case. Antoon gave you a use
case, you told him that wasn't the kind of use case you were asking for,
then you turn around and say "I guess there are no use cases" (without
the mentioning qualification).

That is astonishing. Did you read my post before going on the attack? I
know that there is a grand tradition of adversarial argument on Usenet,
but I am not your enemy here.

First you wrongly accuse me of stating that there are no use cases,
without qualification, when the very text you quote has me giving a use-
case. Nor did I say there are no use cases -- I said that I didn't think
there were any *others*. You even quote me saying this!

Then you go on to wrongly claim:
3. You evidently also overlooked the use-case example posted on Python-
dev that you followed up to.

No, I did not overlook it. I explicitly described it twice. Once in the
very text you quoted above, and once early in the thread when I wrote:

"Guido himself has mentioned one such good use for a comparison
function when sorting. Use of a key function trades off memory for time,
while sorting with a comparison function makes the opposite trade off,
using more time for the sake of saving memory."

2. I posted two use cases in this thread that fit your criteria, and you
followed up to that subthread so you most likely read them. Here they
are again so you won't overlook them this time:

"You have are given an obscure string collating function implented in a
C library you don't have the source to." (Fits your criterion "can't be
done with key=".)

I must admit I don't understand this use-case. What is a string collating
function, and what does this have to do with sorting? I could try to
guess from the plain English meaning of the words, but if I get it wrong,
that wouldn't be fair to you. Perhaps an example would help.

"I'm sitting at an interactive session and I have a convenient cmp
function but no convenient key, and I care more about the four minutes
it'd take to whip up a clever key function or an adapter class than the
0.2 seconds I'd save to on sorting time." (Fits your criterion
"performs really badly when done so".)

"I can't be bothered to write a key function" is not equivalent to "I
have a key function, but it is too slow to use".

In any case, from Python 3.2 onwards, a key function is never more than
one import and one function call away. (In 3.1, you're stuck writing your
own adapter, but in 3.1 sort doesn't take a cmp function anyway, so
either way you're stuck. The earliest it could be reintroduced, if at
all, is in 3.3.)

Call me crazy, but you seem to be overlooking a lot of things in your
zeal to prove your point.

And what point would that be?

Carl, anybody could have asked the question I have asked. Anyone is free
to summarize the best of the arguments -- or all of them -- and post them
to python-dev. If you think I have some ulterior motive in doing this,
please feel free to champion the cmp argument to python-dev instead.
 
M

Mark Dickinson

Westley Martínez, 25.03.2011 14:39:



I don't find that a good excuse for pow() being a builtin. Why not just
rename it to "math.powmod" instead?

+1 (modulo bikeshedding about the name). I've long thought that three-
argument pow belongs in the standard library rather than the core.
And then the existence of the ** operator obviates the need for two-
argument pow.

Worse than pow itself is the fact that the __pow__ special method
takes an optional 3rd argument, which apart from unnecessarily
complicating Python's internal, also leads to horrors like Decimal's
(IMO inappropriate) support for 3-argument pow.
 
T

Terry Reedy

I would call you a bit ungrateful. Steven could have stayed silent
instead of making himself a target by informing interested people on
this list about the pydev post by Guido.

The only point I have seen him 'push' it that it will take a good
argument to move Guido (and other developers) from 'possibly' to "I (we)
made a mistake. Let's revert."

Don't shoot the messenger.
And what point would that be?

That you like Python and want to see it improved even further, as do I ;-).
 
R

Raymond Hettinger

+1 (modulo bikeshedding about the name).  I've long thought that three-
argument pow belongs in the standard library rather than the core.
And then the existence of the ** operator obviates the need for two-
argument pow.

IMO, there is far too much willingness to churn long-standing APIs for
nearly zero benefit. In my decade as a core Python developer, I've
come view deprecations as things that cause unnecessary pain for users
and is partially to blame for so many organizations staying with
older, unmaintained versions of Python.

When it comes to moving pow() to another module or changing the
signature for the __pow__ magic method, there's not much of a win to
be had, but it will affect tons of existing code (including examples
in printed books where pow() is commonly used in examples).

I think we (as a group) need to develop a strong aversion to breaking
existing APIs and instead focus on making existing APIs more robust or
giving them useful extensions.

The Python language is not like a personal library where changing APIs
has limited (and predictable costs).

Raymond
 
A

Antoon Pardon

No, it means that the trade-offs made by cmp_to_key, or by sorting with a
key function, do not interact well with what you are trying to do.

So and if you find something like that, what stops the dev team from
saying that python is just not suited to deal with such a problem?
Specifically as your analogy points in that direction as your dillema has
a strategy that is good for python on the one hand and a problem not
suitable for python on the other hand.
[analgy of python lists, that double in size when full vs
memory management on a 8K device ]
So the question is, being completely general, as there any *real-world*
(not artificial, made-up) uses where sorting with a comparison function
works but a key function doesn't? I'm not limiting you to cases where you
have a shortage of memory but lots of time available. There may be other
design choices where key-based sorting sucks. We already know that some
people just prefer the look and feel of writing and using cmp functions.
Is there anything else?

Asking for *real-world* uses is just how the python community smothers
requests. Should someone come with a use case, I expect the following
to happenr: One will go over the case with magnifying glasses in
search for any possible way in which it could be rewritten so as not
to need the cmp anyway. This will probably not be a big problem because
the person with the use case will probably have given a simplified example
to explain the overal picture. However most people responding will
ingnore the overal picture and focus on details that may very well work
on the simplified example but may not in the specific real-world case.
But since going over all details to argue the case will be too time
consuming, in the end the person with the use case will simply fail to
convince.

Forcing people to use a key-function, will produce cases where python
will ask for more memory and take longer to sort, than allowing to provide
a cmp function. This for the simple reason that for some order-functions,
the only way to write a key-function will be to write a specific class,
that will implement the order with the same kind of code that otherwise
would have been put in the cmp function, making the trade off: memory use
vs speed no longer a trade off.

If the existance alone of such cases isn't enough to reverse the decision
about the cmp-argument of the sort-method. My guess is that nothing will.
 
S

Steven D'Aprano

So and if you find something like that, what stops the dev team from
saying that python is just not suited to deal with such a problem?

There are lots of problems that Python is not well-suited for.

It is not well-suited for writing the driver to a graphics card.

It is not well-suited to calculating pi to a trillion decimal places.

It is not well-suited for application development on a device with 512K
of RAM.

It is not well-suited for copying C idioms exactly, or Java, or Perl.

It is not well-suited for doing a lot of bit-twiddling very quickly.

That's why there is more than one programming language: because no one
language can be well-suited for everything.


[...]
Asking for *real-world* uses is just how the python community smothers
requests.

99% of requests should be smothered. That is a *good thing*. Insisting on
real-world uses prevents the developers from wasting their valuable time
bloating Python with thousands of badly desired anti-features that are
slow, unreliable, buggy, useless or unpopular.


[...]
Forcing people to use a key-function, will produce cases where python
will ask for more memory and take longer to sort, than allowing to
provide a cmp function.

More memory yes; take longer to sort, almost certainly not (except,
perhaps, for a very narrow window where the addition of a key causes the
application to page out to virtual memory, but a comparison function
wouldn't -- and even then, the key function might still be faster).

But we've already covered the "more memory" angle to death. We already
understand that. This is why I have asked if there are any *other* use-
cases for a comparison function.

Dan Stromberg has suggested that another use-case would be lazy-
comparisons, where you might not be able to generate a single key
cheaply, but you can perform a comparison lazily.

This for the simple reason that for some
order-functions, the only way to write a key-function will be to write a
specific class, that will implement the order with the same kind of code
that otherwise would have been put in the cmp function,

There is a recipe for that on ActiveState's website, google for
cmp_to_key. That recipe has been added to Python's standard library for
version 3.2. Please stop flogging that dead horse.

making the trade off: memory use vs speed no longer a trade off.

You want to use a comparison function: you can still do this. Instead of
calling

data.sort(cmp=myfunc)

you call

data.sort(key=functools.cmp_to_key(myfunc))

which is a little longer to type, and uses a little more memory, but
otherwise is no worse than using cmp.

If the existance alone of such cases isn't enough to reverse the
decision about the cmp-argument of the sort-method. My guess is that
nothing will.

Existence of which cases? Made-up imaginary cases that don't actually
happen? Silly cases that can easily be converted to key comparisons, and
run much faster if you do so? Trivial cases that can trivially be
converted to key functions? "I can't be bothered to write a key
function", to paraphrase one of the posters in this thread -- these are
hardly convincing.

So far I have one actual use-case: Guido's example of sorting a huge list
where the amount of time it takes doesn't matter; plus one hypothetical
but plausible use-case: sorting data where the comparison is expensive
but lazy. (And to give Carl Banks the benefit of the doubt, one that I
don't understand, about interfacing with "an obscure C library".)
 
A

Antoon Pardon

The reason Guido is considering re-introducing cmp is that somebody at
Google approached him with a use-case where a key-based sort did not
work. The use-case was that the user had masses of data, too much data
for the added overhead of Decorate-Sort-Undecorate (which is what key
does), but didn't care if it took a day or two to sort.

The question is: Why do you need a use case for this? Should the decision
to remove the cmp-argument, rest only on the fact whether or not current
users are processing such masses of data without considering the possibility
of future users processing such masses of data?

Isn't it obvious that if you force the users into using more memory, that
sooner or later, there will be a user for which this demand of extra memory
will be a problem?

The question is also, whether this really is a use case. Why don't you
advise this person to buy a computer with more memory? One could simply
see this as a problem of too little resources instead of a problem for
python.

IMO, the dev-team can look at this in two different ways. (1) If the
machine has enough resource to contain enormous lists of data, then
it should be possible to sort these data with only little extra memory.
or (2) Sorting will take extra memory proportional to the length of the
list you will try to sort. If your computer doesn't have this extra
memory, your computer just lacks the resource to sort this large
data with python.

It is this design decision that should guide whether or not to have
a cmp-argument. Not the fact whether or not there are now users with
a data set that still fits into a python program on their computer
but for which this data set is too large for python to sort on that
computer.
So there is at least one use-case for preferring slowly sorting with a
comparison function over key-based sorting. I asked if there any others.
It seems not.

How about the possibility of cases where it isn't a choice between slowly
sorting with a comparison function over a faster key-based sorting but
where the choice is over sorting with a comparison function over a slower
and more memory consuming key-based sorting?

I tried to sort lists of 10000 elemens. Each element a tuple two items that
was to be sorted first according to the first item in ascending order, then
according to the second item in descending order. This was done on python 2.6
so I had to write my own cmp_to_key function.

I then sorted these lists of 10000 elements a 1000 times with the following
methods.

1) a cmp-function
2) key = cmp_to_key
3) key = wrapper class around cmp-function
4) key = class specifically written to implement the ordering.

This was done, once with number tuples and and once with string tuples.
The result in seconds

numbers strings

1 63.8 74.3
2 155.3 169.6
3 156.1 170.2
4 96.2 104.4

Now you can of course ignore those numbers because they don't come from a real
world example and thus consider such numbers unimportant until someone in the
future has a case similar enough with the above and wonders why sorting his
data is so slow.
 
S

Steven D'Aprano

I tried to sort lists of 10000 elemens. Each element a tuple two items
that was to be sorted first according to the first item in ascending
order, then according to the second item in descending order. This was
done on python 2.6 so I had to write my own cmp_to_key function.

I then sorted these lists of 10000 elements a 1000 times with the
following methods.

Thank you for spending the time to get some hard data, but I can't
replicate your results since you haven't shown your code. Rather than
attempt to guess what you did and duplicate it, I instead came up with my
own timing measurements. Results are shown here, my code follows below:


[steve@sylar ~]$ python2.7 sort_test.py
Comparison function: 12.1256039143
Key function: 3.51603388786
Double sort: 2.33165812492
cmp_to_key: 28.1129128933


By far the best result comes from two calls to sort. Not far off is the
key function solution. (Admittedly, coming up with a key function for non-
numeric data would be challenging.) The comparison function, and the
cmp_to_key solution, are clearly *much* slower.

I may do another test with larger lists, and leave it running overnight
and see what happens.


Code:
from random import random, shuffle
from operator import itemgetter
from timeit import Timer
from functools import cmp_to_key

def make_data(n):
x = range(n)*2
y = range(n)*2
shuffle(x)
shuffle(y)
data = zip(x, y)
assert len(data) == 2*n
return data

# Sort according to the first item in ascending order, then by second item
# in descending order:

def cmp_func(t1, t2):
return cmp(t1[0], t2[0]) or cmp(t2[1], t1[1])

def key_func(t):
return (t[0], -t[1])

def sorter(d):
d = sorted(d, key=itemgetter(1), reverse=True)
return sorted(d, key=itemgetter(0))

# Check that all sort methods give the same result.
data = make_data(20)
assert sorted(data, cmp=cmp_func) == sorted(data, key=key_func) == \
sorter(data) == sorted(data, key=cmp_to_key(cmp_func))


data = make_data(5000)
assert len(data) == 10000
setup = "from __main__ import cmp_to_key, cmp_func, key_func, sorter,
data"

t1 = Timer("sorted(data, cmp=cmp_func)", setup)
t2 = Timer("sorted(data, key=key_func)", setup)
t3 = Timer("sorter(data)", setup)
t4 = Timer("sorted(data, key=cmp_to_key(cmp_func))", setup)

print "Comparison function:",
print min(t1.repeat(number=100, repeat=7))
print "Key function:",
print min(t2.repeat(number=100, repeat=7))
print "Double sort:",
print min(t3.repeat(number=100, repeat=7))
print "cmp_to_key:",
print min(t4.repeat(number=100, repeat=7))


[end code]
 
F

Fons Adriaensen

Asking for *real-world* uses is just how the python community smothers
requests.

It's quite a common strategy, I've seen it used in many
contexts. Which doesn't make it any more acceptable of
course.
Should someone come with a use case, I expect the following
to happenr: One will go over the case with magnifying glasses in
search for any possible way in which it could be rewritten so as not
to need the cmp anyway. This will probably not be a big problem because
the person with the use case will probably have given a simplified example
to explain the overal picture. However most people responding will
ingnore the overal picture and focus on details that may very well work
on the simplified example but may not in the specific real-world case.
But since going over all details to argue the case will be too time
consuming, in the end the person with the use case will simply fail to
convince.

An accurate description of the process.
Forcing people to use a key-function, will produce cases where python
will ask for more memory and take longer to sort, than allowing to provide
a cmp function. This for the simple reason that for some order-functions,
the only way to write a key-function will be to write a specific class,
that will implement the order with the same kind of code that otherwise
would have been put in the cmp function, making the trade off: memory use
vs speed no longer a trade off.

If cmp_to_key() would do some (probably impossible) magic - factoring
the user's cmp() into a non-trivial key mapping and a simpler and more
efficient compare - then it would make sense. But it doesn't do such
a thing. The 'key' it uses is just a copy, and the compare that will
be used finally is just the user's one. Nothing gained, just overhead
added.
If the existance alone of such cases isn't enough to reverse the decision
about the cmp-argument of the sort-method. My guess is that nothing will.

Sadly enough, that is probably a good guess.

Ciao,
 
R

Raymond Hettinger

Thank you for spending the time to get some hard data, but I can't
replicate your results since you haven't shown your code. Rather than
attempt to guess what you did and duplicate it, I instead came up with my
own timing measurements. Results are shown here, my code follows below:

[steve@sylar ~]$ python2.7 sort_test.py
Comparison function: 12.1256039143
Key function: 3.51603388786
Double sort: 2.33165812492
cmp_to_key: 28.1129128933

By far the best result comes from two calls to sort. Not far off is the
key function solution. (Admittedly, coming up with a key function for non-
numeric data would be challenging.) The comparison function, and the
cmp_to_key solution, are clearly *much* slower.

Thank you for spending the time to get some hard data, but I can't
replicate your results since you haven't shown your code. Rather than
attempt to guess what you did and duplicate it, I instead came up with my
own timing measurements. Results are shown here, my code follows below:

[steve@sylar ~]$ python2.7 sort_test.py
Comparison function: 12.1256039143
Key function: 3.51603388786
Double sort: 2.33165812492
cmp_to_key: 28.1129128933

By far the best result comes from two calls to sort. Not far off is the
key function solution. (Admittedly, coming up with a key function for non-
numeric data would be challenging.) The comparison function, and the
cmp_to_key solution, are clearly *much* slower.

There is less here than meets the eye. The cmp-function dispatch in
cmp_to_key() is written in pure Python. Essentially, the timings are
showing the already well-known fact that call forwarding is faster in
C than in pure python.

Each of the O(n log n) comparisons is run through pure Python code
that like this:

def __lt__(self, other):
# mycmp is nested scope variable pointing to
# the user-defined cmp-function
return mycmp(self.obj, other.obj) < 0

I intend to add a C version of cmp_to_key() so that no trips around
the eval-loop are needed. It should be about as efficient as
bound_method dispatch (very fast), leaving the user-supplied cmp-
function as the dominant cost in the handful of cases where the
superfast O(n) key-function approach can't be used for some reason.

The memory overhead should either stay the same or drop slightly
(currently at 14 bytes per element on a 32-bit build and 28 bytes per
element on a 64-bit build).

Also note that running timings on Py2.7 instead of Py3.2 disadvantages
key-functions. In 2.7, key-functions use sortwrapper objects which
were removed in 3.2 (saving memory and reducing dispatch time
considerably). Also, in Py2.7 cmp logic is called even when a key-
function is defined (because key-function logic was wrapped around the
already existing sort with cmp-logic) so you pay the price of both
(i.e. creating a 2-tuple for cmp arguments on every call). In Py3.x
the cmp logic is gone, making the remaining key-function logic even
faster. IOW, key-function logic is way faster and more space
efficient in 3.2.

One of the implications of the last paragraph is that if 2.7 style cmp-
logic were reinstated in 3.3, not only would it clutter the API with
two-ways-to-do-it, it would also disadvantage make the common case
(using key-functions) run much more slowly. In other words, the
performance mavens will have shot themselves (and us) in the foot.

Raymond
 
P

Paul Rubin

Steven D'Aprano said:
There are lots of problems that Python is not well-suited for.
It is not well-suited for writing the driver to a graphics card.
It is not well-suited to calculating pi to a trillion decimal places.
...

But sorting moderate amounts of data on complicated criteria is an area
Python IS supposed to be well-suited for.
99% of requests should be smothered. That is a *good thing*.

We're not talking about a "request" for a feature to be added. There
was an existing feature that worked perfectly well, that was removed for
some ideological reason that makes no sense to me, with some slight
allusions to technical reasons that I've also never completely
understood.
More memory yes; take longer to sort, almost certainly not (except,
perhaps, for a very narrow window where the addition of a key causes

I don't see how using something like cmp_to_key could ever fail to slow
the program down. It's calling the exact same comparison function the
same number of times, but also creating and releasing a pile of class
instances, and doing a lot of runtime lookups to access the comparison
function in each instance on each comparison.
data.sort(key=functools.cmp_to_key(myfunc))
which is a little longer to type, and uses a little more memory, but
otherwise is no worse than using cmp.

I don't believe that for one second. Have you actually timed it? Like,
sort an array of 100000 random (int,int) pairs ascending on the first
int and descending on the second, using a simple comparison function and
using cmp_to_key? That's Anton's example, and by coincidence that
exact situation came up in something I'm hacking on RIGHT NOW. I
have (roughly) a list of transactions in the form
(person's name, timestamp, transaction info)
and I want to print a report sorted alphabetically by person, listing
each person's transactions most-recent-first. Of course I was able
to concoct a solution with a few moments of head-scratching, but
using a cmp argument is much more natural and transparent in my
opinion. DSU is a clever and useful design pattern, but comparison
sorting is what all the sorting textbooks are written about.
 
A

Antoon Pardon

[...]
Forcing people to use a key-function, will produce cases where python
will ask for more memory and take longer to sort, than allowing to
provide a cmp function.

More memory yes; take longer to sort, almost certainly not (except,
perhaps, for a very narrow window where the addition of a key causes the
application to page out to virtual memory, but a comparison function
wouldn't -- and even then, the key function might still be faster).

But we've already covered the "more memory" angle to death. We already
understand that. This is why I have asked if there are any *other* use-
cases for a comparison function.

Dan Stromberg has suggested that another use-case would be lazy-
comparisons, where you might not be able to generate a single key
cheaply, but you can perform a comparison lazily.

This for the simple reason that for some
order-functions, the only way to write a key-function will be to write a
specific class, that will implement the order with the same kind of code
that otherwise would have been put in the cmp function,

There is a recipe for that on ActiveState's website, google for
cmp_to_key. That recipe has been added to Python's standard library for
version 3.2. Please stop flogging that dead horse.

This doesn't make any sense. As far as I can see there is nothing in Dan's
code that would make it more troublesome to use with cmp_to_key than Carl
Bank's examples or that would make it loose more speed than the overhead
I was describing.

Whatever is meant by lazily, the cmp_to_key function will generate a key
function that during the sorting will finally defer to this cmp-function
to do the comparisons. If the cmp is lazy when it does its work as a
cmp-argument, it will be lazy when it does its work when a key is wrapped
around it.

This argument works for whatever property, the cmp-function should posses.
If your cmp argument has property foo when comparing during a sort then
using cmp_to_key will preserve than property when comparing during a sort.
This for the simple reason that with a cmp_to_key function when the
sort comapares two keys, it will after some overhead just call the cmp
function with the undecorated data.

That means that asking for use cases that fall outside the overhead in
memory or speed of the cmp_to_key function and are not about the cumbersome
work in writing your own key-function in order to reduce that overhead,
that asking for such use cases is like asking for a real number that squares
to a negative. We can conclude a priori that there won't be any.

That is why I don't consider a request for such use cases a serious request.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,164
Messages
2,570,898
Members
47,439
Latest member
shasuze

Latest Threads

Top