Replacing cmp with key for sorting

George Sakkis · Nov 3, 2008

I want to sort sequences of strings lexicographically but those with
longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
'bc', 'bd']. Currently I do it with:

s.sort(cmp=lambda x,y: 0 if x==y else
-1 if x.startswith(y) else
+1 if y.startswith(x) else
cmp(x,y))

Can this be done with an equivalent key function instead of cmp ?

George

bearophileHUGS · Nov 3, 2008

I want to sort sequences of strings lexicographically but those with
longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
'bc', 'bd']. Currently I do it with:

s.sort(cmp=lambda x,y: 0 if x==y else
-1 if x.startswith(y) else
+1 if y.startswith(x) else
cmp(x,y))

Can this be done with an equivalent key function instead of cmp ?

George

Your input and output:

s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
r = ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

To me your lambda looks like an abuse of the inline if expression. So
I suggest to replace it with a true function, that is more readable:

def mycmp(x, y):
if x == y:
return 0
elif x.startswith(y):
return -1
elif y.startswith(x):
return +1
else:
return cmp(x, y)

print sorted(s, cmp=mycmp)

It's a peculiar cmp function, I'm thinking still in what situations it
can be useful.

To use the key argument given a cmp function I use the simple code
written by Hettinger:

def cmp2key(mycmp):
"Converts a cmp= function into a key= function"
class K:
def __init__(self, obj, *args):
self.obj = obj
def __cmp__(self, other):
return mycmp(self.obj, other.obj)
return K
print sorted(s, key=cmp2key(mycmp))

Now I'll look for simpler solutions...

Bye,
bearophile

Arnaud Delobelle · Nov 3, 2008

George Sakkis said:
I want to sort sequences of strings lexicographically but those with
longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
'bc', 'bd']. Currently I do it with:

s.sort(cmp=lambda x,y: 0 if x==y else
-1 if x.startswith(y) else
+1 if y.startswith(x) else
cmp(x,y))

Can this be done with an equivalent key function instead of cmp ?

Here's an idea:
['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

The 3 above is the length of the longest string in the list

Here's another idea, probably more practical:
['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

HTH

bearophileHUGS · Nov 3, 2008

Alan G Isaac:

Probably not what you had in mind ...
...
>>> def k(si): return si+'z'*(maxlen-len(si))

This looks a little better:

assert isinstance(s, str)
sorted(s, key=lambda p: p.ljust(maxlen, "\255"))

If the string is an unicode that may not work anymore.
I don't know if there are better solutions.

Bye,
bearophile

bearophileHUGS · Nov 3, 2008

Arnaud Delobelle:

Here's another idea, probably more practical:

Nice.
A variant that probably works with unicode strings too:

print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)

Bye,
bearophile

Arnaud Delobelle · Nov 3, 2008

Arnaud Delobelle:

Here's another idea, probably more practical:

Click to expand...

Nice.
A variant that probably works with unicode strings too:

print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)

Of course that's better! (although mine will work with unicode if yours
does). It's funny how the obvious escapes me so often. Still I think
the idea of the 'double reverse' (one letterwise, the other listwise)
was quite good.

bearophileHUGS · Nov 3, 2008

Arnaud Delobelle:

It's funny how the obvious escapes me so often.

In this case it's a well known cognitive effect: the mind of humans
clings to first good/working solution, not allowing its final tuning.
For that you may need to think about something else for a short time,
and then look at your solution with a little "fresher" mind.

This (ugly) translation into D + my functional-style libs shows why
Python syntax is a good idea:

import d.all;
void main() {
auto txt = "a bc bd bcb ba ab".split();
putr( sorted(txt, (string s){ return map((char c){return -
cast(int)c;}, s);} ).reverse );
}

Long Live To Python!

Bye,
bearophile

George Sakkis · Nov 3, 2008

Arnaud Delobelle:

Here's another idea, probably more practical:

Click to expand...

Nice.
A variant that probably works with unicode strings too:

print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)

Bye,
bearophile

Awesome! I tested it on a sample list of ~61K words [1] and it's
almost 40% faster, from ~1.05s dropped to ~0.62s. That's still >15
times slower than the default sorting (0.04s) but I guess there's not
much more room for improvement.

George

[1] http://www.cs.pitt.edu/~kirk/cs1501/Pruhs/Spring2006/assignments/boggle/5desk.txt

bearophileHUGS · Nov 3, 2008

George Sakkis:

but I guess there's not much more room for improvement.

That's nonsense, Python is a high level language, so there's nearly
always room for improvement (even in programs written in assembly you
can generally find faster solutions).
If speed is what you look for, and your strings are ASCII then this is
much faster:

tab = "".join(map(chr, xrange(256)))[::-1]
s.sort(key=lambda x: x.translate(tab), reverse=True)

Bye,
bearophile

Q: sort's key and cmp parameters	45	Oct 1, 2009
Q: sort's key and cmp parameters	1	Oct 1, 2009
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Parallel sorting algorithms...	0	Sep 7, 2012
Struggling for inspiration with lists	1	Dec 18, 2013
Paralle sorting algorithms...	0	Sep 7, 2012
Perl-Python-a-Day: Sorting	20	Oct 10, 2005
sorting question	6	Aug 10, 2005

Replacing cmp with key for sorting

George Sakkis

bearophileHUGS

Arnaud Delobelle

bearophileHUGS

bearophileHUGS

Arnaud Delobelle

bearophileHUGS

George Sakkis

bearophileHUGS

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads