* Steven D'Aprano:
L = ["æ", "ø", "å"] # This is in SORTED ORDER in Norwegian L
[...]
L.sort( key = locale.strxfrm )
L ['å', 'æ', 'ø']
locale.strcoll( "å", "æ" ) 1
locale.strcoll( "æ", "ø" )
-1
Note that strcoll correctly orders the strings as ["æ", "ø", "å"], that
is, it would have if it could have been used as cmp function to sort (or
better, to a separate routine named e.g. custom_sort).
This is in Python2.5, so I'm explicitly specifying unicode strings:
L = [u"æ", u"ø", u"å"]
assert sorted(L) == [u'å', u'æ', u'ø']
The default C-locale sorting of L does not equal to L. Now let's change
to Norwegian locale:
æøå
So far so good, we've confirmed that in Python 2.5 we can sort in a
locale-aware form. Now, can we do this with a key function? Thanks to
Raymond Hettinger's recipe here:
http://code.activestate.com/recipes/576653/
æøå
Success!
Let's try it in Python 3.1:
L = ["æ", "ø", "å"]
assert sorted(L) == ['å', 'æ', 'ø']
import locale
locale.setlocale(locale.LC_ALL, 'nb_NO')
'nb_NO'
Hm. A bit off-topic, but...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\cpython\python31\lib\locale.py", line 527, in
return _setlocale(category, locale)
locale.Error: unsupported locale setting
This on a machine where the Python default locale is Norwegian.
'æøå'
The definition of CmpToKey can be found at the URL above. It's not very
exciting, but here it is:
def CmpToKey(mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) == -1
def __gt__(self, other):
return mycmp(self.obj, other.obj) == 1
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) != 1
def __ge__(self, other):
return mycmp(self.obj, other.obj) != -1
def __ne__(self, other):
return mycmp(self.obj, other.obj) != 0
return K
This is pretty smart as a generic solution.
Thanks!
I was thinking more of sticking those comparisions in some custom string class
or such, which would be rather ugly...
The above does have quite high overhead though!
That is, it's /inefficient/ compared to using a 'cmp' function directly.
If that's too verbose for you, stick this as a helper function in your
application:
def CmpToKey(mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
__lt__ = lambda s, o: mycmp(s.obj, o.obj) == -1
__gt__ = lambda s, o: mycmp(s.obj, o.obj) == 1
__eq__ = lambda s, o: mycmp(s.obj, o.obj) == 0
__le__ = lambda s, o: mycmp(s.obj, o.obj) != 1
__ge__ = lambda s, o: mycmp(s.obj, o.obj) != -1
__ne__ = lambda s, o: mycmp(s.obj, o.obj) != 0
return K
[...]
The above may just be a bug in the 3.x stxfrm. But it illustrates that
sometimes you have your sort order defined by a comparision function.
Transforming that into a key can be practically impossible (it can also
be quite inefficient).
This might be true, but you haven't demonstrated it.
The "can be ... practically impossible" was hogwash. I posted late, sorry. The
"quite inefficient" holds.
With one little
helper function, you should be able to convert any comparison function
into a key function, with relatively little overhead.
I wouldn't call an extra Python method call per comparision "relatively little
overhead".
Note that the same number of comparisions as with a 'cmp' based sort is performed.
But with the wrapper every comparision is indirected through a Python method
call (I think, but not sure, that the creation of those comparision objects does
not introduce significant overhead, but the calls must).
<code>
# -*- coding: utf-8 -*-
from __future__ import print_function
from __future__ import unicode_literals
try:
range = xrange
except:
pass
import locale
import sys
import random
import timeit
def CmpToKey( mycmp ):
"Convert a cmp= function into a key= function"
class K( object ):
def __init__( self, obj, *args ):
self.obj = obj
def __lt__( self, other ):
return mycmp( self.obj, other.obj ) == -1
def __gt__( self, other ):
return mycmp( self.obj, other.obj ) == 1
def __eq__( self, other ):
return mycmp( self.obj, other.obj ) == 0
def __le__( self, other ):
return mycmp( self.obj, other.obj ) != 1
def __ge__( self, other ):
return mycmp( self.obj, other.obj ) != -1
def __ne__( self, other ):
return mycmp( self.obj, other.obj ) != 0
return K
def elapsed_time_for( f ):
n_calls = 1
return timeit.timeit( f, number = n_calls )
def major_pyversion():
return sys.version_info[0]
def sorting_of( string_list ):
def sort2x():
string_list.sort( cmp = locale.strcoll )
def sort3x():
string_list.sort( key = CmpToKey( locale.strcoll ) )
return sort2x if major_pyversion() <= 2 else sort3x
alphabet = "abcdefghijklmnopqrstuvwxyzæøå"
n_strings = int( 1e5 )
nbno_locale = "" # "nb_NO" raises exception on my machine.
def rnd_string():
len = random.randint( 1, 80 )
s = ""
for i in range( len ):
s = s + random.choice( alphabet )
return s
locale.setlocale( locale.LC_ALL, nbno_locale )
random.seed( 666 )
assert( rnd_string() == "æmoxqudxlszåaåcteærmuocøjrdæpbvyabwhn" )
print( "Creating array of pseudo-random strings..." )
L = [rnd_string() for i in range( n_strings )]
print( "Sorting..." )
time = elapsed_time_for( sorting_of( L ) )
print( "Last string is '{0}'".format( L[-1] ) )
print( "{0}".format( time ) )
assert( L[-1][0] == "Ã¥" )
</code>
<example>
C:\Documents and Settings\Alf\test> py3 sort.py
Creating array of pseudo-random strings...
Sorting...
Last string is
'åååywxuybrxgvnstoyvcmdåjdlnxwwfsbnzsinxncmtrxuxgtqtduzvåixqovtjxiflfesuvfa'
8.45987772189
C:\Documents and Settings\Alf\test> py2 sort.py
Creating array of pseudo-random strings...
Sorting...
Last string is
'åååywxuybrxgvnstoyvcmdåjdlnxwwfsbnzsinxncmtrxuxgtqtduzvåixqovtjxiflfesuvfa'
3.41336790011
C:\Documents and Settings\Alf\test> _
</example>
In the run above Python 3.x was only roughly 2.5 times slower. On my machine it
ranges from 2.5 times slower to 3 times slower. Most often around 2.5 x slower.
Of course this does not measure how much faster a Python 3.x cmp/"<" based sort
/could be/, only how slow, sluggish and snail-like it is compared to 2.x.
In other words, that's the speed price, 2.5 to 3 times slower sorting, for
ditching cmp instead of e.g. replacing it with something better.
Cheers & hth.,
- Alf