Perl-Python-a-Day: Sorting

X

Xah Lee

Sort a List

Xah Lee, 200510

In this page, we show how to sort a list in Python & Perl and also
discuss some math of sort.

To sort a list in Python, use the “sort†method. For example:

li=[1,9,2,3];
li.sort();
print li;


Note that sort is a method, and the list is changed in place.

Suppose you have a matrix, and you want to sort by second column.
Example Solution:

li=[[2,6],[1,3],[5,4]]
li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]


The line “li.sort(lambda x, y: cmp(x[1],y[1]))†can also be written
as “li.sort(cmp=lambda x, y: cmp(x[1],y[1]))â€

The argument to sort is a function of two arguments, that returns -1,
0, 1. This function is a decision function that tells sort() how to
decide the order of any two elements in your list. If the first
argument is “less†then second argument, the function should return
-1. If equal, then 0. Else, 1.

Here's a more complex example. Suppose you have a list of strings.

'my283.jpg'
'my23i.jpg'
'web7-s.jpg'
'fris88large.jpg'
....


You want to sort them by the number embedded in them. What you have to
do, is to provide sort() method a function, that takes two strings, and
compares the integer inside the string. Here's the solution:

li=[
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
]

def myComp (x,y):
import re
def getNum(str): return float(re.findall(r'\d+',str)[0])
return cmp(getNum(x),getNum(y))

li.sort(myComp)
print li # returns ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg',
'my283.jpg']


Here, we defined a function myComp to tell sort about the ordering.
Normally, one would use the “lambda†construct, but Python's lambda
construct can only represent the simplest functions.

Some Math about Sorting

In general, the function f used to determine the order of any two
element must satisfy some constraints:

• f(a,a)==0
• if f(a,b)==0 then f(b,a)==0
• if f(a,b)==0 and f(b,c)==0, then f(a,c)==0.
• if f(a,b)==-1 and f(b,c)==-1, then f(a,c)==-1.
• if f(a,b)==-1, then f(b,a)==1.


If the comparison function does not behave as the above, then it is not
consistent, meaning that the result “ordered†list is may actually
be different depending how the language happens to implement sort.

The significance of all these is that in real software you may want to
sort a list of non-simple entities by a specialized ordering. For
example, you may want to sort a list of polygonal surfaces in 3D space,
for particular reasons in implementing some computer graphics features.
Say, you want to sort these polygons by their spacial orientations. It
is in advanced cases like these, understanding the basic math about
ordering is important. Otherwise, you might have a bewildering result
yet unable to locate any flaws in your code.

Python's “sort†method's optional parameters: “key†and
“reverseâ€

Most of the time, sorting is done for a list of atomic element such as
[3,2,4]. This is simply done by myList.sort() without any argument.
Other than simple list, sort is frequently used on matrixes (e.g.
[[2,6],[1,3],[5,4]]). For matrixes, almost always a particular column
is used for the basis of ordering. For example, if we want to sort by
second column, we do: “li.sort(lambda x, y: cmp(x[1],y[1]))â€. Since
this is frequently used, Python provides a somewhat shorter syntax for
it, by specifying the column used as the ordering “keyâ€. For
example:

li=[[2,6],[1,3],[5,4]]
li.sort(key=lambda x:x[1] ) # is equivalent to the following
#li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]


Because Python's implementation is not very refined , this specialized
syntax is actually much speedier than the general form “lambda x, y:
cmp(x[1],y[1])â€. It is a burden on the programer to always use the
“key†syntax idiosyncrasy if he is sorting a large matrix.

Another idiosyncratic provision is the optional “reverse†argument.
This parameter is somewhat necessary when using the “keyâ€
parameter. One can reverse the ordering by using the “reverseâ€
keyword as a argument to sort. Example:

The following are equivalent:

li.sort(key=lambda x:x[1], reverse=True )
li.sort(lambda x, y: cmp(x[1],y[1]), reverse=True)
li.sort(lambda x, y: cmp(y[1],x[1]))


The official doc on Python's sort method is at (bottom):
http://python.org/doc/2.4/lib/typesseq-mutable.html

Sorting in Perl

(to be posted in a couple of days)

This post is archived at:
http://xahlee.org/perl-python/sort_list.html

Xah
(e-mail address removed)
∑ http://xahlee.org/
 
M

Marcin 'Qrczak' Kowalczyk

Followup-To: comp.lang.scheme

Xah Lee said:
Since this is frequently used, Python provides a somewhat shorter
syntax for it, by specifying the column used as the ordering “keyâ€. [...]
Because Python's implementation is not very refined , this specialized
syntax is actually much speedier than the general form “lambda x, y:
cmp(x[1],y[1])â€. It is a burden on the programer to always use the
“key†syntax idiosyncrasy if he is sorting a large matrix.

It's not only clearer for a human, but also faster in all good
implementations of all languages which support that, except when the
ordering function is very simple. It's called Schwartzian transform
and I wish more language designers and programmers knew about it.

http://en.wikipedia.org/wiki/Schwartzian_transform

I urge future SRFI authors to include it. The withdrawn SRFI-32 for
sorting didn't do that, and I can't find any other SRFI which deals
with sorting.
 
U

Ulrich Hobelmann

Xah said:
To sort a list in Python, use the “sort†method. For example:

li=[1,9,2,3];
li.sort();
print li;

Likewise in Common Lisp. In Scheme there are probably packages for that
as well. My apologies for not being very fluent anymore.

CL-USER> (setf list (sort '(1 9 2 3) #'<)) ; input
(1 2 3 9) ; output

The second argument is mandatory too (comparison function).
Note that sort is a method, and the list is changed in place.

Same here. To be safe, assign the result to "list".
Suppose you have a matrix, and you want to sort by second column.
Example Solution:

li=[[2,6],[1,3],[5,4]]
li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

CL-USER> (setf list (sort '((2 6) (1 3) (5 4))
#'(lambda (x y) (< (second x) (second y)))))
((1 3) (5 4) (2 6)) ; output
The argument to sort is a function of two arguments, that returns -1,
0, 1. This function is a decision function that tells sort() how to
decide the order of any two elements in your list. If the first
argument is “less†then second argument, the function should return
-1. If equal, then 0. Else, 1.

In CL you only need a smaller-than function. I guess if elements are
"equal", they don't need sorting anyway.
li=[
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
]

CL-USER> (setf list '("my283.jpg" "my23i.jpg" "web7-s.jpg"
"fris88large.jpg"))
def myComp (x,y):
import re
def getNum(str): return float(re.findall(r'\d+',str)[0])
return cmp(getNum(x),getNum(y))

CL-USER> (defun my-comp (x y)
(flet ((getnum (s)
(parse-integer s :start (position-if #'digit-char-p s)
:junk-allowed t)))
(< (getnum x) (getnum y))))
li.sort(myComp)
print li # returns ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg',
'my283.jpg']

CL-USER> (setf list (sort list #'my-comp))
("web7-s.jpg" "my23i.jpg" "fris88large.jpg" "my283.jpg") ; output
li=[[2,6],[1,3],[5,4]]
li.sort(key=lambda x:x[1] ) # is equivalent to the following
#li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

CL-USER> (setf list (sort '((2 6) (1 3) (5 4)) #'< :key #'second))
((1 3) (5 4) (2 6)) ; output

Here some people might jump in and say "lists might be more readable
than vectors, but lists are slow."
If they are slow for your data set, just use vectors instead ;)
 
P

Pascal Costanza

Ulrich said:
Xah said:
To sort a list in Python, use the “sort†method. For example:

li=[1,9,2,3];
li.sort();
print li;

Likewise in Common Lisp. In Scheme there are probably packages for that
as well. My apologies for not being very fluent anymore.

CL-USER> (setf list (sort '(1 9 2 3) #'<)) ; input
(1 2 3 9) ; output

Careful. Common Lisp's sort function is specified to be destructive, so
you shouldn't use it on literal constants. So don't say (sort '(1 9 2 3)
....), say (sort (list 1 9 2 3) ...), etc.


Pascal
 
X

Xah Lee

Python Doc Problem Example: sort()

Xah Lee, 200503
Exhibit: Incompletion & Imprecision

Python doc “3.6.4 Mutable Sequence Types†at
http://python.org/doc/2.4/lib/typesseq-mutable.html

in which contains the documentation of the “sort†method of a list.
Quote:

«
Operation Result Notes
s.sort([cmp[, key[, reverse]]]) sort the items of s in place (7),
(8), (9), (10)

(7) The sort() and reverse() methods modify the list in place for
economy of space when sorting or reversing a large list. To remind you
that they operate by side effect, they don't return the sorted or
reversed list.

(8) The sort() method takes optional arguments for controlling the
comparisons.

cmp specifies a custom comparison function of two arguments (list
items) which should return a negative, zero or positive number
depending on whether the first argument is considered smaller than,
equal to, or larger than the second argument: "cmp=lambda x,y:
cmp(x.lower(), y.lower())"

key specifies a function of one argument that is used to extract a
comparison key from each list element: "cmp=str.lower"

reverse is a boolean value. If set to True, then the list elements
are sorted as if each comparison were reversed.

In general, the key and reverse conversion processes are much
faster than specifying an equivalent cmp function. This is because cmp
is called multiple times for each list element while key and reverse
touch each element only once.

Changed in version 2.3: Support for None as an equivalent to
omitting cmp was added.

Changed in version 2.4: Support for key and reverse was added.

(9) Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade).

(10) While a list is being sorted, the effect of attempting to
mutate, or even inspect, the list is undefined. The C implementation of
Python 2.3 and newer makes the list appear empty for the duration, and
raises ValueError if it can detect that the list has been mutated
during a sort.
»


As a piece of documentation, this is a lousy one.

The question Python doc writers need to ask when evaluating this piece
of doc are these:

• can a experienced programer who is expert at several languages but
new to Python, and also have read the official Python tutorial, can he,
read this doc, and know exactly how to use sort with all the options?

• can this piece of documentation be rewritten fairly easily, so that
the answer to the previous question is a resounding yes?

To me, the answers to the above questions are No and Yes. Here are some
issues with the doc:

• In the paragraph about the “key†parameter, the illustration
given is: “cmp=str.lowerâ€. It should be be “key=str.lowerâ€

• This doc lacks examples. One or two examples will help a lot,
especially to less experienced programers. (which comprises the
majority of readers) In particular, it should give a full example of
using the comparison function and one with the “key†parameter.
Examples are particularly needed here because these parameteres are
functions, often with the “lambda†construct. These are unusual and
advanced constructs among imperative programers.

• This doc fails to mention what happens when the predicate and the
shortcut version conflicts. e.g. “myList.sort(cmp=lambda x,y:
cmp(x[0], y[0]), key=lambda x: str(x[1]) )â€

• The syntax notation Python doc have adopted for indicating optional
parameters, does not give a clear view just exactly what combination of
optional parameters can be omitted. The notation: “s.sort([cmp[,
key[, reverse]]])†gives the impression that only trailing arguments
can be omitted, which is not true.

• The doc gives no indication of how to omit a optional arg. Should
it be “nulâ€, “Nullâ€, 0, or left empty? Since it doesn't give
any examples, doc reader who isn't Python experts is left to guess at
how true/false values are presented in Python.

• On the whole, the way this doc is written does not give a clear
picture of the roles of the supplied options, nor how to use them.

Suggested Quick Remedy: add a example of using the cmp function. And a
example using the “key†function. Add a example of Using one of
them and with reverse. (the examples need not to come with much
explanations. One sentence annotation is better than none.)

Other than that, the way the doc is layed out with a terse table and
run-on footnotes (employed in several places in Python doc) is not
inductive. For a better improvement, there needs to be a overhaul of
the organization and the attitude of the entire doc. The organization
needs to be programing based, as opposed to implementation or computer
science based. (in this regard, one can learn from the Perl folks). As
to attitude, the writing needs to be Python-as-is, as opposed to
computer science framework, as indicated in the early parts of this
critique series.
----------------
This post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html

Xah
(e-mail address removed)
∑ http://xahlee.org/
 
X

Xah Lee

Addendum, 200510

Here's further example of Python's extreme low quality of
documentation. In particular, what follows focuses on the bad writing
skill aspect, and comments on some language design and quality issues
of Python.
From the Official Python documentation of the sort() method, at:
http://python.org/doc/2.4.2/lib/typesseq-mutable.html, Quote:

«The sort() method takes optional arguments for controlling the
comparisons.»

It should be “optional parameter†not “optional argumentâ€.
Their difference is that “parameter†indicates the variable, while
“argument†indicates the actual value.

«... for controlling the comparisons.»

This is a bad writing caused by lack of understanding. No, it doesn't
“control the comparisonâ€. The proper way to say it is that “the
comparison function specifies an orderâ€.

«The sort() and reverse() methods modify the list in place for economy
of space when sorting or reversing a large list. To remind you that
they operate by side effect, they don't return the sorted or reversed
list. »

This is a example of tech-geeking drivel. The sort() and reverse()
methods are just the way they are. Their design and behavior are really
not for some economy or remind programers of something. The Python doc
is bulked with these irrelevant drivels. These littered inanities
dragged down the whole quality and effectiveness of the doc implicitly.

«Changed in version 2.4: Support for key and reverse was added.»

«In general, the key and reverse conversion processes are much faster
than specifying an equivalent cmp function. This is because cmp is
called multiple times for each list element while key and reverse touch
each element only once.»

When sorting something, one needs to specify a order. The easiest way
is to simply list all the elements as a sequence. That way, their order
is clearly laid out. However, this is in general not feasible and
impractical. Therefore, we devised a mathematically condensed way to
specify the order, by defining a function f(x,y) that can take any two
elements and tell us which one comes first. This, is the gist of
sorting a list in any programing language.

The ordering function, being a mathematically condensed way of
specifying the order, has some constraints. For example, the function
should not tell us x < y and y < x. (For a complete list of these
constraints, see http://xahlee.org/perl-python/sort_list.html )

With this ordering function, it is all sort needed to sort a list.
Anything more is interface complexity.

The optional parameters “key†and “reverse†in Python's sort
method is a interface complexity. What happened here is that a compiler
optimization problem is evaded by moving it into the language syntax
for programers to worry about. If the programer does not use the
“key†syntax when sorting a large matrix (provided that he knew in
advance of the list to be sorted or the ordering function), then he is
penalized by a severe inefficiency by a order of magnitude of execution
time.

This situation, of moving compiler problems to the syntax surface is
common in imperative languages.

«Changed in version 2.3: Support for None as an equivalent to omitting
cmp was added.»

This is a epitome of catering towards morons. “myList.sort()†is
perfect but Pythoners had to add “myList.sort(None)†interface
complexity just because idiots need it.

The motivation here is simple: a explicit “None†gives coding
monkeys a direct sensory input of the fact that “there is no
comparison functionâ€. This is like the double negative in black
English “I ain't no gonna do it!â€. Logically, “None†is not
even correct and leads to bad thinking. What really should be stated in
the doc, is that “the default ordering function to sort() is the
‘cmp’ function.â€.

«Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade).»

One is quite surprised to read this. For about a decade of a language's
existence, its sort functionality is not smart enough to preserve
order?? A sort that preserves original order isn't something difficult
to implement. What we have here is sloppiness and poor quality common
in OpenSource projects.

Also note the extreme low quality of the writing. It employes the
jargon “stable sort†then proceed to explain what it is, and the
latch on of “multiple passes†and the mysterious “by department,
by salaryâ€.

Here's a suggested rewrite: “Since Python 2.3, the result of sort()
no longer rearrange elements where the comparison function returns
0.â€
-----------
This post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html

Xah
(e-mail address removed)
∑ http://xahlee.org/
 
B

Bryan

Xah said:
Addendum, 200510

Here's further example of Python's extreme low quality of
documentation. In particular, what follows focuses on the bad writing
skill aspect, and comments on some language design and quality issues
of Python.

http://python.org/doc/2.4.2/lib/typesseq-mutable.html, Quote:

«The sort() method takes optional arguments for controlling the
comparisons.»

It should be “optional parameter†not “optional argumentâ€.
Their difference is that “parameter†indicates the variable, while
“argument†indicates the actual value.

«... for controlling the comparisons.»

This is a bad writing caused by lack of understanding. No, it doesn't
“control the comparisonâ€. The proper way to say it is that “the
comparison function specifies an orderâ€.

«The sort() and reverse() methods modify the list in place for economy
of space when sorting or reversing a large list. To remind you that
they operate by side effect, they don't return the sorted or reversed
list. »

This is a example of tech-geeking drivel. The sort() and reverse()
methods are just the way they are. Their design and behavior are really
not for some economy or remind programers of something. The Python doc
is bulked with these irrelevant drivels. These littered inanities
dragged down the whole quality and effectiveness of the doc implicitly.

«Changed in version 2.4: Support for key and reverse was added.»

«In general, the key and reverse conversion processes are much faster
than specifying an equivalent cmp function. This is because cmp is
called multiple times for each list element while key and reverse touch
each element only once.»

When sorting something, one needs to specify a order. The easiest way
is to simply list all the elements as a sequence. That way, their order
is clearly laid out. However, this is in general not feasible and
impractical. Therefore, we devised a mathematically condensed way to
specify the order, by defining a function f(x,y) that can take any two
elements and tell us which one comes first. This, is the gist of
sorting a list in any programing language.

The ordering function, being a mathematically condensed way of
specifying the order, has some constraints. For example, the function
should not tell us x < y and y < x. (For a complete list of these
constraints, see http://xahlee.org/perl-python/sort_list.html )

With this ordering function, it is all sort needed to sort a list.
Anything more is interface complexity.

The optional parameters “key†and “reverse†in Python's sort
method is a interface complexity. What happened here is that a compiler
optimization problem is evaded by moving it into the language syntax
for programers to worry about. If the programer does not use the
“key†syntax when sorting a large matrix (provided that he knew in
advance of the list to be sorted or the ordering function), then he is
penalized by a severe inefficiency by a order of magnitude of execution
time.

This situation, of moving compiler problems to the syntax surface is
common in imperative languages.

«Changed in version 2.3: Support for None as an equivalent to omitting
cmp was added.»

This is a epitome of catering towards morons. “myList.sort()†is
perfect but Pythoners had to add “myList.sort(None)†interface
complexity just because idiots need it.

The motivation here is simple: a explicit “None†gives coding
monkeys a direct sensory input of the fact that “there is no
comparison functionâ€. This is like the double negative in black
English “I ain't no gonna do it!â€. Logically, “None†is not
even correct and leads to bad thinking. What really should be stated in
the doc, is that “the default ordering function to sort() is the
‘cmp’ function.â€.

«Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade).»

One is quite surprised to read this. For about a decade of a language's
existence, its sort functionality is not smart enough to preserve
order?? A sort that preserves original order isn't something difficult
to implement. What we have here is sloppiness and poor quality common
in OpenSource projects.

Also note the extreme low quality of the writing. It employes the
jargon “stable sort†then proceed to explain what it is, and the
latch on of “multiple passes†and the mysterious “by department,
by salaryâ€.

Here's a suggested rewrite: “Since Python 2.3, the result of sort()
no longer rearrange elements where the comparison function returns
0.â€
-----------
This post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html

Xah
(e-mail address removed)
∑ http://xahlee.org/


omg!!! wow!!! after reading this i feel like i just stepped in to some bizarro
world. this entire posting is like something you would read in the onion.
unfortunately this posting has just enough real words mixed with BS that a
newbie might actually fall for this stuff. i think mr. xah has published enough
python satire by now that we could probably make a nice bathroom reading book
"python untechgeeked". sorry folks, i'm still laughing... i just don't get how
someone can go on and on and on, day after day, month after month writing this
stuff that is so completely off base that you wonder if he's looking at the same
python as we are. mr. xah, why do you spend so much time agonizing over a
language you obviously don't like. why is python so important to you that you
are willing to waste so much of your life on this? there is obviously something
else i'm not seeing here. is there a pschologist in the house? can someone
explain xah to me? is he clinically depressed? suicidal? does he have signs of
a serial murderer? does he need a girl friend and a social life? or maybe just
take a yoga class? could these ramblings of his simply be on days that he
doesn't take his medication? i'm starting to think he is not simply just a
troll. i think there might be something seriously wrong with him and this is
just his way of asking for help. maybe i'm wrong, there was just something is
his last writings that made think he's hurting inside.
 
B

Bryan

mr. xah... would you be willing to give a lecture at pycon 2006? i'm sure you
would draw a huge crowd and a lot of people would like to meet you in person...

thanks.
 
G

Guest

Bryan said:
mr. xah... would you be willing to give a lecture at pycon 2006? i'm
sure you would draw a huge crowd and a lot of people would like to meet
you in person...

thanks.

I think that would be a highly un-pythonesque crowd. Python isn't much
in the sense of limitations, but that crowd probably needs to be limited
in one way or another, like "only 2 rotten fruits per person" or similar.
 
P

Piet van Oostrum

Bryan said:
B> omg!!! wow!!! after reading this i feel like i just stepped in to some
B> bizarro world.

So why do you have to repeat the whole thing?? I have kill-filed XL, and
now you put the message in my face. Please don't react to this drivel.
 
X

Xah Lee

Sorting in Perl

In Perl, to sort a list, do like this:

@li=(1,9,2,3);
@li2 = sort {$a <=> $b} @li;
print join(' ', @li2);


In Perl, sort is a function, not some Object Oriented thing. It returns
the sorted result as another list. This is very simple and nice.

It works like this: sort takes the form “sort {...} @myListâ€.
Inside the enclosing braces is the body of the ordering function, where
variables $a and $b inside are predefined by the language to represent
two elements in the list. The operator “<=>†returns -1 if left
element is less than right element. If equal, it returns 0, else 1. It
is equivalent to Python's “cmp†function.

Perl provides another comparison operator “cmpâ€, which treats its
operands as strings. So, for example:

print "3" <=> "20"; # prints -1
print "3" cmp "20"; # prints 1


In Perl, numbers and strings are mutually automatically converted if
needed.

Another form of sort is “sort orderFunctionName @listâ€, which uses
a function name in place of the comparison block

Just for completeness, let's define a Perl equivalent of a Python
example above.

@li=(
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
);

# sorts a list of strings by their embedded number
@li2 = sort { ($a=~m/(\d+)/)[0] <=> ($b=~m/(\d+)/)[0];} @li;
print join(' ', @li2); # prints web7-s.jpg my23i.jpg fris88large.jpg
my283.jpg


Note, that Perl also has pure functions (lambda) construct. In Perl, a
pure function is simply done as “def {...}â€, and applied to
argument by the form “pureFunction->(args)â€. Unlike Python, Perl
has no limitation in its pure function construct. Because of this, Perl
supports a limited form of functional programing. Here's a simple
example:

$result = sub($) {$_[0]+1}->(3);
print $result; # prints 4
# a value obtained by a pure function that adds 1 to its argument,
# applied to a argument of 3.


Perl, like Python, whose compiler is not very smart. In sorting, the
ordering function is called unnecessarily repetitiously. Like Python,
Perlers have sought means to avoid this penalty. If the programer knew
in advance that his matrix is huge or knew in advance his ordering
function, then he can code his sort by writing it using a very complex
workaround.:

Here's how this work around works: Suppose you want to sort a huge list
with a expensive ordering function. If you simply do “sort
orderingFunction @myListâ€, then Perl is going to call your
orderingFunction wantonly, and you lose. To beat Perl, you first
generate a copy newList, whose elements are pairs (x,y), where x is
from the original list, and y is the sorting key extracted from x.
Then, you call Perl's sort function on this newList and using a
ordering function based on the second list element (the y). Once this
newList is sorted, you construct your original list by extracting the
first element from this sorted newList. Here's the code example:

# the following two lines are equivalent
@li2 = sort { ($a=~m/(\d+)/)[0] <=> ($b=~m/(\d+)/)[0];} @li;
@li2 = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_,
($_=~m/(\d+)/)[0] ] } @li;


In the above Perl code, the part “map { [ $_, ($_=~m/(\d+)/)[0] ] }
@li;†generates the intermediate newList. Then, sort is applied to
it, then, another map “map { $_->[0] }†gets the original list
sorted.

In this way, the cost of the internals of the ordering function is
avoided. (it runs on each list element once) However, your huge list is
copied 1 extra time. So, there are pros and cons. Because this work
around is very complex in both its semantics and syntax, it has
acquired a coolness factor among Perl coders, and is given the name
Schwartzian Transform.

It is interesting to note what compiler flaws can do to imperative
languages and its people. In Python, the language syntax is tainted. In
Perl, a complex construct is invented. In both camps, the basic
mathematics of sorting and its implementation aspects are completely
belied.

For the official doc of Perl's sort, type on the command line:
“perldoc -f sortâ€.
---------
this post is archived at:
http://xahlee.org/perl-python/sort_list.html

Xah
(e-mail address removed)
∑ http://xahlee.org/
 
G

Guest

Xah said:
Sorting in Perl

In Perl, to sort a list, do like this:

@li=(1,9,2,3);
@li2 = sort {$a <=> $b} @li;
print join(' ', @li2);


In Perl, sort is a function, not some Object Oriented thing. It returns
the sorted result as another list. This is very simple and nice.

Like the sorted function in Python ?

li2 = sorted(li)

you can also specify a key and a cmp function if you need to.

In this way, the cost of the internals of the ordering function is
avoided. (it runs on each list element once) However, your huge list is
copied 1 extra time. So, there are pros and cons. Because this work
around is very complex in both its semantics and syntax, it has
acquired a coolness factor among Perl coders, and is given the name
Schwartzian Transform.

That's about the worst explanation I've seen of Scwartzian Transform
anywhere. Why deviate from what *every other source* does? Give an
example, like the typical one of sorting filenames based on the size of
the file they refer to. If you get the size in the comparison function,
you can potentially get the size (which is a rather costly operation)
many times for each file. Instead, you prefetch it once for each
filename and store that in the list, and use the size as the key for
sorting.

So while your explanation is technically correct, you lost some of the
explaining factors. In other words, using Google lets anyone wanting to
know about "Schwartzian Transform" find *much* better explanations than
your mumbo jumbo.

But that's no surprise, you're Xah Lee anyway, loosing out is your
middle name.

BTW, this way of doing the sort is nothing special for Perl, that
construct can be seen many places simply because it does not fight any
of Perls "problems" with sorting, instead it overcomes a common problem
with sorting in any language.
It is interesting to note what compiler flaws can do to imperative
languages and its people. In Python, the language syntax is tainted. In

tainted how?
Perl, a complex construct is invented. In both camps, the basic

invented how?
mathematics of sorting and its implementation aspects are completely
belied.

belied how?

It's interesting to note that these "fact posts" of yours are nothing
bud badly worded opinions.
 
X

Xah Lee

Lasse said:
Like the sorted function in Python ?

li2 = sorted(li)

you can also specify a key and a cmp function if you need to.

Thanks. I didn't know there's also a sort function in Python (2.4),
besides the method. (i've mentioned your name as acknowledgement at my
website essay)

The Python doc really should mention it at the place where the sort
method is documented.

Xah
(e-mail address removed)
∑ http://xahlee.org/
 
D

Diez B. Roggisch

In Perl, sort is a function, not some Object Oriented thing. It returns
the sorted result as another list. This is very simple and nice.

And sometimes exteremely stupid - if your list is large, and making a
copy just form sorting it (when you don't have to keep a referenece to
the old list) is highly inefficient.

Calling this stupid tells us just a little bit more of your total lack
of programming fundamentals.

Not to mention that writing a helper function

def sorted(l):
l = l[:]
l.sort()
return l

to have sorted in versions below 2.4 isn't exactly "magic." But hey,
we're talking about Xah Lee, a guy who believes his hair could have
saved the world [1]....


Diez

[1] http://xahlee.org/PageTwo_dir/Personal_dir/mi_pixra.html
 
T

Tim Roberts

Marcin 'Qrczak' Kowalczyk said:
This entry is obsolete: it should mention the 'key' option of the
standard sort method.

It should mention it, but not necessarily recommend it.

I haven't run the numbers in Python, but in Perl, the undecorated sort is
so well-optimized that the Schwartzian transform is almost always faster
than passing a custom comparator to the sort function.
 
F

Fredrik Lundh

Tim said:
It should mention it, but not necessarily recommend it.

I haven't run the numbers in Python, but in Perl, the undecorated sort is
so well-optimized that the Schwartzian transform is almost always faster
than passing a custom comparator to the sort function.

the "key" option is used for decoration. from the documentation:

key specifies a function of one argument that is used to extract
a comparison key from each list element: "key=str.lower"

In general, the key and reverse conversion processes are
much faster than specifying an equivalent cmp function. This
is because cmp is called multiple times for each list element
while key and reverse touch each element only once.

</F>
 
X

Xah Lee

Bryan said:
mr. xah... would you be willing to give a lecture at pycon 2006? i'm sure you
would draw a huge crowd and a lot of people would like to meet you in person...

thanks.

I'd be delight to.

My requirements are: 1 cup of fat-free milk, free, and free pizza.

Xah
(e-mail address removed)
∑ http://xahlee.org/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,968
Messages
2,570,149
Members
46,695
Latest member
StanleyDri

Latest Threads

Top