Velocity Reviews > Perl > Perl-Python-a-Day: Sorting

# Perl-Python-a-Day: Sorting

Xah Lee
Guest
Posts: n/a

 10-10-2005
Sort a List

Xah Lee, 200510

In this page, we show how to sort a list in Python & Perl and also
discuss some math of sort.

To sort a list in Python, use the “sort” method. For example:

li=[1,9,2,3];
li.sort();
print li;

Note that sort is a method, and the list is changed in place.

Suppose you have a matrix, and you want to sort by second column.
Example Solution:

li=[[2,6],[1,3],[5,4]]
li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

The line “li.sort(lambda x, y: cmp(x[1],y[1]))” can also bewritten
as “li.sort(cmp=lambda x, y: cmp(x[1],y[1]))”

The argument to sort is a function of two arguments, that returns -1,
0, 1. This function is a decision function that tells sort() how to
decide the order of any two elements in your list. If the first
argument is “less” then second argument, the function should return
-1. If equal, then 0. Else, 1.

Here's a more complex example. Suppose you have a list of strings.

'my283.jpg'
'my23i.jpg'
'web7-s.jpg'
'fris88large.jpg'
....

You want to sort them by the number embedded in them. What you have to
do, is to provide sort() method a function, that takes two strings, and
compares the integer inside the string. Here's the solution:

li=[
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
]

def myComp (x,y):
import re
def getNum(str): return float(re.findall(r'\d+',str)[0])
return cmp(getNum(x),getNum(y))

li.sort(myComp)
print li # returns ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg',
'my283.jpg']

Here, we defined a function myComp to tell sort about the ordering.
Normally, one would use the “lambda” construct, but Python's lambda
construct can only represent the simplest functions.

In general, the function f used to determine the order of any two
element must satisfy some constraints:

• f(a,a)==0
• if f(a,b)==0 then f(b,a)==0
• if f(a,b)==0 and f(b,c)==0, then f(a,c)==0.
• if f(a,b)==-1 and f(b,c)==-1, then f(a,c)==-1.
• if f(a,b)==-1, then f(b,a)==1.

If the comparison function does not behave as the above, then it is not
consistent, meaning that the result “ordered” list is may actually
be different depending how the language happens to implement sort.

The significance of all these is that in real software you may want to
sort a list of non-simple entities by a specialized ordering. For
example, you may want to sort a list of polygonal surfaces in 3D space,
for particular reasons in implementing some computer graphics features.
Say, you want to sort these polygons by their spacial orientations. It
ordering is important. Otherwise, you might have a bewildering result
yet unable to locate any flaws in your code.

Python's “sort” method's optional parameters: “key” and
“reverse”

Most of the time, sorting is done for a list of atomic element such as
[3,2,4]. This is simply done by myList.sort() without any argument.
Other than simple list, sort is frequently used on matrixes (e.g.
[[2,6],[1,3],[5,4]]). For matrixes, almost always a particular column
is used for the basis of ordering. For example, if we want to sort by
second column, we do: “li.sort(lambda x, y: cmp(x[1],y[1]))”. Since
this is frequently used, Python provides a somewhat shorter syntax for
it, by specifying the column used as the ordering “key”. For
example:

li=[[2,6],[1,3],[5,4]]
li.sort(key=lambda x[1] ) # is equivalent to the following
#li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

Because Python's implementation is not very refined , this specialized
syntax is actually much speedier than the general form “lambda x, y:
cmp(x[1],y[1])”. It is a burden on the programer to always use the
“key” syntax idiosyncrasy if he is sorting a large matrix.

Another idiosyncratic provision is the optional “reverse” argument.
This parameter is somewhat necessary when using the “key”
parameter. One can reverse the ordering by using the “reverse”
keyword as a argument to sort. Example:

The following are equivalent:

li.sort(key=lambda x[1], reverse=True )
li.sort(lambda x, y: cmp(x[1],y[1]), reverse=True)
li.sort(lambda x, y: cmp(y[1],x[1]))

The official doc on Python's sort method is at (bottom):
http://python.org/doc/2.4/lib/typesseq-mutable.html

Sorting in Perl

(to be posted in a couple of days)

This post is archived at:
http://xahlee.org/perl-python/sort_list.html

Xah
http://www.velocityreviews.com/forums/(E-Mail Removed)
http://xahlee.org/

Marcin 'Qrczak' Kowalczyk
Guest
Posts: n/a

 10-10-2005
Followup-To: comp.lang.scheme

"Xah Lee" <(E-Mail Removed)> writes:

> Since this is frequently used, Python provides a somewhat shorter
> syntax for it, by specifying the column used as the ordering “key”.

[...]
> Because Python's implementation is not very refined , this specialized
> syntax is actually much speedier than the general form “lambda x, y:
> cmp(x[1],y[1])”. It is a burden on the programer to always use the
> “key” syntax idiosyncrasy if he is sorting a large matrix.

It's not only clearer for a human, but also faster in all good
implementations of all languages which support that, except when the
ordering function is very simple. It's called Schwartzian transform
and I wish more language designers and programmers knew about it.

http://en.wikipedia.org/wiki/Schwartzian_transform

I urge future SRFI authors to include it. The withdrawn SRFI-32 for
sorting didn't do that, and I can't find any other SRFI which deals
with sorting.

--
__("< Marcin Kowalczyk
\__/ (E-Mail Removed)
^^ http://qrnik.knm.org.pl/~qrczak/

Ulrich Hobelmann
Guest
Posts: n/a

 10-10-2005
Xah Lee wrote:
> To sort a list in Python, use the “sort” method. For example:
>
> li=[1,9,2,3];
> li.sort();
> print li;

Likewise in Common Lisp. In Scheme there are probably packages for that
as well. My apologies for not being very fluent anymore.

CL-USER> (setf list (sort '(1 9 2 3) #'<)) ; input
(1 2 3 9) ; output

The second argument is mandatory too (comparison function).

> Note that sort is a method, and the list is changed in place.

Same here. To be safe, assign the result to "list".

> Suppose you have a matrix, and you want to sort by second column.
> Example Solution:
>
> li=[[2,6],[1,3],[5,4]]
> li.sort(lambda x, y: cmp(x[1],y[1]))
> print li; # prints [[1, 3], [5, 4], [2, 6]]

CL-USER> (setf list (sort '((2 6) (1 3) (5 4))
#'(lambda (x y) (< (second x) (second y)))))
((1 3) (5 4) (2 6)) ; output

> The argument to sort is a function of two arguments, that returns -1,
> 0, 1. This function is a decision function that tells sort() how to
> decide the order of any two elements in your list. If the first
> argument is “less” then second argument, the function should return
> -1. If equal, then 0. Else, 1.

In CL you only need a smaller-than function. I guess if elements are
"equal", they don't need sorting anyway.

> li=[
> 'my283.jpg',
> 'my23i.jpg',
> 'web7-s.jpg',
> 'fris88large.jpg',
> ]

CL-USER> (setf list '("my283.jpg" "my23i.jpg" "web7-s.jpg"
"fris88large.jpg"))

> def myComp (x,y):
> import re
> def getNum(str): return float(re.findall(r'\d+',str)[0])
> return cmp(getNum(x),getNum(y))

CL-USER> (defun my-comp (x y)
(flet ((getnum (s)
(parse-integer s :start (position-if #'digit-char-p s)
:junk-allowed t)))
(< (getnum x) (getnum y))))

> li.sort(myComp)
> print li # returns ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg',
> 'my283.jpg']

CL-USER> (setf list (sort list #'my-comp))
("web7-s.jpg" "my23i.jpg" "fris88large.jpg" "my283.jpg") ; output

> li=[[2,6],[1,3],[5,4]]
> li.sort(key=lambda x[1] ) # is equivalent to the following
> #li.sort(lambda x, y: cmp(x[1],y[1]))
> print li; # prints [[1, 3], [5, 4], [2, 6]]

CL-USER> (setf list (sort '((2 6) (1 3) (5 4)) #'< :key #'second))
((1 3) (5 4) (2 6)) ; output

Here some people might jump in and say "lists might be more readable
than vectors, but lists are slow."
If they are slow for your data set, just use vectors instead

--
State, the new religion from the friendly guys who brought you fascism.

Pascal Costanza
Guest
Posts: n/a

 10-10-2005
Ulrich Hobelmann wrote:
> Xah Lee wrote:
>
>> To sort a list in Python, use the “sort” method. For example:
>>
>> li=[1,9,2,3];
>> li.sort();
>> print li;

>
> Likewise in Common Lisp. In Scheme there are probably packages for that
> as well. My apologies for not being very fluent anymore.
>
> CL-USER> (setf list (sort '(1 9 2 3) #'<)) ; input
> (1 2 3 9) ; output

Careful. Common Lisp's sort function is specified to be destructive, so
you shouldn't use it on literal constants. So don't say (sort '(1 9 2 3)
....), say (sort (list 1 9 2 3) ...), etc.

Pascal

--
OOPSLA'05 tutorial on generic functions & the CLOS Metaobject Protocol
++++ see http://p-cos.net/oopsla05-tutorial.html for more details ++++

Xah Lee
Guest
Posts: n/a

 10-11-2005
Python Doc Problem Example: sort()

Xah Lee, 200503
Exhibit: Incompletion & Imprecision

Python doc “3.6.4 Mutable Sequence Types” at
http://python.org/doc/2.4/lib/typesseq-mutable.html

in which contains the documentation of the “sort” method ofa list.
Quote:

«
Operation Result Notes
s.sort([cmp[, key[, reverse]]]) sort the items of s in place (7),
(, (9), (10)

(7) The sort() and reverse() methods modify the list in place for
economy of space when sorting or reversing a large list. To remind you
that they operate by side effect, they don't return the sorted or
reversed list.

( The sort() method takes optional arguments for controlling the
comparisons.

cmp specifies a custom comparison function of two arguments (list
items) which should return a negative, zero or positive number
depending on whether the first argument is considered smaller than,
equal to, or larger than the second argument: "cmp=lambda x,y:
cmp(x.lower(), y.lower())"

key specifies a function of one argument that is used to extract a
comparison key from each list element: "cmp=str.lower"

reverse is a boolean value. If set to True, then the list elements
are sorted as if each comparison were reversed.

In general, the key and reverse conversion processes are much
faster than specifying an equivalent cmp function. This is because cmp
is called multiple times for each list element while key and reverse
touch each element only once.

Changed in version 2.3: Support for None as an equivalent to

Changed in version 2.4: Support for key and reverse was added.

(9) Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary

(10) While a list is being sorted, the effect of attempting to
mutate, or even inspect, the list is undefined. The C implementation of
Python 2.3 and newer makes the list appear empty for the duration, and
raises ValueError if it can detect that the list has been mutated
during a sort.
»

As a piece of documentation, this is a lousy one.

The question Python doc writers need to ask when evaluating this piece
of doc are these:

• can a experienced programer who is expert at several languages but
new to Python, and also have read the official Python tutorial, can he,
read this doc, and know exactly how to use sort with all the options?

• can this piece of documentation be rewritten fairly easily, so that
the answer to the previous question is a resounding yes?

To me, the answers to the above questions are No and Yes. Here are some
issues with the doc:

• In the paragraph about the “key” parameter, the illustration
given is: “cmp=str.lower”. It should be be “key=str.lower”

• This doc lacks examples. One or two examples will help a lot,
especially to less experienced programers. (which comprises the
majority of readers) In particular, it should give a full example of
using the comparison function and one with the “key” parameter.
Examples are particularly needed here because these parameteres are
functions, often with the “lambda” construct. These are unusual and

• This doc fails to mention what happens when the predicate and the
shortcut version conflicts. e.g. “myList.sort(cmp=lambda x,y:
cmp(x[0], y[0]), key=lambda x: str(x[1]) )”

• The syntax notation Python doc have adopted for indicating optional
parameters, does not give a clear view just exactly what combination of
optional parameters can be omitted. The notation: “s.sort([cmp[,
key[, reverse]]])” gives the impression that only trailing arguments
can be omitted, which is not true.

• The doc gives no indication of how to omit a optional arg. Should
it be “nul”, “Null”, 0, or left empty? Since it doesn't give
any examples, doc reader who isn't Python experts is left to guess at
how true/false values are presented in Python.

• On the whole, the way this doc is written does not give a clear
picture of the roles of the supplied options, nor how to use them.

Suggested Quick Remedy: add a example of using the cmp function. And a
example using the “key” function. Add a example of Using one of
them and with reverse. (the examples need not to come with much
explanations. One sentence annotation is better than none.)

Other than that, the way the doc is layed out with a terse table and
run-on footnotes (employed in several places in Python doc) is not
inductive. For a better improvement, there needs to be a overhaul of
the organization and the attitude of the entire doc. The organization
needs to be programing based, as opposed to implementation or computer
science based. (in this regard, one can learn from the Perl folks). As
to attitude, the writing needs to be Python-as-is, as opposed to
computer science framework, as indicated in the early parts of this
critique series.
----------------
This post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html

Xah
(E-Mail Removed)
http://xahlee.org/

Xah Lee
Guest
Posts: n/a

 10-11-2005

Here's further example of Python's extreme low quality of
documentation. In particular, what follows focuses on the bad writing
skill aspect, and comments on some language design and quality issues
of Python.

>From the Official Python documentation of the sort() method, at:

http://python.org/doc/2.4.2/lib/typesseq-mutable.html, Quote:

«The sort() method takes optional arguments for controlling the
comparisons.»

It should be “optional parameter” not “optional argument”.
Their difference is that “parameter” indicates the variable, while
“argument” indicates the actual value.

«... for controlling the comparisons.»

This is a bad writing caused by lack of understanding. No, it doesn't
“control the comparison”. The proper way to say it is that “the
comparison function specifies an order”.

«The sort() and reverse() methods modify the list in place for economy
of space when sorting or reversing a large list. To remind you that
they operate by side effect, they don't return the sorted or reversed
list. »

This is a example of tech-geeking drivel. The sort() and reverse()
methods are just the way they are. Their design and behavior are really
not for some economy or remind programers of something. The Python doc
is bulked with these irrelevant drivels. These littered inanities
dragged down the whole quality and effectiveness of the doc implicitly.

«Changed in version 2.4: Support for key and reverse was added.»

«In general, the key and reverse conversion processes are much faster
than specifying an equivalent cmp function. This is because cmp is
called multiple times for each list element while key and reverse touch
each element only once.»

When sorting something, one needs to specify a order. The easiest way
is to simply list all the elements as a sequence. That way, their order
is clearly laid out. However, this is in general not feasible and
impractical. Therefore, we devised a mathematically condensed way to
specify the order, by defining a function f(x,y) that can take any two
elements and tell us which one comes first. This, is the gist of
sorting a list in any programing language.

The ordering function, being a mathematically condensed way of
specifying the order, has some constraints. For example, the function
should not tell us x < y and y < x. (For a complete list of these
constraints, see http://xahlee.org/perl-python/sort_list.html )

With this ordering function, it is all sort needed to sort a list.
Anything more is interface complexity.

The optional parameters “key” and “reverse”in Python's sort
method is a interface complexity. What happened here is that a compiler
optimization problem is evaded by moving it into the language syntax
for programers to worry about. If the programer does not use the
“key” syntax when sorting a large matrix (provided that he knew in
advance of the list to be sorted or the ordering function), then he is
penalized by a severe inefficiency by a order of magnitude of execution
time.

This situation, of moving compiler problems to the syntax surface is
common in imperative languages.

«Changed in version 2.3: Support for None as an equivalent to omitting

This is a epitome of catering towards morons. “myList.sort()” is
complexity just because idiots need it.

The motivation here is simple: a explicit “None” gives coding
monkeys a direct sensory input of the fact that “there is no
comparison function”. This is like the double negative in black
English “I ain't no gonna do it!”. Logically, “None” is not
even correct and leads to bad thinking. What really should be stated in
the doc, is that “the default ordering function to sort() is the
‘cmp’ function.”.

«Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary

existence, its sort functionality is not smart enough to preserve
order?? A sort that preserves original order isn't something difficult
to implement. What we have here is sloppiness and poor quality common
in OpenSource projects.

Also note the extreme low quality of the writing. It employes the
jargon “stable sort” then proceed to explain what it is, and the
latch on of “multiple passes” and the mysterious “by department,
by salary”.

Here's a suggested rewrite: “Since Python 2.3, the result of sort()
no longer rearrange elements where the comparison function returns
0.”
-----------
This post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html

Xah
(E-Mail Removed)
http://xahlee.org/

Xah Lee
Guest
Posts: n/a

 10-13-2005
Sorting in Perl

In Perl, to sort a list, do like this:

@li=(1,9,2,3);
@li2 = sort {\$a <=> \$b} @li;
print join(' ', @li2);

In Perl, sort is a function, not some Object Oriented thing. It returns
the sorted result as another list. This is very simple and nice.

It works like this: sort takes the form “sort {...} @myList”.
Inside the enclosing braces is the body of the ordering function, where
variables \$a and \$b inside are predefined by the language to represent
two elements in the list. The operator “<=>” returns -1 if left
element is less than right element. If equal, it returns 0, else 1. It
is equivalent to Python's “cmp” function.

Perl provides another comparison operator “cmp”, which treats its
operands as strings. So, for example:

print "3" <=> "20"; # prints -1
print "3" cmp "20"; # prints 1

In Perl, numbers and strings are mutually automatically converted if
needed.

Another form of sort is “sort orderFunctionName @list”, which uses
a function name in place of the comparison block

Just for completeness, let's define a Perl equivalent of a Python
example above.

@li=(
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
);

# sorts a list of strings by their embedded number
@li2 = sort { (\$a=~m/(\d+)/)[0] <=> (\$b=~m/(\d+)/)[0];} @li;
print join(' ', @li2); # prints web7-s.jpg my23i.jpg fris88large.jpg
my283.jpg

Note, that Perl also has pure functions (lambda) construct. In Perl, a
pure function is simply done as “def {...}”, and applied to
argument by the form “pureFunction->(args)”. Unlike Python,Perl
has no limitation in its pure function construct. Because of this, Perl
supports a limited form of functional programing. Here's a simple
example:

\$result = sub(\$) {\$_[0]+1}->(3);
print \$result; # prints 4
# a value obtained by a pure function that adds 1 to its argument,
# applied to a argument of 3.

Perl, like Python, whose compiler is not very smart. In sorting, the
ordering function is called unnecessarily repetitiously. Like Python,
Perlers have sought means to avoid this penalty. If the programer knew
in advance that his matrix is huge or knew in advance his ordering
function, then he can code his sort by writing it using a very complex
workaround.:

Here's how this work around works: Suppose you want to sort a huge list
with a expensive ordering function. If you simply do “sort
orderingFunction @myList”, then Perl is going to call your
orderingFunction wantonly, and you lose. To beat Perl, you first
generate a copy newList, whose elements are pairs (x,y), where x is
from the original list, and y is the sorting key extracted from x.
Then, you call Perl's sort function on this newList and using a
ordering function based on the second list element (the y). Once this
newList is sorted, you construct your original list by extracting the
first element from this sorted newList. Here's the code example:

# the following two lines are equivalent
@li2 = sort { (\$a=~m/(\d+)/)[0] <=> (\$b=~m/(\d+)/)[0];} @li;
@li2 = map { \$_->[0] } sort { \$a->[1] <=> \$b->[1] } map { [ \$_,
(\$_=~m/(\d+)/)[0] ] } @li;

In the above Perl code, the part “map { [ \$_, (\$_=~m/(\d+)/)[0] ]}
@li;” generates the intermediate newList. Then, sort is applied to
it, then, another map “map { \$_->[0] }” gets the original list
sorted.

In this way, the cost of the internals of the ordering function is
avoided. (it runs on each list element once) However, your huge list is
copied 1 extra time. So, there are pros and cons. Because this work
around is very complex in both its semantics and syntax, it has
acquired a coolness factor among Perl coders, and is given the name
Schwartzian Transform.

It is interesting to note what compiler flaws can do to imperative
languages and its people. In Python, the language syntax is tainted. In
Perl, a complex construct is invented. In both camps, the basic
mathematics of sorting and its implementation aspects are completely
belied.

For the official doc of Perl's sort, type on the command line:
“perldoc -f sort”.
---------
this post is archived at:
http://xahlee.org/perl-python/sort_list.html

Xah
(E-Mail Removed)
http://xahlee.org/

Lasse Vågsæther Karlsen
Guest
Posts: n/a

 10-13-2005
Xah Lee wrote:
> Sorting in Perl
>
> In Perl, to sort a list, do like this:
>
> @li=(1,9,2,3);
> @li2 = sort {\$a <=> \$b} @li;
> print join(' ', @li2);
>
>
> In Perl, sort is a function, not some Object Oriented thing. It returns
> the sorted result as another list. This is very simple and nice.

Like the sorted function in Python ?

li2 = sorted(li)

you can also specify a key and a cmp function if you need to.

<snip>
> In this way, the cost of the internals of the ordering function is
> avoided. (it runs on each list element once) However, your huge list is
> copied 1 extra time. So, there are pros and cons. Because this work
> around is very complex in both its semantics and syntax, it has
> acquired a coolness factor among Perl coders, and is given the name
> Schwartzian Transform.

That's about the worst explanation I've seen of Scwartzian Transform
anywhere. Why deviate from what *every other source* does? Give an
example, like the typical one of sorting filenames based on the size of
the file they refer to. If you get the size in the comparison function,
you can potentially get the size (which is a rather costly operation)
many times for each file. Instead, you prefetch it once for each
filename and store that in the list, and use the size as the key for
sorting.

So while your explanation is technically correct, you lost some of the
explaining factors. In other words, using Google lets anyone wanting to
know about "Schwartzian Transform" find *much* better explanations than

But that's no surprise, you're Xah Lee anyway, loosing out is your
middle name.

BTW, this way of doing the sort is nothing special for Perl, that
construct can be seen many places simply because it does not fight any
of Perls "problems" with sorting, instead it overcomes a common problem
with sorting in any language.

>
> It is interesting to note what compiler flaws can do to imperative
> languages and its people. In Python, the language syntax is tainted. In

tainted how?

> Perl, a complex construct is invented. In both camps, the basic

invented how?

> mathematics of sorting and its implementation aspects are completely
> belied.

belied how?

It's interesting to note that these "fact posts" of yours are nothing

--
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
(E-Mail Removed)
PGP KeyID: 0x2A42A1C2

Xah Lee
Guest
Posts: n/a

 10-13-2005

Lasse Vågsæther Karlsen wrote:

> Like the sorted function in Python ?
>
> li2 = sorted(li)
>
> you can also specify a key and a cmp function if you need to.

Thanks. I didn't know there's also a sort function in Python (2.4),
besides the method. (i've mentioned your name as acknowledgement at my
website essay)

The Python doc really should mention it at the place where the sort
method is documented.

Xah
(E-Mail Removed)
http://xahlee.org/

Diez B. Roggisch
Guest
Posts: n/a

 10-13-2005
> In Perl, sort is a function, not some Object Oriented thing. It returns
> the sorted result as another list. This is very simple and nice.

And sometimes exteremely stupid - if your list is large, and making a
copy just form sorting it (when you don't have to keep a referenece to
the old list) is highly inefficient.

Calling this stupid tells us just a little bit more of your total lack
of programming fundamentals.

Not to mention that writing a helper function

def sorted(l):
l = l[:]
l.sort()
return l

to have sorted in versions below 2.4 isn't exactly "magic." But hey,
we're talking about Xah Lee, a guy who believes his hair could have
saved the world [1]....

Diez

[1] http://xahlee.org/PageTwo_dir/Person.../mi_pixra.html