# Perl-Python-a-Day: Sorting

Xah Lee

 10-10-2005
Sort a List

Xah Lee, 200510

In this page, we show how to sort a list in Python & Perl and also
discuss some math of sort.

To sort a list in Python, use the “sort” method. For example:

li = [1, 9, 2, 3]
li.sort()
print li  # prints [1, 2, 3, 9]

Note that sort is a method, and the list is changed in place.
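
A small sketch of what "in place" means, written in Python 3 syntax (the post itself uses Python 2). The names `original` and `copy_sorted` are only illustrative. The `sort` method reorders the list itself and returns None, while the built-in `sorted()` (available since Python 2.4) returns a new list:

```python
# list.sort() reorders the list itself and returns None.
li = [1, 9, 2, 3]
result = li.sort()
print(li)      # [1, 2, 3, 9]
print(result)  # None

# sorted() leaves its input alone and returns a new list.
original = [1, 9, 2, 3]
copy_sorted = sorted(original)
print(original)     # [1, 9, 2, 3]
print(copy_sorted)  # [1, 2, 3, 9]
```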

Suppose you have a matrix, and you want to sort by second column.
Example Solution:

li=[[2,6],[1,3],[5,4]]
li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

The line “li.sort(lambda x, y: cmp(x[1],y[1]))” can also be written
as “li.sort(cmp=lambda x, y: cmp(x[1],y[1]))”

The argument to sort is a function of two arguments that returns -1,
0, or 1. This function is a decision function that tells sort() how to
decide the order of any two elements in your list. If the first
argument is “less” than the second argument, the function should return
-1. If they are equal, 0. Otherwise, 1.
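
For readers on Python 3, where both the built-in `cmp` and the `cmp` parameter of `sort` were removed, the same decision function can be sketched with `functools.cmp_to_key`. The helper name `by_second_column` is made up for this illustration:

```python
from functools import cmp_to_key

def by_second_column(x, y):
    """Return -1, 0 or 1 depending on how x[1] compares with y[1]."""
    if x[1] < y[1]:
        return -1
    if x[1] > y[1]:
        return 1
    return 0

li = [[2, 6], [1, 3], [5, 4]]
# cmp_to_key adapts the two-argument function to the key parameter.
li.sort(key=cmp_to_key(by_second_column))
print(li)  # [[1, 3], [5, 4], [2, 6]]
```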

Here's a more complex example. Suppose you have a list of strings.

'my283.jpg'
'my23i.jpg'
'web7-s.jpg'
'fris88large.jpg'
....

You want to sort them by the number embedded in them. To do that, give
the sort() method a function that takes two strings and compares the
integers inside them. Here's the solution:

li=[
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
]

import re

def myComp(x, y):
    # compare two strings by the first run of digits found in each
    def getNum(s): return float(re.findall(r'\d+', s)[0])
    return cmp(getNum(x), getNum(y))

li.sort(myComp)
print li # prints ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg', 'my283.jpg']

Here, we defined a function myComp to tell sort about the ordering.
Normally, one would use the “lambda” construct, but Python's lambda
is limited to a single expression, so it can only represent the
simplest functions.

In general, the function f used to determine the order of any two
elements must satisfy some constraints:

• f(a,a)==0
• if f(a,b)==0 then f(b,a)==0
• if f(a,b)==0 and f(b,c)==0, then f(a,c)==0.
• if f(a,b)==-1 and f(b,c)==-1, then f(a,c)==-1.
• if f(a,b)==-1, then f(b,a)==1.

If the comparison function does not behave as above, then it is not
consistent, meaning that the resulting “ordered” list may actually
differ depending on how the language happens to implement sort.
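
The five constraints can be checked mechanically on a small sample. The following is only a sketch in Python 3: `is_consistent` is a hypothetical helper, it assumes f returns exactly -1, 0 or 1, and passing on a sample does not prove that f is consistent for all inputs.

```python
from itertools import product

def is_consistent(f, sample):
    """Check the five constraints for f over every pair and triple in sample."""
    for a in sample:
        if f(a, a) != 0:
            return False
    for a, b in product(sample, repeat=2):
        if f(a, b) == 0 and f(b, a) != 0:
            return False
        if f(a, b) == -1 and f(b, a) != 1:
            return False
    for a, b, c in product(sample, repeat=3):
        if f(a, b) == 0 and f(b, c) == 0 and f(a, c) != 0:
            return False
        if f(a, b) == -1 and f(b, c) == -1 and f(a, c) != -1:
            return False
    return True

def numeric(a, b):
    # Ordinary three-way comparison of numbers.
    return (a > b) - (a < b)

def rock_paper_scissors(a, b):
    # Cyclic "ordering" 0 < 1 < 2 < 0: deliberately not transitive.
    if a == b:
        return 0
    return -1 if (b - a) % 3 == 1 else 1

print(is_consistent(numeric, [3, 1, 2]))              # True
print(is_consistent(rock_paper_scissors, [0, 1, 2]))  # False
```

The second function passes every pairwise check and fails only on transitivity, which is the kind of flaw that is hard to spot by reading the code.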

The significance of all this is that in real software you may want to
sort a list of non-simple entities by a specialized ordering. For
example, you may want to sort a list of polygonal surfaces in 3D space
when implementing some computer graphics feature. Say you want to sort
these polygons by their spatial orientations. In such cases a
consistent ordering is important. Otherwise, you might get a
bewildering result yet be unable to locate any flaw in your code.

Python's “sort” method's optional parameters: “key” and “reverse”

Most of the time, sorting is done on a list of atomic elements such as
[3,2,4]. This is simply done by myList.sort() without any argument.
Besides simple lists, sort is frequently used on matrices (e.g.
[[2,6],[1,3],[5,4]]). For matrices, a particular column is almost
always used as the basis of ordering. For example, if we want to sort
by the second column, we do: “li.sort(lambda x, y: cmp(x[1],y[1]))”.
Since this is frequently used, Python provides a somewhat shorter
syntax for it, by specifying the column used as the ordering “key”.
For example:

li=[[2,6],[1,3],[5,4]]
li.sort(key=lambda x: x[1]) # is equivalent to the following
#li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

Because Python's implementation is not very refined, this specialized
syntax is actually much speedier than the general form “lambda x, y:
cmp(x[1],y[1])”. It is a burden on the programmer to always use the
“key” syntax idiosyncrasy when sorting a large matrix.
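
One way to see where the speed difference comes from is to count calls. This sketch uses Python 3, where a comparison function has to be wrapped with `functools.cmp_to_key`. The counters and the random test matrix are illustrative only, and the exact number of comparisons depends on the data.

```python
import random
from functools import cmp_to_key

random.seed(0)
rows = [[random.random(), random.random()] for _ in range(1000)]

calls = {"cmp": 0, "key": 0}

def compare_second(x, y):
    # Two-argument comparison on the second column, counting each call.
    calls["cmp"] += 1
    return (x[1] > y[1]) - (x[1] < y[1])

def second(x):
    # One-argument key function, counting each call.
    calls["key"] += 1
    return x[1]

by_cmp = sorted(rows, key=cmp_to_key(compare_second))
by_key = sorted(rows, key=second)

print(by_cmp == by_key)             # True: same ordering either way
print(calls["key"])                 # 1000: the key function runs once per row
print(calls["cmp"] > calls["key"])  # True: each row is compared several times
```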

Another idiosyncratic provision is the optional “reverse” argument.
This parameter is somewhat necessary when using the “key”
parameter. One can reverse the ordering by using the “reverse”
keyword as an argument to sort. Example:

The following are equivalent:

li.sort(key=lambda x: x[1], reverse=True)
li.sort(lambda x, y: cmp(x[1],y[1]), reverse=True)
li.sort(lambda x, y: cmp(y[1],x[1]))
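
The equivalence can be checked directly. This is a sketch in Python 3, so the comparison functions are wrapped with `functools.cmp_to_key` and `cmp` is re-created as a stand-in. The sample data, including the tie in the second column, is made up for the test.

```python
from functools import cmp_to_key

def cmp(a, b):
    # Stand-in for Python 2's built-in cmp, which Python 3 removed.
    return (a > b) - (a < b)

data = [[2, 6], [1, 3], [5, 4], [7, 3]]

a = sorted(data, key=lambda x: x[1], reverse=True)
b = sorted(data, key=cmp_to_key(lambda x, y: cmp(x[1], y[1])), reverse=True)
c = sorted(data, key=cmp_to_key(lambda x, y: cmp(y[1], x[1])))

print(a)            # [[2, 6], [5, 4], [1, 3], [7, 3]]
print(a == b == c)  # True
```

Note that the two rows with 3 in the second column keep their original relative order in all three forms.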

The official doc on Python's sort method is at (bottom):
http://python.org/doc/2.4/lib/typesseq-mutable.html

Sorting in Perl

(to be posted in a couple of days)

This post is archived at:
http://xahlee.org/perl-python/sort_list.html

Xah
http://xahlee.org/

Marcin 'Qrczak' Kowalczyk

 10-10-2005
Followup-To: comp.lang.scheme

"Xah Lee" <(E-Mail Removed)> writes:

> Since this is frequently used, Python provides a somewhat shorter
> syntax for it, by specifying the column used as the ordering “key”.

[...]
> Because Python's implementation is not very refined , this specialized
> syntax is actually much speedier than the general form “lambda x, y:
> cmp(x[1],y[1])”. It is a burden on the programer to always use the
> “key” syntax idiosyncrasy if he is sorting a large matrix.

It's not only clearer for a human, but also faster in all good
implementations of all languages which support that, except when the
ordering function is very simple. It's called the Schwartzian transform
and I wish more language designers and programmers knew about it.

http://en.wikipedia.org/wiki/Schwartzian_transform
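
For readers who have not met the idiom, a hand-written decorate-sort-undecorate pass might look like the sketch below (Python 3; `extract_number` is an illustrative key function, and the file names are borrowed from the original post). The `key` parameter performs essentially the same three steps internally.

```python
import re

def extract_number(name):
    # Illustrative key: the first run of digits in the string.
    return int(re.search(r'\d+', name).group())

names = ['my283.jpg', 'my23i.jpg', 'web7-s.jpg', 'fris88large.jpg']

# 1. Decorate: compute each key once and pair it with its element.
#    The index breaks ties so the elements themselves are never compared.
decorated = [(extract_number(n), i, n) for i, n in enumerate(names)]
# 2. Sort the decorated tuples.
decorated.sort()
# 3. Undecorate: drop the keys again.
result = [n for _, _, n in decorated]

print(result)  # ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg', 'my283.jpg']
print(result == sorted(names, key=extract_number))  # True
```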

I urge future SRFI authors to include it. The withdrawn SRFI-32 for
sorting didn't do that, and I can't find any other SRFI which deals
with sorting.

--
__("< Marcin Kowalczyk
\__/ (E-Mail Removed)
^^ http://qrnik.knm.org.pl/~qrczak/

Ulrich Hobelmann

 10-10-2005
Xah Lee wrote:
> To sort a list in Python, use the “sort” method. For example:
>
> li=[1,9,2,3];
> li.sort();
> print li;

Likewise in Common Lisp. In Scheme there are probably packages for that
as well. My apologies for not being very fluent anymore.

CL-USER> (setf list (sort '(1 9 2 3) #'<)) ; input
(1 2 3 9) ; output

The second argument is mandatory too (comparison function).

> Note that sort is a method, and the list is changed in place.

Same here. To be safe, assign the result to "list".

> Suppose you have a matrix, and you want to sort by second column.
> Example Solution:
>
> li=[[2,6],[1,3],[5,4]]
> li.sort(lambda x, y: cmp(x[1],y[1]))
> print li; # prints [[1, 3], [5, 4], [2, 6]]

CL-USER> (setf list (sort '((2 6) (1 3) (5 4))
                          #'(lambda (x y) (< (second x) (second y)))))
((1 3) (5 4) (2 6)) ; output

> The argument to sort is a function of two arguments, that returns -1,
> 0, 1. This function is a decision function that tells sort() how to
> decide the order of any two elements in your list. If the first
> argument is “less” then second argument, the function should return
> -1. If equal, then 0. Else, 1.

In CL you only need a smaller-than function. I guess if elements are
"equal", they don't need sorting anyway.

> li=[
> 'my283.jpg',
> 'my23i.jpg',
> 'web7-s.jpg',
> 'fris88large.jpg',
> ]

CL-USER> (setf list '("my283.jpg" "my23i.jpg" "web7-s.jpg"
                      "fris88large.jpg"))

> def myComp (x,y):
> import re
> def getNum(str): return float(re.findall(r'\d+',str)[0])
> return cmp(getNum(x),getNum(y))

CL-USER> (defun my-comp (x y)
           (flet ((getnum (s)
                    (parse-integer s :start (position-if #'digit-char-p s)
                                     :junk-allowed t)))
             (< (getnum x) (getnum y))))

> li.sort(myComp)
> print li # returns ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg',
> 'my283.jpg']

CL-USER> (setf list (sort list #'my-comp))
("web7-s.jpg" "my23i.jpg" "fris88large.jpg" "my283.jpg") ; output

> li=[[2,6],[1,3],[5,4]]
> li.sort(key=lambda x[1] ) # is equivalent to the following
> #li.sort(lambda x, y: cmp(x[1],y[1]))
> print li; # prints [[1, 3], [5, 4], [2, 6]]

CL-USER> (setf list (sort '((2 6) (1 3) (5 4)) #'< :key #'second))
((1 3) (5 4) (2 6)) ; output

Here some people might jump in and say "lists might be more readable
than vectors, but lists are slow."
If they are slow for your data set, just use vectors instead

--
State, the new religion from the friendly guys who brought you fascism.

Pascal Costanza

 10-10-2005
Ulrich Hobelmann wrote:
> Xah Lee wrote:
>
>> To sort a list in Python, use the “sort” method. For example:
>>
>> li=[1,9,2,3];
>> li.sort();
>> print li;

>
> Likewise in Common Lisp. In Scheme there are probably packages for that
> as well. My apologies for not being very fluent anymore.
>
> CL-USER> (setf list (sort '(1 9 2 3) #'<)) ; input
> (1 2 3 9) ; output

Careful. Common Lisp's sort function is specified to be destructive, so
you shouldn't use it on literal constants. So don't say (sort '(1 9 2 3)
....), say (sort (list 1 9 2 3) ...), etc.

Pascal

--
OOPSLA'05 tutorial on generic functions & the CLOS Metaobject Protocol
++++ see http://p-cos.net/oopsla05-tutorial.html for more details ++++

Xah Lee

 10-11-2005
Python Doc Problem Example: sort()

Xah Lee, 200503
Exhibit: Incompletion & Imprecision

Python doc “3.6.4 Mutable Sequence Types” at
http://python.org/doc/2.4/lib/typesseq-mutable.html

contains the documentation of the “sort” method of a list.
Quote:

«
Operation                          Result                         Notes
s.sort([cmp[, key[, reverse]]])    sort the items of s in place   (7), (8), (9), (10)

(7) The sort() and reverse() methods modify the list in place for
economy of space when sorting or reversing a large list. To remind you
that they operate by side effect, they don't return the sorted or
reversed list.

(8) The sort() method takes optional arguments for controlling the
comparisons.

cmp specifies a custom comparison function of two arguments (list
items) which should return a negative, zero or positive number
depending on whether the first argument is considered smaller than,
equal to, or larger than the second argument: "cmp=lambda x,y:
cmp(x.lower(), y.lower())"

key specifies a function of one argument that is used to extract a
comparison key from each list element: "cmp=str.lower"

reverse is a boolean value. If set to True, then the list elements
are sorted as if each comparison were reversed.

In general, the key and reverse conversion processes are much
faster than specifying an equivalent cmp function. This is because cmp
is called multiple times for each list element while key and reverse
touch each element only once.

Changed in version 2.3: Support for None as an equivalent to
omitting cmp was added.

Changed in version 2.4: Support for key and reverse was added.

(9) Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade).

(10) While a list is being sorted, the effect of attempting to
mutate, or even inspect, the list is undefined. The C implementation of
Python 2.3 and newer makes the list appear empty for the duration, and
raises ValueError if it can detect that the list has been mutated
during a sort.
»

As a piece of documentation, this is a lousy one.

The questions Python doc writers need to ask when evaluating this piece
of doc are these:

• Can an experienced programmer who is expert at several languages but
new to Python, and who has also read the official Python tutorial, read
this doc and know exactly how to use sort with all the options?

• Can this piece of documentation be rewritten fairly easily, so that
the answer to the previous question is a resounding yes?

To me, the answers to the above questions are No and Yes. Here are some
issues with the doc:

• In the paragraph about the “key” parameter, the illustration
given is: “cmp=str.lower”. It should be “key=str.lower”.

• This doc lacks examples. One or two examples would help a lot,
especially for less experienced programmers (who comprise the
majority of readers). In particular, it should give a full example of
using the comparison function and one with the “key” parameter.
Examples are particularly needed here because these parameters are
functions, often written with the “lambda” construct. These are unusual and

• This doc fails to mention what happens when the predicate and the
shortcut version conflict, e.g. “myList.sort(cmp=lambda x,y:
cmp(x[0], y[0]), key=lambda x: str(x[1]) )”

• The syntax notation the Python doc has adopted for indicating optional
parameters does not give a clear view of exactly which combinations of
optional parameters can be omitted. The notation “s.sort([cmp[,
key[, reverse]]])” gives the impression that only trailing arguments
can be omitted, which is not true.

• The doc gives no indication of how to omit an optional arg. Should
it be “nul”, “Null”, 0, or left empty? Since it doesn't give
any examples, a doc reader who isn't a Python expert is left to guess at
how true/false values are presented in Python.

• On the whole, the way this doc is written does not give a clear
picture of the roles of the supplied options, nor how to use them.

Suggested Quick Remedy: add an example of using the cmp function, and
an example using the “key” function. Add an example of using one of
them with reverse. (The examples need not come with much
explanation. A one-sentence annotation is better than none.)
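
As an illustration of what such examples might look like, here is a sketch for Python 3, where `sort` keeps `key` and `reverse` but no longer accepts `cmp`, so the comparison function is adapted with `functools.cmp_to_key`. The word list and the name `compare_ignoring_case` are made up for the example. Parameters that are not needed are simply left out, and the rest are passed by keyword.

```python
from functools import cmp_to_key

words = ["pear", "apple", "Fig", "Banana"]

def compare_ignoring_case(x, y):
    # Negative, zero or positive, as the quoted doc requires.
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)

# Comparison function only.
with_cmp = sorted(words, key=cmp_to_key(compare_ignoring_case))

# Key function only: lower-case each word once and compare the results.
with_key = sorted(words, key=str.lower)

# Key function together with reverse.
with_key_reversed = sorted(words, key=str.lower, reverse=True)

# reverse alone; no key is given, so the words compare as they are.
plain_reversed = sorted(words, reverse=True)

print(with_cmp)           # ['apple', 'Banana', 'Fig', 'pear']
print(with_key)           # ['apple', 'Banana', 'Fig', 'pear']
print(with_key_reversed)  # ['pear', 'Fig', 'Banana', 'apple']
print(plain_reversed)     # ['pear', 'apple', 'Fig', 'Banana']
```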

Other than that, the way the doc is laid out, with a terse table and
run-on footnotes (employed in several places in the Python doc), is not
inductive. For a real improvement, there needs to be an overhaul of
the organization and the attitude of the entire doc. The organization
needs to be programming based, as opposed to implementation or computer
science based (in this regard, one can learn from the Perl folks). As
to attitude, the writing needs to be Python-as-is, as opposed to
computer science framework, as indicated in the early parts of this
critique series.
----------------
This post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html

Xah
(E-Mail Removed)
http://xahlee.org/

Xah Lee

 10-11-2005

Here's a further example of Python's extremely low quality of
documentation. In particular, what follows focuses on the bad writing
and comments on some language design and quality issues of Python.

From the Official Python documentation of the sort() method, at:

http://python.org/doc/2.4.2/lib/typesseq-mutable.html, Quote:

«The sort() method takes optional arguments for controlling the
comparisons.»

It should be “optional parameter” not “optional argument”.
Their difference is that “parameter” indicates the variable, while
“argument” indicates the actual value.

«... for controlling the comparisons.»

This is bad writing caused by lack of understanding. No, it doesn't
“control the comparison”. The proper way to say it is that “the
comparison function specifies an order”.

«The sort() and reverse() methods modify the list in place for economy
of space when sorting or reversing a large list. To remind you that
they operate by side effect, they don't return the sorted or reversed
list. »

This is an example of tech-geeking drivel. The sort() and reverse()
methods are just the way they are. Their design and behavior are really
not for some economy or to remind programmers of something. The Python
doc is bulked up with such irrelevant drivel. These littered inanities
implicitly drag down the whole quality and effectiveness of the doc.

«Changed in version 2.4: Support for key and reverse was added.»

«In general, the key and reverse conversion processes are much faster
than specifying an equivalent cmp function. This is because cmp is
called multiple times for each list element while key and reverse touch
each element only once.»

When sorting something, one needs to specify an order. The easiest way
is to simply list all the elements as a sequence. That way, their order
is clearly laid out. However, this is in general neither feasible nor
practical. Therefore, we devised a mathematically condensed way to
specify the order, by defining a function f(x,y) that can take any two
elements and tell us which one comes first. This is the gist of
sorting a list in any programming language.

The ordering function, being a mathematically condensed way of
specifying the order, has some constraints. For example, the function
should not tell us x < y and y < x. (For a complete list of these
constraints, see http://xahlee.org/perl-python/sort_list.html )

This ordering function is all that sort needs to sort a list.
Anything more is interface complexity.

The optional parameters “key” and “reverse” in Python's sort
method are interface complexity. What happened here is that a compiler
optimization problem is evaded by moving it into the language syntax
for programmers to worry about. If the programmer does not use the
“key” syntax when sorting a large matrix (provided that he knew in
advance of the list to be sorted or the ordering function), then he is
penalized by a severe inefficiency of an order of magnitude in
execution time.

This situation, of moving compiler problems to the syntax surface, is
common in imperative languages.

«Changed in version 2.3: Support for None as an equivalent to omitting
cmp was added.»

This is an epitome of catering towards morons. “myList.sort()” is
complexity just because idiots need it.

The motivation here is simple: an explicit “None” gives coding
monkeys a direct sensory input of the fact that “there is no
comparison function”. This is like the double negative in black
English “I ain't no gonna do it!”. Logically, “None” is not
even correct and leads to bad thinking. What really should be stated in
the doc, is that “the default ordering function to sort() is the
‘cmp’ function.”.

«Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade).»

existence, its sort functionality is not smart enough to preserve
order?? A sort that preserves original order isn't something difficult
to implement. What we have here is sloppiness and poor quality common
in OpenSource projects.

Also note the extremely low quality of the writing. It employs the
jargon “stable sort”, then proceeds to explain what it is, and then
latches on “multiple passes” and the mysterious “by department,
by salary”.

Here's a suggested rewrite: “Since Python 2.3, the result of sort()
no longer rearranges elements where the comparison function returns
0.”
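
The “multiple passes” remark in the quoted doc can be made concrete. A sketch in Python 3 with made-up employee records: sorting first by salary and then by department leaves each department's members in salary order, because the second pass does not disturb elements that compare equal.

```python
# (name, department, salary) records; the data is invented for illustration.
staff = [
    ("ann", "sales", 50),
    ("bob", "dev", 70),
    ("cat", "sales", 40),
    ("dan", "dev", 60),
]

# Pass 1: secondary criterion (salary).
staff.sort(key=lambda r: r[2])
# Pass 2: primary criterion (department). Stability keeps the salary
# order from pass 1 within each department.
staff.sort(key=lambda r: r[1])

print(staff)
# [('dan', 'dev', 60), ('bob', 'dev', 70), ('cat', 'sales', 40), ('ann', 'sales', 50)]
```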
-----------
This post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html

Xah
(E-Mail Removed)
http://xahlee.org/

Bryan

 10-12-2005
Xah Lee wrote:
>
> Here's further example of Python's extreme low quality of
> documentation. In particular, what follows focuses on the bad writing
> skill aspect, and comments on some language design and quality issues
> of Python.
>
> From the Official Python documentation of the sort() method, at:

> http://python.org/doc/2.4.2/lib/typesseq-mutable.html, Quote:
>
> «The sort() method takes optional arguments for controlling the
> comparisons.»
>
> It should be “optional parameter” not “optional argument”.
> Their difference is that “parameter” indicates the variable, while
> “argument” indicates the actual value.
>
> «... for controlling the comparisons.»
>
> This is a bad writing caused by lack of understanding. No, it doesn't
> “control the comparison”. The proper way to say it is that “the
> comparison function specifies an order”.
>
> «The sort() and reverse() methods modify the list in place for economy
> of space when sorting or reversing a large list. To remind you that
> they operate by side effect, they don't return the sorted or reversed
> list. »
>
> This is a example of tech-geeking drivel. The sort() and reverse()
> methods are just the way they are. Their design and behavior are really
> not for some economy or remind programers of something. The Python doc
> is bulked with these irrelevant drivels. These littered inanities
> dragged down the whole quality and effectiveness of the doc implicitly.
>
> «Changed in version 2.4: Support for key and reverse was added.»
>
> «In general, the key and reverse conversion processes are much faster
> than specifying an equivalent cmp function. This is because cmp is
> called multiple times for each list element while key and reverse touch
> each element only once.»
>
> When sorting something, one needs to specify a order. The easiest way
> is to simply list all the elements as a sequence. That way, their order
> is clearly laid out. However, this is in general not feasible and
> impractical. Therefore, we devised a mathematically condensed way to
> specify the order, by defining a function f(x,y) that can take any two
> elements and tell us which one comes first. This, is the gist of
> sorting a list in any programing language.
>
> The ordering function, being a mathematically condensed way of
> specifying the order, has some constraints. For example, the function
> should not tell us x < y and y < x. (For a complete list of these
> constraints, see http://xahlee.org/perl-python/sort_list.html )
>
> With this ordering function, it is all sort needed to sort a list.
> Anything more is interface complexity.
>
> The optional parameters “key” and “reverse” in Python's sort
> method is a interface complexity. What happened here is that a compiler
> optimization problem is evaded by moving it into the language syntax
> for programers to worry about. If the programer does not use the
> “key” syntax when sorting a large matrix (provided that he knew in
> advance of the list to be sorted or the ordering function), then he is
> penalized by a severe inefficiency by a order of magnitude of execution
> time.
>
> This situation, of moving compiler problems to the syntax surface is
> common in imperative languages.
>
> «Changed in version 2.3: Support for None as an equivalent to omitting
>
> This is a epitome of catering towards morons. “myList.sort()” is
> complexity just because idiots need it.
>
> The motivation here is simple: a explicit “None” gives coding
> monkeys a direct sensory input of the fact that “there is no
> comparison function”. This is like the double negative in black
> English “I ain't no gonna do it!”. Logically, “None” is not
> even correct and leads to bad thinking. What really should be stated in
> the doc, is that “the default ordering function to sort() is the
> ‘cmp’ function.”.
>
> «Starting with Python 2.3, the sort() method is guaranteed to be
> stable. A sort is stable if it guarantees not to change the relative
> order of elements that compare equal -- this is helpful for sorting in
> multiple passes (for example, sort by department, then by salary
>
> existence, its sort functionality is not smart enough to preserve
> order?? A sort that preserves original order isn't something difficult
> to implement. What we have here is sloppiness and poor quality common
> in OpenSource projects.
>
> Also note the extreme low quality of the writing. It employes the
> jargon “stable sort” then proceed to explain what it is, and the
> latch on of “multiple passes” and the mysterious “by department,
> by salary”.
>
> Here's a suggested rewrite: “Since Python 2.3, the result of sort()
> no longer rearrange elements where the comparison function returns
> 0.”
> -----------
> This post is archived at:
> http://xahlee.org/perl-python/python_doc_sort.html
>
> Xah
> (E-Mail Removed)
> ∑ http://xahlee.org/
>

omg!!! wow!!! after reading this i feel like i just stepped into some bizarro
world. this entire posting is like something you would read in the onion.
unfortunately this posting has just enough real words mixed with BS that a
newbie might actually fall for this stuff. i think mr. xah has published enough
python satire by now that we could probably make a nice bathroom reading book
"python untechgeeked". sorry folks, i'm still laughing... i just don't get how
someone can go on and on and on, day after day, month after month writing this
stuff that is so completely off base that you wonder if he's looking at the same
python as we are. mr. xah, why do you spend so much time agonizing over a
language you obviously don't like. why is python so important to you that you
are willing to waste so much of your life on this? there is obviously something
else i'm not seeing here. is there a psychologist in the house? can someone
explain xah to me? is he clinically depressed? suicidal? does he have signs of
a serial murderer? does he need a girl friend and a social life? or maybe just
take a yoga class? could these ramblings of his simply be on days that he
doesn't take his medication? i'm starting to think he is not simply just a
troll. i think there might be something seriously wrong with him and this is
just his way of asking for help. maybe i'm wrong, there was just something in
his last writings that made me think he's hurting inside.

Bryan

 10-12-2005
mr. xah... would you be willing to give a lecture at pycon 2006? i'm sure you
would draw a huge crowd and a lot of people would like to meet you in person...

thanks.

Lasse Vågsæther Karlsen

 10-12-2005
Bryan wrote:
> mr. xah... would you be willing to give a lecture at pycon 2006? i'm
> sure you would draw a huge crowd and a lot of people would like to meet
> you in person...
>
> thanks.
>

I think that would be a highly un-pythonesque crowd. Python isn't much
in the sense of limitations, but that crowd probably needs to be limited
in one way or another, like "only 2 rotten fruits per person" or similar.

--
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
(E-Mail Removed)
PGP KeyID: 0x2A42A1C2

Piet van Oostrum

 10-12-2005
>>>>> Bryan <(E-Mail Removed)> (B) wrote:

>B> omg!!! wow!!! after reading this i feel like i just stepped in to some
>B> bizarro world.

So why do you have to repeat the whole thing?? I have kill-filed XL, and
now you put the message in my face. Please don't react to this drivel.
--
Piet van Oostrum <(E-Mail Removed)>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: (E-Mail Removed)