Velocity Reviews > Perl > A Sort Optimization Technique: decorate-sort-dedecorate

# A Sort Optimization Technique: decorate-sort-dedecorate

xahlee@gmail.com
Guest
Posts: n/a

 08-27-2006
Last year, i've posted a tutorial and commentary about Python and
Perl's sort function. (http://xahlee.org/perl-python/sort_list.html)

In that article, i discussed a technique known among juvenile Perlers
as the Schwartzian Transform, which also manifests in Python as its
“key” optional parameter.

Here, i give a more detailed account on why and how of this construct.

----

Language and Sort Optimization: decorate-sort-dedecorate

There are many algorithms for sorting. Each language can chose which to
use. See wikipedia for a detailed list and explanation:
Sorting_algorithm↗ .

However, does not matter which algorithm is used, ultimately it will
need the order-deciding function on the items to be sorted. Suppose
Various algorithms will try to minimize the number of times F is
called, but nevertheless, F will be applied to a particular element in
the list multiple times. For example, F(a,b) may be called to see which
of “a” or “b” comes first. Then, later the algorithm might need
to call F(m,a), or F(a,z). The point here is that, F will be called
many times on arbitrary two items in your list, even if one of the
element has been compared to others before.

Now suppose, you are sorting some polygons in 3D space, or personal
records by the person's address's distance from a location, or sorting
matrixes by their eigen-values in some math application, or ordering
files by number of occurrences of some text in the file.

In general, when you define your decision function F(x,y), you will
need to extract some property from the elements to be sorted. For
example, when sorting points in space by a criterion of distance, one
will need to compute the distance for the point. When sorting personal
records from database by the person's location, the decision function
will need to retrieve the person's address from the database, then find
the coordinate of that address, that compute the distance from there to
a given coordinate. In sorting matrixes in math by eigen-values, the
order-decision will first compute the eigen-value of the matrix. A
common theme from all of the above is that they all need to do some
non-trivial computation on each element.

As we can see, the order-decision function F may need to do some
expensive computation on each element first, and this is almost always
the case when sorting elements other than simple numbers. Also, we know
that a sorting algorithm will need to call F(x,y) many times, even if
one of x or y has been compared to others before. So, this may result
high inefficiency. For example, you need to order people by their
is first retrieved from a database across a network, the coordinate of
her address is looked up in another database. Then the distance to your
home are computed using spherical geometry. The exact same thing is
done for Jane. But later on, it may call F(Mary,Chrissy),
F(Mary,Jonesy), F(Mary,Pansy) and so on, and the entire record
retrieval for Mary is repeated many times.

One solution, is to do the expensive extraction one time for each
element, then associate that with the corresponding elements. Suppose
this expensive extraction function is called gist(). So, you create a
new list ([Mary,gist(Mary)], [Jane,gist(Jane)], [John,gist(John)],
[Jenny,gist(Jenny)], ...) and sort this list instead, when done, remove
associated gist. This technique is sometimes called
decorate-sort-dedecorate.

In Perl programing, this decorate-sort-dedecorate technique is sillily
known as Schwartzian Transform as we have demonstrated previously. In
Python, they tried to incorporate this technique into the language, by
adding the “key” optional parameter, which is our gist() function.

----------
This post is archived at:
http://xahlee.org/perl-python/sort_list.html

I would be interested in comments about how Common Lisp, Scheme, and
Haskell deal with the decorate-sort-dedecorate technique. In
particular, does it manifest in the language itself? If not, how does
one usually do it in the code? (note: please reply to approprate groups
if it is of no general interest. Thanks) (am also interested on how
Perl6 or Python3000 does this, if there are major changes to their sort
function)

Thanks.

Xah
http://www.velocityreviews.com/forums/(E-Mail Removed)
http://xahlee.org/

Tom Cole
Guest
Posts: n/a

 08-28-2006
Well you cross-posted this enough, including a Java group, and didn't

In Java, classes can implement the Comparable interface. This interface
contains only one method, a compareTo(Object o) method, and it is
defined to return a value < 0 if the Object is considered less than the
one being passed as an argument, it returns a value > 0 if considered
greater than, and 0 if they are considered equal.

The object implementing this interface can use any of the variables
available to it (AKA address, zip code, longitude, latitude, first
name, whatever) to return this -1, 0 or 1. This is slightly different
than what you mention as we don't have to "decorate" the object. These
are all variables that already exist in the Object, and if fact make it
what it is. So, of course, there is no need to un-decorate at the end.

There are several built-in objects and methods available to sort
Objects that are Comparable, even full Arrays of them.

William James
Guest
Posts: n/a

 08-28-2006
(E-Mail Removed) wrote:

> I would be interested in comments about how Common Lisp, Scheme, and
> Haskell deal with the decorate-sort-dedecorate technique.

%w(FORTRAN LISP COBOL).sort_by{|s| s.reverse}
==>["COBOL", "FORTRAN", "LISP"]

--
Common Lisp did kill Lisp. Period. ... It is to Lisp what
C++ is to C. A monstrosity that totally ignores the basics
of language design, simplicity, and orthogonality to begin
with. --- Bernard Lang

Marc 'BlackJack' Rintsch
Guest
Posts: n/a

 08-28-2006
In <(E-Mail Removed). com>, Tom Cole wrote:

> In Java, classes can implement the Comparable interface. This interface
> contains only one method, a compareTo(Object o) method, and it is
> defined to return a value < 0 if the Object is considered less than the
> one being passed as an argument, it returns a value > 0 if considered
> greater than, and 0 if they are considered equal.
>
> The object implementing this interface can use any of the variables
> available to it (AKA address, zip code, longitude, latitude, first
> name, whatever) to return this -1, 0 or 1. This is slightly different
> than what you mention as we don't have to "decorate" the object. These
> are all variables that already exist in the Object, and if fact make it
> what it is. So, of course, there is no need to un-decorate at the end.

Python has such a mechanism too, the special `__cmp__()` method
has basically the same signature. The problem the decorate, sort,
un-decorate pattern solves is that this object specific compare operations
only use *one* criteria.

Let's say you have a `Person` object with name, surname, date of birth and
so on. When you have a list of such objects and want to sort them by name
or by date of birth you can't use the `compareTo()` method for both.

Ciao,
Marc 'BlackJack' Rintsch

Dr.Ruud
Guest
Posts: n/a

 08-28-2006
Jim Gibson schreef:

> The problem addressed by what is know in Perl as the 'Schwartzian
> Transform' is that the compare operation can be an expensive one,
> regardless of the whether the comparison uses multiple keys. Since in
> comparison sorts, the compare operation will be executed N(logN)
> times, it is more efficient to pre-compute a set of keys, one for
> each object to be sorted. That need be done only N times. The sort
> can then use these pre-computed keys to sort the objects.

Basically it first builds, than sorts an index.

The pre-computed (multi-)keys can often be optimized, see Uri's
Sort::Maker http://search.cpan.org/search?query=Sort::Maker
for facilities.

--
Affijn, Ruud

"Gewoon is een tijger."

Uri Guttman
Guest
Posts: n/a

 08-28-2006
>>>>> "R" == Ruud <(E-Mail Removed)> writes:

R> Basically it first builds, than sorts an index.

R> The pre-computed (multi-)keys can often be optimized, see Uri's
R> Sort::Maker http://search.cpan.org/search?query=Sort::Maker
R> for facilities.

gack, now i ave been mentioned and responding to a thread started by the
moron xah. his flame on the ST shows how stupid he is. he calls it
decorate/sort/undecorate but he doesn't know it was a well known
technique way back in the 50's and 60's. typical of his narrow mind and
lack of real understanding and computer history. the ST didn't invent it
but just showed how to do it easily in perl. and the GRT didn't invent
pack/sort either, it just (and Sort::Maker) just does it easily for you
in perl. both the ST and GRT are implemetation idioms of a known sort
optimization trick used in the real world. in pure algorithmic analysis,
neither one would be even considered as they don't improve the actual
growth curve of the sort. and all the other langs (deleted from the
headers) are just showing their simple equivilent sort/cmp ops. BFD.

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Joachim Durchholz
Guest
Posts: n/a

 08-29-2006
Jim Gibson schrieb:
>
> The problem addressed by what is know in Perl as the 'Schwartzian
> Transform' is that the compare operation can be an expensive one,
> regardless of the whether the comparison uses multiple keys. Since in
> comparison sorts, the compare operation will be executed N(logN) times,
> it is more efficient to pre-compute a set of keys, one for each object
> to be sorted. That need be done only N times.

Wikipedia says it's going from 2NlogN to N. If a sort is massively
dominated by the comparison, that could give a speedup of up to 100%
(approximately - dropping the logN factor is almost irrelevant, what
counts is losing that factor of 2).

Regards,
Jo

xhoster@gmail.com
Guest
Posts: n/a

 08-29-2006
Joachim Durchholz <(E-Mail Removed)> wrote:
> Jim Gibson schrieb:
> >
> > The problem addressed by what is know in Perl as the 'Schwartzian
> > Transform' is that the compare operation can be an expensive one,
> > regardless of the whether the comparison uses multiple keys. Since in
> > comparison sorts, the compare operation will be executed N(logN) times,
> > it is more efficient to pre-compute a set of keys, one for each object
> > to be sorted. That need be done only N times.

>
> Wikipedia says it's going from 2NlogN to N. If a sort is massively
> dominated by the comparison, that could give a speedup of up to 100%
> (approximately - dropping the logN factor is almost irrelevant, what
> counts is losing that factor of 2).

It seems to me that ln 1,000,000 is 13.8, and that 13.8 is quite a bit
greater than 2.

Cheers,

Xho

--
Usenet Newsgroup Service \$9.95/Month 30GB

Jamie
Guest
Posts: n/a

 08-30-2006
In <(E-Mail Removed)>,
Uri Guttman <(E-Mail Removed)> mentions:
>>>>>> "R" == Ruud <(E-Mail Removed)> writes:

>
> R> Basically it first builds, than sorts an index.
>
> R> The pre-computed (multi-)keys can often be optimized, see Uri's
> R> Sort::Maker http://search.cpan.org/search?query=Sort::Maker
> R> for facilities.
>
>gack, now i ave been mentioned and responding to a thread started by the
>moron xah. his flame on the ST shows how stupid he is. he calls it
>decorate/sort/undecorate but he doesn't know it was a well known
>technique way back in the 50's and 60's. typical of his narrow mind and
>lack of real understanding and computer history. the ST didn't invent it
>but just showed how to do it easily in perl. and the GRT didn't invent
>pack/sort either, it just (and Sort::Maker) just does it easily for you
>in perl. both the ST and GRT are implemetation idioms of a known sort
>optimization trick used in the real world. in pure algorithmic analysis,
>neither one would be even considered as they don't improve the actual
>growth curve of the sort. and all the other langs (deleted from the
>headers) are just showing their simple equivilent sort/cmp ops. BFD.

I don't really grok why this is such a big deal.

"Schwartzian xform" (or whatever you want to call it) is just a nice, simple
way of explaining a well known concept that should seem fairly obvious to
most people. (I've always been partial to "alternate index" myself, but who
cares?)

'Course, some people probably want some huge words that no one
understands.

Hmm..

To keep fully buzzword compliant, we could maybe get one of the folks
from w3c.org or something to write up a huge 200 page document about
it.

In the interest of being complete, we could implement the alternate
index (the virtual paradomain blah blah) in XML, using some namespace
tags and stuff describing the "methodology". An example could be written
using a DOM object for the "source data linkage descriptors". Proving
once again how XML can be used to solve all of our problems.

Or...

We could just use common sense. Naaa!

Jamie
--
http://www.geniegate.com Custom web programming
(E-Mail Removed) (rot13) User Management Solutions

neoedmund
Guest
Posts: n/a

 08-31-2006
yeah, java also have 2 interface, Comparator and Comparable, which
equal to python's compareTo() and __cmp__()

Marc 'BlackJack' Rintsch wrote:
> In <(E-Mail Removed). com>, Tom Cole wrote:
>
> > In Java, classes can implement the Comparable interface. This interface
> > contains only one method, a compareTo(Object o) method, and it is
> > defined to return a value < 0 if the Object is considered less than the
> > one being passed as an argument, it returns a value > 0 if considered
> > greater than, and 0 if they are considered equal.
> >
> > The object implementing this interface can use any of the variables
> > available to it (AKA address, zip code, longitude, latitude, first
> > name, whatever) to return this -1, 0 or 1. This is slightly different
> > than what you mention as we don't have to "decorate" the object. These
> > are all variables that already exist in the Object, and if fact make it
> > what it is. So, of course, there is no need to un-decorate at the end.

>
> Python has such a mechanism too, the special `__cmp__()` method
> has basically the same signature. The problem the decorate, sort,
> un-decorate pattern solves is that this object specific compare operations
> only use *one* criteria.
>
> Let's say you have a `Person` object with name, surname, date of birth and
> so on. When you have a list of such objects and want to sort them by name
> or by date of birth you can't use the `compareTo()` method for both.
>
> Ciao,
> Marc 'BlackJack' Rintsch