Velocity Reviews > Re: sorting on keys in a list of dicts

# Re: sorting on keys in a list of dicts

Jp Calderone
Guest
Posts: n/a

 01-06-2005
On Thu, 06 Jan 2005 15:31:22 +0100, J Berends <j.p.t.j.berends@[n0sp4m].nl> wrote:
>Suppose I have a list of dictionaries and each dict has a common keyname
> with a (sortable) value in it.
>
> How can I shuffle their position in the list in such way that they
> become sorted.
>

In Python 2.4,

import operator
L.sort(key=operator.itemgetter(key))

In Python 2.3,

L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
L2.sort()
L = [d for (v, i, d) in L2]

Jp

J Berends
Guest
Posts: n/a

 01-06-2005
Jp Calderone wrote:
> On Thu, 06 Jan 2005 15:31:22 +0100, J Berends <j.p.t.j.berends@[n0sp4m].nl> wrote:
>
>>Suppose I have a list of dictionaries and each dict has a common keyname
>>with a (sortable) value in it.
>>
>>How can I shuffle their position in the list in such way that they
>>become sorted.
>>

>
>
> In Python 2.4,
>
> import operator
> L.sort(key=operator.itemgetter(key))
>
> In Python 2.3,
>
> L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
> L2.sort()
> L = [d for (v, i, d) in L2]
>
> Jp

the 2.3 version seems to work quite nicely! thanks. Using the 2.3 for
version compatibility. Thanks!

Jeff Shannon
Guest
Posts: n/a

 01-06-2005
Jp Calderone wrote:

> L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
> L2.sort()
> L = [d for (v, i, d) in L2]

Out of curiosity, any reason that you're including the index? I'd
have expected to just do

L2 = [(d[key], d) for d in L]
L2.sort()
L = [d for (v, d) in L2]

I suppose that your version has the virtue that, if the sortkey value
is equal, items retain the order that they were in the original list,
whereas my version will sort them into an essentially arbitrary order.
Is there anything else that I'm missing here?

Jeff Shannon
Technician/Programmer
Credit International

Nick Coghlan
Guest
Posts: n/a

 01-07-2005
Jeff Shannon wrote:
> I suppose that your version has the virtue that, if the sortkey value is
> equal, items retain the order that they were in the original list,
> whereas my version will sort them into an essentially arbitrary order.
> Is there anything else that I'm missing here?

Stability in sorting is a property not to be sneezed at - it means switching to
sorting by a second key gives the effect of "sort by key 1, then by key 2",
whereas that doesn't hold with an unstable sort algorithm. If you've ever used
an application with an unstable sorting process and that only allows sorting a
table on one column at a time, you'll appreciate the frustration that can cause

Also, it's required to match the behaviour of the Python 2.4 version (which gets
to take advantage of the stability of the builtin sort).

Cheers,
Nick.

--
Nick Coghlan | http://www.velocityreviews.com/forums/(E-Mail Removed) | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

It's me
Guest
Posts: n/a

 01-07-2005
What does it mean by "stability in sorting"?

Can somebody please give a sample for using the code posted? I am a little
lost here and I like to know more about the use of keys....

Thanks,

"Nick Coghlan" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Jeff Shannon wrote:
> > I suppose that your version has the virtue that, if the sortkey value is
> > equal, items retain the order that they were in the original list,
> > whereas my version will sort them into an essentially arbitrary order.
> > Is there anything else that I'm missing here?

>
> Stability in sorting is a property not to be sneezed at - it means

switching to
> sorting by a second key gives the effect of "sort by key 1, then by key

2",
> whereas that doesn't hold with an unstable sort algorithm. If you've ever

used
> an application with an unstable sorting process and that only allows

sorting a
> table on one column at a time, you'll appreciate the frustration that can

cause
>
> Also, it's required to match the behaviour of the Python 2.4 version

(which gets
> to take advantage of the stability of the builtin sort).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | (E-Mail Removed) | Brisbane, Australia
> ---------------------------------------------------------------
> http://boredomandlaziness.skystorm.net

Craig Ringer
Guest
Posts: n/a

 01-07-2005
On Sat, 2005-01-08 at 00:26, It's me wrote:
> What does it mean by "stability in sorting"?

If I understand correctly, it means that when two sorts are performed in
sequence, the keys that are equal to the second sort end up ordered the
way they were left by the first sort.

I'm far from certain of this, but at least I'm presenting an opportunity
for someone to yell "no, you're wrong!" and in the process definitively

For example, given the list:

..>>> l = [(1,2), (8,2), (2,2), (3,2), (4,3), (5,3), (8,9)]

if we sort by the first element of each tuple then the second (the
default), we get:

..>>> l.sort()
..>>> l
[(1, 2), (2, 2), (3, 2), (4, 3), (5, 3), (8, 2), (8, 9)]

Now, if we sort based on the second element we get:

..>>> def seconditem(x):
..... return x[1]
.....
..>>> l.sort(key=seconditem)
..>>> l
[(1, 2), (2, 2), (3, 2), (8, 2), (4, 3), (5, 3), (8, 9)]

You'll note that there are several correct answers to the request "sort
the list 'l' by the second element of each item", including:

[(1, 2), (2, 2), (3, 2), (8, 2), (4, 3), (5, 3), (8, 9)]
[(2, 2), (1, 2), (8, 2), (3, 2), (4, 3), (5, 3), (8, 9)]
[(1, 2), (2, 2), (3, 2), (8, 2), (5, 3), (4, 3), (8, 9)]

and many others. Because we didn't specify that the first item in the
value tuples should be used in the sort key, so long as the second key
is equal for a group of items it doesn't matter what order items in that
group appear in.

Python (at least 2.4), however, returns those groups where the order
isn't defined in the same order they were before the sort. Look at this,
for example:

..>>> l.sort()
..>>> l.reverse()
..>>> l
[(8, 9), (8, 2), (5, 3), (4, 3), (3, 2), (2, 2), (1, 2)]
..>>> l.sort(key=seconditem)
..>>> l
[(8, 2), (3, 2), (2, 2), (1, 2), (5, 3), (4, 3), (8, 9)]

See how the exact same sort command was used this time around, but
because the list was reverse-sorted first, the elements are in reverse
order by first item when the second item is equal?

In the first case we used the same result as the stable sort could be
obtained with:

..>>> def revitem(x):
..... return (x[1], x[0])
>>> l.sort(key=revitem)
>>> l

[(1, 2), (2, 2), (3, 2), (8, 2), (4, 3), (5, 3), (8, 9)]

(in other words, saying "use the value tuple as the sort key, but sort
by the second element before the first")

That doesn't extend to more complex cases very well though. Imagine you
had 3-tuples not 2-tuples, and wanted to maintain the previous sort
order of equal groupings when re-sorting by a different key... but you
didn't know what key was last used for sorting. A stable sort algorithm
means you don't need to care, because the order will be maintained for
you not randomized.

Well, that's several hundred more words than were probably required, but

--
Craig Ringer

Nick Coghlan
Guest
Posts: n/a

 01-07-2005
It's me wrote:
> What does it mean by "stability in sorting"?
>
> Can somebody please give a sample for using the code posted? I am a little
> lost here and I like to know more about the use of keys....

It's the jargon for what Jeff said - if you are sorting by some value calculated
from each list entry, and different list entries may give the same value, then a
stable sort guarantees that the order of entries giving the same value will be
preserved from the original list.

Consider:

Py> from operator import itemgetter as item
Py> seq = [(1, 1), (2, 1), (2, 3), (1, 5)]
Py> seq.sort(key=item(1))
Py> seq #1
[(1, 1), (2, 1), (2, 3), (1, 5)]
Py> seq.sort(reverse=True)
Py> seq #2
[(2, 3), (2, 1), (1, 5), (1, 1)]
Py> seq.sort(key=item(1))
Py> seq #3
[(2, 1), (1, 1), (2, 3), (1, 5)]

This snippet sorts the tuples according to the second item, then sorts them in
reverse order of the whole tuple, then resorts them according to the second item.

Notice that the order of the first two items is different between point #1 and
point #3. This is because sorting by the second item makes no distinction
between these two tuples, and they retain whatever order they had before the
sort began.

This is what it means to have a stable sort, and it makes expressing complex
sorting quite easy by chaining different sort operations together.

Python 2.3 has a stable sort, and Python 2.4 brought the guarantee that it shall
remain that way. I'm not sure about Python 2.2 and earlier.

Cheers,
Nick.

--
Nick Coghlan | (E-Mail Removed) | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Tim Peters
Guest
Posts: n/a

 01-07-2005
[Nick Coghlan]
....
> Python 2.3 has a stable sort, and Python 2.4 brought the guarantee that it shall
> remain that way. I'm not sure about Python 2.2 and earlier.

No list.sort() implementation before 2.3 was stable. It was
confusing, though, because the samplesort/binary_insertion_sort hybrid
Python used for the 4 years preceding 2.3 *was* stable for all "small
enough" lists. More than one person got fooled by guessing that
stability observed in small test cases meant it was always stable.

Nick Coghlan
Guest
Posts: n/a

 01-07-2005
Craig Ringer wrote:
> Well, that's several hundred more words than were probably required, but
> I hope I made sense.

Remarkably similar to what I just posted. . . I guess numeric 2-tuples are just
too good to pass up when discussing sorting

Cheers,
Nick.

--
Nick Coghlan | (E-Mail Removed) | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Scott David Daniels
Guest
Posts: n/a

 01-07-2005
Jeff Shannon wrote:
> Jp Calderone wrote:
>> L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
>> L2.sort()
>> L = [d for (v, i, d) in L2]

> Out of curiosity, any reason that you're including the index?

Others have already remarked that this preserves sort stability
(which is, in fact a lovely property). There is another property
which hasn't been mentioned:
As written, only the key and the index are compared.

Try sorting:
tricky = [dict(a=5j, b=1), dict(a=4j, b=1)]
or:
similar = [(5j, 1), (4j, 1)]

without the index.

--Scott David Daniels
(E-Mail Removed)