Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Re: Custom alphabetical sort

Reply
Thread Tools

Re: Custom alphabetical sort

 
 
Roy Smith
Guest
Posts: n/a
 
      12-24-2012
In article <(E-Mail Removed)>,
Pander Musubi <(E-Mail Removed)> wrote:

> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does not give the desired result.


I'm assuming that doesn't correspond to some standard locale's collating
order, so we really do need to roll our own encoding (and that you have
a good reason for wanting to do this). I'm also assuming that what I'm
seeing as question marks are really accented characters in some encoding
that my news reader just isn't dealing with (it seems to think your post
was in ISO-2022-CN (Simplified Chinese).

I'm further assuming that you're starting with a list of unicode
strings, the contents of which are limited to the above alphabet. I'm
even further assuming that the volume of data you need to sort is small
enough that efficiency is not a huge concern.

Given all that, I would start by writing some code which turned your
alphabet into a pair of dicts. One maps from the code point to a
collating sequence number (i.e. ordinals), the other maps back.
Something like (for python 2.7):

alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
'6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
[...]
'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

map1 = {c: n for n, c in enumerate(alphabet)}
map2 = {n: c for n, c in enumerate(alphabet)}

Next, I would write some functions which encode your strings as lists of
ordinals (and back again)

def encode(s):
"encode('foo') ==> [34, 19, 19]" # made-up ordinals
return [map1[c] for c in s]

def decode(l):
"decode([34, 19, 19]) ==> 'foo'"
return ''.join(map2[i] for i in l)

Use these to convert your strings to lists of ints which will sort as
per your specified collating order, and then back again:

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

That's just a rough sketch, and completely untested, but it should get
you headed in the right direction. Or at least one plausible direction.
Old-time perl hackers will recognize this as the Schwartzian Transform.
 
Reply With Quote
 
 
 
 
Pander Musubi
Guest
Posts: n/a
 
      12-24-2012
> > Hi all,
>
> >

>
> > I would like to sort according to this order:

>
> >

>
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',

>
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',

>
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',

>
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',

>
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',

>
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',

>
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',

>
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

>
> >

>
> > How can I do this? The default sorted() does not give the desired result.

>
>
>
> I'm assuming that doesn't correspond to some standard locale's collating
>
> order, so we really do need to roll our own encoding (and that you have
>
> a good reason for wanting to do this).


It is for creating a Dutch dictionary. This sorting order is not to be found in an existing locale.

> I'm also assuming that what I'm
>
> seeing as question marks are really accented characters in some encoding
>
> that my news reader just isn't dealing with (it seems to think your post
>
> was in ISO-2022-CN (Simplified Chinese).
>
>
>
> I'm further assuming that you're starting with a list of unicode
>
> strings, the contents of which are limited to the above alphabet.


Correct.

> I'm
>
> even further assuming that the volume of data you need to sort is small
>
> enough that efficiency is not a huge concern.


Well, it is for 200,000 - 450,000 words but the code is allowed be slow. It will not be used for web application or something which requires a quick response.

> Given all that, I would start by writing some code which turned your
>
> alphabet into a pair of dicts. One maps from the code point to a
>
> collating sequence number (i.e. ordinals), the other maps back.
>
> Something like (for python 2.7):
>
>
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>
> '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>
> [...]
>
> 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
>
>
> map1 = {c: n for n, c in enumerate(alphabet)}
>
> map2 = {n: c for n, c in enumerate(alphabet)}


OK, similar to Thomas' proposal.

> Next, I would write some functions which encode your strings as lists of
>
> ordinals (and back again)
>
>
>
> def encode(s):
>
> "encode('foo') ==> [34, 19, 19]" # made-up ordinals
>
> return [map1[c] for c in s]
>
>
>
> def decode(l):
>
> "decode([34, 19, 19]) ==> 'foo'"
>
> return ''.join(map2[i] for i in l)
>
>
>
> Use these to convert your strings to lists of ints which will sort as
>
> per your specified collating order, and then back again:
>
>
>
> encoded_strings = [encode(s) for s in original_list]
>
> encoded_strings.sort()
>
> sorted_strings = [decode(l) for l in encoded_strings]
>
>
>
> That's just a rough sketch, and completely untested, but it should get
>
> you headed in the right direction. Or at least one plausible direction.
>
> Old-time perl hackers will recognize this as the Schwartzian Transform.


I will test it and let you know. Pander
 
Reply With Quote
 
 
 
 
Roy Smith
Guest
Posts: n/a
 
      12-24-2012
In article <(E-Mail Removed)>,
Pander Musubi <(E-Mail Removed)> wrote:

> > I'm assuming that doesn't correspond to some standard locale's collating
> > order, so we really do need to roll our own encoding (and that you have
> > a good reason for wanting to do this).

>
> It is for creating a Dutch dictionary.


Wait a minute. You're telling me that Python, of all languages, doesn't
have a built-in way to sort Dutch words???
 
Reply With Quote
 
Pander Musubi
Guest
Posts: n/a
 
      12-24-2012
>
>
>
> > > I'm assuming that doesn't correspond to some standard locale's collating

>
> > > order, so we really do need to roll our own encoding (and that you have

>
> > > a good reason for wanting to do this).

>
> >

>
> > It is for creating a Dutch dictionary.

>
>
>
> Wait a minute. You're telling me that Python, of all languages, doesn't
>
> have a built-in way to sort Dutch words???


Not when you want Roman characters with diacritics to be sorted in the normal a-Z range.
 
Reply With Quote
 
Mark Lawrence
Guest
Posts: n/a
 
      12-24-2012
On 24/12/2012 17:40, Roy Smith wrote:
> In article <(E-Mail Removed)>,
> Pander Musubi <(E-Mail Removed)> wrote:
>
>>> I'm assuming that doesn't correspond to some standard locale's collating
>>> order, so we really do need to roll our own encoding (and that you have
>>> a good reason for wanting to do this).

>>
>> It is for creating a Dutch dictionary.

>
> Wait a minute. You're telling me that Python, of all languages, doesn't
> have a built-in way to sort Dutch words???
>


There's a built-in called secret that's only available to those who are
Dutch and members of the PSU.

A slight aside, I understand that the BDFL is currently on holiday. For
those who want a revolution now is as good a time as any

--
Cheers.

Mark Lawrence.

 
Reply With Quote
 
Joshua Landau
Guest
Posts: n/a
 
      12-24-2012
On 24 December 2012 16:18, Roy Smith <(E-Mail Removed)> wrote:

> In article <(E-Mail Removed)>,
> Pander Musubi <(E-Mail Removed)> wrote:
>
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',

> 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c',

> 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?',

> 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?',

> '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O',

> '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r',

> 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?',

> 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.

>


<snip>

Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts. One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
> '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
> [...]
> 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}
>
> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
> "encode('foo') ==> [34, 19, 19]" # made-up ordinals
> return [map1[c] for c in s]
>
> def decode(l):
> "decode([34, 19, 19]) ==> 'foo'"
> return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>


This isn't needed and the not-so-new way to do this is through .sort's key
attribute.

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

changes to

encoded_strings.sort(key=encode)

[Which happens to be faster </reasonable_guess>]

Hence you neither need map2 or decode:

## CODE ##

alphabet = (
' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
'A', '', '', '', '', '', '',
'', '', '', '', 'b', 'B', 'c', 'C', '', '', 'd', 'D', 'e', 'E', '',
'', '', '', '', '', '', '',
'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', '', '', '', '', '','', '',
'', 'j', 'J', 'k', 'K', 'l', 'L',
'm', 'M', 'n', '', 'N', '', 'o', 'O', '', '', '', '', '', '', '',
'', '', '', 'p', 'P', 'q', 'Q',
'r', 'R', 's', 'S', 't', 'T', 'u', 'U', '', '', '', '', '','', '',
'', 'v', 'V', 'w', 'W', 'x', 'X',
'y', 'Y', 'z', 'Z'
)

hashindex = {character:index for index, character in enumerate(alphabet)}
def string2sortlist(string):
return [hashindex[s] for s in string]

# Quickly make some stuff to sort. Let's try 200k, as that's what's
suggested.
import random
things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6)))
for _ in range(200000)]

print(things_to_sort[:15])

things_to_sort.sort(key=string2sortlist)

print(things_to_sort[:15])

## END CODE ##

Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to
Tomas Bach's method.

 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      12-24-2012
On Mon, 24 Dec 2012 11:18:37 -0500, Roy Smith wrote:

> In article <(E-Mail Removed)>,
> Pander Musubi <(E-Mail Removed)> wrote:
>
>> Hi all,
>>
>> I would like to sort according to this order:

[...]
> I'm assuming that doesn't correspond to some standard locale's collating
> order, so we really do need to roll our own encoding (and that you have
> a good reason for wanting to do this). I'm also assuming that what I'm
> seeing as question marks are really accented characters in some encoding
> that my news reader just isn't dealing with (it seems to think your post
> was in ISO-2022-CN (Simplified Chinese).


Good lord man, what sort of crappy newsreader software are you using? (It
claims to be "MT-NewsWatcher/3.5.3b3 (Intel Mac OS X)" -- I think
anything as bad as that shouldn't advertise what it is.) The OP's post
was correctly labelled with an encoding, and not an obscure one:

Content-Type: text/plain; charset=ISO-8859-1

which if I remember correctly is Latin-1. If your newsreader can't handle
that, surely it should default to UTF-8, which should give you the right
results sans question marks.




--
Steven
 
Reply With Quote
 
Pander Musubi
Guest
Posts: n/a
 
      12-24-2012
On Monday, December 24, 2012 7:12:43 PM UTC+1, Joshua Landau wrote:
> On 24 December 2012 16:18, Roy Smith <(E-Mail Removed)> wrote:
>
>
>
>
> In article <(E-Mail Removed)>,
>
> *Pander Musubi <(E-Mail Removed)> wrote:
>
>
>
> > Hi all,

>
>
> >

>
> > I would like to sort according to this order:

>
> >

>
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9','a',

>
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',

>
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',

>
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',

>
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',

>
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',

>
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',

>
>
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

>
> >

>
>
> > How can I do this? The default sorted() does not give the desired result.

>
>
>
> <snip>*
>
>
>
>
> Given all that, I would start by writing some code which turned your
>
> alphabet into a pair of dicts. *One maps from the code point to a
>
> collating sequence number (i.e. ordinals), the other maps back.
>
> Something like (for python 2.7):
>
>
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>
> * * * * * * '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>
> * * * * * * [...]
>
>
> * * * * * * 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
>
>
> map1 = {c: n for n, c in enumerate(alphabet)}
>
> map2 = {n: c for n, c in enumerate(alphabet)}
>
>
>
> Next, I would write some functions which encode your strings as lists of
>
> ordinals (and back again)
>
>
>
> def encode(s):
>
> * *"encode('foo') ==> [34, 19, 19]" *# made-up ordinals
>
> * *return [map1[c] for c in s]
>
>
>
> def decode(l):
>
> * *"decode([34, 19, 19]) ==> 'foo'"
>
> * * return ''.join(map2[i] for i in l)
>
>
>
> Use these to convert your strings to lists of ints which will sort as
>
> per your specified collating order, and then back again:
>
>
>
> encoded_strings = [encode(s) for s in original_list]
>
> encoded_strings.sort()
>
> sorted_strings = [decode(l) for l in encoded_strings]
>
>
>
> This isn't needed and the not-so-new way to do this is through .sort's key attribute.
>
>
>
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
>
>
> changes to
>
>
>
>
> encoded_strings.sort(key=encode)
>
>
>
> [Which happens to be faster </reasonable_guess>]
>
>
>
>
> Hence you neither need map2 or decode:
>
>
> ## CODE ##
>
>
>
>
>
> alphabet = (
> ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', '', '', '', '', '', '',
>
>
> '', '', '', '', 'b', 'B', 'c', 'C', '', '', 'd', 'D', 'e', 'E', '', '', '', '', '', '', '', '',
>
>
> 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', '', '', '', '', '', '', '', '', 'j', 'J', 'k', 'K', 'l', 'L',
>
>
> 'm', 'M', 'n', '', 'N', '', 'o', 'O', '', '', '', '', '', '', '', '', '', '', 'p', 'P', 'q', 'Q',
>
>
> 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', '', '', '', '', '', '', '', '', 'v', 'V', 'w', 'W', 'x', 'X',
>
>
> 'y', 'Y', 'z', 'Z'
> )
>
>
>
> hashindex = {character:index for index, character in enumerate(alphabet)}
>
> def string2sortlist(string):
> return [hashindex[s] for s in string]
>
>
>
>
> # Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
> import random
> things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6))) for _ in range(200000)]
>
>
>
>
> print(things_to_sort[:15])
>
>
> things_to_sort.sort(key=string2sortlist)
>
>
>
>
> print(things_to_sort[:15])
>
>
> ## END CODE ##
>
>
>
>
> Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to Tomas Bach's method.


With Python2.7 I had to use

alphabet = (
u' ', u'.', u'\'', u'-', u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'a', u'A', u'', u'', u'', u'', u'', u'',
u'', u'', u'', u'', u'b', u'B', u'c', u'C', u'', u'', u'd', u'D', u'e', u'E', u'', u'', u'', u'', u'', u'', u'', u'',
u'f', u'F', u'g', u'G', u'h', u'H', u'i', u'I', u'', u'', u'', u'', u'', u'', u'', u'', u'j', u'J', u'k', u'K', u'l', u'L',
u'm', u'M', u'n', u'', u'N', u'', u'o', u'O', u'', u'', u'',u'', u'', u'', u'', u'', u'', u'', u'p', u'P', u'q', u'Q',
u'r', u'R', u's', u'S', u't', u'T', u'u', u'U', u'', u'', u'', u'', u'', u'', u'', u'', u'v', u'V', u'w', u'W', u'x', u'X',
u'y', u'Y', u'z', u'Z'
)

to prevent

Traceback (most recent call last):
File "./sort.py", line 23, in <module>
things_to_sort.sort(key=string2sortlist)
File "./sort.py", line 15, in string2sortlist
return [hashindex[s] for s in string]
KeyError: '\xc3'

Thanks very much for this efficient code.
 
Reply With Quote
 
Pander Musubi
Guest
Posts: n/a
 
      12-24-2012
On Monday, December 24, 2012 7:12:43 PM UTC+1, Joshua Landau wrote:
> On 24 December 2012 16:18, Roy Smith <(E-Mail Removed)> wrote:
>
>
>
>
> In article <(E-Mail Removed)>,
>
> *Pander Musubi <(E-Mail Removed)> wrote:
>
>
>
> > Hi all,

>
>
> >

>
> > I would like to sort according to this order:

>
> >

>
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9','a',

>
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',

>
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',

>
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',

>
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',

>
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',

>
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',

>
>
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

>
> >

>
>
> > How can I do this? The default sorted() does not give the desired result.

>
>
>
> <snip>*
>
>
>
>
> Given all that, I would start by writing some code which turned your
>
> alphabet into a pair of dicts. *One maps from the code point to a
>
> collating sequence number (i.e. ordinals), the other maps back.
>
> Something like (for python 2.7):
>
>
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>
> * * * * * * '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>
> * * * * * * [...]
>
>
> * * * * * * 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
>
>
> map1 = {c: n for n, c in enumerate(alphabet)}
>
> map2 = {n: c for n, c in enumerate(alphabet)}
>
>
>
> Next, I would write some functions which encode your strings as lists of
>
> ordinals (and back again)
>
>
>
> def encode(s):
>
> * *"encode('foo') ==> [34, 19, 19]" *# made-up ordinals
>
> * *return [map1[c] for c in s]
>
>
>
> def decode(l):
>
> * *"decode([34, 19, 19]) ==> 'foo'"
>
> * * return ''.join(map2[i] for i in l)
>
>
>
> Use these to convert your strings to lists of ints which will sort as
>
> per your specified collating order, and then back again:
>
>
>
> encoded_strings = [encode(s) for s in original_list]
>
> encoded_strings.sort()
>
> sorted_strings = [decode(l) for l in encoded_strings]
>
>
>
> This isn't needed and the not-so-new way to do this is through .sort's key attribute.
>
>
>
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
>
>
> changes to
>
>
>
>
> encoded_strings.sort(key=encode)
>
>
>
> [Which happens to be faster </reasonable_guess>]
>
>
>
>
> Hence you neither need map2 or decode:
>
>
> ## CODE ##
>
>
>
>
>
> alphabet = (
> ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', '', '', '', '', '', '',
>
>
> '', '', '', '', 'b', 'B', 'c', 'C', '', '', 'd', 'D', 'e', 'E', '', '', '', '', '', '', '', '',
>
>
> 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', '', '', '', '', '', '', '', '', 'j', 'J', 'k', 'K', 'l', 'L',
>
>
> 'm', 'M', 'n', '', 'N', '', 'o', 'O', '', '', '', '', '', '', '', '', '', '', 'p', 'P', 'q', 'Q',
>
>
> 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', '', '', '', '', '', '', '', '', 'v', 'V', 'w', 'W', 'x', 'X',
>
>
> 'y', 'Y', 'z', 'Z'
> )
>
>
>
> hashindex = {character:index for index, character in enumerate(alphabet)}
>
> def string2sortlist(string):
> return [hashindex[s] for s in string]
>
>
>
>
> # Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
> import random
> things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6))) for _ in range(200000)]
>
>
>
>
> print(things_to_sort[:15])
>
>
> things_to_sort.sort(key=string2sortlist)
>
>
>
>
> print(things_to_sort[:15])
>
>
> ## END CODE ##
>
>
>
>
> Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to Tomas Bach's method.


With Python2.7 I had to use

alphabet = (
u' ', u'.', u'\'', u'-', u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'a', u'A', u'', u'', u'', u'', u'', u'',
u'', u'', u'', u'', u'b', u'B', u'c', u'C', u'', u'', u'd', u'D', u'e', u'E', u'', u'', u'', u'', u'', u'', u'', u'',
u'f', u'F', u'g', u'G', u'h', u'H', u'i', u'I', u'', u'', u'', u'', u'', u'', u'', u'', u'j', u'J', u'k', u'K', u'l', u'L',
u'm', u'M', u'n', u'', u'N', u'', u'o', u'O', u'', u'', u'',u'', u'', u'', u'', u'', u'', u'', u'p', u'P', u'q', u'Q',
u'r', u'R', u's', u'S', u't', u'T', u'u', u'U', u'', u'', u'', u'', u'', u'', u'', u'', u'v', u'V', u'w', u'W', u'x', u'X',
u'y', u'Y', u'z', u'Z'
)

to prevent

Traceback (most recent call last):
File "./sort.py", line 23, in <module>
things_to_sort.sort(key=string2sortlist)
File "./sort.py", line 15, in string2sortlist
return [hashindex[s] for s in string]
KeyError: '\xc3'

Thanks very much for this efficient code.
 
Reply With Quote
 
Dave Angel
Guest
Posts: n/a
 
      12-25-2012
On 12/24/2012 06:19 PM, Pander Musubi wrote:
> <snip>


> to prevent
>
> Traceback (most recent call last):
> File "./sort.py", line 23, in <module>
> things_to_sort.sort(key=string2sortlist)
> File "./sort.py", line 15, in string2sortlist
> return [hashindex[s] for s in string]
> KeyError: '\xc3'
>
> Thanks very much for this efficient code.


Perhaps you missed Ian Kelly's correction of Thomas Bach's approach:

d = { k: v for v, k in enumerate(cs) }


def collate(x):
return list(map(d.get, x))

sorted(data, key=collate)

I'd use Ian Kelly's approach. It's not only more compact, it shouldn't
give an exception for a character not in the table. At least, not for
Python 2.x. I'm not sure about Python 3, since it can give an exception
comparing None to int.


--

DaveA

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Custom alphabetical sort Pander Musubi Python 8 12-28-2012 10:27 AM
Using Readers for SQL queries in a loop, want to sort results in alphabetical VB.net itfetish@gmail.com ASP .Net 7 04-23-2007 07:35 AM
ListItemCollection Sort Alphabetical =?Utf-8?B?YmVub2l0?= ASP .Net 4 11-03-2005 04:56 PM
Does WordPad have an alphabetical sort function? tomt Computer Support 5 08-16-2004 04:10 AM
Ado sort error-Ado Sort -Relate, Compute By, or Sort operations cannot be done on column(s) whose key length is unknown or exceeds 10 KB. Navin ASP General 1 09-09-2003 07:16 AM



Advertisments