How to sort a list of strings on a substring

 
 
Scott
      10-05-2009
I create a list of logs called LogList. Here is a sample:

LogList = [
"inbound tcp office 192.168.0.125 inside 10.1.0.91 88",
"inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
"inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
"inbound udp office 192.168.0.220 inside 10.1.0.13 53"]

I want to sort the list on index 3 of each string - the first IP
Address.

I only need strings with the same first IP to be together. I don't
need all of the IPs to be in order. For example:
either:
SortedList =
["inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
"inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
"inbound udp office 192.168.0.220 inside 10.1.0.13 53",
"inbound tcp office 192.168.0.125 inside 10.1.0.91 88"]
-or-
SortedList =
["inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
"inbound udp office 192.168.0.220 inside 10.1.0.13 53",
"inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
"inbound tcp office 192.168.0.125 inside 10.1.0.91 88"]
-or-
etc.

would be fine.

I'm reading a lot on sort, sorted, cmp, etc. but I'm just not getting
how to use an element of a string as a "key" within a list of strings.
I'm using Python 2.6.2.

Thanks

 
n00m
      10-05-2009
Here you are:

LogList = [\
"inbound tcp office 192.168.0.125 inside 10.1.0.91 88",
"inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
"inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
"inbound udp office 192.168.0.220 inside 10.1.0.13 53"]


LogList.sort(key=lambda x: x[x.index('1'):])

for item in LogList:
    print item

===========================================================

inbound udp lab 172.24.0.110 inside 10.1.0.6 161
inbound tcp office 192.168.0.125 inside 10.1.0.91 88
inbound udp office 192.168.0.220 inside 10.1.0.13 53
inbound tcp office 192.168.0.220 inside 10.1.0.31 2967
 
MRAB
      10-05-2009
Scott wrote:
> I create a list of logs called LogList. Here is a sample:
>
> LogList =
> ["inbound tcp office 192.168.0.125 inside 10.1.0.91 88",
> "inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
> "inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
> "inbound udp office 192.168.0.220 inside 10.1.0.13 53"]
>
> I want to sort the list on index 3 of each string - the first IP
> Address.
>
> I only need strings with similar, first IP's to be together. I don't
> need all of the IP's to be in order. For example:
> either:
> SortedList =
> ["inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
> "inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
> "inbound udp office 192.168.0.220 inside 10.1.0.13 53",
> "inbound tcp office 192.168.0.125 inside 10.1.0.91 88"]
> -or-
> SortedList =
> ["inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
> "inbound udp office 192.168.0.220 inside 10.1.0.13 53",
> "inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
> "inbound tcp office 192.168.0.125 inside 10.1.0.91 88"]
> -or-
> etc.
>
> would be fine.
>
> I'm reading a lot on sort, sorted, cmp, etc. but I'm just not getting
> how to use an element of a string as a "key" within a list of strings.
> I'm using Python 2.6.2.
>

Forget about cmp, just use the 'key' argument of the list's 'sort'
method or the 'sorted' function (the latter is better if you want to
keep the original list). The 'key' argument expects a function (anything
callable, actually) that accepts a single argument (the item) and
returns a value to be used as the key, and the items will be sorted
according to that key. In this case you want the items sorted by the
fourth 'word', so split the item into words and return the one at index
3:

def key_word(item):
    return item.split()[3]

SortedList = sorted(LogList, key=key_word)

If the function is short and simple enough, lambda is often used instead
of a named function:

SortedList = sorted(LogList, key=lambda item: item.split()[3])
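Since the goal is only to keep lines with the same first IP together, here is a rough sketch of how the sorted result could be grouped afterwards. It is written for Python 3 rather than the 2.6.2 used in this thread, and the names first_ip, log_list, sorted_list and groups are made up for illustration. It also shows that sorted() leaves the original list alone.

```python
from itertools import groupby

log_list = [
    "inbound tcp office 192.168.0.125 inside 10.1.0.91 88",
    "inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
    "inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
    "inbound udp office 192.168.0.220 inside 10.1.0.13 53",
]


def first_ip(item):
    # Hypothetical helper: the fourth whitespace-separated field of a line.
    return item.split()[3]


# sorted() returns a new list; log_list itself is not modified.
sorted_list = sorted(log_list, key=first_ip)

# groupby() only merges adjacent items, so it must be fed the sorted list.
groups = dict(
    (ip, list(lines)) for ip, lines in groupby(sorted_list, key=first_ip)
)

for ip in sorted(groups):
    print(ip, len(groups[ip]))
```

Note that the key is compared as a string, so the order is lexicographic, not numeric by octet. That is enough for grouping, which is all that was asked for.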

 
Steven D'Aprano
      10-06-2009
On Mon, 05 Oct 2009 15:45:58 -0700, n00m wrote:

> Here you are:
>
> LogList = [\
> "inbound tcp office 192.168.0.125 inside 10.1.0.91 88",
> "inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
> "inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
> "inbound udp office 192.168.0.220 inside 10.1.0.13 53"]
>
>
> LogList.sort(key=lambda x: x[x.index('1'):])



No, that's incorrect. Try it with this data and you will see it fails:


LogList = [
"inbound tcp office1 192.168.0.125 inside 10.1.0.91 88",
"inbound tcp office2 192.168.0.220 inside 10.1.0.31 2967",
"inbound udp lab1 172.24.0.110 inside 10.1.0.6 161",
"inbound udp office2 192.168.0.220 inside 10.1.0.13 53",
"inbound udp lab2 172.24.0.121 inside 10.1.0.6 161",
"inbound udp webby 220.96.0.2 inside 20.2.0.9 54",
]


Worse, if you delete the last item ("webby"), the code silently does the
wrong thing. Code that crashes is bad, but code that silently does the
wrong thing is a nightmare. Your test succeeded by accident -- it was a
fluke of the data that you failed to see both failure modes.

The question asked was how to sort the list according to item 3 of the
strings, *not* how to sort the list according to the first character '1'.
The way to solve this correctly is by extracting item 3 and sorting on
that, not by searching for the first character '1'. That is a hack[1]
that just happened to work for the specific test data you tried it on.
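For anyone who wants to see the two failure modes side by side, the following is a small sketch using the test data above. It is written for Python 3, and the function names char_key and field_key are invented here for illustration, not taken from either post.

```python
def char_key(item):
    # The key from the earlier post: everything from the first '1' onward.
    return item[item.index('1'):]


def field_key(item):
    # The field-based key: the fourth whitespace-separated word.
    return item.split()[3]


data = [
    "inbound tcp office1 192.168.0.125 inside 10.1.0.91 88",
    "inbound tcp office2 192.168.0.220 inside 10.1.0.31 2967",
    "inbound udp lab1 172.24.0.110 inside 10.1.0.6 161",
    "inbound udp office2 192.168.0.220 inside 10.1.0.13 53",
    "inbound udp lab2 172.24.0.121 inside 10.1.0.6 161",
]
webby = "inbound udp webby 220.96.0.2 inside 20.2.0.9 54"

# Failure mode 1: the "webby" line contains no '1', so str.index raises.
try:
    char_key(webby)
    crashed = False
except ValueError:
    crashed = True

# Failure mode 2: without "webby" nothing raises, but the first '1' in
# some lines belongs to the host name ("office1", "lab1"), not the address.
wrong_key = char_key(data[0])
right_key = field_key(data[0])

# Addresses in the order each key function produces.
by_char = [field_key(s) for s in sorted(data, key=char_key)]
by_field = [field_key(s) for s in sorted(data, key=field_key)]

print(crashed)
print(wrong_key)
print(right_key)
print(by_char)
print(by_field)
```

With the character-based key the addresses do not come out in sorted order, while the field-based key orders them as expected.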




[1] Hack in the bad sense, not in the good sense.



--
Steven
 
n00m
      10-06-2009
> No, that's incorrect. Try it with this data and you will see it fails:

Of course, you are right, but I think the topic-starter is smart enough
to understand that I suggested only a hint, a sketch, a sample of how
to use "key=" with "lambda", not a ready-to-apply solution.
 
Steven D'Aprano
      10-06-2009
On Mon, 05 Oct 2009 20:33:51 -0700, n00m wrote:

>> No, that's incorrect. Try it with this data and you will see it fails:

>
> Of course, you are right, but I think the topic-starter is smart enough
> to understand that I suggested only a hint, a sketch, a sample of how to
> use "key=" with "lambda", not a ready-to-apply solution.


Oh please. That's a ridiculous excuse. Your post started with "Here you
are" -- the implication is that you thought it *was* a solution, not a
hint. A hint would be something like "Write a key function, perhaps using
lambda, and pass it to the sort() method using the key parameter."

There's no shame in writing buggy code. There's not a person here who has
never made a silly mistake, and most of us have done so in public too.
Some real clangers too. What matters is how folks respond to having
their mistakes pointed out, and whether they learn from it.




--
Steven
 
n00m
      10-06-2009
English language is not my mother tongue,
so I can't grasp many subtle nuances of it.
Maybe "here you are" means to me quite a
different thing than to you.
 
Scott
      10-06-2009
On Oct 5, 6:05 pm, MRAB <(E-Mail Removed)> wrote:
> Scott wrote:
> > I create a list of logs called LogList. Here is a sample:

>
> > LogList =
> > ["inbound tcp office 192.168.0.125 inside 10.1.0.91 88",
> > "inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
> > "inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
> > "inbound udp office 192.168.0.220 inside 10.1.0.13 53"]

>
> > I want to sort the list on index 3 of each string - the first IP
> > Address.

>
> > I only need strings with similar, first IP's to be together. I don't
> > need all of the IP's to be in order. For example:
> > either:
> > SortedList =
> > ["inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
> > "inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
> > "inbound udp office 192.168.0.220 inside 10.1.0.13 53",
> > "inbound tcp office 192.168.0.125 inside 10.1.0.91 88"]
> > -or-
> > SortedList =
> > ["inbound tcp office 192.168.0.220 inside 10.1.0.31 2967",
> > "inbound udp office 192.168.0.220 inside 10.1.0.13 53",
> > "inbound udp lab 172.24.0.110 inside 10.1.0.6 161",
> > "inbound tcp office 192.168.0.125 inside 10.1.0.91 88"]
> > -or-
> > etc.

>
> > would be fine.

>
> > I'm reading a lot on sort, sorted, cmp, etc. but I'm just not getting
> > how to use an element of a string as a "key" within a list of strings.
> > I'm using Python 2.6.2.

>
> Forget about cmp, just use the 'key' argument of the list's 'sort'
> method or the 'sorted' function (the latter is better if you want to
> keep the original list). The 'key' argument expects a function (anything
> callable, actually) that accepts a single argument (the item) and
> returns a value to be used as the key, and the items will be sorted
> according to that key. In this case you want the items sorted by the
> fourth 'word', so split the item into words and return the one at index
> 3:
>
> def key_word(item):
>     return item.split()[3]
>
> SortedList = sorted(LogList, key=key_word)
>
> If the function is short and simple enough, lambda is often used instead
> of a named function:
>
> SortedList = sorted(LogList, key=lambda item: item.split()[3])


Ok, the lambda worked as advertised. THANK YOU!!

Thanks for giving both a def and lambda example. I'll be saving them.
-Scott
 
alex23
      10-06-2009
Steven D'Aprano <(E-Mail Removed)> wrote:
> Oh please. That's a ridiculous excuse. Your post started with "Here you
> are" -- the implication is that you thought it *was* a solution, not a
> hint. A hint would be something like "Write a key function, perhaps using
> lambda, and pass it to the sort() method using the key parameter."


In n00m's defense, the OP's question was "I'm just not getting how to
use an element of a string as a "key" within a list of strings", which
n00m's post did answer, and which did work with the data set given. If
Scott had asked "could someone show me how to do this", then yes, the
"here you are" would have been wrong.

Ah, semantics and the lack of expression in text
 
Steven D'Aprano
      10-06-2009
On Mon, 05 Oct 2009 21:16:38 -0700, n00m wrote:

> English language is not my mother tongue, so I can't grasp many subtle
> nuances of it. Maybe "here you are" means to me quite a different thing
> than to you.


It means "here is the thing you were looking for". Anyway, nothing I
wrote was meant as an attack on you.


--
Steven
 