Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > toy list processing problem: collect similar terms

Reply
Thread Tools

toy list processing problem: collect similar terms

 
 
WJ
Guest
Posts: n/a
 
      02-17-2011
Xah Lee wrote:

> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>


Solving without hash-tables or "group-by".

Using Guile:

First, sort the groups by the numbers.

(stable-sort groups (lambda(a b)(< (car a) (car b))))

((0 a b) (1 c d) (1 i j) (2 e f) (2 k l) (2 o p) (3 g h)
(4 m n) (4 q r) (5 s t))

Next, flatten the list.

(append-map identity step1)

(0 a b 1 c d 1 i j 2 e f 2 k l 2 o p 3 g h 4 m n 4 q r 5 s t)

Remove duplicate numbers.

(delete-duplicates step2)

(0 a b 1 c d i j 2 e f k l o p 3 g h 4 m n q r 5 s t)

We now need a very useful function called "scan".

;; Yields sublists of contiguous items that satisfy func.
(define (scan func lst)
(let ((tail (find-tail func lst)))
(if tail
(cons (take-while func tail)
(scan func (drop-while func tail)))
'())))

(scan symbol? step3)

((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
 
Reply With Quote
 
 
 
 
Nick Mellor
Guest
Posts: n/a
 
      02-06-2013
You can just flatten the list using itertools and collate the results usingdefaultdict:

import itertools
from collections import defaultdict

sequence = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),( 1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4, 'q','r'),(5,'s','t'))

# flatten the list to (0,'a','b',1,'c','d'...
chain = itertools.chain.from_iterable(sequence)
# use defaultdict to set up the final list and deal with initial empty dictionary key
final_list = defaultdict(list)
for i in chain:
if isinstance(i, int):
number = i
else:
final_list[number].append(i)
print (final_list.items())

HTH,

Nick

On Sunday, September 26, 2010 2:05:13 PM UTC+10, Xah Lee wrote:
> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> a Mathematica solution is here:
> http://xahlee.org/UnixResource_dir/w...tions_mma.html
>
> R5RS Scheme lisp solution:
> http://xahlee.org/UnixResource_dir/w...work_gmail.scm
> by Sourav Mukherjee
>
> also, a Common Lisp solution can be found here:
> http://groups.google.com/group/comp....ded8824bc750b?
>
> anyone care to give a solution in Python, Perl, javascript, or other
> lang? am guessing the scheme solution can be much improved... perhaps
> using some lib but that seems to show scheme is pretty weak if the lib
> is non-standard.
>
> Xah ∑ xahlee.org ☄

 
Reply With Quote
 
 
 
 
Nick Mellor
Guest
Posts: n/a
 
      02-06-2013
Python 3 version:

from collections import defaultdict

data = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),( 1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4, 'q','r'),(5,'s','t'))

register = defaultdict(list)
for number, *letters in data:
register[number].extend(letters)
final = []
for i in sorted(register.keys()):
final.append(register[i])
print (final)

NB sorting is "stable" in Python, so sorted() will always keep the 1's, 2'setc in the same order as they were in the unsorted list.

Nick

On Sunday, 26 September 2010 14:05:13 UTC+10, Xah Lee wrote:
> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> a Mathematica solution is here:
> http://xahlee.org/UnixResource_dir/w...tions_mma.html
>
> R5RS Scheme lisp solution:
> http://xahlee.org/UnixResource_dir/w...work_gmail.scm
> by Sourav Mukherjee
>
> also, a Common Lisp solution can be found here:
> http://groups.google.com/group/comp....ded8824bc750b?
>
> anyone care to give a solution in Python, Perl, javascript, or other
> lang? am guessing the scheme solution can be much improved... perhaps
> using some lib but that seems to show scheme is pretty weak if the lib
> is non-standard.
>
> Xah ∑ xahlee.org ☄

 
Reply With Quote
 
Nick Mellor
Guest
Posts: n/a
 
      02-06-2013
Oops! Not that sort stability is used in this algorithm. Was thinking of something else

N

On Thursday, 7 February 2013 10:25:36 UTC+11, Nick Mellor wrote:
> Python 3 version:
>
>
>
> from collections import defaultdict
>
>
>
> data = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),( 1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4, 'q','r'),(5,'s','t'))
>
>
>
> register = defaultdict(list)
>
> for number, *letters in data:
>
> register[number].extend(letters)
>
> final = []
>
> for i in sorted(register.keys()):
>
> final.append(register[i])
>
> print (final)
>
>
>
> NB sorting is "stable" in Python, so sorted() will always keep the 1's, 2's etc in the same order as they were in the unsorted list.
>
>
>
> Nick
>
>
>
> On Sunday, 26 September 2010 14:05:13 UTC+10, Xah Lee wrote:
>
> > here's a interesting toy list processing problem.

>
> >

>
> > I have a list of lists, where each sublist is labelled by

>
> > a number. I need to collect together the contents of all sublists

>
> > sharing

>
> > the same label. So if I have the list

>
> >

>
> > ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q

>
> > r) (5 s t))

>
> >

>
> > where the first element of each sublist is the label, I need to

>
> > produce:

>
> >

>
> > output:

>
> > ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

>
> >

>
> > a Mathematica solution is here:

>
> > http://xahlee.org/UnixResource_dir/w...tions_mma.html

>
> >

>
> > R5RS Scheme lisp solution:

>
> > http://xahlee.org/UnixResource_dir/w...work_gmail.scm

>
> > by Sourav Mukherjee

>
> >

>
> > also, a Common Lisp solution can be found here:

>
> > http://groups.google.com/group/comp....ded8824bc750b?

>
> >

>
> > anyone care to give a solution in Python, Perl, javascript, or other

>
> > lang? am guessing the scheme solution can be much improved... perhaps

>
> > using some lib but that seems to show scheme is pretty weak if the lib

>
> > is non-standard.

>
> >

>
> > Xah ∑ xahlee.org ☄


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
toy list processing problem: collect similar terms Xah Lee Perl Misc 43 02-17-2011 07:19 AM
Any similar Webcam broadcasting site similar to youtube Chaudhry Nijjhar Computer Support 0 02-19-2008 11:48 PM
Can some explain Context.list() in DNS terms? robert Java 3 12-18-2006 04:48 PM
New toy.. new toy! Shane NZ Computing 9 03-10-2006 06:40 AM
Toy Story 1 & 2 SE vs Toy Box Set byronrobinson@sympatico.ca DVD Video 2 12-30-2005 11:14 PM



Advertisments