Velocity Reviews > toy list processing problem: collect similar terms

toy list processing problem: collect similar terms

WJ
Guest
Posts: n/a

 02-17-2011
Xah Lee wrote:

> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>

Solving without hash-tables or "group-by".

Using Guile:

First, sort the groups by the numbers.

(stable-sort groups (lambda(a b)(< (car a) (car b))))

((0 a b) (1 c d) (1 i j) (2 e f) (2 k l) (2 o p) (3 g h)
(4 m n) (4 q r) (5 s t))

Next, flatten the list.

(append-map identity step1)

(0 a b 1 c d 1 i j 2 e f 2 k l 2 o p 3 g h 4 m n 4 q r 5 s t)

Remove duplicate numbers.

(delete-duplicates step2)

(0 a b 1 c d i j 2 e f k l o p 3 g h 4 m n q r 5 s t)

We now need a very useful function called "scan".

;; Yields sublists of contiguous items that satisfy func.
(define (scan func lst)
(let ((tail (find-tail func lst)))
(if tail
(cons (take-while func tail)
(scan func (drop-while func tail)))
'())))

(scan symbol? step3)

((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

Nick Mellor
Guest
Posts: n/a

 02-06-2013
You can just flatten the list using itertools and collate the results usingdefaultdict:

import itertools
from collections import defaultdict

sequence = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),( 1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4, 'q','r'),(5,'s','t'))

# flatten the list to (0,'a','b',1,'c','d'...
chain = itertools.chain.from_iterable(sequence)
# use defaultdict to set up the final list and deal with initial empty dictionary key
final_list = defaultdict(list)
for i in chain:
if isinstance(i, int):
number = i
else:
final_list[number].append(i)
print (final_list.items())

HTH,

Nick

On Sunday, September 26, 2010 2:05:13 PM UTC+10, Xah Lee wrote:
> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> a Mathematica solution is here:
> http://xahlee.org/UnixResource_dir/w...tions_mma.html
>
> R5RS Scheme lisp solution:
> http://xahlee.org/UnixResource_dir/w...work_gmail.scm
> by Sourav Mukherjee
>
> also, a Common Lisp solution can be found here:
>
> anyone care to give a solution in Python, Perl, javascript, or other
> lang? am guessing the scheme solution can be much improved... perhaps
> using some lib but that seems to show scheme is pretty weak if the lib
> is non-standard.
>
> Xah ∑ xahlee.org ☄

Nick Mellor
Guest
Posts: n/a

 02-06-2013
Python 3 version:

from collections import defaultdict

data = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),( 1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4, 'q','r'),(5,'s','t'))

register = defaultdict(list)
for number, *letters in data:
register[number].extend(letters)
final = []
for i in sorted(register.keys()):
final.append(register[i])
print (final)

NB sorting is "stable" in Python, so sorted() will always keep the 1's, 2'setc in the same order as they were in the unsorted list.

Nick

On Sunday, 26 September 2010 14:05:13 UTC+10, Xah Lee wrote:
> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> a Mathematica solution is here:
> http://xahlee.org/UnixResource_dir/w...tions_mma.html
>
> R5RS Scheme lisp solution:
> http://xahlee.org/UnixResource_dir/w...work_gmail.scm
> by Sourav Mukherjee
>
> also, a Common Lisp solution can be found here:
>
> anyone care to give a solution in Python, Perl, javascript, or other
> lang? am guessing the scheme solution can be much improved... perhaps
> using some lib but that seems to show scheme is pretty weak if the lib
> is non-standard.
>
> Xah ∑ xahlee.org ☄

Nick Mellor
Guest
Posts: n/a

 02-06-2013
Oops! Not that sort stability is used in this algorithm. Was thinking of something else

N

On Thursday, 7 February 2013 10:25:36 UTC+11, Nick Mellor wrote:
> Python 3 version:
>
>
>
> from collections import defaultdict
>
>
>
> data = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),( 1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4, 'q','r'),(5,'s','t'))
>
>
>
> register = defaultdict(list)
>
> for number, *letters in data:
>
> register[number].extend(letters)
>
> final = []
>
> for i in sorted(register.keys()):
>
> final.append(register[i])
>
> print (final)
>
>
>
> NB sorting is "stable" in Python, so sorted() will always keep the 1's, 2's etc in the same order as they were in the unsorted list.
>
>
>
> Nick
>
>
>
> On Sunday, 26 September 2010 14:05:13 UTC+10, Xah Lee wrote:
>
> > here's a interesting toy list processing problem.

>
> >

>
> > I have a list of lists, where each sublist is labelled by

>
> > a number. I need to collect together the contents of all sublists

>
> > sharing

>
> > the same label. So if I have the list

>
> >

>
> > ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q

>
> > r) (5 s t))

>
> >

>
> > where the first element of each sublist is the label, I need to

>
> > produce:

>
> >

>
> > output:

>
> > ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

>
> >

>
> > a Mathematica solution is here:

>
> > http://xahlee.org/UnixResource_dir/w...tions_mma.html

>
> >

>
> > R5RS Scheme lisp solution:

>
> > http://xahlee.org/UnixResource_dir/w...work_gmail.scm

>
> > by Sourav Mukherjee

>
> >

>
> > also, a Common Lisp solution can be found here:

>

>
> >

>
> > anyone care to give a solution in Python, Perl, javascript, or other

>
> > lang? am guessing the scheme solution can be much improved... perhaps

>
> > using some lib but that seems to show scheme is pretty weak if the lib

>
> > is non-standard.

>
> >

>
> > Xah ∑ xahlee.org ☄