toy list processing problem: collect similar terms

W

WJ

Xah said:
here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing
the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

Solving without hash-tables or "group-by".

Using Guile:

First, sort the groups by the numbers.

(stable-sort groups (lambda(a b)(< (car a) (car b))))

((0 a b) (1 c d) (1 i j) (2 e f) (2 k l) (2 o p) (3 g h)
(4 m n) (4 q r) (5 s t))

Next, flatten the list.

(append-map identity step1)

(0 a b 1 c d 1 i j 2 e f 2 k l 2 o p 3 g h 4 m n 4 q r 5 s t)

Remove duplicate numbers.

(delete-duplicates step2)

(0 a b 1 c d i j 2 e f k l o p 3 g h 4 m n q r 5 s t)

We now need a very useful function called "scan".

;; Yields sublists of contiguous items that satisfy func.
(define (scan func lst)
(let ((tail (find-tail func lst)))
(if tail
(cons (take-while func tail)
(scan func (drop-while func tail)))
'())))

(scan symbol? step3)

((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
 
N

Nick Mellor

You can just flatten the list using itertools and collate the results usingdefaultdict:

import itertools
from collections import defaultdict

sequence = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),(1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4,'q','r'),(5,'s','t'))

# flatten the list to (0,'a','b',1,'c','d'...
chain = itertools.chain.from_iterable(sequence)
# use defaultdict to set up the final list and deal with initial empty dictionary key
final_list = defaultdict(list)
for i in chain:
if isinstance(i, int):
number = i
else:
final_list[number].append(i)
print (final_list.items())

HTH,

Nick
 
N

Nick Mellor

Python 3 version:

from collections import defaultdict

data = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),(1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4,'q','r'),(5,'s','t'))

register = defaultdict(list)
for number, *letters in data:
register[number].extend(letters)
final = []
for i in sorted(register.keys()):
final.append(register)
print (final)

NB sorting is "stable" in Python, so sorted() will always keep the 1's, 2'setc in the same order as they were in the unsorted list.

Nick
 
N

Nick Mellor

Oops! Not that sort stability is used in this algorithm. Was thinking of something else :)

N

Python 3 version:



from collections import defaultdict



data = ((0,'a','b'),(1,'c','d'),(2,'e','f'),(3,'g','h'),(1,'i','j'),(2,'k','l'),(4,'m','n'),(2,'o','p'),(4,'q','r'),(5,'s','t'))



register = defaultdict(list)

for number, *letters in data:

register[number].extend(letters)

final = []

for i in sorted(register.keys()):

final.append(register)

print (final)



NB sorting is "stable" in Python, so sorted() will always keep the 1's, 2's etc in the same order as they were in the unsorted list.



Nick



here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists

the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to



((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

a Mathematica solution is here:


R5RS Scheme lisp solution:

by Sourav Mukherjee

also, a Common Lisp solution can be found here:


anyone care to give a solution in Python, Perl, javascript, or other
lang? am guessing the scheme solution can be much improved... perhaps
using some lib but that seems to show scheme is pretty weak if the lib
is non-standard.

Xah ∑ xahlee.org ☄
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,962
Messages
2,570,134
Members
46,690
Latest member
MacGyver

Latest Threads

Top