intersection of 2 list of pairs

L

les_ander

Hi,
I have 2 lists of tuples that look like:
E1=[('a','g'),('r','s')] and
E2=[('g','a'),('r','q'),('f','h')].
In this tuple, the ordering does not
matter, i.e. (u,v) is the same as (v,u).

What I want to do is the following:
given 2 list of tuples, E1 and E2, I want to create another list with
tuples that are common to both. So in the above example I would like
to return ('a','g') as being common.
So far I have the code below but it does not work out properly, I would
be grateful if someone can help me out .
thanks

def list_of_tuples_intersect(E1,E2):

S1=Set()
S2=Set()
for e in E1:
S1.add((e[0],e[1]))
S1.add((e[1],e[0]))
for e in E2:
S2.add((e[0],e[1]))
S2.add((e[1],e[0]))
S= S1 & S2
SS=Set()
done=Set()

for e in S:
if ((e[0],e[1]) in done) or ((e[1],e[0]) in done):
continue
else:
SS.add((e[0],e[1]))
done.add((e[0],e[1]))
done.add((e[1],e[0]))
return SS
 
P

Peter Otten

I have 2 lists of tuples that look like:
E1=[('a','g'),('r','s')] and
E2=[('g','a'),('r','q'),('f','h')].
In this tuple, the ordering does not
matter, i.e. (u,v) is the same as (v,u).

What I want to do is the following:
given 2 list of tuples, E1 and E2, I want to create another list with
tuples that are common to both. So in the above example I would like
to return ('a','g') as being common.

How about
e1 = [('a', 'g'), ('r', 's')]
e2 = [('g', 'a'), ('r', 'q'), ('f', 'h')]
s2 = set(e2)
s2.update((b, a) for (a, b) in e2)
list(set(e1) & s2)
[('a', 'g')]

If you are on 2.3, continue to use Set instead of set and modify the update
line to use a list comprehension:

s2.update([(b, a) for (a, b) in e2])

Peter
 
J

John Machin

Hi,
I have 2 lists of tuples that look like:
E1=[('a','g'),('r','s')] and
E2=[('g','a'),('r','q'),('f','h')].
In this tuple, the ordering does not
matter, i.e. (u,v) is the same as (v,u).

What I want to do is the following:
given 2 list of tuples, E1 and E2, I want to create another list with
tuples that are common to both. So in the above example I would like
to return ('a','g') as being common.
So far I have the code below but it does not work out properly, I would
be grateful if someone can help me out .
[snip]

This is likely to do what you want, provided the types of the objects
in your tuples define an ordering. The idea is to change the
representation of the tuples to a canonical form (a, b) where a <= b.
By the way, have you considered that there might be duplicates *within*
each list?
E1=[('a','g'),('r','s')]
E2=[('g','a'),('r','q'),('f','h')]
s1 = set((min(x,y),max(x,y)) for x, y in E1)
s1 set([('a', 'g'), ('r', 's')])
s2 = set((min(x,y),max(x,y)) for x, y in E2)
s2 set([('a', 'g'), ('q', 'r'), ('f', 'h')])
answer = s1 & s2
answer set([('a', 'g')])

Here's a hint: when you find yourself typing stuff like (x[0],x[1])
..... (x[1],x[0]) more than minimally, the 'wrong way, go back' sign
should flash up.

HTH,

John
 
J

James Stroud

Learned list comprehension today, its cool:

E1=[('a','g'),('r','s')]
E2=[('g','a'),('r','q'),('f','h')]
[x for x in E1 for y in E2 if x == y or (x[1],x[0])==y]
[('a', 'g')]




Hi,
I have 2 lists of tuples that look like:
E1=[('a','g'),('r','s')] and
E2=[('g','a'),('r','q'),('f','h')].
In this tuple, the ordering does not
matter, i.e. (u,v) is the same as (v,u).

What I want to do is the following:
given 2 list of tuples, E1 and E2, I want to create another list with
tuples that are common to both. So in the above example I would like
to return ('a','g') as being common.
So far I have the code below but it does not work out properly, I would
be grateful if someone can help me out .
thanks

def list_of_tuples_intersect(E1,E2):

S1=Set()
S2=Set()
for e in E1:
S1.add((e[0],e[1]))
S1.add((e[1],e[0]))
for e in E2:
S2.add((e[0],e[1]))
S2.add((e[1],e[0]))
S= S1 & S2
SS=Set()
done=Set()

for e in S:
if ((e[0],e[1]) in done) or ((e[1],e[0]) in done):
continue
else:
SS.add((e[0],e[1]))
done.add((e[0],e[1]))
done.add((e[1],e[0]))
return SS
 
P

Pierre Quentel

Another method is to build two sets of sets, one for E1 and one for E2,
then make the intersection of these sets

- with Python 2.3
>>> E1=[('a','g'),('r','s')]
>>> E2=[('g','a'),('r','q'),('f','h')]
>>> from sets import Set,ImmutableSet
>>> f=Set([ImmutableSet(s) for s in E1])& Set([ImmutableSet(s) for s in E2])
>>> [tuple(x) for x in f]
[('a', 'g')]

- with Python 2.4
>>> E1=[('a','g'),('r','s')]
>>> E2=[('g','a'),('r','q'),('f','h')]
>>> f=set([frozenset(s) for s in E1]) & set([frozenset(s) for s in E2])
>>> [tuple(x) for x in f]
[('a', 'g')]

You must use ImmutableSet or frozenset to be able to create a set of sets

Pierre
 
J

John Machin

Pierre said:
Another method is to build two sets of sets, one for E1 and one for E2,
then make the intersection of these sets

- with Python 2.3
E1=[('a','g'),('r','s')]
E2=[('g','a'),('r','q'),('f','h')]
from sets import Set,ImmutableSet
f=Set([ImmutableSet(s) for s in E1])& Set([ImmutableSet(s) for s
in
E2])
[tuple(x) for x in f]
[('a', 'g')]

- with Python 2.4
E1=[('a','g'),('r','s')]
E2=[('g','a'),('r','q'),('f','h')]
f=set([frozenset(s) for s in E1]) & set([frozenset(s) for s in E2])
[tuple(x) for x in f]
[('a', 'g')]

You must use ImmutableSet or frozenset to be able to create a set of sets

Pierre

The problem with using frozenset([x,y)) -- as compared to
(min(x,y),max(x,y))-- is that you don't get an *understandable*
canonical representation of the pair:
frozenset(('a','b')) frozenset(['a', 'b'])
frozenset(('b','c'))
frozenset(['c', 'b'])

i.e. it depends on the Python hash function for the type/class being
used. Try explaining that to the first person you meet at the bus stop.
Exceptions prove the rule: if the OP regularly caught the same bus as
the timbot he wouldn't be asking questions like this :)

In any case the OP still has a problem; when ('a','g') -- or ('g', 'a')
--pops out as the answer, he still doesn't know which list(s) the
duplicates are in, nor which way they are represented. To do something
useful, the output would have to be a list of lists, with the inner
list representing a cluster of duplicates. Each duplicate would be
represented by a unique identifier -- such as a tuple: (list_number,
index_in_that_list) -- so that it can be retrieved, inspected,
jettisoned, ...
 
B

bearophileHUGS

The use of frozenset can okay when sub-sequences are longer, but for
this problem it's slow, and anyway there are situations of repeated
data like this that have to be considered:

frozenset( ('a', 'a') )
==>
frozenset(['a'])

For Py2.4 the faster and better solution seems Peter Otten one.
James Stroud solution is O(n^2) and it can become slow for long
lists...

Bear hugs,
Bearophile
 
J

John Machin

The use of frozenset can okay when sub-sequences are longer, but for
this problem it's slow, and anyway there are situations of repeated
data like this that have to be considered:

Not just for frozenset; this has to be considered whatever the
representation.
frozenset( ('a', 'a') )
==>
frozenset(['a'])

For Py2.4 the faster and better solution seems Peter Otten one.
James Stroud solution is O(n^2) and it can become slow for long
lists...

"Better"? In what sense? My point is that the OP's real problem
requires (a) a workable canonical representation of a pair -- lack of
this is most probably the root cause of the problem that he is trying
to solve!! -- and (b) a result that identifies the duplicate pairs
unambiguously so that he can do something with them.

"Faster"? The solutions proposed by Peter and James do more-or-less
answer the OP's ill-formed question, but speed should be the last
consideration, after the real problem is defined carefully.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,221
Messages
2,571,133
Members
47,747
Latest member
swapote

Latest Threads

Top