comparing two lists and returning "position"

hiro

Hi there, I have 2 lists.. for simplicity's sake let's say they are:

l1 = [ 'abc', 'ghi', 'mno' ]

l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]

what I need to do is compare l1 against l2 and return the "position"
of where each object in l1 is in l2

ie: pos = 0, 2, 4


Thanks in advance, -h
 
Neil Cerutti

Hi there, I have 2 lists.. for simplicity's sake let's say they are:

l1 = [ 'abc', 'ghi', 'mno' ]

l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]

what I need to do is compare l1 against l2 and return the "position"
of where each object in l1 is in l2

ie: pos = 0, 2, 4

Thanks in advance, -h

Come, come! You can try harder than that.
 
Paul Rubin

hiro said:
what I need to do is compare l1 against l2 and return the "position"
of where each object in l1 is in l2

ie: pos = 0, 2, 4

Is it September already?

from itertools import izip, count
pos = map(dict(izip(l2, count())).__getitem__, l1)

Heh heh heh.
 
hiro

Paul said:
from itertools import izip, count
pos = map(dict(izip(l2, count())).__getitem__, l1)

or probably less efficiently ...
l1 = [ 'abc', 'ghi', 'mno' ]
l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']
pos = [ l2.index(i) for i in l1 ]
print pos
[0, 2, 4]

Charles

Hey Guys thanks for the feedback and the suggestions.
Charles I got your implementation to work so many thanks for this.

this is what I had so far

for spam in l1:
    for eggs in l2:
        if spam == eggs:
            print "kaka", spam, eggs

so it's almost working, I just need the index. I'll
continue playing with the nested loop approach for a bit more.

Thanks once again guys
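
A minimal sketch of how the nested loop above could also return the index, assuming Python 3 syntax (the thread itself uses Python 2.5, where print is a statement). It relies on the built-in enumerate, and the break is an added assumption so that only the first match is recorded, which mirrors what l2.index() returns.

```python
l1 = ['abc', 'ghi', 'mno']
l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']

pos = []
for spam in l1:
    # enumerate yields (index, item) pairs, so the position comes for free
    for index, eggs in enumerate(l2):
        if spam == eggs:
            pos.append(index)
            break  # stop at the first match, like l2.index(spam)

print(pos)
```

Items of l1 that never occur in l2 are silently skipped here, unlike l2.index(), which raises ValueError.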
 
Steven D'Aprano

Hi there, I have 2 lists.. for simplicity's sake let's say they are:

l1 = [ 'abc', 'ghi', 'mno' ]

l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]

what I need to do is compare l1 against l2 and return the "position" of
where each object in l1 is in l2

ie: pos = 0, 2, 4


Thanks for sharing. Did you have a question, or did you just want to tell
us what you were doing?


Thanks in advance, -h

My pleasure.
 
hiro

Hi once again, Charles.. I have tried your approach on my data set l2
and it keeps crashing on me.
Bear in mind that I have a little over 10 million objects in my list
(l2) and l1 contains around 4 thousand
objects.. (I have enough RAM in my computer so memory is not a
problem)

python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
(Intel)] on win32

error is : ValueError: list.index(x): x not in list

when using Charles's
pos = [ l2.index(i) for i in l1 ]
print pos

does anybody know if I have too many data points? the nested for
loop approach seems to be working (I still have to get the index "position"
returned though).
Charles's approach works fine with less data.

Cheers, -d
 
Marc 'BlackJack' Rintsch

Hi once again, Charles.. I have tried your approach in my data set l2
and it keeps crashing on me,
bear in mind that I have a little over 10 million objects in my list
(l2) and l1 contains around 4 thousand
objects.. (i have enough ram in my computer so memory is not a
problem)

python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
(Intel)] on win32

error is : ValueError: list.index(x): x not in list

So you are saying you get this error with the value of `x` actually in the
list!? Somehow hard to believe.

Ciao,
Marc 'BlackJack' Rintsch
 
hiro

So you are saying you get this error with the value of `x` actually in the
list!? Somehow hard to believe.

Ciao,
Marc 'BlackJack' Rintsch

yes I do
 
hiro

I double and triple checked my data already (even doing a search by hand
using vim) and the data is fine. Still looking into it though
 
hiro

hahaha, OK, found out what was wrong.. in the function computing
the data for l1 an extra space was being put in.

ie:

l1 = [ 'abc ', 'ghi ', 'mno ' ]

and I didn't strip it properly after splitting it.. silly me,

well.. live and learn.. thanks guys

Cheers, -h
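
A hedged sketch of how the trailing-space problem described above could be spotted and fixed, written for Python 3. The names raw_l1, known, and missing are hypothetical and used only for illustration; the idea is simply to strip each item and check membership against a set before calling l2.index().

```python
l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']
raw_l1 = ['abc ', 'ghi ', 'mno ']  # hypothetical input with trailing spaces

known = set(l2)  # set membership tests avoid scanning l2 for every item

# Items that would make l2.index() raise ValueError; repr() makes the
# stray whitespace visible when printed.
missing = [item for item in raw_l1 if item not in known]
print([repr(item) for item in missing])

# Strip the whitespace, then confirm nothing is missing any more.
l1 = [item.strip() for item in raw_l1]
missing_after_strip = [item for item in l1 if item not in known]

pos = [l2.index(item) for item in l1]
print(pos)
```

Printing the repr of the failing value is often the quickest way to see invisible characters that a search by eye will miss.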
 
Charles Sanders

hiro said:
bear in mind that I have a little over 10 million objects in my list
(l2) and l1 contains around 4 thousand
objects.. (i have enough ram in my computer so memory is not a
problem)

Glad to see you solved the problem with the trailing space.

Just one minor point, I did say
> or probably less efficiently ...

As far as I know, my suggestion's running time is
proportional to len(l1)*len(l2), which gets quite
big for your case where l1 and l2 are large lists.

If I understand how python dictionaries work, Paul Rubin's
suggestion
> from itertools import izip, count
> pos = map(dict(izip(l2, count())).__getitem__, l1)

or the (I think) approximately equivalent

from itertools import izip, count
d = dict(izip(l2,count()))
pos = [ d[i] for i in l1 ]

or the more memory intensive

d = dict(zip(l2,range(len(l2))))
pos = [ d[i] for i in l1 ]

should all take running time proportional to
(len(l1)+len(l2))*log(len(l2))

For len(l1)=4,000 and len(l2)=10,000,000
Paul's suggestion is likely to take
about 1/100th of the time to run, ie
be about 100 times as fast. I was trying
to point out a somewhat clearer and simpler
(but slower) alternative.

Charles
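
A rough sketch for comparing the two approaches discussed above, assuming Python 3 (where the built-in zip replaces izip). The list sizes are made up and much smaller than the 10 million items mentioned in the thread, and the helper names by_index and by_dict are hypothetical. Actual timings depend on the machine, so none are shown.

```python
import timeit

# Hypothetical test data: 20,000 unique strings, of which 400 are looked up.
l2 = [str(n) for n in range(20000)]
l1 = l2[::50]

def by_index():
    # One linear scan of l2 per item of l1.
    return [l2.index(item) for item in l1]

def by_dict():
    # One pass to build the lookup table, then one hash lookup per item.
    d = dict(zip(l2, range(len(l2))))
    return [d[item] for item in l1]

# Both approaches agree when the items of l2 are unique.
assert by_index() == by_dict()

print("list.index:", timeit.timeit(by_index, number=3))
print("dict      :", timeit.timeit(by_dict, number=3))
```

The gap should widen as l1 and l2 grow, since the list.index version does work proportional to len(l1)*len(l2).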
 
Paul Rubin

Charles Sanders said:
from itertools import izip, count
d = dict(izip(l2,count()))
pos = [ d[i] for i in l1 ]

or the more memory intensive

d = dict(zip(l2,range(len(l2))))
pos = [ d[i] for i in l1 ]


If you're itertools-phobic you could alternatively write

d = dict((x,i) for i,x in enumerate(l2))
pos = [ d[i] for i in l1 ]

dict access and update is supposed to take approximately constant time,
btw. They are implemented as hash tables.
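
One caveat with the dict-based versions, sketched here in Python 3 under the assumption that l2 may contain duplicates and l1 may contain items missing from l2 (neither case is stated in the thread). Building the dict directly keeps the last index of a repeated item, whereas l2.index() returns the first. The name first_index is hypothetical.

```python
l1 = ['abc', 'ghi', 'xyz']
l2 = ['abc', 'def', 'ghi', 'abc']  # 'abc' appears twice

# A later duplicate overwrites the earlier entry, so 'abc' maps to 3.
last_index = dict((x, i) for i, x in enumerate(l2))

# setdefault only stores a key the first time it is seen, which matches
# the first-occurrence behaviour of l2.index().
first_index = {}
for i, x in enumerate(l2):
    first_index.setdefault(x, i)

# dict.get returns None for items that are not in l2 instead of raising.
pos = [first_index.get(x) for x in l1]
print(pos)
```

Whether None, an exception, or skipping the item is the right response to a missing value depends on the data, so treat the get() call as one option rather than a recommendation.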
 
