list problem

placid

Hi all,

I have two lists that contain strings of the form string + number, for
example
list1 = [ ' XXX1', 'XXX2', 'XXX3', 'XXX5']

The second list contains strings of the same form as those in the first
list, so let's say the second list contains the following
list1 = [ ' XXX1', 'XXX2', 'XXX3', 'XXX6']

What I've been trying to do is find the first string that is
available, i.e. a string that is in neither of the two lists, so the
following code should print only XXX4 and then return.

for i in xrange(1,10):
    numpart = str(1) + str("%04i" %i)
    str = "XXX" + numpart

    for list1_elm in list1:
        if list1_elm == str:
            break
    else:
        for list2_elm in list2:
            if list2_elm == str:
                break
        else:
            print str
            return

Cheers
 
Simon Forman

Well first off, don't use 'str' for a variable name.

Second, "%04i" % i creates a string, don't call str() on it.

Third, str(1) will always be "1", so just put that in your format
string: "1%04i" % i

(And if the "XXX" part is also constant then add that too: "XXX1%04i" %
i)

Finally, you can say:

for i in xrange(1,10):
    s = "XXX1%04i" % i
    if s not in list1 and s not in list2:
        print s

HTH,
~Simon
 
Simon Forman

D'oh! Forgot to break.

for i in xrange(1,10):
    s = "XXX1%04i" % i
    if s not in list1 and s not in list2:
        print s
        break
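
For anyone reading this on Python 3, here is a rough sketch of the same loop wrapped in a function. The name first_free and the limit parameter are made up for illustration, and the format string is simplified to "XXX%d" so that it lines up with the sample lists from the original post (minus the stray leading space).

```python
def first_free(list1, list2, limit=10):
    """Return the first "XXX<n>" string found in neither list, or None."""
    for i in range(1, limit):          # range replaces xrange on Python 3
        s = "XXX%d" % i
        if s not in list1 and s not in list2:
            return s                   # returning plays the role of break
    return None                        # nothing free below limit


# Sample data, assumed from the original post.
list1 = ['XXX1', 'XXX2', 'XXX3', 'XXX5']
list2 = ['XXX1', 'XXX2', 'XXX3', 'XXX6']
print(first_free(list1, list2))        # XXX4
```

Returning from a function avoids the forgotten-break problem, since the search stops as soon as a free name is found.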

Peace,
~Simon
 
placid

Thanks for the tips.

But there may be other characters before XXX (the XXX part itself is
constant). A better example would be that string s is like a file name
and the characters before it are the absolute path, where the strings in
the first list can have a different absolute path than the second list's
entries. But the file names always match exactly. So do I need to split
the entries on "\\" (Windows machine) and match on the last part?


Cheers
 
Simon Forman

hmm, a slightly different problem than your OP.

Yeah, I would build a new list (or set) from the contents of BOTH lists
with the prefixes stripped off and test your target string against
that. You might also be able to do something with the endswith()
method of strings.

test = set(n[3:] for n in list1) + set(n[3:] for n in list2)

if s not in test: print s
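
The endswith() idea might look roughly like this on Python 3. It is only a sketch: is_taken is a made-up helper name, and the prefixes in the sample lists are invented for illustration.

```python
def is_taken(name, *lists):
    """True if any entry in any of the lists ends with name."""
    return any(entry.endswith(name) for lst in lists for entry in lst)


# Invented sample data: same names as the original post, arbitrary prefixes.
list1 = ['aXXX1', 'bbXXX2', 'XXX3', 'cXXX5']
list2 = ['dXXX1', 'XXX2', 'eeXXX3', 'fXXX6']

for i in range(1, 10):
    s = "XXX%d" % i
    if not is_taken(s, list1, list2):
        print(s)   # XXX4
        break
```

This rescans both lists for every candidate, so it is simple rather than fast. It also does not care where the prefix ends, which may or may not be what you want.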


It's late though, so I may be being stupid. ;-)

Peace,
~Simon
 
bearophileHUGS

placid:

This may be a solution:

l1 = ['acXXX1', 'XXX2', 'wXXX3', 'kXXX5']
l2 = [ 'bXXX1', 'xXXX2', 'efXXX3', 'yXXX6', 'zZZZ9']

import re
findnum = re.compile(r"[0-9]+$")
s1 = set(int(findnum.search(el).group()) for el in l1)
s2 = set(int(findnum.search(el).group()) for el in l2)
nmax = max(max(s1), max(s2))
# XXXnmax is surely unavailable
missing = set(range(1, nmax)) - s1 - s2
print ["XXX%d" % i for i in sorted(missing)]

# Output: ['XXX4', 'XXX7', 'XXX8']

If you need more speed you can replace some of those sets (like the
range one) with for loops.
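
One way the loop version could look, as a Python 3 sketch. The function name first_missing is an assumption, and unlike the code above it returns only the lowest free number (or max + 1 when there is no gap) instead of listing every missing one.

```python
import re

FINDNUM = re.compile(r"[0-9]+$")


def first_missing(*lists):
    """Return the smallest positive number not used as a suffix in any list."""
    used = set()
    for lst in lists:
        for el in lst:
            match = FINDNUM.search(el)
            if match:                  # skip entries without a trailing number
                used.add(int(match.group()))
    n = 1
    while n in used:                   # plain loop instead of set(range(...))
        n += 1
    return n


l1 = ['acXXX1', 'XXX2', 'wXXX3', 'kXXX5']
l2 = ['bXXX1', 'xXXX2', 'efXXX3', 'yXXX6', 'zZZZ9']
print("XXX%d" % first_missing(l1, l2))   # XXX4
```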

Bye,
bearophile
 
zutesmog

Just a thought

I would probably use sets and check whether the value you are looking
for is in the union of the two lists. (Yes, it won't scale to really big
lists.) For instance, if you do the following
set1 = set(list1)
set2 = set(list2)


you can then do a single check

if "XXX%d" % i not in set1.union(set2): # or set1 | set2
    pass # do something
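
Put together as a complete Python 3 sketch, that could look like the following. The variable names taken and available are made up here, and the lists are the ones from the original post without the stray leading space.

```python
list1 = ['XXX1', 'XXX2', 'XXX3', 'XXX5']
list2 = ['XXX1', 'XXX2', 'XXX3', 'XXX6']

taken = set(list1) | set(list2)      # build the union once, outside the loop

available = None
for i in range(1, 10):
    candidate = "XXX%d" % i
    if candidate not in taken:       # single membership test per candidate
        available = candidate
        break

print(available)                     # XXX4
```

Building the union before the loop means it is computed once rather than on every iteration.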

Rgds

Tim
 
Simon Forman

If you're actually working with filenames and paths then you should use
os.path.basename() to get just the filename parts of the paths.

import os

test = set(map(os.path.basename, list1))
test |= set(map(os.path.basename, list2))

(Note: I *was* being stupid last night, the + operator doesn't work for
sets. You want to use | )
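
A self-contained Python 3 sketch of the whole thing, for reference. The paths are invented for illustration. It uses ntpath, which is the Windows flavour of os.path, so the backslash handling behaves the same even when the script is not run on Windows; on a Windows machine os.path.basename does the same job.

```python
import ntpath   # Windows flavour of os.path, importable on any platform

# Invented sample paths: different directories, same file names as the OP.
list1 = [r'C:\data\XXX1', r'C:\data\XXX2', r'C:\data\XXX3', r'C:\data\XXX5']
list2 = [r'D:\backup\old\XXX1', r'D:\backup\old\XXX2',
         r'D:\backup\old\XXX3', r'D:\backup\old\XXX6']

# Keep only the file-name part of every path, from both lists.
test = set(map(ntpath.basename, list1))
test |= set(map(ntpath.basename, list2))

for i in range(1, 10):
    s = "XXX%d" % i
    if s not in test:
        print(s)   # XXX4
        break
```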

Peace,
~Simon
 
Gerard Flanagan

I don't know how close the following is to what you want ( or how
efficient etc...). If both lists are the same up to a certain point,
then the first function should do, if not, try the second function.

Gerard

from itertools import izip, dropwhile

def get_first_missing1( seq1, seq2 ):
    i = int( seq1[0][-1] )
    for x1, x2 in izip( seq1, seq2 ):
        if int(x1[-1]) != i and int(x2[-1]) != i:
            return x1[:-1] + str(i)
        i += 1
    return -1

def get_first_missing2( seq1, seq2 ):
    i = int( seq1[0][-1] )
    j = int( seq2[0][-1] )
    if j < i:
        seq1, seq2 = seq2, seq1
        i, j = j, i
    return get_first_missing1(
        list(dropwhile(lambda s: int(s[-1]) < j, seq1)), seq2 )

L1 = [ 'XXX1', 'XXX2', 'XXX3', 'XXX5']
L2 = [ 'YYY1', 'YYY2', 'YYY3', 'YYY6']

print get_first_missing1(L1, L2)
print get_first_missing2(L1, L2)
'XXX4'
'XXX4'

L1 = [ 'XXX1', 'XXX2', 'XXX3', 'XXX5']
L2 = [ 'YYY2', 'YYY3', 'YYY5', 'YYY6']

print get_first_missing1(L1, L2)
print get_first_missing2(L1, L2)
'XXX4'
'XXX4'
 
Gerard Flanagan

Ehm... a bit limited in what it will handle, now that I look at it! For
instance, it breaks with more than ten items in a list, since only the
last character is read: '11'[-1] == '1'. No time to test further, sorry :(

Gerard
 
Paul Rubin

placid said:
list1 = [ ' XXX1', 'XXX2', 'XXX3', 'XXX5']

the second list contains strings that are identical to the first list,
so lets say the second list contains the following
list1 = [ ' XXX1', 'XXX2', 'XXX3', 'XXX6']

I think you meant list2 for the second one. So:


import re

list1 = [ ' XXX1', 'XXX2', 'XXX3', 'XXX5']
list2 = [ ' XXX1', 'XXX2', 'XXX3', 'XXX6']

def num(s):
    # get the number out of one of the strings
    # (throw exception if no number is there)
    digits = re.search(r'\d+$', s)
    return int(digits.group(0))

def get_lowest_unused(list1, list2):
    prev = 0
    for n in sorted(set(map(num,list1+list2))):
        if n != prev+1:
            return prev+1
        prev = n
    return prev+1  # no gap found: next number after the highest

print get_lowest_unused(list1, list2)

You could do all this with iterators and save a little memory, but
that's more confusing.
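
For the curious, an iterator-flavoured version might look like this on Python 3. It is a sketch rather than a drop-in replacement: the signature is changed to take any number of lists, and it still builds one set of the used numbers, so the saving is only that it skips the concatenated list and the sorted copy.

```python
import re
from itertools import chain, count


def num(s):
    # trailing number of the string; raises AttributeError if there is none
    return int(re.search(r'\d+$', s).group(0))


def get_lowest_unused(*lists):
    # chain() walks the lists one after another without concatenating them
    used = set(num(s) for s in chain(*lists))
    # count(1) yields 1, 2, 3, ...; next() stops at the first unused number
    return next(n for n in count(1) if n not in used)


list1 = [' XXX1', 'XXX2', 'XXX3', 'XXX5']
list2 = [' XXX1', 'XXX2', 'XXX3', 'XXX6']
print(get_lowest_unused(list1, list2))   # 4
```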
 
