In that case, the problem can be solved in O(n) time by a simple loop
which counts the number of differences and notes the position of the
first (if any) difference. Any full-distance Levenshtein method that
does no pre-processing must take O(n**2) time. It is possible to take
advantage of the fact that the OP doesn't care about what the distance
is exactly if it is not 1; 0 is just as bad as 2, 3, 4, etc.
Here is a class that implements 4 different approaches to this comparison
problem, with a configurable number of maximum mismatches (where a and b are
the strings being compared):
1. use sum(map(lambda (x,y): x!=y, itertools.izip(a,b))) to count
mismatches - no short-circuiting
2. use sum(map(abs, itertools.starmap(cmp, itertools.izip(a,b)))) to count
mismatches - no short-circuiting
3. use explicit for loop to compare each tuple returned from
itertools.izip(a,b) - short-circuits when number of mismatches exceeds
allowable number
4. use for loop over itertools.takewhile to count mismatches -
short-circuits when number of mismatches exceeds allowable number
Also note the short-circuiting in the initializer - if the strings being
compared are shorter in length then the number of allowed mismatches, than
they will always match, no comparisons required. (In the OP's specific
case, any two one-character strings will match within one character).
Of course this does nothing to address the larger issue of the program
design - I just wanted to learn a little more about itertools.
-- Paul
from itertools import izip,starmap,takewhile
class offByNoMoreThanOneCharacter(object):
def __init__(self,a,b,maxMismatches=1):
len_a = len(a)
self.val = None
if len_a != len(b):
self.val = False
elif len_a <= maxMismatches:
self.val = True
else:
self.a = a
self.b = b
self.maxMismatches = maxMismatches
def eval1(self):
# one-liner using map with lambda - sum counts mismatches
self.val = sum(map(lambda (x,y): x!=y, izip(self.a,self.b))) <=
self.maxMismatches
def eval2(self):
# one-liner using map with abs and cmp - sum counts mismatches
self.val = sum(map(abs, starmap(cmp, izip(self.a,self.b)))) <=
self.maxMismatches
def eval3(self):
# explicit traversal, with short-circuit when too many mismatches
found
mismatches = 0
for (ac,bc) in izip(self.a,self.b):
if ac != bc:
mismatches += 1
if mismatches > self.maxMismatches:
self.val = False
break
else:
self.val = True
def eval4(self):
# traversal using takewhile, also short-circuits when too many
mismatches found
numMismatches = 0
maxMismatches = self.maxMismatches
stillOk = lambda (s,t)=(None,None) : numMismatches <= maxMismatches
for (ac,bc) in takewhile(stillOk, izip(self.a,self.b)):
numMismatches += (ac != bc)
self.val = stillOk()
# special instance method to return "boolean-ness" of this object
def __nonzero__(self):
if self.val is None:
# change this line to use whichever eval method you want to test
self.eval1()
return self.val
def test(a,b):
print a,b
print bool(offByNoMoreThanOneCharacter(a,b))
print
test("abc","abd")
test("abc","axd")
test("yaqtil","yaqtel")
test("yiqtol","yaqtel")