XORing long strings optimization?


Noen

def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for c1 in s1:
        for c2 in s2:
            output += chr(ord(c1) ^ ord(c2))
    return output

This way is very slow for large strings.
Anyone know of a better and faster way to do it?
 

Peter Otten

Noen said:
def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for c1 in s1:
        for c2 in s2:
            output += chr(ord(c1) ^ ord(c2))
    return output

This way is very slow for large strings.
Anyone know of a better and faster way to do it?

Before we start optimizing:

len(XOR(s1, s2)) == len(s1) * len(s2)

Bug or feature?

Peter
 

Noen

Peter said:
Noen wrote:




Before we start optimizing:

len(XOR(s1, s2)) == len(s1) * len(s2)

Bug or feature?

Peter

Oh, didn't notice that as I wrote it. Thanks...
Anyway, this is how it should be.

def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output
 

Peter Otten

Noen said:
Oh, didn't notice that as I wrote it. Thanks...
Anyway, this is how it should be.

def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output


Oops, I missed one:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "xor2.py", line 7, in XOR
    if len(s1) != len(s2):
TypeError: len() of unsized object

Did you expect this?

Peter
 

Terry Reedy

Noen said:
def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output


You fell into the O(n**2) immutable sequence repeated addition trap.
For O(n) behavior, try output = [] and return ''.join(output).
Replacing output+=stuff with output.append(stuff) might be faster
still.
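A minimal sketch of Terry's list-plus-join recipe (the function name is mine; it runs unchanged on modern Python as well):

```python
def xor_strings(s1, s2):
    """XOR two equal-length strings via a list of chars: O(n) overall."""
    if len(s1) != len(s2):
        raise ValueError("Size differs in strings")
    output = []                     # list.append is amortized O(1);
    for c1, c2 in zip(s1, s2):      # str += rebuilds the whole string
        output.append(chr(ord(c1) ^ ord(c2)))
    return "".join(output)          # one final concatenation
```

Since XOR is its own inverse, `xor_strings(xor_strings(data, key), key)` gives back `data` -- a handy sanity check for any of the variants in this thread.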

Terry J. Reedy
 

Noen

Peter said:
Noen wrote:

Oh, didn't notice that as I wrote it. Thanks...
Anyway, this is how it should be.

def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output



Oops, I missed one:


Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "xor2.py", line 7, in XOR
    if len(s1) != len(s2):
TypeError: len() of unsized object

Did you expect this?

Peter

Nope, I thought the argument check would do it, but I used the and
operator instead of or. Well, here is XOR v0.3b :p Any comments on how
to make it faster?

def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) or type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output
 

Peter Otten

Noen said:
def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) or type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output


I keep nagging:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "xor3.py", line 7, in XOR
    if len(s1) != len(s2):
TypeError: len() of unsized object

I must be in a nasty mood this evening :)
Seriously, I think that "first make it right, then make it fast" is a
reasonable design principle.
You can check the type

if type(s1) != type("") or type(s2) != type(""):
    ....

or

if not isinstance(s1, str) or not isinstance(s2, str):
    ....

or even

if not type(s1) == type(s2) == type(""):
    ....

but personally I would omit the check altogether. The second check could be
relaxed to len(s1) <= len(s2), assuming that s1 is the data to be encrypted
and s2 the random character sequence that makes your encryption NSA-proof.
For (much) weaker encryption schemes, you could omit it completely and
cycle over the characters of s2 (not recommended).
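The weak key-cycling variant Peter alludes to might look like this (my sketch, not code from the thread; `itertools.cycle` repeats the key indefinitely, and `zip` stops at the end of the data):

```python
import itertools

def xor_cycle(data, key):
    """XOR data against a shorter, repeating key.

    This is the weak repeating-key scheme mentioned above -- trivially
    breakable, shown only to illustrate the cycling technique.
    """
    return "".join(chr(ord(c) ^ ord(k))
                   for c, k in zip(data, itertools.cycle(key)))
```

(The thread predates it, but note this generator-expression form needs the `join` to consume a finite iterable, which `zip` guarantees here.)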

As to speed, I would suppose the following to be somewhat faster:

def xor(s1, s2):
    return "".join([chr(ord(c1) ^ ord(c2))
                    for c1, c2 in itertools.izip(s1, s2)])

It favours "".join(character list) over consecutive s += tail operations,
which need to build "many" strings as intermediate results.

I think, further speedups could be achieved building a lookup table for all
character combinations.
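One way Peter's lookup-table idea might be realized (my sketch, not his code): precompute chr(a ^ b) for all 256*256 byte pairs once, trading roughly 64K dict entries for the per-character ord/chr calls:

```python
# Hypothetical lookup-table variant: build the table once, up front.
_XOR_TABLE = {}
for _a in range(256):
    for _b in range(256):
        _XOR_TABLE[(chr(_a), chr(_b))] = chr(_a ^ _b)

def xor_lookup(s1, s2):
    """XOR equal-length strings by table lookup instead of ord/chr."""
    table = _XOR_TABLE          # local alias avoids repeated global lookups
    return "".join([table[pair] for pair in zip(s1, s2)])
```

Whether this actually beats the izip version depends on the cost of a tuple-keyed dict lookup versus two ord calls and a chr call; worth timing rather than assuming.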

Peter
 

Ulrich Petri

Peter Otten said:
Noen said:
Oh, didn't notice that as I wrote it. Thanks...
Anyway, this is how it should be.

def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output


Oops, I missed one:


There's even more ;)

I guess he meant:
Ciao Ulrich
 

Francis Avila

Noen said:
Nope, I thought the argument check would do it, but I used the and
operator instead of or. Well, here is XOR v0.3b :p Any comments on how
to make it faster?

How about using the struct module?
Caveat: I don't know if struct limits the repeat count.

import struct
def XOR(data, key):
    if len(data) != len(key):
        raise ValueError, "data and key not of equal length"
    ldata = struct.unpack('=%sb' % len(data), data)
    lkey = struct.unpack('=%sb' % len(key), key)
    lxored = [d^k for d,k in zip(ldata, lkey)]
    xored = struct.pack('=%sb' % len(lxored), *lxored)
    return xored

Just an idea.
 

Alex Martelli

Peter said:
Noen said:
Oh, didn't notice that as I wrote it. Thanks...
Anyway, this is how it should be.

def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output


Oops, I missed one:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "xor2.py", line 7, in XOR
    if len(s1) != len(s2):
TypeError: len() of unsized object


Checks apart -- on to optimization as per subject:

[alex@lancelot krb5module]$ timeit.py -c -s'from string import letters as S'
'"".join([chr(ord(S[i])^ord(S[i])) for i in xrange(len(S))])'
10000 loops, best of 3: 113 usec per loop

i'm not sure i can do much better with strings. arrays are cool, though:

[alex@lancelot krb5module]$ timeit.py -c -s'from string import letters as S'
-s'import array' 'x=array.array("b",S)' 'for i,v in enumerate(x): x[i]^=v'
'x.tostring()'
10000 loops, best of 3: 57 usec per loop

Morale: when you find yourself fiddling with lots of chr and ord, and care
about performance, think about "array('b', ... -- it could buy you an easy
speedup by a factor of 2, like here.
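Alex's timeit one-liner, unpacked into a plain function for readability (transcribed for modern Python, where `array("B", ...)` takes bytes and `tostring()` has become `tobytes()`; Alex used typecode "b" on 2.x str objects):

```python
import array

def xor_array(data, key):
    """XOR two equal-length byte strings in place via array."""
    x = array.array("B", key)                  # mutable copy of key's bytes
    for i, v in enumerate(array.array("B", data)):
        x[i] ^= v                              # per-slot XOR, no new strings
    return x.tobytes()
```

The win over the join-based versions is that the XOR happens on small integers in a preallocated buffer, with no chr/ord round-trips and no intermediate strings.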


Alex
 

Francis Avila

about performance, think about "array('b', ... -- it could buy you an easy
speedup by a factor of 2, like here.

Why we should read a module a day: just today I learned about the array
module, when I *had* been using struct for similar tasks....

Sigh; if only the standard library weren't so *big* ;).
 

Noen

Here is the result, if anyone wants it :)
It was quite quick indeed: it took me 10.3279999495 seconds to XOR an
"a"*1000000 string. I'm going to XOR 700000000-byte (700 MB) files, so
if anyone knows how to make it even faster, please tell me :)

import string
def xorstring(s1,s2):
    """ XOR string s1 with s2 """
    # Argument check
    if not (type(s1) == type("")) or not (type(s2) == type("")):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring

    # Create lists
    l1 = map(ord, s1)
    l2 = map(ord, s2)

    # Xor it all together
    xorlist = []
    xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
    return string.join(xorlist,"") # Backward compatible
 

Peter Otten

Francis said:
Noen said:
Nope, I thought the argument check would do it, but I used the and
operator instead of or. Well, here is XOR v0.3b :p Any comments on how
to make it faster?

How about using the struct module?
Caveat: I don't know if struct limits the repeat count.

import struct
def XOR(data, key):
    if len(data) != len(key):
        raise ValueError, "data and key not of equal length"
    ldata = struct.unpack('=%sb' % len(data), data)
    lkey = struct.unpack('=%sb' % len(key), key)
    lxored = [d^k for d,k in zip(ldata, lkey)]
    xored = struct.pack('=%sb' % len(lxored), *lxored)
    return xored

Just an idea.

Nope. Both your and my approach make it worse. Only Alex Martelli did it
right:

import itertools, array, struct

def xor_noen(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    for i in range(len(s1)):
        output += chr(ord(s1[i]) ^ ord(s2[i]))
    return output

def xor_po(s1, s2):
    return "".join([chr(ord(c1) ^ ord(c2))
                    for c1, c2 in itertools.izip(s1, s2)])

def xor_am(s1, s2):
    x = array.array("b", s2)
    for i, v in enumerate(array.array("b", s1)):
        x[i] ^= v
    return x.tostring()

def xor_fa(data, key):
    ldata = struct.unpack('=%sb' % len(data), data)
    lkey = struct.unpack('=%sb' % len(key), key)
    lxored = [d^k for d,k in zip(ldata, lkey)]
    xored = struct.pack('=%sb' % len(lxored), *lxored)
    return xored

# 35 usec
def pro_noen():
    return xor_noen("sgnirts ni sreffid eziS", "Size differs in strings")

# 40 usec
def pro_po():
    return xor_po("sgnirts ni sreffid eziS", "Size differs in strings")

# 23 usec
def pro_am():
    return xor_am("sgnirts ni sreffid eziS", "Size differs in strings")

# 46 usec
def pro_fa():
    return xor_fa("sgnirts ni sreffid eziS", "Size differs in strings")

assert pro_po() == pro_am() == pro_fa() == pro_noen()

I was surprised how well pro_noen() does, but that might change for large
strings. By the way, results varied significantly (by several usecs)
between test runs.

Peter
 

Casey Whitelaw

Noen said:
Here is the result, if anyone wants it :)
It was quite quick indeed: it took me 10.3279999495 seconds to XOR an
"a"*1000000 string. I'm going to XOR 700000000-byte (700 MB) files, so
if anyone knows how to make it even faster, please tell me :)

import string
def xorstring(s1,s2):
    """ XOR string s1 with s2 """
    # Argument check
    if not (type(s1) == type("")) or not (type(s2) == type("")):
        raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
        raise ValueError, "Size differs in strings"
    # Xoring

    # Create lists
    l1 = map(ord, s1)
    l2 = map(ord, s2)

    # Xor it all together
    xorlist = []
    xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
    return string.join(xorlist,"") # Backward compatible

Just fiddling around with the (excellent) suggestions already posted,
and found a couple of things.

Alex's suggestion using array is the fastest, but was slightly faster
when using xrange() rather than enumerate():

def xor6(s1, s2):
    a1 = array.array("b", s1)
    a2 = array.array("b", s2)

    for i in xrange(len(a1)):
        a1[i] ^= a2[i]
    return a1.tostring()

On my system, this XORs 500,000 characters in 0.39s, with Noen's
original taking 0.67s.

Using psyco produced a further ~3x speedup, reducing this to 0.14s.
That's pretty fast!

Casey
 

Georgy Pruss

Well, nothing to add actually, but my results:

import string
import array

def xorstring0(s1,s2): # the original method
    # argument tests were here

    # Create lists
    l1 = map(ord, s1)
    l2 = map(ord, s2)

    # Xor it all together
    xorlist = []
    xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
    return string.join(xorlist,"") # Backward compatible

def xorstring1(s1,s2):
    xorlist = [chr(ord(x) ^ ord(y)) for (x, y) in zip(s1, s2)]
    return string.join(xorlist,"") # Backward compatible

def xorstring2(s1,s2):
    xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in range(len(s1))] # range
    return string.join(xorlist,"") # Backward compatible

def xorstring3(s1,s2):
    xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in xrange(len(s1))] # xrange
    return string.join(xorlist,"") # Backward compatible

def xorstring4(s1,s2):
    a1 = array.array("B", s1)
    a2 = array.array("B", s2)
    for i in xrange(len(a1)):
        a1[i] ^= a2[i]
    return a1.tostring()

s1 = 'abcew'*200000
s2 = 'xyzqa'*200000

fns = [ xorstring0, xorstring1, xorstring2, xorstring3, xorstring4 ]

# check if all works OK for some short data --
# protection from some rough error or typo
sx = fns[0]( s1[:100], s2[:100] )
for fn in fns:
    assert sx == fn( s1[:100], s2[:100] )

import time

tms = [60] * len(fns) # assume some unreal big times

for n in range(30): # loop 30 times

    for i in range(len(fns)):

        tb = time.time()
        sx = fns[i]( s1, s2 ) # do it!
        tm = time.time() - tb
        tms[i] = min( tms[i], tm )

for i in range(len(fns)):
    print "fn%d -- %.3f sec -- %.2f" % (i, tms[i], tms[i]/tms[-1])

# fn0 -- 5.578 sec -- 6.37
# fn1 -- 4.593 sec -- 5.25
# fn2 -- 2.609 sec -- 2.98
# fn3 -- 2.531 sec -- 2.89
# fn4 -- 0.875 sec -- 1.00


BTW, for the same million char strings, a C function runs 0.0035 sec, or 250 times faster!


G-:
 

Alex Martelli

Georgy said:
Well, nothing to add actually, but my results:
...but of course when in a hurry one should remember...
psyco
!!!

So I gave your script a little tweak...:
fns = [ xorstring0, xorstring1, xorstring2, xorstring3, xorstring4 ]

i.e. right here I added:

import psyco
psyfns = [ psyco.proxy(f) for f in fns ]
fns.extend(psyfns)

so each function is tried in both the plain and psycotic form --
the latter is at an index 5 above (fn5 is the psycoized xorstring0,
etc etc up to fn9 which is the psycoed xorstring4).
# fn0 -- 5.578 sec -- 6.37
# fn1 -- 4.593 sec -- 5.25
# fn2 -- 2.609 sec -- 2.98
# fn3 -- 2.531 sec -- 2.89
# fn4 -- 0.875 sec -- 1.00

BTW, for the same million char strings, a C function runs 0.0035 sec, or
250 times faster!

My measurements give:

fn0 -- 19.229 sec -- 347.98
fn1 -- 12.489 sec -- 226.01
fn2 -- 4.719 sec -- 85.39
fn3 -- 4.359 sec -- 78.88
fn4 -- 2.046 sec -- 37.02

fn5 -- 16.736 sec -- 302.86
fn6 -- 9.675 sec -- 175.08
fn7 -- 1.045 sec -- 18.90
fn8 -- 1.048 sec -- 18.96
fn9 -- 0.055 sec -- 1.00

so, my machine is quite a bit slower (it IS getting long in the tooth --
over 30 months -- and wasn't the absolute speediest even then;-) by a
factor that's quite variable, 1.72 to 3.45 times, median 2.34 times.

psyco helps marginally on some approaches, 4+ times on others, and
almost 40 times on the fn4 approach, which happens to be the fastest
even without it (luckily...:). Still not a match for your reported
C function, but just 6-7 times slower (guessing), not 250 times.

One thing worth checking is whether the end result IS needed as a
string, because some more time might be saved by e.g. using buffer(x)
rather than x.tostring() for the result array x, in many cases.

Another possibility to be always kept in mind for bulk array operations
is Numeric. xor-ing 10,000-long byte arrays takes about 10 msec with
array.array (needing a Python-level loop) versus 50 microseconds with
Numeric.array (no loop needed), so a factor of 200 (over the fastest
python-cum-standard-library only approach) is easily available and may
be getting even closer than psyco to bare-C speeds.
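Numeric has since been succeeded by NumPy; the whole-array XOR Alex describes translates to something like this (a sketch assuming NumPy is installed; byte strings in, byte string out):

```python
import numpy as np

def xor_numpy(data, key):
    """XOR two equal-length byte strings with one vectorized operation."""
    a = np.frombuffer(data, dtype=np.uint8)   # zero-copy views of the bytes
    b = np.frombuffer(key, dtype=np.uint8)
    return (a ^ b).tobytes()                  # elementwise XOR, no Python loop
```

The entire loop runs in compiled code, which is where the factor-of-200 figure over the pure array.array approach comes from.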


Alex
 
