Aaron D. Gifford
What's your fastest pure-Ruby method to do a byte-by-byte XOR of two
input strings, both containing 8-bit binary data?
I'm assuming both input strings are of the same length (length is
variable but non-zero), and assuming that in Ruby 1.9 the input
strings' encoding is ASCII-8BIT/BINARY, and that input variables may
or may not be destructively altered and/or used as output variables.
In the past I've used things like:
# Destructive Ruby 1.9 version alters input a and uses it as output:
a.bytesize.times{|i| a[i] = (a[i].ord ^ b[i].ord).chr}
# Non-destructive Ruby 1.9:
a.bytes.zip(b.bytes).map{|x,y| (x^y).chr}.join
# Destructive Ruby 1.8 version alters input a and uses it as output:
a.size.times{|i| a[i] ^= b[i]}
# Non-destructive Ruby 1.8 (the /m flag lets . match newline bytes too):
a.scan(/./m).zip(b.scan(/./m)).map{|x,y| (x[0] ^ y[0]).chr}.join
# And the (best?) non-destructive universal version using unpack & pack
# instead of scan & chr/join:
a.unpack('C*').zip(b.unpack('C*')).map{|x,y| x ^ y}.pack('C*')
Overall, the pack/unpack universal version is the best I've
seen--EXCEPT that the destructive Ruby 1.8 version seems a bit faster,
or so a casual bench I ran on an old Ruby 1.8.6 machine reported.
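For anyone who wants to repeat that kind of casual bench, here is a minimal sketch using the stdlib Benchmark module. The helper names (xor_unpack, xor_setbyte) are made up for illustration, and the getbyte/setbyte variant is an extra candidate I'm assuming is available (Ruby 1.9 or later), not one of the versions listed above. Timings will vary by machine and Ruby version, so no results are claimed here.

```ruby
require 'benchmark'

# Hypothetical helper: the universal unpack/pack version from above.
def xor_unpack(a, b)
  a.unpack('C*').zip(b.unpack('C*')).map { |x, y| x ^ y }.pack('C*')
end

# Hypothetical helper: byte-wise XOR on a copy of a, using
# String#getbyte / String#setbyte (assumes Ruby 1.9+).
def xor_setbyte(a, b)
  out = a.dup
  out.bytesize.times { |i| out.setbyte(i, out.getbyte(i) ^ b.getbyte(i)) }
  out
end

# Two random 10,000-byte binary strings of equal length.
a = Array.new(10_000) { rand(256) }.pack('C*')
b = Array.new(10_000) { rand(256) }.pack('C*')

Benchmark.bm(12) do |bm|
  bm.report('unpack/pack') { 50.times { xor_unpack(a, b) } }
  bm.report('setbyte')     { 50.times { xor_setbyte(a, b) } }
end
```

Both helpers leave their inputs untouched, so the same a and b can be reused across runs.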
Using pack/unpack and padding the input data to an even multiple of 8
bytes in length, then using 'Q*' to do 64-bit packing/unpacking made
things about three times faster, even with the overhead of padding the
inputs (if not even multiples of 8 bytes) and truncating output:
# For Ruby 1.9 'a' MUST be binary encoded and 0.chr.bytesize MUST be exactly 1,
# otherwise the padding and truncation sizes will be off:
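# The outer % 8 avoids adding 8 useless bytes when a.size is already a multiple of 8.
# Block variables are named x,y because in Ruby 1.8 block parameters |a,b| would
# overwrite the outer a and b, breaking the final a.size:
pad = 0.chr * ((8 - a.size % 8) % 8)
(a + pad).unpack('Q*').zip((b + pad).unpack('Q*')).map{|x,y|
x^y}.pack('Q*')[0,a.size]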
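The same 64-bit idea wrapped in a method, as a sketch only. The name xor_q is hypothetical, and I'm assuming Ruby 1.9 or later so that bytesize is available. It uses bytesize and [0].pack('C') so the pad and truncation lengths are counted in bytes regardless of string encoding. Because 'Q' is native-endian for both unpack and pack, the byte order cancels out and the result should match the byte-wise version.

```ruby
# Hypothetical helper: XOR two equal-length binary strings 8 bytes at a time.
def xor_q(a, b)
  # Zero bytes needed to reach the next multiple of 8 (0 if already aligned).
  pad = [0].pack('C') * ((8 - a.bytesize % 8) % 8)
  words_a = (a + pad).unpack('Q*')
  words_b = (b + pad).unpack('Q*')
  # XOR the 64-bit words, repack, and drop the padding bytes again.
  words_a.zip(words_b).map { |x, y| x ^ y }.pack('Q*')[0, a.bytesize]
end

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].pack('C*')
b = ([0xFF] * 13).pack('C*')
result = xor_q(a, b)
puts result.bytesize
```

The inputs a and b are not modified, since a + pad builds a new string.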
String encoding newbie question:
Is it wiser to use [0].pack('C') which should guarantee a binary
encoded string of exactly one byte in length (a zero byte) instead of
0.chr ? Are there source code encodings where 0.chr results in a
multi-byte character?
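One way to check this on a given interpreter, as a small sketch for Ruby 1.9 or later. As far as I can tell from the docs, Integer#chr with no argument returns a single-byte string for values 0 through 255, while [0].pack('C') returns a binary-encoded string, but treat that as my assumption and verify it on your own Ruby. The variable names are just for illustration.

```ruby
# Compare the two ways of building a single zero byte.
zero_chr  = 0.chr
zero_pack = [0].pack('C')

puts "0.chr         encoding=#{zero_chr.encoding}  bytesize=#{zero_chr.bytesize}"
puts "[0].pack('C') encoding=#{zero_pack.encoding}  bytesize=#{zero_pack.bytesize}"

# Whether the two strings hold the same raw byte, regardless of encoding label.
same_bytes = zero_chr.bytes.to_a == zero_pack.bytes.to_a
puts "same bytes: #{same_bytes}"
```

If the encodings differ but the bytes match, either form should pad correctly; pack('C') just makes the binary intent explicit.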
Anything faster out there?
Aaron out.