D
David Garamond
I use 128bit GUID values a lot, and on my Guid class there's the
Guid.from_base36 constructor and Guid#to_base36 instance method.
Base36 is a representation of GUID value using the digits "a".."z",
"0".."9" (in that particular order). Some of the nice properties of this
representation:
- shorter than the hexdigit representation (25 chars instead of 32/36).
- can be conveniently used as identifiers (variable & function names,
URL query parameters, table & column names, etc); note that the
first digit always falls between "a".."p", because 36**25 > 2**128.
- can be conveniently used as filenames (even on Windows, since it
does not depend on case differences).
The Guid class stores the GUID value as 16-byte string (@val). Here's
the conversion code mentioned above:
def Guid.from_base36(val)
val = val.downcase
raise ArgumentError unless val =~ /\A[a-z][a-z0-9]{24}\z/
n = 0
mult = 1
val.reverse.each_byte {|c|
n += @@rd36[c] * mult
mult *= 36
}
Guid.from_i(n)
end
def to_base36
self.to_i.to_s(36).tr('0-9a-z', 'a-z0-9').rjust(25, 'a')
end
Benchmark.measure shows that, on my box, I can do around 2000-2500 of
roundtrip conversions per second, which is not too bad. But I wonder if
it can be made more efficient. The Ruby profiler shows the top 4 methods:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
33.59 0.43 0.43 50 8.60 17.60 String#each_byte
14.06 0.61 0.18 1550 0.12 0.17 Fixnum#*
9.38 0.73 0.12 1900 0.06 0.06 Bignum#*
8.59 0.84 0.11 50 2.20 3.00 Guid#to_i
which is kind of disappointing since I was hoping to still be able to
store GUID values as 16-byte strings, for compactness. Here's the
complete code (91 lines, hope it's not too long...)
===================================================================
class Guid
@@d36 = ('a'..'z').to_a + ('0'..'9').to_a
@@rd36 = {}
@@d36.each_with_index {|d, i| @@rd36[d[0]] = i}
attr_reader :val
def initialize(val=nil)
if val
raise ArgumentError unless val.length==16
else
val = (1..16).collect{|c| rand(256).chr}.join
end
@val = val
end
def to_s
self.to_hex
end
def ==(other)
@val == other.val
end
def Guid.from_hex(val)
h = '[0-9A-Fa-f]'
raise ArgumentError unless
val =~ /\A#{h}{8}-#{h}{4}-#{h}{4}-#{h}{4}-#{h}{12}\z/
val.gsub! /-/, ''
Guid.new([val].pack('H32'))
end
def to_hex
@val.unpack('H8H4H4H4H12').join '-'
end
def Guid.from_i(val)
Guid.new([
(val & 0xffffffff000000000000000000000000) >> 96,
(val & 0x00000000ffffffff0000000000000000) >> 64,
(val & 0x0000000000000000ffffffff00000000) >> 32,
(val & 0x000000000000000000000000ffffffff)
].pack('NNNN'))
end
def to_i
(@val[ 0 .. 3].unpack('N')[0] << 96) +
(@val[ 4 .. 7].unpack('N')[0] << 64) +
(@val[ 8 .. 11].unpack('N')[0] << 32) +
(@val[12 .. 15].unpack('N')[0])
end
def Guid.from_base36(val)
val = val.downcase
raise ArgumentError unless val =~ /\A[a-z][a-z0-9]{24}\z/
n = 0
mult = 1
val.reverse.each_byte {|c|
n += @@rd36[c] * mult
mult *= 36
}
Guid.from_i(n)
end
def to_base36
self.to_i.to_s(36).tr('0-9a-z', 'a-z0-9').rjust(25, 'a')
end
end
guids = [
#'00000000-0000-0000-0000-000000000000',
#'ffffffff-ffff-ffff-ffff-ffffffffffff',
#'00000000-0000-0000-0000-000000000001',
#'10000000-0000-0000-0000-000000000000',
'7d77a27f-542c-4edb-9642-f0e324ae23a2',
'c44e6638-ed47-4ce9-9bb6-ba41bca2e535',
'aae9946e-64b2-4628-8f57-d0da26d29c86',
'6cf2eba1-2f9d-463d-8538-6396f00b3f68',
'6c518ced-1d56-46f9-af24-dd82726502ef',
].collect {|h| Guid.from_hex(h)}
require 'benchmark'
puts Benchmark.measure {
1000.times {
guids.each {|g|
raise SystemError unless g == Guid.from_base36(g.to_base36)
}
}
}
===================================================================
Regards,
dave
Guid.from_base36 constructor and Guid#to_base36 instance method.
Base36 is a representation of GUID value using the digits "a".."z",
"0".."9" (in that particular order). Some of the nice properties of this
representation:
- shorter than the hexdigit representation (25 chars instead of 32/36).
- can be conveniently used as identifiers (variable & function names,
URL query parameters, table & column names, etc); note that the
first digit always falls between "a".."p", because 36**25 > 2**128.
- can be conveniently used as filenames (even on Windows, since it
does not depend on case differences).
The Guid class stores the GUID value as 16-byte string (@val). Here's
the conversion code mentioned above:
def Guid.from_base36(val)
val = val.downcase
raise ArgumentError unless val =~ /\A[a-z][a-z0-9]{24}\z/
n = 0
mult = 1
val.reverse.each_byte {|c|
n += @@rd36[c] * mult
mult *= 36
}
Guid.from_i(n)
end
def to_base36
self.to_i.to_s(36).tr('0-9a-z', 'a-z0-9').rjust(25, 'a')
end
Benchmark.measure shows that, on my box, I can do around 2000-2500 of
roundtrip conversions per second, which is not too bad. But I wonder if
it can be made more efficient. The Ruby profiler shows the top 4 methods:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
33.59 0.43 0.43 50 8.60 17.60 String#each_byte
14.06 0.61 0.18 1550 0.12 0.17 Fixnum#*
9.38 0.73 0.12 1900 0.06 0.06 Bignum#*
8.59 0.84 0.11 50 2.20 3.00 Guid#to_i
which is kind of disappointing since I was hoping to still be able to
store GUID values as 16-byte strings, for compactness. Here's the
complete code (91 lines, hope it's not too long...)
===================================================================
class Guid
@@d36 = ('a'..'z').to_a + ('0'..'9').to_a
@@rd36 = {}
@@d36.each_with_index {|d, i| @@rd36[d[0]] = i}
attr_reader :val
def initialize(val=nil)
if val
raise ArgumentError unless val.length==16
else
val = (1..16).collect{|c| rand(256).chr}.join
end
@val = val
end
def to_s
self.to_hex
end
def ==(other)
@val == other.val
end
def Guid.from_hex(val)
h = '[0-9A-Fa-f]'
raise ArgumentError unless
val =~ /\A#{h}{8}-#{h}{4}-#{h}{4}-#{h}{4}-#{h}{12}\z/
val.gsub! /-/, ''
Guid.new([val].pack('H32'))
end
def to_hex
@val.unpack('H8H4H4H4H12').join '-'
end
def Guid.from_i(val)
Guid.new([
(val & 0xffffffff000000000000000000000000) >> 96,
(val & 0x00000000ffffffff0000000000000000) >> 64,
(val & 0x0000000000000000ffffffff00000000) >> 32,
(val & 0x000000000000000000000000ffffffff)
].pack('NNNN'))
end
def to_i
(@val[ 0 .. 3].unpack('N')[0] << 96) +
(@val[ 4 .. 7].unpack('N')[0] << 64) +
(@val[ 8 .. 11].unpack('N')[0] << 32) +
(@val[12 .. 15].unpack('N')[0])
end
def Guid.from_base36(val)
val = val.downcase
raise ArgumentError unless val =~ /\A[a-z][a-z0-9]{24}\z/
n = 0
mult = 1
val.reverse.each_byte {|c|
n += @@rd36[c] * mult
mult *= 36
}
Guid.from_i(n)
end
def to_base36
self.to_i.to_s(36).tr('0-9a-z', 'a-z0-9').rjust(25, 'a')
end
end
guids = [
#'00000000-0000-0000-0000-000000000000',
#'ffffffff-ffff-ffff-ffff-ffffffffffff',
#'00000000-0000-0000-0000-000000000001',
#'10000000-0000-0000-0000-000000000000',
'7d77a27f-542c-4edb-9642-f0e324ae23a2',
'c44e6638-ed47-4ce9-9bb6-ba41bca2e535',
'aae9946e-64b2-4628-8f57-d0da26d29c86',
'6cf2eba1-2f9d-463d-8538-6396f00b3f68',
'6c518ced-1d56-46f9-af24-dd82726502ef',
].collect {|h| Guid.from_hex(h)}
require 'benchmark'
puts Benchmark.measure {
1000.times {
guids.each {|g|
raise SystemError unless g == Guid.from_base36(g.to_base36)
}
}
}
===================================================================
Regards,
dave