Xavier Noria said:
If the data is a fixed-width record, String#unpack is a compact idiom,
and it's usually fast as well. For instance:
record = "aaaaabbcccccddeee"
fields = record.unpack("a5a2a5a2a3")
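To make the result easier to read, the unpacked array can be assigned to named fields. This is only a sketch: the helper name parse_record and the constant FIELD_FORMAT are made up for illustration, and the field names are borrowed from the String#[] example in this thread. It also shows that "a" keeps trailing padding while "A" strips trailing spaces, which may matter for space-padded records.

```ruby
# Hypothetical layout: widths 5, 2, 5, 2, 3, as in the record above.
FIELD_FORMAT = "a5a2a5a2a3"

# Hypothetical helper: split one fixed-width record into named fields.
def parse_record(record)
  name, street, city, phone, country = record.unpack(FIELD_FORMAT)
  { name: name, street: street, city: city, phone: phone, country: country }
end

# "a" returns the bytes as they are; "A" also removes trailing spaces.
def raw_fields(record)
  record.unpack("a5a2")
end

def stripped_fields(record)
  record.unpack("A5A2")
end

p parse_record("aaaaabbcccccddeee")
p raw_fields("ab   cd")
p stripped_fields("ab   cd")
```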
If one only needs portions of the line, String#[] might be even better:
line = "aaaaabbcccccddeee"
name,street,city,phone,country =
  line[0...5],line[5...7],line[7...12],line[12...14],line[14...17]
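Then you can simply leave out the slices for the fields you don't need.
Regular expressions are unnecessary if you know the field widths.
However, a quick profiling run shows that unpack outperforms both forms
of String#[] (range and start/length): under the profiler, test3 takes
about 1.0 s in total versus about 4.3 s each for test1 and test2 (see below).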
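Omitting fields does not require String#[], by the way. As a small sketch, unpack's "x" directive skips bytes, so the same idea can be written with a single call. The helper name wanted_fields is made up here, and the choice of which fields to drop (street and phone) is only an assumption for the example.

```ruby
# Hypothetical helper: keep name, city and country, skip street and phone.
# "x2" skips two bytes without producing an element in the result.
def wanted_fields(line)
  line.unpack("a5x2a5x2a3")
end

name, city, country = wanted_fields("aaaaabbcccccddeee")
p [name, city, country]
```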
Regards
robert
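COUNT = 10000

# String#[] with ranges.
# Note: 1...COUNT excludes COUNT, so each loop runs 9999 times,
# which matches the call counts in the profile below.
def test1(line)
  for i in 1...COUNT
    name,street,city,phone,country =
      line[0...5],line[5...7],line[7...12],line[12...14],line[14...17]
  end
end

# String#[] with start and length.
def test2(line)
  for i in 1...COUNT
    name,street,city,phone,country =
      line[0,5],line[5,2],line[7,5],line[12,2],line[14,3]
  end
end

# String#unpack: one call per record instead of five.
def test3(line)
  for i in 1...COUNT
    name,street,city,phone,country = line.unpack("a5a2a5a2a3")
  end
end

line = "aaaaabbcccccddeee"
test1 line
test2 line
test3 line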
09:03:57 [ruby]: ruby -rprofile splitter.rb
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 62.68     6.04      6.04        3  2014.00  3213.33  Range#each
 34.21     9.34      3.30    99990     0.03     0.03  String#[]
  3.11     9.64      0.30     9999     0.03     0.03  String#unpack
  0.32     9.67      0.03        1    31.00    31.00  Profiler__.start_profile
  0.00     9.67      0.00        3     0.00     0.00  Module#method_added
  0.00     9.67      0.00        1     0.00  9640.00  #toplevel
  0.00     9.67      0.00        1     0.00  1031.00  Object#test3
  0.00     9.67      0.00        1     0.00  4297.00  Object#test1
  0.00     9.67      0.00        1     0.00  4312.00  Object#test2
09:04:21 [ruby]:
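The profiler adds overhead to every method call, which may exaggerate the gap between five String#[] calls and one unpack call per record. As a rough cross-check, the three variants could also be timed with a monotonic clock and no profiler. This is only a sketch: the names SPLITTERS, time_splitter, LINE and ITERATIONS are made up for the example, each variant pays the same extra cost of one lambda call per iteration, and the numbers will differ between machines and Ruby versions.

```ruby
LINE = "aaaaabbcccccddeee"
ITERATIONS = 10_000

# The three approaches from the test script, each returning the five fields.
SPLITTERS = {
  "String#[] range"  => lambda { |l| [l[0...5], l[5...7], l[7...12], l[12...14], l[14...17]] },
  "String#[] length" => lambda { |l| [l[0, 5], l[5, 2], l[7, 5], l[12, 2], l[14, 3]] },
  "String#unpack"    => lambda { |l| l.unpack("a5a2a5a2a3") }
}

# Hypothetical helper: wall-clock seconds for `iterations` calls of one splitter.
def time_splitter(splitter, line, iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { splitter.call(line) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

SPLITTERS.each do |label, splitter|
  seconds = time_splitter(splitter, LINE, ITERATIONS)
  printf("%-18s %.4f s\n", label, seconds)
end
```

All three lambdas return the same five strings, so only the elapsed time differs between them.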