splitting a line by columns

Mike Campbell

I have a line of text output in columnar form; what's the best way to split it
into its requisite parts?

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow


Thoughts?
 
Gavin Sinclair

I have a line of text output in columnar form; what's the best way to split it
into its requisite parts?
Say I have lines of

I can do something like:
md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow


I'm not sure what you mean by columns, given your example. Columns,
to me, suggests columns in a newspaper.

But in your example, there's not much wrong with what you've done. A
slight improvement is

data = /(.{5})(.{2})(.{5})(.{2})(.{3})/.match(line).captures

The reason I suggest this is that you can easily generalise it
(replace the literal numbers by variables) to accept different column
widths.
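To make that generalisation concrete, here is a minimal sketch. The helper name column_pattern is made up for illustration and is not from anyone's post; the \A anchor and the nil guard are my own additions.

```ruby
# Hypothetical helper: build the pattern from a list of column widths.
def column_pattern(widths)
  # \A anchors the match at the start of the line.
  Regexp.new('\A' + widths.map { |w| "(.{#{w}})" }.join)
end

widths = [5, 2, 5, 2, 3]
line   = "aaaaabbcccccddeee"

md = column_pattern(widths).match(line)
# match returns nil when the line is shorter than the total width,
# so guard before calling captures.
data = md ? md.captures : nil
p data  # ["aaaaa", "bb", "ccccc", "dd", "eee"]
```

The guard matters because calling captures directly on a failed match raises NoMethodError on nil.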

However, I suspect you had something more complicated in mind.

Gavin
 
Martin DeMello

Ketil Kristiansen said:
I have a line of text output in columnar form; what's the best way to split it
into its requisite parts?

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

widths = [5,2,5,2,3]
md = Regexp.compile(widths.map {|i| "(" + '.'*i + ")"}.join).match(line)
md = md.to_a; md.shift # if you want the array

martin
 
Rob Partington

Say I have lines of
aaaaabbcccccddeee

I can do something like:
md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

# Some people, when confronted with a problem, think ``I know, I'll use
# regular expressions.'' Now they have two problems.
# -- jwz

irb(main):001:0> string="aaaabbcccccddeee"
=> "aaaabbcccccddeee"
irb(main):002:0> string.unpack("a4a2a5a2a3")
=> ["aaaa", "bb", "ccccc", "dd", "eee"]

In a similar vein to Martin DeMello, you can make it configurable.

irb(main):003:0> widths=[4,2,5,2,3]
=> [4, 2, 5, 2, 3]
irb(main):004:0> string.unpack(widths.map{|x| "a#{x}"}.join(nil))
=> ["aaaa", "bb", "ccccc", "dd", "eee"]

Absolutely no need for, or sense in, using regular expressions.
 
Jason Williams

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

widths = [5,2,5,2,3]
md = Regexp.compile(widths.map {|i| "(" + '.'*i + ")"}.join).match(line)
md = md.to_a; md.shift # if you want the array

Isn't regex a bit overkill?

widths = [5,2,5,2,3]
i = 0
list = []
widths.each { |n| list << line[i,n] ; i += n }
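
The same loop can be wrapped in a small method. This is only a sketch; split_by_widths is a made-up name, and the handling of short lines is simply what String#[] does by default.

```ruby
# Hypothetical helper: slice a line into fields given a list of widths.
def split_by_widths(line, widths)
  offset = 0
  widths.map do |width|
    # line[offset, width] is "" when offset equals the line length
    # and nil when offset lies beyond it.
    field = line[offset, width]
    offset += width
    field
  end
end

p split_by_widths("aaaaabbcccccddeee", [5, 2, 5, 2, 3])
# ["aaaaa", "bb", "ccccc", "dd", "eee"]
```

If short lines are possible, the caller has to decide what an empty or nil field means.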
 
Xavier Noria

I have a line of text output in columnar form; what's the best way to
split it into its requisite parts?

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

If the data is a fixed-width record String#unpack is a compact idiom,
and it's usually fast as well. For instance:

record = "aaaaabbcccccddeee"
fields = record.unpack("a5a2a5a2a3")
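
One detail worth knowing with space-padded records: the "a" directive keeps trailing padding, while "A" strips trailing spaces and NULs. A small sketch, with a made-up record layout (name 10, age 2, state 3) that is not from the thread:

```ruby
# Hypothetical record layout: name (10), age (2), state (3), space-padded.
record = "Smith".ljust(10) + "42" + "NY".ljust(3)

raw     = record.unpack("a10a2a3")  # padding preserved
trimmed = record.unpack("A10A2A3")  # trailing spaces removed

p raw      # ["Smith     ", "42", "NY "]
p trimmed  # ["Smith", "42", "NY"]

# The format string can also be derived from a list of widths.
widths = [10, 2, 3]
p record.unpack(widths.map { |w| "A#{w}" }.join) == trimmed  # true
```

Note that these directives count bytes rather than characters, so the widths only line up with visible columns for single-byte text.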

-- fxn
 
Josef 'Jupp' SCHUGT

Hi!

* Mike Campbell; 2003-10-12, 11:44 UTC:
I have a line of text output in columnar form; what's the best way
to split it into its requisite parts?

That's one of those 'it only takes n programmers to get n+1 results'
questions. It strongly depends on what you mean by 'best way'.
Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

Thoughts?

Here are mine:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#!/usr/bin/env ruby

class Cutter < Array
  def cut(line)
    map { |range| line[range] }
  end
end

knife = Cutter.new([0..4, 5..6, 7..11, 12..13, 14..16])
md = knife.cut(line)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

I am a bit surprised how powerful that ad hoc solution is: It
supports overlapping columns and columns can be arranged in arbitrary
order, ... The solution also makes it easy to programmatically select
the columns of interest before cutting anything:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
knife = Cutter.new()
knife.push(0..4) if rand < 0.5
knife.push(5..6) if rand < 0.5
knife.push(7..11) if rand < 0.5
knife.push(12..13) if rand < 0.5
knife.push(14..16) if rand < 0.5
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
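
To illustrate the overlapping and reordered columns mentioned above, here is a short sketch. It uses plain arrays of ranges with the same map call as Cutter#cut; the particular ranges are chosen only for demonstration.

```ruby
line = "aaaaabbcccccddeee"

# Reordered: the last column first, then the first column.
reordered = [14..16, 0..4].map { |range| line[range] }

# Overlapping: the second range covers the tail of the "a" column
# and the whole "b" column.
overlapping = [0..4, 3..6].map { |range| line[range] }

p reordered    # ["eee", "aaaaa"]
p overlapping  # ["aaaaa", "aabb"]
```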

Comments?

Please take notice of signature!

Josef 'Jupp' Schugt
 
Robert Klemme

Xavier Noria said:
If the data is a fixed-width record String#unpack is a compact idiom,
and it's usually fast as well. For instance:

record = "aaaaabbcccccddeee"
fields = record.unpack("a5a2a5a2a3")

If one only needs portions of the line, String#[] might be even better:

line = "aaaaabbcccccddeee"
name,street,city,phone,country =
  line[0...5],line[5...7],line[7...12],line[12...14],line[14...17]

Then you can omit the fields you don't need. Regular expressions are not
called for if you know the field widths.

However, a quick performance test shows that unpack is faster than both
forms of String#[] (see below).

Regards

robert



COUNT = 10000

def test1(line)
  for i in 1...COUNT
    name,street,city,phone,country =
      line[0...5],line[5...7],line[7...12],line[12...14],line[14...17]
  end
end

def test2(line)
  for i in 1...COUNT
    name,street,city,phone,country =
      line[0,5],line[5,2],line[7,5],line[12,2],line[14,3]
  end
end

def test3(line)
  for i in 1...COUNT
    name,street,city,phone,country = line.unpack("a5a2a5a2a3")
  end
end

line = "aaaaabbcccccddeee"

test1 line
test2 line
test3 line

09:03:57 [ruby]: ruby -rprofile splitter.rb
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 62.68     6.04      6.04        3  2014.00  3213.33  Range#each
 34.21     9.34      3.30    99990     0.03     0.03  String#[]
  3.11     9.64      0.30     9999     0.03     0.03  String#unpack
  0.32     9.67      0.03        1    31.00    31.00  Profiler__.start_profile
  0.00     9.67      0.00        3     0.00     0.00  Module#method_added
  0.00     9.67      0.00        1     0.00  9640.00  #toplevel
  0.00     9.67      0.00        1     0.00  1031.00  Object#test3
  0.00     9.67      0.00        1     0.00  4297.00  Object#test1
  0.00     9.67      0.00        1     0.00  4312.00  Object#test2
09:04:21 [ruby]:
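
For anyone wanting to repeat the comparison without the profiler, here is a rough sketch that times the same three variants with a monotonic clock. The helper time_it and the labels are made up for illustration, and the numbers will differ by machine and Ruby version, so no results are shown.

```ruby
COUNT = 10_000
LINE  = "aaaaabbcccccddeee"

# Hypothetical helper: run the block COUNT times, return elapsed seconds.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  COUNT.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

timings = {
  "String#[] with ranges" =>
    time_it { [LINE[0...5], LINE[5...7], LINE[7...12], LINE[12...14], LINE[14...17]] },
  "String#[] with offset, length" =>
    time_it { [LINE[0, 5], LINE[5, 2], LINE[7, 5], LINE[12, 2], LINE[14, 3]] },
  "String#unpack" =>
    time_it { LINE.unpack("a5a2a5a2a3") }
}

timings.each { |label, seconds| puts format("%-30s %.4f s", label, seconds) }
```

Wall-clock timing avoids the per-call overhead the profiler adds, which can distort a comparison between one method call and five.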
 
