splitting a line by columns

Mike Campbell · Oct 12, 2003

I have a line of text output in columnar form; what's the best way to split it
into its requisite parts?

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

Thoughts?

Gavin Sinclair · Oct 12, 2003

I have a line of text output in columnar form; what's the best way to split it
into its requisite parts?

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

I'm not sure what you mean by columns, given your example. Columns,
to me, suggests columns in a newspaper.

But in your example, there's not much wrong with what you've done. A
slight improvement is

data = /(.{5})(.{2})(.{5})(.{2})(.{3})/.match(line).captures

The reason I suggest this is that you can easily generalise it
(replace the literal numbers by variables) to accept different column
widths.

However, I suspect you had something more complicated in mind.

Gavin

Martin DeMello · Oct 12, 2003

Ketil Kristiansen said:
I have a line of text output in columnar form; what's the best way to split it
into its requisite parts?

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

widths = [5,2,5,2,3]
md = Regexp.compile(widths.map {|i| "(" + '.'*i + ")"}.join).match(line)
md = md.to_a; md.shift # if you want the array

martin

Rob Partington · Oct 12, 2003

Say I have lines of
aaaaabbcccccddeee

I can do something like:
md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

# Some people, when confronted with a problem, think ``I know, I'll use
# regular expressions.'' Now they have two problems.
# -- jwz

irb(main):001:0> string="aaaabbcccccddeee"
=> "aaaabbcccccddeee"
irb(main):002:0> string.unpack("a4a2a5a2a3")
=> ["aaaa", "bb", "ccccc", "dd", "eee"]

In a similar vein to Martin DeMello, you can make it configurable.

irb(main):003:0> widths=[4,2,5,2,3]
=> [4, 2, 5, 2, 3]
irb(main):004:0> string.unpack(widths.map{|x| "a#{x}"}.join(nil))
=> ["aaaa", "bb", "ccccc", "dd", "eee"]

Absolutely no need for, or sense in, using regular expressions.
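
The configurable variant above could be wrapped in a small helper. This is only a sketch; the method name `split_fixed` is made up for illustration and does not come from the thread.

```ruby
# Hypothetical helper: split a fixed-width record with String#unpack.
def split_fixed(line, widths)
  # Build a format such as "a4a2a5a2a3"; "aN" takes the next N bytes as a string.
  format = widths.map { |w| "a#{w}" }.join
  line.unpack(format)
end

fields = split_fixed("aaaabbcccccddeee", [4, 2, 5, 2, 3])
p fields
```

Note that unpack's "a" directive counts bytes rather than characters, so this assumes single-byte data.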

Jason Williams · Oct 12, 2003

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow


widths = [5,2,5,2,3]
md = Regexp.compile(widths.map {|i| "(" + '.'*i + ")"}.join).match(line)
md = md.to_a; md.shift # if you want the array

Isn't regex a bit overkill?

widths = [5,2,5,2,3]
i = 0
list = []
widths.each { |n| list << line[i,n] ; i += n }
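
The same running-offset loop can be written as a self-contained method. A minimal sketch, with `slice_by_widths` as a hypothetical name chosen for this example:

```ruby
# Hypothetical helper: slice a line into fields by walking a running offset.
def slice_by_widths(line, widths)
  offset = 0
  widths.map do |width|
    field = line[offset, width]  # up to `width` characters starting at `offset`
    offset += width
    field
  end
end

p slice_by_widths("aaaaabbcccccddeee", [5, 2, 5, 2, 3])
```

If a line is shorter than the widths imply, String#[] returns a shortened field, an empty string, or nil, so short records may need a separate check.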

Xavier Noria · Oct 12, 2003

I have a line of text output in columnar form; what's the best way to
split it into its requisite parts?

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

If the data is a fixed-width record String#unpack is a compact idiom,
and it's usually fast as well. For instance:

record = "aaaaabbcccccddeee"
fields = record.unpack("a5a2a5a2a3")

-- fxn
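
One detail that may matter for real columnar output: the "a" directive keeps any padding, while "A" strips trailing spaces and nulls. A small sketch, using a made-up padded record (not data from the thread):

```ruby
# Hypothetical record: a 10-wide, space-padded name followed by a 2-wide code.
record = "Smith".ljust(10) + "NY"

raw     = record.unpack("a10a2")  # "a" keeps the trailing spaces
trimmed = record.unpack("A10A2")  # "A" removes trailing spaces and nulls

p raw
p trimmed
```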

Josef 'Jupp' SCHUGT · Oct 12, 2003

Hi!

* Mike Campbell; 2003-10-12, 11:44 UTC:

I have a line of text output in columnar form; what's the best way
to split it into its requisite parts?

That's one of those 'it only takes n programmers to get n+1 results'
questions. It strongly depends on what you mean by 'best way'.

Say I have lines of

aaaaabbcccccddeee

I can do something like:

md = /(.....)(..)(.....)(..)(...)/.match(line); # seems klugy somehow

Thoughts?

Here are mine:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#!/usr/bin/env ruby

class Cutter < Array
  def cut(line)
    map { |range| line[range] }
  end
end

knife = Cutter.new([0..4, 5..6, 7..11, 12..13, 14..16])
md = knife.cut(line)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

I am a bit surprised how powerful that ad hoc solution is: It
supports overlapping columns and columns can be arranged in arbitrary
order, ... The solution also makes it easy to programmatically select
the columns of interest before cutting anything:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
knife = Cutter.new()
knife.push(0..4) if rand < 0.5
knife.push(5..6) if rand < 0.5
knife.push(7..11) if rand < 0.5
knife.push(12..13) if rand < 0.5
knife.push(14..16) if rand < 0.5
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
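
To make the claim about overlapping and reordered columns concrete, here is a short sketch that repeats the Cutter class so it runs on its own. The particular ranges are chosen only for illustration.

```ruby
class Cutter < Array
  # Return one substring per stored range, in the order the ranges are stored.
  def cut(line)
    map { |range| line[range] }
  end
end

line = "aaaaabbcccccddeee"

# Last column first, then the first column, then a range that
# overlaps the end of column 1 and the start of column 3.
knife = Cutter.new([14..16, 0..4, 3..8])
pieces = knife.cut(line)
p pieces
```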

Comments?


Josef 'Jupp' Schugt

Robert Klemme · Oct 13, 2003

Xavier Noria said:
If the data is a fixed-width record String#unpack is a compact idiom,
and it's usually fast as well. For instance:

record = "aaaaabbcccccddeee"
fields = record.unpack("a5a2a5a2a3")

If one only needs portions of the line, String#[] might be even better:

line = "aaaaabbcccccddeee"
name,street,city,phone,country =
line[0...5],line[5...7],line[7...12],line[12...14],line[14...17]

Then you can omit the fields you don't need. Regular expressions are
unnecessary when you already know the field widths.
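
Omitting fields is also possible with unpack, which has an "x" directive that skips bytes. A minimal sketch, assuming the two 2-wide fields are the ones not wanted:

```ruby
line = "aaaaabbcccccddeee"

# "x2" skips two bytes, so the 2-wide fields are never extracted.
name, city, country = line.unpack("a5x2a5x2a3")

p [name, city, country]
```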

However, a quick profiling run shows unpack outperforming both forms
of String#[] (range and start/length); see below.

Regards

robert

COUNT = 10000

# String#[] with ranges
def test1(line)
  for i in 1...COUNT
    name,street,city,phone,country =
      line[0...5],line[5...7],line[7...12],line[12...14],line[14...17]
  end
end

# String#[] with start and length
def test2(line)
  for i in 1...COUNT
    name,street,city,phone,country =
      line[0,5],line[5,2],line[7,5],line[12,2],line[14,3]
  end
end

# String#unpack
def test3(line)
  for i in 1...COUNT
    name,street,city,phone,country = line.unpack("a5a2a5a2a3")
  end
end

line = "aaaaabbcccccddeee"

test1 line
test2 line
test3 line

09:03:57 [ruby]: ruby -rprofile splitter.rb
  %   cumulative    self               self     total
 time    seconds   seconds    calls  ms/call  ms/call  name
 62.68      6.04      6.04        3  2014.00  3213.33  Range#each
 34.21      9.34      3.30    99990     0.03     0.03  String#[]
  3.11      9.64      0.30     9999     0.03     0.03  String#unpack
  0.32      9.67      0.03        1    31.00    31.00  Profiler__.start_profile
  0.00      9.67      0.00        3     0.00     0.00  Module#method_added
  0.00      9.67      0.00        1     0.00  9640.00  #toplevel
  0.00      9.67      0.00        1     0.00  1031.00  Object#test3
  0.00      9.67      0.00        1     0.00  4297.00  Object#test1
  0.00      9.67      0.00        1     0.00  4312.00  Object#test2
09:04:21 [ruby]:
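
The same comparison can be sketched with the standard Benchmark module, which adds far less overhead than the profiler. The lambda names and labels below are made up for this example, and timings will vary by machine and Ruby version, so none are shown.

```ruby
require 'benchmark'

line  = "aaaaabbcccccddeee"
count = 10_000

# The three approaches from the thread, each returning the five fields.
by_range  = -> { [line[0...5], line[5...7], line[7...12], line[12...14], line[14...17]] }
by_length = -> { [line[0, 5], line[5, 2], line[7, 5], line[12, 2], line[14, 3]] }
by_unpack = -> { line.unpack("a5a2a5a2a3") }

Benchmark.bm(10) do |bm|
  bm.report("range")  { count.times { by_range.call } }
  bm.report("length") { count.times { by_length.call } }
  bm.report("unpack") { count.times { by_unpack.call } }
end
```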
