Text file parsing in ruby

Paul van Delst · Jan 24, 2007

Hello,

As I use ruby more and more for things, I find myself creating "Config" classes, filling
them with data read from a simple text file, and then passing instances of config around
to do all the work. What I would like to get some advice on, or links to, is ruby-ish
methods of reading/parsing text files.

A lot of text files have, for example, some sort of header that says how much data is
coming, followed by the data itself, e.g.

Number of data points: 5
1 2
3 4
5 6
7 8
9 0
Number of data points: 2
10 20
11 21
Number of data points: 20
1 2
2 3
...etc..

Or, svn log output where the header line says how many lines of log message follow.

I find I'm struggling to figure out a tidy way to read these sorts of files. If, for
example, I iterate over the lines,

IO.readlines(file_name).each do |line|
  # ...parse the line
end

How do I take advantage of the fact that the "header" line tells me how much actual data
follows before the next header? I.e. I discover that I need to read 5 points so I read 5
points and the next line that is parsed in the above iteration is the next header line.
Sort of short-circuiting the iteration.

The solution I've come up with so far is to use "sentinel" values that flag what is to
come, but it's yuckily kludgy. Any tips from the 'sperts?

Apologies if this is a CS101 type of question.

cheers,

paulv

ara.t.howard · Jan 24, 2007

yaml is your good friend:

harp:~ > cat a.rb
require 'yaml'
points = YAML.load(IO.read('points.yml'))
p points.size
points.each{|point| p point}

harp:~ > cat points.yml
---
- [10, 20]
- [11, 21]

harp:~ > ruby a.rb
2
[10, 20]
[11, 21]
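If you do control the file format, the grouped data from the question could be stored as one YAML list per data set. This is only a sketch: the nested layout and the `doc` string are assumptions for illustration, not something from the original files.

```ruby
require 'yaml'

# Hypothetical layout: one YAML list per data set, each holding [x, y] pairs.
doc = <<~POINTS
  ---
  - - [1, 2]
    - [3, 4]
  - - [10, 20]
    - [11, 21]
POINTS

sets = YAML.load(doc)

# No header counts are needed: each set knows its own size.
sets.each_with_index do |points, i|
  puts "set #{i}: #{points.size} points"
  points.each { |x, y| puts "  #{x} #{y}" }
end
```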

and so much more...

regards.

-a

William James · Jan 24, 2007

File.open('data1') {|handle|
  # Each header line carries the count of data lines that follow it.
  while header = handle.gets do
    header[ /\d+/ ].to_i.times {
      p handle.gets
    }
  end
}
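
The same count-driven loop can collect the points instead of printing them. A minimal sketch, assuming each data line holds whitespace-separated integers; `read_point_sets` is a hypothetical helper name, and a `StringIO` stands in for the real file so the example is self-contained:

```ruby
require 'stringio'

# Hypothetical helper: reads "header + N data lines" blocks from any IO-like
# object and returns an array of sets, each set an array of integer rows.
def read_point_sets(io)
  sets = []
  while (header = io.gets)
    count = header[/\d+/].to_i # the number found in the header line
    sets << Array.new(count) do
      line = io.gets or raise EOFError, "expected #{count} data lines"
      line.split.map { |field| Integer(field, 10) }
    end
  end
  sets
end

sample = StringIO.new(<<~DATA)
  Number of data points: 2
  10 20
  11 21
  Number of data points: 1
  1 2
DATA

sets = read_point_sets(sample)
p sets
```

With a real file, `File.open('data1') { |f| read_point_sets(f) }` should work the same way, since the helper only relies on `gets`.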

Robert Klemme · Jan 24, 2007

Or test after the fact:

# untested
sets = []
current = nil
items = nil

File.foreach('data1') do |line|
  case line
  when /Number of data points: (\d+)/
    # New header: check the previous set was complete before starting the next.
    raise "Wrong amount" if current && current.size != items
    items = $1.to_i
    current = []
    sets << current
  else
    current << line.scan(/\d+/).map! {|x| x.to_i}
  end
end

# The last set has no following header, so check it here.
raise "Wrong amount" if current && current.size != items
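
The same after-the-fact check can also be sketched with `Enumerable#slice_before`, which starts a new chunk at every header line. `parse_sets` and `HEADER` are hypothetical names used only for this illustration, and the sample string stands in for the real file:

```ruby
HEADER = /Number of data points: (\d+)/

# Hypothetical helper: groups lines into chunks that each begin with a header,
# then checks every chunk against the count its header announced.
def parse_sets(lines)
  lines.slice_before(HEADER).map do |header, *rows|
    expected = header[HEADER, 1] or
      raise ArgumentError, "data before first header: #{header.inspect}"
    points = rows.map { |row| row.scan(/\d+/).map(&:to_i) }
    unless points.size == expected.to_i
      raise ArgumentError, "expected #{expected} points, got #{points.size}"
    end
    points
  end
end

sample = "Number of data points: 2\n10 20\n11 21\nNumber of data points: 1\n1 2\n"
sets = parse_sets(sample.lines)
p sets
```

Any enumerable of lines should do as input, for example `File.foreach('data1')`.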

Regards

robert

Paul van Delst · Jan 24, 2007

To all responders, as always, thanks very much. You guys are great. One day I will grok
this much better (but I have some unlearning to do...)

cheers,

paulv

p.s. Ara, I do use YAML for some things, but I don't always (actually, quite rarely) have
control of how the file is created.
