Input file, change data, write to file


Paul Br

I'm a ruby newbie trying to read data from a file, make a few changes,
and write the output to a file so it can be imported into a MySQL
database.

I found a partial solution on page 138 in Maik Schmidt's "Enterprise
Integration with Ruby" book but it lacks a means to write the output to
a file.

How can I write the output to a file using the below code?

For what it's worth, I'll be working with files that contain between
20,000 – 60,000 rows.

Below is a data sample:

01234567890123456789012345678901234567890123456789012

00123 random text 3.0010/20/200610/21/2006 -3.45
00253 more text 275.0007/01/200606/12/2006 12.45

Here's what I want the file to look like with tabs between each section:

01234567890123456789012345678901234567890123456789012
123 random text 3.00 2006-10-20 2006-10-21 -3.45
253 more text 275.00 2006-07-01 2006-06-12 12.45

Filename: fixtest1.rb

class FixedLengthRecordFile
  # Yields each line of filename as an array of stripped fields.
  def FixedLengthRecordFile.open(filename, field_sizes)
    if field_sizes.nil? or field_sizes.empty?
      raise ArgumentError, "Empty field sizes not allowed!"
    end

    # [2, 3, 12] becomes the unpack pattern "a2a3a12".
    field_pattern = 'a' + field_sizes.join('a')
    IO.foreach(filename) do |line|
      record = line.chomp.unpack(field_pattern)
      record.each { |f| f.strip! }   # strip each field in place
      yield record
    end
  end
end

Filename: rw1.rb

require_relative 'fixtest1'   # plain require no longer searches the current directory

FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
                                         1, 2, 1, 4, 10]) do |row|
  puts "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}"
end

Any feedback is greatly appreciated!
 

matt neuburg

Paul Br said:
I'm a ruby newbie trying to read data from a file, make a few changes,
and write the output to a file so it can be imported into a MySQL
database.

I found a partial solution on page 138 in Maik Schmidt's "Enterprise
Integration with Ruby" book but it lacks a means to write the output to
a file.

How can I write the output to a file using the below code?

I think the most newbie-appealing approach to files is with

open() do |f|
end

because when the block finishes the file closes automatically. The docs
on the modes for opening files, and on the methods you need after that,
are here:

<http://www.ruby-doc.org/core/classes/IO.html>
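To make the automatic close concrete, here is a minimal sketch. It uses File.open, which behaves like open() for ordinary file paths; the scratch path in the temporary directory is a placeholder chosen for the example, not anything from the original post.

```ruby
require 'tmpdir'

# Hypothetical scratch file; any writable path would do.
path = File.join(Dir.tmpdir, "open_block_demo_#{Process.pid}.txt")

handle = nil
File.open(path, 'w') do |f|   # 'w' creates the file or truncates an existing one
  handle = f                  # keep a reference only to inspect it afterwards
  f.puts 'first line'
  f.puts 'second line'
end

closed_after_block = handle.closed?               # the block form closed the file
lines_written = File.readlines(path, chomp: true) # read back what was written
File.delete(path)
```

No explicit close call is needed; the file is closed even if the block raises an exception.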

With big data that comes in lines, where each line is to be processed
independently, you presumably want two files, reading and writing a line
at a time, so the whole operation could be structured like this:

def munge(s)
  return s.gsub(/[aeiou]/, '') # but do your own task here instead
end

open("path1", "r") do |f1|
  open("path2", "w") do |f2|
    f1.each { |line| f2.puts munge(line) }
  end
end
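As one possible munge for the dates in the question, the sketch below rewrites every mm/dd/yyyy date as yyyy-mm-dd and puts a tab in front of it. The regular expression assumes that every date in a line has exactly that shape, and the StringIO objects merely stand in for the two files so the example runs without touching the disk; the id and text fields are not handled here.

```ruby
require 'stringio'

# Assumption: every date in a line is written mm/dd/yyyy.
def munge(s)
  s.gsub(%r{(\d{2})/(\d{2})/(\d{4})}, "\t\\3-\\1-\\2")
end

# Stand-ins for the input and output files.
source = StringIO.new("3.0010/20/200610/21/2006\n275.0007/01/200606/12/2006\n")
target = StringIO.new

# Same line-at-a-time structure as above.
source.each { |line| target.puts munge(line) }

result = target.string
```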

m.
 

dblack

Hi --

I'm a ruby newbie trying to read data from a file, make a few changes,
and write the output to a file so it can be imported into a MySQL
database.

I found a partial solution on page 138 in Maik Schmidt's "Enterprise
Integration with Ruby" book but it lacks a means to write the output to
a file.

How can I write the output to a file using the below code?

For what it's worth, I'll be working with files that contain between
20,000 – 60,000 rows.

Below is a data sample:

01234567890123456789012345678901234567890123456789012

00123 random text 3.0010/20/200610/21/2006 -3.45
00253 more text 275.0007/01/200606/12/2006 12.45

Here's what I want the file to look like with tabs between each section:

01234567890123456789012345678901234567890123456789012
123 random text 3.00 2006-10-20 2006-10-21 -3.45
253 more text 275.00 2006-07-01 2006-06-12 12.45

You might find scanf helpful. Here's a little example. Note that the
lines of data come from the DATA array, which is automatically read
from after __END__. Also, I'm using values_at to manipulate the order
in which the values get inserted into the printf string, so that I can
put the years first.

require 'scanf'

DATA.each do |line|
  values = line.scanf("%5d %11c %4f %d/%d/%4d%d/%d/%d %f")
  printf("%3d%12s %6.2f %04d-%02d-%02d %04d-%02d-%02d %3.2f\n",
         *values.values_at(0,1,2,5,3,4,8,6,7,9))
end

__END__
00123 random text 3.0010/20/200610/21/2006 -3.45
00253 more text 275.0007/01/200606/12/2006 12.45


Output:

123 random text 3.00 2006-10-20 2006-10-21 -3.45
253 more text 275.00 2006-07-01 2006-06-12 12.45
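The reordering done by values_at can be seen in isolation with a hand-built array standing in for what scanf would return for the first data line. The array contents and the tab-separated format string below are assumptions for illustration, not output of the code above.

```ruby
# Stand-in for one parsed record: id, text, amount, then two dates given as
# month, day, year. These values are typed by hand, not produced by scanf.
values = [123, 'random text', 3.0, 10, 20, 2006, 10, 21, 2006, -3.45]

# values_at picks elements by index, so the year (5) can come before the
# month (3) and the day (4); likewise 8, 6, 7 for the second date.
reordered = values.values_at(0, 1, 2, 5, 3, 4, 8, 6, 7, 9)

# format returns the string instead of printing it, which is convenient when
# the result is going to a file rather than to the screen.
line = format("%d\t%s\t%.2f\t%04d-%02d-%02d\t%04d-%02d-%02d\t%.2f",
              *reordered)
```

Using tabs in the format string rather than fixed widths gives the tab-delimited layout asked for in the question.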


David

-- 
David A. Black | (e-mail address removed)
Author of "Ruby for Rails" [1] | Ruby/Rails training & consultancy [3]
DABlog (DAB's Weblog) [2] | Co-director, Ruby Central, Inc. [4]
[1] http://www.manning.com/black | [3] http://www.rubypowerandlight.com
[2] http://dablog.rubypal.com | [4] http://www.rubycentral.org
 

Stefano Crocco

I'm not sure I understood what you want to do. Do you want to write the
modified data to another file or to the same file?

In the first case, all you need to do is the following (in file rw1.rb):

File.open('output_file', 'w') { |f|
  FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
                                           1, 2, 1, 4, 10]) do |row|
    f.write "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}\n"
  end
}

Instead, if you want to write the data back to the same file, you could
write your FixedLengthRecordFile.open method as

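def FixedLengthRecordFile.open(filename, field_sizes)
  if field_sizes.nil? or field_sizes.empty?
    raise ArgumentError, "Empty field sizes not allowed!"
  end

  field_pattern = 'a' + field_sizes.join('a')
  # Caution: 'r+' does not truncate, and the writes start at the beginning of
  # the file while it is still being read. If the output is shorter than the
  # input, old bytes remain at the end of the file.
  File.open(filename, 'r+') { |file|
    IO.foreach(filename) do |line|
      record = line.chomp.unpack(field_pattern)
      record.each { |f| f.strip! }
      file.write(yield(record))
    end
  }
end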

or you could write

def FixedLengthRecordFile.open(filename, field_sizes)
  if field_sizes.nil? or field_sizes.empty?
    raise ArgumentError, "Empty field sizes not allowed!"
  end

  field_pattern = 'a' + field_sizes.join('a')
  lines = File.readlines(filename)   # read everything before reopening for writing
  File.open(filename, 'w') { |file|
    lines.each do |line|
      record = line.chomp.unpack(field_pattern)
      record.each { |f| f.strip! }
      file.write(yield(record))
    end
  }
end

I don't know whether this approach would lead to worse performance,
given the length of your files.

In both cases, the block you pass to the open method should return the
string to write:

FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
                                         1, 2, 1, 4, 10]) do |row|
  "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}\n"
end

A couple of notes:
* you need to add the "\n" at the end of your string in the rw1 file,
otherwise all the rows in the original file will be written as one line
* this method will only work when all the lines of the data file have
the same structure (for example, it won't work with the first line of
your data file example above)
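
Both points can be checked on a small scale. The sketch below builds the same kind of unpack pattern from a list of field sizes, appends "\n" to each output row, and skips any line whose length differs from the expected record width. The field sizes, the sample lines, and the length check are assumptions for illustration and do not match the layout of test1.abc; a length check also cannot catch a line that has the right width but the wrong content.

```ruby
# Hypothetical layout: a 5-character id, a 12-character text field and an
# 8-character amount.
field_sizes   = [5, 12, 8]
field_pattern = 'a' + field_sizes.join('a')   # "a5a12a8"
record_width  = field_sizes.sum

# Sample input built with ljust/rjust so the column widths are exact.
lines = [
  'this header is not a record',
  '00123' + 'random text'.ljust(12) + '3.00'.rjust(8),
  '00253' + 'more text'.ljust(12) + '275.00'.rjust(8)
]

rows = []
lines.each do |line|
  line = line.chomp
  next unless line.length == record_width   # skip anything that is not a full record
  # Split into fields, trim the padding, and end each row with a newline.
  rows << line.unpack(field_pattern).map(&:strip).join("\t") + "\n"
end
```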
 

Paul Br

Stefano,

Thanks for your reply!

I want to write the modified data to another file. Your solution was
terrific!

I should have been clearer in the initial post about the long row of
numbers. That shouldn't have been part of the data sample, as its
purpose was to document character spacing.

Thanks for the alternate solutions too. You've provided this ruby
newbie with lots of valuable tidbits!

Paul
 
