Gen. Row objects from tab-delim text lines


danbernier+ruby

This is nothing ground-breaking, but I thought I'd share it anyway. It
was a fun little exercise, and I've found it pretty useful.

RowParser#parse takes a String (or reads from an IO), assumes the text is
\n- and \t-delimited, and treats the first line as \t-delimited column
headers. It generates a Row class with attributes named for the
columns, and returns an array of populated Row objects. It does a bit
of name munging (downcasing the first letter and stripping punctuation)
and allows for empty columns.
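
To make the name munging concrete, here is a minimal sketch of what happens to each header. The helper name munge_header is hypothetical (the actual source does this inline on the header array), and it uses non-mutating string calls, but the result should be the same:

```ruby
# Hypothetical helper mirroring the header munging in RowParser#parse:
# downcase the first letter, then drop anything that is not a word character.
def munge_header(header)
  name = header[0, 1].downcase + header[1..-1]
  name.gsub(/[^\w]/, '')
end

headers = ["Brewery/Brand", "Beer", "Alcohol %", "Calories", "Carbs"]
headers.each do |h|
  puts "#{h.inspect} -> #{munge_header(h)}"
end
```

So the columns of the beer table end up as breweryBrand, beer, alcohol, calories and carbs, which is why the usage example can call d.beer and d.calories.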

It's been useful during this data conversion I'm working on -- both SQL
output and MS Word/Excel tables save to tab-delim text files easily.

Example: here's a table of nutritional info for different beers (from
http://www.realbeer.com/edu/health/calories.php). I hope the tabs come
out alright -- notice that some of the columns are empty:

Brewery/Brand    Beer                   Alcohol %  Calories  Carbs
Amstel Light     Amstel Light           3.5        95        5
Alaskan Brewing  Alaskan Amber          5
Alaskan Brewing  Alaskan Pale Ale       4.6
Alaskan Brewing  Alaskan Stout          5.7
Alaskan Brewing  Alaskan ESB            5
Alaskan Brewing  Alaskan Smoked Porter  6.1
Alaskan Brewing  Alaskan Winter Ale     6.2
Anchor           Anchor Steam           4.9        152
Anchor           Liberty Ale            6          188
Anchor           Anchor Porter          5.6        205
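
A note on those empty columns: a row with missing values can still carry a tab for every column, so the line ends in empty fields. Ruby's String#split drops trailing empty fields unless it is given a negative limit. A quick sketch, assuming the Alaskan Amber row ends with two tabs (the sample line is a reconstruction, not copied from the real file):

```ruby
# Assumed shape of one row: five columns, the last two empty.
line = "Alaskan Brewing\tAlaskan Amber\t5\t\t"

default_fields = line.split("\t")      # trailing empty fields are dropped
all_fields     = line.split("\t", -1)  # a negative limit keeps them

puts default_fields.length  # 3
puts all_fields.length      # 5
p all_fields                # ["Alaskan Brewing", "Alaskan Amber", "5", "", ""]
```

This is presumably why the source uses its own conservative_split instead of a plain split: the Row constructor needs one value per column, empty or not.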


A bit of ruby:
rp = RowParser.new
data = rp.parse(File.open("beer.txt"))
data.each { |d|
  puts "#{d.beer}, #{d.calories}"
}


The output:
Amstel Light, 95
Alaskan Amber,
Alaskan Pale Ale,
Alaskan Stout,
Alaskan ESB,
Alaskan Smoked Porter,
Alaskan Winter Ale,
Anchor Steam, 152
Liberty Ale, 188
Anchor Porter, 205


Here's the source:

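class String
  # Like split, but keeps empty fields (including trailing ones).
  def conservative_split(delim)
    str = dup # Work on a copy; slice! would otherwise modify the receiver
    bits = []
    while str.include? delim
      # Cut off everything up to and including the delimiter
      bits << str.slice!(0..str.index(delim)).strip
    end
    bits << str
    return bits
  end
end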

class RowParser
  attr_reader :code

  def parse(data)
    # If it's an IO, read it in. Else, assume it's a String.
    data = data.read if data.kind_of? IO
    data = data.split("\n") # Lines!

    # Create the Row class, based on the header columns
    headers = data.shift.split("\t") # First line is the column names
    headers.collect! do |h|
      h = h.slice!(0).chr.downcase + h # Downcase the first letter...
      h.gsub(/[^\w\d_]/, '') # ...and purge problematic punctuation.
    end

    symbs = headers.collect { |h| ":#{h}" }.join(', ') # For attr_reader
    args = headers.join(', ') # For the constructor
    attrs = headers.collect { |h| "@#{h}" }.join(', ') # For the attrs
    @code = <<-CODE
      class Row
        attr_reader #{symbs}
        def initialize(#{args})
          #{attrs} = #{args}
        end
        def to_s
          [#{attrs}].join(', ')
        end
      end
    CODE
    eval @code

    # Now, create a Row from each remaining line
    rows = []

    data.each { |line|
      line = line.conservative_split("\t")
      rows << Row.new(*line)
    }

    return rows
  end
end
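
For comparison, a similar result can be sketched without eval by building the row class with Struct. This is an alternative technique, not the approach used in the source; the method name parse_rows is made up for illustration, and it assumes every header is non-empty and munges to a unique, valid method name:

```ruby
# Hypothetical Struct-based variant: same input format, no eval.
def parse_rows(text)
  lines = text.split("\n")

  # Munge the headers the same way (downcase first letter, strip punctuation)
  # and turn them into symbols for Struct.
  headers = lines.shift.split("\t").map do |h|
    (h[0, 1].downcase + h[1..-1]).gsub(/\W/, '').to_sym
  end
  row_class = Struct.new(*headers)

  # A negative limit keeps trailing empty fields; short rows are padded with nil.
  lines.map do |line|
    fields = line.split("\t", -1).map { |f| f.strip }
    row_class.new(*fields)
  end
end

# Sample input is assumed; the tabs are written out explicitly.
sample = "Brewery/Brand\tBeer\tAlcohol %\tCalories\tCarbs\n" \
         "Amstel Light\tAmstel Light\t3.5\t95\t5\n" \
         "Alaskan Brewing\tAlaskan Amber\t5\t\t\n"

parse_rows(sample).each do |row|
  puts "#{row.beer}, #{row.calories}"
end
```

One difference to be aware of: Struct raises an ArgumentError when a line has more fields than there are headers, and fills missing fields with nil rather than failing.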

Hope it's useful...
Dan
 
