help with refactor and db/ar advice

ball

Hi all,
I have a number of delimited data files that I would like to import
into MySQL. I am using ActiveRecord to create the schema and do the
import of the data. A code snippet follows showing how I handle one of
the many files; however, the code bloats when I do this for all of them.
Of course, I can't get around creating a schema for each table, but the
function that loads the data seems rather redundant. My end goal in
this post is just to learn a few things and write some cleaner code,
so I would love your comments.

My questions are:
1. Is there a way to take advantage of the schema so that I don't have
to re-write it in the load function?
2. Is there a way I can write a generic load function (taking
advantage of #1 if available, or not if not)?
3. What is a better way (than [1..-2]) to strip a string of a
starting/ending ~?
4. Should I just use an AR-generated primary key rather than the
dataset's ASCII primary key? (Many of the tables in the datasets have
composite keys, which I will not implement, so I am curious whether I
should just create all of the primary keys through AR instead.)
5. Any other advice?

Thanks in advance!

===========> CURRENT CODE <=============

class FoodDescription < ActiveRecord::Base
end

def create_food_descriptions
  ActiveRecord::Schema.define do
    create_table(:food_descriptions, :primary_key => 'NDB_No',
                 :options => 'DEFAULT CHARSET=utf8') do |table|
      table.column :NDB_No,      :string, :limit => 5,   :null => false
      table.column :FdGrp_Cd,    :string, :limit => 4,   :null => false
      table.column :Long_Desc,   :string, :limit => 200, :null => false
      table.column :Shrt_Desc,   :string, :limit => 60,  :null => false
      table.column :CommName,    :string, :limit => 100, :null => true
      table.column :ManufacName, :string, :limit => 65,  :null => true
      table.column :Survey,      :string, :limit => 1,   :null => true
      table.column :Ref_desc,    :string, :limit => 135, :null => true
      table.column :Refuse,      :string, :limit => 2,   :null => true
      table.column :SciName,     :string, :limit => 65,  :null => true
      table.column :N_Factor,    :float, :null => true
      table.column :pro_Factor,  :float, :null => true
      table.column :Fat_Factor,  :float, :null => true
      table.column :CHO_Factor,  :float, :null => true
    end
    puts " ... food_descriptions created"
  end
end

def load_food_descriptions( food_desc_file_name )
  # Loop through the file, assuming the following format:
  # ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
  # ndbnum, fdgrp, long_desc, shrt_desc, common_name, manufacturer_name, survey,
  # refuse_desc, percent_refuse, science_name, n_fac, pro_fac, fat_fac, CHO_fac

  food_desc_file = File.open(food_desc_file_name)
  food_desc_file.each do |line|
    # caret (^) is the delimiter
    # tildes are the string delimiters: ~string~
    # hence below, we slice with [1..-2] to remove them if a string
    record_split = line.split('^')

    FoodDescription.create(
      :NDB_No      => record_split[0][1..-2],
      :FdGrp_Cd    => record_split[1][1..-2],
      :Long_Desc   => record_split[2][1..-2],
      :Shrt_Desc   => record_split[3][1..-2],
      :CommName    => record_split[4][1..-2],
      :ManufacName => record_split[5][1..-2],
      :Survey      => record_split[6][1..-2],
      :Ref_desc    => record_split[7][1..-2],
      :Refuse      => record_split[8][1..-2],
      :SciName     => record_split[9][1..-2],
      :N_Factor    => record_split[10],
      :pro_Factor  => record_split[11],
      :Fat_Factor  => record_split[12],
      :CHO_Factor  => record_split[13]
    )
  end
  food_desc_file.close
end
 
Brian Candler

ball said:
1. Is there a way to take advantage of the schema so that I don't have
to re-write it in the load function?

puts FoodDescription.columns.inspect
puts FoodDescription.columns.map { |c| c.name }.inspect
2. Is there a way I can write a generic load function (taking
advantage of #1 if available, or not if not).

As long as the columns in the text source appear in the same order as
the columns in the database definition you should be fine. I think this
ordering is guaranteed by the database; otherwise you couldn't
meaningfully do "INSERT INTO foo VALUES (x,y,z)" (without naming the
columns) or "SELECT * FROM foo".
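Building on that, here is a minimal sketch of the parsing half of a generic loader. This is plain Ruby (the helper name `parse_delimited_line` is my own, not from the thread); with ActiveRecord you would take `names` from `Model.columns.map { |c| c.name }` and feed each resulting hash to `Model.create`:

```ruby
# Parse one ^-delimited line into a column-name => value hash.
# Fields wrapped in ~...~ are strings; the tildes are stripped.
# Non-tilde fields (numbers) pass through unchanged.
def parse_delimited_line(names, line)
  values = line.chomp.split('^', -1).map { |v| v.sub(/\A~(.*)~\z/, '\1') }
  Hash[names.zip(values)]
end
```

The `-1` argument to split keeps trailing empty fields, which matters here because optional trailing columns may be blank.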
3. What is a better way (than [1..-2]) to strip a string of a
starting/ending ~?

Do it all in one go. A couple of options:

records = line.split('^').map { |rec| rec[1..-2] }
records = line.scan(/~([^~^]*)~/).flatten

but they won't work unless *all* your fields are ~ delimited. So this
might work better for you:

records = line.split('^').map { |rec| rec.sub(/~(.*)~/) { $1 } }
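A quick sanity check of the sub form (a sketch in plain Ruby; the sample values come from the data line quoted earlier, and I've anchored the regex with \A and \z, which behaves the same on these inputs but is a little safer):

```ruby
line = '~01001~^~Butter, salted~^6.38'
fields = line.split('^').map { |rec| rec.sub(/\A~(.*)~\z/) { $1 } }
# tilde-wrapped fields lose their tildes; the bare number passes through
```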
4. Should I just use an AR generated primary key, rather than the data
sets ASCII primary key? (many of the tables in the datasets have
composite keys, in which I will not implement, so I am curious if I
just create all of the primary keys through AR insted).

ActiveRecord has some hard-coded assumptions about primary keys. If you
override its assumption that the key is called "id" then it won't
generate keys automatically for new records. If you have a non-integer
primary key then it may or may not work. It certainly doesn't handle
composite primary keys; I saw a plugin which claimed to do this some
time ago, but I don't know if it's been kept up to date.

So my advice with AR is: if you are working with an existing
database/schema that you don't control, then try using non-standard
primary keys, but if you control the schema yourself, follow AR's
assumption of an integer primary key.
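In schema terms that advice could look like this (a sketch reusing the table from the original post: keep AR's default integer id and put a unique index on the dataset's own key instead):

```ruby
ActiveRecord::Schema.define do
  # no :primary_key option, so AR generates the usual integer "id"
  create_table(:food_descriptions, :options => 'DEFAULT CHARSET=utf8') do |table|
    table.column :NDB_No, :string, :limit => 5, :null => false
    # ... remaining columns as before ...
  end
  # enforce uniqueness of the dataset's key without making it the PK
  add_index :food_descriptions, :NDB_No, :unique => true
end
```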
 
