Parsing CSV

Rafael George

Hi guys, I'm a newbie in Ruby and I have to parse two CSV files to
compare two columns of the given files. I have tried a lot of different
methods to handle this. I tried putting one entire column in an array
and the other one in a second array, then testing for the bigger array
so I could loop through it and compare both files that way. It did not
work. I was thinking of using CSV, but it's limited, and then I came
across FasterCSV, which is the module I'm stuck on right now. If
somebody can make a suggestion, I would really appreciate it.

Thanks in advance.

PS: I was told to make this tool in Java, but AFAIK Ruby is better for
handling text files.
 
Brian Candler

Hi guys, I'm a newbie in Ruby and I have to parse two CSV files to
compare two columns of the given files. I have tried a lot of different
methods to handle this. I tried putting one entire column in an array
and the other one in a second array, then testing for the bigger array
so I could loop through it and compare both files that way. It did not
work.

Well, posting your code might allow someone to help you spot what's wrong.

I'd suggest first you check that the two arrays are being read in properly -
if they are called a1 and a2, then "puts a1.inspect" and "puts a2.inspect"
will print them to the screen. Then you know whether the problem is in
reading them, or in comparing them.

Posting a more precise description of what you're trying to do, along with
some sample data and what output you expect, would also make it easier for
someone to help you.

PS: I was told to make this tool in Java, but AFAIK Ruby is better for
handling text files.

The better language is the one which you can actually use to get the job
done :)

How you do this in Ruby depends on what exactly you mean by 'compare', since
you didn't define exactly what you're trying to do. I'm guessing you mean
check for values which are in the first file but not in the second, or vice
versa. For a simple solution, have a look at Array#include?
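
As a rough illustration of the Array#include? approach, here is a minimal sketch. The arrays a1 and a2 and their contents are made-up sample data standing in for one column read from each file; they are not from the original poster's files.

```ruby
# Hypothetical sample data standing in for one column from each file.
a1 = ["alice", "bob", "carol"]
a2 = ["bob", "carol", "dave"]

# Values present in a1 but missing from a2, and vice versa.
only_in_a1 = a1.reject { |value| a2.include?(value) }
only_in_a2 = a2.reject { |value| a1.include?(value) }

puts only_in_a1.inspect
puts only_in_a2.inspect
```

Each include? call scans the other array from the start, so this is simple but slows down as the columns grow.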

For a more efficient solution, you could first sort the two arrays and then
walk down them with two pointers i and j. When a1[i] == a2[j] then you
increment both i and j. When a1[i] < a2[j] then you know an item is missing
in a2, and just increment i. When a1[i] > a2[j] then you know an item is
missing in a1, and just increment j.
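
The two-pointer walk could be sketched roughly as follows. The method name diff_sorted and its return shape are assumptions made for illustration, and the sketch treats a repeated value as needing one partner per occurrence.

```ruby
# Walk two sorted arrays in step and report what each one lacks.
# diff_sorted is a hypothetical helper name, not part of any library.
def diff_sorted(a1, a2)
  a1 = a1.sort
  a2 = a2.sort
  missing_in_a2 = []
  missing_in_a1 = []
  i = 0
  j = 0
  while i < a1.size && j < a2.size
    if a1[i] == a2[j]
      # Same value on both sides: advance both pointers.
      i += 1
      j += 1
    elsif a1[i] < a2[j]
      # a1[i] sorts before anything left in a2, so a2 lacks it.
      missing_in_a2 << a1[i]
      i += 1
    else
      # a2[j] sorts before anything left in a1, so a1 lacks it.
      missing_in_a1 << a2[j]
      j += 1
    end
  end
  # Whatever is left over in either array has no partner in the other.
  missing_in_a2.concat(a1[i..-1])
  missing_in_a1.concat(a2[j..-1])
  [missing_in_a2, missing_in_a1]
end

missing_in_a2, missing_in_a1 = diff_sorted(%w[b a c], %w[c d b])
puts "missing in a2: #{missing_in_a2.inspect}"
puts "missing in a1: #{missing_in_a1.inspect}"
```

After the sort, each element is visited only once, which is where the saving over repeated include? calls comes from.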

Incidentally, you don't even need Ruby to do this; the shell command 'join'
can do this for you (as long as you use 'sort' to pre-sort your input).

HTH,

Brian.
 
Stephane Elie

This code might get you started:

require 'faster_csv'  # the library file is faster_csv, not FasterCSV

def read_csv(filename)
  # Wrap the parsed rows in a table and switch to column-oriented access
  return FasterCSV::Table.new( FasterCSV.read(filename) ).by_col
end

data1 = read_csv("data1.csv")
data2 = read_csv("data2.csv")

compare_column_idx = 1
unless data1[compare_column_idx] == data2[compare_column_idx]
  puts "column #{compare_column_idx} is different"
end
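
FasterCSV later became the standard csv library in Ruby 1.9, so the same idea can be sketched with require 'csv'. This is only an illustration: the inline strings are made-up data standing in for data1.csv and data2.csv, and the column is pulled out with a plain map rather than a table object.

```ruby
require 'csv'

# Hypothetical inline data standing in for data1.csv and data2.csv.
data1 = CSV.parse("id,name\n1,alice\n2,bob\n")
data2 = CSV.parse("id,name\n1,alice\n2,robert\n")

compare_column_idx = 1

# CSV.parse returns an array of rows; pull the same column out of each.
column1 = data1.map { |row| row[compare_column_idx] }
column2 = data2.map { |row| row[compare_column_idx] }

if column1 == column2
  puts "column #{compare_column_idx} is identical"
else
  puts "column #{compare_column_idx} is different"
end
```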

Regards,
Stephane
 
Rafael George

passvalues = []
i = 0
IO.foreach(fsource) do |line|
  cols = []
  cols=CSV::parse_line line.chomp
  sourceval = cols[scomp_args[0]] + " " + cols[scomp_args[1]]
  IO.foreach(tdest) do |line|
    tcols = []
    tcols=CSV::parse_line line.chomp
    testval = tcols[tcomp_args[0]] + " " + tcols[tcomp_args[1]]
    if sourceval == testval
      passvalues = sourceval
      i += 1
    end
  end
end

Here is what I got.

 
James Edward Gray II

The direct translation of this code to FasterCSV is:

passvalues = Array.new
FCSV.foreach(fsource) do |s_row|
  source = s_row[scomp_args[0]..scomp_args[1]].join(" ")
  FCSV.foreach(tdest) do |t_row|
    if source == t_row[tcomp_args[0]..tcomp_args[1]].join(" ")
      passvalues << source
    end
  end
end

If you can afford to read one of the files into memory because it's
not too large, you can probably speed that up quite a bit:

require "set"

allowed = Set.new
FCSV.foreach(tdest) do |row|
  allowed.add(row[tcomp_args[0]..tcomp_args[1]].join(" "))
end

passvalues = FCSV.open(fsource) do |source|
  source.select do |row|
    allowed.include? row[scomp_args[0]..scomp_args[1]].join(" ")
  end
end
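
For anyone on a Ruby where FasterCSV has become the standard csv library, the same Set lookup might look roughly like this. It is a sketch under assumptions: the inline strings and the column indexes are made up, since the thread never shows the real fsource, tdest, scomp_args or tcomp_args, and values_at is used here in place of the range to pick exactly the listed columns.

```ruby
require 'csv'
require 'set'

# Hypothetical inline data and column indexes; the thread's fsource, tdest,
# scomp_args and tcomp_args are not shown, so these stand in for them.
source_csv = "1,alice,smith\n2,bob,jones\n3,carol,white\n"
dest_csv   = "x,bob,jones\ny,dave,brown\nz,alice,smith\n"
scomp_args = [1, 2]
tcomp_args = [1, 2]

# Build the lookup set once from the second data set.
allowed = Set.new
CSV.parse(dest_csv).each do |row|
  allowed.add(row.values_at(*tcomp_args).join(" "))
end

# Keep only the source rows whose key appears in the set.
passvalues = CSV.parse(source_csv).select do |row|
  allowed.include?(row.values_at(*scomp_args).join(" "))
end

puts passvalues.inspect
```

The second data set is read only once here, instead of once per source row as in the nested loop.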

Hope that gives you some fresh ideas.

James Edward Gray II
 
James Edward Gray II

The above destroys the field order. If you need to keep the order,
use an Array instead:

allowed = Array.new
FCSV.foreach(tdest) do |row|
  allowed << row[tcomp_args[0]..tcomp_args[1]].join(" ")
end

# ...

James Edward Gray II
 
James Edward Gray II

The above destroys the field order.

Sorry, I meant row order.

James Edward Gray II
 
James Edward Gray II

The direct translation of this code to FasterCSV is:

passvalues = Array.new
FCSV.foreach(fsource) do |s_row|
  source = s_row[scomp_args[0]..scomp_args[1]].join(" ")
  FCSV.foreach(tdest) do |t_row|
    if source == t_row[tcomp_args[0]..tcomp_args[1]].join(" ")
      passvalues << source
      break # performance enhancement
    end
  end
end

James Edward Gray II
 
Rafael George

Thanks, James and the other guys. I think I found the solution to my problem :)

 
