Output unique values in CSV columns to a text file

Drew Olson

What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
UniqueVal1
UniqueVal2
Column: ColHeader2
UniqueVal1
UniqueVal2
...

What I'm currently getting is output that looks as follows:

Column: ColHeader1

ColHeader1UniqueVal1
ColHeader1UniqueVal2
Column: ColHeader2

ColHeader2UniqueVal1
ColHeader2UniqueVal2
...

For some reason, it is appending the column header to each value and
also printing a blank row to start each column. My code is below. Any
help is much appreciated. Essentially I read the CSV into a hash where
the key is the column header and the element is an array of values from
that column. I then run .uniq! on each array in the hash and print the
results to a file.

require 'rubygems'
require 'faster_csv'

infile = "xyz.csv"

uniques = {}

FCSV.open(infile, :headers => true).each do |row|
  row.each_with_index do |element,j|
    uniques[row.headers[j]] ||= []
    uniques[row.headers[j]] << element
  end
end

uniques.each do |key,element|
  element.uniq!
end

File.open("unique_output.txt","w+") do |out|
  uniques.each_key do |key|
    out.write "Column: #{key}\n"
    uniques[key].each do |element|
      out.write " #{element}\n"
    end
  end
end
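
One likely explanation for the header showing up in front of each value, offered as a sketch rather than a confirmed diagnosis: a FasterCSV row iterates as [header, value] pairs, so with each_with_index the block's element is the whole pair, and interpolating that pair on Ruby 1.8 joins the two strings together. The version below destructures the pair instead. It assumes Ruby's standard csv library (the successor to FasterCSV) and made-up sample data in place of xyz.csv, and it does not try to account for the blank line.

```ruby
require "csv"

# Hypothetical sample data standing in for the contents of xyz.csv.
csv_text = "nums,letters\n1,a\n2,b\n2,b\n"

# Each header maps to an array of the values seen in that column.
uniques = Hash.new { |hash, key| hash[key] = [] }

CSV.parse(csv_text, :headers => true).each do |row|
  # Row#each yields [header, value] pairs, so take both parts separately
  # instead of treating the pair as a single element.
  row.each do |header, value|
    uniques[header] << value
  end
end

uniques.each_value { |values| values.uniq! }

# Build the report lines in the requested "Column: ..." layout.
lines = []
uniques.each do |header, values|
  lines << "Column: #{header}"
  values.each { |value| lines << " #{value}" }
end

puts lines
```

To write to unique_output.txt as in the original script, the final puts can be moved inside a File.open block.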
 
James Edward Gray II

Drew Olson wrote:
What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
UniqueVal1
UniqueVal2
Column: ColHeader2
UniqueVal1
UniqueVal2
...

Well, if it all fits in memory it's super easy using FCSV's Tables:

#!/usr/bin/env ruby -w

require "rubygems"
require "faster_csv"

table = FCSV.parse(DATA.read, :headers => true)
table.by_col!.each do |header, col|
  puts "#{header}:"
  puts " #{col.uniq.join(', ')}"
end

__END__
nums,letters
1,a
2,b
2,b
3,c
3,c
3,c

James Edward Gray II
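
For anyone who wants the table approach to produce the exact "Column: ..." layout from the question and write it to a file, here is a minimal sketch. It assumes Ruby's standard csv library rather than the faster_csv gem, and write_unique_columns is a hypothetical helper name, not something from either library. The file names mirror the ones in the question but live in a throwaway temporary directory.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: writes one "Column: <header>" line per column,
# followed by that column's unique values, one per line.
def write_unique_columns(csv_path, out_path)
  table = CSV.read(csv_path, :headers => true)
  File.open(out_path, "w") do |out|
    # In column mode, each yields a header and the array of that column's values.
    table.by_col.each do |header, col|
      out.puts "Column: #{header}"
      col.uniq.each { |value| out.puts " #{value}" }
    end
  end
end

# Demonstration with made-up data in a temporary directory.
result = Dir.mktmpdir do |dir|
  infile = File.join(dir, "xyz.csv")
  outfile = File.join(dir, "unique_output.txt")
  File.write(infile, "nums,letters\n1,a\n2,b\n2,b\n3,c\n")
  write_unique_columns(infile, outfile)
  File.read(outfile)
end

puts result
```

Like the table example above, this reads the whole file into memory, so it only suits inputs that fit comfortably in RAM.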
 
William James

Drew said:
What I want to do is read in a CSV file and produce an output which
lists the unique values from each column in the following format:

Column: ColHeader1
UniqueVal1
UniqueVal2
Column: ColHeader2
UniqueVal1
UniqueVal2
..

data = DATA.readlines.map{|s| s.chomp.split(",")}
header = data.shift.map{|s| "Column: " + s}

data = data.transpose.map{|ary| ary.uniq.map{|s| " " + s} }

puts header.zip(data)


__END__
It's,so,simple!
a,b,c
a,b,c
d,e,f
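
One caveat with splitting on "," by hand: a quoted field that itself contains a comma gets cut in two. A hedged variant of the same transpose-and-zip idea, assuming Ruby's standard csv library does the parsing and using made-up sample data, might look like this:

```ruby
require "csv"

# Made-up sample in which one field contains a comma inside quotes.
text = <<~CSV_TEXT
  name,city
  "Smith, Jane",Oslo
  "Smith, Jane",Lima
CSV_TEXT

# CSV.parse returns an array of row arrays and keeps quoted commas inside fields.
rows = CSV.parse(text)
header = rows.shift.map { |s| "Column: #{s}" }

# Turn rows into columns, then drop repeated values within each column.
# transpose needs every row to have the same number of fields.
columns = rows.transpose.map { |ary| ary.uniq.map { |s| " #{s}" } }

# Pair each header line with its column and flatten into one list of lines.
output = header.zip(columns).flatten

puts output
```

The structure is otherwise unchanged: shift off the header row, transpose, uniq each column, and zip the headers back on.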
 
