Regexp for CSV header

Paul Shapiro · Jun 17, 2009

My script currently processes various CSV files. The top row (the
header) looks like this:

Device ID,1) S31 Which best describes how you answered the online
reading comprehension quiz?,2) S32 Which best describes how you answered
the online timed retrieval quiz?,3) B19. If you want your product to be
easy to find in the supermarket then you should make its container,"4)
C19. So that he can shift attention between the radio and his
incessantly talking girl friend when she is in the car, Joe adjusts his
radio",5) B20. Early selection is most likely to occur for,6) C20.
Early selection for a red target is most likely to occur when there
is,"7) B21. In a lexical decision task, when the target is a bird name,
e.g. robin, it is usually preceded by the prime BODY but is sometimes
preceded by the prime BIRD."

Most of the headers begin with a numbered prefix such as '1)', '5)',
etc. I need to remove these prefixes from the CSV files. Another problem
I've run into is that some of the headers are enclosed in double quotes,
like '"4) C19. So that he can shift attention between the radio and his
incessantly talking girl friend when she is in the car, Joe adjusts his
radio", 5) B20'

I have tried converting the top row from an array to a string and then
calling gsub(/[\d]+\)/,''). This kinda works, but it can't deal with the
double quote problem. It also leaves whitespace behind where the prefix
was, which I don't want. Also, I can't figure out how to put the result
back into the array as it was and then write it back to the CSV.

Help would be appreciated. Thanks.
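
To make the two problems described above concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where the standard `csv` library is available, and it uses a shortened, made-up header line in the same shape as the real one. The helper name `strip_numbering` is hypothetical. Splitting on commas cuts the quoted field in two, while a CSV parser keeps it whole, and anchoring the pattern with `\A` plus a trailing `\s*` removes the prefix together with the space after it.

```ruby
require "csv"

# Hypothetical helper: strip a leading "12) " style prefix from one field.
# \A anchors the match to the start of the field, so a "2)" that appears
# later in the question text is left alone; \s* swallows the space after it.
def strip_numbering(field)
  field.sub(/\A\d+\)\s*/, "")
end

# A shortened, made-up header line (not the real data from the post).
line = %q{Device ID,1) S31 Which quiz?,"4) C19. In the car, Joe adjusts his radio",5) B20. Early selection}

naive  = line.split(",")        # ignores quoting, breaks the quoted field apart
parsed = CSV.parse_line(line)   # honours quoting, keeps the field in one piece

cleaned = parsed.map { |f| strip_numbering(f) }

puts "split on commas: #{naive.length} fields"
puts "CSV parser:      #{parsed.length} fields"
puts cleaned
```

The per-field approach also answers the "how do I get it back into the array" question: the data never leaves the array, so there is nothing to reassemble.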

James Gray · Jun 17, 2009

Paul said:
My script currently is processing various csv files.

I recommend using a CSV parser so it can worry about all of those
little details for you. Here's an example script to give you ideas:

#!/usr/bin/env ruby -wKU

require "rubygems"
require "faster_csv"

# read a line of CSV
fields = FCSV.parse_line(DATA.read)

# edit the fields
fields.each do |f|
  f.sub!(/\A\d+\)\s*/, "")
end

# show fields
puts fields

# write back out as CSV
puts FCSV.generate_line(fields)

__END__
Device ID,1) S31 Which best describes how you answered the online
reading comprehension quiz?,2) S32 Which best describes how you
answered the online timed retrieval quiz?,3) B19. If you want your
product to be easy to find in the supermarket then you should make its
container,"4) C19. So that he can shift attention between the radio
and his incessantly talking girl friend when she is in the car, Joe
adjusts his radio",5) B20. Early selection is most likely to occur
for,6) C20. Early selection for a red target is most likely to occur
when there is,"7) B21. In a lexical decision task, when the target is
a bird name, e.g. robin, it is usually preceded by the prime BODY but
is sometimes preceded by the prime BIRD."

Hope that helps.

James Edward Gray II
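
For readers on Ruby 1.9 or later, where FasterCSV's code became the standard `csv` library, the same parse, edit, and regenerate round trip might look like the sketch below. The helper name `clean_header_line` and the sample header are made up for illustration. It also guards against empty columns, which the parser returns as `nil` and which would make a bare `sub!` call raise.

```ruby
require "csv"

# Leading "7) " style prefix, anchored to the start of the field.
NUMBERING = /\A\d+\)\s*/

# Hypothetical helper: take one raw CSV header line, strip the numbering
# from every field, and return a properly quoted CSV line again.
def clean_header_line(line)
  fields  = CSV.parse_line(line)
  # Empty columns come back as nil, so leave those untouched.
  cleaned = fields.map { |f| f.nil? ? f : f.sub(NUMBERING, "") }
  # generate_line re-adds quotes around any field that contains a comma.
  CSV.generate_line(cleaned)
end

# Made-up sample with an empty second column and a quoted field.
header = %q{Device ID,,"7) B21. In a task, e.g. robin",5) B20. Early selection} + "\n"
puts clean_header_line(header)
```

Because the quoting is regenerated on output, the field that contains commas is written back inside double quotes without any extra handling.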

Paul Shapiro · Jun 18, 2009

#!/usr/bin/env ruby

require 'rubygems'
require 'roo'
require 'csv'
require 'fileutils'
require 'rio'
require 'fastercsv'

FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/xls"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/tmp"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/csv"

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/*.xls"]
for file in @filesxls
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/xls")
end

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls"]
@filetmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filesxls
  convert = Excel.new(file)
  convert.default_sheet = convert.sheets[0]
  convert.to_csv(file+"_tmp")
end

@filestmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filestmp
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/tmp")
end

dir = "/Users/pshapiro/Desktop/Excel/tmp/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/tmp/*.csv"]

for file in @filescsv
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/csv")
end

FileUtils.rm_rf("/Users/pshapiro/Desktop/Excel/tmp")

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  5.times {
    text=""
    File.open(file,"r"){|f|f.gets;text=f.read}
    File.open(file,"w+"){|f| f.write(text)}
  }
end

dir = "/Users/pshapiro/Desktop/Excel/csv/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*') + ".tmp"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  lastc = csv.headers.length-1
  # puts lastc
  rio(file).csv.skipcolumns(1..2,lastc) > rio(file+".csv").csv(',')
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  FileUtils.remove(file)
end

dir = "/Users/pshapiro/Desktop/Excel/csv"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

2.times {
  files = Dir.entries(dir)
  files.each do |f|
    next if f == "." or f == ".."
    oldFile = dir + "/" + f
    newFile = dir + "/" + File.basename(f, '.*')
    File.rename(oldFile, newFile)
  end
}

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

#####################################

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  csv = csv.to_s
  fields = FCSV.parse_line(csv)

  fields.each do |f|
    f.sub!(/[\d]+\)+[\s]/,'')
  end

  puts fields

  wline = FCSV.generate_line(fields)
  astring = rio(file).contents
  rio(file).csv.print(astring).close

  text=""
  File.open(file,"r"){|f|f.gets;text=f.read}
  File.open(file,"w+"){|f| f.write(text)}

  astring = rio(file).contents
  rio(file).csv.print(wline+astring).close
end

Again, Thanks!!!!!!!!!!
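
The last loop of the script rewrites each file several times to swap in the cleaned header. As an alternative sketch, not the approach used above, the same effect can probably be had in one read and one write with the standard `csv` library (Ruby 1.9 or later assumed). The helper name `strip_header_numbering!` is hypothetical, and the sketch assumes each file is small enough to hold in memory.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: clean only the first row of a CSV file in place.
# Data rows are written back unchanged, even if they start with "1) ".
def strip_header_numbering!(path)
  rows = CSV.read(path)                 # array of arrays, header row first
  return if rows.empty?
  # Anchored pattern; nil (empty) header cells are left as they are.
  rows[0] = rows[0].map { |f| f && f.sub(/\A\d+\)\s*/, "") }
  CSV.open(path, "w") { |out| rows.each { |row| out << row } }
end

# Demo on a throwaway file so no real data is touched.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.csv")
  File.write(path, %Q{Device ID,1) S31 Which quiz?,"4) C19. In the car, Joe"\n17,a,b\n})
  strip_header_numbering!(path)
  puts File.read(path)
end
```

To apply it to a directory of files, one would call the helper on each path returned by `Dir["…/*.csv"]`, with the glob pointing at wherever the CSV files live.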
