Regexp for CSV header

Paul Shapiro

My script currently is processing various csv files. The top row/header
resembles this format:

Device ID,1) S31 Which best describes how you answered the online
reading comprehension quiz?,2) S32 Which best describes how you answered
the online timed retrieval quiz?,3) B19. If you want your product to be
easy to find in the supermarket then you should make its container,"4)
C19. So that he can shift attention between the radio and his
incessantly talking girl friend when she is in the car, Joe adjusts his
radio",5) B20. Early selection is most likely to occur for,6) C20.
Early selection for a red target is most likely to occur when there
is,"7) B21. In a lexical decision task, when the target is a bird name,
e.g. robin, it is usually preceded by the prime BODY but is sometimes
preceded by the prime BIRD."

Most of the headers begin with '1)', '5)', etc. I need to remove these
prefixes from the csv files. Another problem I've encountered while doing
this is that some of the headers are enclosed in double quotes, like
'"4) C19. So that he can shift attention between the radio and his
incessantly talking girl friend when she is in the car, Joe adjusts his
radio", 5) B20'

I have tried converting the top row from an array to a string and then
calling gsub(/[\d]+\)/,''). This kind of works, but it can't deal with
the double quote problem. It also leaves whitespace where the prefix was,
which I don't want. Also, I can't figure out how to put the result back
into an array as it was and then write it back to the csv.

Help would be appreciated. Thanks.
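
To make the problem concrete, here is a small sketch of the string-level gsub described above. The header row is shortened, and the "2) or 3)" inside the second field is invented purely to show the failure mode; none of this is from the real files.

```ruby
# Hypothetical, shortened header row. The "2) or 3)" inside the second
# field is made up to show what an unanchored pattern does.
header = 'Device ID,1) S31 Pick option 2) or 3),"4) C19. In the car, Joe adjusts"'

# The pattern from the post, applied to the whole row as one string.
naive = header.gsub(/[\d]+\)/, "")
puts naive

# Splitting on commas afterwards cuts the quoted field at its embedded
# comma, so there are more pieces than there are real fields.
pieces = naive.split(",")
puts pieces.length
```

The substitution removes every "N)" in the row, not just the ones at the start of a field, and it leaves the space that followed each prefix. Splitting the result on commas also breaks the quoted field apart, which is probably the double quote problem.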
 
James Gray

My script currently is processing various csv files.

I recommend using a CSV parser so it can worry about all of those
little details for you. Here's an example script to give you ideas:

#!/usr/bin/env ruby -wKU

require "rubygems"
require "faster_csv"

# read a line of CSV
fields = FCSV.parse_line(DATA.read)

# edit the fields
fields.each do |f|
  f.sub!(/\A\d+\)\s*/, "")
end

# show fields
puts fields

# write back out as CSV
puts FCSV.generate_line(fields)

__END__
Device ID,1) S31 Which best describes how you answered the online
reading comprehension quiz?,2) S32 Which best describes how you
answered the online timed retrieval quiz?,3) B19. If you want your
product to be easy to find in the supermarket then you should make its
container,"4) C19. So that he can shift attention between the radio
and his incessantly talking girl friend when she is in the car, Joe
adjusts his radio",5) B20. Early selection is most likely to occur
for,6) C20. Early selection for a red target is most likely to occur
when there is,"7) B21. In a lexical decision task, when the target is
a bird name, e.g. robin, it is usually preceded by the prime BODY but
is sometimes preceded by the prime BIRD."

Hope that helps.

James Edward Gray II
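
For readers on Ruby 1.9 or later, where FasterCSV became the standard csv library, the same idea might look like the sketch below. The helper name strip_question_numbers and the shortened header row are assumptions made for illustration; only the regular expression comes from the script above.

```ruby
require "csv"

# Hypothetical helper: strip a leading "N) " question number from each
# field. nil fields (empty CSV cells) are passed through untouched.
def strip_question_numbers(fields)
  fields.map { |f| f.nil? ? f : f.sub(/\A\d+\)\s*/, "") }
end

# Shortened, made-up header row in the same shape as the one in the question.
header = 'Device ID,1) S31 Which best describes it?,' \
         '"4) C19. When she is in the car, Joe adjusts his radio",' \
         '5) B20. Early selection'

fields  = CSV.parse_line(header)          # the parser handles the quoting
cleaned = strip_question_numbers(fields)  # edit each field separately
puts CSV.generate_line(cleaned)           # re-quotes fields that need it
```

Because the pattern is anchored with \A and applied per field, a number followed by ")" in the middle of a question is left alone, and the trailing \s* takes the following space with it.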
 
Paul Shapiro

#!/usr/bin/env ruby

require 'rubygems'
require 'roo'
require 'csv'
require 'fileutils'
require 'rio'
require 'fastercsv'

FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/xls"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/tmp"
FileUtils.mkdir_p "/Users/pshapiro/Desktop/Excel/csv"

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/*.xls"]
for file in @filesxls
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/xls")
end

@filesxls = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls"]
@filetmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filesxls
  convert = Excel.new(file)
  convert.default_sheet = convert.sheets[0]
  convert.to_csv(file+"_tmp")
end

@filestmp = Dir["/Users/pshapiro/Desktop/Excel/xls/*.xls_tmp"]

for file in @filestmp
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/tmp")
end

dir = "/Users/pshapiro/Desktop/Excel/tmp/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/tmp/*.csv"]

for file in @filescsv
  FileUtils.move(file,"/Users/pshapiro/Desktop/Excel/csv")
end

FileUtils.rm_rf("/Users/pshapiro/Desktop/Excel/tmp")

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  5.times {
    text=""
    File.open(file,"r"){|f|f.gets;text=f.read}
    File.open(file,"w+"){|f| f.write(text)}
  }
end

dir = "/Users/pshapiro/Desktop/Excel/csv/"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*') + ".tmp"
  File.rename(oldFile, newFile)
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  lastc = csv.headers.length-1
  # puts lastc
  rio(file).csv.skipcolumns(1..2,lastc) > rio(file+".csv").csv(',')
end

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.tmp"]

for file in @filescsv
  FileUtils.remove(file)
end

dir = "/Users/pshapiro/Desktop/Excel/csv"
files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + File.basename(f, '.*')
  File.rename(oldFile, newFile)
end

2.times {
  files = Dir.entries(dir)
  files.each do |f|
    next if f == "." or f == ".."
    oldFile = dir + "/" + f
    newFile = dir + "/" + File.basename(f, '.*')
    File.rename(oldFile, newFile)
  end
}

files = Dir.entries(dir)
files.each do |f|
  next if f == "." or f == ".."
  oldFile = dir + "/" + f
  newFile = dir + "/" + f + ".csv"
  File.rename(oldFile, newFile)
end

#####################################

@filescsv = Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"]

for file in @filescsv
  csv = FasterCSV.read(file, :headers => true)
  csv = csv.to_s
  fields = FCSV.parse_line(csv)

  fields.each do |f|
    f.sub!(/[\d]+\)+[\s]/,'')
  end

  puts fields

  wline = FCSV.generate_line(fields)
  astring = rio(file).contents
  rio(file).csv.print(astring).close

  text=""
  File.open(file,"r"){|f|f.gets;text=f.read}
  File.open(file,"w+"){|f| f.write(text)}

  astring = rio(file).contents
  rio(file).csv.print(wline+astring).close
end

Again, Thanks!!!!!!!!!!
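
For anyone adapting the last section of the script, the header rewrite could probably be done in one pass with the standard csv library instead of reading, trimming, and re-prepending the first line. This is only a sketch under assumptions: clean_header! is a hypothetical helper, and it uses the anchored pattern from James's example rather than the unanchored one in the script above.

```ruby
require "csv"

# Hypothetical helper: rewrite only the header row of the CSV file at
# +path+, stripping a leading "N) " from each header field.
def clean_header!(path)
  rows = CSV.read(path)   # every row as an array of fields
  return if rows.empty?

  # Only the first row is edited; nil fields (empty cells) stay nil.
  rows[0] = rows[0].map { |f| f && f.sub(/\A\d+\)\s*/, "") }

  # Write all rows back; the library re-applies quoting where needed.
  CSV.open(path, "w") do |out|
    rows.each { |row| out << row }
  end
end
```

Usage would be something like Dir["/Users/pshapiro/Desktop/Excel/csv/*.csv"].each { |file| clean_header!(file) }. Note that the whole file is re-serialized, so quoting in the data rows may be normalized even though their values are unchanged.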
 
