Nuby problem w/CSV, tab-delimited files & embedded double-quotes

rpardee

Hey All,

I've got a file of tab-delimited data that I need to read in. Up until
today this approach has worked wonderfully:

this_file = CSV.open(decrypted_file, "r", "\t")
header = this_file.shift
this_file.each do |line|
  # do stuff w/line here
end
this_file.close

But today's file has an entry w/a pair of double-quotes around it. So
now I get:

c:/program files/ruby/lib/ruby/1.8/CSV.rb:639:in `get_row':
CSV::IllegalFormatError (CSV::IllegalFormatError)
from c:/program files/ruby/lib/ruby/1.8/CSV.rb:556:in `each'

I've looked through the rubydocs on CSV & am not finding a method for
telling CSV to expect double-quotes in the file. Is there such a
thing?

Thanks!

-Roy

P.S. I believe the following illustrates the problem--the "F" street
entry line does not seem to parse:

require "CSV" # Lib for working with comma-separated-values files

somedata = <<END_OF_FILE
userid line1
1-2700 1313 Mockingbird Lane
2-2706 7100 58th Ave SE
4-2718 128 S. "F" Street
3-2712 45 600th Ave. NE
END_OF_FILE

somedata.each_line do |l|
  x = CSV.parse_line(l, "\t")[0]
  puts x
end

puts "Finished!"
 
Maik Schmidt

The problem is that there's no CSV standard. The Ruby CSV library
requires you to put values containing quotes into quotes themselves. In
addition, you have to double the quotes within the quotes, i.e.

4-2718 "128 S. ""F"" Street"

will do it.
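A small sketch of that rule, in case it helps. It assumes a current Ruby, where the separator is passed as the `col_sep:` keyword instead of the positional `"\t"` used earlier in this thread, and where the error class is `CSV::MalformedCSVError` rather than `CSV::IllegalFormatError`. The `liberal_parsing:` option shown at the end only exists in newer versions of the library, so treat that part as an assumption about your setup.

```ruby
require "csv"

# Second field is wrapped in quotes and its embedded quotes are doubled.
good = %Q{4-2718\t"128 S. ""F"" Street"}
fields = CSV.parse_line(good, col_sep: "\t")
puts fields[1]   # 128 S. "F" Street

# The raw line from the file: bare quotes inside an unquoted field.
bad = %Q{4-2718\t128 S. "F" Street}
begin
  CSV.parse_line(bad, col_sep: "\t")
  rejected = false
rescue CSV::MalformedCSVError
  rejected = true
end
puts "raw line rejected: #{rejected}"

# Newer CSV versions can be asked to tolerate stray quotes in unquoted fields.
lenient = CSV.parse_line(bad, col_sep: "\t", liberal_parsing: true)
puts lenient[1]
```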

Cheers

Maik
 
Ara.T.Howard

i think that, to be legitimate, this file would have to be
4-2718 "128 S. ""F"" Street"

so the data is corrupt and may need to be pre-munged. if your file is always
tab delimited why not

table = []
parse = proc{|line| line.split(%r/\t/).map{|c| c.strip}}
# IO::foreach yields each line to the block (IO::readlines ignores a block)
IO::foreach(decrypted_file){|line| table << parse[line]}
header = table.shift

or do you sometimes have escaped tabs?
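if the CSV library is still wanted, the pre-munging mentioned above might look roughly like this. it is only a sketch: `quote_field` and `munge_line` are made-up helper names, the `col_sep:` keyword assumes a current Ruby, and the whole thing assumes that no field in the file is already quoted (an already-quoted field would get quoted a second time).

```ruby
require "csv"

# Hypothetical helper: wrap a field in quotes and double its embedded
# quotes, but only when the field actually contains a double quote.
def quote_field(field)
  return field unless field.include?('"')
  '"' + field.gsub('"', '""') + '"'
end

# Hypothetical helper: rewrite one tab-delimited line into the form
# the CSV library accepts. The -1 limit keeps trailing empty fields.
def munge_line(line)
  line.chomp.split("\t", -1).map { |f| quote_field(f) }.join("\t")
end

raw   = "4-2718\t128 S. \"F\" Street\n"
fixed = munge_line(raw)
row   = CSV.parse_line(fixed, col_sep: "\t")
puts row[1]   # 128 S. "F" Street
```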

cheers.

-a
 
rpardee

Woah--you are giving me the freak-out with that code. 8^)

I shouldn't have any escaped tabs--so that should suit.

Thanks!

-Roy

 
rpardee

Come to think of it--I don't really need CSV at all for this. I can
just .each_line the file to get my rows and then .split("\t") each line
to get my fields. Or whatever the heck I'm trying to say...
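Roughly what I mean, as a sketch on the sample data from my first post. It assumes there are real tabs between the fields and that no field contains a tab itself; the `records` hash step is just my own illustration and needs a Ruby new enough to have `Array#to_h`.

```ruby
somedata = "userid\tline1\n" \
           "1-2700\t1313 Mockingbird Lane\n" \
           "4-2718\t128 S. \"F\" Street\n" \
           "3-2712\t\n"

# chomp drops the newline; the -1 limit keeps trailing empty fields,
# which a plain split("\t") would silently discard.
rows   = somedata.each_line.map { |l| l.chomp.split("\t", -1) }
header = rows.shift

# Pair each row with the header names.
records = rows.map { |fields| header.zip(fields).to_h }
puts records[1]["line1"]   # 128 S. "F" Street
```

For a real file, `File.foreach(decrypted_file)` could stand in for `somedata.each_line`.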

Thanks again everyone!

-Roy
 
