This is the best I've come up with so far. It should handle any CSV
record (i.e., fields may contain commas, double quotes, and newlines).
class String
  def csv
    if include? '"'
      # Quoted case: consume one field at a time; group 1 captures
      # quoted fields (with "" escapes), group 2 unquoted ones. \G
      # keeps the matches contiguous, so any leftover in $' means
      # the record is malformed.
      ary =
        "#{chomp},".scan( /\G"([^"]*(?:""[^"]*)*)",|\G([^,"]*),/ )
      raise "Bad csv record:\n#{self}" if $' != ""
      ary.map{ |a| a[1] || a[0].gsub(/""/, '"') }
    else
      ary = chomp.split( /,/, -1 )
      ## "".csv ought to be [""], not [], just as
      ## ",".csv is ["",""].
      if [] == ary
        [""]
      else
        ary
      end
    end
  end
end
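A few quick checks (my own examples, not from the original post), run
against the method above, covering the tricky cases:

p 'a,"b,""c"",d",e'.csv   #=> ["a", "b,\"c\",d", "e"]
p ",".csv                 #=> ["", ""]
p "".csv                  #=> [""]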
You are pretty much rewriting FasterCSV here. Why do that when we
could just use it instead?
That is a dishonest comment.
Not honest? I guess I'm not sure how you meant that.
FasterCSV's parser uses a very similar regular expression. Quoting
from the source:
# prebuild Regexps for faster parsing
@parsers = {
  :leading_fields =>
    /\A(?:#{Regexp.escape(@col_sep)})+/,      # for empty leading fields
  :csv_row =>
    ### The Primary Parser ###
    / \G(?:^|#{Regexp.escape(@col_sep)})      # anchor the match
      (?: "((?>[^"]*)(?>""[^"]*)*)"           # find quoted fields
          |                                   # ... or ...
          ([^"#{Regexp.escape(@col_sep)}]*)   # unquoted fields
      )/x,
    ### End Primary Parser ###
  :line_end =>
    /#{Regexp.escape(@row_sep)}\z/            # safer than chomp!()
}
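To see that primary pattern in action on its own, here is a standalone
recreation (the literal col_sep and the sample row are my own, not
FasterCSV code):

col_sep = ","
csv_row = /
  \G(?:^|#{Regexp.escape(col_sep)})      # anchor the match
  (?: "((?>[^"]*)(?>""[^"]*)*)"          # find quoted fields
      |                                  # ... or ...
      ([^"#{Regexp.escape(col_sep)}]*)   # unquoted fields
  )/x
p 'a,"b,""c""",d'.scan(csv_row)
#=> [[nil, "a"], ["b,\"\"c\"\"", nil], [nil, "d"]]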
I felt they were similar enough to say you were recreating it. I can
live with it if you don't agree though.
What if someone had said to you when you released "FasterCSV":
"You are pretty much rewriting CSV here. Why do that when we
could just use it instead?"
They did. I said it was too slow and I didn't care for the
interface, though some do prefer it. Pretty much what you just said
to me, so I look forward to using your EvenFasterCSV library on my
next project.
Parsing CSV isn't very difficult.
Yeah, it's not too tough.
I'm a little bothered by how your solution makes me slurp the data
into a String though. Today I was working with a CSV file with over
35,000 records in it, so I'm not too comfortable with that. You
might consider adding a little code to ease that.
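For what it's worth, here is a minimal sketch of record-at-a-time
reading built on the String#csv method above; the helper name, the
quote-balancing heuristic, and the file name are assumptions for
illustration, not code from the post:

def each_csv_record(io)
  buffer = ""
  io.each_line do |line|
    buffer << line
    # an odd number of double quotes means a quoted field is still
    # open, so keep reading ("" escapes add two and stay balanced)
    next if buffer.count('"').odd?
    yield buffer.csv
    buffer = ""
  end
  yield buffer.csv unless buffer.empty?
end

File.open("records.csv") do |file|   # hypothetical file name
  each_csv_record(file) { |row| p row }
end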
Also, I really prefer to work with CSV by headers, instead of column
indices. That's easier and more robust, in my opinion. You might
want to add some code for that too.
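A minimal sketch of header-based access layered on String#csv (the
file name and the "name" column are hypothetical, and for brevity this
assumes no record spans multiple lines):

File.open("people.csv") do |file|
  headers = file.gets.csv
  file.each_line do |line|
    row = Hash[headers.zip(line.csv)]
    puts row["name"]
  end
end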
Of course, then we're just getting closer and closer to FasterCSV, so
maybe not...
"FasterCSV" is too slow and far too large.
FasterCSV is mostly interface code to make the user experience as
nice as possible. There's also a lot of documentation in there. The
core parser is still way smaller than the standard library's parser.
James Edward Gray II