Ryan said:
There is test_csv.rb in the ruby tarball. Can you run your new code
against it to make sure it is complete? With good profile numbers I
doubt it'd be hard to get the slower code replaced.

Wow. test_csv.rb is beyond my comprehension. I don't know how
to use it.
I did lift a very complex test string from it to use in testing
my program. One of the fields in that csv string is defective;
I don't know whether that was intentional or not:
"\r\n"\r\nNaHi,
The " inside the field isn't doubled, and the field doesn't end
with a closing quote.
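
To make the defect concrete, here is a minimal sketch of a well-formedness check for a single quoted field. The helper name well_formed_quoted? is hypothetical and is not part of the posted program; it only assumes the usual csv rule that a quoted field starts and ends with a quote and that every quote in between is doubled.

```ruby
# Hypothetical helper, not part of the posted program.
# A quoted field is treated as well formed when it starts and ends with
# a quote and every quote in between is doubled.
def well_formed_quoted?(field)
  return false unless field.length >= 2 &&
                      field.start_with?('"') && field.end_with?('"')
  inner = field[1..-2]
  # Remove every doubled quote; any quote left over is unescaped.
  !inner.gsub('""', '').include?('"')
end

defective = "\"\r\n\"\r\nNaHi"        # the field as lifted from the test string
corrected = "\"\r\n\"\"\r\nNaHi\""    # inner quote doubled, closing quote added

p well_formed_quoted?(defective)
p well_formed_quoted?(corrected)
```

The first call reports the field as malformed and the second accepts the repaired version.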
Incidentally, when my program converts that string to an array
and then back to a csv string, the result is not identical to
the original string, because an empty quoted field ( ,"", ) is
written back as an unquoted empty field ( ,, ).
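
The reason is easy to see in isolation. The sketch below uses a hypothetical stand-in, quote_if_needed, for the quoting rule in to_csv, with the separator fixed to a comma by default. An empty string triggers none of the quoting conditions, so it is emitted bare.

```ruby
# Hypothetical stand-in for the quoting rule used by to_csv in the post.
# sep is the field separator (assumed to be a single character).
def quote_if_needed(str, sep = ",")
  if str.include?(sep) || str =~ /\A\s|"|\n|\s\z/
    '"' + str.gsub('"', '""') + '"'
  else
    str
  end
end

# Once parsed, an empty field is just "", whether it was quoted or not.
fields = ["a", "", "b"]
line = fields.map { |f| quote_if_needed(f) }.join(",")
p line
```

The output line has ,, in the middle: after parsing, nothing records that the empty field was originally quoted, so the writer cannot restore it.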
I corrected a minor bug in my code by moving
",".is_fs if $csv_fs.nil?
to its proper location.
The program conforms to the csv specification at this site:
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
and it handles the sample csv records shown there.
All my program can do is read a text file containing csv records,
convert those records (strings) into arrays of strings, and
convert the arrays back into csv strings. I suppose that the
csv library that comes with Ruby may do more than that.
% ## Read, parse, and create csv records.
% ## Has a minor bug fix; discard previous versions.
% ## 2005-02-01.
%
% class Array
%   def to_csv
%     ",".is_fs if $csv_fs.nil?
%     self.map { |item|
%       str = item.to_s
%       # Quote the string if it contains the field-separator or
%       # a " or a newline, or if it has leading or trailing whitespace.
%       if str.index($csv_fs) or /^\s|"|\n|\s$/.match(str)
%         str = '"' + str.gsub( /"/, '""' ) + '"'
%       end
%       str
%     }.join($csv_fs)
%   end
%   def unescape
%     self.map{|x| x.gsub( /""/, '"' ) }
%   end
% end
%
% class String
%   # Set regexp for parse_string.
%   # self is the field-separator, which must be
%   # a single character.
%   def is_fs
%     $csv_fs = self
%     if "^" == $csv_fs
%       fs = "\\^"
%     else
%       fs = $csv_fs
%     end
%     ## Assumes embedded quotes are escaped as "".
%     $csv_re = \
%       %r{ \s*
%         (?:
%           "( [^"]* (?: "" [^"]* )* )" |
%           ( .*? )
%         )
%         \s*
%         [#{fs}]
%       }mx
%   end
%
%   def parse_string
%     ",".is_fs if $csv_fs.nil?
%     (self + $csv_fs).scan( $csv_re ).flatten.compact.unescape
%   end
% end
%
% def get_rec( file )
%   $csv_s = ""
%   begin
%     if file.eof?
%       raise "The csv file is malformed." if $csv_s.size>0
%       return nil
%     end
%     $csv_s += file.gets
%   end until $csv_s.count( '"' ) % 2 == 0
%   $csv_s.chomp!
%   $csv_s.parse_string
% end
%
%
% # while rec = get_rec( ARGF )
% #   puts "----------------"
% #   puts $csv_s
% #   p rec
% #   puts rec.to_csv
% # end
%
% ## Here is my breakdown of the test string from test_csv.rb.
% # foo,
% # """foo""",
% # "foo,bar",
% # """""",
% # "",
% # ,
% # "\r",
% # "\r\n""\r\nNaHi", <---<< Corrected.
% # """Na""",
% # "Na,Hi",
% # "\r.\n",
% # "\r\n\n",
% # """",
% # "\n",
% # "\r\n"
%
% # Original.
% csvStr = ("foo,!!!foo!!!,!foo,bar!,!!!!!!,!!,," +
% "!\r!,!\r\n!\r\nNaHi,!!!Na!!!,!Na,Hi!," +
% "!\r.\n!,!\r\n\n!,!!!!,!\n!,!\r\n!").gsub('!', '"')
%
% # Corrected?
% csvStr = ("foo,!!!foo!!!,!foo,bar!,!!!!!!,!!,," +
% "!\r!,!\r\n!!\r\nNaHi!,!!!Na!!!,!Na,Hi!," +
% "!\r.\n!,!\r\n\n!,!!!!,!\n!,!\r\n!").gsub('!', '"')
%
% p csvStr
% arry = csvStr.parse_string
% p arry
% newCsvStr = arry.to_csv
% p newCsvStr
% arry2 = newCsvStr.parse_string
% puts "Arrays match." if arry == arry2
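
The record-reading idea in get_rec above (keep appending physical lines until the number of quote characters is even) can be tried without a file on disk. This is a minimal sketch under that same assumption. The name read_record is hypothetical, it uses a local string instead of the $csv_s global, and it returns the raw record text rather than a parsed array.

```ruby
require "stringio"

# Hypothetical helper: read one logical csv record from io, joining
# physical lines until the count of quote characters is even.
# Returns nil at end of input; raises if input ends inside a quoted field.
def read_record(io)
  record = String.new
  until io.eof?
    record << io.gets
    return record.chomp if record.count('"').even?
  end
  raise "The csv file is malformed." unless record.empty?
  nil
end

# Two records; the first contains a newline inside a quoted field.
io = StringIO.new("a,\"two\nlines\",c\nd,e,f\n")
while (rec = read_record(io))
  p rec
end
```

Counting quotes works because every quoted field contributes an even number of quote characters once it is closed, so an odd running total means the record continues on the next line.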