CSV Goes M17n

James Gray

I've just finished an extensive reworking of the standard CSV library in Ruby 1.9 (formerly FasterCSV). CSV's parser and generator are now m17n aware. This means they should work naturally with your data in any non-"dummy" Encoding Ruby 1.9 supports.

Everything is documented, so it should be pretty easy to figure out how to use the new system, but generally you just set the Encoding for your IO or String objects correctly and CSV should do the rest:

# reading example
CSV.foreach(…, :encoding => "…") do |row|
  # row will be parsed but not transcoded here
end

# writing example
CSV.open(…, "wb:…") do |csv|
  csv << data
  # data will be quoted and separated with characters
  # in the proper encoding
end

Encodings default to Encoding.default_external if not provided.
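
A minimal sketch of that round trip, for anyone who wants to try it. The ISO-8859-1 encoding, the temporary file name, and the sample rows are illustrative assumptions, not part of the original examples. It uses the newer `encoding:` keyword spelling of the same option.

```ruby
require "csv"
require "tmpdir"

# Hypothetical sample data. Every field is encoded as ISO-8859-1 up front,
# so it already matches the encoding of the file we write to.
rows = [["name", "city"], ["Jos\u00E9", "M\u00E1laga"]].map do |row|
  row.map { |field| field.encode("ISO-8859-1") }
end

# Assumed location: a throwaway file in a fresh temporary directory.
path = File.join(Dir.mktmpdir, "latin1.csv")

# Writing: the "wb:ISO-8859-1" mode string sets the file's external encoding.
CSV.open(path, "wb:ISO-8859-1") do |csv|
  rows.each { |row| csv << row }
end

# Reading: fields are parsed in the given encoding and are not transcoded.
read_back = []
CSV.foreach(path, encoding: "ISO-8859-1") do |row|
  read_back << row
end

field = read_back[1][0]
puts field.encoding.name     # the name of the encoding the field is tagged with
puts field.encode("UTF-8")   # transcode explicitly only when you need to
```

The explicit `encode("UTF-8")` at the end is deliberate: since CSV leaves the fields in the encoding you asked for, any transcoding is a separate step you choose to do.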

I had to change quite a bit of code to support this. I tried to test well, but it's possible I introduced some new bugs. Please let me know if you find any issues.

I suspect this is probably one of the first fully m17n-compatible implementations, so I hope it can serve as a guide to others wanting to provide similar support in their libraries. I know I learned a ton just figuring out how to do this. Feel free to ask me questions about multi-encoding support. I'll sure try to answer them if I can.

Finally, here's some fun news to look forward to: even with the m17n support, CSV on Ruby 1.9 is over three times faster than FasterCSV on Ruby 1.8, thanks to the speed of the new VM and the switch to Oniguruma. Three cheers to the core team for giving us a much faster Ruby!

James Edward Gray II
 
Jeremy Hinegardner

Awesome James!

FasterCSV is under very heavy utilization over here and we're always glad you
made such a fine library.

enjoy,

-jeremy
 
