File.new and encoding

Achim Domma (SyynX Solutions GmbH) · Nov 29, 2005

Hi,

I'm still quite new to Ruby, but I have written a simple code generator.
The generator opens some files and combines them into a new one. The
resulting file is encoded as ISO-8859-1, but it looks like Ruby writes
a UTF-8 marker at the beginning of the file. Is that possible?

How can I tell Ruby which encoding to use when I write to text files?

Any pointers to documentation are welcome; I didn't find anything
useful using Google.

regards,
Achim

Robert Klemme · Nov 29, 2005

Achim said:
Hi,

I'm still quite new to Ruby, but I have written a simple code generator.
The generator opens some files and combines them into a new one. The
resulting file is encoded as ISO-8859-1, but it looks like Ruby writes
a UTF-8 marker at the beginning of the file. Is that possible?

What's a UTF-8 marker? I only know of the two-byte UTF-16 marker, but
AFAIK there is no marker for UTF-8. Did I miss something?

How can I tell Ruby which encoding to use when I write to text files?

Any pointers to documentation are welcome; I didn't find anything
useful using Google.

Encoding is not an easy issue with Ruby. I guess by default it uses the
default encoding of your environment, but you can specify certain
(Japanese) encodings with the command-line option -K. HTH

Kind regards

robert

nobu · Nov 29, 2005

Hi,

At Wed, 30 Nov 2005 00:17:29 +0900,
Robert Klemme wrote in [ruby-talk:167988]:

What's a UTF-8 marker? I only know of the two-byte UTF-16 marker, but
AFAIK there is no marker for UTF-8. Did I miss something?

It would be a UTF-8 encoded BOM, but Ruby itself never writes it
automatically.

Can't you show the code?

Achim Domma (SyynX Solutions GmbH) · Nov 29, 2005

It would be a UTF-8 encoded BOM, but Ruby itself never writes it
automatically. [...]
Can't you show the code?

While trying to reproduce the problem in a smaller example, I found
that I'm reading the BOM from one of my source files. Sorry for the
confusion. I'm doing something like:

File.open("target", "w") do |target|
  File.open("source", "r") do |source|
    source.each_line do |line|
      # ... some processing ...
      target.write(line)
    end
  end
end

The source seems to contain the BOM, and it is written to the target. Any
hint on how to strip the BOM?

regards,
Achim
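
One way to strip the BOM, sketched here under some assumptions: the source is UTF-8, the BOM can only appear at the very start of the file, and a Ruby version with String#b (2.0 or later) is available. The helper names strip_utf8_bom and copy_without_bom are made up for this example. The files are opened in binary mode so that the comparison is done on raw bytes.

```ruby
# The three bytes of a UTF-8 encoded byte order mark (BOM).
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: returns the line without a leading UTF-8 BOM.
def strip_utf8_bom(line)
  bytes = line.b # binary copy, so the comparison is byte-wise
  return line unless bytes.start_with?(UTF8_BOM)
  bytes[UTF8_BOM.bytesize..-1].force_encoding(line.encoding)
end

# Hypothetical helper: copies source_path to target_path and drops a BOM
# found at the start of the file. Only the first line is checked.
def copy_without_bom(source_path, target_path)
  File.open(target_path, "wb") do |target|
    File.open(source_path, "rb") do |source|
      source.each_line.with_index do |line, index|
        line = strip_utf8_bom(line) if index.zero?
        # ... some processing ...
        target.write(line)
      end
    end
  end
end
```

With several input files, the same check would be applied to the first line of each one, for example by calling strip_utf8_bom inside each File.open block.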

Alex Fenton · Nov 29, 2005

I'm doing something like:

File.open("target", "w") do |target|
  File.open("source", "r") do |source|
    source.each_line do |line|
      # ... some processing ...
      target.write(line)
    end
  end
end

Have you looked at 'iconv' in the standard library?

http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/classes/Iconv.html

Assuming all your input files were ISO-8859-1, and you wanted your output file in UTF-8, your example might look something like (untested):

require 'iconv'

File.open("target", "w") do |target|
  # Converts from ISO-8859-1 (input) to UTF-8 (output).
  Iconv.open('UTF-8', 'ISO-8859-1') do |converter|
    File.open("source", "r") do |source|
      source.each_line do |line|
        # ... some processing ...
        target.write(converter.iconv(line))
      end
    end
    # Flush any state the converter is still holding.
    target << converter.iconv(nil)
  end
end

Iconv should deal with BOMs, stripping them out or adding them in where necessary. I'm not sure whether it will complain if it finds a BOM mid-stream (as you open your second and subsequent input files). If so, you could just instantiate a new Iconv for each input.

HTH
alex
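
For readers on Ruby 1.9.2 or later, where encodings can be given in the File.open mode string, the same idea can be sketched without Iconv. This is not what the thread used. It assumes the source is UTF-8 (with or without a BOM) and the target should be ISO-8859-1, and the helper name transcode_to_latin1 is made up for this example.

```ruby
# Hypothetical helper: reads a UTF-8 source (BOM optional) and writes the
# content to the target as ISO-8859-1.
def transcode_to_latin1(source_path, target_path)
  # "w:ISO-8859-1" sets the external encoding of the output file, so strings
  # are transcoded to ISO-8859-1 as they are written.
  File.open(target_path, "w:ISO-8859-1") do |target|
    # "r:BOM|UTF-8" reads the file as UTF-8 and skips a leading BOM if present.
    File.open(source_path, "r:BOM|UTF-8") do |source|
      source.each_line do |line|
        # ... some processing ...
        target.write(line)
      end
    end
  end
end
```

Characters that have no ISO-8859-1 equivalent make the write raise Encoding::UndefinedConversionError, so a real generator would need to decide how to handle them.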
