Saving a UTF-8 file

Miquel Oliete

Hi All

I have a problem (newbie problem).

I don't know how to write a file using UTF-8 encoding. Can you help
me?

Thanks in advance

Kind regards

--

Miquel (a.k.a. Ton)
Linux User #286784
GPG Key : 4D91EF7F
Debian GNU/Linux (Linux Wolverine 2.6.14)

Welcome to the jungle, we got fun and games
Guns n' Roses


 
David Vallner

Paul said:
It isn't something you can specify in a
plain-text file.

Byte order mark?

A specification it is not, but generally a good hint. There are gotchas
though if you process it with software that's not Unicode-aware.

David Vallner


 
David Vallner

Austin said:
Not meaningful in UTF-8, since it's all a defined series of bytes
(it's always the same order on all platforms).

-austin

Yes, but it can be used as a "this file is UTF-8" marker by convention.
And cause problems in software that doesn't recognize the convention,
for added hilarity.

David Vallner


 
Austin Ziegler

Yes, but it can be used as a "this file is UTF-8" marker by convention.
And cause problems in software that doesn't recognize the convention,
for added hilarity.

It's a bad convention, because it adds meaningless bytes to the
beginning of a file. I'm not saying that an unadorned document is
better, but better to do something that has actual meaning than doing
a pointless BOM.

-austin
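
To make the disagreement above concrete, here is a minimal sketch of what a UTF-8 BOM looks like on disk and how a reader that knows the convention can skip it. It assumes Ruby 1.9 or later (the "BOM|UTF-8" open mode is not available in 1.8); the file name and sample text are made up for illustration.

```ruby
require "tmpdir"

# The UTF-8 "BOM" is just U+FEFF encoded as the three bytes EF BB BF.
bom = "\xEF\xBB\xBF".b

# Hypothetical file name inside a temporary directory.
path = File.join(Dir.mktmpdir, "with_bom.txt")

# Write a file that starts with the BOM, as some editors do.
File.binwrite(path, bom + "hello\n".b)

# A plain UTF-8 read keeps the BOM as an invisible first character.
naive = File.read(path, mode: "r:UTF-8")

# "BOM|UTF-8" asks Ruby to skip a leading BOM if one is present.
stripped = File.read(path, mode: "r:BOM|UTF-8")

puts naive.start_with?("\uFEFF")
puts stripped == "hello\n"
```

Software that reads the file the "naive" way sees an extra character before the first real one, which is the kind of gotcha David mentions.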
 
Dido Sevilla

Hi All

I have a problem (newbie problem).

I don't know how to write a file using UTF-8 encoding. Can you help
me?

Well, how are you storing the Unicode characters you are using
internally? If your Unicode string within Ruby is stored as an array
of ints (one per code point), then

File.open("output_file.utf8", "w") do |fp|
  # pack("U*") encodes each integer code point as UTF-8
  fp.puts(data.pack("U*"))
end

should be sufficient. If you have a Ruby string that uses some other
encoding (e.g. ISO-8859-1), then you must use the iconv library to
convert the string to UTF-8:

require 'iconv'

# Iconv.new takes the target encoding first, then the source encoding
cd = Iconv.new('utf-8', 'iso-8859-1')
File.open("output_file.utf8", "w") do |fp|
  fp.puts(cd.iconv(data))
end

When you do i18n, l10n, and m17n, strings become meaningless unless
they have an attached encoding.
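
For readers on a newer Ruby: from 1.9 onward every String carries its own encoding, and iconv was dropped from the standard library in 2.0, so the same two cases can be sketched with String#encode instead. This is only an illustration under those assumptions; the sample word and the temporary output path are made up.

```ruby
require "tmpdir"

# Case 1: an array of integer code points (c, a, f, é) packed into UTF-8.
from_codepoints = [99, 97, 102, 233].pack("U*")

# Case 2: the same word as ISO-8859-1 bytes (é is the single byte 0xE9),
# tagged with its real encoding and then transcoded to UTF-8.
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")
utf8 = latin1.encode("UTF-8")

# Hypothetical output location inside a temporary directory.
path = File.join(Dir.mktmpdir, "output_file.utf8")

# "w:UTF-8" opens for writing and marks the file's external encoding.
File.open(path, "w:UTF-8") do |fp|
  fp.puts(utf8)
end

# Inspect the raw bytes that ended up on disk.
written_bytes = File.binread(path).bytes
puts written_bytes.map { |b| format("%02X", b) }.join(" ")
```

In UTF-8 the é takes two bytes (C3 A9) rather than the single E9 byte of ISO-8859-1, which is why the string needs to know its source encoding before it can be converted.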
 
