Chinese encoded in UTF-8 and XML

Knackeback · Sep 25, 2003

Hi, I wrote an XML file with GNU Emacs 21.2.2, with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>¼»</chinese>
<chinese>ÄÎ</chinese>
</test>

and then I used "C-x RET f" and chose utf-8.
Then I typed "C-x C-s" to save my file.
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?
Next I tried to parse the file with xmllint, a
small XML parser program that comes with libxml2.
The parser complains that the second "chinese line" is not proper
UTF-8.

==>

uhu:4: error: Input is not proper UTF-8, indicate encoding !
<chinese>ÄÎ</chinese>
^
uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F
<chinese>ÄÎ</chinese>

It is interesting that the parser only grumbles about the second
Chinese line.

I'm anxious to see an explanation!

Andreas Prilop · Sep 25, 2003

Knackeback said:
Content-Type: text/plain; charset=big5

Hi, I wrote an XML file with GNU Emacs 21.2.2, with
Chinese character content encoded in UTF-8.
[...]
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?

Probably not.

uhu:4: error: Input is not proper UTF-8, indicate encoding !
uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F

It seems your text was Big5-encoded, not UTF-8-encoded.
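
To see why those four bytes cannot be UTF-8, here is a minimal Python sketch that inspects the bytes from the xmllint report. The helper names are illustrative only. It assumes Python's built-in "utf-8" and "big5" codecs, and it shows only that the bytes are consistent with Big5, not which character was originally typed.

```python
# Bytes reported by xmllint for line 4 of the file.
reported = bytes([0xC4, 0xCE, 0x3C, 0x2F])


def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return byte & 0xC0 == 0x80


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xC4 is 11000100: the 110xxxxx pattern announces a two-byte sequence ...
lead_is_two_byte = reported[0] & 0xE0 == 0xC0
# ... but 0xCE is 11001110, another lead byte rather than a continuation.
second_is_continuation = is_continuation(reported[1])

# The same two bytes form one double-byte character when read as Big5.
big5_text = reported[:2].decode("big5")

print(lead_is_two_byte, second_is_continuation, is_valid_utf8(reported))
print(len(big5_text))
```

The remaining two bytes, 0x3C and 0x2F, are simply the ASCII characters "<" and "/" that start the closing tag.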

Micah Cowan · Sep 26, 2003

Knackeback said:
Hi, I wrote an XML file with GNU Emacs 21.2.2, with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>¼»</chinese>
<chinese>ÄÎ</chinese>
</test>

and then I used "C-x RET f" and chose utf-8.
Then I typed "C-x C-s" to save my file.
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?
Next I tried to parse the file with xmllint, a
small XML parser program that comes with libxml2.
The parser complains that the second "chinese line" is not proper
UTF-8.

==>

From what I can tell, Emacs doesn't have a Chinese input method that
supports Unicode output... :-( I've had similar troubles with
Japanese. I've also noticed that for Greek, for example, there are
input methods that explicitly support Unicode and others that do
not.

-Micah

Stefan Monnier · Sep 26, 2003

and then I used "C-x RET f" and chose utf-8.

Then I typed "C-x C-s" to save my file.

[...]
From what I can tell, Emacs doesn't have a Chinese input method that
supports Unicode output... :-( ...I've had similar troubles with

But since he specified utf-8, Emacs should have complained rather than
silently using some other coding-system.
Please report the bug with M-x report-emacs-bug.

Stefan
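
Since the editor may save in a different coding system without warning, it can help to check the bytes on disk before parsing. The following is a rough Python sketch of such a check. The function name and the sample document are hypothetical: the sample assumes line 3 was saved as real UTF-8 and line 4 contains the raw bytes 0xC4 0xCE from the xmllint report, which may not match the actual file.

```python
from typing import Optional


def first_invalid_utf8_line(data: bytes) -> Optional[int]:
    """Return the 1-based number of the first line that is not valid UTF-8.

    Returns None when every line decodes cleanly. Splitting on b"\n" is
    safe because no UTF-8 multi-byte sequence contains the byte 0x0A.
    """
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            return number
    return None


# Hypothetical sample: line 3 is proper UTF-8, line 4 holds raw Big5 bytes.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<test>\n"
    b"<chinese>" + "中文".encode("utf-8") + b"</chinese>\n"
    b"<chinese>\xc4\xce</chinese>\n"
    b"</test>\n"
)

print(first_invalid_utf8_line(sample))
```

For a real file, the bytes could be read with pathlib.Path(...).read_bytes() and passed to the same function.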

Albert Chun-Chieh Huang · Sep 30, 2003

Knackeback said:
Hi, I wrote an XML file with GNU Emacs 21.2.2, with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>¼»</chinese>
<chinese>ÄÎ</chinese>
</test>

In my Gnus on Emacs 21.3, I saw the Chinese characters in Big5.
Maybe you should download the Mule-UCS package and install it. With the
package, I can just enter Big5-encoded Chinese characters, set the
coding system to utf-8, and get a UTF-8 encoded text file.
Download mule-ucs from ftp://ftp.m17n.org, and add the lines below
to your .emacs file. The Big5 to UTF-8 conversion is
defined in big5c-ucs.el, which is located in mule-ucs/lisp/big5conv.

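;; Make the Mule-UCS directories visible to `require'.
(add-to-list 'load-path "/path/to/your/mule-ucs/")
(add-to-list 'load-path "/path/to/your/mule-ucs/lisp")
;; big5c-ucs.el is in the big5conv subdirectory; this entry is only
;; needed if that file is not already on your load-path.
(add-to-list 'load-path "/path/to/your/mule-ucs/lisp/big5conv")

(require 'un-define)  ; Unicode/UTF-8 support from Mule-UCS
(require 'big5c-ucs)  ; Big5 to UCS conversion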

--
Chun-Chieh Huang, aka Albert | E-mail: jjhuang AT cm.nctu.edu.tw
黃俊傑 |
Department of Computer Science |
National Tsing Hua University | MIME/ASCII/PDF/PostScript are welcome!
HsinChu, Taiwan | NO MS WORD DOC FILE, PLEASE!
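
If the file has already been saved as Big5, it can also be converted outside Emacs. This is a different route from the Mule-UCS setup described above: a small Python sketch that assumes the input really is Big5 and relies on Python's built-in "big5" codec. The function name and sample text are hypothetical.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Decode Big5 bytes and re-encode the resulting text as UTF-8."""
    return data.decode("big5").encode("utf-8")


# Hypothetical sample text, not the characters from the original post.
sample_big5 = "中文".encode("big5")
converted = big5_to_utf8(sample_big5)

# The converted bytes now decode cleanly as UTF-8.
print(converted.decode("utf-8") == "中文")
```

The ASCII parts of an XML file, including the encoding="UTF-8" declaration, are byte-for-byte identical in both encodings, so only the Chinese text changes.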
