Chinese encoded in UTF-8 and XML

Knackeback

Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>¼»</chinese>
<chinese>ÄÎ</chinese>
</test>

and then I used "C-x RET f" and chose utf-8.
Then I typed "C-x C-s" to save my file.
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?
Then I tried to parse the file with xmllint, a
small XML parser program that comes with libxml2.
The parser complains that the second "chinese" line is not proper
UTF-8.

==>

uhu:4: error: Input is not proper UTF-8, indicate encoding !
<chinese>ÄÎ</chinese>
^
uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F
<chinese>ÄÎ</chinese>

It is interesting that the parser only complains about the second
"chinese" line.

I'm anxious to see an explanation!
 
Andreas Prilop

Knackeback said:
Content-Type: text/plain; charset=big5

Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
Chinese character content encoded in UTF-8.
[...]
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?

Probably not.
uhu:4: error: Input is not proper UTF-8, indicate encoding !
uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F

It seems your text was Big5-encoded, not UTF-8-encoded.
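
The byte-level reasoning can be checked outside Emacs. The following is only a minimal sketch in Python (not a tool anyone in this thread used); the helper name is_valid_utf8 is made up for illustration. It takes the four bytes from the xmllint message and shows that they fail a strict UTF-8 decode but decode cleanly as Big5.

```python
# Bytes reported by xmllint: 0xC4 0xCE, then "</" (0x3C 0x2F).
reported = bytes([0xC4, 0xCE, 0x3C, 0x2F])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xC4 is the lead byte of a two-byte UTF-8 sequence, so the next byte
# must be a continuation byte in the range 0x80-0xBF. 0xCE is not.
print(is_valid_utf8(reported))  # False

# Read as Big5, 0xC4 0xCE is a single two-byte character followed by "</".
as_big5 = reported.decode("big5")
print(len(as_big5), as_big5.endswith("</"))

# Re-encoding the decoded text shows what a real UTF-8 file would contain.
as_utf8 = as_big5.encode("utf-8")
print(as_utf8.hex(" "))
print(is_valid_utf8(as_utf8))
```

So the file on disk most likely still holds the raw Big5 bytes, even though the XML declaration says UTF-8.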
 
Micah Cowan

Knackeback said:
Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>¼»</chinese>
<chinese>ÄÎ</chinese>
</test>

and then I used "C-x RET f" and chose utf-8.
Then I typed "C-x C-s" to save my file.
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?
Then I tried to parse the file with xmllint, a
small XML parser program that comes with libxml2.
The parser complains that the second "chinese" line is not proper
UTF-8.

==>

FWICT, Emacs doesn't have a Chinese input method that supports
Unicode output... :-( ...I've had similar trouble with
Japanese. I've also noticed that for some scripts, e.g. Greek, there
are input methods that explicitly support Unicode and others that do
not.

-Micah
 
Stefan Monnier

and then I used "C-x RET f" and chose utf-8.
Then I typed "C-x C-s" to save my file.
[...]
FWICT, Emacs doesn't have a Chinese input method that supports
Unicode output... :-( ...I've had similar trouble with

But since he specified utf-8, Emacs should have complained rather than
silently using some other coding system.
Please report the bug with M-x report-emacs-bug.


Stefan
 
Albert Chun-Chieh Huang

Knackeback said:
Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>¼»</chinese>
<chinese>ÄÎ</chinese>
</test>

In my Gnus on Emacs 21.3, I saw the Chinese characters as Big5.
Maybe you should download and install the Mule-UCS package. With the
package, I can enter Big5-encoded Chinese characters, set the
coding system to utf-8, and get a UTF-8 encoded text file.
Download mule-ucs from ftp://ftp.m17n.org, and add the lines below
to your .emacs file. The Big5 to UTF-8 conversion is
defined in big5c-ucs.el, which is located in mule-ucs/lisp/big5conv.

(add-to-list 'load-path "/path/to/your/mule-ucs/")
(add-to-list 'load-path "/path/to/your/mule-ucs/lisp")

(require 'un-define)
(require 'big5c-ucs)
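
As a cross-check outside Emacs, the same Big5 to UTF-8 conversion can be sketched in a few lines of Python. This is only an illustration, not part of Mule-UCS; the function name big5_to_utf8 is made up, and the sample assumes the two characters in the original post were the raw Big5 byte pairs 0xBC 0xBB and 0xC4 0xCE.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Decode Big5 bytes and re-encode them as UTF-8 (hypothetical helper).

    Both steps are strict, so undecodable input raises UnicodeDecodeError
    instead of being written out silently.
    """
    return data.decode("big5").encode("utf-8")


# Assumption: the two sample elements from the original post as raw Big5 bytes.
sample = b"<chinese>\xbc\xbb</chinese>\n<chinese>\xc4\xce</chinese>\n"

converted = big5_to_utf8(sample)

# A strict decode confirms that the result really is well-formed UTF-8.
text = converted.decode("utf-8")
print(len(sample), len(converted))
print(text)
```

If the converted bytes are written to a file, xmllint should accept it with the encoding="UTF-8" declaration unchanged.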

--
Chun-Chieh Huang, aka Albert | E-mail: jjhuang AT cm.nctu.edu.tw
黃俊傑 |
Department of Computer Science |
National Tsing Hua University | MIME/ASCII/PDF/PostScript are welcome!
HsinChu, Taiwan | NO MS WORD DOC FILE, PLEASE!
 
