Question about using SHIFT-JIS encoding with libxml2


saumya.agarwal

Hi,

I am using libxml2 for xml parsing. When the client application sends
data to libxml2 in UTF-8 format, it works fine.

But I have a scenario in which the client application sends data to
the libxml2 parser in SHIFT-JIS format.

The following error is thrown by libxml2 -

"Parsing error in results: Input is not proper UTF-8, indicate
encoding !"

In the libxml2 documentation at http://www.xmlsoft.org/encoding.html I
read that libxml2 can support any encoding by calling the
xmlSwitchEncoding() routine.
What do I have to do to make libxml2 support SHIFT-JIS format? I want
to continue supporting UTF-8 also.


Thanks,
Saumya
 

Martin Honnen

But I have a scenario in which the client application sends data to
the libxml2 parser in SHIFT-JIS format.

The following error is thrown by libxml2 -

"Parsing error in results: Input is not proper UTF-8, indicate
encoding !"

Does the XML contain an XML declaration indicating the encoding e.g.
<?xml version="1.0" encoding="SHIFT-JIS"?>
 

saumya.agarwal

Does the XML contain an XML declaration indicating the encoding e.g.
<?xml version="1.0" encoding="SHIFT-JIS"?>

Yes, it does. I thought that would be enough to tell the libxml2
parser that the encoding is SHIFT-JIS.
Does libxml2 support SHIFT-JIS encoding? I want to keep the support
for UTF-8 intact too. Is it possible?
Does libxml2 convert SHIFT-JIS to UTF-8 internally if the encoding is
given in the XML declaration as above?

Thanks,
Saumya
 

Matej Cepl

Yes, it does. I thought that would be enough to tell the libxml2
parser that the encoding is SHIFT-JIS. Does libxml2 support
SHIFT-JIS encoding? I want to keep the support for UTF-8 intact too. Is
it possible? Does libxml2 convert SHIFT-JIS to UTF-8 internally if the
encoding is given in the XML declaration as above?

This looks promising (and yes, do read both referenced tutorials)
http://xmlsoft.org/encoding.html

Matej
 

Joe Kesselman

Does libxml2 support SHIFT-JIS encoding?

I don't know offhand. Have you checked its documentation?
Does libxml2 convert SHIFT-JIS to UTF-8 internally if it is supplied
in XML declaration as above?

Most Java-based XML processors actually convert to UTF-16 internally,
since that's a native character representation in Java. I don't know
what libxml2 is using, but I would expect they're doing something
similar -- convert to some standardized internal form, process that,
then convert back. Some tools have tried to avoid the double conversion
when data is being passed straight through, but recognizing and taking
advantage of that optimization is not easy.
 

Arndt Jonasson

Does libxml2 support SHIFT-JIS encoding? I want to keep the support
for UTF-8 intact too. Is it possible?

For what it's worth, the source code contains the following (in
version 2.6.27):

    case XML_CHAR_ENCODING_2022_JP:
        __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                         "encoding not supported %s\n",
                         BAD_CAST "ISO-2022-JP", NULL);
        break;
    case XML_CHAR_ENCODING_SHIFT_JIS:
        __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                         "encoding not supported %s\n",
                         BAD_CAST "Shift_JIS", NULL);
        break;
    case XML_CHAR_ENCODING_EUC_JP:
        __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                         "encoding not supported %s\n",
                         BAD_CAST "EUC-JP", NULL);
        break;
 

Matej Cepl

Arndt Jonasson said:
For what it's worth, the source code contains the following (in
version 2.6.27):

However, according to the web page I linked to earlier in this thread,
libxml2 can use iconv and therefore every code page iconv supports
(i.e., whatever you have ever dreamed of).
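The same conversion can also be done directly with POSIX iconv() before the data reaches libxml2. That might serve as a fallback for a libxml2 built without iconv. This is an assumption: the thread does not say how the poster's copy was built. The sketch below is minimal. The name sjis_to_utf8 is hypothetical, and the converter name "SHIFT_JIS" is an assumption about the local iconv. Running `iconv -l` lists the names a given system accepts.

```c
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert `len` bytes of Shift_JIS to a newly
 * malloc'd, NUL-terminated UTF-8 string. Returns NULL on error
 * (unknown encoding, invalid input, or out of memory). Caller frees. */
char *sjis_to_utf8(const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", "SHIFT_JIS");   /* to, from */
    if (cd == (iconv_t) -1)
        return NULL;

    /* Each Shift_JIS character is 1 or 2 bytes and maps to at most
     * 3 bytes of UTF-8, so 3 * len bytes plus a terminator suffice. */
    size_t cap = 3 * len + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *) in;        /* iconv() takes a non-const char ** */
    size_t src_left = len;
    char *dst = out;
    size_t dst_left = cap - 1;      /* keep one byte for the terminator */

    /* Convert everything, then flush any pending shift state. */
    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t) -1 ||
        iconv(cd, NULL, NULL, &dst, &dst_left) == (size_t) -1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }

    *dst = '\0';
    iconv_close(cd);
    return out;
}
```

The UTF-8 result could then be passed to libxml2 as ordinary UTF-8 input. Any encoding="Shift_JIS" declaration inside it would no longer be accurate, though, and would need handling.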

Matej
 
