XMLParser error with unicode characters in XML file.

Manoj Nair

I am getting an XML parsing error from weblogic.apache.xerces when I parse an
XML document that contains accented characters.
This is what I am doing:

1) Some database columns hold accented data for Spanish, Japanese and other
languages, such as "Número de identificação:" and "número de identificación".

2) I read this data, build an XML file from it with some processing, and then
write the file to disk using the weblogic.xml.stream.XMLOutputStream
flush() method.

3) I then use FOP to render this XML as PDF. During rendering,
weblogic.apache.xerces.XMLparser is called to parse the XML, and the
parser throws an org.xml.sax.SAXParseException (An invalid XML character
(Unicode: 0xfa) was found in the element content of the document).
I was under the impression that the XML parser would take care of accented
characters. When I open the XML file I created in XML SPY, I see "box"
characters, as in "cliente n? de identificaci".

Please let me know how I should handle this in my code.

Thanks
 
Patrick TJ McPhee

% I am getting an XML parsing error from weblogic.apache.xerces when I parse an
% XML document which contains accented characters.

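An example might be good. The problem is most likely that you're not
specifying an encoding, and that your data is not encoded in UTF-8.
Without an encoding declaration (or a byte order mark) the parser has to
assume UTF-8, and a lone 0xfa byte is not valid UTF-8. In ISO-8859-1,
0xfa is ú.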
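To make the mismatch concrete, here is a small sketch using only the standard JDK (the class name EncodingMismatchDemo and the helper hex are made up for illustration, and nothing here touches the WebLogic classes). It prints the bytes of "número" in both encodings and then decodes the ISO-8859-1 bytes as if they were UTF-8, which is roughly what a parser does when no encoding is declared.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Formats a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decodes ISO-8859-1 bytes of the text as UTF-8, i.e. with the wrong charset.
    static String misreadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "número", written with an escape so the source file encoding does not matter.
        String word = "n\u00famero";

        System.out.println("ISO-8859-1: " + hex(word.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      " + hex(word.getBytes(StandardCharsets.UTF_8)));

        // The lone 0xfa byte is malformed UTF-8, so the String constructor
        // substitutes the Unicode replacement character for it.
        String misread = misreadAsUtf8(word);
        System.out.println("Contains replacement char: " + (misread.indexOf('\uFFFD') >= 0));
    }
}
```

The ú is a single 0xfa byte in ISO-8859-1 but the two bytes 0xc3 0xba in UTF-8. Java's String constructor quietly substitutes a replacement character for the bad byte, whereas an XML parser is required to reject it, which would explain the exception.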

Try including an XML declaration:

<?xml version='1.0' encoding='iso-8859-1'?>

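at the start of your XML files. This ought to work for the Spanish data.
You'll need to find out how the Japanese data is encoded and either
put that encoding in the XML declaration, or convert the data to UTF-8.
Note that ISO-8859-1 cannot represent Japanese characters, so a single
file holding both kinds of data would need UTF-8 or another Unicode encoding.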
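As a rough sketch of the idea, the following uses the standard JDK classes rather than weblogic.xml.stream.XMLOutputStream (I am not assuming anything about how that class picks its encoding). The names DeclaredEncodingDemo, write and parse are hypothetical. The point is that the charset used to encode the bytes and the charset named in the XML declaration must be the same one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingDemo {
    // Writes a one-element document, declaring the same charset that is
    // used to turn the characters into bytes. The text is assumed to
    // contain no markup characters such as '<' or '&'.
    static byte[] write(String text, Charset charset) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write("<?xml version='1.0' encoding='" + charset.name() + "'?>\n");
            w.write("<label>" + text + "</label>");
        }
        return out.toByteArray();
    }

    // Parses the bytes with the JDK's default parser and returns the
    // text content of the root element.
    static String parse(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // "número de identificación", escaped so the source encoding does not matter.
        String spanish = "n\u00famero de identificaci\u00f3n";
        // Two kanji as stand-in Japanese data; not representable in ISO-8859-1.
        String japanese = "\u756a\u53f7";

        System.out.println(spanish.equals(parse(write(spanish, StandardCharsets.ISO_8859_1))));
        System.out.println(spanish.equals(parse(write(spanish, StandardCharsets.UTF_8))));
        System.out.println(japanese.equals(parse(write(japanese, StandardCharsets.UTF_8))));
    }
}
```

Writing everything as UTF-8 is probably the simpler choice if the same file can hold both Spanish and Japanese text, since then one declaration covers all of it.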
 
