Error while parsing local languages using SAX/DOM parser.

Sidhartha

Hi,
I am facing a problem while parsing local-language characters with a
SAX parser. We use DOM to parse and SAX to read the source. When our
application parses strings in a local language, especially Czech,
Polish, or Turkish, some other text appears in place of the
local-language characters.

E.g.:
Input string: ahoj, jak se máš
Output string: ahoj, jak se máš
OS: Solaris

We persist this XML in the database. This issue did not occur when we
used the IBM parser on NT. The local-language character is getting
replaced by "&aacute". This causes problems when we translate it back.
Can anyone please help me?

Stack Trace

class org.xml.sax.SAXException message = Parser reported fatal error while parsing : Input Source/DTD
Stack Trace:
org.xml.sax.SAXParseException: The entity "aacute" was referenced, but not declared.
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$ContentDispatcher.scanRootElementHook(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)

Thanks,
Sidhartha
 
Martin Honnen

It is rather odd that you get an XHTML entity reference '&aacute;' in
your XML. I am not sure why that happens. Are you using XSLT, for
instance, to serialize the XML?
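
The error in the stack trace can be reproduced with a small sketch. It assumes the JDK's built-in JAXP parser (which is derived from Xerces) rather than the exact Xerces build in the original post, and the class name, method name, and the <msg text="..."> document are all invented for illustration. It shows that '&aacute;' in an attribute value is rejected when no DTD declares it, while the numeric character reference '&#225;' or a declaration in the internal DTD subset is accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not taken from the original application.
class EntityReferenceDemo {

    // Parses the XML string into a DOM and returns the root element's "text" attribute.
    static String parseTextAttribute(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("text");
    }

    public static void main(String[] args) throws Exception {
        // \u00e1 is a-acute and \u0161 is s-caron, written as escapes so the
        // example does not depend on the source file encoding.
        String expected = "m\u00e1\u0161";

        // Named XHTML entity with no declaration: not well-formed XML.
        String undeclared = "<msg text=\"m&aacute;\u0161\"/>";
        // Numeric character reference: needs no declaration.
        String numeric = "<msg text=\"m&#225;\u0161\"/>";
        // Same document as 'undeclared', but with the entity declared in the internal subset.
        String declared = "<!DOCTYPE msg [<!ENTITY aacute \"&#225;\">]>" + undeclared;

        try {
            parseTextAttribute(undeclared);
            System.out.println("Undeclared entity was accepted (unexpected)");
        } catch (SAXParseException e) {
            System.out.println("Undeclared entity rejected: " + e.getMessage());
        }

        System.out.println("Numeric reference parsed correctly: "
                + expected.equals(parseTextAttribute(numeric)));
        System.out.println("Declared entity parsed correctly: "
                + expected.equals(parseTextAttribute(declared)));
    }
}
```

If whatever writes the XML can be configured to emit the characters directly in a declared encoding such as UTF-8, or to emit numeric character references, the stored document should parse without any DTD. Which component produces '&aacute;' in the first place is not established in this thread.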
 
