Problem parsing an XML document with Japanese text

Prakash

Hi all,

I am trying to parse an XML document containing Japanese text by
constructing a DOMBuilder object. The document created after parsing
is empty. If the XML document does not contain Japanese characters,
the parsing goes through properly.

Here is the sample document that is causing the problem.

<root>
<aolist>
<ao>
<attribute id="Identifier"
dictName="Identifier"><value>3</value></attribute>
<attribute id="EventTime" dictName="EventTime"><value>2003年06月26日 13時</value></attribute>
</ao>
</aolist>
</root>

Here is the sample code written to parse the XML document. The above
XML string is held in newformattedstr and is passed to the DOMBuilder
parse method after being wrapped in an input source using
MemBufInputSource/Wrapper4InputSource.

DOMBuilder *parser_p = NULL;

{
    // Wraps newformattedstr to create the input structure for the parser
    MemBufInputSource memBuf_p(
        (const XMLByte*)newformattedstr.data()
        , newformattedstr.length()
        , "info"
        , false
    );
    // memBuf_p lives on the stack, so the wrapper must not adopt (delete) it
    Wrapper4InputSource msg(&memBuf_p, false);

    // Sets up the parser
    static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
    DOMImplementation *impl_p =
        DOMImplementationRegistry::getDOMImplementation(gLS);
    assert(impl_p);

    parser_p = ((DOMImplementationLS*)impl_p)->createDOMBuilder(
        DOMImplementationLS::MODE_SYNCHRONOUS,
        0);
    assert(parser_p);

    parser_p->setFeature(XMLUni::fgDOMNamespaces, false);
    parser_p->setFeature(XMLUni::fgXercesSchema, false);
    parser_p->setFeature(XMLUni::fgXercesSchemaFullChecking, false);
    parser_p->setFeature(XMLUni::fgDOMValidateIfSchema, true);
    parser_p->setFeature(XMLUni::fgDOMDatatypeNormalization, true);

    // Pointer to the temporary XML structure
    DOMDocument* tempDoc_p = NULL;

    try
    {
        parser_p->resetDocumentPool();
        tempDoc_p = parser_p->parse(msg);
    }
    catch (const XMLException& e)
    {
        ......
    }
    catch (...)
    {
        ...............
    }

    // The parse may have failed, so check before dereferencing
    assert(tempDoc_p);

    // Root node of the temporary document.
    DOMNode* tempRootNode_p =
        (DOMNode*)tempDoc_p->getDocumentElement();
    assert(tempRootNode_p);

    XMLCh *tempDoc = theSerializer_p->writeToString(*tempDoc_p);
    assert(tempDoc);

    char *output_tempDoc = XMLString::transcode(tempDoc);
    outfile<<output_tempDoc<<endl;

    parser_p->release();
}
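
One check that may help narrow this down: an XML document with no encoding declaration and no byte order mark is read as UTF-8, so if the bytes in newformattedstr are in a legacy Japanese encoding instead, the parser would be expected to reject them. The sketch below uses only the standard library; isValidUtf8 is a hypothetical helper name, not a Xerces function, and it only reports whether a buffer is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces): returns true when `bytes` is
// well-formed UTF-8. Rejects stray continuation bytes, truncated sequences,
// overlong forms, surrogates and values above U+10FFFF.
bool isValidUtf8(const std::string& bytes)
{
    static const unsigned long minForLength[] = { 0x0, 0x80, 0x800, 0x10000 };
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;                      // invalid lead byte
        if (i + extra >= n) return false;       // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minForLength[extra]) return false;        // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate
        if (cp > 0x10FFFF) return false;                   // out of range
        i += extra + 1;
    }
    return true;
}
```

Calling isValidUtf8(newformattedstr) just before building the MemBufInputSource would show whether the buffer is really UTF-8. If it returns false, the data is probably in some other encoding, which is only a guess until the source of the string is checked.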


I am using Xerces 2.2.0. As I understand from the documentation, this
version supports internationalization, but I am not sure why it is not
able to parse this document. I tried both UTF-8 and UTF-16 encoding,
but it doesn't help.
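
Since the sample document carries no XML declaration, one thing that could be tried is stating the encoding of the bytes explicitly at the start of the buffer. The following is a minimal sketch using only the standard library; withXmlDeclaration is a hypothetical helper, and "EUC-JP" in the usage note is an assumption about the data, not something confirmed here. Whether a given Xerces build accepts a particular encoding name depends on the transcoding service it was built with.

```cpp
#include <string>

// Hypothetical helper: if `xml` does not already begin with an XML
// declaration, prepend one that names `encoding`. The bytes of `xml`
// themselves are left unchanged, so `encoding` must match how they are
// actually encoded.
std::string withXmlDeclaration(const std::string& xml,
                               const std::string& encoding)
{
    if (xml.compare(0, 5, "<?xml") == 0) {
        return xml;   // already has a declaration; leave it alone
    }
    return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n" + xml;
}
```

For example, withXmlDeclaration(newformattedstr, "EUC-JP") would produce a buffer whose first line declares EUC-JP, which could then be handed to MemBufInputSource in place of the original string. The declared name has to match the real encoding of the bytes, otherwise the parse would be expected to fail in the same way.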

Any pointers on how to solve this problem will be of great help.

Best Regards
Prakash
 
