Prakash
Hi all,
I am trying to parse an XML document containing Japanese text by
constructing a DOMBuilder object. The document created after parsing
is empty. If the XML document does not contain Japanese characters,
parsing goes through properly.
Here is the sample document that causes the problem:
<root>
  <aolist>
    <ao>
      <attribute id="Identifier" dictName="Identifier"><value>3</value></attribute>
      <attribute id="EventTime" dictName="EventTime"><value>2003年06月26日 13時</value></attribute>
    </ao>
  </aolist>
</root>
Here is the sample code written to parse the XML document. The above
XML string is held in newformattedstr and is passed to the DOMBuilder
parse method after being wrapped in an input source using
MemBufInputSource/Wrapper4InputSource.
DOMBuilder *parser_p = NULL;
{
    // Wraps newformattedstr to create the input structure for the parser
    MemBufInputSource memBuf_p(
        (const XMLByte*)newformattedstr.data()
        , newformattedstr.length()
        , "info"
        , false
    );
    // memBuf_p lives on the stack, so the wrapper must not adopt (delete) it
    Wrapper4InputSource msg(&memBuf_p, false);

    // Sets up the parser
    static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
    DOMImplementation *impl_p =
        DOMImplementationRegistry::getDOMImplementation(gLS);
    assert(impl_p);
    parser_p = ((DOMImplementationLS*)impl_p)->createDOMBuilder(
        DOMImplementationLS::MODE_SYNCHRONOUS,
        0);
    assert(parser_p);
    parser_p->setFeature(XMLUni::fgDOMNamespaces, false);
    parser_p->setFeature(XMLUni::fgXercesSchema, false);
    parser_p->setFeature(XMLUni::fgXercesSchemaFullChecking, false);
    parser_p->setFeature(XMLUni::fgDOMValidateIfSchema, true);
    parser_p->setFeature(XMLUni::fgDOMDatatypeNormalization, true);

    // Pointer to the temporary XML structure
    DOMDocument* tempDoc_p = NULL;
    try
    {
        parser_p->resetDocumentPool();
        tempDoc_p = parser_p->parse(msg);
    }
    catch (const XMLException& e)
    {
        ......
    }
    catch (...)
    {
        ...............
    }

    // Root node of the temporary document
    DOMNode* tempRootNode_p =
        (DOMNode*)tempDoc_p->getDocumentElement();
    assert(tempRootNode_p);

    // Serializes the parsed document and writes it out for inspection
    XMLCh *tempDoc = theSerializer_p->writeToString(*tempDoc_p);
    assert(tempDoc);
    char *output_tempDoc = XMLString::transcode(tempDoc);
    outfile << output_tempDoc << endl;
    parser_p->release();
}
I am using Xerces 2.2.0. As I understand from the documentation, this
version supports internationalization, but I am not sure why it is not
able to parse this document. I tried both UTF-8 and UTF-16 encodings,
but that did not help.
Any pointers on how to solve this problem would be a great help.
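One check that does not depend on Xerces at all is whether the bytes
in newformattedstr are really well-formed UTF-8. The sample document
has no XML declaration, and a document without an encoding declaration
or byte order mark is read as UTF-8 by default. If the Japanese text
was produced in a legacy encoding such as EUC-JP or Shift_JIS (an
assumption, I have not confirmed this for my data), the buffer would
normally fail such a check. Below is a minimal sketch of the check;
isValidUtf8 is a hypothetical helper name, not a Xerces function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces): returns true if the bytes in s
// form well-formed UTF-8, false otherwise.
bool isValidUtf8(const std::string& s)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes;
    // anything below that is an overlong encoding.
    static const unsigned long minCp[] = { 0x0, 0x80, 0x800, 0x10000 };

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;   // number of continuation bytes expected
        unsigned long cp;    // code point being assembled

        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;   // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;   // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp[extra]) return false;                 // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;      // surrogate
        if (cp > 0x10FFFF) return false;                     // out of range

        i += extra + 1;
    }
    return true;
}
```

Calling isValidUtf8(newformattedstr) just before constructing the
MemBufInputSource would show whether the parser is being handed UTF-8
at all. A false result would suggest that the bytes need either an
explicit encoding declaration or a conversion before parsing.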
Best Regards
Prakash