Joe Kesselman said:
If you set the output to XML -- or HTML -- problematic characters
should get converted back to entity (or numeric-character-reference)
representation. If you set text output mode, you're on your own.
Want to provide an example that demonstrates the problem you're
seeing?
Well, the following is not the real case but a pretty close example
of it:
Here is the original XML file from which two elements (releaseStatement
and releaseCode) will be extracted and the element names changed in the
process:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="SVCM_Transform.xsl"?>
<svcm xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="SVCMEdition_V1.0.xsd">
<edition>
<editionid>E12A60Z_20060910</editionid>
<editionType>New</editionType>
<editionNumber>11-00</editionNumber>
</edition>
<publication>
<productid>E12A60Z</productid>
<productType>00-14</productType>
<title>Service Manual</title>
<issueLevel>17</issueLevel>
<issued>2006-09-10</issued>
<available>2006-09-10</available>
<releaseStatement>No effort was spared to present accurate information in this
Service Manual at the time of issue. Any errors &amp; updates that could
not wait till the next issue will be published in errata and periodic
bulletins distributed to authorized dealers.</releaseStatement>
<releaseCode scheme="DLR-1">Dealer Distribution</releaseCode>
<updateFreq scheme="YR">2</updateFreq>
</publication>
</svcm>
Here is the XSLT stylesheet I wrote to do the transform (I am pretty new
to XSLT, BTW):
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no" />
<xsl:template match="/">
<xsl:text>&lt;RELEASE_INFO&gt;</xsl:text>
<xsl:apply-templates select="*/publication/releaseStatement" />
<xsl:apply-templates select="*/publication/releaseCode" />
<xsl:text>&lt;/RELEASE_INFO&gt;</xsl:text>
</xsl:template>
<xsl:template match="releaseStatement">
<RELEASE_STATEMENT>
<xsl:value-of select="."/>
</RELEASE_STATEMENT>
</xsl:template>
<xsl:template match="releaseCode">
<xsl:text>&lt;RELEASE_CODE SCHEME="</xsl:text>
<xsl:value-of select="@scheme"/>
<xsl:text>"&gt;</xsl:text>
<xsl:value-of select="."/>&lt;/RELEASE_CODE&gt;
</xsl:template>
</xsl:stylesheet>
The output from the XSLT process is this:
<RELEASE_INFO>
<RELEASE_STATEMENT>
No effort was spared to present accurate information in this
Service Manual at the time of issue. Any errors
& updates that could
not wait till the next issue will be published
in errata and periodic
bulletins distributed to authorized dealers.
</RELEASE_STATEMENT>
<RELEASE_CODE SCHEME="DLR-1">Dealer Distribution</RELEASE_CODE>
</RELEASE_INFO>
Note the '&' in the RELEASE_STATEMENT.
Unfortunately, putting the method="xml" attribute into xsl:output is not
much help either, as it screws up some of the tag delimiters, like so:
<RELEASE_INFO>
<RELEASE_STATEMENT>
No effort was spared to present accurate information in this
Service Manual at the time of issue. Any errors
& updates that could
not wait till the next issue will be published
in errata and periodic
bulletins distributed to authorized dealers.
</RELEASE_STATEMENT>
<RELEASE_CODE SCHEME="DLR-1">Dealer Distribution</RELEASE_CODE>
</RELEASE_INFO>
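One thing I have not tried yet is writing the output tags as literal result elements instead of as escaped text inside xsl:text. As I understand it, with method="xml" the serializer should then write the tags untouched and escape only the character data, so the '&' would come out as an entity. A minimal sketch of that idea, using only standard XSLT 1.0 and assuming the Oracle tool handles literal result elements and attribute value templates the way the spec describes (untested there):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>

  <!-- Literal result elements are real nodes in the result tree, so the
       serializer writes their tags itself and escapes only the text. -->
  <xsl:template match="/">
    <RELEASE_INFO>
      <xsl:apply-templates select="*/publication/releaseStatement"/>
      <xsl:apply-templates select="*/publication/releaseCode"/>
    </RELEASE_INFO>
  </xsl:template>

  <xsl:template match="releaseStatement">
    <RELEASE_STATEMENT>
      <xsl:value-of select="."/>
    </RELEASE_STATEMENT>
  </xsl:template>

  <xsl:template match="releaseCode">
    <!-- {@scheme} is an attribute value template: it copies the value of
         the source scheme attribute into the SCHEME attribute. -->
    <RELEASE_CODE SCHEME="{@scheme}">
      <xsl:value-of select="."/>
    </RELEASE_CODE>
  </xsl:template>
</xsl:stylesheet>
```

This drops the xsl:text wrappers entirely, so there is no markup-as-text left for the serializer to escape.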
Perhaps Pavel is right that this Unix command-line XML tool from Oracle
is broken, and I should try something else.
Rudy