Bekkali Hicham said:
I have already used Xalan several times with success, but I have an error
message that I don't understand:
javax.xml.transform.TransformerException: java.io.UTFDataFormatException:
Invalid byte 2 of 3-byte UTF-8 sequence.
This looks like a Java I/O exception that Xalan is just passing along. UTF-8
is a variable-width encoding: each Unicode character is stored as one to four
bytes. A first byte in the range 0x00-0x7F is a complete single-byte (ASCII)
character. A first byte in 0xC2-0xDF, 0xE0-0xEF, or 0xF0-0xF4 is the lead byte
of a two-, three-, or four-byte sequence respectively, and every following
byte of that sequence must fall in the range 0x80-0xBF. This allows many
commonly occurring characters to be encoded with one byte while less frequent
characters are encoded with multiple bytes.
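As a rough illustration of how the first byte announces the length of a
sequence, here is a minimal sketch. The class and method names are made up
for this example; the byte ranges are the standard UTF-8 ones.

```java
final class Utf8LeadByte {
    /**
     * Returns the sequence length announced by a byte when it appears first:
     * 1 to 4 for a valid lead byte, 0 for a continuation byte (0x80-0xBF),
     * and -1 for a byte that can never appear in well-formed UTF-8.
     */
    static int sequenceLength(int b) {
        b &= 0xFF;                   // treat the value as an unsigned byte
        if (b <= 0x7F) return 1;     // ASCII, complete on its own
        if (b <= 0xBF) return 0;     // continuation byte, not a lead byte
        if (b <= 0xC1) return -1;    // would only encode overlong forms
        if (b <= 0xDF) return 2;
        if (b <= 0xEF) return 3;
        if (b <= 0xF4) return 4;
        return -1;                   // 0xF5-0xFF are never used
    }
}
```

For example, 0xE9 (the Latin-1 code for an accented e) falls in the 0xE0-0xEF
range, so a UTF-8 decoder takes it for the start of a 3-byte sequence.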
The error message, "Invalid byte 2 of 3-byte UTF-8 sequence" means that
a Java I/O streaming object expected, from the first byte, that this was a 3
byte sequence and when it examined the second byte, it determined that the
second byte was an illegal value (for instance, a byte outside the range
0x80-0xBF that every continuation byte must fall in).
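To find where the decoder gives up, something like the following sketch may
help. It uses the standard java.nio.charset decoder in strict mode and
reports the offset at which decoding stops; the class and method names are
invented for this example, and it assumes the whole document fits in memory
as a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /**
     * Returns the offset of the first byte sequence that is not well-formed
     * UTF-8, or -1 if the whole array decodes cleanly.
     */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }
}
```

Feeding it the bytes of the XML file (for instance from
java.nio.file.Files.readAllBytes) gives an offset you can jump to in a hex
editor.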
What does this mean for you, the programmer?
Two possibilities:
1. There is no encoding attribute in the document's XML declaration, and
Xalan is assuming it is UTF-8 when the document is not UTF-8.
2. The document may have been UTF-8 and was corrupted in transmission
(was it sent over the network?)
If there is no encoding attribute in the document's XML declaration, put one
there. For example, if there are Traditional Chinese (Taiwanese) characters
in the XML document, you might try:
<?xml version="1.0" encoding="Big5" ?>
If they are Simplified Chinese, try GB2312; if the text is Japanese, try
Shift_JIS or EUC-JP, and so on. The declared encoding has to match the way the
file was actually saved. When Xalan reads one of these encodings, I think
Xerces will transcode them to Unicode, or at least use a non-UTF-8 streaming
source.
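If you cannot edit the document itself, another option may be to decode the
bytes yourself and hand the transformer a Reader, since a parser given a
character stream should not need to guess the byte encoding. The sketch below
is only an illustration: the class and method names are made up, it runs an
identity transform rather than your stylesheet, and it assumes you already
know the real encoding of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ExplicitEncoding {
    /**
     * Decodes xmlBytes with the given charset and runs an identity
     * transform over the result, returning the serialized output.
     */
    static String identityTransform(byte[] xmlBytes, Charset charset)
            throws TransformerException {
        // The Reader performs the byte-to-character decoding up front.
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), charset);
        // newTransformer() with no stylesheet copies input to output.
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(reader), new StreamResult(out));
        return out.toString();
    }
}
```

For a Big5 file you would pass Charset.forName("Big5"), provided your JVM
ships that charset.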
If one or more bytes of the document were corrupted, you may be able to
simply open the document in an editor and look for any glyphs that look
out-of-place at the point in the document where the error occurred.
HTH,
Derek Harmon