SaxParseException mapping from column/line number

KN

I have a problem with a SAX parser, but am attempting to recover by
pre-processing the input byte array after each exception. I am parsing
using the code below:

org.apache.commons.httpclient.HttpMethod method =
    new GetMethod("http://someurl");

org.apache.commons.httpclient.HttpClient client = new HttpClient();
int status = client.executeMethod(method);

// Raw bytes of the HTTP response body
final byte[] responseBody = method.getResponseBody();

com.sun.org.apache.xerces.internal.parsers.DOMParser domParser =
    new DOMParser();
domParser.parse(new InputSource(new ByteArrayInputStream(responseBody)));


This gives me a SAXParseException with the message
"Invalid byte 2 of 3-byte UTF-8 sequence"

and also gives me the row and column at which the error occurred. I am
totally confused by this, as there seems to be no way of mapping the
row and column back to the original input byte[]. The only way I can
see of doing so is to write the array to disc and inspect the file.
Does anyone have any other ideas?

Regards
 
Joe Kesselman

KN said:
"Invalid byte 2 of 3-byte UTF-8 sequence"

UTF-8 "characters" can be up to four bytes long, to handle the full
Unicode range. Something in your file is not legal UTF-8. Either tell
the system what encoding your file was actually written in, or fix the
file so it's legal UTF-8.
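
As a rough sketch of the first option, the encoding can be set on the SAX InputSource before parsing. The class name EncodingOverride and the method parseAs are made up for illustration, and ISO-8859-1 in the usage note is only a guess at what the server might really be sending; nothing in this thread confirms the actual encoding. The sketch uses the standard JAXP DocumentBuilder rather than the internal Xerces DOMParser class.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingOverride {
    /**
     * Parses the given bytes as XML, telling the parser which character
     * encoding to use instead of letting it autodetect (hypothetical helper).
     */
    public static Document parseAs(byte[] body, String encoding) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(body));
        // Assumption: the caller knows the real encoding of the bytes.
        source.setEncoding(encoding);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
    }
}
```

With the code from the question, this might be called as EncodingOverride.parseAs(responseBody, "ISO-8859-1"). If the guess at the encoding is wrong, the parse may still succeed but produce the wrong characters, so it is worth checking the Content-Type header of the response first.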

The troublesome sequence will be something involving bytes over
the ASCII range (i.e., bytes with their high bit set). Scanning for that
may be the fastest way to find the problem.
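
One possible way to do that scan without writing the array to disc is to run the bytes through a strict UTF-8 decoder and see where it stops. This is a minimal sketch, not something from the original posts: the class name Utf8Scan and the method firstInvalidOffset are hypothetical, and it relies only on the standard java.nio.charset classes. It reports a byte offset into the array directly, which sidesteps mapping the parser's row and column back to bytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Scan {
    /**
     * Returns the offset of the first byte sequence that is not legal
     * UTF-8, or -1 if the whole array decodes cleanly.
     */
    public static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            // endOfInput = true: a truncated trailing sequence is also reported
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error the input position sits at the start of the bad sequence
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: discard decoded chars and keep going
        }
    }
}
```

Calling Utf8Scan.firstInvalidOffset(responseBody) on the array from the question should give the index of the offending bytes, which can then be inspected or patched in place before re-parsing.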
 
