User
I am trying to use JDOM's SAXBuilder to parse an XML document that contains
encoded Latin-1 characters. After I parse the document, the special
character Strings seem to be replaced with their Unicode characters (e.g.,
the String "&#174;" is replaced with a character that has a decimal value of
174); I was expecting that the SAXBuilder would preserve the String
"&#174;". Is it possible to instruct the SAX parser to preserve the special
character encodings?
The following is sample code that illustrates the issue that I am observing:
import java.io.ByteArrayInputStream;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

public class TestProductBuilder {
    public static void main(String[] args) {
        ByteArrayInputStream bis = null;
        try {
            // Input document with the registered sign written as a character reference
            String product = "<?xml version=\"1.0\"?>" +
                "<product>" +
                " <name>My Product &#174;</name>" +
                "</product>";
            bis = new ByteArrayInputStream(product.getBytes());
            // Non-validating parse
            SAXBuilder builder = new SAXBuilder(false);
            Document productDoc = builder.build(bis);
            // Serialize the parsed document back to a String
            XMLOutputter outputter = new XMLOutputter("\t", true);
            String productFromSAXBuilder = outputter.outputString(productDoc);
        } catch (Exception e) {
            System.err.println(e.getMessage());
        } finally {
            if (bis != null) { try { bis.close(); } catch (Exception e) {} }
        }
    }
}
The following is the value for "productFromSAXBuilder":
<?xml version="1.0" encoding="UTF-8"?>
<product>
<name>My Product ®</name>
</product>
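For reference, a minimal sketch of one possible workaround, assuming the goal is only to get ASCII output in which every character above 127 appears as a decimal character reference again. It does not change what the parser does; it post-processes the String returned by the outputter. The class CharRefEscaper and the method escapeNonAscii are hypothetical names and are not part of JDOM. The sketch also assumes that element and attribute names are plain ASCII, since it rewrites the whole String, markup included.

```java
class CharRefEscaper {
    // Hypothetical helper (not a JDOM API): rewrites every code point above
    // 127 as a decimal numeric character reference, e.g. U+00AE -> &#174;.
    public static String escapeNonAscii(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by one or two chars so surrogate pairs stay together
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the String produced by outputter.outputString(productDoc)
        String productFromSAXBuilder = "<name>My Product \u00AE</name>";
        System.out.println(escapeNonAscii(productFromSAXBuilder));
    }
}
```

In the sample above this would be applied to productFromSAXBuilder after the call to outputString. Any other character reference in the input would also come back in decimal form, whatever form it was originally written in.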