javax.xml.transform.Transformer and HTML entities

Aéris

Hi,

I have a problem with Java XML Transformer escaping.

I use Transformer to create an HTML file from a DOM Document.
But in the generated HTML, every « & » in the document's text nodes
that is part of an already escaped HTML entity, such as « &nbsp; », is
re-escaped by Transformer.

See this sample: http://pastebin.com/LfGpWMai
Instead of the expected
<div>&mdash;</div>
I get
<div>&amp;mdash;</div>

I searched the docs and Google, but found nothing on how to disable
this escaping. Can anybody help me?

Thanks

--
Aeris
 
Arne Vajhøj

The code does exactly what it is supposed to do.

document.createTextNode("&mdash;")

creates a text node with those 7 characters.

Try:

document.createTextNode("\u2014")

Arne
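
A minimal sketch of the difference described above, for anyone who wants to try it. The class name TextNodeEscapingDemo and the helper serialize are made up for illustration, and the "html" output method is an assumption about what the pastebin sample configures. The text node holding the seven characters "&mdash;" gets its ampersand escaped, while a text node holding the single character U+2014 does not. How the serializer then writes the dash (literally or as a character or entity reference) depends on the implementation and the output encoding, so the sketch makes no claim about that.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TextNodeEscapingDemo {

    /** Builds a DOM containing one div with the given text and serializes it as HTML. */
    public static String serialize(String text) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element div = document.createElement("div");
            // The argument is taken as literal characters, not as markup.
            div.appendChild(document.createTextNode(text));
            document.appendChild(div);

            // Identity transformer; the "html" method is an assumption about the original sample.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "html");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Seven literal characters: the ampersand is escaped on output.
        System.out.println(serialize("&mdash;"));
        // One character, U+2014: there is no ampersand in the text to escape.
        System.out.println(serialize("\u2014"));
    }
}
```

If the output must stay pure ASCII, setting OutputKeys.ENCODING to "US-ASCII" should make the serializer emit some kind of reference for the dash instead of the raw character.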
 
Jeff Higgins

I'm sorry I cannot help. I can only comment that I am experiencing
the opposite problem with javax.xml.stream.XMLEventReader. Either I
haven't figured out how to configure the reader or I haven't grokked
the XML. Best of luck.
 
markspace

I tried this:


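    // transformer is a javax.xml.transform.Transformer (creation not shown)
    final Writer out = new StringWriter();
    final Source in = new StreamSource(
            new StringReader( "<test><div>&mdash;</div></test>" ) );

    // The input string is parsed as XML before anything is written to out
    transformer.transform( in, new StreamResult( out ) );
    System.out.println( out );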

And got an error:

[Fatal Error] :1:19: The entity "mdash" was referenced, but not declared.
ERROR: 'The entity "mdash" was referenced, but not declared.'

It's been a rather long while since I played with XSLT, but it seems
to me that it might be your document builder that is protecting you, and
the XSLT is just spitting out what it gets in. I forget how to get XSLT
to recognize the HTML entities, though. Searching Google might offer
some clues.
 
Arne Vajhøj

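Parsing an XML file or string and building a DOM document in code
are somewhat different: a parser has to resolve entity references,
while createTextNode takes its argument as literal characters.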

Arne
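
To make the parsing side concrete, here is a small sketch. The class name EntityParsingDemo and the helper parseText are hypothetical, and declaring mdash in an internal DTD subset is only one possible way to make the entity known to the parser. Without the declaration the parser rejects the input, which matches the "referenced, but not declared" error quoted earlier. With it, the reference is expanded while parsing, so the DOM ends up holding the single character U+2014.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EntityParsingDemo {

    /** Returns the text content of the root element, or null if the parser rejects the input. */
    public static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            // Not well-formed, for example a reference to an undeclared entity.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String undeclared = "<test><div>&mdash;</div></test>";
        // Internal DTD subset that declares mdash as the character U+2014.
        String declared = "<!DOCTYPE test [<!ENTITY mdash \"&#8212;\">]>"
                + "<test><div>&mdash;</div></test>";

        System.out.println(parseText(undeclared) == null);      // true: parse fails
        System.out.println("\u2014".equals(parseText(declared))); // true: entity expanded
    }
}
```

Either way, what reaches the Transformer is a character, not the text "&mdash;", so there is no ampersand left for it to escape.") == null;
assert "\u2014".equals(EntityParsingDemo.parseText("<!DOCTYPE test [<!ENTITY mdash \"&#8212;\">]><test><div>&mdash;</div></test>"));
</test>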
 