Converting org.w3c.dom.Element to String *without* losing whitespace

Adam Funk

I have a web-service client that gets an org.w3c.dom.Element out of
the service's response. (The service contains an application that
produces an XML document and returns its root element as the "answer"
in the response.) I need to turn that Element into a String to
display in the client GUI and to save as a file on the client machine.

This Element has the xml:space="preserve" attribute (added by the
application in the service).

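So far I have tried two things:

// 1. JAXP identity transform (transformerFactory is a javax.xml.transform.TransformerFactory;
//    DOMSource is javax.xml.transform.dom.DOMSource, StringWriter is java.io.StringWriter)
javax.xml.transform.Source domSource = new DOMSource(xmlElement);
StringWriter sw = new StringWriter();
javax.xml.transform.stream.StreamResult streamResult = new StreamResult(sw);
javax.xml.transform.Transformer identityTransform = transformerFactory.newTransformer();
identityTransform.transform(domSource, streamResult);
return sw.toString();

// 2. JDOM: convert the DOM Element to a JDOM element and output it unformatted
//    (Format is org.jdom.output.Format)
org.jdom.output.XMLOutputter outputter = new XMLOutputter(Format.getRawFormat());
org.jdom.input.DOMBuilder builder = new DOMBuilder();
return outputter.outputString(builder.build(xmlElement));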

Both approaches delete a lot of whitespace inside the root element,
but I don't want this to happen. (The root element still has the
xml:space="preserve" attribute.)

I've had a look around on the web and found only pages telling you how
to get rid of whitespace, but not how to force it to stay. This one
[1] says that preserving whitespace is the default anyway.

I'd appreciate any debugging suggestions or alternative approaches.

Thanks,
Adam


[1]
http://www.xml.com/pub/a/2001/11/07/whitespace.html
 
Joe Kesselman

I haven't used the jdom stuff -- I've always considered the arguments in
its favor to be pretty much without merit. But the identity transform
*should* be preserving whitespace everywhere except, possibly, in the
area _preceding_ the root element; I don't see any immediately obvious
problems with that code.

If you're using the XSLT processor that ships with the Sun JVM, that may
be a fairly ancient version of Xalan, with some known bugs. So the first
thing I'd try would be to upgrade to a current copy of Xalan-j, from
Apache, and see if the problem persists.
 
Adam Funk

Since my OP, I've also tried this, with the same result:

org.apache.xml.serialize.OutputFormat format = new OutputFormat();
StringWriter sw = new StringWriter();
org.apache.xml.serialize.XMLSerializer serial = new XMLSerializer(sw, format);
serial.serialize(x);
return sw.toString();
 
Adam Funk

Just to be sure, I've tested this using the TCPMonitor proxy in axis
1.4. The SOAP response from the server definitely includes the
desired whitespace. The problem in particular is with spaces between
elements inside an element that is supposed to consist of PCDATA and
elements, for example:

#v+
<TextWithNodes><Node id="0"/> <Node id="1"/>Internationalisation<Node id="21"/> <Node id="22"/>vertical<Node id="30"/> <Node id="31"/>stream<Node id="37"/>:<Node id="38"/> <Node id="39"/>INT<Node id="42"/> <Node id="43"/>VS<Node id="45"/> <Node id="46"/>UPDATE<Node id="52"/>
#v-

But my WS client is displaying and saving the XML as follows:

#v+
<TextWithNodes><Node id="0"/><Node id="1"/>Internationalisation<Node id="21"/><Node id="22"/>vertical<Node id="30"/><Node id="31"/>stream<Node id="37"/>:<Node id="38"/><Node id="39"/>INT<Node id="42"/><Node id="43"/>VS<Node id="45"/><Node id="46"/>UPDATE<Node id="52"/>
#v-
 
Adam Funk

Joe said:
I haven't used the jdom stuff -- I've always considered the arguments in
its favor to be pretty much without merit. But the identity transform
*should* be preserving whitespace everywhere except, possibly, in the
area _preceding_ the root element; I don't see any immediately obvious
problems with that code.

Do you know of any other ways to turn an org.w3c.dom.Element into a
String and a File? (I'm only aware of the three I've tried so far.)
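
For the File half of that question, here is a minimal sketch that reuses the same identity-transform idea but writes to an output stream opened on the target file. The class name ElementToFile and the method name write are made up for illustration, and the sketch assumes the JDK's default TransformerFactory. It cannot restore whitespace that is already missing from the DOM; it only avoids adding a second place where it could be lost.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;

// Hypothetical helper, not part of any library mentioned in this thread.
final class ElementToFile {

    /** Serializes the element to the given file with a stylesheet-less (identity) transform. */
    public static void write(Element element, File target)
            throws IOException, TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Open the stream ourselves so that it is always closed, even on failure.
        try (OutputStream out = Files.newOutputStream(target.toPath())) {
            identity.transform(new DOMSource(element), new StreamResult(out));
        }
    }
}
```

Something like ElementToFile.write(xmlElement, new File("answer.xml")) would be the call site; the file name there is only an example.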

Joe said:
If you're using the XSLT processor that ships with the Sun JVM, that may
be a fairly ancient version of Xalan, with some known bugs. So the first
thing I'd try would be to upgrade to a current copy of Xalan-j, from
Apache, and see if the problem persists.

Thanks, that makes sense. But now I've thrown in xalan-2.7.1.jar,
serializer-2.7.1.jar, and the most up-to-date xercesImpl.jar (the one
I was using was not the latest), but I'm still getting the lost space.
Do you have any other ideas?
 
Martin Honnen

Adam said:
#v+
<TextWithNodes><Node id="0"/> <Node id="1"/>Internationalisation<Node id="21"/> <Node id="22"/>vertical<Node id="30"/> <Node id="31"/>stream<Node id="37"/>:<Node id="38"/> <Node id="39"/>INT<Node id="42"/> <Node id="43"/>VS<Node id="45"/> <Node id="46"/>UPDATE<Node id="52"/>
#v-

But my WS client is displaying and saving the XML as follows:

#v+
<TextWithNodes><Node id="0"/><Node id="1"/>Internationalisation<Node id="21"/><Node id="22"/>vertical<Node id="30"/><Node id="31"/>stream<Node id="37"/>:<Node id="38"/><Node id="39"/>INT<Node id="42"/><Node id="43"/>VS<Node id="45"/><Node id="46"/>UPDATE<Node id="52"/>
#v-

Are you sure the whitespace shown above is present as text nodes in the
DOM model you have? I don't think there is anything wrong with the code
you use to serialize; it is more likely that the DOM you are serializing
does not contain that whitespace as text nodes.
 
Adam Funk

Martin said:
Are you sure the whitespace shown above is present as text nodes in the
DOM model you have?

How would I verify that? I know that the XML coming back over HTTP
has the correct whitespace, but I don't know how to check the
org.w3c.dom.Element inside the client --- except by serializing it.

Martin said:
I don't think there is anything wrong with the code
you use to serialize; it is more likely that the DOM you are serializing
does not contain that whitespace as text nodes.

Hmm. The application inside the service creates the XML and
serializes it to String on the fly using StAX --- is there a good way
to reverse the process?
 
Martin Honnen

Adam said:
How would I verify that? I know that the XML coming back over HTTP
has the correct whitespace, but I don't know how to check the
org.w3c.dom.Element inside the client --- except by serializing it.

Look at the child nodes your DOM Element has, e.g.

NodeList children = element.getChildNodes();
for (int i = 0; i < children.getLength(); i++)
{
    System.out.println(children.item(i).getNodeType());
}

Element nodes have node type 1, text nodes 3 I think.
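
A slightly fuller sketch of the same check, in case it helps: it prints the text of each text node in brackets, so a whitespace-only node shows up visibly instead of as a bare type number. The class and method names (ChildNodeDump, describeChildren, parseRoot) are made up for illustration, and the sample XML is a shortened version of Adam's example parsed with the JDK's default parser, not the Element that comes out of the web-service client.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical debugging helper, not part of any library mentioned in this thread.
final class ChildNodeDump {

    /** Lists the direct children of the element, one per line, bracketing text content. */
    public static String describeChildren(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {          // node type 3
                sb.append("TEXT[").append(child.getNodeValue()).append("]\n");
            } else if (child.getNodeType() == Node.ELEMENT_NODE) { // node type 1
                sb.append("ELEMENT <").append(child.getNodeName()).append(">\n");
            } else {
                sb.append("OTHER type=").append(child.getNodeType()).append('\n');
            }
        }
        return sb.toString();
    }

    /** Parses a small XML string with the default JDK parser (used only for the demo). */
    public static Element parseRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<TextWithNodes><Node id=\"0\"/> <Node id=\"1\"/>"
                + "Internationalisation<Node id=\"21\"/></TextWithNodes>");
        System.out.print(describeChildren(root));
    }
}
```

On a locally parsed document like this one, a TEXT[ ] line should appear between the first two Node elements. If the same dump of the Element from the service response shows no such line, the whitespace was lost before serialization.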
 
Adam Funk

Bingo! I did something like that (well, a bit more complicated) to
iterate through the gubbins of the org.w3c.dom.Element and print bits
out: the text nodes with whitespace are missing in there. Thanks very
much for that good piece of debugging advice.

The problem must be in the JAXB stuff that reads the SOAP response
(which has the whitespace) and produces the Element. I'm not sure how
to fix that, but at least this narrows the problem down quite a lot.
 
Joe Kesselman

Adam said:
Thanks, that makes sense. But now I've thrown in xalan-2.7.1.jar,
serializer-2.7.1.jar, and the most up-to-date xercesImpl.jar (the one
I was using was not the latest), but I'm still getting the lost space.
Do you have any other ideas?

I still find this surprising.

If you have a Level 3 DOM implementation, the DOM itself may support the
optional load and save operations. I believe Xerces does, though they
actually run through Xalan's serializer these days so whatever's hitting
you might not be cured. Worth a try...

http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/

If that too fails for you, then I'd start to suspect that the problem
isn't where you think it is.
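
A rough sketch of what the Load and Save route can look like, assuming the DOM implementation behind the Element supports the "LS" 3.0 feature (the one bundled with the JDK generally does). The class name LsSerializeDemo and the method name toXmlString are made up for illustration; the org.w3c.dom.ls types are the standard ones.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper, not part of any library mentioned in this thread.
final class LsSerializeDemo {

    /** Serializes the element with a DOM Level 3 LSSerializer, without pretty-printing. */
    public static String toXmlString(Element element) {
        DOMImplementation impl = element.getOwnerDocument().getImplementation();
        // getFeature returns null if the implementation does not support Load and Save.
        Object feature = impl.getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new UnsupportedOperationException("DOM Load and Save is not available");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        // Leave out the XML declaration so that only the element markup is returned.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(element);
    }
}
```

LSSerializer also has a writeToURI(Node, String) method for saving straight to a file URI. As with the other serializers, this can only write out the text nodes that are actually in the tree.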

I haven't had time to run a sanity-check on my own machine, to see if
there's something obvious you might be missing. I'll try to do so this week.

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Joe Kesselman

Adam said:
Bingo! I did something like that (well, a bit more complicated) to
iterate through the gubbins of the org.w3c.dom.Element and print bits
out: the text nodes with whitespace are missing in there. Thanks very
much for that good piece of debugging advice.

Much more believable than that multiple serializers were all wrong.
<smile/> It's always worth either checking the DOM tree, or building
your own DOM (local parser, or by using the DOM API) and checking
whether that goes through cleanly.
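
One way to run that sanity check, sketched under the assumption that the JDK's default parser and TransformerFactory are in use: parse a string that is known to contain the whitespace, push the resulting Element through the same kind of identity transform as in the original post, and compare. RoundTripCheck and roundTrip are made-up names for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sanity check, not part of any library mentioned in this thread.
final class RoundTripCheck {

    /** Parses the XML locally, then serializes its root element with an identity transform. */
    public static String roundTrip(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Omit the declaration so the output is easier to compare with the input.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        identity.transform(new DOMSource(root), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TextWithNodes><Node id=\"0\"/> <Node id=\"1\"/>stream</TextWithNodes>";
        System.out.println(roundTrip(xml));
    }
}
```

If the space between the two Node elements survives here but not in the web-service client, the serializer is not the culprit and the DOM handed over by the client is the place to look.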

 
Adam Funk

Joe said:
Much more believable than that multiple serializers were all wrong.
<smile/>

When the third one sank into the swamp, I did start to wonder...

Joe said:
It's always worth either checking the DOM tree, or building
your own DOM (local parser, or by using the DOM API) and checking
whether that goes through cleanly.

Now I just have to figure out how to make the JAXB library behave
... watch this space.
 
