Document.importNode(Node,boolean) - what supports it?

Simon Brooke

The DOM API has included public Node importNode(Node,boolean) as a method
of the Document interface for a long time. Does anything actually
implement it? Xerces 2 is giving me:

org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:183)

This is so whether the node I'm trying to import is an
org.apache.xerces.dom.DeferredElementImpl (i.e. parsed with Xerces) or a
org.apache.crimson.tree.ElementNode (i.e. parsed with Crimson).

--
(e-mail address removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/
Ye hypocrites! are these your pranks? To murder men and give God thanks?
Desist, for shame! Proceed no further: God won't accept your thanks for
murther
-- Robert Burns, 'Thanksgiving For a National Victory'
 
Joe Kesselman

Simon said:
The DOM API has included public Node importNode(Node,boolean) as a method
of the Document interface for a long time. Does anything actually
implement it?

Certainly should work; I wrote Xerces' first implementation of that
function, and in fact was one of those who lobbied the DOM WG to include
it in the standard. If the node being imported properly implements the
DOM APIs, and the implementation being imported into doesn't have some
reason for blocking this (e.g., that it's specifically a read-only DOM,
such as the DOM view of Xalan's internal data model), the function
should work. It isn't rocket science, after all; it's just a tree-walker
feeding a tree-builder.

I have to believe the problem resides in something you haven't told us.
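To make the "tree-walker feeding a tree-builder" description concrete, here is a much-simplified sketch of a deep import written against the standard org.w3c.dom interfaces. It is not Xerces' implementation: the names `TreeWalkImport` and `copyInto` are hypothetical, and it copies only elements, attributes, text, CDATA sections and comments, ignoring namespaces, entity references and the other node types. It does mirror the DOM specification in one respect: a Document node is refused with NOT_SUPPORTED_ERR.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TreeWalkImport {

    /** Hypothetical, simplified deep import: walk the source, rebuild it with the target's factory methods. */
    public static Node copyInto(Document target, Node source) {
        switch (source.getNodeType()) {
            case Node.ELEMENT_NODE: {
                Element copy = target.createElement(source.getNodeName());
                NamedNodeMap attrs = source.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr a = (Attr) attrs.item(i);
                    copy.setAttribute(a.getName(), a.getValue());
                }
                // The "tree-walker" half: recurse into children and append the rebuilt copies.
                for (Node c = source.getFirstChild(); c != null; c = c.getNextSibling()) {
                    Node child = copyInto(target, c);
                    if (child != null) {
                        copy.appendChild(child);
                    }
                }
                return copy;
            }
            case Node.TEXT_NODE:
                return target.createTextNode(source.getNodeValue());
            case Node.CDATA_SECTION_NODE:
                return target.createCDATASection(source.getNodeValue());
            case Node.COMMENT_NODE:
                return target.createComment(source.getNodeValue());
            case Node.DOCUMENT_NODE:
                // As in the DOM specification, a Document node itself cannot be imported.
                throw new DOMException(DOMException.NOT_SUPPORTED_ERR, "Document nodes cannot be imported");
            default:
                return null; // other node types are skipped in this sketch
        }
    }

    /** Parse a string into a fresh Document, wrapping checked exceptions. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    /** Create an empty Document to import into. */
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document source = parse("<div class=\"Intro\">Here be dragons!</div>");
        Document target = newDocument();
        Node copy = copyInto(target, source.getDocumentElement());
        target.appendChild(copy);
        System.out.println(copy.getNodeName() + ": " + copy.getTextContent());
    }
}
```

The real method also handles namespaced nodes, user data and the remaining node types, but the shape is the same: read from one tree, call the target document's create* methods to build the other.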
 
Simon Brooke

Joe Kesselman said:
I have to believe the problem resides in something you haven't told us.

OK, then I have to believe that, too. Furthermore, this is another of the
bits of my code that have been around for a long time (since 2003 in this
case), and I'm sure it used to work (but it may only ever have worked with
Crimson). I have had occasions in the past where I have inadvertently
depended on bugs in a library, and when that library has been fixed all my
code broke.

If this class fails, it returns a text node containing a 'flat' representation of
the embedded markup. Looking at the production server logs, I see that it
has been failing intermittently in this way for some time, but the
failure simply hasn't been noticed. The failure on the production servers
differs from the failure on the development server; I'll detail that
difference below. The production servers use Crimson to parse but Xerces
to construct documents; I can't remember why, but it was probably just an
oversight.

The class in question is:

//***********************************************************************\
// *
// MaybeParseGenerator.java *
// *
// Author: Simon Brooke *
// Created: 17th January 2003 *
// $Revision: 1.7.4.3 $; $Date: 2006/09/04 13:45:54 $ *
// *
//***********************************************************************/
package uk.co.weft.domutil;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

import org.xml.sax.InputSource;

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;

import uk.co.weft.htform.ResourceConsumerImpl;


/*
* $Log: MaybeParseGenerator.java,v $
* Revision 1.7.4.3 2006/09/04 13:45:54 simon
* Added more debugging output. Have an intermittent bug in PRES which may
originate here.
*
* Revision 1.7.4.2 2005/12/30 16:54:00 simon
* EkitWidget now working remarkably well. Still some tidying up to do.
*
* Revision 1.7.4.1 2005/12/23 10:48:33 simon
* Brute force tidy up after CVS server crash: this time it should work.
*
* Revision 1.7 2005/02/05 17:40:17 simon
* Improved diagnostics on failure
*
* Revision 1.6 2004/07/14 12:52:34 simon
* Final commit for 1.10.0
*
* Revision 1.5 2004/06/17 15:10:38 simon
* Extends ResourceConsumerImpl to gain access to grs, etc
*
* Revision 1.4 2003/10/30 12:40:21 simon
* Added debug flag in domutil classes
*
* Revision 1.3 2003/08/20 09:38:35 simon
* Code cleanup with eclipse; mostly removal of exccessive includes
*
* Revision 1.2 2003/07/09 09:32:07 simon
* Initial work on HTML generation of widgets.
*
* Revision 1.1 2003/02/06 11:22:26 simon
* New superclass for node generators which may want to parse XML text.
*/

/**
* Abstract superclass for TextNodeGenerator and ElementGenerator, which may
* want to parse their content. Parsing is potentially expensive, so if
* you're confident the value won't contain XML markup it may be worth
* calling allowEmbeddedMarkup( false ).
*
* @author Simon Brooke
* @version $Revision: 1.7.4.3 $ This revision: $Author: simon $
*/
public abstract class MaybeParseGenerator extends ResourceConsumerImpl
{
//~ Instance fields -----------------------------------------------------

/**
* whether or not I'm in debug mode; if I am I may print debugging
* messages to System.err
*/
protected boolean debug = false;

/** By default we allow embedded markup in children */
protected boolean embeddedMarkup = true;

//~ Constructors --------------------------------------------------------

/**
* Creates a new MaybeParseGenerator object.
*/
public MaybeParseGenerator( )
{
// ...nothing...
}

//~ Methods -------------------------------------------------------------

/**
* whether or not to set debugging mode. If true, the generator _may_
* write debugging messages to System.err
*
* @param debug whether or not to set debugging mode
*
* @since Jacquard 1.10
*/
public void setDebug( boolean debug )
{
this.debug = debug;
}

/**
* Do we allow (and parse for) embedded markup within the value of this
* node? default is we do.
*
* @param allow if true, then allow embedded markup within my value
*/
public void allowEmbeddedMarkup( boolean allow )
{
embeddedMarkup = allow;
}

/**
* Construct a node representing this value. It's perfectly possible (and
* possibly legitimate) that the value of a child should contain embedded
* markup. If so, try to parse a node out of it.
*
* @param doc the document in which the node is to be created
* @param unparsed the string, possibly with embedded markup, to parse
*
* @exception GenerationException if parsing fails
*/
protected Node maybeParse( Document doc, String unparsed )
throws GenerationException
{
Node val = doc.createTextNode( unparsed ); // safe default

if ( debug )
{
System.err.println( "MaybeParseGenerator.maybeParse: parsing [" +
unparsed + "]" );
}

if ( unparsed != null ) // defensive
{
if ( embeddedMarkup && (
// if we allow embedded markup
unparsed.indexOf( "<" ) > -1 ) ) // it looks like markup
{
if ( !unparsed.trim( ).startsWith( "<" ) )
{
// nasty: if it contains markup, but
// isn't contained in markup, the
// parser will barf.
unparsed = "<parsed>" + unparsed + "</parsed>";
}

try
{
DocumentBuilder parser = DOMStub.getParser( );

if ( parser == null )
{
System.err.println( "Could not initialise XML parser" );
}

InputSource i =
new InputSource( new StringReader( unparsed ) );

// i.setCharacterStream( new StringReader( unparsed ) );
Document parsed = parser.parse( i );

if ( debug )
{
System.err.println( "Parsed document: " +
parsed.toString( ) );

if ( parsed != null )
{
Node root = parsed.getDocumentElement( );

if ( root != null )
{
System.err.println( "Root node: (" +
root.getClass( ).getName( ) + "): " +
root.toString( ) );
}
}
}

val = doc.importNode( parsed, true );

if ( debug )
{
System.err.println(
"MaybeParseGenerator.maybeParse: parse successful" );
new Printer( ).print( val, System.err );
}
}
catch ( Exception e )
{
System.err.println(
"MaybeParseGenerator.maybeParse(): Could not parse '" +
unparsed + "'as XML" );
e.printStackTrace( System.err );
}
}
}

return val;
}
}

/* [end of file] */
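One observation on the wrapping step in maybeParse above: it only adds the `<parsed>` wrapper when the trimmed string does not start with `<`. A value such as `<b>bold</b> and <i>italic</i>` does start with `<` but has more than one top-level node, and a well-formed XML document must have exactly one root element, so the parser would still reject it. Below is a minimal sketch of one alternative, assuming it is acceptable always to wrap: parse inside a synthetic root, then import the wrapper's children (never the parsed Document) into a DocumentFragment. `FragmentParse` and `parseFragment` are hypothetical names, and the JDK's default DocumentBuilderFactory stands in for DOMStub.getParser().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FragmentParse {

    /**
     * Hypothetical helper: always wrap the value in a synthetic root so that leading
     * text and multiple top-level elements are both well-formed, then import the
     * wrapper's children into a fragment owned by the target document.
     */
    public static DocumentFragment parseFragment(Document target, String unparsed) {
        String wrapped = "<parsed>" + unparsed + "</parsed>";
        try {
            Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
            DocumentFragment fragment = target.createDocumentFragment();
            // importNode copies, so walking the source's children while importing is safe.
            for (Node c = parsed.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
                fragment.appendChild(target.importNode(c, true));
            }
            return fragment;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed even when wrapped: " + unparsed, e);
        }
    }

    /** Create an empty target Document. */
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document target = newDocument();
        DocumentFragment fragment = parseFragment(target, "<b>bold</b> and <i>italic</i>");
        System.out.println(fragment.getChildNodes().getLength() + " top-level nodes");
    }
}
```

Appending the returned fragment to an element moves its children into place, so the caller never sees the synthetic `<parsed>` wrapper.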


What I'm getting in the error stream on the development server is (with
parser unconfigured, i.e. using Tomcat's default, which is Xerces; see
below for Crimson):

ElementGenerator.generate: attempting to parse <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
Here be dragons!
</div>]
Parsed document: [#document: null]
Root node: (org.apache.xerces.dom.DeferredElementImpl): [div: null]
MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
Here be dragons!
</div>'as XML
org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:183)


(with parser configured as org.apache.crimson.tree.DOMImplementationImpl):

ElementGenerator.generate: attempting to parse <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
Here be dragons!
</div>]
Parsed document: org.apache.crimson.tree.XmlDocument@e9a0e9a
Root node: <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
Here be dragons!
</div>'as XML
org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:173)


What's showing up in the production server logs is:
(Firstly, evidence that it sometimes does work):
ElementGenerator.generate: attempting to parse <div
class="Introduction"><p>Copies of documentation issued to licensees is
available in this section.</p></div>
ElementGenerator.generate: attempting to parse Cockle Bags - further
information


(Secondly, evidence that it sometimes doesn't):
ElementGenerator.generate: attempting to parse <div class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>
MayberParseGenerator.maybeParse(): Could not parse '<div
class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>'as XML
java.lang.NullPointerException
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:163)

I've checked the libraries and the two instances above use the same
versions of the same libraries with the same configuration, so why

<div class="Introduction"><p>Copies of documentation issued to licensees is
available in this section.</p></div>

parses successfully and

<div class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>

fails to parse is frankly baffling me.
 
Joe Kesselman

Just a quick observation: Your "sometimes works" and "sometimes doesn't"
are significantly different:
(Firstly, evidence that it sometimes does work):
ElementGenerator.generate: attempting to parse <div
class="Introduction"><p>Copies of documentation issued to licensees is
available in this section.</p></div>

(Secondly, evidence that it sometimes doesn't):
ElementGenerator.generate: attempting to parse <div class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>

<div> contains only text. Haven't looked at the code yet, but are you
sure you aren't doing something simple like trying to import the string
value rather than a TextNode object?
 
Joe Kesselman

Also: You didn't show us the implementation of DOMStub... but with that
name, I wouldn't be at all surprised if you've got a subset
implementation there.
 
Joe Kesselman

Well, I've reproduced the error message under Eclipse. Lemme see if I
can reproduce it with a current version of Xerces...
 
Bjoern Hoehrmann

* Joe Kesselman wrote in comp.text.xml:
You're attempting to import a Document node. That's forbidden. Import
its root element instead.

Heh, I actually had a quick look into the Xerces source code when I
looked at the question, but that case was the only one where the specific
exception reported would be raised, and Simon said he was trying to import
element nodes, so I concluded the issue was too weird to investigate
further...
 
Simon Brooke

Joe Kesselman said:
Oh. That's stupid; I should have remembered this:

http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#Core-Document-importNode

You're attempting to import a Document node. That's forbidden. Import
its root element instead.

Yes, the error message could have been more helpful. I'd suggest posting
that as a suggestion on the Xerces users mailing list, since I'm not
sure any of the current Xerces maintainers are reading this list.

Thank you. I was going to say indignantly 'oh no I don't', but on reading
through my code I see I get the root node of the document... and then
don't use it. Having fixed that, /this/ problem is solved, and I can now
replace vintage Crimson with current Xerces and my code still works.

Still can't get it to work with current Xalan, but that's another set of
problems...
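For anyone who lands here with the same exception, the fix Simon describes amounts to importing the parsed document's root element rather than the Document itself. Below is a cut-down, self-contained sketch of maybeParse with that change. It is not Simon's actual revised code: `MaybeParseSketch` and its `newDocument` helper are hypothetical, the JDK's default DocumentBuilderFactory replaces DOMStub.getParser(), the debug output is dropped, and the input is assumed to be non-null.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MaybeParseSketch {

    /** Parse markup out of the value if it looks like XML; otherwise (or on failure) return a text node. */
    public static Node maybeParse(Document doc, String unparsed) {
        Objects.requireNonNull(unparsed, "unparsed"); // assumption: callers never pass null
        Node val = doc.createTextNode(unparsed);      // safe default, as in the original
        if (unparsed.indexOf('<') < 0) {
            return val; // no markup, nothing to parse
        }
        String xml = unparsed.trim().startsWith("<") ? unparsed : "<parsed>" + unparsed + "</parsed>";
        try {
            Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The fix: doc.importNode(parsed, true) raises NOT_SUPPORTED_ERR because
            // Document nodes cannot be imported; import the root element instead.
            val = doc.importNode(parsed.getDocumentElement(), true);
        } catch (Exception e) {
            System.err.println("maybeParse: could not parse '" + unparsed + "' as XML: " + e);
        }
        return val;
    }

    /** Hypothetical helper: an empty Document to build into. */
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Node n = maybeParse(doc, "<div class=\"Intro\">\nHere be dragons!\n</div>");
        System.out.println(n.getNodeType() == Node.ELEMENT_NODE ? "element " + n.getNodeName() : "text");
    }
}
```

Keeping the text-node default means a value that fails to parse still degrades to the 'flat' representation Simon describes, rather than throwing.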
 
Simon Brooke

Joe Kesselman said:
Also: You didn't show us the implementation of DOMStub... but with that
name, I wouldn't be at all surprised if you've got a subset
implementation there.

No, it just allows me to select and configure the DOMImplementation I use:

/**
* Should be called before DOMStub is used, but perfectly safe to call
* more than once. If I've already been initialised, don't intialise me
* again.
*
* @param config my configuration
*
* @exception InitialisationException if requested DOM implementation
* can't be found
*/
public static void init( Context config ) throws InitialisationException
{
String s = config.getValueAsString( "dom_implementation_class" );

if ( domImp == null )
{
/* i.e., I have not already been initialised */
try
{
if ( s != null )
{
domImpName = s;
}

domImp =
(DOMImplementation) Class.forName( domImpName )
.newInstance( );
}
catch ( Exception any )
{
throw new InitialisationException( "Could not find DOM " +
"implementation " + domImpName );
}
}

Boolean b = config.getValueAsBoolean( "dom_coalescing" );

if ( b != null )
{
dbf.setCoalescing( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_expand_entity_references" );

if ( b != null )
{
dbf.setExpandEntityReferences( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_ignore_comments" );

if ( b != null )
{
dbf.setIgnoringComments( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_ignore_whitespace" );

if ( b != null )
{
dbf.setIgnoringElementContentWhitespace( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_namespace_aware" );

if ( b != null )
{
dbf.setNamespaceAware( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_validating" );

if ( b != null )
{
dbf.setValidating( b.booleanValue( ) );
}
}
}
 
Joe Kesselman

Bjoern said:
Simon said he tried to import element nodes, so I concluded the issue is too weird to investigate further...

This is why it's often helpful to post a minimal example that provokes
the problem. In fact, the process of extracting code and writing that
reduced example is often enough to expose the problem.

I must admit I cheated -- I tossed the code into a debugger, did some
cleanup so it could actually be run, added the Xerces source (so I could
see what was happening inside that), set the classpaths to use this copy
of Xerces rather than the one in the Java libraries, set it to stop when
a DOMException was about to be thrown, and just pushed the button.
Bingo; there we are at the error, and the object in question is indeed a
Document.
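In the spirit of Joe's advice about minimal examples, a reproducer for this thread's problem can be very small. The sketch below uses hypothetical names (`ImportDocumentRepro`, `importWholeDocument`, `setUp`) and the JDK's built-in DOM rather than the Xerces 2 and Crimson jars from the thread; since the DOM specification forbids importing Document nodes, a conforming implementation should refuse the first import with NOT_SUPPORTED_ERR, while importing the same document's root element succeeds.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ImportDocumentRepro {

    /** Returns the DOMException code raised by importing a whole Document, or -1 if none was raised. */
    public static int importWholeDocument(Document target, Document parsed) {
        try {
            target.importNode(parsed, true);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /** Builds a parsed source document and an empty target, wrapping checked exceptions. */
    public static Document[] setUp(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document parsed = builder.parse(new InputSource(new StringReader(xml)));
            return new Document[] { parsed, builder.newDocument() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document[] docs = setUp("<div class=\"Intro\">Here be dragons!</div>");
        Document parsed = docs[0], target = docs[1];
        int code = importWholeDocument(target, parsed);
        System.out.println("importNode(Document): code " + code
                + (code == DOMException.NOT_SUPPORTED_ERR ? " (NOT_SUPPORTED_ERR)" : ""));
        Node root = target.importNode(parsed.getDocumentElement(), true);
        target.appendChild(root);
        System.out.println("importNode(root element): " + root.getNodeName());
    }
}
```

Swapping the classpath to a specific Xerces or Crimson jar and rerunning this is a quick way to check whether a given implementation behaves the same.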
 
