DOM2 API (Java): how to get namespace declarations?

Simon Brooke

I was debugging a new XML generator tonight, trying to determine why
it wasn't working, and realised that my DOM printer does not output XML
namespace declarations.

My method to output an Element is as follows:

/**
 * Print an element node and, by recursive descent, its children.
 *
 * @param node the node to print
 * @param out the stream to print it on
 * @param url the base URL to use in expanding relative URLs
 * @param level the indentation level if pretty printing
 */
protected void print( Element node, PrintStream out, URL url,
    int level )
    throws IOException
{
    indent( out, level );
    out.print( '<' );

    String tagname = node.getNodeName( );
    out.print( tagname );

    NamedNodeMap attrs = node.getAttributes( );
    NodeList children = node.getChildNodes( );

    /* Get the attributes of the node and print their values. */
    for ( int i = 0; i < attrs.getLength( ); i++ )
    {
        print( ( (Attr) attrs.item( i ) ), out, url, level + 1 );
    }

    if ( ( children != null ) && ( children.getLength( ) > 0 ) )
    { // it's a non-empty tag
        out.print( '>' );

        int len = children.getLength( );

        for ( int i = 0; i < len; i++ )
        {
            print( children.item( i ), out, url, level + 1 );
        }

        /* Set the end tag. */
        indent( out, level );
        out.print( '<' );
        out.print( '/' );
        out.print( tagname );
    }
    else // it's an empty tag
    {
        out.print( " /" );
    }

    out.print( '>' );
}

Performing the exact same XSL transform, the Xerces printer emits:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF version="1.0"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
xmlns:geourl="http://geourl.org/rss/module/"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rss version="0.91">
...

whereas my printer emits:

<rdf:RDF version="1.0">
<rss version="0.91">
...

The relevant part of the XSL file reads:

<xsl:template match="category">
<rdf:RDF version="1.0"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:geourl="http://geourl.org/rss/module/"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
<rss version="0.91">
...

Clearly what Xerces is emitting is right and what I am emitting is wrong,
but I'm having trouble seeing what I'm doing wrong. My method to output
an attribute node is as follows:

/**
 * Print an attribute node. If url is not null, use it as a base URL
 * for expanding URL values.
 *
 * @param node the node to print
 * @param out the stream to print it on
 * @param url the base URL to use in expanding relative URLs
 * @param level the indentation level if pretty printing
 */
protected void print( Attr node, PrintStream out, URL url,
    int level )
    throws IOException
{
    String delimiter = "\"";
    String value = node.getNodeValue( );

    if ( value != null )
    {
        /* As I understand it, you aren't allowed unvalued
         * attributes in XML */
        value = cleanString( value, true );
        /* are attribute values allowed to contain *any*
         * characters? */

        /* if an attribute has double quotes in its value, we'll use
         * single quotes as the delimiter and vice versa. If it has
         * both we're stuffed. */
        if ( value.indexOf( delimiter ) > -1 )
        {
            delimiter = "'";
        }

        indent( out, level );
        out.print( " " );
        out.print( node.getNodeName( ) );
        out.print( "=" );
        out.print( delimiter );

        /* If this is an attribute whose value
         * should be a URL. */
        if ( ( node.getNodeName( ).equalsIgnoreCase( "href" ) ||
               node.getNodeName( ).equalsIgnoreCase( "link" ) ||
               node.getNodeName( ).equalsIgnoreCase( "src" ) ) &&
             ( url != null ) )
        {
            /* Change the partial URL to a full URL. */
            try
            {
                String fullURL = new URL( url, value ).toString( );

                out.print( fullURL );
            }
            catch ( MalformedURLException m )
            {
                // log
                m.printStackTrace();
            }
        }
        else
        { /* Otherwise print the cleaned value as it is. */
            out.print( value );
        }

        out.print( delimiter );
    }
    else
    {
        System.err.println( "Unvalued attribute: " +
            node.getNodeName( ));
    }
}

Neither the MalformedURLException nor the string 'Unvalued attribute'
ever appears in the log. From this it seems that neither
Node.getAttributes() nor Node.getChildNodes() returns the namespace
declarations. Yet I can't see any other no-args get...() method in the
API. Reading through the Xerces XMLSerializer code makes it seem that
they are finding the namespace declarations among the attributes.

Can anyone see what I'm doing wrong? I appreciate it's probably some basic
howler, but I just can't see it myself.
 
Joe Kesselman

Simon said:
I was debugging a new XML generator tonight and trying to determine why
it wasn't working; and realised my dom printer does not output XML
namespace declarations.

XML namespace declarations are optional in the DOM, since every node
carries its namespace and bindings can be reconstructed when you
serialize the DOM's contents as XML. The flipside is that it is the
serializer's responsibility to check that the necessary declarations are
present as Attribute nodes, and/or to synthesize those declarations.
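
To make the point above concrete, here is a minimal sketch of a DOM element that carries a namespace without any declaration attribute. The class and method names are made up for illustration; the DOM calls are the standard `org.w3c.dom` ones.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; only the org.w3c.dom calls are standard API.
class NamespaceWithoutDeclaration {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Builds an rdf:RDF element in memory without adding any xmlns attribute. */
    static Element buildRdfElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            // The namespace URI is stored on the node itself.
            Element rdf = doc.createElementNS(RDF_NS, "rdf:RDF");
            rdf.setAttribute("version", "1.0");
            doc.appendChild(rdf);
            return rdf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element rdf = buildRdfElement();
        System.out.println("namespace URI:  " + rdf.getNamespaceURI());
        System.out.println("prefix:         " + rdf.getPrefix());
        // Only the 'version' attribute is present; nothing declares the prefix.
        System.out.println("attributes:     " + rdf.getAttributes().getLength());
        System.out.println("has xmlns:rdf?  " + rdf.hasAttribute("xmlns:rdf"));
    }
}
```

A printer that only walks getAttributes() would emit this element with no declaration for the rdf prefix, which matches the symptom described in the original post.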

The DOM Level 3 spec should have a fairly detailed description of one
algorithm for doing that check and fixup. (I drafted the first version
of that logic, though I think it's been tweaked a bit since then.) I'd
suggest reading that before implementing your own DOM-printer.

Alternatively, you can insist that whoever constructs your DOM take
responsibility for making sure that all the necessary Attribute nodes
exist to declare the namespaces. (Note that they have to be in the
correct namespace themselves...). But it's probably better not to count
on that unless you have full control of both sides of the system.
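
As a sketch of what "in the correct namespace themselves" means in practice: a declaration attribute is created with setAttributeNS and the reserved xmlns namespace URI, not with plain setAttribute. The class and method names below are hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class showing an explicit namespace declaration attribute.
class DeclareNamespaceAttribute {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Builds an rdf:RDF element and explicitly declares its prefix. */
    static Element buildDeclaredElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element rdf = doc.createElementNS(RDF_NS, "rdf:RDF");
            // The declaration lives in the namespace "http://www.w3.org/2000/xmlns/".
            rdf.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:rdf", RDF_NS);
            doc.appendChild(rdf);
            return rdf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element rdf = buildDeclaredElement();
        // Look the declaration up by its namespace and local name ("rdf").
        Attr decl = rdf.getAttributeNodeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "rdf");
        System.out.println(decl.getNodeName() + " = " + decl.getNodeValue());
    }
}
```

With the attribute in place, a printer that simply iterates getAttributes() will emit the declaration along with the other attributes.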

Note that most DOM implementations these days ship with serializers that
know how to do the right things, so unless you're creating your own DOM
or have unusual formatting requirements it might be simpler to just use
those rather than reimplementing that code. (And of course DOM Level 3
proposes a standard API for that function.)
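
For reference, a minimal sketch of the DOM Level 3 Load and Save route, using the `org.w3c.dom.ls` interfaces as they later shipped in the JDK. The class and method names are hypothetical. The document is deliberately built without any xmlns attribute, on the assumption that the serializer's namespace processing (its `namespaces` parameter, which defaults to true) supplies the missing declaration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo class; the LS interfaces are the standard DOM Level 3 ones.
class Dom3SerializeSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Builds a document whose root has a namespace but no declaration attribute. */
    static Document buildUndeclaredDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element rdf = doc.createElementNS(RDF_NS, "rdf:RDF");
            rdf.setAttribute("version", "1.0");
            doc.appendChild(rdf);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the document through the Load and Save API. */
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildUndeclaredDocument()));
    }
}
```

Be aware that some implementations perform the fix-up by adding the declaration attributes to the tree being serialized, so the DOM may not be left untouched.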

But doing a recursive-descent DOM printer _is_ a good learning exercise,
so it's probably something you should write at least once. Among other
things, the same tree-walking logic is useful for many other kinds of
DOM processing.
 
Simon Brooke

Joe Kesselman said:
XML namespace declarations are optional in the DOM, since every node
carries its namespace and bindings can be reconstructed when you
serialize the DOM's contents as XML. The flipside is that it is the
serializer's responsibility to check that the necessary declarations
are present as Attribute nodes, and/or to synthesize those
declarations.

Thanks very much!

The DOM Level 3 spec should have a fairly detailed description of one
algorithm for doing that check and fixup. (I drafted the first version
of that logic, though I think it's been tweaked a bit since then.) I'd
suggest reading that before implementing your own DOM-printer.

OK, got it.

Note that most DOM implementations these days ship with serializers
that know how to do the right things, so unless you're creating your
own DOM or have unusual formatting requirements it might be simpler to
just use those rather than reimplementing that code. (And of course DOM
Level 3 proposes a standard API for that function.)

Yup. The thing is I wrote my printer back in February 2000 when there
weren't a lot of others around - which makes it surprising that its
failure to do the right things with namespaces hasn't tripped me up
before. It would probably be more economical now to just make a call to
the DOM3 serialiser API, but as a matter of craftsmanship I'd like to
get mine right.

OK, so: we look at a node and see if it needs a namespace, and if it does
we generate a namespace declaration. Suppose we have a structure

<a>
  <b>
    <foo:c/>
    <foo:d/>
  </b>
  <bar:e/>
</a>

am I right in thinking that it would be correct to attach the 'foo'
namespace declaration at any of nodes c /and/ d, or at node b, or at
node a, and the 'bar' namespace declaration at either node e or node a?

Clearly not duplicating the declaration makes the job of the parser
easier. Is there any good reason not to pre-scan the tree and collect all
of the namespaces used and declare them on the root element of the
document? Looking at the 'algorithms' page it seems that unless two
elements use the same prefix to indicate different namespaces, there
should be no problem in 'shuffling' namespace declarations as high up the
tree as possible.
 
Bjoern Hoehrmann

* Simon Brooke wrote in comp.text.xml:
OK, so: we look at a node and see if it needs a namespace, and if it does
we generate a namespace declaration. Suppose we have a structure

<a>
  <b>
    <foo:c/>
    <foo:d/>
  </b>
  <bar:e/>
</a>

am I right in thinking that it would be correct to attach the 'foo'
namespace declaration at any of nodes c /and/ d, or at node b, or at
node a, and the 'bar' namespace declaration at either node e or node a?

xmlns:foo must be in scope for c and d; declaring it on both of them
would do the job, as would declaring it on one of their ancestors.
Declaring it on a, b, c and d would also be possible, for example, but
would probably be redundant. Note that a prefix might map to different
namespace names on different elements, e.g.

<x>
<y:z xmlns:y='foo' />
<y:z xmlns:y='bar' />
</x>

would also be possible, and there might be content that depends on the
prefixes (e.g., XPath expressions in an XSLT document), so if you have

<x some-qname-attribute='y:z' xmlns:y='foo'>
<y:example />
</x>

mapping that to

<x some-qname-attribute='y:z'>
<y:example xmlns:y='foo' />
</x>

might be a bad idea.

Clearly not duplicating the declaration makes the job of the parser
easier. Is there any good reason not to pre-scan the tree and collect all
of the namespaces used and declare them on the root element of the
document? Looking at the 'algorithms' page it seems that unless two
elements use the same prefix to indicate different namespaces, there
should be no problem in 'shuffling' namespace declarations as high up the
tree as possible.

This is true in general, but it would turn a probably incorrect document
like

<x some-qname-attribute='y:z'>
<y:example xmlns:y='foo' />
</x>

into a correct document, which might not be intended. Of course, QNames
in content might not be a concern for your application.
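
The scoping difference between the two forms of that document can be checked with the DOM Level 3 method Node.lookupNamespaceURI. The sketch below parses a string and asks the root element what a prefix resolves to there; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; lookupNamespaceURI is standard DOM Level 3 Core.
class PrefixScopeDemo {
    /** Parses the XML and returns the URI the prefix is bound to on the root, or null. */
    static String resolveOnRoot(String xml, String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String declaredOnRoot =
                "<x some-qname-attribute='y:z' xmlns:y='foo'><y:example/></x>";
        String declaredOnChild =
                "<x some-qname-attribute='y:z'><y:example xmlns:y='foo'/></x>";
        // On the root, the 'y' in the attribute value can only be resolved
        // when the declaration is in scope there.
        System.out.println(resolveOnRoot(declaredOnRoot, "y"));
        System.out.println(resolveOnRoot(declaredOnChild, "y"));
    }
}
```

Moving a declaration up or down the tree therefore changes what a QName in attribute or text content resolves to, even though the element names themselves are unaffected.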
 
Simon Brooke

Bjoern Hoehrmann said:

xmlns:foo must be in scope for c and d; declaring it on both of them
would do the job, as would declaring it on one of their ancestors.
Declaring it on a, b, c and d would also be possible, for example, but
would probably be redundant. Note that a prefix might map to different
namespace names on different elements, e.g.

<x>
<y:z xmlns:y='foo' />
<y:z xmlns:y='bar' />
</x>

would also be possible, and there might be content that depends on the
prefixes (e.g., XPath expressions in an XSLT document), so if you have

<x some-qname-attribute='y:z' xmlns:y='foo'>
<y:example />
</x>

mapping that to

<x some-qname-attribute='y:z'>
<y:example xmlns:y='foo' />
</x>

might be a bad idea.


This is true in general, but it would turn a probably incorrect
document like

<x some-qname-attribute='y:z'>
<y:example xmlns:y='foo' />
</x>

into a correct document, which might not be intended. Of course, QNames
in content might not be a concern for your application.

OK, my algorithm at this stage is as follows

if ( responsibleForNamespaceDeclarations )
{
    try
    {
        spaces = recursivelyCollectNamespaces( node );

        Enumeration keys = spaces.keys( );

        while ( keys.hasMoreElements( ) )
        {
            String key = keys.nextElement( ).toString( );
            printNS( key, spaces.get( key ).toString( ), out,
                level + 1 );
        }

        responsibleForNamespaceDeclarations = false;
    }
    catch ( NamespaceCollisionException e )
    {
        String uri = node.getNamespaceURI( );
        String prefix = node.getPrefix( );

        if ( ( uri != null ) && ( prefix != null ) )
        {
            printNS( prefix, uri, out, level + 1);
        }

        System.err.println( "Namespace clash: " + e.getMessage( ) );
    }
}
...
for ( int i = 0; i < children.getLength( ); i++ )
{
    print( children.item( i ), out, level + 1,
        responsibleForNamespaceDeclarations );
}

That is to say, when printing an element node, I do recursive descent to
collect all the namespaces down tree from it. If there is a collision,
then if I have a local namespace to deal with, I deal with that locally,
and leave responsibility for printing namespaces set for the child
nodes. If there is no collision, then I deal with all the down-tree
namespaces and clear the responsibleForNamespaceDeclarations flag.

Can anyone see problems with this? And what do I do about the default
namespace? Will the default namespace have getNamespaceURI() non-null
and getPrefix() null?
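
One possible shape for the collection step is sketched below. It is not the recursivelyCollectNamespaces from the post: the class, the helper methods and the unchecked NamespaceCollisionException are all stand-ins invented for illustration. It assumes a namespace-aware tree. On the default-namespace question, an element in a default namespace does report a non-null getNamespaceURI() and a null getPrefix(), so the sketch files it under the empty prefix.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; none of these names come from the original printer.
class NamespaceCollector {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    /** Stand-in for the exception named in the post (unchecked here for brevity). */
    static class NamespaceCollisionException extends RuntimeException {
        NamespaceCollisionException(String message) { super(message); }
    }

    /** Maps each prefix used at or below root to its URI; "" is the default namespace. */
    static Map<String, String> collect(Element root) {
        Map<String, String> bindings = new LinkedHashMap<>();
        collectInto(root, bindings);
        bindings.remove("", ""); // unprefixed elements in no namespace need no declaration
        return bindings;
    }

    private static void collectInto(Element element, Map<String, String> bindings) {
        String prefix = element.getPrefix();
        String uri = element.getNamespaceURI();
        // An unprefixed element in no namespace is recorded as "" -> "" so that
        // it clashes with a default namespace used elsewhere in the subtree.
        record(prefix == null ? "" : prefix, uri == null ? "" : uri, bindings);

        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            String attrUri = attr.getNamespaceURI();
            // Skip existing declarations, xml:* attributes, and attributes with
            // no prefix (unprefixed attributes never use the default namespace).
            if (attrUri != null && attr.getPrefix() != null
                    && !XMLNS_NS.equals(attrUri) && !XML_NS.equals(attrUri)) {
                record(attr.getPrefix(), attrUri, bindings);
            }
        }

        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                collectInto((Element) children.item(i), bindings);
            }
        }
    }

    private static void record(String prefix, String uri, Map<String, String> bindings) {
        String existing = bindings.putIfAbsent(prefix, uri);
        if (existing != null && !existing.equals(uri)) {
            throw new NamespaceCollisionException("prefix '" + prefix
                    + "' is bound to both '" + existing + "' and '" + uri + "'");
        }
    }

    /** Convenience helper for trying the sketch on an XML string. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<a><b><foo:c xmlns:foo='urn:foo'/></b>"
                + "<bar:e xmlns:bar='urn:bar'/></a>");
        System.out.println(collect(root));
    }
}
```

Note that this only looks at element and attribute names. As pointed out earlier in the thread, prefixes used inside attribute values or text are invisible to it, and a subtree that mixes a default namespace with unprefixed no-namespace elements is reported as a collision so that nothing is hoisted over it.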

--
(e-mail address removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/

The Conservative Party is now dead. The corpse may still be
twitching, but resurrection is not an option - unless Satan
chucks them out of Hell as too objectionable even for him.
 
