What is the better/more performant way to compare org.w3c.dom.DocumentFragment instances?

Mausam

I have a Java class that contains a DocumentFragment.

In the equals method of my class, I convert the DocumentFragment to a String and compare the two Strings with equals.

I know this is not the best way because, for example, attributes can change order within an Element of the DocumentFragment, or two documents can differ only in the sequence of unordered elements.

In such cases the equality check will fail.

Please suggest a better approach.
 
Jeff Higgins

A my class is equal to another my class if and only if ...
 
Arne Vajhøj

I think XML Canonicalization will solve the problem.

It comes at a cost, though.

Arne
 
Mausam

A my class is equal to another my class if and only if ...

Thanks Jeff, I understand what you mean.

BTW, I was checking the API http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Node.html#isEqualNode(org.w3c.dom.Node)

The attributes NamedNodeMaps are equal. This is: they are both null, or they have the same length and for each node that exists in one map there is a node that exists in the other map and is equal, although not necessarily at the same index.


The childNodes NodeLists are equal. This is: they are both null, or they have the same length and contain equal nodes at the same index. Note that normalization can affect equality; to avoid this, nodes should be normalized before being compared.

Here, for attributes, they allow for "not necessarily at the same index", but for childNodes they do not. So if there is a sequence of unordered elements (<emp/><dept/> and <dept/><emp/>), the two will be treated as NOT equal.

So my fallback is to iterate through each node and attribute and compare them myself. But before doing that, I wanted to ask the experts whether there are better options.
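
To make the behaviour described above concrete, here is a minimal sketch using only the JDK's DOM parser. The class name IsEqualNodeDemo and the helpers fragment and sameFragment are hypothetical names made up for illustration; the only API taken from the Javadoc quoted above is Node.isEqualNode, plus Node.normalize as that Javadoc recommends.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.xml.sax.InputSource;

final class IsEqualNodeDemo {

    // Hypothetical helper: parses the XML text and moves its root element
    // into a fresh DocumentFragment.
    static DocumentFragment fragment(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DocumentFragment frag = doc.createDocumentFragment();
            frag.appendChild(doc.getDocumentElement());
            return frag;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    // Normalizes both fragments (merging adjacent text nodes), then compares
    // them structurally with isEqualNode.
    static boolean sameFragment(DocumentFragment a, DocumentFragment b) {
        a.normalize();
        b.normalize();
        return a.isEqualNode(b);
    }

    public static void main(String[] args) {
        // Attribute order differs: isEqualNode still reports equality.
        System.out.println(sameFragment(
                fragment("<a><b c='1' d='2'/></a>"),
                fragment("<a><b d='2' c='1'/></a>"))); // true

        // Child element order differs: isEqualNode reports inequality.
        System.out.println(sameFragment(
                fragment("<r><emp/><dept/></r>"),
                fragment("<r><dept/><emp/></r>"))); // false
    }
}
```

So isEqualNode covers the attribute-order case without any String conversion, but not the unordered-children case.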
 
Jeff Higgins

Yep. I based my hair trigger response upon the .equals(Object) of the
"known implementing classes" of Node. Sorry. I'll be interested in
finding out the "cost" associated with Arne Vajhøj's response.
 
Arne Vajhøj

I think XML Canonicalization will solve the problem.

It comes at a cost, though.

Example:

import java.io.IOException;
import java.io.UnsupportedEncodingException;

import javax.xml.parsers.ParserConfigurationException;

import org.apache.xml.security.Init;
import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.xml.sax.SAXException;

public class XmlComp {
    static {
        // Initialize the Apache XML Security library before first use.
        Init.init();
    }

    // Returns the canonical form (comments omitted) of the given XML text.
    private static String canonicalize(String s)
            throws InvalidCanonicalizerException, UnsupportedEncodingException,
                   CanonicalizationException, ParserConfigurationException,
                   IOException, SAXException {
        Canonicalizer c14n =
                Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        return new String(c14n.canonicalize(s.getBytes(Canonicalizer.ENCODING)),
                Canonicalizer.ENCODING);
    }

    public static void main(String[] args) throws Exception {
        // Same content; only the attribute order differs.
        String s1 = "<a><b c='1' d='2'/></a>";
        String s2 = "<a><b d='2' c='1'/></a>";
        System.out.println(s1);
        System.out.println(s2);
        // After canonicalization both strings are identical.
        System.out.println(canonicalize(s1));
        System.out.println(canonicalize(s2));
    }
}

outputs:

<a><b c='1' d='2'/></a>
<a><b d='2' c='1'/></a>
<a><b c="1" d="2"></b></a>
<a><b c="1" d="2"></b></a>

Arne
 
Arne Vajhøj

The cost is CPU time. It costs a bit of CPU time to parse,
reorganize, and serialize again.

Arne
 
Mausam

Thanks Arne,

I can achieve that using the Node.isEqualNode(Node) API, which has been available since JDK 1.5.

I am worried about the following use cases (I am not sure whether they are even valid use cases):

1)
Are these two Nodes equal? Note that one has an empty street element and the other has no street element at all. That implies that the value of street is empty in both cases, so if the Employee is treated as a Java object, the two should be equal.
<Employee company="example" xmlns="http://example.com" debug="true">
<Employeename>mausam</Employeename>
<email>a @example.com</email>
<street/>
</Employee>

<Employee debug="true" company="example" xmlns="http://example.com">
<Employeename>mausam</Employeename>
<email>a @example.com</email>
</Employee>

2)
Check the position of the street element. In node 1 it comes after email, and in node 2 it comes before.
<Employee company="example" xmlns="http://example.com" debug="true">
<Employeename>mausam</Employeename>
<email>a @example.com</email>
<street>Marienplatz</street>
</Employee>

<Employee debug="true" company="example" xmlns="http://example.com">
<Employeename>mausam</Employeename>
<street>Marienplatz</street>
<email>a @example.com</email>
</Employee>
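
One possible way to cover both use cases is the fallback of walking the nodes by hand. The sketch below is only an illustration under stated assumptions, not a standard API: the class UnorderedXmlCompare and its methods parse, isEmpty, signature and equalIgnoringOrder are hypothetical names, the parser is assumed to be namespace-aware, sibling order is assumed to be irrelevant everywhere, and an element with no attributes, no text and no non-empty children is assumed to mean the same as a missing element. It builds an order-independent signature string per element and compares the signatures.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class UnorderedXmlCompare {

    // Hypothetical helper: namespace-aware parse, returns the root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    private static boolean isNamespaceDeclaration(Node attr) {
        String name = attr.getNodeName();
        return name.equals("xmlns") || name.startsWith("xmlns:");
    }

    private static boolean isText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
                || n.getNodeType() == Node.CDATA_SECTION_NODE;
    }

    // Assumption: an element without attributes, text or non-empty children
    // carries no information, so <street/> counts the same as no street.
    static boolean isEmpty(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            if (!isNamespaceDeclaration(attrs.item(i))) return false;
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && !isEmpty((Element) c)) return false;
            if (isText(c) && !c.getNodeValue().trim().isEmpty()) return false;
        }
        return true;
    }

    // Order-independent signature: qualified name, sorted attributes,
    // trimmed direct text, and sorted signatures of non-empty children.
    static String signature(Element e) {
        List<String> attrs = new ArrayList<>();
        NamedNodeMap map = e.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Node a = map.item(i);
            if (isNamespaceDeclaration(a)) continue;
            attrs.add("{" + a.getNamespaceURI() + "}" + a.getLocalName()
                    + "=" + a.getNodeValue());
        }
        Collections.sort(attrs);

        StringBuilder text = new StringBuilder();
        List<String> children = new ArrayList<>();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                Element child = (Element) c;
                if (!isEmpty(child)) children.add(signature(child));
            } else if (isText(c)) {
                text.append(c.getNodeValue());
            }
        }
        Collections.sort(children);

        return "{" + e.getNamespaceURI() + "}" + e.getLocalName()
                + attrs + "'" + text.toString().trim() + "'" + children;
    }

    static boolean equalIgnoringOrder(Element a, Element b) {
        return signature(a).equals(signature(b));
    }

    public static void main(String[] args) {
        String withEmptyStreet = "<Employee company='example' xmlns='http://example.com' debug='true'>"
                + "<Employeename>mausam</Employeename><street/></Employee>";
        String withoutStreet = "<Employee debug='true' company='example' xmlns='http://example.com'>"
                + "<Employeename>mausam</Employeename></Employee>";
        System.out.println(equalIgnoringOrder(parse(withEmptyStreet), parse(withoutStreet)));
    }
}
```

The string signature is not collision-proof for arbitrary content (values containing the delimiter characters could clash), and it is only suitable when the schema really treats sibling order as irrelevant. For a DocumentFragment the same idea would be applied to each of its child elements. If equals is implemented this way, hashCode should be derived from the same signature so the two stay consistent.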
 
