Compare XML DOM documents

asaf.lahav

Hi all,

I'm looking for an infrastructure that will enable me to compare two
XML DOM document instances.
Preferably the infrastructure will generate a map of the differences
between the two documents, maybe based on a schema or XPaths.

I would appreciate any pointers...
Thanks in advance,
 
Stefan Schulz

This problem is decidedly non-trivial. What do you want to do? Do your
documents necessarily have something in common? Do they allow only some
modifications to be applied (for example, only add / delete subtrees)?

I am not currently aware of an "out of the box" solution for the problem,
but the more specific you are, the more likely someone will be able to
point you in the right direction.

Consider:
<A>
<B foo="bar"/>
<C/>
</A>
------------------------------
<N>
<B foo="bar"/>
<D number="0"/>
</N>

What would the differences be?

* Delete root node, all children, and create new tree.
* Change root node from A to N, delete child C, create child D
* Change root node from A to N, change child C to D, add attribute "number"

Something else still?
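
To make the ambiguity concrete, here is a minimal sketch in Java of just one possible policy: match elements purely by position and report tag and attribute differences under XPath-like locations. The class name PositionalXmlDiff and the wording of the report lines are made up for illustration; only the standard javax.xml.parsers and org.w3c.dom calls are real. It ignores text nodes and namespaces, and it is not a general tree diff.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library: a purely positional diff.
final class PositionalXmlDiff {

    // Parse a string and return its root element; checked exceptions are wrapped.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Keep only element children so whitespace text nodes do not shift positions.
    static List<Element> elementChildren(Element parent) {
        List<Element> out = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                out.add((Element) nodes.item(i));
            }
        }
        return out;
    }

    // Compare two elements that are assumed to correspond, then recurse by index.
    static void walk(Element a, Element b, String path, List<String> out) {
        if (!a.getTagName().equals(b.getTagName())) {
            out.add(path + ": rename " + a.getTagName() + " -> " + b.getTagName());
        }
        NamedNodeMap attrsA = a.getAttributes();
        for (int i = 0; i < attrsA.getLength(); i++) {
            String name = attrsA.item(i).getNodeName();
            if (!b.hasAttribute(name)) {
                out.add(path + ": remove attribute " + name);
            } else if (!b.getAttribute(name).equals(attrsA.item(i).getNodeValue())) {
                out.add(path + ": change attribute " + name);
            }
        }
        NamedNodeMap attrsB = b.getAttributes();
        for (int i = 0; i < attrsB.getLength(); i++) {
            String name = attrsB.item(i).getNodeName();
            if (!a.hasAttribute(name)) {
                out.add(path + ": add attribute " + name);
            }
        }
        List<Element> kidsA = elementChildren(a);
        List<Element> kidsB = elementChildren(b);
        int common = Math.min(kidsA.size(), kidsB.size());
        for (int i = 0; i < common; i++) {
            walk(kidsA.get(i), kidsB.get(i), path + "/*[" + (i + 1) + "]", out);
        }
        for (int i = common; i < kidsA.size(); i++) {
            out.add(path + "/*[" + (i + 1) + "]: delete subtree");
        }
        for (int i = common; i < kidsB.size(); i++) {
            out.add(path + "/*[" + (i + 1) + "]: insert subtree");
        }
    }

    // Entry point: returns one human-readable line per difference found.
    static List<String> diff(String xmlA, String xmlB) {
        List<String> out = new ArrayList<>();
        walk(parse(xmlA), parse(xmlB), "/*[1]", out);
        return out;
    }

    public static void main(String[] args) {
        String before = "<A><B foo=\"bar\"/><C/></A>";
        String after = "<N><B foo=\"bar\"/><D number=\"0\"/></N>";
        diff(before, after).forEach(System.out::println);
    }
}
```

Run on the two documents above, this policy reports a rename of the root, a rename of the second child, and an added attribute, which is the third reading in the list. A policy that matched children by tag name instead would give the second reading, so the choice of matching rule decides what "the differences" are.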
 
EricF

Google for XML Diff
 
mrandywarner

I did some research on this a while back for a project I was working on
and came across a graduate student who was working on this as his
thesis. The more general the solution you need, the harder the problem
becomes: the grad student had a proof that the variant in which a
subtree from one document may appear in the second document, possibly
altered and possibly multiple times, is NP-complete. If you have a
particular schema that both docs are known to be valid against, I think
you're most likely to get the best solution by writing this one
yourself, where you can optimize when appropriate. The XML I was
parsing was actually custom built to serialize a Java object in a
database, so I ended up finding it easier to simply build the objects
and write code that would find the differences between the objects,
since that is a more concrete problem space. But that also depends on
what you're using it for. If your diff is going to be performed a lot
in a performance-intensive environment, parsing and constructing the
objects might end up being slower. Everything's a trade-off.
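
A rough sketch of the object-based approach, for illustration only. The Account class, its two fields, and the assumed <account owner="..." balance="..."/> layout are all hypothetical, since the actual serialized object is not shown here; only the DOM parsing calls are standard Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical domain class standing in for whatever object the XML serializes.
final class Account {
    final String owner;
    final String balance;

    Account(String owner, String balance) {
        this.owner = owner;
        this.balance = balance;
    }

    // Assumed layout: <account owner="..." balance="..."/>
    // getAttribute returns "" for a missing attribute, so fields are never null.
    static Account fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new Account(root.getAttribute("owner"), root.getAttribute("balance"));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable account document", e);
        }
    }

    // Field-by-field comparison: one line per field whose value changed.
    List<String> differencesFrom(Account other) {
        List<String> out = new ArrayList<>();
        if (!owner.equals(other.owner)) {
            out.add("owner: " + owner + " -> " + other.owner);
        }
        if (!balance.equals(other.balance)) {
            out.add("balance: " + balance + " -> " + other.balance);
        }
        return out;
    }

    public static void main(String[] args) {
        Account before = fromXml("<account owner=\"ann\" balance=\"10\"/>");
        Account after = fromXml("<account owner=\"ann\" balance=\"25\"/>");
        before.differencesFrom(after).forEach(System.out::println);
    }
}
```

Because the comparison knows which fields exist and what they mean, there is no question of how to match nodes. The cost is that every document has to be parsed and turned into an object first.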
 
