Comparing Subtrees in XSLT

Ryan Nordman

Hey XML gurus,

Is there a way to write a template so that it will compare two XML
subtrees wholesale? I can't quite get my head around this problem.

Let's say I have two XML trees. Both have the same structure, but one
of them has a few pieces of text that differ from the other.

So my basic thoughts here are to first create a key that in essence
makes a catalog of all the elements in one of the subtrees, call it
tree A, with the key being simply the name of the nodes. Then I call
a template on the root node of tree B, and basically, for each node, I
generate the key so I can pull up the matching node in tree A and
compare them somehow.

By compare though, for each node in tree A I actually want to compare
all of its children with all of the children of the matching node in
tree B so that if they're all equal, I can ignore them. If a node in A
has some child that doesn't match its equivalent child in B, then I
need to write that node. Otherwise, I ignore it and it is not in my
XSLT output.

Does this make sense? Am I thinking about this problem the wrong
way? Any kind of discussion or pointing me in the right direction
would really help.
 
Joe Kesselman

Ryan said:
Is there a way to write a template so that it will compare two XML
subtrees wholesale? I can't quite get my head around this problem.

Should be possible by doing parallel recursive tree walks. Think about
what comparison means:

The node has the same type.

If it's an element or attribute, it has the same local name and
namespace. If it's a PI, it has the same name. (PIs are not namespaced.)

If it's an attribute or text node or comment or PI, it has the same value.

If it's an element:

1) It has the same attributes. Remember that order of attributes is not
meaningful, so you probably want to sort each set before scanning
through it. Each attribute node pair can then be processed as above.
Obviously if the number of attributes disagrees that's also a difference.

2) It has the same namespaces in scope. (You may actually want to skip
or limit this test depending on what you're doing.)

3) It has the same children. Children are ordered, so you can scan them
first to last. Each child can be compared as above, recursively so its
content also gets scanned.

So, yes, this seems doable.
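
The checklist above can be sketched outside XSLT with the standard W3C
DOM that ships with the JDK. This is only a minimal sketch under some
assumptions: the class and method names (SubtreeCompare, sameTree, and
so on) are made up for illustration, attributes are matched by lookup
rather than by sorting, the namespaces-in-scope test is skipped, and
whitespace-only text nodes count as real children.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SubtreeCompare {

    /** Parses an XML string with a namespace-aware parser. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** True when the two subtrees match node for node. */
    static boolean sameTree(Node a, Node b) {
        if (a.getNodeType() != b.getNodeType()) return false;
        switch (a.getNodeType()) {
            case Node.ELEMENT_NODE:
                return sameName(a, b)
                        && sameAttributes((Element) a, (Element) b)
                        && sameChildren(a, b);
            case Node.PROCESSING_INSTRUCTION_NODE:
                // PIs are not namespaced: compare target and data.
                return a.getNodeName().equals(b.getNodeName())
                        && Objects.equals(a.getNodeValue(), b.getNodeValue());
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
            case Node.COMMENT_NODE:
                return Objects.equals(a.getNodeValue(), b.getNodeValue());
            default:
                // Document nodes and the like: just walk the children.
                return sameChildren(a, b);
        }
    }

    /** Same local name and namespace URI (either may be null). */
    static boolean sameName(Node a, Node b) {
        return Objects.equals(a.getLocalName(), b.getLocalName())
                && Objects.equals(a.getNamespaceURI(), b.getNamespaceURI());
    }

    /** The DOM exposes xmlns declarations as attributes; skip those. */
    static boolean isNamespaceDecl(Node attr) {
        String n = attr.getNodeName();
        return n.equals("xmlns") || n.startsWith("xmlns:");
    }

    static int countAttributes(NamedNodeMap m) {
        int count = 0;
        for (int i = 0; i < m.getLength(); i++) {
            if (!isNamespaceDecl(m.item(i))) count++;
        }
        return count;
    }

    /** Order-independent: each attribute of a is looked up by name in b. */
    static boolean sameAttributes(Element a, Element b) {
        NamedNodeMap am = a.getAttributes(), bm = b.getAttributes();
        if (countAttributes(am) != countAttributes(bm)) return false;
        for (int i = 0; i < am.getLength(); i++) {
            Node x = am.item(i);
            if (isNamespaceDecl(x)) continue;
            Node y = bm.getNamedItemNS(x.getNamespaceURI(), x.getLocalName());
            if (y == null || !x.getNodeValue().equals(y.getNodeValue())) return false;
        }
        return true;
    }

    /** Children are ordered, so walk both sibling lists in step. */
    static boolean sameChildren(Node a, Node b) {
        Node ca = a.getFirstChild(), cb = b.getFirstChild();
        while (ca != null && cb != null) {
            if (!sameTree(ca, cb)) return false;
            ca = ca.getNextSibling();
            cb = cb.getNextSibling();
        }
        return ca == null && cb == null; // one side had extra children
    }

    public static void main(String[] args) throws Exception {
        Element a = parse("<book id='1' lang='en'><title>XSLT</title></book>").getDocumentElement();
        Element b = parse("<book lang='en' id='1'><title>XSLT</title></book>").getDocumentElement();
        Element c = parse("<book id='1' lang='en'><title>XPath</title></book>").getDocumentElement();
        System.out.println(sameTree(a, b)); // true: attribute order is ignored
        System.out.println(sameTree(a, c)); // false: the title text differs
    }
}
```

To get the "write only the nodes that differ" behavior from the
original question, one option would be to call sameTree on each pair of
corresponding children and emit only those for which it returns false.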

Though, actually, for what you've requested (finding corresponding
points), you might be able to shortcut by just taking the source node,
deriving an XPath for it relative to its subtree root, then re-applying
that XPath to the other subtree's root... You'd need an extension
function to apply a dynamically calculated XPath from within the
stylesheet, but many (not all) processors do support the EXSLT
dyn:evaluate() function for this.
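
The same shortcut can be sketched outside a stylesheet with the JDK's
own XPath API, which needs no extension function because the path is
just a string at run time. This is a rough sketch under assumptions:
the names RelativePath, pathFrom and find are hypothetical, only
element nodes are handled, and each step is purely positional ("*[2]"
means "second element child"), so it finds the node at the same
position in the other tree without checking that its name matches.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RelativePath {

    /** Parses an XML string with a namespace-aware parser. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Builds a positional path such as "*[2]/*[1]" leading from root to target. */
    static String pathFrom(Element root, Element target) {
        StringBuilder path = new StringBuilder();
        Node n = target;
        while (n != null && !n.isSameNode(root)) {
            // Position of n among its element siblings (1-based, as in XPath).
            int pos = 1;
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE) pos++;
            }
            String step = "*[" + pos + "]";
            path.insert(0, path.length() == 0 ? step : step + "/");
            n = n.getParentNode();
        }
        if (n == null) {
            throw new IllegalArgumentException("target is not inside root");
        }
        return path.length() == 0 ? "." : path.toString();
    }

    /** Re-applies the path to another subtree root; returns null if nothing is there. */
    static Node find(Element otherRoot, String path) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return (Node) xp.evaluate(path, otherRoot, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Element a = parse("<order><item><sku>A1</sku><qty>2</qty></item>"
                + "<item><sku>B7</sku><qty>5</qty></item></order>").getDocumentElement();
        Element b = parse("<order><item><sku>A1</sku><qty>2</qty></item>"
                + "<item><sku>B7</sku><qty>6</qty></item></order>").getDocumentElement();

        // Pick the second <qty> in tree A and locate its counterpart in tree B.
        Element target = (Element) a.getElementsByTagName("qty").item(1);
        String path = pathFrom(a, target);
        Node match = find(b, path);

        System.out.println(path);                     // *[2]/*[2]
        System.out.println(target.getTextContent());  // 5
        System.out.println(match.getTextContent());   // 6
    }
}
```

Purely positional steps keep the path independent of namespace
prefixes, at the cost of silently pairing up differently named elements
when the two trees do not really share a structure.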

Having said all this: I suspect I'd solve this problem with a language
other than XSLT, if I were the one coding it.
 
Ryan Nordman

Thanks for your input. I took your advice and just used Java along
with dom4j's API to get the job done. I haven't dealt with the
attributes yet, or with the possibility of repeating sections that
have different numbers of nodes between the two subtrees. Anyway, I
think I can extend what I've got.

Thanks again,
-Ryan
 
Joe Kesselman

Ryan said:
Thanks for your input. I took your advice and just used Java along
with dom4j's API to get the job done.

Personally I have never understood the appeal of DOM4J. It's a bit more
Java-flavored, perhaps, maybe... but that appears to be its sole real
advantage; all the other things claimed for it have been supported with
fairly bogus examples.

Yes, as one of the authors of the W3C DOM spec, I'm biased... but I
would still recommend sticking with the DOM rather than DOM4J unless
there is something in the latter which is really a make-or-break for
you, simply because the DOM is a far more portable solution.

And if you have to use something like DOM4J, I'd recommend looking at
one of the competing systems derived from it. DOM4J's biggest weak point
is that it's concrete classes rather than interfaces, which leaves no
room to plug in a version which is more efficient for your particular
needs without completely ripping it out and replacing it. That may not
matter on toy applications; it definitely matters in the real world.
 
