XML traversal in level-order (breadth-first) with XSLT

Christian Rühl

Hi all!

I need to traverse an XML file in level-order (breadth-first) in order
to number its nodes. The XML structure looks a little like this:

<Component>
<Component>
<Component/>
</Component>
<Component/>
</Component>

I want to append attributes to each node carrying its level and its
occurrence (its position in level order) as integers.
Is there a way to do this using XSLT?

The result should then look like:

<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="4"/>
</Component>
<Component level="2" number="3"/>
</Component>

After googling a little I found this:
<http://www.tkachenko.com/blog/archives/000268.html>

What do you think? What is a simple way to start here?

Thanks in advance!

//Chris
 
Christian Rühl

Okay, just found out that setting each node's level ain't that tough.
This is done with:

<xsl:variable name="mncl"><xsl:value-of select="count(ancestor::*)"/></xsl:variable>
<xsl:attribute name="level"><xsl:value-of select="$mncl"/></xsl:attribute>

But how can I achieve a correct level-ordered numbering of my
"Component" nodes?
 
Pavel Lepin

Christian Rühl said:
I want to append attributes to each node carrying its
level and its occurrence as integers.

<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="4"/>
</Component>
<Component level="2" number="3"/>
</Component>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()[not(self::*)]">
<xsl:copy/>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:attribute name="level">
<xsl:call-template name="calc-level"/>
</xsl:attribute>
<xsl:attribute name="number">
<xsl:call-template name="calc-number"/>
</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template name="calc-level">
<xsl:value-of select="1+count(ancestor::*)"/>
</xsl:template>
<xsl:template name="calc-number">
<xsl:variable name="level">
<xsl:call-template name="calc-level"/>
</xsl:variable>
<xsl:value-of
select=
"
1
+count(//*[1+count(ancestor::*) &lt; $level])
+count(preceding::*[1+count(ancestor::*)=$level])
"/>
</xsl:template>
</xsl:stylesheet>

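Note that while this works, the calc-number named template is
computationally expensive: for every element it rescans the whole
document (//* and preceding::*), so the work grows roughly with the
square of the number of elements. Using a general-purpose language
together with a DOM API might be a much better solution for
large documents.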
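As a point of comparison for the DOM route mentioned above, here is a minimal Java sketch of level-order numbering with the standard org.w3c.dom API. The class and method names (BfsNumbering, numberAndSummarize, summary) are hypothetical; the attribute names level and number come from the desired output in the question. A FIFO queue hands out numbers parent-before-child and level by level, and each element is visited once, so the cost grows linearly with the document instead of rescanning it per element.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Level-order (breadth-first) numbering of every element via the DOM API.
class BfsNumbering {

    // Parse an XML string into a DOM tree; checked exceptions are rethrown unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Walk the tree with a FIFO queue: an element is dequeued (and numbered)
    // before any of its children, and a whole level before the next one.
    static void number(Document doc) {
        Deque<Element> queue = new ArrayDeque<>();
        Element root = doc.getDocumentElement();
        root.setAttribute("level", "1");
        queue.add(root);
        int counter = 0;
        while (!queue.isEmpty()) {
            Element current = queue.poll();
            current.setAttribute("number", Integer.toString(++counter));
            // Every element child sits one level below its parent; text nodes are skipped.
            int childLevel = Integer.parseInt(current.getAttribute("level")) + 1;
            for (Node n = current.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    Element child = (Element) n;
                    child.setAttribute("level", Integer.toString(childLevel));
                    queue.add(child);
                }
            }
        }
    }

    // "level/number" for each element, listed in document order.
    static String summary(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getAttribute("level")).append('/').append(e.getAttribute("number"));
        }
        return sb.toString();
    }

    static String numberAndSummarize(String xml) {
        Document doc = parse(xml);
        number(doc);
        return summary(doc);
    }

    public static void main(String[] args) {
        // The sample from the question: prints "1/1 2/2 3/4 2/3".
        System.out.println(numberAndSummarize(
                "<Component><Component><Component/></Component><Component/></Component>"));
    }
}
```

Running main on the question's sample prints 1/1 2/2 3/4 2/3, i.e. the level/number pairs of the desired output, listed in document order.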
 
Christian Rühl

Thank you, Pavel!
Note that while this works, the calc-number named template is
computationally expensive. Using a general-purpose language
together with DOM API might be a much better solution for
large documents.

I thought of that. But my XML files aren't that large, so an XSLT
solution shouldn't have a noticeable performance impact here. The thing is,
later I need to traverse the tree in pre-order, but of course then as
a DOM. So I was looking for a way to avoid two different DOM
traversals, one of which only numbers the components.
 
Christian Rühl

I played around a little and figured out that this part always
returns 0:

<xsl:value-of select="count(//*[1+count(ancestor::*) &lt; $mncl])"/>

But I don't see what's wrong with it. It looks okay to me, since
it counts all direct ancestors with a smaller level, doesn't it?
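One thing to check in that expression: //* is an absolute path, so the predicate filters every element in the whole document that has a smaller level, not only the ancestors of the current node. Also, $mncl as defined earlier is count(ancestor::*) without the 1+, so the two sides of the comparison use level numbers that are off by one. The hedged Java sketch below (the class name ShallowerVsAncestors is hypothetical) evaluates both readings with the JDK's javax.xml.xpath API on the question's sample, using the innermost Component as context and the literal 3 in place of the variable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Compares "all shallower elements in the document" with "my own ancestors".
class ShallowerVsAncestors {

    // Evaluate a numeric XPath expression with the node at contextPath as context node.
    static int count(String xml, String contextPath, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            Double result = (Double) xpath.evaluate(expr, context, XPathConstants.NUMBER);
            return result.intValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Component><Component><Component/></Component><Component/></Component>";
        String innermost = "/Component/Component/Component";
        // Every element whose 1-based level is below 3: the root and both level-2 children.
        System.out.println(count(xml, innermost, "count(//*[1+count(ancestor::*) < 3])"));
        // Only the ancestors of the innermost element.
        System.out.println(count(xml, innermost, "count(ancestor::*)"));
    }
}
```

For the innermost Component this yields 3 for the //* form (the root and both level-2 children) but 2 for its actual ancestors.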
 
Pavel Lepin

Please quote what you're replying to.

Christian Rühl said:
I played around a little and figured out, that this part
always returns 0:

<xsl:value-of select="count(//*[1+count(ancestor::*) &lt;
$mncl])"/>

Works fine for me, using a modified version of your sample
document, the transformation that I posted originally and
Saxon-8B:

pavel@debian:~/dev/xslt$ saxon -t comp.xml comp.xsl
Saxon 8.8J from Saxonica
Java version 1.5.0
Warning: at xsl:stylesheet on line 2 of
file:///var/www/dev/xslt/comp.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 778 milliseconds
Processing file:/var/www/dev/xslt/comp.xml
Building tree for file:/var/www/dev/xslt/comp.xml using
class net.sf.saxon.tinytree.TinyBuilder
Tree built in 11 milliseconds
Tree size: 45 nodes, 0 characters, 0 attributes
<?xml version="1.0" encoding="UTF-8"?>
<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="10"/>
<Component level="3" number="11"/>
<Component level="3" number="12"/>
</Component>
<Component level="2" number="3"/>
<Component level="2" number="4">
<Component level="3" number="13"/>
<Component level="3" number="14"/>
</Component>
<Component level="2" number="5"/>
<Component level="2" number="6">
<Component level="3" number="15"/>
<Component level="3" number="16"/>
<Component level="3" number="17"/>
</Component>
<Component level="2" number="7"/>
<Component level="2" number="8">
<Component level="3" number="18"/>
<Component level="3" number="19"/>
</Component>
<Component level="2" number="9"/>
</Component>
Execution time: 241 milliseconds
Memory used: 15732736
NamePool contents: 19 entries in 17 chains. 7 prefixes, 8 URIs
pavel@debian:~/dev/xslt$

Using libxslt or Xalan-C++ yields the same results. Please
post a minimal example that reproducibly demonstrates the
problem, and mention the transformation engine you're
using. Might be an XPath precedence problem, now that I
think about it. If that is the case, it's likely a problem
with your XSLT processor, although it is marginally
possible that three major engines would all get it wrong.
 
Christian Rühl

Using libxslt or Xalan-C++ yields the same results. Please
post a minimal example that reproducibly demonstrates the
problem, and mention the transformation engine you're
using. Might be an XPath precedence problem, now that I
think about it. If that is the case, it's likely a problem
with your XSLT processor, although it is marginally
possible that three major engines would all get it wrong.

Hm, then maybe I'm doing something wrong.

I'm transforming with javax.xml.transform.*; in Eclipse. Here's the
code I'm using:

------------------------------------------------------------------------------------
// set target location and xslt location
m_result = new StreamResult( new FileOutputStream(m_prodTreeFile) );
m_xsltSource = new StreamSource( new FileInputStream(m_prodTreeXslt) );

// create transformer factory and transformer instance
m_factory = TransformerFactory.newInstance();
m_transformer = m_factory.newTransformer(m_xsltSource);

// set parameters (files to copy)
m_transformer.setParameter("tree", m_prodTree);
m_transformer.setParameter("archive", m_archiveFile);

// copy file contents and fill result target
m_transformer.transform(new StreamSource(), m_result);

// clear both parameters
m_transformer.clearParameters();
------------------------------------------------------------------------------------

I don't think that the problem is due to that code. Maybe I'm calling
my templates and/or parameters wrong.

Here's that part of the code (I modified the calculations because there
are 2 more node levels on top that I don't want to count):

------------------------------------------------------------------------------------
<xsl:template name="calculate-level">
<xsl:value-of select="count(ancestor::*)-1"/>
</xsl:template>

<xsl:template name="calculate-number">
<xsl:variable name="level">
<xsl:call-template name="calculate-level"/>
</xsl:variable>
<xsl:value-of select="1 + count(//*[count(ancestor::*)-1 &lt;
$level]) + count(preceding::*[count(ancestor::*)-1 = $level])"/>
</xsl:template>

<xsl:template name="top-level-component" match="/">
<xsl:for-each select="$treeDoc//PRODUCT_TREE">
<xsl:if test="Component">
<Component>
<xsl:attribute name="mncl"><xsl:call-template
name="calculate-level"/></xsl:attribute>
<xsl:attribute name="mncn"><xsl:call-template
name="calculate-number"/></xsl:attribute>
<xsl:if test="Component">
<xsl:call-template name="component"/>
</xsl:if>
</Component>
</xsl:if>
</xsl:for-each>
</xsl:template>

<xsl:template match="/">
<xsl:call-template name="top-level-component"/>
</xsl:template>
------------------------------------------------------------------------------------

My main problem here might be that I can't put "$treeDoc" in the
match attribute of a template. Therefore I have one template for the
top-level component, which calls an analogous template "component" that then
handles all following nodes. Maybe you can give me a hint here, because I
really doubt that this is a good way to go.

To show you the results I'm getting, my input file currently looks
like this:

------------------------------------------------------------------------------------
<top>
<product>
<Component>
<Component>
<Component>
<Component/>
</Component>
<Component/>
</Component>
<Component>
<Component/>
</Component>
</Component>
</product>
</top>
------------------------------------------------------------------------------------

And the result looks like:

------------------------------------------------------------------------------------
<top>
<product>
<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="2">
<Component level="4" number="4"/>
</Component>
<Component level="3" number="3"/>
</Component>
<Component level="2" number="3">
<Component level="3" number="5"/>
</Component>
</Component>
</product>
</top>
------------------------------------------------------------------------------------
 
Pavel Lepin

Once again, this post contains a good deal of critique. If
you find that somehow offensive, please just ignore it.

Christian Rühl said:
I'm transforming with javax.xml.transform.*; in Eclipse.

I strongly advise that you get a standalone XSLT processor
somewhere and use it for debugging your transformations.

I'm not a Java programmer by trade, so correct me if I'm
wrong, but I believe it's just a generic API to various
transformation engines.
// set target location and xslt location
m_result = new StreamResult( new
FileOutputStream(m_prodTreeFile) );
m_xsltSource = new StreamSource( new
FileInputStream(m_prodTreeXslt) );

// create transformer factory and transformer instance
m_factory = TransformerFactory.newInstance();
m_transformer = m_factory.newTransformer(m_xsltSource);

Once again, correct me if I'm wrong, but this gives no
indication what engine you're actually using. Could be
Saxon, Xalan-J or any other transformation engine you
happen to have registered with your factory.
// set parameters (files to copy)
m_transformer.setParameter("tree", m_prodTree);
m_transformer.setParameter("archive", m_archiveFile);

Are those file names or parsed XML documents? I believe
passing anything but integral data to your stylesheet is
ill-specified (mmm... if not outright disallowed - can't be
bothered to look it up right now), so I wouldn't do that if
there was any way around it.
<xsl:template name="calculate-level">
<xsl:value-of select="count(ancestor::*)-1"/>
</xsl:template>

Bad idea. Instead of tinkering with the total count, modify
the XPath expression so that it returns a nodeset
consisting solely of the nodes that you actually want to
count:

1+count(ancestor::Component)
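A small illustration of why naming the element in the axis step is sturdier than subtracting an offset: the hedged sketch below (the class name AncestorCount is hypothetical) evaluates both level formulas with javax.xml.xpath for the innermost Component, once in a cut-down version of the posted top/product layout and once with a made-up extra "group" wrapper. Only 1+count(ancestor::Component) is unaffected by the wrapper.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Compares two ways of computing a Component's level.
class AncestorCount {

    // Evaluate a numeric expression with the first Component that has no
    // Component children as the context node.
    static int levelOfInnermost(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node innermost = (Node) xpath.evaluate(
                    "//Component[not(Component)]", doc, XPathConstants.NODE);
            return ((Double) xpath.evaluate(expr, innermost, XPathConstants.NUMBER)).intValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String posted = "<top><product><Component><Component/></Component></product></top>";
        // Same tree with one extra (hypothetical) wrapper element above the Components.
        String wrapped = "<top><group><product><Component><Component/></Component></product></group></top>";
        for (String xml : new String[] {posted, wrapped}) {
            // Offset-based formula vs. formula that counts only Component ancestors.
            System.out.println(levelOfInnermost(xml, "count(ancestor::*)-1")
                    + " vs " + levelOfInnermost(xml, "1+count(ancestor::Component)"));
        }
    }
}
```

In the posted layout both formulas give 2; with the extra wrapper the offset version drifts to 3 while the Component-only version stays at 2.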
<xsl:template name="calculate-number">
<xsl:variable name="level">
<xsl:call-template name="calculate-level"/>
</xsl:variable>
<xsl:value-of select="1 +
count(//*[count(ancestor::*)-1 &lt;
$level]) + count(preceding::*[count(ancestor::*)-1 =
$level])"/> </xsl:template>

Same here.
<xsl:template name="top-level-component" match="/">

Generally bad idea, unless you have a very good reason to do
this.
<xsl:for-each select="$treeDoc//PRODUCT_TREE">

for-each? $treeDoc?
<xsl:if test="Component">
<Component>
<xsl:attribute name="mncl"><xsl:call-template
name="calculate-level"/></xsl:attribute>
<xsl:attribute name="mncn"><xsl:call-template
name="calculate-number"/></xsl:attribute>

Wrong. The current node is in the nodeset resulting from
evaluating the XPath expression in for-each. <xsl:if> does
not affect the current node. So you're creating a Component
element, then invoking the named templates meant to calculate
the level and number in the context of the PRODUCT_TREE element.
<xsl:if test="Component">
<xsl:call-template name="component"/>
</xsl:if>

You haven't defined a named template called "component", though...
</Component>
</xsl:if>
</xsl:for-each>
</xsl:template>

<xsl:template match="/">
<xsl:call-template name="top-level-component"/>
</xsl:template>

This whole idea is wrong. That's what identity
transformation is for (google it, it's THE ultimate
essential technique in XSLT).
My main problem here might be, that I can't put "$treeDoc"
in the match-tag of a template.

*shrug* What is $treeDoc anyway? You're using it and talking
about it, but you haven't defined it anywhere.
Therefore I have one template for the top-level-component
which calls an analog template "component" that then
handles all following nodes. Maybe you can give me a hint
here, for I really doubt that this is a good way to go.

Why do you want to process top Component element
differently, anyway?
And the result looks like:

<top>
<product>
<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="2">
<Component level="4" number="4"/>
</Component>
<Component level="3" number="3"/>
</Component>
<Component level="2" number="3">
<Component level="3" number="5"/>
</Component>
</Component>
</product>
</top>

Well, I cannot run your version of it. But everything works
just fine in my version:

pavel@debian:~/dev/xslt$ xmllint comp2.xml
<?xml version="1.0"?>
<top>
<product>
<Component>
<Component>
<Component>
<Component/>
</Component>
<Component/>
</Component>
<Component>
<Component/>
</Component>
</Component>
</product>
</top>
pavel@debian:~/dev/xslt$ xmllint comp2.xsl
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Component">
<xsl:copy>
<xsl:attribute name="level">
<xsl:call-template name="calc-level"/>
</xsl:attribute>
<xsl:attribute name="number">
<xsl:call-template name="calc-number"/>
</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template name="calc-level">
<xsl:value-of select="1+count(ancestor::Component)"/>
</xsl:template>
<xsl:template name="calc-number">
<xsl:variable name="level">
<xsl:call-template name="calc-level"/>
</xsl:variable>
<xsl:value-of select=" 1 +count
( //Component
[1+count(ancestor::Component) &lt; $level] )
+count ( preceding::Component
[1+count(ancestor::Component)=$level] ) "/>
</xsl:template>
</xsl:stylesheet>
pavel@debian:~/dev/xslt$ saxon -t comp2.xml comp2.xsl
Saxon 8.8J from Saxonica
Java version 1.5.0
Warning: at xsl:stylesheet on line 2 of
file:///var/www/dev/xslt/comp2.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 754 milliseconds
Processing file:/var/www/dev/xslt/comp2.xml
Building tree for file:/var/www/dev/xslt/comp2.xml using
class net.sf.saxon.tinytree.TinyBuilder
Tree built in 10 milliseconds
Tree size: 25 nodes, 0 characters, 0 attributes
<?xml version="1.0" encoding="UTF-8"?>
<top>
<product>
<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="4">
<Component level="4" number="7"/>
</Component>
<Component level="3" number="5"/>
</Component>
<Component level="2" number="3">
<Component level="3" number="6"/>
</Component>
</Component>
</product>
</top>
Execution time: 145 milliseconds
Memory used: 14512128
NamePool contents: 20 entries in 19 chains. 7 prefixes, 8 URIs
pavel@debian:~/dev/xslt$
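For reproducing this from Java without installing Saxon, here is a hedged sketch that runs the same comp2.xsl (with its xsl:output element spelled out) on the same comp2.xml through javax.xml.transform. TransformerFactory.newInstance() uses whichever engine is configured; on a stock JDK that is the JDK's own Xalan-derived processor, not Saxon. The class name RunComp2 is hypothetical. Writing to a DOMResult lets the level/number pairs be read back without depending on how a serializer lays out attributes.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Runs comp2.xsl (identity rule + Component rule) with whatever JAXP engine is configured.
class RunComp2 {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' indent='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='Component'>"
      + "<xsl:copy>"
      + "<xsl:attribute name='level'><xsl:call-template name='calc-level'/></xsl:attribute>"
      + "<xsl:attribute name='number'><xsl:call-template name='calc-number'/></xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template name='calc-level'>"
      + "<xsl:value-of select='1+count(ancestor::Component)'/>"
      + "</xsl:template>"
      + "<xsl:template name='calc-number'>"
      + "<xsl:variable name='level'><xsl:call-template name='calc-level'/></xsl:variable>"
      + "<xsl:value-of select='1 + count(//Component[1+count(ancestor::Component) &lt; $level])"
      + " + count(preceding::Component[1+count(ancestor::Component) = $level])'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transform into a DOM, then report "level/number" for each Component in document order.
    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            DOMResult result = new DOMResult();
            t.transform(new StreamSource(new StringReader(xml)), result);
            NodeList comps = ((Document) result.getNode()).getElementsByTagName("Component");
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < comps.getLength(); i++) {
                Element e = (Element) comps.item(i);
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getAttribute("level")).append('/').append(e.getAttribute("number"));
            }
            return sb.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // comp2.xml from the transcript above, without whitespace.
        String xml = "<top><product><Component><Component><Component><Component/></Component>"
                + "<Component/></Component><Component><Component/></Component></Component>"
                + "</product></top>";
        System.out.println(run(xml));
    }
}
```

With a conforming XSLT 1.0 engine, run() returns 1/1 2/2 3/4 4/7 3/5 2/3 3/6, the same pairs as in the Saxon output above.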
 
Christian Rühl

Once again, this post contains a good deal of critique. If
you find that somehow offensive, please just ignore it.

Don't worry. I'm glad you're showing me my mistakes and helping me
make it better.
I strongly advise that you get a standalone XSLT processor
somewhere and use it for debugging your transformations.

I'm not a Java programmer by trade, so correct me if I'm
wrong, but I believe it's just a generic API to various
transformation engines.

You are right:
"[...] This package defines the generic APIs for processing
transformation instructions, and performing a transformation from
source to result. These interfaces have no dependencies on SAX or the
DOM standard, and try to make as few assumptions as possible about the
details of the source and result of a transformation. It achieves this
by defining Source and Result interfaces. [...]"
<http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/transform/package-summary.html>
Once again, correct me if I'm wrong, but this gives no
indication what engine you're actually using. Could be
Saxon, Xalan-J or any other transformation engine you
happen to have registered with your factory.


Are those file names or parsed XML documents? I believe
passing anything but integral data to your stylesheet is
ill-specified (mmm... if not outright disallowed - can't be
bothered to look it up right now), so I wouldn't do that if
there was any way around it.

These parameters carry the full paths of the files I want to work
with. The second one doesn't matter here. The first one (String
m_prodTree) holds the path of my input product tree.
1+count(ancestor::Component)

Okay, that's what I did. Thanks, works fine!
<xsl:template name="calculate-number">
<xsl:variable name="level">
<xsl:call-template name="calculate-level"/>
</xsl:variable>
<xsl:value-of select="1 +
count(//*[count(ancestor::*)-1 &lt;
$level]) + count(preceding::*[count(ancestor::*)-1 =
$level])"/> </xsl:template>

Same here.
<xsl:template name="top-level-component" match="/">

Generally bad idea, unless you have a very good reason to do
this.

Why is this bad and what would be a good reason? I can't follow you
here. How would you do this?
for-each? $treeDoc?

Yeah, I know that's a bad one. But I didn't find a better way... I've
been searching for a couple of days now... :-(
And as I said, I can't put "$treeDoc" in the match attribute of a template.
Wrong. The current node is in the nodeset resulting from
evaluating the XPath expression in for-each. <xsl:if> does
not affect the current node. So you're creating a Component
element, then invoke the named templates meant to calculate
the level and number in context of PRODUCT_TREE element.

Hm, that makes sense.
Oops, and sorry: I forgot to add "/Component" to the select attribute. So
the whole thing actually looks like <xsl:for-each select="$treeDoc//
You haven't defined a component named template, though...

Oh, I did. I just didn't post it here, because it looks exactly like
the "top-level-component" template, except for the <xsl:for-each
select="Component"> (here I got rid of the "$treeDoc" part).
This whole idea is wrong. That's what identity
transformation is for (google it, it's THE ultimate
essential technique in XSLT).

Sorry, but I don't really know how to work with that. I think Joe
Kesselman posted a link to it last time. Looks "good" to me, but I
have no idea how to work with that.
*shrug* What is $treeDoc anyway? You seem to be using it,
you're talking about, but haven't defined it anywhere.

That's a DOM of the input file:
Why do you want to process top Component element
differently, anyway?

Actually I don't; I just didn't find a "workaround", as I said above.
The problem is that I need to do this "level-order component
numbering" within another transformation. That's why I have the second
(String m_archiveFile) parameter. If you want, I can upload my
stylesheet and my input files for you, so that you can run
my version.

And btw: what would be a good stand-alone XSLT processor? I'm
currently editing my files with Altova XMLSpy 2006, but I'm not quite
sure if there's an XSLT processor integrated. Haven't tried yet.
 
Christian Rühl

Might be an XPath precedence problem, now that I
think about it. If that is the case, it's likely a problem
with your XSLT processor, although it is marginally
possible that three major engines would all get it wrong.

Just tried it at home with a clean transformer class and your solution
works fine! So there's neither a problem with the XSLT processor nor
with your stylesheet. Both are working just fine!

What I made different now is to initialize the transformer directly
with the source file instead of giving a path-parameter to the
stylesheet. In Java code this section looks like this:

NOW:
m_transformer.transform(new StreamSource(m_prodTree), m_result); // predefined source

BEFORE:
m_transformer.transform(new StreamSource(), m_result); // clean source

I could go on that way, but I want to learn what went wrong in my
parameter solution. So I hope to hear more from you tomorrow. Have a
good one!
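A likely reading of that fix, offered as an assumption rather than a diagnosis: new StreamSource() with no file, stream or system ID gives the transformer no real input document (what exactly an engine does with it varies), so templates such as match="/" and paths such as //Component have nothing useful to work on. The sketch below (class and method names are hypothetical) passes the product tree as the actual source, closes the output stream explicitly, and reports which TransformerFactory implementation is in use, which also answers the earlier question of which engine is doing the work. The probe stylesheet simply counts Component elements.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformFiles {

    // Class name of the engine behind TransformerFactory.newInstance().
    static String engineName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Apply xslFile to inputFile and write outputFile. The input document is the
    // transformation source, so match="/" and //Component operate on its nodes.
    static void transform(File xslFile, File inputFile, File outputFile) {
        try (OutputStream out = Files.newOutputStream(outputFile.toPath())) {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(xslFile));
            transformer.transform(new StreamSource(inputFile), new StreamResult(out));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    // Demo helper: write both documents to a temporary directory, transform, read back.
    static String transformViaFiles(String xsl, String xml) {
        try {
            Path dir = Files.createTempDirectory("xslt-demo");
            Path xslPath = Files.write(dir.resolve("style.xsl"), xsl.getBytes(StandardCharsets.UTF_8));
            Path inPath = Files.write(dir.resolve("in.xml"), xml.getBytes(StandardCharsets.UTF_8));
            Path outPath = dir.resolve("out.xml");
            transform(xslPath.toFile(), inPath.toFile(), outPath.toFile());
            return new String(Files.readAllBytes(outPath), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/'><count><xsl:value-of select='count(//Component)'/></count>"
                + "</xsl:template></xsl:stylesheet>";
        String xml = "<Component><Component><Component/></Component><Component/></Component>";
        System.out.println(engineName());
        System.out.println(transformViaFiles(xsl, xml));
    }
}
```

On the question's sample the result document contains <count>4</count>.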
 
Christian Rühl

NOW:
m_transformer.transform(new StreamSource(m_prodTree), m_result); // predefined source

BEFORE:
m_transformer.transform(new StreamSource(), m_result); // clean source

I could go on that way, but I want to learn what went wrong in my
parameter solution. So I hope to hear more from you tomorrow. Have a
good one!

Good morning everybody!

I started a new stylesheet from scratch using the following template:

<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

I added some other templates, e.g. <xsl:template match="Component">,
which handle my nodes. Even the component numbering works absolutely
fine now.

There's only one problem: How can I copy the nodes of my second file
now?

While my product-tree file looks like shown before:

<top>
<product_tree>
<component>
<component/>
</component>
</product_tree>
</top>

... my archive file looks like this (common-, glue- and mtds-nodes of
unbounded occurrence - but no child nodes):

<top>
<archive>
<common name="bla" path="C:/file_1_of_3.file"/>
<glue name="bla" path="C:/file_1_of_2.file"/>
<mtds name="bla" path="C:/file_1_of_9.file"/>
</archive>
</top>

Previously I copied those nodes like this ($archiveDoc is the DOM of
document($archive), where $archive is the parameter that carries the
full path of the archive file):

<xsl:template name="archive-files" match="/">
<xsl:for-each select="$archiveDoc//Common">
<Common name="{@name}" path="{@path}"/>
</xsl:for-each>
<xsl:for-each select="$archiveDoc//Glue">
<Glue name="{@name}" path="{@path}"/>
</xsl:for-each>
<xsl:for-each select="$archiveDoc//Mtds">
<Mtds name="{@name}" path="{@path}"/>
</xsl:for-each>
</xsl:template>

How can I now add this information to my source file so that it
looks like this:

<top>
<archive>
<common/>
<glue/>
<mtds/>
</archive>
<product_tree>
<component>
<component/>
</component>
</product_tree>
</top>
 
Pavel Lepin

Christian Rühl said:
These parameters carry the full paths of the files I want
to work with. The first one doesn't matter here. The
second one (String m_prodTree) holds the path of my input
product tree.

Should actually be workable, if you really need that.
Why is this bad and what would be a good reason? I can't
follow you here. How would you do this?

The bad idea I'm talking about is a template that is both
named, and matchable, that is, a template that is supposed
to be invoked using both xsl:apply-templates and
xsl:call-template. The potential for confusion is huge.
And as I said, i can't put "$treeDoc" in the match-tag of
a template.

Mmm, you don't need to, unless I'm missing something.
match="PRODUCT_TREE" will match PRODUCT_TREE elements no
matter where they come from. Then you do something like:

<xsl:template match="/">
<xsl:apply-templates select="$treeDoc"/>
</xsl:template>

Christian Rühl said:
Sorry, but I don't really know how to work with that. I
think Joe Kesselman postet a Link to it last time. Looks
"good" to me, but I have no idea how to work with that.

Basically, you need a paradigm shift, if you'll pardon me
using a beaten cliche. You need to grok the whole idea of
rule-based processing. You define "rules" (that is,
templates), then let your transformation engine decide
which rule to invoke when processing a particular node.
Identity transformation is a ground-level rule: "just copy
as is, and recursively apply templates to attributes and
child nodes". If for certain nodes you need to do something
else (think PRODUCT_TREE elements and Component elements in
your case), you define corresponding templates matching
those nodes and specify a different processing model. This
is the very basic stuff that you won't go far without.
Actually I don't, I just didn't find a "workaround" As
said above. The problem is, that I need to do this
"level-order component numbering" within another
transformation. Therefore I have the second (String
m_archiveFile) parameter.

I'm not sure I get your point, but you cannot really do
chained transformations with XSLT1, due to the weird
distinction between nodesets (what you get by evaluating
XPath expressions) and RTFs (what you get by constructing
nodes). Basically, you can't do anything to RTFs aside from
just stuffing them into the result document. So if you need
to chain two transformations with XSLT1, you need two
separate stylesheets and two processor invocations:


              First                      Second
              Style-                     Style-
              sheet                      sheet
                |                          |
                v                          v
Source  ->    First      ->  Inter-  ->  Second      ->  Resulting
XML           Trans-         mediate     Trans-          XML
              formation      XML         formation

Now, XSLT2 does away with the notions of nodesets and RTFs,
replacing them with 'sequences', but that is likely of no
consequence to you, unless you're using Saxon for your
transformations.
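Within a single Java program the two passes in that diagram do not need an intermediate file: a DOMResult from the first Transformer can be fed to the second one as a DOMSource. A hedged sketch follows; both stylesheets are made-up minimal examples (the first tags each Component with its level, the second counts the level-2 ones by reading the attribute the first pass created), and the class name TwoPass is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Chains two XSLT 1.0 transformations in memory: source -> DOM -> result.
class TwoPass {

    // Pass 1: identity transform, plus a level attribute on every Component.
    static final String FIRST =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='Component'>"
      + "<xsl:copy>"
      + "<xsl:attribute name='level'><xsl:value-of select='1+count(ancestor::Component)'/></xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Pass 2: reads the attributes that pass 1 created.
    static final String SECOND =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='count(//Component[@level = 2])'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer first = factory.newTransformer(new StreamSource(new StringReader(FIRST)));
            Transformer second = factory.newTransformer(new StreamSource(new StringReader(SECOND)));
            // The intermediate document lives only in memory.
            DOMResult intermediate = new DOMResult();
            first.transform(new StreamSource(new StringReader(xml)), intermediate);
            StringWriter out = new StringWriter();
            second.transform(new DOMSource(intermediate.getNode()), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Two Components sit at level 2 in this sample, so this prints 2.
        System.out.println(run("<Component><Component><Component/></Component><Component/></Component>"));
    }
}
```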
If you want, I can upload my stylesheet and my input files
for you. So maybe then you are able to run my version.

Frankly, that would be a bit more effort than I'm willing to
invest.
And btw: what would be a good stand-alone XSLT processor?

I normally recommend xsltproc. It's a command-line processor
that comes with the libxslt package. libxslt is in the
ports/packages collection of pretty much every Unix-like system
I'm seeing these days, and is available for Cygwin as well,
if you happen to be on a Windows box.

However, seeing as you are a Java developer, Saxon might be
a much better bet for you. It's an excellent F/OSS
XSLT2/XPath2/XQuery2 processor written in Java, and
invoking it from command-line is as simple as:

java net.sf.saxon.Transform -t document.xml stylesheet.xsl

Naturally, there are many other options, most of which I'm
not all that familiar with.
I'm currently editing my files with Altova XMLSpy 2006,
but I'm not quite sure if there's a XSLT processor
integrated.

I believe I've seen reports of people running XSLT
transformations in Altova's suite, but personally I
wouldn't touch it with a ten-foot pole, solely for the
reason that it has a somewhat murky history with regard to
standard-compliance. My information on the topic is dated,
so that might've changed recently.
 
Pavel Lepin

[identity transformation]
I added some other templates, e.g. <xsl:template
match="Component">, which handle my nodes. Even the
component numbering works absolutely fine now.

There's only one problem: How can I copy the nodes of my
second file now?

While my product-tree file looks like shown before:

<top>
<product_tree>
<component>
<component/>
</component>
</product_tree>
</top>

... my archive file looks like this (common-, glue- and
mtds-nodes of unbounded occurrence - but no child nodes):

<top>
<archive>
<common name="bla" path="C:/file_1_of_3.file"/>
<glue name="bla" path="C:/file_1_of_2.file"/>
<mtds name="bla" path="C:/file_1_of_9.file"/>
</archive>
</top>

How can I now add this information to my source file so
that it looks like this:

<top>
<archive>
<common/>
<glue/>
<mtds/>
</archive>
<product_tree>
<component>
<component/>
</component>
</product_tree>
</top>

<xsl:template match="top">
<xsl:copy>
<xsl:apply-templates select="$archiveDoc/top/archive"/>
<xsl:apply-templates select="product_tree"/>
</xsl:copy>
</xsl:template>

Define templates matching archive, common, glue etc.
elements if you need some extra processing.
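A hedged, self-contained sketch of that merge, run through javax.xml.transform. It assumes, as described earlier in the thread, that the archive file's location arrives as a string parameter named archive; here it is passed as a file: URI, since document() takes a URI. It selects $archiveDoc/top/archive rather than the document node $archiveDoc: the archive file's root element is also named top, so applying templates to the whole archive document would re-enter this same match="top" template and never stop. The class name MergeArchive and the temporary-file handling are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Copies the archive element of a second document in front of product_tree.
class MergeArchive {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='archive'/>"
      + "<xsl:variable name='archiveDoc' select='document($archive)'/>"
      // Identity rule: copy everything not handled by a more specific template.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='top'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='$archiveDoc/top/archive'/>"
      + "<xsl:apply-templates select='product_tree'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // treeXml is the transformation source; archiveXml is reached via document().
    static String merge(String treeXml, String archiveXml) {
        try {
            Path dir = Files.createTempDirectory("merge-demo");
            Path archiveFile = Files.write(dir.resolve("archive.xml"),
                    archiveXml.getBytes(StandardCharsets.UTF_8));
            Transformer t = TransformerFactory.newInstance().newTransformer(
                    new StreamSource(new StringReader(XSL), dir.resolve("merge.xsl").toUri().toString()));
            // document() expects a URI, so pass the file's URI rather than a bare path.
            t.setParameter("archive", archiveFile.toUri().toString());
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(treeXml)), new StreamResult(out));
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String tree = "<top><product_tree><component><component/></component></product_tree></top>";
        String archive = "<top><archive><common name='bla' path='C:/file_1_of_3.file'/>"
                + "<glue name='bla' path='C:/file_1_of_2.file'/>"
                + "<mtds name='bla' path='C:/file_1_of_9.file'/></archive></top>";
        System.out.println(merge(tree, archive));
    }
}
```

The output has a single top element containing archive followed by product_tree, as in the requested layout.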
 
Christian Rühl

The bad idea I'm talking about is a template that is both
named, and matchable, that is, a template that is supposed
to be invoked using both xsl:apply-templates and
xsl:call-template. The potential for confusion is huge.

Right, that makes sense now that I think of it. Thanks for clearing it up
for me.
Mmm, you don't need to, unless I'm missing something.
match="PRODUCT_TREE" will match PRODUCT_TREE elements no
matter where they come from. Then you do something like:

<xsl:template match="/">
<xsl:apply-templates select="$treeDoc"/>
</xsl:template>

This gives me a java.lang.StackOverflowError. But thanks to that I now
know that this all runs on Xalan. :)
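The stack overflow is expected behavior rather than an engine quirk: $treeDoc holds a document (root) node, and a root node is exactly what match="/" matches, so the template keeps applying itself to the same node. Selecting the document element ($treeDoc/*) instead breaks the cycle. Below is a hedged sketch of that fix run via javax.xml.transform; the tree parameter mirrors the one set from Java earlier in the thread (passed here as a file URI), and the class name ParamSourceDemo and the throwaway <dummy/> input are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParamSourceDemo {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='tree'/>"
      + "<xsl:variable name='treeDoc' select='document($tree)'/>"
      // Root rule: go to the document element of the other file, not to its root
      // node; select='$treeDoc' would match this same template forever.
      + "<xsl:template match='/'>"
      + "<xsl:apply-templates select='$treeDoc/*'/>"
      + "</xsl:template>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Copies treeXml to the output, reading it only through the tree parameter.
    static String copyViaParameter(String treeXml) {
        try {
            Path dir = Files.createTempDirectory("param-demo");
            Path treeFile = Files.write(dir.resolve("tree.xml"), treeXml.getBytes(StandardCharsets.UTF_8));
            Transformer t = TransformerFactory.newInstance().newTransformer(
                    new StreamSource(new StringReader(XSL), dir.resolve("demo.xsl").toUri().toString()));
            t.setParameter("tree", treeFile.toUri().toString());
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The real source is only a placeholder; all data comes in through $tree.
            t.transform(new StreamSource(new StringReader("<dummy/>")), new StreamResult(out));
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyViaParameter("<Component><Component/></Component>"));
    }
}
```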
Basically, you need a paradigm shift, if you'll pardon me
using a beaten cliche. You need to grok the whole idea of
rule-based processing. You define "rules" (that is,
templates), then let your transformation engine decide
which rule to invoke when processing a particular node.
Identity transformation is a ground-level rule: "just copy
as is, and recursively apply templates to attributes and
child nodes". If for certain nodes you need to do something
else (think PRODUCT_TREE elements and Component elements in
your case), you define corresponding templates matching
those nodes and specify a different processing model. This
is the very basic stuff that you won't go far without.

Ah, okay. I understand that. So you're just matching @*|node(),
copying each node and applying templates recursively to its attributes
and child nodes. Thanks for that!
I'm not sure I get your point, but you cannot really do
chained transformations with XSLT1, due to the weird
distinction between nodesets (what you get by evaluating
XPath expressions) and RTFs (what you get by constructing
nodes). Basically, you can't do anything to RTFs aside from
just stuffing them into the result document. So if you need
to chain two transformations with XSLT1, you need two
separate stylesheets and two processor invocations:

              First                      Second
              Style-                     Style-
              sheet                      sheet
                |                          |
                v                          v
Source  ->    First      ->  Inter-  ->  Second      ->  Resulting
XML           Trans-         mediate     Trans-          XML
              formation      XML         formation

Hm, well I had it working before. Okay, my solution didn't work with
that component numbering and it was more or less - well let's say more
- bohemian - but I was able to chain two files and append attributes
to some of the nodes.
Now, XSLT2 does away with the notions of nodesets and RTFs,
replacing them with 'sequences', but that is likely of no
consequence to you, unless you're using Saxon for your
transformations.


Frankly, that would be a bit more effort than I'm willing to
invest.

I totally understand that.
However, seeing as you are a Java developer, Saxon might be
a much better bet for you. It's an excellent F/OSS
XSLT2/XPath2/XQuery2 processor written in Java, and
invoking it from command-line is as simple as:

java net.sf.saxon.Transform -t document.xml stylesheet.xsl

Thanks, I will try that one.
<xsl:template match="top">
<xsl:copy>
<xsl:apply-templates select="$archiveDoc/top/archive"/>
<xsl:apply-templates select="product_tree"/>
</xsl:copy>
</xsl:template>

Define templates matching archive, common, glue etc.
elements if you need some extra processing.

I will play around with that. It doesn't really work yet, but maybe
I'll find a solution myself. Thank you for your help, Pavel!
But I still have the feeling that this chapter ain't over yet...
Anyway, I think I'm getting closer, aren't I?

This is what I have so far:

1.) I created a bunch of templates to handle all possible nodes like
the one here:
<xsl:template match="Component">
said:
</xsl:attribute>
said:
</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

2.) I created the calc-level and calc-number templates your way, e.g.:
<xsl:template name="calc-level">
<xsl:value-of select="1+count(ancestor::Component)"/>
</xsl:template>

3.) I created the following DOMs (tree and archive are Strings with
the full paths, as said before):
<xsl:param name="archive"/>
<xsl:param name="tree"/>
<xsl:variable name="archiveDoc" select="document($archive)"/>
<xsl:variable name="treeDoc" select="document($tree)"/>

4.) I think there are still road works ahead here:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
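Items 1, 2 and 4 can be tried out together in isolation with the JDK's built-in XSLT 1.0 processor via JAXP. The calc-number template below is not taken from the thread; it is one possible XPath 1.0 formulation, assuming a node's level-order position is the number of Component elements on shallower levels, plus the number of same-level Component elements earlier in document order, plus one. It is quadratic in the number of nodes, which should be fine for small trees.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LevelOrderNumbering {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Identity template (item 4).
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Component template (item 1).
      + "<xsl:template match='Component'>"
      + "<xsl:copy>"
      + "<xsl:attribute name='level'><xsl:call-template name='calc-level'/></xsl:attribute>"
      + "<xsl:attribute name='number'><xsl:call-template name='calc-number'/></xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      // calc-level (item 2): 1-based depth.
      + "<xsl:template name='calc-level'>"
      + "<xsl:value-of select='1+count(ancestor::Component)'/>"
      + "</xsl:template>"
      // Hypothetical calc-number: shallower nodes + earlier same-depth nodes + 1.
      + "<xsl:template name='calc-number'>"
      + "<xsl:variable name='depth' select='count(ancestor::Component)'/>"
      + "<xsl:value-of select='count(//Component[count(ancestor::Component) &lt; $depth])"
      + " + count(preceding::Component[count(ancestor::Component) = $depth]) + 1'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(
            "<Component><Component><Component/></Component><Component/></Component>"));
    }
}
```

On the sample tree from the original question this should produce the numbering Christian asked for (1, 2, 4, 3 in document order). Same-depth Component elements are never ancestors of one another, so the preceding axis finds exactly the earlier ones on that level.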

So do you think I can now simply go on and do something like this?

<xsl:template match="top">
<xsl:copy>
<xsl:apply-templates select="$archiveDoc"/>
<xsl:apply-templates select="$treeDoc"/>
</xsl:copy>
</xsl:template>

Hm, doing this simply gives me the following error and a blank result
file:

[Fatal Error] tmp_pmbv2_prodtree_archive.xml:3:1: Premature end of file.
Premature end of file.
[Ljava.lang.StackTraceElement;@e0c7c3
 

Joseph Kesselman

Pavel said:
The bad idea I'm talking about is a template that is both
named, and matchable, that is, a template that is supposed
to be invoked using both xsl:apply-templates and
xsl:call-template. The potential for confusion is huge.

Sorry, Pavel, I have to disagree with you here. If the same processing
really is needed both as a match template and as a "subroutine" called
template, there's absolutely no reason to duplicate the logic, and a
single template can perfectly reasonably be used for both.

The thing to remember is that you don't *have* to specify both match and
name, and the normal practice is only to provide the one you actually
intend to use.
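To illustrate Joseph's point, a single XSLT 1.0 template may carry both a match pattern and a name and can then be invoked either way. A minimal sketch via JAXP with the JDK's built-in processor; the element name item and the template name describe are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DualUseTemplate {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<out>"
      // First use: selected through its match pattern.
      + "<xsl:apply-templates select='top/item'/>"
      // Second use: called by name, with each item as the current node.
      + "<xsl:for-each select='top/item'><xsl:call-template name='describe'/></xsl:for-each>"
      + "</out>"
      + "</xsl:template>"
      // One template with both a match pattern and a name.
      + "<xsl:template match='item' name='describe'>"
      + "<d><xsl:value-of select='@id'/></d>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<top><item id='a'/><item id='b'/></top>"));
    }
}
```

Both invocation styles emit the same output for each item, which is the point of sharing the logic; whether that is clearer than Pavel's split into a match template that calls a named one is a matter of taste.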
I'm not sure I get your point, but you cannot really do
chained transformations with XSLT1, due to the weird
distinction between nodesets (what you get by evaluating
XPath expressions) and RTFs (what you get by constructing
nodes).

If your processor supports the EXSLT node-set extension function (as
most do these days), that gets around this limitation... but yes,
removing that distinction so you can do multi-pass processing without
needing the extension is one of the small-but-important changes in XSLT 2.0.

The other alternative, as Pavel pointed out, is to actually invoke two
separate stylesheet passes -- which could, but doesn't have to, be the
same stylesheet with different parameters, assuming that your processor
lets you pass in parameters. (Again, most do.)
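A minimal sketch of the two-pass approach with JAXP, assuming a hypothetical pass parameter: the same stylesheet is compiled once into a Templates object and run twice. Pass 1 writes level attributes into an in-memory DOM (the intermediate XML in the diagram earlier in the thread), and pass 2 reads those attributes back to assign level-order numbers. Because the intermediate result is a real document rather than an XSLT 1.0 result tree fragment, pass 2 can query it with ordinary XPath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TwoPassNumbering {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='pass' select=\"'1'\"/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='Component'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:choose>"
      // Pass 1: annotate each Component with its 1-based depth.
      + "<xsl:when test=\"$pass = '1'\">"
      + "<xsl:attribute name='level'><xsl:value-of select='1+count(ancestor::Component)'/></xsl:attribute>"
      + "</xsl:when>"
      // Pass 2: use the level attributes written by pass 1.
      + "<xsl:otherwise>"
      + "<xsl:attribute name='number'><xsl:value-of select="
      + "'count(//Component[@level &lt; current()/@level])"
      + " + count(preceding::Component[@level = current()/@level]) + 1'/>"
      + "</xsl:attribute>"
      + "</xsl:otherwise>"
      + "</xsl:choose>"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String xml) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            // Compile once; each pass gets its own Transformer.
            Templates templates = tf.newTemplates(new StreamSource(new StringReader(XSL)));

            // Pass 1: source XML -> intermediate DOM.
            Transformer first = templates.newTransformer();
            first.setParameter("pass", "1");
            DOMResult intermediate = new DOMResult();
            first.transform(new StreamSource(new StringReader(xml)), intermediate);

            // Pass 2: intermediate DOM -> resulting XML text.
            Transformer second = templates.newTransformer();
            second.setParameter("pass", "2");
            StringWriter out = new StringWriter();
            second.transform(new DOMSource(intermediate.getNode()), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(
            "<Component><Component><Component/></Component><Component/></Component>"));
    }
}
```

Keeping the intermediate result as a DOM avoids writing a temporary file between the passes; the two passes could just as well be separate stylesheets.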
I normally recommend xsltproc.

And I normally recommend Apache Xalan. But I'm biased; I authored a
significant amount of that code. Downside: It still only supports the
1.0 versions of XPath and XSLT... though I still hope to see that
limitation lifted. Xalan's Java-based.
However, seeing as you are a Java developer, Saxon might be
a much better bet for you. It's an excellent F/OSS
XSLT2/XPath2/XQuery2 processor written in Java

Another very good choice; Saxon is Xalan's main competition and
currently does have the lead. If you need 2.0 support in Java, I'd have
to point you to Saxon for now.

I've heard a few too many bug reports about XML Spy to be comfortable
with it. Of course those may be outdated, but... well, if you get
results out of it that don't look right, you may want to check what some
of the other processors do with the same input before assuming the error
is in your stylesheet.
 

Pavel Lepin

Joseph Kesselman said:
Sorry, Pavel, I have to disagree with you here. If the
same processing really is needed both as a match template
and as a "subroutine" called template, there's absolutely
no reason to duplicate the logic, and a single template
can perfectly reasonably be used for both.

I suppose that's largely a matter of taste, but what I would
do would be:

<xsl:template match="foo">
<xsl:call-template name="named-template"/>
</xsl:template>
<xsl:template name="named-template">
<Stuff/>
</xsl:template>

It just seems to be a bit more proper to me.
If your processor supports the EXSLT node-set extension
function (as most do these days), that gets around this
limitation...

Well, yeah, but that's EXSLT. Support varies. To me XSLT2
seems to be a somewhat wiser choice these days.
And I normally recommend Apache Xalan. But I'm biased; I
authored a significant amount of that code. Downside: It
still only supports the 1.0 versions of XPath and XSLT...
though I still hope to see that limitation lifted. Xalan's
Java-based.

Idle nitpick: Xalan-J is Java-based; Xalan-C++ is written in
C++ and on the whole is a very different kettle of fish.
 

Christian Rühl

Wooooooooow! After hours - oh, after days - of smacking myself in the
forehead, I finally got it working:

<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="top">
<xsl:copy>
<xsl:call-template name="archive-files"/>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

Plus various templates like <xsl:template match="Component">...</xsl:template> and so on.

This gives me exactly the result I wanted. Thanks a lot, Pavel!
 

Pavel Lepin

Christian Rühl said:
This gives me a java.lang.StackOverflowError. But thanks
to that I now know that this all runs on Xalan. :)

I suspect I know what causes your problem. More on that
below.
Hm, well, I had it working before. Okay, my solution didn't
work with that component numbering and it was more or less
(well, let's say more) bohemian, but I was able to chain
two files and append attributes to some of the nodes.

No, now that you elaborated I see that your scenario is a
little different. You are processing two separate source
documents, and output just one resulting document. I was
talking about a scenario where you would need some sort of
intermediate representation - that is not possible with
XSLT1 alone.
3.) I created the following DOMs (tree and archive are
Strings with the full paths, as said before):
<xsl:param name="archive"/>
<xsl:param name="tree"/>
<xsl:variable name="archiveDoc" select="document($archive)"/>
<xsl:variable name="treeDoc" select="document($tree)"/>

<xsl:template match="top">
<xsl:copy>
<xsl:apply-templates select="$archiveDoc"/>
<xsl:apply-templates select="$treeDoc"/>
</xsl:copy>
</xsl:template>

Uh, no, not really. First of all, this template would be
invoked when you are processing top element in your 'tree'
document. And your second template application would start
processing from the root node of 'tree' document again,
*whammo* infinite recursion. Replace select="$treeDoc" with
select="*" (you want to process all the children of top
element here, not the whole document once again).

But wait, it gets worse. Your 'archive' document contains
a top element as well. That element would match
this very same template, *whammo* infinite recursion.
Basically, there are two ways around that:

1. Use modes. You can use mode attributes on
xsl:apply-templates and xsl:template elements to
manually specify different rules for superficially
similar scenarios (such as 'top' element having
different semantics in different documents).

2. Change your template so that it matches 'top' element
in 'tree' document, but not in 'archive' document. In
your sample documents this is trivial:

<xsl:template match="top[product]">

This only matches top elements that have product
element children.
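A minimal sketch of Pavel's second fix, run with the JDK's built-in XSLT 1.0 processor via JAXP: the tree document is the main input, the archive is loaded with document($archive), the top template is restricted to top[product], and it processes its own children with select="*" instead of the whole tree document. The sample documents are hypothetical, and the archive is written to a temporary file only because document() needs a URI to load it from.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MergeTwoDocuments {

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='archive'/>"
      + "<xsl:variable name='archiveDoc' select='document($archive)'/>"
      // Identity template for everything not handled below.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Matches only the tree document's top (it has product children), so the
      // archive's own top element falls through to the identity template.
      + "<xsl:template match='top[product]'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='$archiveDoc'/>"
      // Process this element's children, not the whole tree document again.
      + "<xsl:apply-templates select='*'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String merge(String treeXml, String archiveXml) {
        Path archive = null;
        try {
            archive = Files.createTempFile("archive", ".xml");
            Files.write(archive, archiveXml.getBytes(StandardCharsets.UTF_8));

            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            // Pass an absolute file: URI so document() can resolve it.
            t.setParameter("archive", archive.toUri().toString());
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(treeXml)), new StreamResult(out));
            return out.toString();
        } catch (IOException | TransformerException e) {
            throw new RuntimeException(e);
        } finally {
            if (archive != null) {
                try { Files.deleteIfExists(archive); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(merge(
            "<top><product><Component/></product></top>",
            "<top><archive name='a1'/></top>"));
    }
}
```

The archive's top element is copied as a nested element here; if only its children are wanted, selecting $archiveDoc/top/* instead would be one way to drop the wrapper. Using a mode for the archive templates (Pavel's first option) would achieve the same separation without relying on the product child.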
Hm, doing this simply gives me the following error and a
blank result file:

[Fatal Error] tmp_pmbv2_prodtree_archive.xml:3:1: Premature end of file.
Premature end of file.
[Ljava.lang.StackTraceElement;@e0c7c3

I strongly recommend taking a very small XML document, a
small transformation, and doing the transformation on
paper. This is going to take half an hour to figure out,
but once you go through this (and once your results match
those that actual XSLT processors produce), you'll have a
much clearer mental picture of how XSLT stylesheets are
processed, and believe me, this is going to be immensely
helpful.

The group archives and the XSLT FAQ have a lot of small snippets
that you could use for this.
 

Christian Rühl

Sorry, Pavel, I have to disagree with you here. If the same processing
really is needed both as a match template and as a "subroutine" called
template, there's absolutely no reason to duplicate the logic, and a
single template can perfectly reasonably be used for both.

The thing to remember is that you don't *have* to specify both match and
name, and the normal practice is only to provide the one you actually
intend to use.

I removed all "duplicate" templates (those with both a match and a
name). My current (and working) stylesheet consists of call-only and
match-only templates now.
The other alternative, as Pavel pointed out, is to actually invoke two
separate stylesheet passes -- which could, but doesn't have to, be the
same stylesheet with different parameters, assuming that your processor
lets you pass in parameters. (Again, most do.)

I don't think this is a good way to go here, because that XML-to-XML
transformation ain't the biggest part of my current software project.
So I guess using two or more stylesheets would be a bit of overkill for
my purposes. But this is good to know for future issues where I need
to work with XSL Transformations.
Another very good choice; Saxon is Xalan's main competition and
currently does have the lead. If you need 2.0 support in Java, I'd have
to point you to Saxon for now.

Thanks for this advice, I will surely check it out, and I hope there
is a chance to work with one of those in the workshop I'm going to
attend in January.
I've heard a few too many bug reports about XML Spy to be comfortable
with it. Of course those may be outdated, but... well, if you get
results out of it that don't look right, you may want to check what some
of the other processors do with the same input before assuming the error
is in your stylesheet.

Okay, but as a noob I usually assume that the error could be found
between a pair of ears. :p

Btw: Do you know a complete tool for editing XML, XSD and/or XSLT
files? Doesn't have to bring a processor. XMLSpy ain't that bad for
editing purposes, but I need a free/open source one at home.

I'll be looking for Saxon now. Many thanks to both of you!
 
