XSL problem

Andy Fish

Hi,

I'm stuck with an XSL problem - can anyone give me any hints?

I have some XML with nested formatting tags like this:

<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
</text>

which I need to 'flatten out' into something like this:

<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>

It doesn't have to work with any arbitrary tags - there are only a few
possible ones - but I'm not sure how to "remember" the outer level
formatting nodes when processing the text inside. It seems to be
crying out for some kind of state variable

Andy
 
Martin Honnen

Andy Fish wrote:

I'm stuck with an XSL problem - can anyone give me any hints?

I have some XML with nested formatting tags like this:

<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
</text>

which I need to 'flatten out' into something like this:

<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>

It doesn't have to work with any arbitrary tags - there are only a few
possible ones - but I'm not sure how to "remember" the outer level
formatting nodes when processing the text inside. It seems to be
crying out for some kind of state variable

Modes can help to give some kind of state in which a node is to be
processed, here is my attempt at using them to solve the problem:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" encoding="UTF-8" indent="yes" />

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>

<xsl:template match="text">
<xsl:apply-templates select="node()" mode="flatten" />
</xsl:template>

<xsl:template match="text()" mode="flatten">
<text><xsl:value-of select="." /></text>
</xsl:template>

<xsl:template match="bold" mode="flatten">
<xsl:apply-templates select="node()" mode="flattenBold" />
</xsl:template>

<xsl:template match="text()" mode="flattenBold">
<text bold="true"><xsl:value-of select="." /></text>
</xsl:template>

<xsl:template match="italic" mode="flattenBold">
<xsl:apply-templates select="node()" mode="flattenBoldItalic" />
</xsl:template>

<xsl:template match="text()" mode="flattenBoldItalic">
<text bold="true" italic="true"><xsl:value-of select="." /></text>
</xsl:template>

</xsl:stylesheet>

The result is not quite what you want: an extra <text bold="true">
element shows up, holding only the white space between </italic> and
</bold>. Apart from that it has the right structure. (Note that I wrapped
your source above in a <doc> element, as otherwise the flattened result
wouldn't have a single root element.)

<doc>
<text>
this is plain
</text>
<text bold="true">
this is bold
</text>
<text italic="true" bold="true">
this is bold-italic
</text>
<text bold="true">
</text>
<text>
this is plain
</text>
</doc>


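Now, to solve the whitespace text node issue, I think the following should
help. Each text() template checks normalize-space(.) and only emits a
<text> element when something other than white space is left: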

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" encoding="UTF-8" indent="yes" />

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>

<xsl:template match="text">
<xsl:apply-templates select="node()" mode="flatten" />
</xsl:template>

<xsl:template match="text()" mode="flatten">
<xsl:variable name="normalizedText" select="normalize-space(.)" />
<xsl:if test="$normalizedText">
<text><xsl:value-of select="." /></text>
</xsl:if>
</xsl:template>

<xsl:template match="bold" mode="flatten">
<xsl:apply-templates select="node()" mode="flattenBold" />
</xsl:template>

<xsl:template match="text()" mode="flattenBold">
<xsl:variable name="normalizedText" select="normalize-space(.)" />
<xsl:if test="$normalizedText">
<text bold="true"><xsl:value-of select="." /></text>
</xsl:if>
</xsl:template>

<xsl:template match="italic" mode="flattenBold">
<xsl:apply-templates select="node()" mode="flattenBoldItalic" />
</xsl:template>

<xsl:template match="text()" mode="flattenBoldItalic">
<xsl:variable name="normalizedText" select="normalize-space(.)" />
<xsl:if test="$normalizedText">
<text bold="true" italic="true"><xsl:value-of select="." /></text>
</xsl:if>
</xsl:template>

</xsl:stylesheet>
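
An alternative that may be worth trying, sketched here on the assumption that the processor applies whitespace stripping to the source tree: the top-level xsl:strip-space declaration of XSLT 1.0 removes whitespace-only text nodes before any template runs, so the per-mode normalize-space tests are not needed. The sketch is the first stylesheet with one added declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes" />

  <!-- Added line: drop whitespace-only text nodes from the source tree. -->
  <xsl:strip-space elements="*" />

  <!-- Identity template: copies everything outside <text> unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text">
    <xsl:apply-templates select="node()" mode="flatten" />
  </xsl:template>

  <xsl:template match="text()" mode="flatten">
    <text><xsl:value-of select="." /></text>
  </xsl:template>

  <xsl:template match="bold" mode="flatten">
    <xsl:apply-templates select="node()" mode="flattenBold" />
  </xsl:template>

  <xsl:template match="text()" mode="flattenBold">
    <text bold="true"><xsl:value-of select="." /></text>
  </xsl:template>

  <xsl:template match="italic" mode="flattenBold">
    <xsl:apply-templates select="node()" mode="flattenBoldItalic" />
  </xsl:template>

  <xsl:template match="text()" mode="flattenBoldItalic">
    <text bold="true" italic="true"><xsl:value-of select="." /></text>
  </xsl:template>

</xsl:stylesheet>
```

Note that this drops every whitespace-only text node, including one that separates two inline elements, so it only suits input where such white space is insignificant. Leading and trailing white space inside the remaining text nodes is still copied through.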
 
Ben Edgington

Hi Andy,

I'm stuck with an XSL problem - can anyone give me any hints?

I have some XML with nested formatting tags like this:

<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
</text>

which I need to 'flatten out' into something like this:

<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>

It doesn't have to work with any arbitrary tags - there are only a few
possible ones - but I'm not sure how to "remember" the outer level
formatting nodes when processing the text inside. It seems to be
crying out for some kind of state variable

Here's a brute force version that uses XPath to look at the ancestor
axis. Perhaps not as elegant as Martin's solution, but shorter and
easier to extend (Martin's script will grow as the factorial of the
number of options, I think, whereas this is only linear).

This transformation:

<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>

<xsl:template match="/text">
<doc>
<xsl:apply-templates select="text()|node()"/>
</doc>
</xsl:template>

<xsl:template match="text()">
<xsl:if test="normalize-space(.)">
<text>
<xsl:if test="ancestor::bold">
<xsl:attribute name="bold">true</xsl:attribute>
</xsl:if>
<xsl:if test="ancestor::italic">
<xsl:attribute name="italic">true</xsl:attribute>
</xsl:if>
<xsl:value-of select="normalize-space(.)"/>
</text>
</xsl:if>
</xsl:template>

</xsl:stylesheet>


with this input
<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
</text>


gives this output
<?xml version="1.0"?>
<doc>
<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>
</doc>
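
To illustrate the "only linear" point, here is a minimal sketch of the same stylesheet extended with a third formatting element. The element name underline is hypothetical (it is not in the original input); the only change is one more xsl:if in the text() template.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/text">
    <doc>
      <xsl:apply-templates select="node()"/>
    </doc>
  </xsl:template>

  <!-- bold, italic and underline elements fall through to the built-in
       element rule, which simply processes their children. -->
  <xsl:template match="text()">
    <xsl:if test="normalize-space(.)">
      <text>
        <xsl:if test="ancestor::bold">
          <xsl:attribute name="bold">true</xsl:attribute>
        </xsl:if>
        <xsl:if test="ancestor::italic">
          <xsl:attribute name="italic">true</xsl:attribute>
        </xsl:if>
        <!-- Hypothetical third element: one extra test, nothing else. -->
        <xsl:if test="ancestor::underline">
          <xsl:attribute name="underline">true</xsl:attribute>
        </xsl:if>
        <xsl:value-of select="normalize-space(.)"/>
      </text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Each further element costs one more xsl:if block, at the price of walking the ancestor axis again for every text node.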

Ben
 
Ben Edgington

Ben Edgington said:
(Martin's script will grow as the factorial of the
number of options, I think, whereas this is only linear).

Sorry, not factorial, but exponential: there will be 2^n-1
cases for n elements.
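
As a quick check of that figure, assuming one mode is needed per non-empty set of active formatting elements (the plain mode is not counted, and nesting order does not matter):

```latex
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,
\qquad\text{e.g. } n = 2 \Rightarrow 3,\quad n = 3 \Rightarrow 7,\quad n = 4 \Rightarrow 15 .
```

For bold and italic the three cases are bold, italic and bold-italic.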
 
Andy Fish

Thanks to both yourself and Martin for these two solutions.

although there are "only a few" formatting elements, I certainly don't fancy
having to enumerate every possible combination of them.

Fortunately performance will not be an issue so I can use your idea.

Andy
 
Ben Edgington

Andy Fish said:
although there are "only a few" formatting elements, I certainly don't fancy
having to enumerate every possible combination of them.

Fortunately performance will not be an issue so I can use your idea.

You don't need to choose!

Whilst watching television with my two-year-old this morning I
realised that, of course, recursion is the proper way to maintain
state-information in XSLT. (The challenge presented by the Fimbles is
limited, you understand.)

This solution combines the best of Martin's and mine: it maintains
state information without recalculating it from scratch every time,
and its size is only linear in the number of options considered.

This XSLT,

<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>

<xsl:template match="/text">
<doc>
<xsl:apply-templates select="text()|node()">
<xsl:with-param name="italic" select="0"/>
<xsl:with-param name="bold" select="0"/>
</xsl:apply-templates>
</doc>
</xsl:template>

<xsl:template match="text()">
<xsl:param name="italic"/>
<xsl:param name="bold"/>
<xsl:if test="normalize-space(.)">
<text>
<xsl:if test="$bold">
<xsl:attribute name="bold">true</xsl:attribute>
</xsl:if>
<xsl:if test="$italic">
<xsl:attribute name="italic">true</xsl:attribute>
</xsl:if>
<xsl:value-of select="normalize-space(.)"/>
</text>
</xsl:if>
</xsl:template>

<xsl:template match="bold">
<xsl:param name="italic"/>
<xsl:apply-templates select="text()|node()">
<xsl:with-param name="italic" select="$italic"/>
<xsl:with-param name="bold" select="1"/>
</xsl:apply-templates>
</xsl:template>

<xsl:template match="italic">
<xsl:param name="bold"/>
<xsl:apply-templates select="text()|node()">
<xsl:with-param name="italic" select="1"/>
<xsl:with-param name="bold" select="$bold"/>
</xsl:apply-templates>
</xsl:template>

</xsl:stylesheet>


with this XML


<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
<italic>this is italic
<bold>this is italic-bold</bold>
</italic>
</text>


gives this output
<?xml version="1.0"?>
<doc>
<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>
<text italic="true">this is italic</text>
<text bold="true" italic="true">this is italic-bold</text>
</doc>
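
A possible variation, offered only as a sketch: instead of one parameter per formatting element, the state can travel in a single string parameter. The parameter name formats and the space-delimited encoding are assumptions made for this illustration. Every formatting element then shares one template, and adding an element means one more name in the match pattern and one more xsl:if.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/text">
    <doc>
      <!-- Start with an "empty" state: a single space as delimiter. -->
      <xsl:apply-templates select="node()">
        <xsl:with-param name="formats" select="' '"/>
      </xsl:apply-templates>
    </doc>
  </xsl:template>

  <!-- One template for all formatting elements: append the element's
       own name to the state and pass it down to the children. -->
  <xsl:template match="bold | italic">
    <xsl:param name="formats" select="' '"/>
    <xsl:apply-templates select="node()">
      <xsl:with-param name="formats" select="concat($formats, name(), ' ')"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="formats" select="' '"/>
    <xsl:if test="normalize-space(.)">
      <text>
        <!-- The surrounding spaces keep one name from matching inside another. -->
        <xsl:if test="contains($formats, ' bold ')">
          <xsl:attribute name="bold">true</xsl:attribute>
        </xsl:if>
        <xsl:if test="contains($formats, ' italic ')">
          <xsl:attribute name="italic">true</xsl:attribute>
        </xsl:if>
        <xsl:value-of select="normalize-space(.)"/>
      </text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

One caveat applies to both versions: in XSLT 1.0 the built-in element rule does not forward parameters, so an element without its own template would reset the state for everything inside it.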


Hope you enjoy it.

Ben
 
Andy Fish

Ben Edgington said:
Whilst watching television with my two-year-old this morning I
realised that, of course, recursion is the proper way to maintain
state-information in XSLT. (The challenge presented by the Fimbles is
limited, you understand.)

This solution combines the best of Martin's and mine: it maintains
state information without recalculating it from scratch every time,
and its size is only linear in the number of options considered.

ah of course - I'd forgotten about apply-templates...with-param

It's very interesting how different these three approaches are. I had
already thought about something like Martin's idea but I figured it would
get out of hand so I didn't take it all the way to completion. Your original
was the lateral thinking solution I was hoping someone would come up with.
But this last one is the one I'm kicking myself for not thinking of - the
one I didn't think existed.

Andy
 
Ben Edgington

Andy Fish said:
ah of course - I'd forgotten about apply-templates...with-param

It's very interesting how different these three approaches are. I had
already thought about something like Martin's idea but I figured it would
get out of hand so I didn't take it all the way to completion. Your original
was the lateral thinking solution I was hoping someone would come up with.
But this last one is the one I'm kicking myself for not thinking of - the
one I didn't think existed.

Exactly - I had a feeling when I submitted the original version that
there was a better way, which is why it kept bugging me. It's an
example of one of those things which are *obvious* when you spend a
couple of days thinking about them 8^)

Ben
 
