Simon Brooke
This is supposed to be a very simple XSL stylesheet to strip styling
information out of HTML documents - it could not be more basic. And yet,
it doesn't work. I'm obviously getting something very basic wrong and for
the life of me I can't see it. Please, somebody, cast your eyes over this
and tell me what's wrong!
First, the XSL stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- strip out styling -->
<xsl:output indent="yes" method="xml"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
<xsl:template match="style">
<xsl:comment>
Don't carry style through
</xsl:comment>
</xsl:template>
<xsl:template match="font">
<span>
<xsl:apply-templates/>
</span>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}">
<xsl:for-each select="@*">
<xsl:choose>
<xsl:when test="name()='style'">
<!-- nothing -->
</xsl:when>
<xsl:when test="name()='STYLE'">
<!-- nothing -->
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="{name()}"><xsl:value-of
select="."/></xsl:attribute>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Now, an example HTML document to test it:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head lang="en" dir="ltr">
<title>Test document for destyler</title>
<style type="text/css">
BODY
{
font-family: cursive;
}
</style>
</head>
<body>
<h1>Test document for destyler</h1>
<p style="font-family: serif">
Test with 'style='
</p>
<p STYLE="font-family: serif">
Test with 'STYLE='
</p>
</body>
</html>
And the output after processing:
-[simon]-> xsltproc destyle.xsl test.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head lang="en" dir="ltr" xml:lang="en"><meta http-equiv="Content-Type"
content="text/html; charset=UTF-8" />
<title>Test document for destyler</title>
<style type="text/css" xml:space="preserve">
BODY
{
font-family: cursive;
}
</style>
</head>
<body>
<h1>Test document for destyler</h1>
<p>
Test with 'style='
</p>
<p>
Test with 'STYLE='
</p>
</body>
</html>
As you can see, the 'style' attributes are getting successfully stripped.
But the 'style' element is not stripped. The template
<xsl:template match="style">
<xsl:comment>
Don't carry style through
</xsl:comment>
</xsl:template>
simply never matches. I cannot see why not; my understanding of the
conflict resolution rule was that the most specific matching template
should be applied. I've tried this with two completely different XSL-T
implementations, the Gnome libxslt and Apache Xalan, and they behave
consistently with one another (as one would hope).
So what have I missed? What am I doing wrong?
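
For reference, here is a minimal sketch of the conflict resolution rule mentioned above, written against XSLT 1.0. It rests on two points from the XSLT 1.0 and XPath 1.0 specifications: a plain name pattern such as match="style" gets default priority 0 while match="*" gets -0.5, so the named template wins whenever both match; and a name test compares expanded names, so an unprefixed name in a pattern only matches elements in no namespace. The sketch assumes the input elements are in the XHTML namespace, as the xmlns declaration on the test document's html element suggests. The prefix "h" is a hypothetical choice made for this sketch and does not appear in the stylesheet above; the font template is left out for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <!-- Assumption: input elements are in the XHTML namespace, bound here
       to the illustrative prefix 'h'. A qualified name pattern like this
       has default priority 0. -->
  <xsl:template match="h:style">
    <xsl:comment> Don't carry style through </xsl:comment>
  </xsl:template>

  <!-- Wildcard pattern, default priority -0.5: applied only when no
       higher-priority template matches the element. -->
  <xsl:template match="*">
    <xsl:copy>
      <!-- Copy every attribute except style / STYLE. -->
      <xsl:copy-of select="@*[name() != 'style' and name() != 'STYLE']"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If the namespace assumption holds for the document being processed, the first template should be chosen for the style element because its priority is higher than the wildcard's. If the input is in no namespace, the unprefixed pattern from the original stylesheet is the one that applies.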