XSL and entities

Tjerk Wolterink

I have a problem with an XSL transformation.
My XML input:

--- input.xml ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:content [
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="news">
<xc:text type="html">
leuk he jazeker ãôé<br/>
</xc:text>
</xc:xcontent>

----

And an xsl file:

-- style.xsl ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

<xsl:output method="xml" indent="yes"/>

<!--
! All html should remain html
!-->
<xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:copy/>
</xsl:for-each>
<xsl:apply-templates select="./node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="/xc:xcontent">
<page:page type="module">
<p>
<xsl:apply-templates select="xc:text"/>
</p>
</page:page>
</xsl:template>

</xsl:stylesheet>

---



The output here is:

---
<page:page type="module">
<p>
leuk he jazeker<br/>
</p>
</page:page>

---


But I expect this as output:


---
<page:page type="module">
<p>
leuk he jazeker ãôé<br/>
</p>
</page:page>
---


How can that be? Why are the characters ãôé gone?
Is there something wrong with my encoding?
Note: I don't know if the files are really encoded in ISO-8859-1, but it did work for me.
My editor says the encoding is ISO-8859-1, so I think that is right. Or did the editor get that
information from the XML prolog?
 
Tjerk Wolterink

cut

Well, my subject line was not really a good choice: there are no entities involved.
 
David Carlisle

Are they really gone, or are you just looking at the file in some program
that doesn't understand the encoding? They appear to be gone in your
posted output, but that doesn't match what XSLT should have done.
That output is also missing a namespace declaration for XHTML; is it
really the output you got from XSLT?

If you want ISO-8859-1 output, add
<xsl:output encoding="iso-8859-1"/>
to your stylesheet.

Incidentally, despite the fact that you have referred to entities in the
subject line, there are no entity references in your input (except the
parameter entity reference %xhtml;). If you enter all your characters
directly as character data, there's no need to reference the XHTML DTD
(which might have a very noticeable effect on parsing speed, especially
if you really are fetching the DTD from the W3C site each time).

David
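
A minimal sketch of the xsl:output suggestion above, assuming the rest of the stylesheet stays as posted. In XSLT 1.0 the xml output method falls back to UTF-8 or UTF-16 when no encoding is given, so a viewer that assumes ISO-8859-1 can make accented characters look wrong or missing. The identity template is included only so that the sketch is a complete stylesheet; it is not part of the original.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the processor to serialize the result as ISO-8859-1 -->
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <!-- Identity template (illustrative): copies every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```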
 
Martin Honnen

Tjerk said:
I've a problem in an xsl transformation.
My xml input:

--- input.xml ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:content [
          ^^^^^^^^^^
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
xmlns="http://www.w3.org/1999/xhtml" module="news">
<xc:text type="html">
leuk he jazeker ãôé<br/>
</xc:text>
</xc:xcontent>

----

If the DOCTYPE declaration says the root element is xc:content then you
should have that, but you have xc:xcontent, so one needs to be changed.

And an xsl file:

-- style.xsl ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

<xsl:output method="xml" indent="yes"/>

What output encoding do you want?
<!--
! All html should remain html
!-->
<xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:copy/>
</xsl:for-each>
<xsl:apply-templates select="./node()"/>
</xsl:copy>
</xsl:template>

This could be done more simply and more efficiently:

<xsl:template match="xhtml:*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>

where the prefix xhtml is bound to the namespace URI for XHTML earlier
in the document.
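
As a sketch of what that binding could look like: the xhtml prefix has to be declared on the stylesheet (or an ancestor of the template) before match="xhtml:*" can be used. The prefix name itself is arbitrary; only the namespace URI matters.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Matches every element in the XHTML namespace, whatever prefix the input uses -->
  <xsl:template match="xhtml:*">
    <xsl:copy>
      <!-- Copy all attributes in one step instead of looping over them -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```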

The output here is:

---
<page:page type="module">
<p>
leuk he jazeker<br/>
</p>
</page:page>

---


But i expect this as output


---
<page:page type="module">
<p>
leuk he jazeker ãôé<br/>
</p>
</page:page>
---

What XSLT processor are you using, how exactly do you run the
transformation?
 
Tjerk Wolterink

Martin said:
If the DOCTYPE declaration says the root element is xc:content then you
should have that but you have xc:xcontent so one needs to be changed.

You're right, that is a typing error. The XML reader does not complain, so I did not notice it.

Martin said:
Could be done easier and more efficient:

<xsl:template match="xhtml:*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>

where the prefix xhtml is bound to the namespace URI for XHTML earlier
in the document.

That is a solution, but they both work.

Martin said:
What XSLT processor are you using, how exactly do you run the
transformation?

I'm using Sablotron for XSL transformations.
But I think the problem is more complex than I thought.
 
Tjerk Wolterink

[cut]

Well,

the example I gave you was a bad one.
The problem I have does not occur in my examples.

Here is an example where the problem does occur.

I have an XML document:
---
<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE xc:xcontent [
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
<xc:text type="html">
<p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,
</p> </xc:text>
</xc:xcontent>
---


And when I combine this with this XSL document:


-- style.xsl ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

<xsl:output method="xml" indent="yes"/>

<!--
! All html should remain html
!-->
<xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:copy/>
</xsl:for-each>
<xsl:apply-templates select="./node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="/xc:xcontent">
<page:page type="module">
<xsl:apply-templates select="xc:text"/>
</page:page>
</xsl:template>

</xsl:stylesheet>
---


Then the output will be:

--
<page:page type="module">
<p>
Caf de Kletskop is gevestigd in een oud lichtenvoords pander,
</p>
</page:page>
--


My &eacute; in the XML is gone in the transformation output.

Sorry that I gave a bad example; now the problem should be clear.

How can I solve this problem?
 
David Carlisle

A non-validating parser is allowed by the XML Recommendation to _not_
fetch external DTD files and just report entity references as undefined.

However, the XPath data model does not support undefined entities, so in this
case I would expect you to get a parsing error on input saying that the
entity reference cannot be resolved. Your system seems to be silently
dropping the entities, which looks like a bug to me.

I can't suggest anything other than raising it with the maintainers.

David
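
One workaround that does not depend on the processor fetching an external DTD is to declare only the entities the document actually uses in the internal subset. The sketch below assumes &eacute; is the only one needed; its replacement text, character 233, is the same value the XHTML Latin-1 entity set assigns to it.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:xcontent [
  <!-- Local declaration: no network access or external DTD required -->
  <!ENTITY eacute "&#233;">
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
  <xc:text type="html">
    <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,</p>
  </xc:text>
</xc:xcontent>
```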
 
Tjerk Wolterink

David said:
a non-validating parser is allowed by the XML Recommendation to _not_
fetch external DTD files and just report entity references as undefined.

However, the XPath model does not support undefined entities, so in this
case I would expect that you get a parsing error on input that the
entity reference cannot be resolved. Your system seems to be silently
dropping the entities, which looks like a bug to me.

Is there no way to match entities in XSL?
What is the default behavior of XSL systems when it comes to entities?

David said:
Can't suggest what you can do other than raise it with maintainers.

Raise it with the maintainers?
You mean report it as a bug?
 
David Carlisle

Tjerk Wolterink said:
Is there no way to match entities in xsl?

No, they are expanded by the XML parser before XSLT starts, so the
input tree has all entities expanded.

Tjerk Wolterink said:
What is the default behavior of xsl systems when it comes to entities?

If the parser expands them, then they are not there as far as XSLT is
concerned; if it doesn't, it's a fatal error and nothing is transformed.

Try a different XSLT engine?

Tjerk Wolterink said:
raise it with maintainers??
You mean to report it as a bug

Yes.

David
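
Because the parser expands entity references before XSLT sees the document, one way to sidestep the DTD altogether is to write numeric character references, which any XML parser resolves without declarations. This is a sketch based on the input posted earlier: &#233; is é, and &#160; is the no-break space that &nbsp; stands for in XHTML.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- No DOCTYPE needed: numeric character references are built into XML -->
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
  <xc:text type="html">
    <!-- &#233; replaces &eacute;; &#160; would replace &nbsp; -->
    <p>Caf&#233; de Kletskop is gevestigd in een oud lichtenvoords pander,</p>
  </xc:text>
</xc:xcontent>
```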
 
Tjerk Wolterink

I could set the following option of the XSLT processor:

XSLT_SABOPT_PARSE_PUBLIC_ENTITIES = on
Tell the processor to parse public entities. By default this has been turned off.

But now when I do the following XSL transformation:

xml:
--

<?xml version="1.0" encoding="ISO-8859-1"?>
<page:page xmlns:page="http://www.wolterinkwebdesign.com/xml/page">
<page:content>

<page:module
module="agenda"
stylesheet="agenda.xsl">

<page:multiple-settings multiple="agendapunt" max="30" order-by="datum" direction="desc"/>
</page:module>
</page:content>
</page:page>

--


xsl:
--
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
]>

<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
xmlns:menu="http://www.wolterinkwebdesign.com/xml/menu"
xmlns:r="http://www.wolterinkwebdesign.com/xml/roles">


[rest does not matter]

</xsl:stylesheet>
--


Now I get the following error:

["msgtype"]=> string(5) "error"
["code"]=> string(1) "2"
["module"]=> string(9) "Sablotron"
["URI"]=> string(49) "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
["line"]=> string(1) "1"
["msg"]=> string(51) "XML parser error 4: not well-formed (invalid token)"


So the DTD on w3.org is not valid?
How can I solve this?

What I want is that XHTML entities like &nbsp; are converted to a numeric entity like &#209;
(I don't know if 209 = nbsp, but you know what I mean).

What should I do?
 
David Carlisle

Tjerk said:
So the DTD on w3.org is not valid??

I just tested the file you posted with rxp and it reported it as being
well formed.

Tjerk said:
How can I solve this?

Report it as a bug to the parser maintainers?

You don't need to load the whole XHTML DTD, just the entity definitions,
e.g. the DTD you quoted uses
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;


So use
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
for Latin-1, for example. You might like to try just loading those, or
the versions of the entity files you will find at
http://www.w3.org/2003/entities, which I personally prefer (being
biased :) ), instead of loading the XHTML DTD.


But as I said at the beginning, not using a <!DOCTYPE and not using
entity references in your stylesheet really will make your life simpler.

At the very least you ought to make local copies of the files and
reference those. Referencing the W3C site to download the XHTML DTD
every time you do a transformation is going to slow your transformation
down dramatically.

David
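
A sketch of the local-copy approach, assuming xhtml-lat1.ent has been downloaded from the URL above into the same directory as the document (the relative system identifier below reflects that assumption). Whether Sablotron actually loads it may still depend on the XSLT_SABOPT_PARSE_PUBLIC_ENTITIES option mentioned earlier in the thread.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:xcontent [
  <!-- Load only the Latin-1 entity set, from a local file rather than w3.org -->
  <!ENTITY % HTMLlat1 PUBLIC
    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "xhtml-lat1.ent">
  %HTMLlat1;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
  <xc:text type="html">
    <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,</p>
  </xc:text>
</xc:xcontent>
```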
 
Tjerk Wolterink

I think I solved the problem: the XSL engine was not able to load DTDs from other servers.

David,
thanks for your help!
 
