obscure problem using elementtree to make xhtml website

Lee · Sep 3, 2009

ElementTree (the Python XML library) will transform markup like

<tag boo="baa"></tag>

into

<tag boo="baa" />

which is a reasonable thing to do for XML (called minimization, I
think).

But this caused an obscure problem when I used it to create the XHTML
parts of my website: Internet Explorer displayed nearly blank pages.
I explain the details at

http://lee-phillips.org/scripttag/

and am writing here as a heads-up to anyone who might be using a
workflow similar to mine: writing documents in XML, using Python and
ElementTree to transform them into XHTML webpages, and using the
standard kludge of serving them as text/html to IE to get around the
latter's inability to handle XML. I can't be the only one (and I doubt
this problem is confined to ElementTree).

Lee Phillips
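
The collapsing described above is easy to reproduce. What follows is a minimal sketch, assuming a current Python 3 in which `xml.etree.ElementTree.tostring` accepts the `short_empty_elements` keyword (added in Python 3.4, so not available when this thread was written). The tag and attribute names are the ones from the post.

```python
import xml.etree.ElementTree as ET

# Parse the expanded form from the post.
elem = ET.fromstring('<tag boo="baa"></tag>')

# Default serialization collapses the empty element.
collapsed = ET.tostring(elem, encoding="unicode")

# short_empty_elements=False keeps an explicit end tag.
expanded = ET.tostring(elem, encoding="unicode", short_empty_elements=False)

print(collapsed)  # <tag boo="baa" />
print(expanded)   # <tag boo="baa"></tag>
```

Note that the flag applies to every empty element in the tree, so a `<br />` would also come out as `<br></br>`.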

David Smith · Sep 3, 2009

It's not just ElementTree that does this. I've seen other libraries
(admittedly in other languages I won't mention here) transform empty
tags to the self-terminating form. A whitespace text node or comment
node in between *should* prevent that from happening. AFAIK, the only
tag in IE XHTML that really doesn't like to be reduced like that is the
<script> tag. Firefox seems to be fine w/ self-terminating <script />
tags. At any rate, I tend to put a comment node in between the begin
and end to prevent the reduction:

<script src=" ... " type="text/javascript"><!-- --></script>

--David
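
Both workarounds mentioned here (a comment node or a whitespace text node) can be applied through the ElementTree API. This is only a sketch under assumptions: `app.js` is a placeholder path, and the element is built in code rather than parsed from a source file.

```python
import xml.etree.ElementTree as ET

# "app.js" is a placeholder path for illustration only.
attrs = {"src": "app.js", "type": "text/javascript"}

# Variant 1: a comment child, so the element is no longer empty.
script = ET.Element("script", attrs)
script.append(ET.Comment(" "))
with_comment = ET.tostring(script, encoding="unicode")
print(with_comment)
# <script src="app.js" type="text/javascript"><!-- --></script>

# Variant 2: a single space as text content.
spaced = ET.Element("script", attrs)
spaced.text = " "
with_space = ET.tostring(spaced, encoding="unicode")
print(with_space)
# <script src="app.js" type="text/javascript"> </script>
```

One caveat: ElementTree's default parser drops comments found in the source document, so a comment written by hand in the XML input will not survive a parse-and-serialize round trip. A whitespace text node does survive.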

Lee · Sep 3, 2009

I went with a space, but a comment is a better idea.

I only mention the <script> tag in my article, for brevity, but I had
the same problem with the <object> tag. Basically, any tag that can
have content in HTML had better be closed the HTML way (<tag></tag>),
or IE will see it as unclosed and will not display the rest of the
page after the tag (or will do something else unexpected). This is not
a bug in IE (this time), which is correctly parsing the file as HTML.

Lee

Stefan Behnel · Sep 3, 2009

Lee said:
Basically, any tag that can
have content in HTML had better be closed the HTML way (<tag></tag>),
or IE will see it as unclosed and will not display the rest of the
page after the tag (or will do something else unexpected). This is not
a bug in IE (this time), which is correctly parsing the file as HTML.

... which is obviously not the correct thing to do when it's XHTML.

Stefan

Rami Chowdhury · Sep 3, 2009

Stefan said:
... which is obviously not the correct thing to do when it's XHTML.

Not correct, of course, but AFAIK it's a very common hack indeed.

If the goal is to produce XHTML that will work as text/html, have you
considered using one of the myriad templating libraries? IIRC a lot (if
not most) of them support "HTMLish" output for precisely that reason.
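
Short of a templating library, ElementTree itself has an HTML output mode in current Python versions. A minimal sketch, assuming un-namespaced element names and a placeholder `app.js`: `method="html"` writes explicit end tags for elements that may have content, and no end tag at all for void elements such as `br`.

```python
import xml.etree.ElementTree as ET

# "app.js" is a placeholder path for illustration only.
root = ET.fromstring('<div><br/><script src="app.js"/></div>')

as_xml = ET.tostring(root, encoding="unicode")
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)   # <div><br /><script src="app.js" /></div>
print(as_html)  # <div><br><script src="app.js"></script></div>
```

The HTML form is no longer well-formed XML (`<br>` has no end tag), so it is only suitable for the pages served as text/html.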

Richard Brodie · Sep 4, 2009

Stefan said:
... which is obviously not the correct thing to do when it's XHTML.

It isn't though; it's HTML with an XHTML DOCTYPE, and the
compatibility rules in Appendix C of the XHTML recommendation apply.
http://www.w3.org/TR/xhtml1/#C_3

Stefan Behnel · Sep 4, 2009

Richard said:
It isn't though; it's HTML with an XHTML DOCTYPE

Not the page I look at (i.e. the link provided by the OP). It clearly has
an XHTML namespace, so it's X(HT)ML, not HTML.

Stefan

Nobody · Sep 4, 2009

Stefan said:
Not the page I look at (i.e. the link provided by the OP). It clearly has
an XHTML namespace, so it's X(HT)ML, not HTML.

It depends upon your User-Agent header.

By default, it returns a Content-Type of application/xhtml+xml, so it
should be parsed as XML, i.e. <script /> should be treated as
<script></script>.

But if the User-Agent header indicates MSIE, it returns a Content-Type of
text/html, which should be parsed as HTML, where <script /> won't work.
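
The server-side switch described above can be sketched roughly as follows. The helper name `content_type_for` and the plain substring test are assumptions for illustration; the actual rule used by the site is not shown in this thread.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"

def content_type_for(user_agent):
    """Pick a Content-Type from the User-Agent string (hypothetical helper)."""
    # Crude substring test, assumed for this sketch; a real server may use other rules.
    if "MSIE" in (user_agent or ""):
        return HTML_TYPE
    return XHTML_TYPE

print(content_type_for("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))
print(content_type_for("Mozilla/5.0 (X11; Linux) Firefox/3.5"))
```

The same bytes are sent in both cases; only the declared type changes, and with it the parser the browser uses.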

XHTML can be either HTML or XML, and it makes a difference as to whether
you parse it as HTML or XML. If you want to create a document which parses
the same way in either case, you must adhere to the compatibility
rules in Appendix C of the XHTML standard, which means (amongst other
things) not minimising tags which can have content (i.e. not EMPTY),
regardless of whether or not they do have content.
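
One way to follow that rule with ElementTree is sketched below. It is not a complete Appendix C implementation: it assumes un-namespaced element names and Python 3.4 or later, the function name `serialize_appendix_c` is made up for this example, and the regex post-processing is deliberately crude (it would also rewrite matching text inside a comment).

```python
import re
import xml.etree.ElementTree as ET

# Elements declared EMPTY in the XHTML 1.0 DTDs (Strict, Transitional, Frameset).
EMPTY_ELEMENTS = (
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
)

# Matches an expanded EMPTY element such as <br></br> or <img src="x"></img>.
_EXPANDED_EMPTY = re.compile(
    r"<(%s)(\s[^<>]*)?></\1>" % "|".join(EMPTY_ELEMENTS)
)

def serialize_appendix_c(root):
    """Serialize so that only EMPTY elements use the <tag /> form."""
    # Step 1: write every element with an explicit end tag.
    expanded = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    # Step 2: collapse only the elements that HTML also treats as empty.
    return _EXPANDED_EMPTY.sub(
        lambda m: "<%s%s />" % (m.group(1), m.group(2) or ""), expanded
    )

# "app.js" is a placeholder path for illustration only.
root = ET.fromstring('<div><br/><script src="app.js"/><p>hi</p></div>')
print(serialize_appendix_c(root))
# <div><br /><script src="app.js"></script><p>hi</p></div>
```

The output parses the same way as XML and as HTML: `script` keeps its end tag, while `br` keeps the minimized form with a space before the slash.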
