encoding of script tags in html

Andy Fish

hi,

this is more an html parsing question than an XML question but I think it's
the kind of thing that folks in an XML newsgroup would be more likely to
help with, so please excuse me if it's a little off topic. please be aware
that I am primarily talking about HTML rather than XHTML but I would also
like to understand how XHTML works for when I prepare to convert the app to
XHTML.

I have recently discovered that this:

<script>var x='</script>';</script>

is not valid HTML - the fact that there is an end script tag in quotes
causes the parser to stop recognising the script. initially my reaction was
that this is not a surprise because I had failed to HTML encode the script
contents, so my second attempt was this:

<script>var x='&lt;/script&gt;';</script>

however, this DOES NOT WORK either - the variable ends up containing the
literal text "&lt;/script&gt;"

can someone point me at the part of the w3c specification that states how
script tags are parsed differently to other tags in HTML?

interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html and seems even to be valid XHTML even though it is not valid
XML

Andy
 
Martin Honnen

Andy said:
can someone point me at part of the w3c specification that states how script
tags are parsed differently to other tags in HTML.

See http://www.w3.org/TR/html4/types.html#type-script:
"Please note that script data that is element content may not contain
character references, but script data that is the value of an attribute
may contain them."
and http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data:
"The DTD defines script and style data to be CDATA for both element
content and attribute values. SGML rules do not allow character
references in CDATA element content but do allow them in CDATA attribute
values."

Andy said:
interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html and seems even to be valid XHTML even though it is not valid
XML

That snippet is not well-formed so it can't be valid XML or XHTML as it
is not even XML.
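
For the XHTML case, where script content is parsed like any other XML element content, the usual approaches are to escape the markup characters or to wrap the code in a CDATA section. The sketch below shows both. The function names are hypothetical, it assumes the page is actually parsed as XML, and the type attribute is left off to match the snippets in this thread.

```javascript
// Hypothetical helpers for placing JavaScript source in an XHTML <script>.

// Option 1: escape the characters XML treats as markup.
function xhtmlEscapedScript(code) {
  const safe = code.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  return "<script>" + safe + "</script>";
}

// Option 2: wrap the code in a CDATA section. "]]>" may not appear inside
// one, so it is split across two sections. The "//" prefixes keep the
// CDATA markers inside JavaScript comments.
function xhtmlCdataScript(code) {
  const safe = code.split("]]>").join("]]]]><![CDATA[>");
  return "<script>//<![CDATA[\n" + safe + "\n//]]></script>";
}

const escapedForm = xhtmlEscapedScript("if (3<5);");
const cdataForm = xhtmlCdataScript("if (3<5);");

console.log(escapedForm);
console.log(cdataForm);
```

The first form is well-formed XML but would not run correctly if the same page were parsed as HTML, since an HTML parser does not decode &lt; inside a script element.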
 
Andy Fish

thanks to both for the quick replies

wow - what a minefield this has turned out to be !!

i previously had just one server-side utility function to escape a string as
a literal javascript string. now i realise i need 2 separate functions, one
for when the javascript literal is placed inside an HTML attribute value
(e.g. onclick="...") and a different one for when it is inside a script
block, because character references are recognised in attribute values but
not in script element content
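
A minimal sketch of what such a pair of functions might look like, written in JavaScript purely for illustration (the thread does not say which server-side language is in use, and all function names here are hypothetical). It assumes JSON.stringify output is acceptable as a JavaScript string literal, and it is not a complete defence against every injection scenario.

```javascript
// Sketch only: function names are hypothetical, not from the thread.

// Build a JavaScript string literal from arbitrary text. U+2028/U+2029 are
// escaped too, because older engines treat them as line terminators.
function jsLiteral(s) {
  return JSON.stringify(s)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

// For a <script> block: character references are not recognised there, so
// "<" is written as a JavaScript escape. This keeps "</script" and "<!--"
// out of the markup entirely.
function jsLiteralForScriptBlock(s) {
  return jsLiteral(s).replace(/</g, "\\u003c");
}

// For an attribute value such as onclick="...": character references are
// decoded before the script runs, so the literal is HTML-escaped.
function jsLiteralForAttribute(s) {
  return jsLiteral(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log("<script>var x=" + jsLiteralForScriptBlock("</script>") + ";</script>");
console.log('<a onclick="alert(' + jsLiteralForAttribute('say "hi" & <bye>') + ')">x</a>');
```

The two functions share the JavaScript-level escaping and differ only in the outer layer, which mirrors the point above: the same literal needs HTML escaping in an attribute and JavaScript-only escaping in a script block.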

Andy
 
Peter Flynn

Martin said:
Andy said:
can someone point me at part of the w3c specification that states how
script tags are parsed differently to other tags in HTML. [...]
interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html and seems even to be valid XHTML even though it is not
valid XML

That snippet is not well-formed so it can't be valid XML or XHTML as it
is not even XML.

It is, however, valid HTML (SGML): the < sign is valid unescaped in
CDATA declared content (and would be valid elsewhere, as the digit
following it cannot be taken to be the beginning of an element type
name).

Andy said:
wow - what a minefield this has turned out to be !!

Don't guess ("seems to be..."). Install a standalone validating parser
that handles both SGML and XML (e.g. onsgmls, part of OpenSP), and copies of
the relevant DTDs; or a schema validator and copies of the schemas,
and test any files you create for validity. A good XML editor will do
this for you anyway.

///Peter
 
