Are XML-style "/>" tags valid in 4.01 Transitional? I get weird answers from validators.

Hostile17

Consider the following HTML.

----------

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<title>Untitled</title>
<link rel="Stylesheet" href="mystylesheet.css" />
</head><body>
<img src="myimage.gif" alt="my image" width="100" height="100" />
</body>
</html>

----------

There's a guy at work who insists on coding like this. Or rather,
*some* of his images have a trailing slash, not others, and his link
tags, as above, have them, but not his meta-tags.

I don't know where he picked up the habit, but he says it's valid 4.01
Transitional, and it's also "best practice".

If it is, why aren't all his single tags like this? But let's move on.

In an attempt to get an official answer on this, I validate the above
at the W3C.

I get this result:

----------

Line 7, column 6: end tag for element "HEAD" which is not open
(explain...).

Line 8, column 5: document type does not allow element "BODY" here
(explain...).

----------

If I validate it with BBEdit's built-in Check Syntax, it gives me this:

----------

File "xmlstyle.html"; Line 7: Document type doesn't permit empty XML
element; "<link/>".
File "xmlstyle.html"; Line 10: Document type doesn't permit empty XML
element; "<img/>".

----------

When I show these results to the coder, he says the W3C is complaining
about "HEAD and BODY tags on the same line" which is laughable, but
he's got a point when he says "BBEdit says the tags are empty. They're
not empty."

I could go on trying other validators, but I'm not happy with the
results of these.

Validated as 4.01 Strict, by the way, these tags are definitely
errors. "Character Data Is Not Allowed Here" with an arrow pointing to
the end of the tag.

What I want is someone to give me a definitive response, backed up by
a link to a reputable website, where it gives an answer either way. I
think I'm right but I can't cite anything.

To me coding this way is something like the HTML equivalent of wearing
a baseball cap backwards...
 
rf

Hostile17 said:
Consider the following HTML.

----------

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<title>Untitled</title>
<link rel="Stylesheet" href="mystylesheet.css" />
</head><body>
<img src="myimage.gif" alt="my image" width="100" height="100" />
</body>
</html>

----------

There's a guy at work who insists on coding like this. Or rather,
*some* of his images have a trailing slash, not others, and his link
tags, as above, have them, but not his meta-tags.

A / at the end of a tag is not valid with any 4.01 DTD. Your guy is wrong.

Go to the spec - http://www.w3.org/TR/html4/ - and have a look.

It appears to "work" because the browser's error correction kicks in and
throws away the invalid '/' attribute.

Hostile17 said:
I don't know where he picked up the habit, but he says it's valid 4.01
Transitional, and it's also "best practice".

No, it is not. XHTML requires that empty elements be closed, most easily
by putting the closing / in the opening tag. This is actually a
requirement of XML. It has nothing to do with HTML.
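
To list which tags in a page use the XML-style closing, a rough check can be scripted. This is only a sketch, assuming Python's standard html.parser module, which reports XHTML-style empty tags through handle_startendtag. The names SelfClosingFinder, find_self_closing and SAMPLE (the document from the question) are made up for illustration, and it is no substitute for a real validator.

```python
from html.parser import HTMLParser


class SelfClosingFinder(HTMLParser):
    """Collect (line, tag) for every XML-style '<tag ... />' in the input."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_startendtag(self, tag, attrs):
        # getpos() gives the (line, column) where the current tag starts.
        line, _col = self.getpos()
        self.found.append((line, tag))


def find_self_closing(html_text):
    """Return a list of (line number, tag name) for each '/>' style tag."""
    finder = SelfClosingFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


# Hypothetical sample: the document from the original question.
SAMPLE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<title>Untitled</title>
<link rel="Stylesheet" href="mystylesheet.css" />
</head><body>
<img src="myimage.gif" alt="my image" width="100" height="100" />
</body>
</html>
"""

if __name__ == "__main__":
    for line, tag in find_self_closing(SAMPLE):
        print(f"line {line}: <{tag} ... /> uses XML-style closing")
```

The meta element is not reported because it has no trailing slash, which matches the inconsistency described in the question.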

Hostile17 said:
If it is, why aren't all his single tags like this? But let's move on.

Because he is only wrong some of the time :)

Hostile17 said:
In an attempt to get an official answer on this, I validate the above
at the W3C.

I get this result:

----------

Line 7, column 6: end tag for element "HEAD" which is not open
(explain...).

Line 8, column 5: document type does not allow element "BODY" here
(explain...).

The validator's error recovery is not quite as good as the browsers'. It is
misinterpreting /> as something else and getting screwed up a bit further
down.

Probably it is interpreting something in the head as body text. This is
quite permissible; a UA should implicitly close the head element and open a
body element. You get exactly the same effect if you use something like
<p>text</p> inside your head element.

So, when the validator gets to the </head> tag it raises an error. The head
element has already been closed. When it gets to the <body> tag it raises an
error. The body element has already been opened. You can not have nested
body elements.
----------

If I validate it with BBEdit's built-in Check Syntax, it gives me this:

----------

File "xmlstyle.html"; Line 7: Document type doesn't permit empty XML
element; "<link/>".
File "xmlstyle.html"; Line 10: Document type doesn't permit empty XML
element; "<img/>".

Very true. See above.

Hostile17 said:
[...] but
he's got a point when he says "BBEdit says the tags are empty. They're
not empty."

Look again. It does not say empty tag, it says empty element. The image and
link elements are indeed empty elements, they have no content like that
title element up there does. All the goodies with a link or image element
happen in the opening tag.

With HTML, empty elements do not have a closing tag. So, if you attempt to
close the element in the opening tag, BBEdit, correctly, gets upset. It would
also get upset if you said something like <img...>description of
image</img>.

Hostile17 said:
I could go on trying other validators, but I'm not happy with the
results of these.

Don't bother. Use the specs.

Hostile17 said:
Validated as 4.01 Strict, by the way, these tags are definitely
errors. "Character Data Is Not Allowed Here" with an arrow pointing to
the end of the tag.

Correct. Transitional allows invalid attributes (the /). Strict does not.
You want a new attribute? Add it to the DTD.

Hostile17 said:
What I want is someone to give me a definitive response, backed up by
a link to a reputable website, where it gives an answer either way. I
think I'm right but I can't cite anything.

As I said, go to the spec above. Also look up the XHTML spec. There is a
good description of what is new/different with XHTML and the above is
specifically mentioned.

Cheers
Richard.
 
Nick Theodorakis

Consider the following HTML.

----------

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<title>Untitled</title>
<link rel="Stylesheet" href="mystylesheet.css" />
</head><body>
<img src="myimage.gif" alt="my image" width="100" height="100" />
</body>
</html>

----------

There's a guy at work who insists on coding like this. [...]

What I want is someone to give me a definitive response, backed up by
a link to a reputable website, where it gives an answer either way. I
think I'm right but I can't cite anything.

Jukka has a nice article that you may be interested in:

<http://www.cs.tut.fi/~jkorpela/html/empty.html>

Nick
 
Toby A Inkster

Hostile17 said:
I don't know where he picked up the habit, but he says it's valid 4.01
Transitional, and it's also "best practice".

Technically it is sometimes valid HTML 4.01 Transitional, although it
doesn't necessarily mean what he thinks it means.

For example "<hr />" in HTML 4.01 Transitional *technically* means a
horizontal line followed by ">".

This is thus valid within the body of a document. However, within the head
it is not, as you would have a ">" in the head of the document, and you
aren't allowed to start putting characters there -- only metas, links,
styles, scripts, titles, etc.

While technically valid, it is certainly not best practice in HTML.
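
To make the SGML reading concrete, here is a minimal sketch that rewrites each XML-style empty tag into the explicit form described above: the tag itself followed by a literal ">" as character data (written as &gt; so it cannot be mistaken for markup). The function name sgml_reading and the regular expression are my own illustration, assuming simple, well-quoted attributes; a real SGML parser does far more than this.

```python
import re

# One attribute: a name, optionally followed by = and a quoted or bare value.
ATTR = r'''\s+[^\s"'<>/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>/]+))?'''

# An XML-style empty tag such as <hr /> or <img src="x" />.
XML_EMPTY_TAG = re.compile(r'<([A-Za-z][A-Za-z0-9]*)((?:' + ATTR + r')*)\s*/>')


def sgml_reading(markup: str) -> str:
    """Show how '<tag ... />' reads under HTML 4.01's SGML rules.

    The '/' ends the tag, so the '>' that follows is ordinary text.
    """
    return XML_EMPTY_TAG.sub(
        lambda m: f"<{m.group(1)}{m.group(2)}>&gt;", markup
    )


if __name__ == "__main__":
    print(sgml_reading("<hr />"))
    print(sgml_reading('<link rel="Stylesheet" href="mystylesheet.css" />'))
```

In the body the leftover text is at worst a stray character on the page; in the head there is nowhere legal to put it, which is where the validator errors come from.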
 
Nick Kew

one of infinite monkeys said:
(chop broadly accurate reply)
A / at the end of a tag is not valid with any 4.01 DTD. Your guy is wrong.

Technically that's not entirely right, for reasons below. For practical
purposes you are right.

rf said:
The validator's error recovery is not quite as good as the browsers'. It is
misinterpreting /> as something else and getting screwed up a bit further
down.

Nope. The "/>" isn't an error under strict SGML rules, although the second
character (">") may be. If it happens in the HEAD then an artifact of
the HTML/Legacy DTD causes it to close the HEAD and open the BODY.
This kind of ambiguity is just one of many reasons to use HTML strict.

rf said:
Probably it is interpreting something in the head as body text. This is
quite permissible; a UA should implicitly close the head element and open a
body element. You get exactly the same effect if you use something like
<p>text</p> inside your head element.

It's not just permissible; it's required (though under HTML strict this
is not the case - your <p> would indeed terminate <HEAD> but the "/>"
is just an error).

To see what's going on, use Page Valet and select "visual" mode:
it will display your HTML normalised.

To get more useful error messages, select a more helpful parse mode.
The default selections in either Page Valet or the WDG Validator
will do this.

rf said:
Don't bother. Use the specs.

Careful! That's more complex than you realise.

Yes, but

rf said:
Transitional allows invalid attributes (the /). Strict does not.
You want a new attribute? Add it to the DTD.

Totally wrong. Transitional doesn't allow invalid attributes, and /
is not an attribute - it's an abbreviated way to close the tag.

Some of us regard this as a bug in the HTML spec. But it's the
validator's job to implement the spec, warts and all. That's why
other validators (Page Valet and WDG Validator) offer users the
choice of parse modes (WDG calls it "warnings").

http://valet.webthing.com/page/parsemode.html
 
Jim Dabell

Hostile17 said:
Consider the following HTML.

----------

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<title>Untitled</title>
<link rel="Stylesheet" href="mystylesheet.css" />
</head><body>
<img src="myimage.gif" alt="my image" width="100" height="100" />
</body>
</html>

----------

There's a guy at work who insists on coding like this. Or rather,
*some* of his images have a trailing slash, not others, and his link
tags, as above, have them, but not his meta-tags.

It sounds like cargo-cult behaviour. Can he explain why he does it?

I don't know where he picked up the habit, but he says it's valid 4.01
Transitional, and it's also "best practice".

The example you gave is invalid, but the practice of XML-style empty
elements _may_ result in a valid document, although only by coincidence,
and it will not mean what he thinks it means. If he's concerned with "best
practice", then he should be far more worried about using the Transitional
document type.

The real problem, though, is that he's shifted the burden of proof onto you,
when it's _him_ that needs to justify his position (as you can clearly
demonstrate error messages, even if you don't understand them).


[snip]
In an attempt to get an official answer on this, I validate the above
at the W3C.

I get this result: [snip errors]
If I validate it with BBEdit's built-in Check Syntax, it gives me this: [snip]
When I show these results to the coder, he says the W3C is complaining
about "HEAD and BODY tags on the same line" which is laughable,

He has completely misunderstood the error message. Apart from anything
else, it's trivial to prove him wrong by simply putting them on separate
lines.

but he's got a point when he says "BBEdit says the tags are empty. They're
not empty."

BBEdit says that it doesn't permit empty XML elements. You are mixing up
tags and elements. <img> elements, for example, are always empty, although
the tags are not. Empty elements have no content, although they usually
have attributes.

I could go on trying other validators, but I'm not happy with the
results of these.

They are working properly. Let's walk through the code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">

This is an HTML 4.01 Transitional document.

<html>

Opens the <html> element.

<head>

Opens the <head> element, within the <html> element.

<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">

Opens and closes a <meta> element, which is within the <head> element. This
is an empty element.

<title>Untitled</title>

Opens the <title> element, within the <head> element, which contains the
character data 'Untitled', and closes it.

<link rel="Stylesheet" href="mystylesheet.css" />

Opens a <link> element. This is where the problem lies. An obscure part of
HTML allows authors to use shorthand syntax to write elements. For
example, the following are equivalent:

<em>...</em>
<em/.../

No browser (that I know of) implements this part of HTML, only the
validator, which is why you haven't heard of it. So, going back to the
code:
<link rel="Stylesheet" href="mystylesheet.css" />

This is actually equivalent to:

<link rel="Stylesheet" href="mystylesheet.css">&gt;

Now, <link> elements are always empty, so the solidus (slash) ends both the
tag and the element, and the ">" that follows is plain character data. To
find a home for it, the parser jumps back out one layer, to the <head>
element. The <head> element cannot contain character data, so the parser
jumps out another level (the closing tag for the <head> element is
optional), to the <html> element. The <html> element cannot contain
character data either, _but_, another little-known corner of HTML states
that the <body> element can be implied; that is to say you don't need to
explicitly start it with an opening tag. So, going back to the code (again):
<link rel="Stylesheet" href="mystylesheet.css" />

This opens a <link> element, which is empty, closes the <head> element and
opens a <body> element containing the character data ">".

</head><body>

Obviously, at this point, there is no <head> element to close, and you can
only have one <body> element per document, which is already open at this
point. This is the error.
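
The walkthrough above can be mimicked with a toy model. This is only a sketch under loud assumptions: the event lists are written out by hand rather than parsed, the function name walk and the variable names are invented, and it models just the implied </head> and <body> rule, nothing else in the DTD. The error strings are copied from the W3C validator output quoted in the question.

```python
def walk(events):
    """Toy model of the implied </head> / <body> behaviour described above."""
    head_open = False
    body_open = False
    in_title = False
    log = []
    for kind, value in events:
        if kind == "start" and value == "head":
            head_open = True
        elif kind == "start" and value == "title":
            in_title = True
        elif kind == "end" and value == "title":
            in_title = False
        elif kind == "text" and head_open and not in_title and value.strip():
            # Character data is not allowed directly in <head>, so the
            # parser infers </head> and <body> to give the text a home.
            head_open = False
            body_open = True
            log.append("implied </head> and <body>")
        elif kind == "end" and value == "head":
            if head_open:
                head_open = False
            else:
                log.append('error: end tag for element "HEAD" which is not open')
        elif kind == "start" and value == "body":
            if body_open:
                log.append('error: document type does not allow element "BODY" here')
            else:
                body_open = True
    return log


# Hand-written event streams (hypothetical, not produced by a real parser).
HEAD_START = [
    ("start", "html"), ("start", "head"), ("start", "meta"),
    ("start", "title"), ("text", "Untitled"), ("end", "title"),
    ("start", "link"),
]
REST = [("end", "head"), ("start", "body"), ("start", "img")]

# With "/>", each empty tag is followed by a stray ">" as character data.
AS_WRITTEN = HEAD_START + [("text", ">")] + REST + [("text", ">")]
# Without the slashes there is no stray text at all.
WITHOUT_SLASHES = HEAD_START + REST

if __name__ == "__main__":
    for entry in walk(AS_WRITTEN):
        print(entry)
```

Running it on AS_WRITTEN logs the implied tags followed by the same two complaints the validator made; WITHOUT_SLASHES produces an empty log.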


[snip]
Validated as 4.01 Strict, by the way, these tags are definitely
errors. "Character Data Is Not Allowed Here" with an arrow pointing to
the end of the tag.

This is actually a different error. HTML 4.01 Strict documents do not allow
character data directly within <body> elements; it needs to be contained
within another element, such as a <p> element. The following is not valid
HTML 4.01 Strict:

....
<body>
Britney

The following is valid HTML 4.01 Strict:

....
<body>
<p>
Britney

What I want is someone to give me a definitive response, backed up by
a link to a reputable website, where it gives an answer either way.
[snip]

The only definitive source is the specification; unfortunately, it isn't
trivial to point to a single part to say "this is wrong".

<URL:http://www.w3.org/TR/html401/>

However, if you can make do with a less definitive, but pretty clear
statement from the validator guys, have a look at the validator FAQ:

<URL:http://validator.w3.org/docs/help.html#faq-linkandmeta>
 
