Root element specified by DTD?

Joe Kesselman

Jukka said:
In future, please quote or paraphrase the message that you are
commenting on.

I usually do. Apologies.
It depends on. There's no law that requires additional rules

Granted. It's rare that there aren't any, in my experience, unless the
document type is pure structure.

What's "higher-level" here?

Higher than the basic XML syntax.
Anyway, in the issue discussed in this
thread, it is the additional _syntactic_ constraints that imply that a
certain kind of document is not an HTML document.

That's what I was agreeing with, though apparently I may have phrased it
badly. The DTD is not always a completely constrained specification of
"a kind of document". That flexibility may in fact have been deliberate;
I strongly suspect the intent was that a single DTD could describe
several documents which share related structures.
(Whether HTML specifications make such a
requirement is debatable; the prose in the specs is a mixture of
normative-looking prose, comments, hints, wishful thinking, etc.)

http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.1

The complicating factor here is the use of the word "should". The HTML4
spec predates the W3C's adoption of the normative use of MAY, SHOULD,
and MUST to mean "optional", "don't violate this without extremely good
reason", and "required by the spec" respectively. So we need to
cross-check that.

XHTML 1.0 does follow that convention, so we can backhandedly check the
intent by looking at that spec. There, a Strictly Conforming Document
must (!) have html as its root element, and this is *NOT* flagged as one
of the differences from HTML4 either in this spec or in the
compatibility guidelines (http://www.w3.org/TR/xhtml1/#guidelines). This
strongly suggests that the W3C intended that HTML4 docs follow this rule.

I agree, that's a less than ideal way to answer this question, but I can
tell you that even folks working on the W3C's specs often have to resort
to that kind of pointer chasing to nail things down.
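The XHTML 1.0 root-element rule is at least easy to test mechanically. Below is a rough Python sketch, assuming the document is well-formed XML that the standard library's xml.dom.minidom can parse (it does not load the external DTD). It checks only the root-element conditions (a DOCTYPE naming html, and an html root in the XHTML namespace), not validity or the other strict-conformance requirements; has_xhtml_root is a hypothetical helper name.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def has_xhtml_root(source: str) -> bool:
    """Check only the root-element conditions for a strictly conforming
    XHTML 1.0 document: a DOCTYPE whose name is html, and an <html> root
    element in the XHTML namespace. Validity against the DTD is NOT checked."""
    doc = minidom.parseString(source)  # namespace-aware; external DTD not loaded
    if doc.doctype is None or doc.doctype.name != "html":
        return False
    root = doc.documentElement
    return root.tagName == "html" and root.namespaceURI == XHTML_NS
```

A document whose DOCTYPE names div and whose root is div fails the first test, however well it matches the element declarations in the DTD.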

If you need a fully official answer... I haven't checked; are any of us
members of the (X)HTML Working Group? If not, I'd suggest dropping a
quick note to (e-mail address removed) and suggesting that it might be good to
have an erratum which clarifies whether this "should" was intended to be
"must" or not. (I checked; there isn't one.)
 
Jukka K. Korpela

VK said:
You have no choice but claim it as "HTML document".

Surely there's the option of being silent? And, in fact, saying that it is
not an HTML document.
It is served from
the server with "Content-Type: text/html",

So what? Serving it as image/gif would not make it a GIF image. The Internet
media type would be incorrectly declared. A Content-Type declaration does
not magically _make_ the data conform to the specification of a specific
media type.
for local files it is
served as the same type by association .html,.htm... --> text/html.

That's a rule that you just made up. Besides, nobody said the filename
suffix is .html or .htm. For all that you can know, it can be .gif or .foo.
So before any DTD you /have/ to explicitly declare what document you
are serving

Nope. Nobody forces you to serve a document on the Internet, or using HTTP
in particular.
 
Jukka K. Korpela

Joe Kesselman said:
http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.1

The complicating factor here is the use of the word "should".

I don't see any "should" in the statement "An HTML 4 document is composed of
three parts:...", which explicitly mentions the <head> element, which by the
DTD must be nonempty. (The <head> and </head> tags are omissible, but the
<title> element is not.)

Besides, reading a bit further, under 7.2 you find
"HTML 4.01 specifies three DTDs, so authors must include one of the
following document type declarations in their documents."
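The 7.2 requirement is concrete enough to check mechanically. A minimal Python sketch, assuming the only thing being tested is the DOCTYPE declaration itself: the document type name must be HTML and the public identifier must be one of the three HTML 4.01 ones (Strict, Transitional, Frameset). It is not a validator, it ignores the system identifier, and uses_html401_doctype is a hypothetical helper name.

```python
import re
from html.parser import HTMLParser

# Public identifiers of the three HTML 4.01 DTDs: Strict, Transitional, Frameset.
HTML401_PUBLIC_IDS = {
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 Frameset//EN",
}

# Matches: DOCTYPE <name> PUBLIC "<public id>" ["<system id>"]
DOCTYPE_RE = re.compile(
    r'^doctype\s+(\S+)\s+public\s+"([^"]*)"(?:\s+"([^"]*)")?\s*$',
    re.IGNORECASE,
)


class DoctypeGrabber(HTMLParser):
    """Records the first <!DOCTYPE ...> declaration in the input."""

    def __init__(self):
        super().__init__()
        self.decl = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE HTML PUBLIC "..."'
        if self.decl is None:
            self.decl = decl


def uses_html401_doctype(source: str) -> bool:
    """True if the document type name is HTML and the public identifier is
    one of the three HTML 4.01 ones. The system identifier is ignored."""
    grabber = DoctypeGrabber()
    grabber.feed(source)
    grabber.close()
    if grabber.decl is None:
        return False
    m = DOCTYPE_RE.match(grabber.decl.strip())
    if m is None:
        return False
    name, public_id = m.group(1), m.group(2)
    # HTML's SGML declaration makes element and doctype names case-insensitive.
    return name.upper() == "HTML" and public_id in HTML401_PUBLIC_IDS
```

A declaration such as <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"> fails this check even though the document may still be valid SGML against the Strict DTD, which is the validity-versus-conformance distinction.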

Regarding the more abstract and vaguer question of what an "HTML
document" is in general, surely any reasonable definition would require
syntactic conformance to _some_ published specification (though not
necessarily one that uses a DTD, for example). The issue was a document that
contains a DOCTYPE declaration referring to an HTML 4.01 DTD, so what HTML
specification could it possibly comply with?
If you need a fully official answer... I haven't checked; are any of
us members of the (X)HTML Working Group? If not, I'd suggest dropping
a quick note to (e-mail address removed) and suggesting that it might be good
to have an erratum which clarifies whether this "should" was intended
to be "must" or not. (I checked; there isn't one.)

They are clearly not interested in doing such things. Look at the errata:
http://www.w3.org/MarkUp/html4-updates/errata
(The absence of any additions since May 2001 does not mean that no errors
have been reported.)
HTML 4.01 is closed for all practical purposes, with all the flaws,
ambiguities, and vagueness.
 
Peter Flynn

Jukka said:
No, it is a valid SGML document, but it is not an HTML document, as
defined in HTML specifications.

Yes, if you need to reference the HTML Spec in addition to the DTD.
That would indicate the validity, but the HTML 4.01 specification
requires that one of three specific DOCTYPE declarations be used - not
just that one of three DTDs be used.

That's why it is unenforceable by a standard parser. Only browsers
implement this requirement, and they are not conforming SGML
applications.
And this isn't one of them.
Moreover, the specification explicitly says:
"After document type declaration, the remainder of an HTML document is
contained by the HTML element."
http://www.w3.org/TR/REC-html40/struct/global.html#h-7.3

I'm not clear why you were asking this question if you already knew
the answer.

///Peter
 
Jukka K. Korpela

Peter Flynn said:
Yes, if you need to reference the HTML Spec in addition to the DTD.

I'm not sure I see what you are saying "Yes" to and what the if statement
relates to. Surely what is or is not an HTML document is to be defined in
HTML specifications, not in a DTD.
That's why it is unenforceable by a standard parser.

Yes, but the question was not whether something can be enforced.
Only browsers
implement this requirement, and they are not conforming SGML
applications.

They surely aren't, but they don't implement the requirement. They simply
started using the presence and exact form of a DOCTYPE declaration to decide
on the "quirks" vs. "standard" mode. They don't reject a document on the
grounds that it lacks a correct DOCTYPE; they simply process it differently.
(OK, you might say that "quirks" mode intentionally deviates from the
standards, but this is really just a difference in degree - the "standards"
mode isn't standard-conforming either. Besides, "quirks" mode largely means
intentionally broken CSS implementation rather than intentionally broken
HTML implementation.)
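The mode switching described above can be illustrated with a sketch of DOCTYPE sniffing. The rules below are a small subset of those later codified in the WHATWG HTML standard's tree-construction algorithm (its "quirks", "limited-quirks" and "no-quirks" modes; limited-quirks is what browsers call "almost standards"). The real list of public identifiers is much longer and individual browsers have used their own lists, so treat this as an illustration, not as what any particular browser does; doctype_mode is a hypothetical name.

```python
import re

DOCTYPE_RE = re.compile(
    r'<!doctype\s+([^\s>]+)'          # document type name
    r'(?:\s+public\s+"([^"]*)")?'     # optional public identifier
    r'(?:\s+"([^"]*)")?',             # optional system identifier
    re.IGNORECASE,
)

HTML401_LOOSE = ("-//w3c//dtd html 4.01 transitional//",
                 "-//w3c//dtd html 4.01 frameset//")
XHTML10_LOOSE = ("-//w3c//dtd xhtml 1.0 transitional//",
                 "-//w3c//dtd xhtml 1.0 frameset//")


def doctype_mode(source: str) -> str:
    """Return 'quirks', 'limited-quirks' or 'no-quirks' using only a few of
    the WHATWG doctype-sniffing rules. Simplification: finds the first
    DOCTYPE anywhere in the text."""
    m = DOCTYPE_RE.search(source)
    if m is None:
        return "quirks"                    # no DOCTYPE at all
    name, public_id, system_id = m.group(1), m.group(2), m.group(3)
    if name.lower() != "html":
        return "quirks"                    # e.g. <!DOCTYPE div PUBLIC ...>
    pub = (public_id or "").lower()        # public ids compare case-insensitively
    if pub.startswith(HTML401_LOOSE):
        # HTML 4.01 Transitional/Frameset: quirks unless a system id is given.
        return "quirks" if system_id is None else "limited-quirks"
    if pub.startswith(XHTML10_LOOSE):
        return "limited-quirks"
    return "no-quirks"                     # e.g. HTML 4.01 Strict, <!DOCTYPE html>
```

Note that nothing here rejects a document: every input gets some mode, which is the point made above.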
I'm not clear why you were asking this question if you already knew
the answer.

I wasn't. It wasn't me who asked the original question. I'm just commenting
on the answers.
 
Lachlan Hunt

Chris said:
It's valid, but is it a valid *HTML* document?

There is a difference between validity and conformance. It is a valid
document, though it is not a conforming HTML document.
 
Joe Kesselman

Jukka said:
HTML 4.01 is closed for all practical purposes, with all the flaws,
ambiguities, and vagueness.

Granted; new effort is going into XHTML. But in my experience, that
doesn't mean you can't get answers about HTML if you ask intelligent
questions.

I don't care enough to pursue it further. If you do, either try to get
an official ruling or live with ambiguity.
 
Andy Dingley

Henri said:
Since you haven't invested in learning DTDs, unless you have a
non-negotiable requirement to use them, I suggest learning RELAX NG
Compact Syntax instead:
http://relaxng.org/compact-tutorial-20030326.html

Thanks to everyone for their contributions to this useful thread.

As for RELAX NG, I've been using it for a couple of years now and
found it an excellent format for human-readable definitions. However,
most of my actual work is with XML Schema, simply because it's the
data-typing layer I use with some OWL work (although RELAX NG is making
inroads there).

This particular job needs to be built around DTDs though, something
which so far I've managed to avoid bothering with.
 
Andy Dingley

Joe said:
Tidy isn't a validator. It's a tool for repairing broken documents.

Agreed. But it's already on my desktop and nsgmls isn't
(or at least is refusing to install and work right thus far).


Anyone care to comment on what Tidy thinks this document _is_?

Now I think we can agree that "<!doctype...><div><p>Foo</p></div>" is
probably a valid HTML fragment, but that it's not correct to serve such
things over the web.

Now what's Tidy trying to interpret it as? As far as I can judge, Tidy
thinks this is _also_ a valid HTML document, albeit one that needs a lot
of implicit content adding to <head> beforehand. Is this at all
justifiable, or is Tidy completely out to lunch here?
 
