mike said:
(2) I don't know precisely how an HTML document is translated into
an XHTML file by your words.
*groan* That's what I get for trying to be subtle.
Do I mistake your sayings?
Yes. I wrote:
That is no problem at all: translate every input document to this:
...........................................
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
<p>This used to be an HTML document, now it's valid XHTML!</p>
</body>
</html>
............................................
And I really meant "translate every input document to EXACTLY THE
TEXT BETWEEN THE DOTTED LINES". That fits your requirements, because
you have only specified that the output has to be valid XHTML, not
that it must have anything whatsoever to do with the input
document. That's exactly why I (and others) told you that you need
to rethink your requirements.
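To make the point concrete, here is a minimal Python sketch of such a "translator". The names `translate`, `is_well_formed_xhtml` and `CONSTANT_XHTML` are hypothetical, made up for this illustration. Note that the standard library's ElementTree only checks XML well-formedness (plus, here, the root element's namespace); it does not validate against the XHTML 1.0 Strict DTD, so a real validator would be needed to back up the full "valid XHTML" claim. Every input, however broken, yields the same acceptable output:

```python
import xml.etree.ElementTree as ET

# Essentially the fixed document from the post; the XML declaration must be
# the very first thing in the string, so it starts right after the quotes.
CONSTANT_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
<p>This used to be an HTML document, now it's valid XHTML!</p>
</body>
</html>
"""


def translate(html_input: str) -> str:
    # Hypothetical converter that deliberately ignores its argument: the
    # stated requirement only says "the output must be valid XHTML".
    return CONSTANT_XHTML


def is_well_formed_xhtml(text: str) -> bool:
    """Well-formedness and root-namespace check only; no DTD validation."""
    try:
        root = ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        return False
    return root.tag == "{http://www.w3.org/1999/xhtml}html"


for page in ["<P>Hello<BR>world", "<table><tr><td>unclosed", ""]:
    print(is_well_formed_xhtml(translate(page)))  # True every time
```

The check passes no matter what goes in, which is exactly why "output must be valid XHTML" is not, on its own, a useful requirement.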
And that's not just snide hairsplitting. Presumably you want the
output document to be rendered exactly as the input document would
be. But that is practically impossible when the input is invalid
HTML (as many, if not most, HTML pages found on the WWW are),
because rendering it involves guesswork by the browser. That
guesswork differs a lot between browsers, and exactly how it is done
is not known, at least in the case of the most popular browser,
Internet Explorer.
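A toy illustration of why that guesswork matters: the sketch below re-serialises the same misnested markup with two hypothetical error-recovery rules, `pop_to_match` and `ignore_mismatch`. Both are invented for this example and are not what any particular browser actually does. The two rules disagree about whether the trailing "y" ends up bold and italic or plain:

```python
from html.parser import HTMLParser


class RepairingParser(HTMLParser):
    """Re-serialises tag soup using one of two hypothetical recovery rules."""

    def __init__(self, strategy):
        super().__init__()
        self.strategy = strategy  # "pop_to_match" or "ignore_mismatch"
        self.stack = []           # currently open elements, innermost last
        self.out = []             # serialised output pieces

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if self.strategy == "pop_to_match":
            # Close every element opened after the matching start tag.
            if tag in self.stack:
                while True:
                    open_tag = self.stack.pop()
                    self.out.append(f"</{open_tag}>")
                    if open_tag == tag:
                        break
        else:
            # Only honour an end tag that matches the innermost open element.
            if self.stack and self.stack[-1] == tag:
                self.out.append(f"</{self.stack.pop()}>")

    def handle_data(self, data):
        self.out.append(data)

    def result(self):
        self.close()
        # Close whatever is still open at the end of the input.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def repair(html, strategy):
    parser = RepairingParser(strategy)
    parser.feed(html)
    return parser.result()


soup = "<b><i>x</b>y</i>"
print(repair(soup, "pop_to_match"))     # <b><i>x</i></b>y
print(repair(soup, "ignore_mismatch"))  # <b><i>xy</i></b>
```

Real browsers use far more elaborate (and, in IE's case, undocumented) heuristics, but the effect is the same: invalid input does not determine a single "correct" rendering, so no converter can promise to reproduce it.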
This is exactly why jtidy will not translate some HTML pages, as you
have noticed.
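One way a converter can avoid guessing is simply to refuse input it cannot repair unambiguously. The sketch below is a hypothetical strict checker, not JTidy's actual logic (JTidy repairs many errors on its own): it rejects any end tag that does not close the innermost open element. It also treats unclosed non-void elements as unconvertible, which is stricter than HTML itself, since HTML allows some end tags such as `</p>` to be omitted. The names `StrictChecker`, `AmbiguousMarkup` and `convertible` are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class AmbiguousMarkup(Exception):
    """Raised when an end tag does not close the innermost open element."""


class StrictChecker(HTMLParser):
    """Hypothetical strict policy: refuse rather than guess a repair."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open non-void elements

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # <br/> and stray </br> carry no nesting information
        if not self.stack or self.stack[-1] != tag:
            line, _ = self.getpos()
            raise AmbiguousMarkup(f"unexpected </{tag}> on line {line}")
        self.stack.pop()


def convertible(html: str) -> bool:
    checker = StrictChecker()
    try:
        checker.feed(html)
        checker.close()
    except AmbiguousMarkup:
        return False
    # Elements still open at end of input would also need a guessed repair.
    return not checker.stack


print(convertible("<p>Hello<br>world</p>"))  # True
print(convertible("<b><i>x</b>y</i>"))       # False
```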