Can HTML be translated to XHTML perfectly?!

mike

regards:

Can HTML be translated to XHTML perfectly?

Or is there another similar tool in Java (such as an API) that can do the work? Thank you.
-----------------------------------------------------------
I use the open-source Jtidy library, but I find that it cannot translate some
HTML pages.

Thank you
 
JScoobyCed

mike said:
regards:

Can HTML be translated to XHTML perfectly?!

Can you define what you mean by "perfectly" in your question?
Maybe a different newsgroup oriented toward XML/HTML would be more appropriate
(I am not sure whether there are any, though ... :) )
 
Tor Iver Wilhelmsen

Can HTML be translated to XHTML perfectly?!

Not necessarily. HTML is more lenient than XHTML, and browsers even
more so. For instance, the EMBED element is an abomination unto W3C,
since it doesn't have a restricted set of possible attributes.

The easiest approach is probably to lowercase all element and attribute names,
change <empty> elements into <empty/>, and hope for the best.
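
The quick fix suggested above can be sketched in Java roughly as follows. This is only an illustration under several assumptions: the class name `NaiveXhtmlFixer` and the list of empty elements are made up for the example, and a regex pass like this does not handle unquoted attribute values that end in a slash, valueless attributes such as `checked`, missing end tags, or markup inside comments and scripts. It really is a "hope for the best" approach.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NaiveXhtmlFixer {
    // Elements that never have content (list assumed for this sketch).
    private static final Set<String> EMPTY_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"));

    // A start or end tag: group 1 is "/" for end tags, group 2 the name,
    // group 3 everything between the name and ">".
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>");

    // An attribute with a value: group 1 is the name, group 2 is "=value".
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([A-Za-z_:][-A-Za-z0-9_:.]*)(\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s\"'>]+))");

    static String fix(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            boolean endTag = !tag.group(1).isEmpty();
            String name = tag.group(2).toLowerCase(Locale.ROOT);
            String rest = tag.group(3).trim();
            boolean selfClosed = rest.endsWith("/");
            if (selfClosed) {
                rest = rest.substring(0, rest.length() - 1).trim();
            }
            StringBuilder fixed = new StringBuilder("<");
            if (endTag) fixed.append('/');
            fixed.append(name);
            if (!rest.isEmpty()) fixed.append(' ').append(lowercaseAttributeNames(rest));
            // Close empty elements the XHTML way: <br> becomes <br />.
            if (selfClosed || (!endTag && EMPTY_ELEMENTS.contains(name))) fixed.append(" /");
            fixed.append('>');
            tag.appendReplacement(out, Matcher.quoteReplacement(fixed.toString()));
        }
        tag.appendTail(out);
        return out.toString();
    }

    // Lowercases attribute names but leaves attribute values untouched.
    private static String lowercaseAttributeNames(String attributes) {
        Matcher attr = ATTRIBUTE.matcher(attributes);
        StringBuffer out = new StringBuffer();
        while (attr.find()) {
            String replacement = attr.group(1).toLowerCase(Locale.ROOT) + attr.group(2);
            attr.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        attr.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <p class="a">Hello<br /><img src="x.gif" alt="x" /></p>
        System.out.println(fix("<P Class=\"a\">Hello<BR><IMG SRC=\"x.gif\" ALT=\"x\"></P>"));
    }
}
```

Note that the result is only as good as the input: the sketch fixes the two syntax differences named above and nothing else, so the output is not guaranteed to be well-formed.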
 
mike

Can you define your definition of "perfectly",


By "perfectly" I mean that the XHTML file (translated from HTML) can be
identified by the Nokia Mobile Browser.

Are there any constructive ideas?


thank you
 
Michael Borgwardt

mike said:
"perfectly", that is:

"XHTML file"(translated from HTML)can be identified by Nokia Mobile Browser.

That is no problem at all: translate every input document to this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
<p>This used to be a HTML document, now it's valid XHTML!</p>
</body>
</html>

Is there any constructive idea?

Think some more about your definition of "perfect translation".
 
Ann

He doesn't really mean perfect, he means it needs to work
on Nokia Mobile Browser, that's all.
 
John C. Bollinger

mike said:
Can HTML be translated to XHTML perfectly?!

_Valid_ HTML can be translated to XHTML without excessive difficulty.
The tool you already have probably does that job quite well. Most of
the HTML you come across on the web is not valid, however, even if you
excuse the absence (in most cases) of a DOCTYPE declaration.

or there is another similar tool in java ( like api )

can do the work?!...thank you

That's not surprising at all. There are some pages that some major
browsers can't translate; which pages they are depends on which browser
you're talking about. When a browser runs into such a page it generally
makes its best guess, which is the most significant cause of
cross-browser compatibility issues. (Different browsers guess
differently, but page authors have a tendency to depend on the browser
always guessing the same way that their favorite browser does.)


John Bollinger
(e-mail address removed)
 
Tim Jowers

Michael Borgwardt said:
That is no problem at all: translate every input document to this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
<p>This used to be a HTML document, now it's valid XHTML!</p>
</body>
</html>


Think some more about your definition of "perfect translation".

I remember using the Java HTML document parser class and then writing
"handlers" for many special cases. I seem to remember having to
subprocess each element block for things like <p> with no </p>, <br>,
et cetera, as these are non-conforming. Fortunately, I was only concerned
with the textarea/input fields/buttons and could just pass
applets/objects through as they were. The project was a tool to pull in
data fields from another app and allow the clients to build a site very
quickly based on their business model in the tool. Pretty cool in many ways:
http://www.speedbuildersystems.com ... look for ScreenGen. They make
analysis tools for the insurance industry, so the concept was to roll
out basic enrollment validation rules as well as model the complex
rules on the mainframe and the web farm.

It'll be cool once many tools can work on the same HTML files. At that
time, last year, even FrontPage could not convert HTML to XHTML.
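
The handler-based approach described in this post might look something like the sketch below. It assumes that the "Java html document parser class" is the Swing HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`); the post does not name the class, so that is a guess. The class name `SwingHtmlToXml` and the helper methods are invented for the example. The parser itself supplies implied tags, so a `<p>` with no `</p>` still produces an end-tag callback, and empty elements such as `<br>` arrive through a separate callback.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.Locale;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class SwingHtmlToXml {
    static String convert(String html) {
        StringBuilder out = new StringBuilder();
        // One "handler" per kind of parse event; comments and errors are ignored here.
        HTMLEditorKit.ParserCallback handler = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                out.append('<').append(tag).append(attributes(attrs)).append('>');
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                out.append("</").append(tag).append('>');
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <br> are written in self-closing form.
                out.append('<').append(tag).append(attributes(attrs)).append(" />");
            }

            @Override
            public void handleText(char[] data, int pos) {
                out.append(escape(new String(data)));
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), handler, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    private static String attributes(MutableAttributeSet attrs) {
        StringBuilder sb = new StringBuilder();
        for (Enumeration<?> names = attrs.getAttributeNames(); names.hasMoreElements();) {
            Object key = names.nextElement();
            // The parser marks tags it synthesized itself with this pseudo-attribute.
            if (HTMLEditorKit.ParserCallback.IMPLIED.equals(key)) continue;
            String name = key.toString().toLowerCase(Locale.ROOT);
            Object value = attrs.getAttribute(key);
            // Valueless attributes such as "checked" become checked="checked".
            String text = HTML.NULL_ATTRIBUTE_VALUE.equals(value) ? name : String.valueOf(value);
            sb.append(' ').append(name).append("=\"").append(escape(text)).append('"');
        }
        return sb.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(convert("<P>one<P>two<BR>"));
    }
}
```

The sketch leaves out the special cases the post alludes to: elements the parser does not recognize, comments, scripts, and the XML declaration and DOCTYPE would all need handlers of their own.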
 
mike

regards:

thank you for your valuable suggestions:

(1) In my Nokia 6600 test, your code works:
the Nokia 6600 can identify the following XHTML code.
------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
<p>This used to be a HTML document, now it's valid XHTML!</p>
</body>
</html>
---------------------------------------------------------------------
(2) I don't understand precisely how, by your description, an HTML document
is translated into an XHTML file.

Suppose I have an HTML document, and I denote its body part
as (Body part of a HTML document).

As I understand you, the XHTML file is then as follows:
--------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>

(Body part of a HTML document).

</body>
</html>
--------------------------------------------------------------------
But my Nokia 6600 cannot identify the above XHTML file.

Did I misunderstand you?

Or am I missing something important?

Any constructive suggestions are welcome.

best wishes
 
Michael Borgwardt

mike said:
(2) I don't know precisely how a HTML document is translated into
a XHTML file by your words.

*groan* That's what I get for trying to be subtle.

Do I mistake your sayings?

Yes. I wrote:

That is no problem at all: translate every input document to this:

...........................................
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
<p>This used to be a HTML document, now it's valid XHTML!</p>
</body>
</html>
............................................

And I really meant "translate every input document to EXACTLY THE
TEXT BETWEEN THE DOTTED LINES". That fits your requirements because
you have only specified that the output has to be valid XHTML, not
that it must have anything whatsoever to do with the input
document. That's exactly why I (and others) told you you need to
rethink your requirements.

And that's not just snide hairsplitting. Presumably you want the
output document to be rendered exactly as the input document would
be. But that is practically impossible when the input is invalid
HTML (which many, if not most, HTML pages found on the WWW are),
because rendering it involves guesswork by the browser. That
guesswork differs a lot between browsers, and exactly how it is done
is not known, at least in the case of the most popular browser,
Internet Explorer.

This is exactly why jtidy will not translate some HTML pages, as you
have noticed.
 
mike

Now I get what you mean, thank you anyway.
What I want is as follows:

an HTML document ---> after Tidy's translation ---> "an XHTML document"

I want that XHTML document to be identified by the Nokia 6600.


(1) I use the Jtidy API, and find that the Nokia 6600 mobile browser cannot
identify the XHTML document it produces.

(2) But the Nokia 6600 can identify a normal XHTML file like the one you posted

BETWEEN DOTTED LINES.

...........................................
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
<p>This used to be a HTML document, now it's valid XHTML!</p>
</body>
</html>
............................................

(3) I am curious:
Besides Jtidy, can I produce an XHTML file that the Nokia 6600 can identify
by using another good HTML parser?

For example, a parser like:

http://htmlparser.sourceforge.net/


Could someone give me a constructive suggestion?

best wishes
 
Michael Borgwardt

mike said:
Now I get what you mean,thank you anyway.
What I want is as follows:

a HTML document---> after tidy's translation --->"a XHTML document"

I want the "a XHTML document" to be identified by nokia 6600.


(1)I use the Jtidy api,and find that nokia 6600 mobile browser cannot identify
the "a XHTML document".

(2)but nokia 6600 can identify the normal XHTML file like you post

Ah, that is a far clearer problem.

Well, thanks to the tight specification of XHTML, I think there are really
only three things that could be the reason for this behaviour:

- Jtidy has a bug
- The Nokia browser has a bug
- Jtidy's output conforms to or produces a different version of XHTML than the
Nokia browser expects

After a bit of googling, it seems that the Nokia browser actually supports
only XHTML MP (Mobile Profile), which is a *subset* of XHTML described here:
http://www.wapforum.org/tech/documents/WAP-277-XHTMLMP-20011029-a.pdf

Unfortunately, that makes it a lot more difficult to fulfill your requirements.
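
One way to narrow down which of the three causes applies is to check the converter's output mechanically: first whether it parses as XML at all, then whether it uses elements outside the subset the browser accepts. The sketch below does both with the XML parser that ships with the JDK. The class name `SubsetChecker` and the tiny allowlist in `main` are placeholders for illustration only; the real list of permitted elements has to be taken from the specification linked above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SubsetChecker {
    /**
     * Returns the names of all elements in the document that are not in the
     * allowlist. Throws IllegalArgumentException if the input is not
     * well-formed XML, which would point at the converter rather than the browser.
     */
    static Set<String> elementsOutside(String xhtml, Set<String> allowed) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Do not download the DTD named in the DOCTYPE; substitute an empty one.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList all = doc.getElementsByTagName("*");
            Set<String> outside = new TreeSet<>();
            for (int i = 0; i < all.getLength(); i++) {
                String name = all.item(i).getNodeName();
                if (!allowed.contains(name)) {
                    outside.add(name);
                }
            }
            return outside;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Placeholder allowlist, NOT the real XHTML MP element set.
        Set<String> allowed = new HashSet<>(
                Arrays.asList("html", "head", "title", "body", "p"));
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head>"
                + "<body><p>ok</p><font>not ok</font></body></html>";
        System.out.println(elementsOutside(doc, allowed)); // prints: [font]
    }
}
```

An empty result does not prove the browser will accept the page, since attributes, the DOCTYPE, and the MIME type can matter as well, but a non-empty one gives a concrete list of things to strip or rewrite.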
 
Ashburton Industries

I'm afraid not... Sorry... but it can't...


Michael Borgwardt said:
Ah, that is a far clearer problem.

Well, thanks to the tight specification of XHTML, I think there are really
only three things that could be the reason for this behaviour:

- Jtidy has a bug
- The Nokia browser has a bug
- Jtidy's output conforms to or produces a different version of XHTML than the
Nokia browser expects

After a bit of googling, it seems that the Nokia browser actually supports
only XHTML MP (Mobile Profile), which is a *subset* of XHTML described here:
http://www.wapforum.org/tech/documents/WAP-277-XHTMLMP-20011029-a.pdf

Unfortunately, that makes it a lot more difficult to fulfill your
requirements.
 
mike

regards:

Is the following approach reasonable?

_NOTValid_ HTML
---> _Valid_ HTML (after a syntax-checking program)
---> _Valid_ XHTML (after translation)

thank you

best wishes
 
