Jukka K. Korpela wrote :
Scripsit Gérard Talbot:
Thanks for the heads-up.
Such tools should _not_ be used without great discretion.
Apart from fixing nested lists, which is a vague expression and could
mean just about anything,
All previous Mozilla Composer versions were not creating nested lists in
a valid manner. They were creating improperly nested lists like this:
<ul>
<li>first item at first level</li>
<ul>
<li>first item of second level</li>
<li>second item of second level</li>
</ul>
<li>second item at first level</li>
</ul>
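The problem shown above (a list placed directly inside another list rather than inside an <li>) can be detected mechanically. What follows is only a minimal sketch using Python's standard html.parser, not what Composer or Tidy actually do; the names NestedListChecker and find_misnested_lists are made up for illustration, and the check only tracks ul, ol and li elements.

```python
from html.parser import HTMLParser

LIST_TAGS = {"ul", "ol"}


class NestedListChecker(HTMLParser):
    """Records every list that is opened directly inside another list."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open ul/ol/li elements
        self.problems = []  # (line, column) of each misplaced list

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS:
            # A list whose nearest open list-related ancestor is a list,
            # not an <li>, is improperly nested.
            if self.stack and self.stack[-1] in LIST_TAGS:
                self.problems.append(self.getpos())
            self.stack.append(tag)
        elif tag == "li":
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element, if there is one.
        if (tag in LIST_TAGS or tag == "li") and tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_misnested_lists(html):
    """Return the positions of lists that are direct children of a list."""
    checker = NestedListChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Fed the markup above, it should report the inner <ul>; with the inner list moved inside the first <li>, it should report nothing.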
all of these operations change the document
and cause largely unpredictable effects on its visual appearance.
For example, authors and editors often insert consecutive <br> tags to
produce some vertical spacing. That's a wrong approach, but so is the
operation of blindly removing them. The author wanted to create some
spacing, so the author should decide what to do. Maybe the spacing
_could_ be removed. Maybe some simple CSS code should be added while
removing the tags.
Excellent suggestion. I know you and I have discussed this before in
this newsgroup: an arbitrary number of consecutive <br> is better
replaced with a sensible CSS margin-bottom declaration.
Composer 2 could have a feature like this: convert consecutive <br> into
a corresponding margin-bottom on the previous block-level element.
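A rough sketch of what such a feature might start from: find the runs of consecutive <br> and propose a margin for each, leaving the final decision to the author. This is an illustration only, not an existing Composer or Tidy feature; find_br_runs and suggested_margin are hypothetical names, and the 1.2em line height used for the conversion is an assumption.

```python
import re

# One <br>, <br/> or <br /> tag, in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Two or more such tags separated only by whitespace.
BR_RUN = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def find_br_runs(html):
    """Return (offset, count) for each run of consecutive <br> tags."""
    runs = []
    for match in BR_RUN.finditer(html):
        count = len(BR_TAG.findall(match.group(0)))
        runs.append((match.start(), count))
    return runs


def suggested_margin(count, line_height_em=1.2):
    """Propose a CSS declaration roughly equivalent to `count` line breaks.

    The 1.2em line height is an assumed value; the real spacing depends
    on the page's own styles, so the author should review the result.
    """
    return f"margin-bottom: {count * line_height_em:g}em"
```

The function only reports where the runs are; nothing is removed or rewritten automatically, which is the point made above about not removing them blindly.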
Same thing with "drop-empty-paras: specifies if Tidy should discard
Even "cleaning" <td align="right"></td> to <td></td> is wrong if you
don't know what will happen,
Maybe better HTML Tidy documentation, a support FAQ or a "how to use"
document should be developed so that users could see and understand
what each setting can and will do.
and a simple program surely cannot know
People shouldn't blindly trust an application at first: they should
back up their work and then experiment.
Maybe the attribute is there for no good reason, but it's possible
that it's there intentionally, e.g. because some client-side script will
change the element's content to nonempty and the author wanted that
content to be right-aligned.
That would be rather rare, I'd say. Chances are, most of the time, the
left/center/right alignment attributes were added semi-automatically by
a previous, older or different WYSIWYG HTML editor.
I didn't know there's a new version of Tidy; I thought the software was
effectively frozen. Now I'm afraid I need to take a look, and I'm afraid
I will be disappointed. When I last tested Tidy, it did _far too much_
"fixing", making wild assumptions and even changing simple
presentational HTML to awfully ugly
It's possible... but that should be rare... otherwise you're invited to
file a bug on this.
The difficult part with Tidy is finding the blend of parameters that
best suits your needs, so that it minimizes occurrences of "ugly fixes".
and poorly structured tag soup in a
CSS flavor
What are your settings/parameters? Here are mine:
--char-encoding latin1 --clean yes --doctype strict --drop-font-tags yes
--drop-proprietary-attributes yes --enclose-block-text yes
--enclose-text yes --indent auto --logical-emphasis yes --replace-color
yes --show-warnings no --wrap 80
These are the only ones I needed to change (for me, for my task), as I
did not want their default values. All of the other parameters (some
70-80 of them) are fine for me at their default values.
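For anyone who wants to reuse such a set of parameters from a script, here is a small sketch that keeps the twelve settings above in one place and builds the corresponding command line. The option names and values are the ones listed above; the helper name tidy_command and the file name page.html are made up for illustration, and the sketch only builds the command without running it.

```python
import shlex

# The twelve non-default settings quoted above.
SETTINGS = {
    "char-encoding": "latin1",
    "clean": "yes",
    "doctype": "strict",
    "drop-font-tags": "yes",
    "drop-proprietary-attributes": "yes",
    "enclose-block-text": "yes",
    "enclose-text": "yes",
    "indent": "auto",
    "logical-emphasis": "yes",
    "replace-color": "yes",
    "show-warnings": "no",
    "wrap": "80",
}


def tidy_command(settings, path):
    """Build the argument list for a tidy run on `path` (does not run it)."""
    args = ["tidy"]
    for name, value in settings.items():
        args += [f"--{name}", str(value)]
    args.append(path)
    return args


if __name__ == "__main__":
    # page.html is a placeholder file name.
    print(shlex.join(tidy_command(SETTINGS, "page.html")))
```

Work on a backup copy first, as said above, before letting any such command touch real pages.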
as well as changing my perfectly good ISO-8859-1 characters
into messy "escapes".
You need to check the char-encoding parameter
http://tidy.sourceforge.net/docs/quickref.html#char-encoding
and possibly change it from ascii to latin1, since the default is ascii.
"Good ISO-8859-1 converted into messy 'escapes'" most probably means
that input-encoding and output-encoding are not synchronized, although
they should be.
"Tidy will accept Latin-1 (ISO-8859-1) character values, but will use
entities for all characters whose value > 127."
http://tidy.sourceforge.net/docs/quickref.html#char-encoding
My solution/proposal for you: use
--char-encoding latin1
The default values for both parameters (input-encoding and
output-encoding) are not synchronized... which is nonsense. If the
default value for input-encoding is latin1, then the default value of
output-encoding should be latin1 too.
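To illustrate what that mismatch does to a character above 127, here is a small Python sketch. It is not Tidy's code, only an analogy using Python's own codecs: an ASCII output with character references versus a plain Latin-1 output. Tidy may well emit a named entity instead of the numeric reference shown here, and the function names are made up.

```python
def as_ascii_with_escapes(text):
    """Encode to ASCII, replacing every non-ASCII character by a numeric
    character reference (roughly what an ascii output encoding implies)."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def as_latin1(text):
    """Encode to ISO-8859-1, keeping characters such as e-acute as one byte."""
    return text.encode("latin-1")


if __name__ == "__main__":
    name = "G\u00e9rard"  # "Gérard", with U+00E9
    print(as_ascii_with_escapes(name))  # b'G&#233;rard'
    print(as_latin1(name))              # b'G\xe9rard'
```

Both results represent the same character; the first is merely harder to read in the source, which is the complaint about "messy escapes".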
That might be nice, but if the defaults for the parameters are poor, I
cannot really recommend it to most people. Few people will be capable of
setting, say, 50 parameters to reasonable values when the programmer was
not able to do that.
You shouldn't have to set 50 parameters... otherwise, it means the
default parameter values are often not the best. I personally set only
12 parameters, and I think I could even drop one or two when upgrading
webpages.
I also think that HTML Tidy is not a good tool to recommend to complete
newcomers to HTML editing. A less powerful, less configurable version
of HTML Tidy might be worth recommending to newbies, though.
That sounds odd. If it is mightily powerful etc. etc., how come it can't
do the fairly simple job of SGML validation - at least with the DTD
fixed to one of HTML DTDs?
That is certainly a reasonable, good suggestion.
The latest HTML Tidy (.exe) version is 102 KB; a true SGML validator
fixed to, say, the HTML 4.01 strict DTD would probably be more than
600 KB, would be more complex and take longer to develop (not that
fairly simple, as you say), would require lengthy documentation, etc.
Still, with so many invalid webpages out there, it is very much worth
the trouble to do this.
W3C people should have done this many years ago and made such a product
free, open-source, easily available and easily embeddable in
applications.
Many of the available WYSIWYG HTML editors (commercial or freeware)
have neither HTML Tidy nor an SGML parsing feature built in... and that
is a shame.
Gérard