Tidy; how to make it XML-conform? <BR> needs to be closed

Ragnar · Oct 23, 2006

Hi

I have one question regarding Tidy (http://tidy.sourceforge.net). My
source XML-file has got a lot of unclosed <BR>-tags. Which command do I
need (in my tidy config-file) to close them and make valid XML out
of it?

regards
Rag.

Richard Tobin · Oct 23, 2006

Ragnar said:
I have one question regarding Tidy (http://tidy.sourceforge.net). My
source XML-file has got a lot of unclosed <BR>-tags. Which command do I
need (in my tidy config-file) to close them and make valid XML out
of it?

Use the -asxml or -asxhtml flag.

-- Richard

Bjoern Hoehrmann · Oct 24, 2006

* Ragnar wrote in comp.text.xml:

I have one question regarding Tidy (http://tidy.sourceforge.net). My
source XML-file has got a lot of unclosed <BR>-tags. Which command do I
need (in my tidy config-file) to close them and make valid XML out
of it?

HTML Tidy is not designed to clean up arbitrary XML documents, so if by
"XML-file" you really mean some arbitrary XML document, then it might be
difficult to address your problem. If you mean "HTML" or "XHTML" instead
then use the output-* family of options, or the -asxml command line
option and ensure that you have not set the input-xml flag.

Ragnar · Oct 24, 2006

Thank you for your help. It is very important to get support because
I have to finish it today.

My command line looks like this: tidy -asxml -config config.txt old.xml

I get the same error as without "-asxml":

Error: unexpected </reference> in <BR>

That means it finds an unclosed <BR>-tag at node "reference".

To get rid of it I could use "no-xml" as input-format, but then Tidy
would transform my XML into an HTML structure, which is not wanted.

Ragnar

Ragnar · Oct 24, 2006

Another question regarding Tidy:

I want to use the COM-Wrapper of Tidy. Now I have found this example.
I don't know why "Stat As Long" is used. I tried to work without "Stat",
but then I cannot call objTidyDoc.MethodName directly.

Dim objTidyDoc As TidyDocument
Dim Stat As Long   ' receives the value returned by each call below
Set objTidyDoc = New TidyDocument
Stat = 0
Stat = objTidyDoc.LoadConfig(strTidyConfig)
Stat = objTidyDoc.ParseFile(strFilePath & strXmlFileName)
Stat = objTidyDoc.CleanAndRepair()
Stat = objTidyDoc.RunDiagnostics()
' Note: this writes the result back over the input file
Stat = objTidyDoc.SaveFile(strFilePath & strXmlFileName)

Ragnar · Oct 26, 2006

Now I know how to use the COM-Wrapper, but my main question is still
open.

How can I transform this source-xml into valid xml without using the
workaround of getting an HTML-output? I don't want to have the HTML-tags
like <HEAD> and <BODY> around it.

http://www.ticope.de/tmp/source.xml/download

Help is VERY much appreciated; this task has kept me busy for too long.
Rag.

Joseph Kesselman · Oct 26, 2006

If your input isn't HTML, Tidy may not be able to help you, and nothing
else out there is likely to be able to read your mind and guess that you
intended tags to autoterminate.

Since you know that *was* your intent, how about just doing a text-level
global replace of <BR> with <BR/>?
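
A text-level replace of this kind could be sketched as follows. This is only an illustration in Python, not the poster's own code; the function name close_br_tags is made up for the example, and it assumes the unclosed tag is a bare <BR> without attributes. Because it works on raw text, it would also rewrite a <BR> that happens to sit inside a comment or CDATA section.

```python
import re

# Matches an opening <BR> tag in any letter case, with optional
# whitespace before '>'. Already self-closed forms such as <BR/> or
# <BR /> do not match, and neither do longer names such as <BRANCH>.
# Tags carrying attributes (e.g. <BR clear="all">) are not handled.
_OPEN_BR = re.compile(r"<(br)\s*>", re.IGNORECASE)


def close_br_tags(text: str) -> str:
    """Rewrite every bare <BR> as the empty-element tag <BR/>.

    The original letter case of the tag name is preserved.
    """
    return _OPEN_BR.sub(r"<\1/>", text)


if __name__ == "__main__":
    broken = "<reference>line one<BR>line two<br >line three</reference>"
    print(close_br_tags(broken))
```

The replace has to run on the file contents as plain text, before any XML parser sees them.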

Ragnar · Oct 26, 2006

Joseph said:
Since you know that *was* your intent, how about just doing a text-level
global replace of <BR> with <BR/>?

Joseph,
that is a very nice idea.

It could look like this (assuming <BR> appears in node "reference"):
Set objDOMnode = objDom.selectSingleNode("//reference")
If Not objDOMnode Is Nothing Then
    strReference = objDOMnode.Text
End If
strReference = Replace(strReference, "<BR>", "<BR/>", 1, -1, vbTextCompare)

But I don't get a value in strReference, which means that XML has to be
valid before working with XMLDOM. Am I right? I checked it by closing
<BR> manually; then I get a value for strReference.

Joe Kesselman · Oct 27, 2006

Ragnar said:
But I don't get a value in strReference, which means that XML has to be
valid before working with XMLDOM.

XML has to be well-formed before using any XML tools. An unterminated
element, such as your <BR>, is not well-formed XML. Fix it first.
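
A quick way to see whether a document is well-formed, and roughly where it breaks, is to hand it to a parser and catch the error. The sketch below uses Python's standard xml.etree.ElementTree rather than XMLDOM; the helper name check_well_formed is invented for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text: str):
    """Return (True, None) if text parses as XML.

    Otherwise return (False, (line, column)), the position at which the
    parser gave up.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, err.position
    return True, None


if __name__ == "__main__":
    # The unterminated <BR> makes the first document fail.
    print(check_well_formed("<reference>a<BR>b</reference>"))
    print(check_well_formed("<reference>a<BR/>b</reference>"))
```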

Andy Dingley · Oct 27, 2006

Ragnar said:
How can I transform this source-xml into valid xml without using the
workaround of getting an HTML-output?

Find some non-Tidy, Tidy-like XML tool? Maybe write one for your
specific task?

Tidy uses an approximation of an SGML parser and a tag-soup strainer to
take "approximate HTML", turn it into the best-guess internal
(DOM-like) model of the intended page, then serialise it accurately.
This relies on three things that you don't have available:

* SGML parsing (omitted tags can often be inferred cleanly)
* A known HTML DTD
* Fix-up code outside the SGML parser that has assumed HTML-soup
behaviours coded explicitly into it.

If your problem is "bad XML" that isn't even approximating HTML, then I
sympathise, but Tidy has three of its hands tied.

Why is your bad XML bad? What's the problem? Can you build some specific
tool that fixes some specific problem? Even if it has to work with
simple text-file processing and can't support more than one encoding,
it might be enough.

I've done a lot of work with RSS, which is only approximate XML at best
and often significantly invalid. Typically it includes HTML entity
references (e.g. &eacute;) that aren't part of XML. It's not too hard to
scan the whole document with a crude entity reference expander that can
map these (from a known list) onto the numeric form. I usually try to
XML parse them, then if this fails I check for the presence of such
entities, convert them and then attempt to re-parse.
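
The parse, convert, re-parse approach described above might look roughly like this in Python. It is a sketch under assumptions, not the code actually used: the "known list" is taken from the standard library's html.entities.name2codepoint table, and the names expand_html_entities and parse_lenient are invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML itself predefines; these are left untouched.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_html_entities(text: str) -> str:
    """Map known HTML named references (&eacute;) to numeric form (&#233;).

    Unknown names are left as they are, so a later parse still reports them.
    """
    def to_numeric(match):
        name = match.group(1)
        if name in _XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED_REF.sub(to_numeric, text)


def parse_lenient(text: str):
    """Try a normal XML parse first; on failure, expand entities and retry."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        fixed = expand_html_entities(text)
        if fixed == text:
            # Nothing to convert, so the failure has some other cause.
            raise
        return ET.fromstring(fixed)


if __name__ == "__main__":
    item = parse_lenient("<item>caf&eacute; &amp; bar</item>")
    print(item.text)
```

Only entity references are repaired here; a document that is broken in some other way, such as an unclosed tag, still fails on the second parse.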
