Java package to correct HTML errors

Chanchal

Hello

Does anyone know of any Java packages that can read HTML code (from
a file or as a string input) and return it as 'well-formatted'
HTML?

Thanks

Chanchal
 
Roedy Green

Does anyone know of any Java packages that can read HTML code (from
a file or as a string input) and return it as 'well-formatted'
HTML?

TagSoup makes a stab at it.

http://mindprod.com/jgloss/tagsoup.html
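
A minimal sketch of how this could look in code, not a definitive recipe. TagSoup's parser class, org.ccil.cowan.tagsoup.Parser, implements the standard SAX XMLReader interface, so it can be plugged into the JDK's identity transform to write out whatever it managed to parse as well-formed markup. The class name HtmlCleaner and the method toWellFormed are made up for illustration; the helper takes any XMLReader, so the block itself compiles against the JDK alone.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of TagSoup or any other library.
final class HtmlCleaner {

    /**
     * Feeds the markup through the given SAX reader and serializes the
     * resulting event stream as XML using an identity transform.
     * Pass a TagSoup parser as the reader to accept messy HTML.
     */
    static String toWellFormed(XMLReader reader, String markup)
            throws TransformerException {
        // A Transformer created without a stylesheet copies input to output.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        SAXSource source =
                new SAXSource(reader, new InputSource(new StringReader(markup)));
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

With the TagSoup jar on the classpath, a call would look like HtmlCleaner.toWellFormed(new org.ccil.cowan.tagsoup.Parser(), "<p>Hello <b>world"). How the unclosed tags get repaired is up to TagSoup, so check the output against your own documents.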

To detect errors, see
http://mindprod.com/jgloss/htmlvalidator.html
--
Roedy Green Canadian Mind Products
http://mindprod.com

 
Martin Gregorie

Does anyone know of any Java packages that can read HTML code (from a
file or as a string input) and return it as 'well-formatted' HTML?

Take a look at JTidy on the HTML Tidy website:
http://tidy.sourceforge.net/

I haven't used the Java version, but if it's anything like as good as the
C 'tidy' utility it should do the trick.
 
Tom Anderson

Does anyone know of any Java packages that can read HTML code (from a
file or as a string input) and return it as 'well-formatted' HTML?

TagSoup, JTidy and NekoHTML can all be used this way. In my experience,
TagSoup is the most liberal parser and NekoHTML is the strictest, though
NekoHTML still deals well with HTML as it is found in the wild.

tom

 
Mike Amling

Chanchal said:
Does anyone know of any Java packages that can read HTML code (from
a file or as a string input) and return it as 'well-formatted'
HTML?

I use JTidy (http://jtidy.sourceforge.net), which others have
mentioned. I find that its -asxhtml option, which converts the input to
XHTML, makes the output easy to parse.
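
To illustrate why XHTML output is convenient, here is a rough sketch that reads an already tidied XHTML string with the JDK's own XML parser and collects the link targets. It assumes the input is well-formed XHTML of the kind the -asxhtml option is meant to produce; the class name XhtmlLinks and the method hrefs are made up for this example, and JTidy itself is not called here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes its input is already well-formed XHTML.
final class XhtmlLinks {

    /** Returns the href value of every a element, in document order. */
    static List<String> hrefs(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // XHTML usually carries a DOCTYPE that points at a W3C DTD.
        // Resolve every external entity to an empty stream so the parser
        // does not try to download it.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        List<String> result = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            // Named anchors without an href are skipped.
            if (anchor.hasAttribute("href")) {
                result.add(anchor.getAttribute("href"));
            }
        }
        return result;
    }
}
```

The same call on untidied HTML would normally fail with a parse exception at the first unclosed tag, which is the practical argument for converting to XHTML first.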

--Mike Amling
 
