Parsing an HTML file with xerces2-j

Xavier

Hi,

I've just downloaded Xerces2 Java and I'd like to parse an HTML file using
the HTMLDOMImplementation found in the org.apache.html.dom package.

First I tried:

DOMImplementationRegistry registry =
DOMImplementationRegistry.newInstance();

DOMImplementation domImpl =
(DOMImplementation)registry.getDOMImplementation("HTML");

but it doesn't find any DOM implementation for HTML.
Then I tried:

HTMLDOMImplementation domImpl =
HTMLDOMImplementationImpl.getHTMLDOMImplementation();

DOMImplementationLS domImplLS =
(DOMImplementationLS)domImpl.getFeature("LS","3.0");

LSParser parser =
domImplLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

Document document = parser.parseURI("C:\\test.html");

but I don't know how to get an instance of HTMLDocument so that I can use
the HTML DOM interfaces.

Thanks
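
A note on the first attempt: the registry only hands back an implementation that reports support for the requested feature string, so a small probe can show which features resolve in a given setup. The sketch below uses only the standard org.w3c.dom.bootstrap API; the class name RegistryProbe and the helper supports() are made up for illustration, and the answer for "HTML" depends on which implementation sources are registered on your classpath.

```java
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

public class RegistryProbe {
    // Returns true when some registered DOM implementation claims the
    // given feature string ("Feature Version", space separated).
    static boolean supports(String features) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        return registry.getDOMImplementation(features) != null;
    }

    public static void main(String[] args) throws Exception {
        // "HTML 2.0" is included only to see what the registry answers.
        String[] candidates = {"Core 2.0", "XML 3.0", "LS 3.0", "HTML 2.0"};
        for (String features : candidates) {
            System.out.println(features + " -> " + supports(features));
        }
    }
}
```

If "HTML" comes back unsupported, that matches the null result above: the lookup itself works, but no registered source advertises the HTML feature.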
 
Joe Kesselman

If you want to parse HTML, you want the NekoHTML parser rather than
normal Xerces. (HTML is not an XML language, though XHTML is.)
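
To illustrate the HTML versus XHTML point, here is a minimal sketch using only the JAXP parser bundled with the JDK (not NekoHTML). The class name WellFormedCheck and the helper parsesAsXml() are hypothetical. It feeds the same fragment to an XML parser twice, once with the br element closed XHTML-style and once left open as plain HTML allows.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if an XML parser accepts the markup, false if it
    // rejects it as not well-formed.
    static boolean parsesAsXml(String markup) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtmlStyle = "<html><body><p>Hi<br/></p></body></html>";
        String htmlStyle = "<html><body><p>Hi<br></p></body></html>";
        System.out.println("XHTML-style: " + parsesAsXml(xhtmlStyle));
        System.out.println("HTML-style:  " + parsesAsXml(htmlStyle));
    }
}
```

The unclosed br is legal HTML but not well-formed XML, which is why an XML parser such as Xerces rejects ordinary HTML pages and an HTML-aware parser is needed in front of it.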
 
Xavier

In fact, I should start at the beginning:

Because the W3C DOM interface is too generic (createElement(),
getAttribute(), ...), I want to create my own DOM implementation (like
HTML 4.01/XHTML 1.0, MathML or SVG) from my own schema.

I'll use Xerces-J, but I don't know how to do it.

The Xerces-J 2.8 distribution includes DOM HTML and DOM WML implementations.

I tried the 'HTMLDOMImplementation' to parse an HTML file and use the HTML
DOM interfaces.

That should then give me a model for my own DOMImplementation.

But I'm not able to use the HTML DOM implementation.

Could somebody help me?

Thanks
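
One way to get element-specific accessors without writing a full DOMImplementation is a thin typed facade over the generic DOM. This is a different technique from the one Xerces uses for its HTML and WML DOMs, and is offered only as a sketch: the names TypedFacadeSketch, Anchor, anchors() and parse() are all hypothetical, and only standard JAXP and org.w3c.dom calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TypedFacadeSketch {
    // Hypothetical typed view of an <a> element; it wraps a generic
    // Element instead of subclassing any parser-specific class.
    static final class Anchor {
        private final Element element;
        Anchor(Element element) { this.element = element; }
        String getHref() { return element.getAttribute("href"); }
        void setHref(String href) { element.setAttribute("href", href); }
        String getText() { return element.getTextContent(); }
    }

    // Wraps every <a> element of the document in an Anchor facade.
    static List<Anchor> anchors(Document doc) {
        NodeList nodes = doc.getElementsByTagName("a");
        List<Anchor> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(new Anchor((Element) nodes.item(i)));
        }
        return result;
    }

    // Parses well-formed markup from a string into a generic DOM.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><a href=\"x.html\">X</a></body></html>");
        for (Anchor a : anchors(doc)) {
            System.out.println(a.getText() + " -> " + a.getHref());
        }
    }
}
```

The facade keeps the underlying tree a plain DOM, so it works with any parser, at the cost of creating wrapper objects on demand rather than having the parser build typed nodes directly.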
 
