ParserCallback - HTML Parser in Java

k4

Hi!

I have a problem. My class for parsing HTML documents works pretty well,
but if the document contains "<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en">", it returns an error. Why?

import javax.swing.text.html.*;
import javax.swing.text.*;

import java.net.*;

public class HTMLParse extends HTMLEditorKit.ParserCallback {
    int begin = 0; // number of <p> start tags seen so far
    int end = 0;   // number of </strong> end tags seen so far

    // public void handleError(String errorMsg, int pos) {
    //     System.out.println("An error occurred: " + errorMsg);
    //     System.exit(2); // if we wanted to exit on a tag problem, but we don't
    // }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrSet,
                               int pos) {
        if (tag == HTML.Tag.P) {
            begin++;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.STRONG) {
            end++;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Print only the text that follows the 3rd, 4th, or 5th <p> tag.
        if (begin == 3 || begin == 4 || begin == 5) {
            System.out.println(data);
        }
    }
}
 
Tom Hawtin

k4 said:
I have a problem. My class for parsing HTML documents works pretty well,
but if the document contains "<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en">", it returns an error. Why?

import javax.swing.text.html.*;

The Swing HTML parser is ancient (and basic). I believe modern XHTML
(XML format) will give it problems.

You could use something like JTidy (google it) to reformat the document
as old-school HTML. Alternatively, a short SAX handler could remove the
XMLisms (like using <x/> instead of <x></x>).

Tom Hawtin
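
To see what the Swing parser actually complains about, it can help to override handleError (commented out in the original class) and collect the messages instead of exiting. The following is only a minimal sketch under stated assumptions: the class name XhtmlParseSketch, the CountingCallback helper, and the sample markup are hypothetical and not from the posts above. It drives the callback through ParserDelegator from javax.swing.text.html.parser, which is one standard way to run an HTMLEditorKit.ParserCallback.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class XhtmlParseSketch {

    /** Hypothetical callback: counts <p> start tags and records parser complaints. */
    public static class CountingCallback extends HTMLEditorKit.ParserCallback {
        public int paragraphs = 0;
        public final List<String> errors = new ArrayList<>();

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.P) {
                paragraphs++;
            }
        }

        @Override
        public void handleError(String errorMsg, int pos) {
            // Record the message and its position instead of aborting.
            errors.add(pos + ": " + errorMsg);
        }
    }

    /** Parses the given markup with the Swing parser and returns the filled callback. */
    public static CountingCallback parse(String html) {
        CountingCallback callback = new CountingCallback();
        try {
            // The third argument (ignoreCharSet = true) skips charset directives in the input.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        // Sample input using the XHTML-style <html> tag from the question.
        String xhtml =
                "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">"
                + "<body><p>one</p><p>two</p></body></html>";
        CountingCallback result = parse(xhtml);
        System.out.println("paragraphs: " + result.paragraphs);
        result.errors.forEach(System.out::println);
    }
}
```

Printing the collected messages should show whether the "error" is only a report about attributes the parser does not recognise or something that genuinely interrupts parsing. If it is the former, the simplest fix may be to leave handleError as a no-op rather than pre-processing the document.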
 
