SAX XMLReader, XMLFilter, ContentHandler and XMLWriter question


Jeff Calico

Hello all. I am implementing a SAX filter to strip a bunch of unneeded
elements out of a large XML file. I found a book "Java & XML" by Brett
McLaughlin, and an interesting article by him which addresses my issues:
http://www-128.ibm.com/developerworks/xml/library/x-tipbigdoc3.html

However, doing it in that specified way does not seem to work at all!
Doing it very differently *seems* to work, but most likely I am not
understanding something.

Here is my code, contrasted with the book's code. The class
KeepSpecificEltsFilter is at the end:

--------------------------
MY TRIAL AND ERROR WAY:
---------------------------
FileReader r = new FileReader( "filename" );
XMLReader xr = XMLReaderFactory.createXMLReader();
KeepSpecificEltsFilter filter = new KeepSpecificEltsFilter( xr, "elt" );
XMLWriter xw = new XMLWriter( filter, new FileWriter( "Out.xml" ) );
xw.parse( new InputSource(r) );

--------------------------------------------------------
THE WAY THE BOOK SAYS TO DO IT (maybe I misunderstand):
---------------------------------------------------------
FileReader r = new FileReader( s );
XMLReader xr = XMLReaderFactory.createXMLReader();
XMLWriter xw = new XMLWriter( xr, new FileWriter( "jeffOut.xml" ) );
KeepSpecificEltsFilter filter = new KeepSpecificEltsFilter( xw, "elt" );

//DefaultHandler dh = new DefaultHandler();
JeffContentHandler dh = new JeffContentHandler(xr);

filter.setContentHandler( dh );
filter.parse( new InputSource(r) );
------------------------------------------------------------
Note the difference between who does the parsing
(writer or filter) and the way they are chained together.

And last, here is the filter class:
------------------------------------------------------------

import java.util.LinkedList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class KeepSpecificEltsFilter extends XMLFilterImpl {

    // Local names of the elements whose events are passed downstream.
    private List<String> elementsToKeep;

    // True while inside a kept element. This is a single flag, so kept
    // elements nested inside other kept elements are not tracked separately.
    private boolean inKeptElement = false;

    public KeepSpecificEltsFilter( XMLReader parent, String elementToKeep )
    {
        super( parent );
        elementsToKeep = new LinkedList<String>();
        elementsToKeep.add( elementToKeep );
    }

    //---------------------------------------------------------------------------

    public KeepSpecificEltsFilter( XMLReader parent, List<String> elementsToKeep )
    {
        super( parent );
        this.elementsToKeep = elementsToKeep;
    }

    //---------------------------------------------------------------------------

    public void startElement( String uri, String localName, String qName,
                              Attributes atts )
        throws SAXException
    {
        if( elementsToKeep.contains(localName) )
        {
            System.out.println("In kept element = " + localName);
            super.startElement( uri, localName, qName, atts );
            inKeptElement = true;
        }
        // Otherwise do nothing: the start tag is not passed on.
    }

    //---------------------------------------------------------------------------

    public void endElement( String uri, String localName, String qName )
        throws SAXException
    {
        if( elementsToKeep.contains(localName) )
        {
            super.endElement( uri, localName, qName );
            inKeptElement = false;
        }
        // DON'T DO ANYTHING... PREVENTS PROCESSING OF ELEMENTS
    }

    //---------------------------------------------------------------------------

    public void characters( char ch[], int start, int len )
        throws SAXException
    {
        // Text is forwarded only while inside a kept element.
        if( inKeptElement )
        {
            super.characters( ch, start, len );
        }
    }
}

Any insight would be appreciated!

--Jeff
 

Jeff Calico

I forgot to add that I don't understand what to do with the
ContentHandler class; I tried to use the DefaultHandler, and then I
tried my own class "JeffContentHandler" with an empty implementation.
It seems to me that the Filter class is doing this work though, so why
would I register a ContentHandler?

--Jeff
 

Joseph Kesselman

Jeff said:
I forgot to add that I don't understand what to do with the
ContentHandler class; I tried to use the DefaultHandler, and then I
tried my own class "JeffContentHandler" with an empty implementation.
It seems to me that the Filter class is doing this work though, so why
would I register a ContentHandler?

Normally, the filter is a ContentHandler whose only job is to pass
selected events along to another ContentHandler which actually uses the
filtered document. You have to register your "real" ContentHandler with
the filter so it knows what to do with the events after deciding whether
to keep them or not.
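
As a rough illustration of that arrangement, here is a minimal sketch using only JDK classes. The names FilterChainDemo, KeepFilter and Recorder are made up for this example, and the recording handler merely stands in for whatever "real" handler consumes the filtered events; it is not the code from the book or the article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainDemo {

    /** Hypothetical filter: forwards only elements with the given local name. */
    static class KeepFilter extends XMLFilterImpl {
        private final String keep;

        KeepFilter(XMLReader parent, String keep) {
            super(parent);
            this.keep = keep;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            // super.startElement passes the event to the registered ContentHandler.
            if (keep.equals(localName)) {
                super.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (keep.equals(localName)) {
                super.endElement(uri, localName, qName);
            }
        }
    }

    /** Hypothetical "real" handler: records the elements that reach it. */
    static class Recorder extends DefaultHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            seen.add(localName);
        }
    }

    public static List<String> run(String xml, String keep) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is populated
        XMLReader parser = factory.newSAXParser().getXMLReader();

        // Chain: parser -> filter -> recorder
        KeepFilter filter = new KeepFilter(parser, keep);
        Recorder recorder = new Recorder();
        filter.setContentHandler(recorder);

        // Parsing is started on the last XMLReader in the chain (the filter).
        filter.parse(new InputSource(new StringReader(xml)));
        return recorder.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            run("<root><elt>a</elt><junk>b</junk><elt>c</elt></root>", "elt"));
    }
}
```

The recorder only ever sees the events the filter chose to forward, which is why something has to be registered downstream of the filter if the filtered result is to go anywhere.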

Alternatively, of course, you can combine both the filtering and the
operate-on-the-data stages in a single custom ContentHandler. But in
that case there's no need for it to claim to be a Filter.
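
A minimal sketch of that single-handler alternative, again with only JDK classes. CombinedHandlerDemo and KeptTextCollector are hypothetical names, and collecting the text of kept elements is just an assumed example of "operating on the data".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CombinedHandlerDemo {

    /** Hypothetical handler: decides what to keep and uses it in one place. */
    static class KeptTextCollector extends DefaultHandler {
        private final String keep;
        private final List<String> texts = new ArrayList<>();
        // Non-null only while inside a kept element (nesting is not handled).
        private StringBuilder current;

        KeptTextCollector(String keep) {
            this.keep = keep;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            if (keep.equals(localName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (keep.equals(localName) && current != null) {
                texts.add(current.toString());
                current = null;
            }
        }
    }

    public static List<String> run(String xml, String keep) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is populated
        XMLReader parser = factory.newSAXParser().getXMLReader();

        // No filter in the chain: the handler is registered on the parser itself.
        KeptTextCollector handler = new KeptTextCollector(keep);
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            run("<root><elt>a</elt><junk>b</junk><elt>c</elt></root>", "elt"));
    }
}
```

Here nothing extends XMLFilterImpl, since no events are passed on to anyone else.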
 
