Using Xerces SAX to parse just part of an input stream?


Nobody

I'm trying to put together code to deal with a SOAP with Attachments
response, and I'd like to process the response in a single pass. With
SOAP with Attachments, the XML comes back inside a MIME message, so
it looks like this:

--4389012.48390
Content-Type: text/xml

<?xml version="1.0" encoding="UTF-8"?>
<soap-env:Envelope
xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/">
....snip...
</soap-env:Envelope>
--4389012.48390
Content-Type: text/xml
Content-Id: RootNode

<?xml version="1.0" encoding="UTF-8"?><RootNode>
... snip ...
</RootNode>
--4389012.48390--

So what I'd LIKE to be able to do is to parse the incoming input stream
up to the <?xml> declaration, hand the input stream over to a SAX
parser, let it parse to the end of the document, and then have it
return at the end so I can continue parsing the same input stream.
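The first half of that plan is consuming the MIME boundary line and the part headers so that the stream is left positioned exactly at `<?xml`. It might look like the sketch below. `MimeHeaders`, `readLine` and `readHeaders` are hypothetical helper names, not part of Xerces or the JDK. The sketch assumes ASCII header lines and ignores folded (continuation) headers. It reads one byte at a time on purpose. Wrapping the stream in a `BufferedReader` would pull bytes belonging to the XML into the reader's buffer, where the SAX parser could never see them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: reads MIME part headers without consuming any bytes past them. */
final class MimeHeaders {
    private MimeHeaders() {}

    /** Reads one LF-terminated line byte by byte, stripping a trailing CR; null at EOF. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) return null;
        String s = new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }

    /** Consumes header lines up to and including the blank separator line. */
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Header names are case-insensitive, so normalise them to lower case.
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, line.substring(colon + 1).trim());
            }
        }
        return headers;
    }
}
```

After `readLine` has consumed the `--4389012.48390` line and `readHeaders` has returned, the next byte in the stream is the `<` of the XML declaration.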

The problem is that "SAXParser.parse( new InputSource( inputStream ),
handler );" appears to want to consume the input stream until it
reaches EOF on the input stream (which, when given the input stream
above, fails with the error message "Content is not allowed in trailing
section."). Is this something I can work around in Xerces, or is there
a better SAX implementation that will let me tell the parser to stop
when it reaches the last element?
 

Joe Kesselman

Nobody said:
The problem is that "SAXParser.parse( new InputSource( inputStream ),
handler );" appears to want to consume the input stream until it
reaches EOF on the input stream (which, when given the input stream
above, fails with the error message "Content is not allowed in trailing
section.").

Unfortunately, the XML specification does say that nothing except
whitespace, comments and processing instructions may follow the
document element.
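The rule is easy to see with the SAX parser bundled in the JDK, which is derived from Xerces. Trailing whitespace and comments are accepted, but the MIME boundary text from the question is rejected with a fatal error. `TrailingContentDemo` and `parses` are illustrative names only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class TrailingContentDemo {
    /** Returns true if the parser accepts the whole string, false on a parse error. */
    static boolean parses(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());   // DefaultHandler rethrows fatal errors
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\"?><RootNode/>";
        System.out.println(parses(doc + "\n<!-- ok -->\n"));      // prints true
        System.out.println(parses(doc + "\n--4389012.48390\n"));  // prints false
    }
}
```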

Possible solution: Create a stream filter that you hand the
"--4389012.48390" boundary from the start of the enclosed message, and
that delivers characters only until it sees the corresponding
"--4389012.48390" mark at the end, returning EOF thereafter. Run the
parser from that filter stream rather than directly from your original
input stream.
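A minimal sketch of such a filter, in Java to match the question. `BoundaryInputStream` is a hypothetical name, not a Xerces or JDK class. It assumes an ASCII boundary string and that the part headers have already been consumed, so the wrapped stream starts at `<?xml`. It keeps a lookahead window exactly as long as the delimiter and reads the wrapped stream one byte at a time, so it never consumes anything past the boundary. The newline just before the boundary is passed through, which is harmless trailing whitespace to the parser. Parsers may close their input stream when they finish, so `close()` is overridden to leave the outer MIME stream open.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical filter: passes bytes through until the delimiter, consumes it, then reports EOF. */
class BoundaryInputStream extends FilterInputStream {
    private final byte[] delimiter;
    private final byte[] window;  // lookahead bytes not yet handed out
    private int filled;           // number of valid bytes in window
    private boolean done;         // delimiter matched or wrapped stream exhausted

    BoundaryInputStream(InputStream in, String delimiter) {
        super(in);
        this.delimiter = delimiter.getBytes(StandardCharsets.US_ASCII);
        if (this.delimiter.length == 0) throw new IllegalArgumentException("empty delimiter");
        this.window = new byte[this.delimiter.length];
    }

    @Override
    public int read() throws IOException {
        // Top up the lookahead one byte at a time so we never read past the delimiter.
        while (!done && filled < window.length) {
            int b = in.read();
            if (b == -1) { done = true; break; }
            window[filled++] = (byte) b;
        }
        if (filled == window.length && Arrays.equals(window, delimiter)) {
            done = true;
            filled = 0;  // delimiter consumed; report EOF from now on
        }
        if (filled == 0) return -1;
        int result = window[0] & 0xFF;
        System.arraycopy(window, 1, window, 0, filled - 1);
        filled--;
        return result;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // FilterInputStream would otherwise bypass read() and go straight to the source.
        if (len == 0) return 0;
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) break;
            b[off + count++] = (byte) c;
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    @Override
    public int available() { return filled; }

    @Override
    public boolean markSupported() { return false; }

    @Override
    public void mark(int readlimit) { /* not supported */ }

    @Override
    public void reset() throws IOException { throw new IOException("mark/reset not supported"); }

    /** Stop delivering data, but deliberately leave the wrapped stream open. */
    @Override
    public void close() { done = true; filled = 0; }
}
```

Usage would be along the lines of `parser.parse(new InputSource(new BoundaryInputStream(inputStream, "--4389012.48390")), handler);`, after which the code carries on reading the next part's headers from `inputStream` itself. Reading byte by byte is simple rather than fast; wrapping the raw source in a `BufferedInputStream` once, and using that same buffered stream for everything, avoids most of the cost.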

In other words, sweep the issue under the carpet so the parser doesn't
have to see it.
 

Nobody

Thanks - that was pretty much what I've come up with, although I was
hoping for something simpler. Of course, it doesn't look like writing
a SAX parser is all THAT hard...
 

Joseph Kesselman

Nobody said:
Thanks - that was pretty much what I've come up with, although I was
hoping for something simpler. Of course, it doesn't look like writing
a SAX parser is all THAT hard...

XML 1.0 was designed with the goal that writing a parser should be about
the right size for a student project.

Of course that's before namespaces, and schemas, and other things were
added to the mix.

Experience has shown that this is very much a 90/10 problem. You can get
90% of the behavior for 10% of the effort; the other 10% takes the other
90% (or more) of the effort. And making it perform well can add yet
another 90%...
 
