Hi,
I am parsing a small XML document, and the parsing goes 'all funny' on this element: <useragent>Mozilla/4.61 [en] (WinNT; I)</useragent>
I've created a subclass of org.xml.sax.helpers.DefaultHandler, and an
instance of this subclass is set on my
org.apache.xerces.parsers.SAXParser:
// pdh is an instance of my DefaultHandler subclass
SAXParser parser = new SAXParser();
parser.setContentHandler(pdh);
parser.setErrorHandler(pdh);
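For anyone reproducing this without the Xerces jar on the classpath, the same wiring can be sketched through the standard JAXP API, which returns an `XMLReader` that accepts the same `setContentHandler`/`setErrorHandler` calls. This is a sketch under stated assumptions: `ElementCounter` and `countElements` are hypothetical names standing in for `pdh`, and the JDK's built-in (Xerces-derived) parser replaces `org.apache.xerces.parsers.SAXParser`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the "pdh" handler: it only counts start tags.
class ElementCounter extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    // Registers one handler as both content and error handler, as in the snippet above,
    // but obtains the XMLReader through JAXP (assumption: no direct Xerces dependency).
    static int countElements(String xml) {
        try {
            ElementCounter pdh = new ElementCounter();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(pdh);
            reader.setErrorHandler(pdh); // DefaultHandler also implements ErrorHandler
            reader.parse(new InputSource(new StringReader(xml)));
            return pdh.elements;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```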
I've found that the

public void characters(char[] ch, int offset, int length) throws SAXException

method is called once per element parsed; my debug output confirms this. For example, when parsing <useragent>MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300; Microsoft; Windows; GenericLarge)</useragent> it reads:
D: reading characters...(useragent) length=89, offset=721, found='MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300; Microsoft; Windows; GenericLarge)'
D: ending element (useragent) current element value is : [MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300; Microsoft; Windows; GenericLarge)]
But... when parsing <useragent>Mozilla/4.61 [en] (WinNT; I)</useragent> the debug output reads:
D: reading characters...(useragent) length=16, offset=1097, found='Mozilla/4.61 [en'
D: reading characters...(useragent) length=1, offset=0, found=']'
D: reading characters...(useragent) length=11, offset=1114, found=' (WinNT; I)'
D: ending (useragent) current element value is : [ (WinNT; I)]
It calls the characters method three times?!
Does the [en] bit in the element value have anything to do with this?
I would like to understand what is happening and why.
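The SAX contract for `ContentHandler.characters` explicitly allows a parser to deliver contiguous character data in a single call or to split it across several calls, so a handler cannot rely on one call per element. One way to see what a given parser does with the `[en]` value is to record every chunk separately, mirroring the debug output above. This is a hypothetical diagnostic (`ChunkLogger` and `chunksOf` are invented names) running on the JDK's built-in parser rather than a standalone Xerces jar; how many chunks it reports may differ between parsers and versions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: keeps each characters() chunk as a separate string.
class ChunkLogger extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start + length) is meaningful; the rest of the array
        // is the parser's own buffer and may hold unrelated data.
        chunks.add(new String(ch, start, length));
    }

    static List<String> chunksOf(String xml) {
        try {
            ChunkLogger logger = new ChunkLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), logger);
            return logger.chunks;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Calling `ChunkLogger.chunksOf("<useragent>Mozilla/4.61 [en] (WinNT; I)</useragent>")` may return one string or several; joined in order, the chunks reproduce the element's full text.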
(As a 'temp fix' I thought of having my DefaultHandler's characters(...)
method concatenate the characters read until endElement(...) is
invoked, but that seems to break everything.)
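Concatenating until `endElement(...)` fires is the usual way to cope with chunked `characters(...)` calls. One likely way it "breaks everything" is a buffer that is never cleared, so text from earlier elements leaks into later ones. A minimal sketch, assuming flat (leaf-only) elements such as `<useragent>`; `TextCollector` and `valuesOf` are hypothetical names, and the JDK's JAXP parser stands in for Xerces.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that buffers characters() chunks until the element closes.
// Assumes leaf elements only: text of a parent that has child elements is not preserved.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    // Element name -> text; if a name repeats, the last value wins.
    final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // discard text seen before this element opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // append each chunk, never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        values.put(qName, buffer.toString()); // the full text is only known here
        buffer.setLength(0);
    }

    static Map<String, String> valuesOf(String xml) {
        try {
            TextCollector collector = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
            return collector.values;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

The value is read in `endElement` rather than in `characters`, which is why splitting the text into several chunks no longer matters.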
Thanks for your input.
Fred.