Kai Schlamp
Hi!
I tried to parse PubMed (a biomedical article database) with SAX and
also with StAX. The latter failed, but I am not sure why (see the
exception below).
Why does SAX succeed while StAX doesn't?
The XML document seems to be fine (see
http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml)
Any suggestions?
Kai
StAX example:
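import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// The URL must be a single string literal; it was wrapped across two lines.
String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml";
URL url = new URL(address);
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser =
    factory.createXMLStreamReader(url.openConnection().getInputStream());
while (parser.hasNext()) {
    switch (parser.getEventType()) {
        // event handling omitted
    }
    parser.next();
}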
Error message:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
Message: A '(' character or an element type is required in the
declaration of element type "PubMedPubDate".
SAX example:
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
parserFactory.setValidating(true);
parserFactory.setNamespaceAware(true);
SAXParser parser = parserFactory.newSAXParser();
parser.parse(url.openConnection().getInputStream(), new PubmedEFetchHandler());
(PubmedEFetchHandler is a simple DefaultHandler with some debugging
output.)
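The error message refers to an element type declaration, which is a DTD construct, so one way to narrow this down might be to run the same StAX loop with DTD support switched on and off. Below is a minimal sketch of such a check. The class name StaxDtdCheck and the method countStartElements are made up for illustration, and it is only an assumption (not tested against the PubMed document) that the setting changes the outcome. XMLInputFactory.SUPPORT_DTD itself is a standard StAX property.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original code.
final class StaxDtdCheck {

    // Walks the whole stream and counts START_ELEMENT events.
    // supportDtd toggles the standard StAX property XMLInputFactory.SUPPORT_DTD.
    static int countStartElements(InputStream in, boolean supportDtd) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.valueOf(supportDtd));
        int count = 0;
        try {
            XMLStreamReader parser = factory.createXMLStreamReader(in);
            try {
                while (parser.hasNext()) {
                    if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
            } finally {
                parser.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers can compare both settings without checked exceptions.
            throw new IllegalStateException(
                "StAX parse failed (supportDtd=" + supportDtd + ")", e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Small inline document without a DOCTYPE, used only to exercise the loop.
        byte[] xml = "<a><b/></a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countStartElements(new ByteArrayInputStream(xml), false));
    }
}
```

To try it on the real data, the stream from url.openConnection().getInputStream() could be passed in place of the inline document, once with true and once with false, to see whether the ParseError only appears when the DTD is processed.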