Hi, I'm new to this forum. I need some help with Java programming.
I'm writing a program that extracts the data from text files and stores the contents in their respective fields.
The input file looks like this:
<DOC>
<DOCNO> CNN19981001.0130.0263 </DOCNO>
<DOCTYPE> NEWS </DOCTYPE>
<TXTTYPE> CAPTION </TXTTYPE>
<TEXT>
The budget surplus was ignored by investors on Wall Street. The Dow
Jones industrial average lost 237 points to close at 7842. We'll have
more in "Dollars & Sense" at 46 minutes past the hour.
</TEXT>
</DOC>
I'd like to get output like this:
docno = CNN19981001.0130.0263
doctype = NEWS
and so on.
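If the files only ever contain simple <TAG> ... </TAG> pairs like the sample, one option is to skip XML parsing entirely and pull the fields out with a regular expression. Below is a minimal sketch under those assumptions: one <DOC> per file, uppercase tag names, and no '<' inside any field's text. The class name DocFields is just for illustration, and Files.readString needs Java 11 or later. It prints each field as lowercase name = trimmed value, e.g. docno = CNN19981001.0130.0263.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocFields {
    // <TAG> body </TAG>, where the closing tag repeats the opening name (\1).
    // The body [^<]* may span lines but cannot contain another tag, so the
    // enclosing <DOC> never matches and only the leaf fields are found.
    private static final Pattern FIELD =
            Pattern.compile("<([A-Z]+)>([^<]*)</\\1>");

    /** Maps each lowercase tag name to its trimmed body, in file order. */
    public static Map<String, String> extract(String doc) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(doc);
        while (m.find()) {
            fields.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Assumes the path of the input file is passed as the first argument.
        String doc = Files.readString(Path.of(args[0]));
        for (Map.Entry<String, String> e : extract(doc).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Usage would be something like java DocFields input.txt. The bare '&' is harmless here because nothing interprets it. If a file holds several <DOC> records, later values would overwrite earlier ones in the map, so you would need to split the text on <DOC> first.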
I tried parsing it as XML (see my code below), but I got an error when the parser reached the "&" character:
"org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference."
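For context on that message: in XML, '&' always starts an entity or character reference such as &amp;, so a literal ampersand has to be written as &amp;. A file with a bare '&' in "Dollars & Sense" is therefore not well-formed XML, and a standard XML parser is required to reject it. A small sketch reproducing this with the JDK's built-in DocumentBuilder (AmpersandDemo and rootText are made-up names for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class AmpersandDemo {
    /** Parses a small XML string and returns the root element's text content. */
    public static String rootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Escaped ampersand: well-formed, and the parser decodes &amp; back to '&'.
        System.out.println(rootText("<T>Dollars &amp; Sense</T>")); // Dollars & Sense
        try {
            // Bare ampersand followed by a space: not well-formed, so parse() throws.
            rootText("<T>Dollars & Sense</T>");
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```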
I was wondering whether there is a better way to read in the file.
My code:
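import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
// parse() throws the SAXParseException above: the bare '&' in <TEXT> is not well-formed XML
org.w3c.dom.Document tempDoc = dBuilder.parse(file);
tempDoc.getDocumentElement().normalize();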
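If you would rather keep the DocumentBuilder approach, one workaround is to read the file as text, rewrite each bare '&' as &amp; before parsing, and then walk the child elements of <DOC>. A rough sketch, assuming one <DOC> per file and that '&' is the only thing breaking well-formedness (a stray '<' in the text would still fail). DocParser, escapeBareAmpersands and toFieldLines are hypothetical names, and Files.readString needs Java 11 or later:

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DocParser {
    /**
     * Rewrites every '&' that does not already start a reference
     * (&amp; &lt; &gt; &quot; &apos; &#38; &#x26;) as &amp;.
     */
    public static String escapeBareAmpersands(String text) {
        return text.replaceAll("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)", "&amp;");
    }

    /** Escapes, parses, and returns one "name = value" line per child element of the root. */
    public static String toFieldLines(String raw) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(escapeBareAmpersands(raw))));
        doc.getDocumentElement().normalize();
        StringBuilder out = new StringBuilder();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text nodes
                out.append(child.getNodeName().toLowerCase(Locale.ROOT))
                   .append(" = ")
                   .append(child.getTextContent().trim())
                   .append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumes the path of the input file is passed as the first argument.
        System.out.print(toFieldLines(Files.readString(Path.of(args[0]))));
    }
}
```

The negative lookahead leaves existing references such as &amp; or &#38; untouched, so running the escape twice does not double-escape anything.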
Please help me. It's urgent. Thanks in advance.