DOM parsing - Document root element is missing.

Rico

The following piece of code :

DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(filename);


ends in "Document root element is missing" for the following XML:

<?xml version="1.0" encoding="utf-8"?>
<EmailSender>
<db_name>master</db_name>
<document_type>document_New</document_type>
<emailID />
<document_ID>23983</document_ID>
</EmailSender>


I don't really know how the XML is being produced but a space between the
last double-quote and the last '?' seems to solve the problem.
So does changing double-quotes to single-quotes.

Is it something wrong with the XML document or am I missing something
about the usage of the API ?

Thanks. Regards,
Rico.
 
xarax

Rico said:
The following piece of code :

DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(filename);


ends in "Document root element is missing" for the following XML:

<?xml version="1.0" encoding="utf-8"?>
<EmailSender>
<db_name>master</db_name>
<document_type>document_New</document_type>
<emailID />
<document_ID>23983</document_ID>
</EmailSender>


I don't really know how the XML is being produced but a space between the
last double-quote and the last '?' seems to solve the problem.
So does changing double-quotes to single-quotes.

Is it something wrong with the XML document or am I missing something
about the usage of the API ?

The first line of the XML file is not XML syntax.
That's according to the rules of XML.

<?xml version='1.0' encoding='UTF-8' ?>

The first line above is an example of a correct
XML header. It is *not* XML, because the keywords
must be specified in the correct order. (Attribute
keywords that appear within the XML body can be
specified in any order.) I use single quotes in
preference to double quotes, but the space appearing
before the final ? is required.
 
Rico

The first line of the XML file is not XML syntax.
That's according to the rules of XML.

<?xml version='1.0' encoding='UTF-8' ?>

The first line above is an example of a correct
XML header. It is *not* XML, because the keywords
must be specified in the correct order. (Attribute
keywords that appear within the XML body can be
specified in any order.) I use single quotes in
preference to double quotes, but the space appearing
before the final ? is required.

Thanks for the input, xarax. However, after checking, I don't think the
space is required. The file is produced by a program written in VB.NET
and I am reading it using the Java DOM package.

For some reason, if I somehow modify and save the file before getting my
program to read it, the parsing goes fine. No missing root element or
anything. That's what was happening when I added the space, to match what
worked when I had been testing my program using my own files.

Any further pointers would be very much appreciated. Thanks.

Rico.
 
Sudsy

Rico wrote:
Any further pointers would be very much appreciated. Thanks.

Rico.

So as soon as you touch the file with an editor it parses
correctly? So now you have enough information to start on
the process of discovery!
Edit the file, making no changes, save, and exit.
Next use a binary comparator to check for differences.
Perhaps it's as simple as the ^Z used to mark end-of-file
in the M$ world.
Possibly the line termination characters: \r\n in the M$
world, \n in *NIX. Could be the cause, as the problem is
manifesting itself in the first line of the file, no?
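
The comparison suggested above can also be done with a few lines of Java if no binary diff tool is at hand. This is only a rough sketch: the class name FirstDifference and its helper methods are made up for illustration, and the two file paths are whatever you pass on the command line (the untouched file and the copy re-saved from the editor).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FirstDifference {

    /** Returns the offset of the first differing byte, or -1 if both arrays are identical. */
    static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Same prefix: either identical, or one file is longer than the other.
        return a.length == b.length ? -1 : common;
    }

    /** Formats up to count bytes starting at from as two-digit hex values. */
    static String hex(byte[] data, int from, int count) {
        StringBuilder sb = new StringBuilder();
        int end = Math.min(data.length, from + count);
        for (int i = from; i < end; i++) {
            sb.append(String.format("%02X ", data[i] & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FirstDifference <original> <resaved>");
            return;
        }
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] resaved = Files.readAllBytes(Paths.get(args[1]));

        int offset = firstDifference(original, resaved);
        if (offset < 0) {
            System.out.println("Files are byte-for-byte identical.");
        } else {
            System.out.println("First difference at offset " + offset);
            System.out.println("original: " + hex(original, offset, 8));
            System.out.println("resaved:  " + hex(resaved, offset, 8));
        }
    }
}
```

A difference at offset 0 points at something in front of the XML declaration; a difference only at the very end points at a trailing ^Z or a missing final newline.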
 
Rico

So as soon as you touch the file with an editor it parses
correctly? So now you have enough information to start on
the process of discovery!
Edit the file, making no changes, save, and exit.
Next use a binary comparator to check for differences.
Perhaps it's as simple as the ^Z used to mark end-of-file
in the M$ world.

Thanks Sudsy. This sounds like a good line of reasoning. Both my Java
program and the VB.NET program are running on Win2K Pro.
Vim on Cygwin reports that I've got an "incomplete last line".
So the above guess could be in the right direction...

Appending "\n" to the file stopped Vim from complaining, but there are
some rubbish characters before the header, which even Textpad displays in
binary mode for the unmodified file coming from the VB.NET program.

It turns out the machine on which the VB.NET program was compiled is
running some Unicode settings that produced garbage on my PC. Textpad
manages to get rid of that upon saving and that's why I could parse it
afterwards.

Rico.
 
Sudsy

Rico wrote:
Appending "\n" to the file stopped Vim from complaining, but there are
some rubbish characters before the header, which even Textpad displays in
binary mode for the unmodified file coming from the VB.NET program.

If you check the archives you'll find mention of a BOM, or Byte Order Mark.
It sounds like you'll have to perform some pre-processing of this file
before trying to parse it. But I think you already know this by now...
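
For what it's worth, the pre-processing could look something like the sketch below. It assumes the rubbish characters are a UTF-8 byte order mark (the bytes EF BB BF), which the thread does not actually confirm; a UTF-16 BOM would need different handling. The class name BomSkippingParser and its methods are invented for this example. Only DocumentBuilder.parse(InputStream) and PushbackInputStream are standard API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class BomSkippingParser {

    /** Returns a stream positioned just past a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: put the bytes back so the parser sees the whole document.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Parses XML from the stream after dropping an assumed UTF-8 BOM. */
    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(skipUtf8Bom(in));
    }

    public static void main(String[] args) throws Exception {
        // Build a small in-memory document with a BOM in front, for demonstration only.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<EmailSender><db_name>master</db_name></EmailSender>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);

        Document doc = parse(new ByteArrayInputStream(withBom));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

For a real file you would pass a FileInputStream to parse() instead of the in-memory bytes. Fixing the encoding settings on the VB.NET side so no BOM is written may be the cleaner fix if that program is under your control.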
 
