DOM parsing - Document root element is missing.

Rico · Oct 17, 2004

The following piece of code:

DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(filename);

ends in "Document root element is missing" for the following XML:

<?xml version="1.0" encoding="utf-8"?>
<EmailSender>
<db_name>master</db_name>
<document_type>document_New</document_type>
<emailID />
<document_ID>23983</document_ID>
</EmailSender>

I don't really know how the XML is being produced but a space between the
last double-quote and the last '?' seems to solve the problem.
So does changing double-quotes to single-quotes.

Is there something wrong with the XML document, or am I missing something
about the usage of the API?

Thanks. Regards,
Rico.

xarax · Oct 17, 2004

Rico said:
The following piece of code :

DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(filename);

ends in "Document root element is missing" for the following XML:

<?xml version="1.0" encoding="utf-8"?>
<EmailSender>
<db_name>master</db_name>
<document_type>document_New</document_type>
<emailID />
<document_ID>23983</document_ID>
</EmailSender>

I don't really know how the XML is being produced but a space between the
last double-quote and the last '?' seems to solve the problem.
So does changing double-quotes to single-quotes.

Is it something wrong with the XML document or am I missing something
about the usage of the API ?

The first line of your XML file is not valid XML header syntax,
according to the rules of XML.

<?xml version='1.0' encoding='UTF-8' ?>

The line above is an example of a correct XML header.
The header is *not* itself ordinary XML markup, because its
keywords must be specified in a fixed order. (Attribute
keywords that appear within the XML body can be
specified in any order.) I use single quotes in
preference to double quotes, but the space appearing
before the final ? is required.

Rico · Oct 18, 2004

xarax said:
The first line of your XML file is not valid XML header syntax,
according to the rules of XML.

<?xml version='1.0' encoding='UTF-8' ?>

The line above is an example of a correct XML header.
The header is *not* itself ordinary XML markup, because its
keywords must be specified in a fixed order. (Attribute
keywords that appear within the XML body can be
specified in any order.) I use single quotes in
preference to double quotes, but the space appearing
before the final ? is required.

Thanks for the input, xarax. However, after checking, I don't think the
space is required. The file is produced by a program written in
VB.Net and I am reading it using the Java DOM package.
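
For reference, the XML 1.0 grammar lists the whitespace before the closing
'?>' of the declaration as optional. A minimal way to check this against the
parser is sketched below. It assumes the JAXP parser bundled with a current
JDK; the class name DeclSpaceCheck and the helper rootName are made up for
illustration, and the XML is a shortened copy of the document above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of the original program.
class DeclSpaceCheck {
    // Parses the XML text from memory and returns the root element's name.
    static String rootName(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String body = "<EmailSender><db_name>master</db_name></EmailSender>";
        // Declaration without a space before "?>", double quotes.
        System.out.println(
                rootName("<?xml version=\"1.0\" encoding=\"utf-8\"?>" + body));
        // Declaration with a space before "?>", single quotes.
        System.out.println(
                rootName("<?xml version='1.0' encoding='UTF-8' ?>" + body));
    }
}
```

If both calls return the root element name, the declaration itself is not
what the parser is objecting to, and the cause has to be somewhere else in
the bytes of the file.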

For some reason, if I modify the file in any way and save it before getting
my program to read it, the parsing goes fine. No missing root element or
anything. That's what was happening when I added the space: I was only
making the header match the one in my own test files, which had worked,
and re-saving the file in the process.

Any further pointers would be very much appreciated. Thanks.

Rico.

Sudsy · Oct 18, 2004

Rico wrote:

Any further pointers would be very much appreciated. Thanks.

Rico.

So as soon as you touch the file with an editor it parses
correctly? So now you have enough information to start on
the process of discovery!
Edit the file, making no changes, save, and exit.
Next use a binary comparator to check for differences.
Perhaps it's as simple as the ^Z used to mark end-of-file
in the M$ world.
Possibly the line termination characters: \r\n in the M$
world, \n in *NIX. Could be the cause, as the problem is
manifesting itself in the first line of the file, no?
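
If no binary comparator is at hand, the comparison described above can be
sketched in a few lines of Java. This is only an illustration: the class name
ByteDiff and its two helpers are invented here, the two file paths are taken
from the command line, and Files.readAllBytes needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for comparing the original and re-saved files.
class ByteDiff {
    // Returns the offset of the first differing byte, or -1 if both arrays
    // are identical. If one array is a prefix of the other, the offset
    // returned is the length of the shorter array.
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    // Formats up to 'count' leading bytes as two-digit upper-case hex values
    // separated by single spaces.
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] resaved = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("original starts: " + hexPrefix(original, 16));
        System.out.println("resaved starts:  " + hexPrefix(resaved, 16));
        System.out.println("first difference at offset: "
                + firstDifference(original, resaved));
    }
}
```

Printing the leading bytes of both files is the useful part here, since the
error shows up on the first line: anything in front of the 3C 3F ("<?") that
starts the header would stand out immediately.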

Rico · Oct 18, 2004

Sudsy wrote:

So as soon as you touch the file with an editor it parses
correctly? So now you have enough information to start on
the process of discovery!
Edit the file, making no changes, save, and exit.
Next use a binary comparator to check for differences.
Perhaps it's as simple as the ^Z used to mark end-of-file
in the M$ world.

Thanks Sudsy. This sounds like a good line of reasoning. Both my Java
program and the VB.NET program are running on Win2K Pro.
Vim on Cygwin reports that I've got an "incomplete last line",
so the above guess could be in the right direction...

After appending "\n" to the file, Vim no longer complains, but there are
some rubbish characters before the header, which even Textpad displays in
binary mode for the unmodified file coming from the VB.NET program.

It turns out the machine on which the VB.NET program was compiled is
running with some Unicode settings that produce what shows up as garbage
on my PC. Textpad gets rid of those characters upon saving, and that's why
I could parse the file afterwards.

Rico.

Sudsy · Oct 18, 2004

Rico wrote:

Appending "\n" to the file had Vim not complaining anymore but there's
some rubbish characters before the header, which even Textpad displays in
binary mode for the unmodified file coming from the VB.NET program.

If you check the archives you'll find mention of a BOM, or Byte Order Mark.
It sounds like you'll have to perform some pre-processing of this file
before trying to parse it. But I think you already know this by now...
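
One possible shape for that pre-processing is sketched below. It assumes the
rubbish characters are a UTF-8 BOM (the bytes EF BB BF), which fits the
encoding="utf-8" in the header but has not been confirmed for this file; a
UTF-16 BOM would need different handling. The class name BomSkipper and its
methods are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class: skips a UTF-8 BOM before handing the stream
// to the DOM parser.
class BomSkipper {
    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF) if
    // one is present; otherwise the bytes already read are pushed back so
    // the caller sees the stream from its first byte.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may deliver fewer bytes than requested, so loop until
        // three bytes have arrived or the stream ends.
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pb.unread(head, 0, n);
        }
        return pb;
    }

    // Parses the stream with the default DocumentBuilder after BOM skipping.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(skipUtf8Bom(in));
    }
}
```

With this in place, the original call would become something like
BomSkipper.parse(new FileInputStream(filename)) instead of
docBuilder.parse(filename). Files without a BOM pass through unchanged.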
