xml:base attribute added by a parser makes validation fail

SL · Jul 24, 2006

I am trying to validate against a schema a document that is stored in
several files by means of external entities. During parsing, the
parser adds an 'xml:base="url"' attribute to the root element of each
of these sub-trees, so validation of the document fails.

Is there a recommended solution for this situation? I have no idea how
to handle the problem: I don't want to add the xml:base attribute to
my schema, because that would let a question of syntax (the external
entities) leak into the vocabulary level.

Thanks for any suggestion.

Joe Kesselman · Jul 24, 2006

SL said:
The parser adds an 'xml:base="url"' attribute

Try another parser?

SL · Jul 24, 2006

Joe Kesselman wrote:

Try another parser?

Yes...

(I would rather weaken my schemas than break the abstraction of
calling a JAXP-conformant parser.)

Is this a quirk of the parser (Xerces-J) or normal behavior?
If it is the normal behavior, there is perhaps a recommended solution.

Joe Kesselman · Jul 25, 2006

SL said:
Is this a quirk of the parser (Xerces-J) or normal behavior?

There is a recently added feature (driven by the XInclude
recommendation) which may be related to this. See

http://xerces.apache.org/xerces2-j/features.html#xinclude.fixup-base-uris

Try turning that off, and see if it helps.
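
For readers who want to try this suggestion through JAXP, here is a minimal sketch. The feature identifier is the one listed on the Xerces2-J features page linked above; the class and method names are made up for illustration, and whether a particular JAXP implementation recognises the feature is an assumption, so the helper reports it instead of failing.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper class, not part of Xerces or JAXP.
class FixupBaseUris {
    // Feature identifier from the Xerces2-J features page.
    static final String FIXUP_BASE_URIS =
        "http://apache.org/xml/features/xinclude/fixup-base-uris";

    /** Returns true if the underlying parser accepted the setting. */
    static boolean disableFixup(DocumentBuilderFactory factory) {
        try {
            factory.setFeature(FIXUP_BASE_URIS, false);
            return true;
        } catch (ParserConfigurationException e) {
            // This JAXP implementation does not recognise the feature.
            return false;
        }
    }

    static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        disableFixup(factory); // best effort; the result is ignored here
        return factory;
    }
}
```

Note that this feature is described in the context of XInclude processing, so it may have no effect on xml:base attributes that come from external entities.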

(I still don't think an XML parser should be adding xml:base
automatically unless you have explicitly asked it to do so, but...)

SL · Jul 25, 2006

Joe Kesselman wrote:

There is a recently added feature (driven by the XInclude
recommendation) which may be related to this. See

http://xerces.apache.org/xerces2-j/features.html#xinclude.fixup-base-uris

Try turning that off, and see if it helps.

No, it doesn't help. There is a feature devoted to this in the DOM
API:

<http://apache.org/xml/features/dom/create-entity-ref-nodes>

according to this mail:

<http://marc.theaimsgroup.com/?l=xerces-j-dev&m=104057076909479&w=2>

Thanks for the pointer, it is in the right direction.
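
A minimal sketch of how this could be reached without leaving JAXP: the standard setExpandEntityReferences(false) switch asks the DOM builder to keep EntityReference nodes instead of expanding them, which, as far as I know, is what Xerces maps to the create-entity-ref-nodes feature named above. The class name is hypothetical, and whether this suppresses the xml:base attributes in a given Xerces version is an assumption based on this thread.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class for illustration.
class KeepEntityReferences {
    static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Standard JAXP setting: keep entity reference nodes in the DOM
        // rather than replacing them with their expanded content.
        factory.setExpandEntityReferences(false);
        return factory;
    }
}
```

The price is that the resulting DOM contains EntityReference nodes, which later processing steps have to be prepared for.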

George Bina · Jul 25, 2006

Hi,

The recommended solution is to make your schema aware of these
attributes and allow them. Look also over the following article
(including comments) for more details:
http://norman.walsh.name/2005/04/01/xinclude

Best Regards,
George
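
One way to make a W3C XML Schema tolerate these attributes is an attribute wildcard restricted to the XML namespace. The sketch below assumes a toy vocabulary (book, chapter and title are hypothetical names, not from the original post) and uses the standard javax.xml.validation API to check that a chapter carrying xml:base still validates.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example class; the vocabulary is invented for illustration.
class XmlBaseTolerantSchema {
    // The anyAttribute wildcard admits attributes from the XML namespace
    // (xml:base, xml:lang, ...) on chapter, and nothing else.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='chapter' maxOccurs='unbounded'><xs:complexType>"
      + "<xs:sequence><xs:element name='title' type='xs:string'/></xs:sequence>"
      + "<xs:anyAttribute namespace='http://www.w3.org/XML/1998/namespace'"
      + " processContents='lax'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if the given document validates against XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // validation error or unreadable input
        }
    }
}
```

With this schema a chapter element with xml:base='http://example.org/foo/bar.ent' is accepted, while an undeclared attribute from no namespace is still rejected. Importing a schema for the XML namespace and referencing xml:base explicitly would be a stricter alternative.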

SL · Jul 25, 2006

"George Bina" wrote:

Hi,

The recommended solution is to make your schema aware of these
attributes and allow them. Look also over the following article
(including comments) for more details:
http://norman.walsh.name/2005/04/01/xinclude

Thanks! This was exactly the kind of reference I was looking for. So
this behaviour is expected and the fix belongs in the schemas. I find
Daniel Veillard's comment quite curious, that xml:base is not
necessary if the included tree is in the same directory as the
including one; that is rather a detail.

I still find it curious that Xerces adds an xml:base attribute for
external /entities/. If I understand correctly, external entities are
a mechanism from the syntactic side of XML (or perhaps even below, on
the "physical" side), and there is no reason for them to change the
vocabulary at the semantic level.

Richard Tobin · Jul 25, 2006

I still find it curious that Xerces adds an xml:base attribute for
external /entities/. If I understand correctly, external entities are
a mechanism from the syntactic side of XML (or perhaps even below, on
the "physical" side), and there is no reason for them to change the
vocabulary at the semantic level.

The base URI for resolving a relative URI reference in an XML document
is the base URI of the element it appears in, and that (in the absence
of xml:base attributes) is the URI of the external or document entity
containing the element.

So if a document at http://example.org/doc.xml uses an external entity
foo/bar.ent, then the base URI for relative URI references in that
entity is http://example.org/foo/bar.ent. But if you parse that
document and write it out again as a single file, the entity
boundaries are lost. Inserting xml:base attributes *when you write it
out* is a way to preserve the meaning of relative URI references.
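
The resolution described here can be reproduced with java.net.URI. This is only a small worked example: the two URIs come from the paragraph above, while the relative reference img/fig.png is a hypothetical one added to show the difference.

```java
import java.net.URI;

// Hypothetical example class for illustration.
class EntityBaseUri {
    /** Resolves a relative reference against a base URI. */
    static URI resolve(String base, String relative) {
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        String doc = "http://example.org/doc.xml";
        // The entity's own URI, resolved against the document.
        URI entity = resolve(doc, "foo/bar.ent");
        System.out.println(entity);
        // A reference inside the entity resolves against the entity...
        System.out.println(resolve(entity.toString(), "img/fig.png"));
        // ...but against the document once the entity boundary is lost.
        System.out.println(resolve(doc, "img/fig.png"));
    }
}
```

The second and third lines printed differ (http://example.org/foo/img/fig.png versus http://example.org/img/fig.png), which is the change of meaning that an xml:base attribute written at serialization time prevents.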

I am unconvinced that it makes sense for the *parser* to insert the
xml:base attributes however; I would expect it instead to provide a
way to determine the base URI of an element taking into account the
entity boundaries.
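
In the DOM, Level 3 Core offers such a query as Node.getBaseURI(). The sketch below is an illustration with a hypothetical helper: it parses a string (the element names are invented) and asks an element for its base URI. Here the base comes from an explicit xml:base attribute; whether an implementation also tracks entity boundaries without inserting attributes is exactly the open question in this thread.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class BaseUriLookup {
    /** Parses xml and returns the base URI of the first element with the given name. */
    static String baseUriOfFirst(String xml, String systemId, String elementName)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputSource source = new InputSource(new StringReader(xml));
        source.setSystemId(systemId); // base URI of the document entity
        Document doc = builder.parse(source);
        Element element = (Element) doc.getElementsByTagName(elementName).item(0);
        // DOM Level 3: takes xml:base attributes on ancestors into account.
        return element.getBaseURI();
    }
}
```

For example, calling baseUriOfFirst on a document whose part element carries xml:base='http://example.org/foo/bar.ent' and asking for a child of part should report that URI for the child as well.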

-- Richard

SL · Jul 25, 2006

Richard Tobin wrote:

The base URI for resolving a relative URI reference in an XML
document is the base URI of the element it appears in, and that (in
the absence of xml:base attributes) is the URI of the external or
document entity containing the element.

So if a document at http://example.org/doc.xml uses an external
entity foo/bar.ent, then the base URI for relative URI references in
that entity is http://example.org/foo/bar.ent. But if you parse
that document and write it out again as a single file, the entity
boundaries are lost. Inserting xml:base attributes *when you write
it out* is a way to preserve the meaning of relative URI references.

Of course I understand the /utility/ of this xml:base attribute. But
I'm wondering whether it is not a violation of the /logic/ of the
layered nature of XML. With an xml:base attribute added to the root
elements of the sub-trees, a choice made at the entity level
constrains the semantic level.

One could even say that if you are interested in the "/meaning/", as
you put it, of the URI reference, you should not store this
information in the SYSTEM part of an entity declaration, since that
mechanism is for storing the document at the physical level, not for
expressing information at all. The division of the document into
files is not itself "XML information", if I understand the XML
infoset correctly. I can imagine that there is a good reason, in the
XInclude spec, for adding this xml:base attribute, since XInclude is a
mechanism defined at the vocabulary level; but adding this attribute
to trees pulled in through external entities seems to break
compatibility.

Am I wrong about this?

I am unconvinced that it makes sense for the *parser* to insert the
xml:base attributes however; I would expect it instead to provide a
way to determine the base URI of an element taking into account the
entity boundaries.

I agree; even the startEntity(String name) method in SAX does not
report the URI of the entity, and I don't understand why: it prevents
serialising the document as it was received (the same sub-trees in
the same files), and it limits the "bidirectionality" of SAX as an API.
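
A possible workaround, sketched under assumptions: the SAX Locator reports the system identifier of the entity in which the current event occurs, so a handler can record it alongside each element. The class below is hypothetical; it only shows the Locator mechanics on a single-entity document and does not by itself reconstruct entity boundaries.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration.
class EntityAwareHandler extends DefaultHandler {
    private Locator locator;
    final List<String> seen = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser before startDocument
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // System identifier of the entity containing this start tag.
        String systemId = (locator != null) ? locator.getSystemId() : null;
        seen.add(qName + " @ " + systemId);
    }

    /** Parses xml and returns one "name @ systemId" entry per element. */
    static List<String> trace(String xml, String systemId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        EntityAwareHandler handler = new EntityAwareHandler();
        InputSource source = new InputSource(new StringReader(xml));
        source.setSystemId(systemId);
        factory.newSAXParser().parse(source, handler);
        return handler.seen;
    }
}
```

Combined with the startEntity and endEntity callbacks of LexicalHandler, the recorded identifiers could in principle be used to split the output back into the original files.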
