xml:base attribute added by a parser makes validation fail

SL

I am trying to validate, against a schema, a document stored in several
files by means of external entities. During parsing, the parser adds an
'xml:base="url"' attribute to the root element of each of these
sub-trees, so the validation of the document fails.

Is there a recommended solution to this situation? I have no idea how
to handle the problem: I don't want to deal with a question of syntax
(the external entities) at the vocabulary level by adding the xml:base
attribute to my schema.

Thanks for any suggestions.
 
SL

Joe Kesselman wrote:
Try another parser?

Yes... :) (I would rather degrade my schemas than degrade the
abstraction of calling a JAXP-conformant parser.)

Is this a quirk of the parser (Xerces-J) or normal behavior?
If it is the normal behavior, there is perhaps a recommended solution.
 
SL

Joe Kesselman wrote:
There is a recently added feature (driven by the XInclude
recommendation) which may be related to this. See

http://xerces.apache.org/xerces2-j/features.html#xinclude.fixup-base-uris

Try turning that off, and see if it helps.

No, it doesn't help. There is a feature devoted to this in the DOM
API:

<http://apache.org/xml/features/dom/create-entity-ref-nodes>

according to this mail:

<http://marc.theaimsgroup.com/?l=xerces-j-dev&m=104057076909479&w=2>

Thanks for the pointer, it is in the right direction :)
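
For reference, a minimal sketch of how this can be requested through plain JAXP, without naming the Xerces feature URI directly. It assumes a Xerces-based parser, where, as far as I can tell, the standard setExpandEntityReferences(false) switch is what turns on create-entity-ref-nodes. The class and method names are invented for the example, and whether the xml:base attributes really disappear depends on the parser, as the mail above describes.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper: builds a JAXP factory that keeps entity references
// in the DOM instead of expanding them in place.
final class EntityRefParserConfig {
    static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Needed for schema validation of namespaced attributes such as xml:base.
        factory.setNamespaceAware(true);
        // Standard JAXP switch; assumed here to map to the Xerces feature
        // http://apache.org/xml/features/dom/create-entity-ref-nodes.
        factory.setExpandEntityReferences(false);
        return factory;
    }
}
```

The resulting DOM then contains EntityReference nodes, so any code that walks the tree has to expect them.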
 
SL

"George Bina" wrote:
Hi,

The recommended solution is to make your schema aware of these
attributes and allow them. Look also over the following article
(including comments) for more details:
http://norman.walsh.name/2005/04/01/xinclude

Thanks! This was exactly the kind of reference I was looking for. So
this behaviour is expected, and the fix has to be made in the schemas.
I find the comment by Daniel Veillard quite curious, on the fact that
xml:base is not necessary if the included tree is in the same
directory as the including one; it is rather a detail.
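
A minimal sketch of what "allowing them in the schema" can look like, using the standard javax.xml.validation API. The doc/part vocabulary, the class name and the helper names are invented for the example. The attribute wildcard restricted to the XML namespace is only one possible way to do it; declaring xml:base explicitly through an import of a schema for the XML namespace is another.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class XmlBaseSchemaSketch {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Hypothetical vocabulary: <doc> holding empty <part> elements.
    // With allowXmlAttributes, <part> accepts any attribute from the XML
    // namespace (xml:base, xml:lang, ...) without further checking.
    static String schema(boolean allowXmlAttributes) {
        String wildcard = allowXmlAttributes
            ? "<xs:anyAttribute namespace='" + XML_NS + "' processContents='lax'/>"
            : "";
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='doc'><xs:complexType><xs:sequence>"
             + "<xs:element name='part' maxOccurs='unbounded'>"
             + "<xs:complexType>" + wildcard + "</xs:complexType>"
             + "</xs:element>"
             + "</xs:sequence></xs:complexType></xs:element>"
             + "</xs:schema>";
    }

    // Returns true when the instance validates, false on a validation error.
    static boolean isValid(String schemaText, String instanceText) {
        Validator validator;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            validator = factory
                .newSchema(new StreamSource(new StringReader(schemaText)))
                .newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(instanceText)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this sketch, an instance such as `<doc><part xml:base='http://example.org/foo/bar.ent'/></doc>` is rejected by `schema(false)` and accepted by `schema(true)`.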

I still find it curious that Xerces adds an xml:base attribute for
external /entities/. If I understand correctly, external entities are a
mechanism from the syntactic side of XML (or perhaps even below, on
the "physical" side), and there is no reason to change the vocabulary
at the semantic level.
 
Richard Tobin

SL wrote:
I still find it curious that Xerces adds an xml:base attribute for
external /entities/. If I understand correctly, external entities are a
mechanism from the syntactic side of XML (or perhaps even below, on
the "physical" side), and there is no reason to change the vocabulary
at the semantic level.

The base URI for resolving a relative URI reference in an XML document
is the base URI of the element it appears in, and that (in the absence
of xml:base attributes) is the URI of the external or document entity
containing the element.

So if a document at http://example.org/doc.xml uses an external entity
foo/bar.ent, then the base URI for relative URI references in that
entity is http://example.org/foo/bar.ent. But if you parse that
document and write it out again as a single file, the entity
boundaries are lost. Inserting xml:base attributes *when you write it
out* is a way to preserve the meaning of relative URI references.
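
The resolution described above can be sketched with java.net.URI. The class name and the relative reference img/fig.png are made up for the illustration; only doc.xml and foo/bar.ent come from the example.

```java
import java.net.URI;

final class BaseUriSketch {
    // Resolves a (possibly relative) reference against a base URI.
    static URI resolve(String base, String reference) {
        return URI.create(base).resolve(reference);
    }

    public static void main(String[] args) {
        String documentUri = "http://example.org/doc.xml";

        // The external entity's own URI is resolved against the document.
        URI entityUri = resolve(documentUri, "foo/bar.ent");
        System.out.println(entityUri);   // http://example.org/foo/bar.ent

        // A relative reference written inside the entity uses the entity's URI...
        System.out.println(resolve(entityUri.toString(), "img/fig.png"));
        // http://example.org/foo/img/fig.png

        // ...but once the entity boundary is lost it would use the document's URI.
        System.out.println(resolve(documentUri, "img/fig.png"));
        // http://example.org/img/fig.png
    }
}
```

The last two results differ, which is exactly the information an xml:base attribute on the sub-tree root preserves.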

I am unconvinced that it makes sense for the *parser* to insert the
xml:base attributes however; I would expect it instead to provide a
way to determine the base URI of an element taking into account the
entity boundaries.
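
For what it is worth, DOM Level 3 has an accessor of this kind, Node.getBaseURI(). A minimal sketch of querying it follows; the class and method names are invented, and the sketch only shows the accessor honouring an xml:base attribute or falling back to the document URI. Whether a given parser also reports entity boundaries through it is something I have not checked.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ElementBaseUriSketch {
    // Parses xml as if it had been loaded from documentUri and returns the
    // base URI of the root element's first child (the input is assumed to
    // have no whitespace between the root start tag and that child).
    static String baseUriOfFirstChild(String xml, String documentUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            InputSource source = new InputSource(new StringReader(xml));
            source.setSystemId(documentUri);
            Document document = builder.parse(source);
            Node child = document.getDocumentElement().getFirstChild();
            return child.getBaseURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For `<doc><part xml:base='http://example.org/foo/bar.ent'/></doc>` loaded as http://example.org/doc.xml, the call should report the xml:base value for part; without the attribute it should report the document URI.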

-- Richard
 
SL

Richard Tobin wrote:
The base URI for resolving a relative URI reference in an XML
document is the base URI of the element it appears in, and that (in
the absence of xml:base attributes) is the URI of the external or
document entity containing the element.

So if a document at http://example.org/doc.xml uses an external
entity foo/bar.ent, then the base URI for relative URI references in
that entity is http://example.org/foo/bar.ent. But if you parse
that document and write it out again as a single file, the entity
boundaries are lost. Inserting xml:base attributes *when you write
it out* is a way to preserve the meaning of relative URI references.

Of course I understand the /utility/ of this xml:base attribute. But
I'm wondering whether it is not a violation of the /logic/ of the
layered nature of XML. With an xml:base attribute added on the sub-tree
root elements, a choice at the entity level constrains the semantic
level.

One could even say that if you are interested in the "/meaning/", as
you said, of the URI reference, you should not store this information
in the SYSTEM part of an entity declaration, since this mechanism is
for storing the document at the physical level, not for expressing
information at all. The division of the document into files is not
itself "XML information", if I understand the XML infoset correctly.
I can imagine that there is a good reason, in the XInclude spec, for
adding this xml:base attribute, since XInclude, for instance, is a
mechanism defined at the vocabulary level; but adding this attribute
to trees pulled in through external entities seems to break
compatibility.

Am I wrong about this?

I am unconvinced that it makes sense for the *parser* to insert the
xml:base attributes however; I would expect it instead to provide a
way to determine the base URI of an element taking into account the
entity boundaries.

I agree; even the startEntity(String name) method in SAX does not
report the URI of the entity, and I don't understand why: it prevents
serialising the document as it was received (the same sub-trees in
the same files), and limits the "bidirectionality" of SAX as an API.
 
