including xml entities with their own doctypes

  • Thread starter the.computational biologist

the.computational biologist

hi all - this is a pretty newbie question, so sorry if it's easily
found (though i've searched for a while and can't find a definitive
answer)...

i have a DTD (doctype), say A:

<!DOCTYPE my_container [
<!ELEMENT my_container (my_parent_item)>
<!ELEMENT my_parent_item (my_child_item)>
<!ELEMENT my_child_item EMPTY>
]>

now i'd like to have an example of this container as (something like):

<!DOCTYPE my_container SYSTEM "my_container.dtd" [
<!ENTITY include_file SYSTEM "my_parent_item.xml">
]>
<my_container>&include_file;</my_container>

the problem is, i'd like this "my_parent_item.xml" file to be
"stand-alone" in the sense that it will have its own DTD, with a DOCTYPE of
my_parent_item (i.e. i don't expect the my_parent_item to have to know
that it may be inside a my_container).
furthermore, the my_parent_item DOCTYPE definition may provide
additional features about the my_parent_item object that my_container
didn't know about (e.g. maybe a "name" attribute).

when a validator processes the &include_file; entity, will i wind up
with an error due to multiple DOCTYPE declarations (i.e. will most
validators try to read the DOCTYPE of an *included* XML file)?

thanks for any insight into this, and even some links in the right
direction with such an example would be wonderfully appreciated.

cheers!
 
Bjoern Hoehrmann

* the.computational biologist wrote in comp.text.xml:
<!DOCTYPE my_container SYSTEM "my_container.dtd" [
<!ENTITY include_file SYSTEM "my_parent_item.xml">
]>
<my_container>&include_file;</my_container>

the problem is, i'd like this "my_parent_item.xml" file to be
"stand-alone" in the sense that it will have its own DTD, with a DOCTYPE of
my_parent_item (i.e. i don't expect the my_parent_item to have to know
that it may be inside a my_container).

That is not possible: XML does not permit a document type declaration
in an entity's replacement text, because entity inclusion is, to some
extent, a literal text substitution mechanism. Standards like XInclude
allow this, but you'd then need tools that support XInclude.

furthermore, the my_parent_item DOCTYPE definition may provide
additional features about the my_parent_item object that my_container
didn't know about (e.g. maybe a "name" attribute).

You can split the document type definition over multiple files, but
you cannot use the type definitions from a different XML document.
Technically it might be possible to arrange a RELAX NG schema like
that, but you'd be dealing with elements instead of special language
constructs like DTDs.
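
As a rough sketch of splitting the document type definition over multiple files, my_container.dtd could pull in the declarations for my_parent_item through an external parameter entity. The file name my_parent_item.dtd and the entity name parent_item_decls are assumptions made up for this illustration.

```xml
<!-- my_container.dtd: declares only the container itself -->
<!ELEMENT my_container (my_parent_item)>

<!-- Declare an external parameter entity pointing at the shared
     declarations, then reference it so they are read in here. -->
<!ENTITY % parent_item_decls SYSTEM "my_parent_item.dtd">
%parent_item_decls;
```

Under this arrangement my_parent_item.dtd would hold the ELEMENT (and any ATTLIST) declarations for my_parent_item and my_child_item, so a stand-alone document could name that file directly in its own DOCTYPE. Only the declarations are shared this way; the data file included through a general entity still cannot carry a DOCTYPE of its own.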

thanks for any insight into this, and even some links in the right
direction with such an example would be wonderfully appreciated.

http://www.w3.org/TR/xml/#NT-extParsedEnt has the requirements for
external parsed entities: apart from an optional text declaration, they
need to match the `content` production, which allows character data
and elements (and does not require a single root element), but no
document type declaration.
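
To make that concrete, here is a sketch of what my_parent_item.xml could look like if it is to be usable as the external parsed entity from the example above. The first line is a text declaration, which is optional; the UTF-8 encoding is an assumption. What matters is that there is no <!DOCTYPE ...> line anywhere in the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<my_parent_item>
  <my_child_item/>
</my_parent_item>
```

When the container document is validated, these elements are checked against the container's DTD only, since that is the only DTD in play.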
 
the.computational biologist

thanks for the info!

it's a bit disappointing to find out, though.
with my particular application, i'd like to re-use some pieces of data
between XML documents, but have those data stand on their own, too.
specifically i'm working on annotated matrices, where the matrix
itself is made up of row information, col information, and the value-
containing cells.
since i'll be concatenating matrices often, sharing the row and column
keys between matrices is useful.
but furthermore, there is often semi-structured annotation attached to
each row and column that i'd like to be able to retrieve separately
from the value-holding matrices (i.e. displaying info about the
columns and rows doesn't require knowing what the cell values are for
any particular matrix).

i had envisioned separate column annotation and row annotation
documents (i.e. with their own DOCTYPE declaration), and then
embedding those column and row files into the individual matrices
(with *their* own DOCTYPE declaration).

ah well, maybe this is why proprietary (and thus less-exchangeable)
file formats will never completely die off :-/

cheers, and thanks again for the info and link!

-m

 
Alain Ketterlin

it's a bit disappointing to find out, though.
with my particular application, i'd like to re-use some pieces of data
between XML documents, but have those data stand on their own, too.

Define external entities (i.e., without doctype) that contain document
data, and then a separate document (i.e., with doctype) for each
possible assembly of entities you're interested in. Can be cumbersome.
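
A minimal sketch of one such assembly, using the matrix case described earlier in the thread. All names here (matrix, cells, matrix.dtd, rows.xml, cols.xml) are hypothetical; rows.xml and cols.xml are assumed to be DOCTYPE-free files holding the row and column annotation elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE matrix SYSTEM "matrix.dtd" [
  <!-- Shared, DOCTYPE-free data files declared as external parsed entities -->
  <!ENTITY rows SYSTEM "rows.xml">
  <!ENTITY cols SYSTEM "cols.xml">
]>
<matrix>
  &rows;
  &cols;
  <cells><!-- cell values for this particular matrix --></cells>
</matrix>
```

To view the row annotation on its own, you would write a second small wrapper document with its own DOCTYPE and a root element around &rows;. That is the cumbersome part: one wrapper per assembly.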

Or use something like XInclude, provided the processors you use support
it. Actually, depending on how you are going to process the data, there
may be easier solutions. For instance, XSLT processors are able to
include external documents; typically, an XML document may refer to
other documents (not entities) in attribute values, which are then
included when necessary.
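
For the XSLT route, a sketch along these lines might do, assuming the matrix document contains a hypothetical element such as <rows_ref href="rows.xml"/> where the row annotation should appear. The stylesheet copies everything else through unchanged and swaps the reference for the root element of the referenced document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy attributes and nodes through as they are. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace the (hypothetical) rows_ref element with the root element
       of the document named in its href attribute. -->
  <xsl:template match="rows_ref">
    <xsl:copy-of select="document(@href)/*"/>
  </xsl:template>

</xsl:stylesheet>
```

Because document() loads the target as a complete XML document rather than as an entity, the referenced file can keep its own DOCTYPE declaration.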

-- Alain.
 
Peter Flynn

thanks for the info!

it's a bit disappointing to find out, though.

I'm afraid it's part of the standard: this is one of those restrictions
inherited from SGML which has been a PITA for a couple of decades.

One way round it is to maintain the document fragments with their own
Document Type Declaration (as the first line) so that you can edit them
stand-alone, but export them without the top line to a related filename
after each edit, eg

$ tail -n +2 matrixfoo.xml >MatrixFoo.xml
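
Since that command drops exactly the first line, the editable file has to be laid out with this in mind. A sketch of such a file, reusing the element names from the original question (my_parent_item.dtd and the name attribute are assumptions):

```xml
<!DOCTYPE my_parent_item SYSTEM "my_parent_item.dtd">
<my_parent_item name="example">
  <my_child_item/>
</my_parent_item>
```

The whole document type declaration must fit on line 1, with nothing before it (no XML declaration) and no internal subset spilling onto later lines; otherwise the exported file would still contain DOCTYPE fragments. Note also that an attribute like name would have to be declared in the container's DTD as well, because that is the DTD the exported fragment gets validated against.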

If you use a programmable XML editor, you may be able to program it to
do this automatically each time you save-and-close.

ah well, maybe this is why proprietary (and thus less-exchangeable)
file formats will never completely die off :-/

There is no reason why this kind of file management shouldn't be built
into existing editors, but I don't know any which do it off-the-shelf.
I suspect it would be relatively trivial to do this for Emacs and for
the Arbortext Editor. *Lots* of people who still work with DTDs would be
very happy to see that.

///Peter
 
Joe Kesselman

XML Schemas are more flexible in this respect than DTDs (as they are in
other respects, which is why folks are being encouraged to move in that
direction.) Of course, the declaration for the element into which you
insert the "foreign" element must say that this is permitted.
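
One way a container declaration might say so is with an element wildcard. This is only a sketch, assuming the container from the original question and no namespaces on the inserted element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="my_container">
    <xs:complexType>
      <xs:sequence>
        <!-- Accept exactly one child element of any name. With "lax",
             the child is validated if a declaration for it can be
             found, and accepted as it stands otherwise. -->
        <xs:any namespace="##any" processContents="lax"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With processContents="lax" the container schema does not need to know about extras such as a name attribute on the inserted element; a separate schema for that element can describe them.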

However, Schemas don't handle defining parsed entities. As others have
said, you'd need to use XInclude or some similar mechanism to achieve
that combination.
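
For reference, a sketch of what the XInclude version of the original example might look like. It assumes an XInclude-aware processor, and the file names are the ones from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<my_container xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- The processor replaces this element with the root element of
       my_parent_item.xml, which is parsed as a complete document. -->
  <xi:include href="my_parent_item.xml"/>
</my_container>
```

Because the included file is parsed as a document in its own right, it may carry its own DOCTYPE. The container shown here has no DOCTYPE on purpose: a DTD-validating parser that checks the document before inclusion would see xi:include and the xmlns:xi attribute, which the DTD from the question does not declare. Validation therefore generally has to happen after inclusion, or with a schema language instead.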
 
