In a DTD, how do I specify that an element contains arbitrary other markup?

Simon Brooke

I maintain a DTD which is used to specify XML documents which are mostly
marked up in the dialect specified by the DTD ('ADL'), but in which there
are three elements whose contents are intended to be arbitrary XHTML 1.1.

Currently I've declared these as #PCDATA and simply ignore the parse
errors this causes, but this isn't very good and I'm not proud of it.

Is there a syntax for saying 'this element contains markup from this
other namespace' (I assume not, since SGML doesn't know about XML
namespaces?)? Otherwise, is there a syntax for saying 'this element
contains arbitrary markup'?

I know that I could include the XHTML DTD into my DTD but I'd prefer not
to do this as I'd prefer to keep the namespaces separate - to be able to
do:

<adl:topmatter>
<xhtml:div class="top">
<xhtml:p>This appears at the top of every page</xhtml:p>
</xhtml:div>
</adl:topmatter>

I know also that I really ought to move to an XSD schema, but I find them
just too prolix and awkward to work with!
 
Martin Honnen

Simon said:
Is there a syntax for saying 'this element contains markup from this
other namespace' (I assume not, since SGML doesn't know about XML
namespaces?)? Otherwise, is there a syntax for saying 'this element
contains arbitrary markup'?

You can say
<!ELEMENT foo ANY>
to allow any content for 'foo' elements but nevertheless any elements
then put inside of 'foo' elements are supposed to be declared in the DTD.
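
As a minimal sketch of what that means in practice (the element names 'bar' and 'baz' are made up for illustration and are not part of ADL):

```xml
<?xml version="1.0"?>
<!DOCTYPE foo [
<!-- ANY permits character data and any element type declared in this DTD -->
<!ELEMENT foo ANY>
<!ELEMENT bar (#PCDATA)>
]>
<foo>Some text, then <bar>a declared element</bar>, then more text.</foo>
```

A validating parser should accept this document, but it would report an undeclared `<baz/>` placed inside 'foo' as a validity error, because ANY only covers declared element types.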
 
Simon Brooke

Martin said:
You can say
<!ELEMENT foo ANY>
to allow any content for 'foo' elements but nevertheless any elements
then put inside of 'foo' elements are supposed to be declared in the
DTD.

H'mmmm....

So, is there any mechanism for doing what I'm trying to do with a DTD, or
am I in fact forced to change to a schema?

(thanks for the answer, by the way)
 
Martin Honnen

Simon said:
So, is there any mechanism for doing what I'm trying to do with a DTD, or
am I in fact forced to change to a schema?

I am not sure. For instance, there is modularized XHTML
(http://www.w3.org/TR/xhtml-modularization/), which has DTD-based modules
and talks about using such modules to create a new DTD, but I have not
really mastered that stuff. Frankly, when it comes to composing elements
from different namespaces, I prefer the XML syntax of XML schemas to all
the parameter entity stuff those DTD-based examples use.
 
Simon Brooke

Martin said:
I am not sure, for instance there is modularized XHTML
http://www.w3.org/TR/xhtml-modularization/ which has DTD based modules
and talks about using such modules to create a new DTD but I have not
really mastered that stuff, frankly when it comes to composing elements
from different namespaces I prefer using the XML syntax of XML schemas
rather than all that parameterized entity stuff those DTD based examples
use.

OK, thanks!
 
Joe Kesselman

Simon said:
So, is there any mechanism for doing what I'm trying to do with a DTD, or
am I in fact forced to change to a schema?

Well, you *could* give up DTD validation and move all the structural
checks into your application code... or run without them, if you trust
the folks generating your documents...

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Simon Brooke

Joe said:
Well, you *could* give up DTD validation and move all the structural
checks into your application code... or run without them, if you trust
the folks generating your documents...

The benefit of a DTD (from my point of view) is largely that it allows
well-written XML editors to prompt the user as to which elements and
attributes are legitimate at each point in the document. The document is
'interpreted' by a set of XSL transforms which generate SQL, Hibernate,
Velocity, C# and Java code. So inasmuch as the XSL will only transform
valid markup, it can be said to do the structural checks, although it
generates errors only for a small number of unexpected constructs.
 
Peter Flynn

Simon said:
I maintain a DTD which is used to specify XML documents which are mostly
marked up in the dialect specified by the DTD ('ADL'), but in which there
are three elements whose contents are intended to be arbitrary XHTML 1.1.

Currently I've declared these as #PCDATA and simply ignore the parse
errors this causes, but this isn't very good and I'm not proud of it.

Indeed :)

Simon said:
Is there a syntax for saying 'this element contains markup from this
other namespace' (I assume not, since SGML doesn't know about XML
namespaces?)? Otherwise, is there a syntax for saying 'this element
contains arbitrary markup'?

Not in DTDs, as such. When the concept of namespaces was first mooted,
before they were even officially called namespaces, there was an
assumption that they might let you construct a DTD from modules "called"
from existing other DTDs, in the sense of saying "At this point, I'll
have lists done the way DocBook does them, tables done the way HTML does
them, sections done the way TEI does them, etc etc", but the lack of any
coherent way of modularising the world's DTDs put a stop to that.

You can of course declare an element type name "xhtml:h1" in a DTD, and
it is perfectly validatable with any modern XML validator. So one
approach is to copy and edit an ad-hoc compact version of as much of the
XHTML body content model as you wish to allow, and make it the content
model of your three element types, editing it to be as loose or tight as
you need.

Or, as Martin suggests, use ANY as the content model, having declared
all the necessary element types that will occur.

Simon said:
I know that I could include the XHTML DTD into my DTD but I'd prefer not
to do this as I'd prefer to keep the namespaces separate

But you can:

<!DOCTYPE adl:topmatter [
<!ELEMENT adl:topmatter (xhtml:div)>
<!ELEMENT xhtml:div (xhtml:p)+>
<!ATTLIST xhtml:div class (top|middle|bottom) #IMPLIED>
<!ELEMENT xhtml:p (#PCDATA)>
]>
<adl:topmatter>
<xhtml:div class="top">
<xhtml:p>This appears at the top of every page</xhtml:p>
</xhtml:div>
</adl:topmatter>

$ onsgmls -wxml -e -g -s -u /usr/share/xml/declaration/xml.dcl test.xml
onsgmls:/usr/share/xml/declaration/xml.dcl:1:W: SGML declaration was not implied
Compilation finished at Sun Jul 18 23:10:40
$ rxp test.xml >/dev/null
$

Just be aware that although the colon is accepted as a valid name
character in DTDs, it does not get interpreted as a namespace delimiter.
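
To illustrate that caveat, here is a sketch of the same example with the namespace declarations supplied as #FIXED attribute defaults, so that a namespace-aware processor which reads the DTD also sees the bindings. The ADL namespace URI is the one Simon uses for his schema elsewhere in this thread; whether it is the right one for his documents is an assumption.

```xml
<?xml version="1.0"?>
<!DOCTYPE adl:topmatter [
<!ELEMENT adl:topmatter (xhtml:div)>
<!-- The DTD matches the literal names "adl:topmatter", "xhtml:div", etc.
     The #FIXED defaults below bind the prefixes for namespace-aware tools. -->
<!ATTLIST adl:topmatter
  xmlns:adl   CDATA #FIXED "http://bowyer.journeyman.cc/adl/unstable/adl/"
  xmlns:xhtml CDATA #FIXED "http://www.w3.org/1999/xhtml">
<!ELEMENT xhtml:div (xhtml:p)+>
<!ATTLIST xhtml:div class (top|middle|bottom) #IMPLIED>
<!ELEMENT xhtml:p (#PCDATA)>
]>
<adl:topmatter>
  <xhtml:div class="top">
    <xhtml:p>This appears at the top of every page</xhtml:p>
  </xhtml:div>
</adl:topmatter>
```

Because the DTD only knows the literal names, a document that binds the XHTML namespace to a different prefix (say `h:div` instead of `xhtml:div`) would be equivalent as far as namespaces go, yet invalid against this DTD. The prefixes effectively become part of the vocabulary.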
Simon said:
I know also that I really ought to move to an XSD schema, but I find them
just too prolix and awkward to work with!

Many people would agree with you. It is a fallacy that they are a
requirement to use XML. But you should definitely consider expressing
your grammar in RelaxNG: that way you can generate a DTD or a W3C
Schema as and when needed, but use a more human-friendly language for
defining it.
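
For what it's worth, RelaxNG can say "any element from this namespace" directly, using a name class. The following is only a sketch: the pattern name 'any-xhtml' is made up, 'topmatter' and the ADL namespace URI are taken from elsewhere in this thread, and the XHTML 1.x namespace URI is assumed to be the one wanted.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://bowyer.journeyman.cc/adl/unstable/adl/">
  <start>
    <element name="topmatter">
      <zeroOrMore>
        <ref name="any-xhtml"/>
      </zeroOrMore>
    </element>
  </start>

  <!-- Hypothetical helper: any element in the XHTML namespace, with any
       attributes, text, and further XHTML elements nested inside it. -->
  <define name="any-xhtml">
    <element>
      <nsName ns="http://www.w3.org/1999/xhtml"/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="any-xhtml"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
</grammar>
```

This checks only that the content is well-formed markup in the XHTML namespace, not that it is valid XHTML. A DTD has no equivalent of the wildcard, so this part of the grammar cannot be carried over faithfully into a generated DTD.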

///Peter
 
Simon Brooke

Peter said:
Many people would agree with you. It is a fallacy that they are a
requirement to use XML. But you should definitely consider expressing
your grammar in RelaxNG: that way you can generate a DTD or a W3C
Schema as and when needed, but use a more human-friendly language for
defining it.

Thank you very much indeed. I had not considered the possibility of using
RelaxNG - I'd heard of it, but didn't have a clear idea of what it was or
what its benefits were. I'll have a look.
 
Simon Brooke

I said:
Thank you very much indeed. I had not considered the possibility of
using RelaxNG - I'd heard of it, but didn't have a clear idea of what it
was or what its benefits were. I'll have a look.

And to follow up to myself (poor form, I know), I've just been playing
with Trang[1], and I'm /very/ impressed. It has converted my DTD into
RelaxNG, preserving the comments (important!), and the RelaxNG syntax is
indeed very readable (I mildly prefer the RNG XML syntax to the RNC
'compact' syntax). This looks very promising.

[1] http://code.google.com/p/jing-trang/
 
Simon Brooke

I said:
I maintain a DTD which is used to specify XML documents which are mostly
marked up in the dialect specified by the DTD ('ADL'), but in which
there are three elements whose contents are intended to be arbitrary
XHTML 1.1.

Peter Flynn helpfully pointed me to RelaxNG, which does indeed provide a
very nice syntax for specifying a grammar (I'm using the XML syntax, which
I find easier than the 'compact' syntax, but as they're interchangeable
that's just preference). I see that RelaxNG has a mechanism for
referencing external documents:

http://www.relaxng.org/tutorial-20011203.html#IDA04YR

I also found on W3C's website a specification - possibly out of date - of
XHTML 2.0 as a series of RelaxNG modules:

http://www.w3.org/TR/2003/WD-xhtml2-20030506/relax_module_defs.html

(I couldn't find anywhere these were downloadable as a zip or similar,
but I have copied and pasted into a set of working files to experiment
with).

However, I haven't worked out how these are supposed to work together
since they clearly depend on one another but make no use either of the
'externalRef' mechanism or of the 'include' mechanism. I do note that
they make heavy use of the 'combine' mechanism.

The RelaxNG tool I'm currently using, trang, allows one input grammar
file only - it doesn't permit several input grammar files to be
specified. So I haven't yet worked out how to use multiple XHTML2 modules
together. Also, when I try trang on .rng files which contain
externalRefs, I get:

simon@gododdin:~/workspace/adl/schemas$ java -jar /home/simon/Downloads/useful/trang-20091111/trang.jar adl-1.4.rng test.xsd
/home/simon/workspace/adl/schemas/adl-1.4.rng:1011:52: error: sorry, externalRef is not yet supported
so I don't know whether I'm doing this right. What I'm trying to do is
specify that (for example) an ADL headmatter element may contain XHTML
script, link, meta and style elements, so in adl.rng I have:

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         ns="http://bowyer.journeyman.cc/adl/unstable/adl/">
  ....
  <define name="headmatter">
    <element name="headmatter">
      <ref name="attlist.headmatter"/>
      <externalRef href="permitted-html-head.rng"/>
    </element>
  </define>
  <define name="attlist.headmatter" combine="interleave">
    <empty/>
  </define>

and in a separate file 'permitted-html-head.rng' I have

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         ns="http://www.w3.org/2002/06/xhtml2/">

  <start>
    <ref name="permitted-xhtml-head" />
  </start>

  <define name="permitted-xhtml-head">
    <zeroOrMore>
      <choice>
        <element name="content">
          <externalRef href="xhtml-2/xhtml-scripting.rng" />
          <externalRef href="xhtml-2/xhtml-link.rng" />
          <externalRef href="xhtml-2/xhtml-meta.rng" />
          <externalRef href="xhtml-2/xhtml-style.rng" />
        </element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>

What I hope this is specifying is, e.g.:

<adl:headmatter>
<adl:content>
<xhtml:link rel="stylesheet" type="text/css" href="styles.css" />
<xhtml:meta name="generator"
content="Application description language framework" />
</adl:content>
</adl:headmatter>

I'd much rather not have the <adl:content> tag in there but I haven't yet
worked out a way of getting rid of it. I do specifically want to keep the
namespaces 'adl:' and 'xhtml:' distinct.
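
One possible direction, offered only as a sketch: in RelaxNG, 'externalRef' stands for the start pattern of the referenced file, whereas 'include' merges that file's named definitions into the including grammar (which is where 'combine' comes in). If the XHTML modules only contribute definitions, a small driver grammar that includes them and supplies its own start pattern might look like the following. The definition names 'script', 'link', 'meta' and 'style' are guesses, not checked against the modules, and the modules may need further supporting modules included as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- permitted-html-head.rng, reworked as a driver (sketch only) -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">

  <!-- Pull the modules' named definitions into this grammar -->
  <include href="xhtml-2/xhtml-scripting.rng"/>
  <include href="xhtml-2/xhtml-link.rng"/>
  <include href="xhtml-2/xhtml-meta.rng"/>
  <include href="xhtml-2/xhtml-style.rng"/>

  <!-- Assumes the modules define no start pattern of their own.
       The referenced names below are hypothetical. -->
  <start>
    <zeroOrMore>
      <choice>
        <ref name="script"/>
        <ref name="link"/>
        <ref name="meta"/>
        <ref name="style"/>
      </choice>
    </zeroOrMore>
  </start>
</grammar>
```

With a start pattern of this shape, the 'externalRef' inside the headmatter element in adl.rng would allow the XHTML elements directly as children, with no wrapper 'content' element. The namespace of those elements then comes from whatever the module files themselves declare.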

So, the questions:

Given that trang does not (yet) handle externalRefs, is there a tool I
can use which will translate a RelaxNG grammar using externalRefs into an
XSD schema?

And, generally, am I on the right lines?
 
