This is one of the major flaws of XML.
You are certainly right that XML has flaws, just like
everything else beneath the moon. I'm not sure I agree
with your diagnosis in detail, though.
An element is describing something. A description is an
assertion. An assertion might contain unary predicates or
binary relations.
No assertions of arity greater than 2? No assertions
involving entities other than the entity assumed to be
represented by each element instance?
I think your account of a natural XML semantics is too simple.
Comparing this structure of assertions with the structure
of XML, it seems to be natural to represent unary
predicates with types and binary relations with
attributes.
Say, "x" is a rose and belongs to Jack. The assertion is:
rose( x ) ^ owner( x, "Jack" )
This is written in XML as:
<rose owner="Jack" />
Correction: What you show is *one* way to write it in XML.
Thus, my answer would be: use element types for unary
predicates and attributes for binary relations.
Unfortunately, in XML, this is not always possible,
because in XML:
- there might be at most one type per element,
- there might be at most one attribute value per
attribute name, and
- attribute values are not allowed to be structured in
XML.
This doesn't seem plausible to me. Apart from the fact that
(as various contributions to the thread, not to mention the
'match' attribute of XSLT, amply illustrate) the value of an
attribute may be written in any notation of one's choice,
there is the fact that NMTOKENS, IDREFS, and ENTITIES are
native attribute types in XML 1.0.
Therefore, the designers of XML document types are forced
to abuse element /types/, to describe the /relation/ of an
element to its parent element.
I don't think you have made even a prima facie case that this
constitutes abuse of any kind.
This /is/ an abuse, because the designation "element type"
obviously is supposed to give the /type of an element/,
i.e., a property which is intrinsic to the element alone
and has nothing to do with its relation to other elements.
? You seem to be taking as a premise that types and relations
have nothing to do with each other. Why would we assume that?
One useful way to distinguish types of things, in any
modeling, is to observe the relations they can legitimately
enter into. A registered student in good standing at a
university can be enrolled in a particular class; one way to
represent this is with a relation holding between the student
and the class. A human being who is not registered cannot be
enrolled in the class. We might infer from this that we wish
to define two distinct types: one for human beings in general,
and one for registered students.
The document type designers, however, are being forced to
commit this abuse, to reinvent poorly the missing
structured attribute values using the means of XML. If a
rose has two owners, it needs to be written:
<rose>
<owner>Jack</owner>
<owner>Jill</owner>
</rose>
? Not necessarily. As you point out below,
<rose owners="Jack Jill"/>
is a perfectly legitimate representation of the information.
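A minimal sketch of the point, assuming Python and its standard xml.etree.ElementTree module; the function names are invented for illustration. Both notations yield the same list of owners:

```python
import xml.etree.ElementTree as ET

def owners_from_children(xml_text):
    """Read owners from <owner> child elements of the root."""
    root = ET.fromstring(xml_text)
    return [child.text for child in root.findall("owner")]

def owners_from_attribute(xml_text):
    """Read owners from a space-delimited 'owners' attribute on the root."""
    root = ET.fromstring(xml_text)
    return root.get("owners", "").split()

nested = "<rose><owner>Jack</owner><owner>Jill</owner></rose>"
flat = '<rose owners="Jack Jill"/>'

print(owners_from_children(nested))
print(owners_from_attribute(flat))
```

Which form to prefer is a design decision for the vocabulary; neither is privileged by XML itself.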
Here the notion "element type" suggests that it is marked
that Jack is "an owner", in the sense that "owner" is
supposed to be the type (the kind) of Jack. The intention
of the author, however, is that "owner" is supposed to
give the /relation/ to the containing element "rose".
This is the natural field of application for attributes,
as the meaning of the word "attribute" outside of XML
makes clear, but it is not possible to use them for this
purpose in XML.
It seems to me your objection applies with greater force to
the relational model of databases, since in that model the
ownership attribute of the rose really must be separated from
other attributes whose values are guaranteed single and
atomic.
An alternative solution might be the following notation.
<rose owner="Alexander Marie" />
Here a /new/ mini language (not XML anymore) is used
within an attribute value, which, of course, can no longer
be checked by XML validators.
What validation are you interested in? A DTD-based XML
validator can check to ensure that 'Alexander' and 'Marie' are
both NMTOKENs, or to ensure that they are both ID values on
some elements in the document. A schema-based validator can
do those or other things.
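As a rough illustration of the two checks just mentioned, here is a sketch in Python. It is not a DTD validator: the NMTOKEN pattern is an ASCII-only approximation of the XML 1.0 NameChar production, and the attribute names 'owner' and 'id', the <garden> and <person> elements, and the function names are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of XML 1.0 NameChar; the real production
# also admits many non-ASCII characters.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")

def is_nmtokens(value):
    """True if value is a non-empty whitespace-separated list of NMTOKENs."""
    tokens = value.split()
    return bool(tokens) and all(NMTOKEN.fullmatch(t) for t in tokens)

def dangling_refs(root, ref_attr="owner", id_attr="id"):
    """Return tokens in ref_attr that match no id_attr value in the document."""
    ids = {e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None}
    return [t for e in root.iter()
            for t in e.get(ref_attr, "").split() if t not in ids]

# Hypothetical document: the <person> elements carry the ID values.
doc = ET.fromstring(
    '<garden><person id="Alexander"/><person id="Marie"/>'
    '<rose owner="Alexander Marie"/></garden>'
)

print(is_nmtokens("Alexander Marie"))
print(dangling_refs(doc))
```

A real DTD-based validator performs these checks from the attribute declarations (NMTOKENS, IDREFS) rather than from hard-coded attribute names.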
Even if I were to accept your premise that "Alexander Marie"
is "not XML", I would find it unsurprising that XML allows the
use of non-XML notations for information. Any human-readable
document is likely to have a great deal of information
expressed only in natural language; from the very beginning,
therefore, SGML and XML have been made compatible with the
view that there may be information in the document which is
not exhibited directly by the XML markup. I have occasionally
taken the view that structured information of any kind is
almost always best represented in XML, not in specialized
notations (so I favored an instance-based notation for
document grammars even in 1996, and have mocked ISO 8879
mercilessly for providing a distinct metalinguistic notation).
I still think that's a good rough rule. But as time has
passed I have noticed more and more cases where the position
taken by the designers of SGML seems to be the right one:
allow for the existence of non-SGML notations, and do not
insist on being the sole notation in which to write
information.
So in its main language XHTML, the W3C has to abandon XML
even to write class attributes. This is not such a good
accomplishment given that the W3C was able to use the
experience gained with SGML and HTML when designing XML and
that XHTML is one of the most prominent XML applications.
Hmm. Never occurred to me to think that the definition of the
'class' attribute was a problem that needed fixing.
Space-delimited tokens are really not hard to handle in most
languages I've worked with. YMMV, of course.
The needless restrictions of XML inhibit the meaningful
use of syntax. This leaves many document type designers
wondering when attributes and when elements are supposed
to be used, which actually is evidence of incapacity
Asking when a vocabulary designer is "supposed" to use
elements and when attributes feels to me a lot like
asking when a sketch artist is supposed to use straight
lines, and when they are supposed to use curved lines.
Of course, I don't believe that XML has a single way to
represent any particular unary or binary or n-ary
predicate -- nor do I believe that a particular set of
predicates or relations is ever likely to be the only
plausible set with which to represent a particular
body of information. Will we always write
rose( x ) ^ owner( x, "Jack" )
and never any of
member(x, roses) & owns("Jack", x)
rose(x) & person(jack) & relation(jack,x,owns)
rose(x) & person(jack) & relation(x,jack,chattel)
ownership-relation(y) & instance_of(z,y) &
arg1(z,jack) & arg2(z,rose)
jack(owner_of(r)) and r(rose) and jack(human)
time(t) & human(j) & flower(r) & variety(r,rose)
& relationship(o) & true(o,j,r,t)
or any of the infinity of other ways to formalize the
proposition that Jack owns a rose?
It's quite true that XML does not prescribe a particular usage
for elements and attributes. This follows from the fact that
XML does not prescribe any particular method of using XML to
encode information or assigning semantics to tags. Some
people paraphrase this point by saying that XML "has no
semantics" or "is just syntax". But any application of XML
does have semantics. It's just that the specification of
semantics is under the control of the vocabulary designer and
not under the control of the XML Working Group or the XML
spec. There is no set of semantic primitives to which all XML
vocabularies are automatically reducible (the way the
semantics of all TeX macros are ultimately reducible to ink on
paper), there is no pattern or structure to which the
semantics need conform (the way systems of first order logic
tend to need to talk about individuals and predicates taking
individuals as arguments). The semantics of an XML
application are limited only by human ingenuity.
Personally, I think that's one of the main reasons XML has
such wide applicability: the semantics of the markup can be
anything the designer can make them be. If the price of that
freedom is that the designer gets no binding rule about when
to use attributes and when to use elements -- well, speaking
for myself I think that's a low price to pay.
-C. M. Sperberg-McQueen
World Wide Web Consortium