<Books>
<Book pages="808">
<Title>C# Design Patterns</Title>
<Author>Burton, Kevin</Author>
<Publisher>Sams</Publisher>
</Book>
</Books>
What are the rules for determining that the number of pages of the
book should be an attribute of the <Book> element, while the other
information about the book, such as title, author, and publisher, is
stored as child elements?
The short answer is that the criterion that makes sense cannot
be used in XML, so heuristics are used instead. The situation
is actually a mess.
A slightly longer answer:
An element is a description of an entity (this is its
semantics). (No, not an "entity" in the sense of the
XML-specification, but in the general sense of the
word as "something".)
A description is an assertion. An assertion might use unary
predicates or binary relations.
The structure of predicates and relations thus suggests that
a unary predicate is mapped to an element type (one value),
while a binary relation is mapped to an attribute (two values).
For example, say "x" is a rose and belongs to Jack. The
assertion is
rose(x) and owner(x,jack)
this maps to
<rose owner="jack" />
in XML. So my answer would be: Use element types for unary
predicates and attributes for binary relations.
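As a rough illustration of this mapping, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name describe and its parameters are hypothetical, chosen only for this example: it takes one unary predicate (the element type) and a dict of binary relations (the attributes).

```python
import xml.etree.ElementTree as ET

def describe(predicate, relations):
    """Hypothetical helper: map a unary predicate to an element type
    and each binary relation to an attribute of that element."""
    # rose(x) becomes the element type; owner(x, jack) becomes owner="jack".
    element = ET.Element(predicate, dict(relations))
    return ET.tostring(element, encoding="unicode")

# rose(x) and owner(x, jack)
print(describe("rose", {"owner": "jack"}))
```

The call should print an empty rose element carrying a single owner attribute, equivalent to the one written by hand above.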
However, in XML, this is not always possible, because an
element has exactly one element type, each attribute name may
appear at most once per element, and attribute values cannot
be structured.
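The limit of one value per attribute name can be checked with any XML parser. A small sketch, assuming Python's standard xml.etree.ElementTree module; the function name is_well_formed is hypothetical:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One owner relation fits into an attribute.
print(is_well_formed('<rose owner="jack" />'))
# A second owner relation does not: the attribute name is repeated.
print(is_well_formed('<rose owner="jack" owner="jill" />'))
```

The second document is rejected because the XML specification does not allow the same attribute name to appear twice in one start tag.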
Therefore, people start to (ab)use element types as names of
binary relations when the value is structured or multiple
values are needed, such as in:
<rose>
<owner>jack</owner>
<owner>jill</owner></rose>
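Read back with a parser, the child-element form yields every value of the relation. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

rose = ET.fromstring("<rose><owner>jack</owner><owner>jill</owner></rose>")

# Each <owner> child carries one value of the owner relation.
owners = [child.text for child in rose.findall("owner")]
print(owners)
```

Here the element type owner is doing the job of a relation name, which is exactly the (ab)use described above.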
So the restrictions of XML prevent the semantics from being
mapped directly to syntax. (Without those restrictions there
would be less need for RDF, because the semantics of XML alone
would be nearly as expressive as RDF.)
Some people now suggest always using sub-elements, because
one can never know when the value of an attribute might need
to become structured. Or they recommend using attributes only
when one is confident that there will never be a need for
a structured value or multiple values.
Others simply explain that they use attributes as often as
possible, because attributes are easier to edit with their
current XML editor.
Others make the choice based on how CSS might be used to
format their XML-documents.
(I am using my own XML-like notation, which allows for
structured and multiple values of attributes, and so I can
really map unary predicates to element types and binary
relations to attributes.)