Newbie question on elements vs. attributes

Pookee

I am new to XML and having trouble deciding when data should be stored
as an attribute of an element or as an element of its own. In the
example:

<Books>
<Book pages="808">
<Title>C# Design Patterns</Title>
<Author>Burton, Kevin</Author>
<Publisher>Sams</Publisher>
....

What are the rules for determining that the number of pages of the
book should be an attribute of the <Book> element while the other
information about the book such as title, author, and publisher are
stored as child elements?

Thanks in advance for any help,

Greg
 
Stefan Ram

> <Books>
> <Book pages="808">
> <Title>C# Design Patterns</Title>
> <Author>Burton, Kevin</Author>
> <Publisher>Sams</Publisher>
>
> What are the rules for determining that the number of pages of the
> book should be an attribute of the <Book> element while the other
> information about the book such as title, author, and publisher are
> stored as child elements?

The short answer is that the criterion that makes sense
cannot be used in XML, so heuristics are used instead.
The situation is actually a mess.

A slightly longer answer:

An element is a description of an entity (this is its
semantics). (No, not an "entity" in the sense of the
XML-specification, but in the general sense of the
word as "something".)

A description is an assertion. An assertion might use unary
predicates or binary relations.

The structure of predicates and relations, thus, suggests that
a unary predicate is mapped to an element type (one value),
while a binary relation is mapped to an attribute (two values).

For example, say "x" is a rose and belongs to Jack. The
assertion is

rose(x) and owner(x,jack)

this maps to

<rose owner="jack" />

in XML. So my answer would be: Use element types for unary
predicates and attributes for binary relations.

However, in XML, this is not always possible, because an
element can have only one element type, a relation name can
appear at most once per element, and relation values cannot
be structured.

Therefore, people start to (ab)use element types as names of
binary relations, when their value is structured or multiple
values are needed, such as in:

<rose>
<owner>jack</owner>
<owner>jill</owner></rose>

So the restrictions of XML prevent mapping the semantics
directly to syntax. (Without those restrictions there would be
less need for RDF, because the semantics of XML alone would be
nearly as expressive as RDF.)
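
The restriction on repeated relation names can be checked with
any conforming parser. The following is only a minimal sketch
using Python's standard xml.etree.ElementTree module; the
variable names are made up for illustration. It shows that a
second "owner" attribute is rejected as not well-formed, while
repeated "owner" child elements parse without complaint.

```python
import xml.etree.ElementTree as ET

# An attribute name may appear only once per element, so a second
# owner cannot be written as a second attribute: the parser rejects it.
try:
    ET.fromstring('<rose owner="jack" owner="jill"/>')
    duplicate_attribute_ok = True
except ET.ParseError:
    duplicate_attribute_ok = False

# Child elements may repeat, which is why "owner" ends up as an
# element as soon as more than one value is needed.
rose = ET.fromstring("<rose><owner>jack</owner><owner>jill</owner></rose>")
owners = [owner.text for owner in rose.findall("owner")]

print(duplicate_attribute_ok)
print(owners)
```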

Some people therefore suggest always using sub-elements,
because one can never know when the value of an attribute
might need to become structured. Others recommend using
attributes only when one is confident that there will never
be a need for a structured value or multiple values.

Others simply explain that they use attributes as often as
possible because they are easier to edit with their current
XML editor.

Others make the choice based on how CSS might be used to
format their XML-documents.

(I am using my own XML-like notation, which allows for
structured and multiple values of attributes, and so I can
really map unary predicates to element types and binary
relations to attributes.)
 
Jukka K. Korpela

> I am new to XML and having trouble deciding when data should be stored
> as an attribute of an element or as an element of its own.

This was asked rather recently. Did you check the recent
discussions?

> <Book pages="808">

Maybe a borderline case, since the number of pages might be regarded as a
genuine property of an element. But maybe not; it's not that different
from other bibliographic information.

But if you used some other data format instead of XML, like a simple
sequence of
name: value
pairs, you would probably not have even dreamt of making the number of
pages something special, and you wouldn't have felt a compelling need for
some "attributes" beyond the simple structure.
 
Andy Dingley

> I am new to XML and having trouble deciding when data should be stored
> as an attribute of an element or as an element of its own. In the
> example:
>
> <Books>
> <Book pages="808">
> <Title>C# Design Patterns</Title>
> <Author>Burton, Kevin</Author>
> <Publisher>Sams</Publisher>

There aren't many reasons to make something an attribute.

There are a few reasons that strongly suggest making it an element.
One is any need for internal structure, such as child elements.
Another is that this entity might need to exist independently in its
own XML document.

Apart from this, you're pretty much on your own. If it mattered more,
it would be easier to choose and more obvious.

In some cases (such as XHTML) there may be an assumed behaviour in
readers such that attributes of unknown elements get ignored, but
text content within an element gets shown. This can influence you too.

If you look at rules for writing a few well-known schemas in XML (RDF
and Dublin Core spring to mind) then there are deliberately two
descriptions of how to do this, one based on attributes and one based
on elements with text content.

> What are the rules for determining that the number of pages of the
> book should be an attribute of the <Book> element while the other
> information about the book such as title, author, and publisher are
> stored as child elements?

I don't know - is this an example you took from somewhere? It's a
pretty poor example of real practice, although it's OK as a
rudimentary XML teaching example.

As a critique, it confuses "book" and "edition". Imagine there's a
hardback and paperback version. These have the same author and title,
but the ISBN and binding are different. If they're represented as
entirely separate "books", then you're forced to duplicate a lot of
information between them and there's still no way to see that they're
related.
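
One possible way around the duplication is to keep the shared data
on the book and hang the per-edition data underneath it. This is only
a rough sketch built with Python's standard xml.etree.ElementTree; the
element names Edition, Binding and ISBN and the ISBN strings are
placeholders invented for illustration, not taken from any real schema.

```python
import xml.etree.ElementTree as ET

# Shared bibliographic data is stated once, on the book itself.
book = ET.Element("Book")
ET.SubElement(book, "Title").text = "C# Design Patterns"
ET.SubElement(book, "Author").text = "Burton, Kevin"

# Hypothetical per-edition data: only what differs between editions.
# The ISBN values are placeholders, not real identifiers.
for binding, isbn in [("hardback", "ISBN-PLACEHOLDER-1"),
                      ("paperback", "ISBN-PLACEHOLDER-2")]:
    edition = ET.SubElement(book, "Edition")
    ET.SubElement(edition, "Binding").text = binding
    ET.SubElement(edition, "ISBN").text = isbn

titles = book.findall("Title")
editions = book.findall("Edition")
xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)
```

Both editions now sit under one book, so the title and author appear
once and the relationship between the editions is explicit.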

There's also the question of nominal data.

<Publisher>Sams</Publisher>

is OK, but it's hard to work with if you need to recognise many books
from the same publisher. Is
<Publisher>Sams</Publisher>
the same as
<Publisher>Sams Publishing Corp.</Publisher>
or
<Publisher>Sams Scientific and Technical</Publisher>


Better practice is to introduce a controlled vocabulary, and to re-use
the references to the entries as much as possible.


<Publisher>http://example.org/vocabs/biblio/publishers/Sams/</Publisher>

or even
<Book
publisher="http://example.org/vocabs/biblio/publishers/Sams/"
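
To show why the controlled vocabulary helps, here is a small sketch
in Python. The lookup table, the function name publisher_uri and the
sample records are all assumptions for illustration; in particular,
which free-text names really denote the same publisher is a decision
someone has to make when building the vocabulary, and the table below
simply assumes two of them do.

```python
from collections import Counter

SAMS = "http://example.org/vocabs/biblio/publishers/Sams/"

# Hypothetical mapping from free-text names to vocabulary entries.
# Treating these two spellings as the same publisher is an assumption.
PUBLISHER_URIS = {
    "Sams": SAMS,
    "Sams Publishing Corp.": SAMS,
}

def publisher_uri(name):
    """Return the vocabulary URI for a publisher name, or None if unknown."""
    return PUBLISHER_URIS.get(name.strip())

# Made-up records, as they might appear in <Publisher> text content.
records = ["Sams", "Sams Publishing Corp.", " Sams ", "Unknown Press"]
counts = Counter(publisher_uri(name) for name in records)

print(counts[SAMS])
print(counts[None])
```

Once every record carries the URI rather than the free text, finding
all books from one publisher becomes a plain equality test.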
 
