Attribute vs. Element

Philipp

Hello. OK, I know this is the most frequently asked question about XML (or so
some tutorial says), but still, please give me your insight on it (I'm a newbie).

I want to store parameters for a program in an XML file. I can see 3
intelligent ways to do this.


1)
<?xml version="1.0" ?>
<PARAMETERS>
<LATTICE>
<LPAR name="Coverage" unit="ML">0.1</LPAR>
<LPAR name="Frequency" unit="Hz">10^3</LPAR>
</LATTICE>
... (other params in other elements)
</PARAMETERS>

2)
<?xml version="1.0" ?>
<PARAMETERS>
<LATTICE>
<Coverage unit="ML">0.1</Coverage>
<Frequency unit="Hz">10^3</Frequency>
</LATTICE>
... (other params in other elements)
</PARAMETERS>

3)
<?xml version="1.0" ?>
<PARAMETERS>
<LATTICE>
<LPAR name="Coverage" unit="ML" value="0.1"/>
<LPAR name="Frequency" unit="Hz" value="10^3"/>
</LATTICE>
... (other params in other elements)
</PARAMETERS>

As far as I can see, all three are valid:
- no multiple attributes with the same name
- the "value" is atomic, so it can be stored in an attribute
- I cannot think of any parameter (Coverage/Freq) that would need to
be extensible later on.
But:
- one attribute (unit) modifies another (value) in version (3), which
seems to be bad practice
- I would like to address the parameters directly (using tinyXML), so
it's easier (is it?) in version 2, where the element name reflects the
purpose. There you can write something like
(pseudocode)
doc.getElement("Coverage")
and not
if(doc.getElement("LPAR").getAttribute("name") == "Coverage"){...}

What do you think? Any good advice on which is better?
Thanks,
Philipp
 
Andy Dingley

Philipp said:
Hello. OK, I know this is the most frequently asked question about XML (or so
some tutorial says), but still, please give me your insight on it (I'm a newbie).

You've actually asked a rarer variant on it.

I want to store parameters for a program in an XML file. I can see 3
intelligent ways to do this.

There is no real difference between #1 and #3. Personally I'd go with
#3 if it's strictly a config file, because it's consistent between the
handling of name and value (an entirely trivial human-friendliness
issue).

If there's a risk of the file being "viewed" though, then I'd favour
#1, although this is also a very minor issue. There's a vague de facto
standard (from the way HTML gets processed) that "unknown" elements in
the most anonymous default contexts are given a default human rendering
by showing their text content and hiding their attributes.

#2 is interesting though, and quite different. If we assume for the
moment that <LATTICE> and <LPAR> have some generic meaning as "config
files", then with #2 you've also introduced the concepts of "Coverage"
and "Frequency" into the XML DTD, and they're obviously very
application-specific.

Neither of these is good or bad in itself, but they are different -- a DTD
that contains "Frequency" is now application-specific, not just a
generic one for doing config files. That has significant implications
for project design -- in general XML doesn't work well unless the
entire DTD is mapped out before implementing the data / code that uses
it. Getting this aspect wrong is a regular source of problems, especially
for big projects. There are project techniques to work round it, or you
may even find yourself avoiding XML in favour of a more up-to-date
technique. In the extreme case this becomes the "Nominals" problem,
which is a classic "hard problem" from the AI world.

So swap between elements and attributes without too much thought -- in
the simple case they're both simple atomic structures that are visible
in the XML Infoset and they really are interchangeable (cardinality or
internal structure might force one into becoming an element). When you
start moving application concepts from XML values to XML names though,
that's when it gets interesting.



I'm a little more concerned about the "10^3" markup for exponents
within the value itself. Although this is certainly a reasonable way of
representing such values, it's not mainstream. I'd use a more common
floating point notation such as "1E3" or "1.0E3" instead.

PS - UPPER CASE tag names get tiring to read after a while. I'd suggest
you use lower case (mixed case is a pain).

As far as I can see, all three are valid:
- no multiple attributes with the same name
- the "value" is atomic, so it can be stored in an attribute

Good basic rules to follow.
- I cannot think of any parameter (Coverage/Freq) that would need to
be extensible later on.

That's where your said:
- one attribute (unit) modifies another (value) in version (3), which
seems to be bad practice

That's fine. It's a reasonable and relevant qualification of the value
(giving it dimensions).
- I would like to address the parameters directly (using tinyXML), so
it's easier (is it?) in version 2, where the element name

No. Any "useful" query language makes this almost transparent to you.
If it's hard, get another XML query platform.

The last statement isn't strictly accurate in complex cases involving
reasoners -- but it's actually <LPAR name="Coverage" ...> that's the
easier case to process!
 
Stefan Ram

Philipp said:
Hello. OK, I know this is the most frequently asked question about XML (or so
some tutorial says), but still, please give me your insight on it (I'm a newbie).

When a new document type is to be defined, when should one
choose child elements and when attributes?

The criterion that makes sense regarding the meaning cannot
be used in XML due to syntactic restrictions.

An element is describing something. A description is an
assertion. An assertion might contain unary predicates or
binary relations.

Comparing this structure of assertions with the structure
of XML, it seems natural to represent unary predicates
with types and binary relations with attributes.

Say, "x" is a rose and belongs to Jack. The assertion is:

rose( x ) ^ owner( x, "Jack" )

This is written in XML as:

<rose owner="Jack" />

Thus, my answer would be: use element types for unary
predicates and attributes for binary relations.

Unfortunately, in XML, this is not always possible, because
in XML:

- an element can have at most one type,

- an element can have at most one value per attribute
name, and

- attribute values are not allowed to be structured.

Therefore, the designers of XML document types are forced to
abuse element /types/ in order to describe the /relation/
of an element to its parent element.

This /is/ an abuse, because the designation "element type"
obviously is supposed to give the /type of an element/,
i.e., a property which is intrinsic to the element alone
and has nothing to do with its relation to other elements.

The document type designers, however, are forced to commit
this abuse in order to poorly reinvent the missing structured
attribute values with the means XML does provide. If a rose
has two owners, the following element is not allowed in XML:

<rose owner="Jack" owner="Jill" />

One is forced to use representations such as the following:

<rose>
<owner>Jack</owner>
<owner>Jill</owner>
</rose>

Here the notion "element type" suggests that Jack is being
marked as "an owner", in the sense that "owner" is
supposed to be the type (the kind) of Jack.

The intention of the author, however, is that "owner" is
supposed to give the /relation/ to the containing element
"rose". This is the natural field of application for
attributes, as the meaning of the word "attribute" outside
of XML clearly indicates, but it is not possible to
always use attributes for this purpose in XML.

An alternative solution might be the following notation.

<rose owner="Alexander Marie" />

Here a /new/ mini language (not XML anymore) is used within
an attribute value, which, of course, cannot be checked
by XML validators any more. This is actually done, for
example, in XHTML, where multiple classes are written this way.

So in its most prominent XML application XHTML, the W3C
has to abandon XML even to write class attributes. This
is not such a good accomplishment given that the W3C
was able to use the experience made with SGML and HTML
when designing XML.

The needless restrictions of XML inhibit the meaningful
use of syntax. This leaves many document type designers
wondering when attributes and when elements should be
used, which is actually evidence of a failure in the
design of XML: XML does not have many more notations
than these two, attributes and elements, and yet the W3C
failed to give even these two notations a clear and
meaningful purpose!

Without the restrictions described, XML alone would have
nearly the expressive power of RDF/XML, which has to painfully
repair some of the errors made in the XML design.

Now, some "experts" recommend /always/ using subelements,
because one can never know whether an attribute value
that seems to be unstructured today might need to become
structured tomorrow. Other experts recommend using
attributes only when one is quite confident that they
will never need to be structured. This recommendation
does not even try to make sense of attributes,
but just explains how to circumvent the obstacles
the W3C has built into XML.

Others recommend using attributes for something they
call "metadata". They ignore that this limits "metadata"
to unstructured values.

Others use an XML editor that happens to make the input of
attributes more comfortable than the input of elements and
therefore seriously suggest using as many attributes as
possible.

Still others have studied how to use CSS to format XML
documents and are using this to give recommendations about
when to use attributes and when to use subelements. (So
that the resulting document can be formatted most easily
with CSS.)

Of course, mixing all these criteria (structured vs.
unstructured, data vs. "metadata", by CSS, by the ease of
editing, ...) will often give conflicting recommendations.

Other notations than XML have solved the problem by either
omitting attributes altogether or by allowing structured
attributes. I believe that notations with structured
attributes, which also allow multiple element types and
multiple attribute values for the same attribute name,
are helpful.
 
Andy Dingley

Stefan said:
Thus, my answer would be: use element types for unary
predicates and attributes for binary relations.

Unfortunately, in XML, this is not always possible,
[...]

Therefore, the designers of XML document types are forced to
abuse element /types/ in order to describe the /relation/
of an element to its parent element.

I disagree almost entirely with your fascinating analysis.

The history of XML is that it's "SGML lite". It's a document syntax
that only had some semblance of a data model added to it 3 years later.
Any understanding of "How it got to be this way" has to remember it's a
document format that was defined formally, not a formal logic that had
a serialization defined for it. Any interpretation, no matter how
attractive it appears, has to be viewed through this perspective.

If you want a format from the other view (data model, then define the
serialization) then look at RDF.
 
Boris Kolpackov

Hi Philipp,

I would go with (2) since it is the least verbose and the most
straightforward (KISS ;-)). Accidentally (or not), it is also the
easiest to use with data binding, should you decide to go that
way one day:

Lattice l = ...
Coverage c = l.Coverage ();
Frequency f = l.Frequency ();

hth,
-boris
 
Stefan Ram

Andy Dingley said:
Any understanding of "How it got to be this way" has to
remember it's a document format that was defined formally

The example that I showed, where the class attribute had to be
augmented with an additional mini-syntax (beyond XML), was
exactly from this scope: a document format (XHTML).
 
Joe Kesselman

This is a religious debate. We aren't going to settle it here.

Re "mini-languages" -- remember that XML is raw syntax. As soon as you
start getting into semantics, you *do* tend to wind up with structure in
the data itself. That doesn't invalidate the concept of structuring at
the XML level; no tool is equally appropriate at all levels of detail.
 
Stefan Ram

Joe Kesselman said:
This is a religious debate. We aren't going to settle it here.

Feel free to ignore it and not take part in it.
Re "mini-languages" -- remember that XML is raw syntax. As soon
as you start getting into semantics, you *do* tend to wind up
with structure in the data itself.

Still, one can imagine a »raw syntax« allowing for multiple
attributes as in:

<p class="alpha" class="beta">example</p>

If this is too much freedom, it could be forbidden in
the DTD (Schema) for all or for individual attributes.
 
Andy Dingley

Still, one can imagine a »raw syntax« allowing for multiple
attributes as in:

<p class="alpha" class="beta">example</p>

Just look at the trouble RDF/XML got into going down that route!
<rdf:li> was fine, doing the same thing in attributes was anything but.
 
Stefan Ram

Andy Dingley said:
Just look at the trouble RDF/XML got into going down that route!

I am not sure exactly which fact you are referring to.
<rdf:li> was fine, doing the same thing in attributes was anything but.

It would not be the same thing (li is for containees).
You do not give the reader any reasons for your »anything but«,
so to me it is just an allegation I cannot follow.
 
Andy Dingley

Stefan said:
I am not sure exactly which fact you are referring to.


It would not be the same thing (li is for containees).

Look at 2.15 "Container Membership Properties" in the RDF/XML syntax
spec [2004]

<rdf:li> works fine for containees as XML elements, but obviously can't
be used as multiple attributes (because their names would need to be
the same). So RDF/XML also permits contained properties to be named as
<rdf:_1>, <rdf:_2> etc. instead, which are also usable as XML
attribute names.

These names always struck me as particularly ugly, albeit an
unfortunate necessity.
 
Stefan Ram

Andy Dingley said:
<rdf:li> works fine for containees as XML elements, but obviously can't
be used as multiple attributes (because their names would need to be
the same). So RDF/XML also permits contained properties to be named as
<rdf:_1>, <rdf:_2> etc. instead, which are also usable as XML
attribute names.

Even with structured and repeatable attributes, there is some
rationale for rendering containees as "contents" (sub-elements):

What is in between "<a>" and "</a>" I call the "contents" of
an element. So for a container rendered as an element it seems
natural - by this wording - for its containees to be the
contents of this element.

In sequential containers, the sequence of the containees is
relevant. "<seq><a/><b/></seq>" is not the same as
"<seq><b/><a/></seq>". This sequence cannot be modelled by
two binary relations "a belongs to seq" and "b belongs to seq"
alone. Therefore, in this case, it seems reasonable to me to
render them as subelements, even if structured attributes with
repeatable names were available in the markup language used.

Still, the element types of the containees would not be used to
express their relation to the container, but their own types.
For example,

<list><human name="peter"/><dog name="carl"/></list>

Here, »human« and »dog« give the type of the containees, not
their relation to their superelement, which is already given
by the type of the superelement "list". So this complies with
the semantics that I prefer. The element clearly models that
the containees are contents of their container and that
their sequence is relevant, so it makes sense to me.
 
RickH

I'd use the syntax that jibes with however your "Parameters" class will
naturally be serialized or de-serialized.

It will save you the need to write transformations when you instantiate
(de-serialize) or persist (serialize) the object instance of your
"Parameters" class. Most OO platforms have a "default" serialization
format (if you specify XML as opposed to binary); if it's a simple
object I'd just stick with that. Write your Parameters class,
instantiate it, serialize it to a file, and that's your format.
Anything else and you'll have to write a custom serialization exit when
instantiating or persisting your objects. Some systems make object
properties into elements, others use attributes. With option #2 I could
see a generic serializer inferring a Parameters class object, then a
Lattice sub-class object with its property names being assigned as the
Coverage and Frequency properties.

IOW, make your life easier by being able to instantiate your persisted
objects according to your platform's default inference rules, and only
write custom serialization routines when necessary.
 
Philipp

Hello, thanks for your answer.
I actually develop very basic C++ (using Eclipse), so I don't have such
a thing as a "default platform serialization routine" :-(

Also the file should remain very basic, as it must be human
readable/writable (as input for the program), which XML serializers
on other OO platforms probably don't respect.

After some thought, I will probably opt for 2), or maybe 1), as I can
imagine some parameters becoming non-atomic in the future.

Thanks again, all, for your input.
Philipp