element farms (containers for repeated elements) needed?


Wolfgang Lipp

From: Lipp, Wolfgang [mailto:[email protected]]
Sent: Tuesday, 27 January 2004 13:26


<annotation>
the first eleven contributions in this thread started
as an off-list email discussion; i have posted them
here with the consent of their authors.
</annotation>



my question is: do we need container elements for
repeating elements in data-centric xml documents? or is
it for some reason advisable to introduce containers in
xml documents even where they are not strictly needed?
what recommendation can be given on this in the light of
existing tools like w3c xml schema and relaxng as well
as established practice? i would greatly appreciate any
words, pointers, and links.

the exposition of the problem has become a rather long
one, done partly to make the matter clear to myself, and
most people will probably not have to read all of it.

to ease the discussion, let me introduce a very simple
data schema, one that describes a library with books,
employees, and readers. it looks like this:

#=====================================================
library
  address
  *book
  *employee
  *reader

book
  *author
  title
  isbn

author extends person

employee extends person

reader extends person
  card-id

person
  name
    last
    first
#=====================================================

the star is to be read in the usual way as 'zero or more
instances of'. i believe the above structure, where
repeating elements are introduced without explicit
container elements, to be sufficient and extensible: in
case i plan to describe individual employees in more
detail, i can always amend the schema of <employee>
(which presently only holds first and last name) and
leave the schema of the <library> element untouched. (i
also believe that mixed content and order between
elements should be eschewed in most data-centric xml, so
i do not make an effort to express mixed content or
order between sibling elements in the above.)
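
to make the implicit model concrete, here is a minimal sketch in
python (standard library only) of how a consumer might read such a
document. the instance data is made up, and nesting <last> and
<first> inside <name> is an assumption about the schema above, not
something it states outright.

```python
import xml.etree.ElementTree as ET

# hypothetical instance of the implicit model: the repeated <book>,
# <employee>, and <reader> elements sit directly under <library>
IMPLICIT_XML = """
<library>
  <address>1 example street</address>
  <book>
    <author><name><last>doe</last><first>john</first></name></author>
    <title>first sample title</title>
    <isbn>0000000000</isbn>
  </book>
  <book>
    <author><name><last>roe</last><first>mary</first></name></author>
    <title>second sample title</title>
    <isbn>1111111111</isbn>
  </book>
  <employee><name><last>doe</last><first>jane</first></name></employee>
  <reader>
    <name><last>roe</last><first>richard</first></name>
    <card-id>r-001</card-id>
  </reader>
</library>
"""

library = ET.fromstring(IMPLICIT_XML)

# findall() with a bare tag name matches direct children only, so the
# repeated elements are reachable without any wrapper element
titles = [book.findtext("title") for book in library.findall("book")]
readers = library.findall("reader")

# extra children added to <employee> later would not affect this lookup
employee_last_names = [e.findtext("name/last") for e in library.findall("employee")]

print(titles, len(readers), employee_last_names)
```

the point of the sketch is only that the repeated children can be
selected by name; nothing here depends on a wrapper element.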

now, there are people who do not agree with this kind of
schema (let's call it the implicit model) and insist on
container elements for repeatables. this means we have
to explicitly introduce <books>, <employees>, and
<readers>, so the library schema will look like this:

#=====================================================
library
  books
  employees
  readers

books
  *book

employees
  *employee

readers
  *reader

book
  authors
  title
  isbn

authors
  *author

author extends person

employee extends person

reader extends person
  card-id

person
  name
    last
    first
#=====================================================
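
for comparison, a minimal sketch of reading an instance of the
explicit model with python's standard library. the data is made up,
and the nesting of <last> and <first> inside <name> is again an
assumption. the only difference for the consumer is one extra path
step per repeated element.

```python
import xml.etree.ElementTree as ET

# hypothetical instance of the explicit model: every repeated element
# is wrapped in a container (<books>, <authors>, <employees>, <readers>)
EXPLICIT_XML = """
<library>
  <books>
    <book>
      <authors>
        <author><name><last>doe</last><first>john</first></name></author>
      </authors>
      <title>first sample title</title>
      <isbn>0000000000</isbn>
    </book>
  </books>
  <employees>
    <employee><name><last>doe</last><first>jane</first></name></employee>
  </employees>
  <readers>
    <reader>
      <name><last>roe</last><first>richard</first></name>
      <card-id>r-001</card-id>
    </reader>
  </readers>
</library>
"""

library = ET.fromstring(EXPLICIT_XML)

# each lookup goes through the container first, then the repeated element
titles = [b.findtext("title") for b in library.findall("books/book")]
card_ids = [r.findtext("card-id") for r in library.findall("readers/reader")]
author_last_names = [
    a.findtext("name/last")
    for a in library.findall("books/book/authors/author")
]

print(titles, card_ids, author_last_names)
```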

the argument, if i understand correctly, goes that in
case i want to change the structure of a contained
element, then only in the explicit model can i do so by
redefining e.g. <employee> (and perhaps <employees>),
without having to redefine the <library> element. it is
also claimed that only then will i be able to use typing
and have employees as an entity that i can change later
on, and have it changed in all the places it appears.
third, it is claimed that for reasons of object-oriented
mapping, container elements are desirable.

i would like to dub explicit container elements 'element
farms' (think of server farms -- many of the same
bundled) for short, and call the above set of claims the
'element farm constraint', which in essence says that
you should introduce a container element (a farm)
whenever you allow the repetition of elements in data-
centric xml.

now, the second argument is obviously correct in so far
as i can *only* in the explicit model modify an element
<employees> and have that change propagate everywhere,
for the simple reason there is no such element in the
implicit model. the question is, why should i want to do
such a thing? i think it is a design decision whether or
not a given entity or set of entities is modelled
explicitly or not. i do not have <books>, <readers>, or
<employees> in the implicit model since i have nothing
to say about these groups in general, only about each
individual. this could be different: for example, at
some point we discover that all readers are subject to
the same fee, and have a maximum number of books they may
take out of the library. then, the set of readers becomes
more tangible, and i will have to change the implicit
model like this:

#=====================================================
library
  address
  *book
  *employee
  readers

readers
  fee
  maximum-number-of-books
  *reader

reader extends person
  card-id
#=====================================================
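
a minimal sketch of what this change means for code that reads the
document, using python's standard library. the fee and limit values
are made up, and the person fields of each <reader> are left out for
brevity.

```python
import xml.etree.ElementTree as ET

# hypothetical instance of the revised implicit model: <readers> now
# carries data of its own (fee, limit) next to the repeated <reader>
REVISED_XML = """
<library>
  <address>1 example street</address>
  <readers>
    <fee>5.00</fee>
    <maximum-number-of-books>10</maximum-number-of-books>
    <reader><card-id>r-001</card-id></reader>
    <reader><card-id>r-002</card-id></reader>
  </readers>
</library>
"""

library = ET.fromstring(REVISED_XML)
readers = library.find("readers")

# facts about the group as a whole
fee = readers.findtext("fee")
max_books = int(readers.findtext("maximum-number-of-books"))

# facts about each individual
card_ids = [r.findtext("card-id") for r in readers.findall("reader")]

# the path used before the change no longer matches anything
old_path_hits = library.findall("reader")

print(fee, max_books, card_ids, old_path_hits)
```

the empty result for the old path is the consumer-side counterpart of
having to modify the definition of <library>.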

this is in fact a change in the model that did not
automatically percolate through all tiers -- i had to
modify my definition of <library>. so what? new facts
are in town, and we make space for them. we did not
build a complete, all-embracing, all-extensible data
model with the first shot, but who ever will? sure, the
explicit model would have made it easier, but it is also
somewhat bulkier. second, what do you do when you find
you have something new to say about the library itself?
you will have to change the <library> element, in both
models. but third and devastatingly, we are faced, in
both models, with the situation that not all repeated
elements are covered by pure container elements -- the
<readers> element, above, holds two more children next
to the repeated <reader> elements. that's all right for
the implicit model, but in order to satisfy the element
farm constraint, we must introduce one more container
<xxx>, like so:

#=====================================================
readers
  fee
  maximum-number-of-books
  xxx

xxx
  *reader
#=====================================================

at this juncture, it becomes clear that

* explicit containers for repeated elements will under
* the element farm constraint never be truly useful
* entities in the sense of data modelling, since they
* are never allowed to hold any data pertaining to
* them per se.
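
the constraint can be approximated as a check on instance documents.
the following python sketch is one possible reading, not a
definition: the function name farm_violations is hypothetical, and
since it inspects instances rather than schemas it only notices
repetition that actually occurs, not repetition that is merely
allowed.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def farm_violations(root):
    """Return tags of elements holding a repeated child next to other children."""
    violations = []
    for element in root.iter():
        counts = Counter(child.tag for child in element)
        has_repeat = any(n > 1 for n in counts.values())
        # a pure farm has exactly one distinct child tag
        if has_repeat and len(counts) > 1:
            violations.append(element.tag)
    return violations


# <readers> carries its own data and the repeated <reader> elements
READERS_WITH_DATA = """
<readers>
  <fee>5.00</fee>
  <maximum-number-of-books>10</maximum-number-of-books>
  <reader><card-id>r-001</card-id></reader>
  <reader><card-id>r-002</card-id></reader>
</readers>
"""

# same data, with the extra container <xxx> the constraint demands
READERS_WITH_FARM = """
<readers>
  <fee>5.00</fee>
  <maximum-number-of-books>10</maximum-number-of-books>
  <xxx>
    <reader><card-id>r-001</card-id></reader>
    <reader><card-id>r-002</card-id></reader>
  </xxx>
</readers>
"""

with_data = farm_violations(ET.fromstring(READERS_WITH_DATA))
with_farm = farm_violations(ET.fromstring(READERS_WITH_FARM))
print(with_data, with_farm)
```

under this reading, the first document is flagged at <readers> and the
second passes, but only because <xxx> holds nothing except the
repeated elements.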

by the way, i see no strict reason not to add an element
<readers> without making it the container for the
<reader> elements -- sounds strange? well:

#=====================================================
library
  address
  *book
  *employee
  *reader
  readers

readers
  fee
  maximum-number-of-books

reader extends person
  card-id

#=====================================================

this structure allows you to query for a collective
'readers' and to scan for individual instances of
'reader' -- in a way the collective is independent of
its members, since we can still say that there is a fee
to pay and a maximum number of books to take home even
with zero readers.
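
a minimal python sketch of querying such a document, with made-up
values and a hypothetical helper named describe. it reads the
collective and the individuals separately, and shows that the
collective data survives when there are no <reader> elements at all.

```python
import xml.etree.ElementTree as ET


def describe(xml_text):
    """Return (fee, maximum number of books, list of card ids)."""
    library = ET.fromstring(xml_text)
    # the collective <readers> is a sibling of the individual <reader>s
    fee = library.findtext("readers/fee")
    max_books = int(library.findtext("readers/maximum-number-of-books"))
    card_ids = [r.findtext("card-id") for r in library.findall("reader")]
    return fee, max_books, card_ids


WITH_READERS = """
<library>
  <address>1 example street</address>
  <reader><card-id>r-001</card-id></reader>
  <reader><card-id>r-002</card-id></reader>
  <readers>
    <fee>5.00</fee>
    <maximum-number-of-books>10</maximum-number-of-books>
  </readers>
</library>
"""

WITHOUT_READERS = """
<library>
  <address>1 example street</address>
  <readers>
    <fee>5.00</fee>
    <maximum-number-of-books>10</maximum-number-of-books>
  </readers>
</library>
"""

print(describe(WITH_READERS))
print(describe(WITHOUT_READERS))
```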

lastly, it is possible to model employees and readers
alike as sets of generic persons. in that case, we must
have both collective elements:

#=====================================================
library
  ...
  employees
  readers

employees
  *person

readers
  *person

#=====================================================

however, since it is easy to subclass and quite
foreseeable that employees and readers do differ from
generic persons in the eyes of a library's data
administration, this approach is perhaps not to be
recommended.

sorry again for the longish mail,

_wolfgang lipp
w.lipp at bgbm dot org
 
