element farms (containers for repeated elements) needed?

Wolfgang Lipp

<annotation>
the first eleven contributions in this thread started
as an off-list email discussion; i have posted them
here with the consent of their authors. -- _w.lipp
</annotation>

From: Eric van der Vlist [mailto:[email protected]]
Sent: Tuesday, 27 January 2004 13:53

Hi,

my question is: do we need container elements for
repeating elements in data-centric xml documents?

No, I don't think so.

or is it for some reason very advisable to introduce
containers in xml documents even where not strictly
needed? how can a recommendation on this in the light of
existing tools like w3c xml schema and relaxng

I tend to think that tools should have a limited impact on document
design (of course not going to the point where the documents can't be
processed at all) and that a good design isn't necessarily one which
imports all the restrictions of all the tools :) ...

That being said, there is absolutely no restriction in using RELAX NG
without container elements and even W3C XML Schema won't bite you either
unless you say that you want to allow the elements to appear in any
order (using xs:all is the only case I can think of that mandates
containers with WXS).

as well as established practice be answered?

As you said, some developers (and even good ones for whom I have a lot
of respect) consider that containers are a good practice but I don't.

i would greatly appreciate any words, pointers, and links.

Hope this helps.

Eric
 


From: Uche Ogbuji [mailto:[email protected]]
Wednesday, 28-January-2004 19:18


Eric has already given an excellent answer to this, especially mark his
words:

"I tend to think that tools should have a limited impact on document
design (of course not going to the point where the documents can't be
processed at all) and that a good design isn't necessarily one which
imports all the restrictions of all the tools :) ..."

That's a para that should be engraved somewhere.

I believe there is no one rule that works in all cases when deciding
whether or not to use container elements. Here is the informal rule of
thumb I use in my own practice:

* Use a container element only when it has a natural analogue to some
meaningful entity in the problem space.

In other words, don't invent an abstract concept for no other reason
than to hold elements together.

So using your example, I would go with

library
  *book
  *employee
  *reader

Each element then conforms to an actual concern in the problem space.
If you use

library
  books
    *book
  employees
    *employee
  readers
    *reader

Then in my opinion the added elements are purely contrivances to make
one feel more comfortable about not having a container. I believe in
most cases they don't correspond to any useful entity in the problem
space.

Just for clarity, if I wanted to organize my library into a collection
of books donated at the same time, I might be comfortable with:

library
  books (@donor='George Soros')
    *book
  books (@donor='Warren Buffett')
    *book

Although I would probably find a name more suitable to the corresponding
entity:

library
  endowment (@donor='George Soros')
    *book
  endowment (@donor='Warren Buffett')
    *book

HTH.


--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
A survey of XML standards: Part 1 -
http://www-106.ibm.com/developerworks/xml/library/x-stand1.html
Building Dictionaries With SAX -
http://www.xml.com/pub/a/2004/01/14/py-xml.html
Learning Objects Metadata -
http://www-106.ibm.com/developerworks/xml/library/x-think21.html
Python Web services developer: The real world, Part 1 -
http://www-106.ibm.com/developerworks/webservices/library/ws-pyth14/
The State of the Python-XML Art, 2003 -
http://www.xml.com/pub/a/2003/09/10/py.html
Objects. Encapsulation. XML? -
http://www.adtmag.com/article.asp?id=8596
 


From: David Mertz, Ph.D. [mailto:[email protected]]
Wednesday, 28-January-2004 20:37

"I tend to think that tools should have a limited impact on document
design (of course not going to the point where the documents can't be
processed at all) and that a good design isn't necessarily one which
imports all the restrictions of all the tools :) ..."

That's a para that should be engraved somewhere.

Generally, I quite concur with my colleagues Uche and Eric. There
certainly is a negative tendency to over abstract in XML document
design.

* Use a container element only when it has a natural analogue to some
meaningful entity in the problem space.

I wonder if Uche dislikes Java for this reason (or most C++ class
libraries, for that matter). It's not exactly the same thing, but
abstract classes--or generally, deep class hierarchies--are a definite
analogue of container elements. And I tend to dislike them for the
same reason.

So using your example, I would go with

library
  *book
  *employee
  *reader

Each element then conforms to an actual concern in the problem space.
If you use

library
  books
    *book
  employees
    *employee
  readers
    *reader

Then in my opinion the added elements are purely contrivances to make
one feel more comfortable about not having a container.

I'm not sure if I quite agree here. While there is certainly a point
to not forcing the data structure into the mold of the programming
tool, there are a lot of XML bindings that deal nicely with category
hierarchies. For example, using gnosis.xml.objectify, I might
enumerate over books in the latter scheme with:

for book in library.books:
    doSomething(book)

Under Uche's preferred system, I'd have to do something more like:

for book in filter(lambda e: tagname(e)=='book', library):
    doSomething(book)

The first is certainly clearer to intent. Of course, some bindings use
XPath to do the filtering instead (ElementTree, Anobind, REXML,
etc.)... but while there is something desirable in that uniform syntax,
it is still basically just a filter. Enumerating over books seems like
a pretty natural thing to want to do, IMO.
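
As a rough illustration of the two access patterns compared above, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than gnosis.xml.objectify. The sample documents and the two helper functions are assumptions made up for this example; they only show that both layouts can yield the same list of books.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same library with and without containers.
WITH_CONTAINERS = """
<library>
  <books><book>Dune</book><book>Emma</book></books>
  <readers><reader>Ann</reader></readers>
</library>
"""

FLAT = """
<library>
  <book>Dune</book>
  <reader>Ann</reader>
  <book>Emma</book>
</library>
"""


def titles_with_containers(xml_text):
    """Walk the children of the <books> container."""
    library = ET.fromstring(xml_text)
    return [book.text for book in library.find("books")]


def titles_flat(xml_text):
    """Select the <book> children of <library> by tag name."""
    library = ET.fromstring(xml_text)
    return [book.text for book in library.findall("book")]


print(titles_with_containers(WITH_CONTAINERS))
print(titles_flat(FLAT))
```

In the flat layout the selection by tag name is still a filter, as noted above; the sketch only shows that it can be a one-line call in the binding rather than a hand-written lambda.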

Think of what you'd do in an OOP framework also--never mind the XML
issue. If I were generating a library object, I would find it much
more natural to have it contain a .books attribute that was a
list/array of books than I would to create a .everything attribute that
was a heterogeneous list of books, employees and readers.

In a way, I would suggest that Uche and Wolfgang are avoiding the
Scylla of letting the data follow the tools, but falling to the
Charybdis of letting the surface representation of XML dictate the data
structure.

Yours, David...
 


From: Uche Ogbuji [mailto:[email protected]]
Wednesday, 28-January-2004 21:29


I wonder if Uche dislikes Java for this reason (or most C++ class
libraries, for that matter). It's not exactly the same thing, but
abstract classes--or generally, deep class hierarchies--are a definite
analogue of container elements. And I tend to dislike them for the
same reason.

Yes. I have the same problem with deep object
hierarchies. The C++ NIH classes were the classic
example: you either saw them as the paragon of OO
design, or thought they were the perfect demonstration
that OO without generics is bad. I fell quickly into
the latter camp.

I'm not sure if I quite agree here. While there is certainly a point
to not forcing the data structure into the mold of the programming
tool, there are a lot of XML bindings that deal nicely with category
hierarchies. For example, using gnosis.xml.objectify, I might
enumerate over books in the latter scheme with:

for book in library.books:
    doSomething(book)

That's pull processing. For a long time I've preferred
push processing in XML precisely because I think pull
processing often results in contrivances for the benefit
of the code, rather than the best pure XML design.

Under Uche's preferred system, I'd have to do something more like:

for book in filter(lambda e: tagname(e)=='book', library):
    doSomething(book)

The first is certainly clearer to intent. Of course, some bindings use
XPath to do the filtering instead (ElementTree, Anobind, REXML,
etc.)... but while there is something desirable in that uniform syntax,
it is still basically just a filter. Enumerating over books seems like
a pretty natural thing to want to do, IMO.

I think you hit the nail on the head by talking about
XPath. XPath-based triggers, I think, are the best way
to perform such processing in an imperative language
such as Python. I hope I don't give offense by saying
that the fact that this is not all that well supported
in earlier Python data bindings was the primary
motivation for my developing Anobind rather than
adopting any similar, existing tool.

But I don't want to muddy the issue with comparisons
*between* tools. I can illustrate just as well with
XSLT

Pull:

<xsl:template match="books">
  <xsl:for-each select="book">
    <xsl:value-of select="title"/>
    <!-- Spurious in this case, but battles between apply-templates and
         value-of are almost inevitable in non-trivial pull-type
         processing -->
  </xsl:for-each>
</xsl:template>

Push:

<xsl:template match="title">
  <xsl:apply-templates/>
</xsl:template>

<!-- For this trivial example, a template for book is
     not needed, but it usually is for non-trivial cases -->
<xsl:template match="book">
  <xsl:apply-templates/>
</xsl:template>

I'm a strong advocate of push processing, and I think
almost all XSLT experts agree that it leads to clearer
and more maintainable code.
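
For readers more at home in Python than XSLT, the push style can be approximated roughly as follows. This is only a sketch under stated assumptions: the `template` decorator, the `HANDLERS` registry and `apply_templates` are hypothetical names invented here (they are not Anobind or 4Suite APIs), and matching is by tag name only, not by XPath pattern.

```python
import xml.etree.ElementTree as ET

HANDLERS = {}  # tag name -> handler function (hypothetical registry)


def template(tag):
    """Register a handler for a tag, loosely like xsl:template match=..."""
    def register(func):
        HANDLERS[tag] = func
        return func
    return register


def apply_templates(element, out):
    """Push each child to the handler for its tag.

    Children without a registered handler are simply descended into.
    """
    for child in element:
        handler = HANDLERS.get(child.tag, apply_templates)
        handler(child, out)


@template("title")
def on_title(element, out):
    # The handler reacts to whatever <title> the walk delivers to it.
    out.append(element.text)


doc = ET.fromstring(
    "<library><book><title>Dune</title></book>"
    "<reader><name>Ann</name></reader>"
    "<book><title>Emma</title></book></library>"
)
titles = []
apply_templates(doc, titles)
print(titles)
```

The handler for title never asks where the titles live, so the same code works whether or not the books sit inside a container element.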

Think of what you'd do in an OOP framework also--never mind the XML
issue.

I think this is a different topic. I do not design for
XML as I do for OO. In fact I argue strenuously against
such (IMHO) mix-up.

If I were generating a library object, I would find it much
more natural to have it contain a .books attribute that was a
list/array of books than I would to create a .everything attribute that
was a heterogeneous list of books, employees and readers.

The conceptual confusion between the slot and the
referent frame itself is a problem that OO has inherited
from its ancestors. I think it argues a problem with OO
rather than a good direction for XML design.

In a way, I would suggest that Uche and Wolfgang are avoiding the

and Eric?

Scylla of letting the data follow the tools, but falling to the
Charybdis of letting the surface representation of XML dictate the data
structure.

Structure of the code? In most cases I've worked on,
the code serves the data, not the other way around, so I
think it's right to let the data representation of the
problem space dictate the structure of the code.

This is a really nice topic, but I may not be able to
contribute too much more to the thread: I've already
been neglecting burning fires at work to converse this
much :)

Thanks, all.


--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
 


From: Eric van der Vlist [mailto:[email protected]]
Wednesday, 28-January-2004 21:32


Hi David,

While there is certainly a point
to not forcing the data structure into the mold of the programming
tool, there are a lot of XML bindings that deal nicely with category
hierarchies. For example, using gnosis.xml.objectify, I might
enumerate over books in the latter scheme with:

for book in library.books:
    doSomething(book)

That doesn't necessarily mean that a container needs to
be found in the XML document. I am working on my own
library (similar to gnosis.xml.objectify but not ready
to be published yet), and without any container I can
write:

for book in library.book:
    book.doSomething()

Under Uche's preferred system, I'd have to do something more like:

for book in filter(lambda e: tagname(e)=='book', library):
    doSomething(book)

The first is certainly clearer to intent. Of course, some bindings use
XPath to do the filtering instead (ElementTree, Anobind, REXML,
etc.)... but while there is something desirable in that uniform syntax,
it is still basically just a filter. Enumerating over books seems like
a pretty natural thing to want to do, IMO.

Sure, but the abstraction layer can easily be smart
enough to let you do so without imposing it in the XML
document.
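
One way such an abstraction layer might look is sketched below. It is not the unpublished library mentioned above, nor gnosis.xml.objectify; the `Node` class is a hypothetical wrapper over the standard xml.etree.ElementTree, written only to show that attribute access can select same-named children from a document that has no container elements.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper: attribute access returns child elements by tag."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, tag):
        # Called only for names not found normally, e.g. library.book.
        if tag.startswith("_"):
            raise AttributeError(tag)
        return [Node(child) for child in self._element.findall(tag)]

    @property
    def text(self):
        return self._element.text


library = Node(ET.fromstring(
    "<library><book>Dune</book><reader>Ann</reader><book>Emma</book></library>"
))
titles = [book.text for book in library.book]
print(titles)
```

A tag with no matching children gives an empty list, so a loop over `library.employee` simply does nothing for this document.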

Think of what you'd do in an OOP framework also--never mind the XML
issue. If I were generating a library object, I would find it much
more natural to have it contain a .books attribute that was a
list/array of books than I would to create a .everything attribute that
was a heterogeneous list of books, employees and readers.

In a way, I would suggest that Uche and Wolfgang are avoiding the
Scylla of letting the data follow the tools, but falling to the
Charybdis of letting the surface representation of XML dictate the data
structure.

Hmmm... aren't you the one who assumes that the data
structure is directly derived from the "surface
representation of XML" when you say that a container is
needed because a list of homogeneous objects is easier
to manage with a XML binding tool :) ???

My feeling is that it's because the data model isn't
necessarily dictated by the XML that containers aren't
required.

Eric
 



From: Lipp, Wolfgang
Thursday, 29-January-2004 11:03

i think i have learned that occam's razor
applies to xml modelling as well: if an element is not
arguably needed, don't use it -- with the addition that
when designing an xml format for a specific application,
then the requirement to enable straightforward iteration
over some kind of repeated element using a given api may
mean the set of repeated elements becomes an entity
since there is something i do with 'it'. the
availability of techniques like xpath etc. somewhat
weakens the point. generally, there seems to be a
feeling that one should do things the xml way in xml,
and the oop way in oop, and not let too many concerns
from one domain influence decisions in the other. for
someone writing a lot of oop things, this may be hard to
do, since ~.books is such a natural and inevitable
choice there.

The conceptual confusion between the slot and the
referent frame itself is a problem that OO has
inherited from its ancestors. I think it argues a
problem with OO rather than a good direction for XML
design.

can you elaborate a bit on this? i *think* it is about
the thing that made me wonder a lot about xml until i
found out that the things in the pointy brackets are
really 'element type names', but i do not fully grasp
the meaning of your remark.

_wolfgang
 


From: Robert A. Morris [mailto:[email protected]]
Thursday, 29-January-2004 14:12


It's interesting that the thread seems to, slightly,
reflect these points of view:

service centric => rigorously use containers
data centric => model as convenient

I happen to think the former point of view, being more
abstract, is more extensible and robust, and subsumes
the latter(*). But several writers would naturally put
me in the camp of the over-general. Despite the well
reasoned examples from quite respected writers in the
discussion, my own experience remains that abstraction
takes you longer to develop with, but at the end has
products with longer life and cheaper maintenance. On
the other hand, I live in a world of 3-5 year funding
cycles, from an agency that /wants/ to see me develop
with targets that are over the horizon rather than get
something useful as fast as possible. Indeed, if you
make a proposal to the U.S. National Science Foundation
which "merely" proposes the rapid deployment of a
database---no matter how important---it will rarely, if
ever, be funded. On the other hand, if you can make the
case that, when done, lots of other projects and
constituencies can use your work, you have a good shot at
funding. For NSF proposals, you are /required/ to make
an explicit case for broader impact than the specific
science at hand. Right now I am working on a proposal to
develop a framework for producing spatially referenced
scientific observation systems. The major instance
supporting the proof of concept will be a production
quality invasive species reporting system with data
referenced to the earth, and deployed by an organization
that is presently gathering data with a brittle but
useful system. But another will be a clinical breast
cancer management system with data referenced to the
organ extending the personal database of a clinical
oncologist who learned some Access on his own. My
feeling is that, left to the data centric community,
these two systems would take, say, 2X the effort of
either one of them because they would basically repeat
most of the infrastructure. A more abstract approach
might make the total time 1.1 times the time for either
one.

Given the reasonableness of both sides of this argument,
my guess is that matters will come down to social
arguments, not technical arguments. This is too bad. One
of the things we teach software engineers is that the
client for a system should dictate the behavior not the
implementation. We go to great lengths in our year-long
software engineering course to keep development details
out of view of the people who commission the project.
After a month of intensive requirements negotiation with
them, they rarely get more than a few hours a month with
the development team until something has started to
emerge that purports to meet the requirements.

Bob

(*)It probably will come as no surprise that I have a
mathematics Ph.D. and before turning to computer science
spent 10 years as an algebraic geometer and homological
algebraist. These subjects are so abstract that,
literally I can no longer understand the very papers I
published in the 1970s...



Robert A. Morris
Professor of Computer Science
UMASS-Boston
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466
 


From: Lipp, Wolfgang
Thursday, 29-January-2004 17:07

Thanks. It's interesting that the thread seems to, slightly, reflect
these points of view:

service centric => rigorously use containers
data centric => model as convenient

i can see that the service centric thing has something
to it, and it is also a very simple rule to unify data
structures. as in procedural programming, quality
probably improves when following clear design patterns.

btw, i am more on the receiving end of the schema -- i
don't develop it, i have to use it -- but i'd rather
live with fewer containers, shorter element paths and
slightly more involved oo mapping. you wrote earlier
that the attempt to get rid of containers is most of the
time done in the fallacious assumption that people have
to be able to read the xml. well, in our case i think
this is exactly what happens, because people who map
databases according to our xml schema do so in terms of
associating xpaths to database entities -- which is why
the schema has elements named in a way so humans can
read them in the first place. however, i do not want to
put obstacles in the way of a future development of the
schema, and you mentioned there may be trouble ahead
when it comes to questions of schema extensibility:

Furthermore, if you use strong enough typing,
this means that you can have "group of elements
of type X" be reused in many places and have
only to change the type definition of X to change
them all. I could probably go further down this
road invoking inheritance examples that are at
least as persuasive, though those might be too
technical for the people who make these requests.

in my example, i had

library
  address
  *book
  *employee
  *reader

book
  *author
  title
  isbn

-- can you point out to me where inheritance bites
you with this kind of structure?

_wolfgang
 


From: Robert A. Morris [mailto:[email protected]]
Friday, 30-January-2004 05:25


I agree you are exactly the audience that has to read
XML documents. But to a certain extent, this is
similar to the point made in the discussion that the
schema shouldn't be held hostage to the code
implementing applications on it. (However, my
recollection is that this argument was /against/ my
position on containers!) In the SDD work, we are taking
the position that we should rather build some tools to
address maintenance and consumption needs than change
the schema. For example, SDD is heavily dependent on
key/keyref mechanisms and it can be quite difficult for
a human to understand to which key a particular keyref
points, because you also have to examine the identity
constraints on the keys and the XPaths involved. So
rather than give up the mechanism, every place SDD has a
keyref attribute, we also have an optional attribute
named "refdebug" and one of my graduate students wrote a
small XSLT utility that does the necessary traversals
and heuristically chooses a label from the element that
has the correct key and inserts it in the refdebug. In
an rdb, this would be the same as examining secondary
keys, tracing all the relations, and replacing the
secondary key with some reasonably meaningful---if not
unique---value from the related table.
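
The idea behind such a utility can be sketched in Python, although the tool described above is an XSLT stylesheet. Everything concrete here is an assumption for illustration: the `id`, `ref` and `label` names and the sample document are invented and are not taken from the SDD schema, and real key/keyref identity constraints are scoped by XPath selectors, which this simplified lookup ignores.

```python
import xml.etree.ElementTree as ET


def fill_refdebug(root, key_attr="id", ref_attr="ref", label_tag="label"):
    """Copy a human-readable label onto every element that has a reference."""
    # Map each key value to the label text of the element that defines it;
    # fall back to the key itself when the element has no label child.
    labels = {}
    for element in root.iter():
        key = element.get(key_attr)
        if key is not None:
            labels[key] = element.findtext(label_tag, default=key)
    # Annotate each referencing element; unknown keys are left untouched.
    for element in root.iter():
        ref = element.get(ref_attr)
        if ref in labels:
            element.set("refdebug", labels[ref])
    return root


# Hypothetical document: element and attribute names are made up.
doc = ET.fromstring(
    '<data>'
    '<character id="c1"><label>Leaf shape</label></character>'
    '<state ref="c1"/>'
    '<state ref="c9"/>'
    '</data>'
)
fill_refdebug(doc)
states = doc.findall("state")
print(states[0].get("refdebug"), states[1].get("refdebug"))
```

The reference itself is left unchanged; only the optional debugging attribute is added, so the document stays valid for tools that ignore it.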

Thanks for provoking this discussion. We certainly had
it many times during the drafting of SDD, and it will
certainly come up again in the discussion of the draft.
When you put it all together, send it to me and I'll put
it on our wiki,
http://efgblade.cs.umb.edu/twiki/bin/view/SDD/WebHome at
which we invite discussion!

Bob



Robert A. Morris
Professor of Computer Science
UMASS-Boston
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466
 
