Steve Jorgensen
I recently produced an XML Schema to support several kinds of transactions
within a particular business domain. In the process, I learned pretty much
all of how W3C XML Schema works, learned some Schematron, read up on XML
design patterns and best practices, and thought I knew what I was doing.
Since there was more overlap than not between the contents of the different
transaction types, I designed a single schema with a single root, and used
some xs:choice elements to handle the different variations. I thought I was
doing a good job of designing a schema that could be gracefully extended to
handle different cases.
Next, the requirements had a medium-big change, so I went to try to extend the
schema to handle a new transaction type that was a bit different, and the
whole schema came tumbling down. My exquisitely designed schema built to deal
with change turned out to be a house of cards that blew over in the next
breeze.
I realized a few things from this:
1. Hierarchical systems are even less flexible than they first appear.
2. Making XML Schemas flexible is really, really hard and requires a knowledge
of specific techniques to achieve it.
3. There's lots of good advice on the Internet regarding how to best use
schema constructs and namespaces, but not much on how to actually design the
node hierarchies in a schema for maximum flexibility.
4. After my schema downfall, I invented a couple of patterns that really seem to
help, but I still don't know where to find more advice along these lines.
A big difficulty with XML is that it encourages us to choose a hierarchical
arrangement early that may not work for all cases, because XML is based on the
use of hierarchies, and the alternative seems to be to add IDREFs or keyrefs
that make the code and the document more convoluted - it starts to look more
like a relational database schema than a tree structure. The partial solution
I've found to this problem is to add layers of abstraction such that instead
of making one entity a child of another, both elements get a common parent.
Example...
<BillingAccount>
<Customer><Name value="Foo, Inc."/></Customer>
<Invoice><InvoiceNumber value="1234"/></Invoice>
<Invoice><InvoiceNumber value="1255"/></Invoice>
</BillingAccount>
Say we start with a schema that contains billing account details and invoices,
and there is a 1-to-many relationship between accounts and invoices. The
obvious construction for 1-to-many is to make invoices children of billing
accounts (as above), but if we go down that path, then what do we do about
another document type with billing accounts, but not invoices? We can't just
reuse the billing account element without allowing invoice elements as well,
and that makes no sense. We can make a complex type definition and restrict
or extend the type, but that's another can of worms that quickly gets out of
hand. We can un-nest the elements and use keys and keyrefs, but now the
documents are much harder to process.
<BillingAccount id="123">
<Customer><Name value="Foo, Inc."/></Customer>
</BillingAccount>
<Invoice billingAccountID="123">
<InvoiceNumber value="1234"/>
</Invoice>
<Invoice billingAccountID="123">
<InvoiceNumber value="1255"/>
</Invoice>
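For illustration, here is roughly what the schema side of the key/keyref variant could look like. This is only a sketch under assumptions: the fragment above has no root element, so the Accounts wrapper is hypothetical, as are the ValueHolder type and the constraint names billingAccountKey and invoiceAccountRef. No target namespace is used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: Accounts, ValueHolder and the constraint names are
     hypothetical and do not come from the original schema. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Shared shape for elements like <Name value="..."/> -->
  <xs:complexType name="ValueHolder">
    <xs:attribute name="value" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:element name="Accounts">
    <xs:complexType>
      <!-- Accounts and invoices are siblings, in any order -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="BillingAccount">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Customer">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Name" type="ValueHolder"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="id" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="Invoice">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="InvoiceNumber" type="ValueHolder"/>
            </xs:sequence>
            <xs:attribute name="billingAccountID" type="xs:string"
                          use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>

    <!-- Every BillingAccount/@id must be present and unique -->
    <xs:key name="billingAccountKey">
      <xs:selector xpath="BillingAccount"/>
      <xs:field xpath="@id"/>
    </xs:key>

    <!-- Every Invoice/@billingAccountID must match some account id -->
    <xs:keyref name="invoiceAccountRef" refer="billingAccountKey">
      <xs:selector xpath="Invoice"/>
      <xs:field xpath="@billingAccountID"/>
    </xs:keyref>
  </xs:element>

</xs:schema>
```

The validator can enforce the reference, but any code that consumes the document still has to resolve billingAccountID itself, which is where the extra processing effort comes from.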
To get out of this, what we need is a shared parent for the 2 element types in
the original schema. If we add an element for account activity such that each
account activity element has one billing account child and zero or more
invoice children, that supports the needs of the current schema, but it still
leaves the billing account element usable in a document that does not deal
with invoices.
<AccountActivity>
<BillingAccount>
<Customer><Name value="Foo, Inc."/></Customer>
</BillingAccount>
<Invoice><InvoiceNumber value="1234"/></Invoice>
<Invoice><InvoiceNumber value="1255"/></Invoice>
</AccountActivity>
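A possible schema for the shared-parent arrangement is sketched below. It assumes global element declarations with no target namespace, and the ValueHolder type is a hypothetical helper, not something from the original schema. The point it tries to show is that BillingAccount is declared once without any mention of Invoice, and only AccountActivity knows the two belong together.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: ValueHolder is a hypothetical helper type. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Shared shape for elements like <Name value="..."/> -->
  <xs:complexType name="ValueHolder">
    <xs:attribute name="value" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- Declared globally and free of Invoice, so other document
       types can reference it as is -->
  <xs:element name="BillingAccount">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name" type="ValueHolder"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Invoice">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="InvoiceNumber" type="ValueHolder"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- The shared parent: exactly one account, zero or more invoices -->
  <xs:element name="AccountActivity">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="BillingAccount"/>
        <xs:element ref="Invoice" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

A document type that has no invoices can use ref="BillingAccount" directly in its own content model, and the 1-to-many relationship is still expressed by nesting rather than by key lookups.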
This is a partial solution, because it's still not terribly hard to come up
with business rule changes that can break it, but it's much more resilient than
the original account/invoice hierarchy, and it's much less messy than looking
up the accounts by key reference. All in all, a good compromise.
Where, if anywhere, can I go to find more helpful advice along these lines?