Relational data to XML - Are there any standards?

Pradeep

Hello,

I need to take a set of input tables and create an XML output file. The
format of the XML output must be user-definable and must be intuitive
enough for non-techies to use.

input table(s) + SomeSchemaDefinition ==> XML file

I have seen examples of XML file generation with fixed scope. For
example, if the input table (called customer) is as follows:

Id   Name
101  Clark Kent
102  Peter Parker
103  Bruce Banner

The output XML generated is:
<root>
  <customer>
    <Id>101</Id>
    <Name>Clark Kent</Name>
  </customer>
  <customer>
  ....

However, the user never had a chance to control the output format. In
my case, the user must have the ability to define which columns are
attributes and which columns are child elements.

I am wondering if there is a standard already in place that I
should look at. Any other pointers are appreciated as well. In particular,
how is multi-level nesting handled?

Thank you in advance for your help.

Pradeep
 
Philippe Poulard

hi,

Pradeep said:
I am wondering if there is a standard that is already in place that I
must look at.

SQL to XML is not standardized; many RDBMS vendors provide proprietary
mechanisms to map SQL to XML, which can be enough for simple XML
targets but can't deal with more complex XML structures.

Any other pointer is appreciated as well. Especially, how
are multiple level nestings handled?

have a look at RefleX

an example here :
http://reflex.gforge.inria.fr/tutorial.html#N801062

this tool can map any SQL select statement to the XML structure you
would expect : you can
- choose which items become elements, attributes, or text,
- rename attributes and elements if the column names don't suit,
- insert some container elements here and there,
- etc

it can output DOM, SAX or write to a file

you can launch it either from the command line or within a web
application, or embed it in your application
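
For readers who want to see the general idea of a user-defined mapping, here is a minimal Python sketch. It is not RefleX syntax and does not use RefleX at all: the MAPPING dictionary, its keys ("root", "row", "as", "name") and the function rows_to_xml are hypothetical names made up for illustration. It only shows how a small, editable description could decide which columns become attributes and which become child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-editable mapping (an assumption, not any tool's format).
MAPPING = {
    "root": "customers",   # container element wrapped around all rows
    "row": "customer",     # element created for each input row
    "columns": {
        "Id": {"as": "attribute", "name": "id"},
        "Name": {"as": "element", "name": "fullName"},
    },
}

def rows_to_xml(rows, mapping):
    """Build an element tree from dict rows according to the mapping."""
    root = ET.Element(mapping["root"])
    for row in rows:
        item = ET.SubElement(root, mapping["row"])
        for column, rule in mapping["columns"].items():
            value = str(row[column])
            if rule["as"] == "attribute":
                item.set(rule["name"], value)
            elif rule["as"] == "element":
                ET.SubElement(item, rule["name"]).text = value
            else:
                # "text": the value becomes the row element's own text
                item.text = value
    return root

rows = [
    {"Id": 101, "Name": "Clark Kent"},
    {"Id": 102, "Name": "Peter Parker"},
    {"Id": 103, "Name": "Bruce Banner"},
]
xml_text = ET.tostring(rows_to_xml(rows, MAPPING), encoding="unicode")
print(xml_text)
```

Keeping the mapping as plain data means a non-technical user could edit it through a form or a small config file without touching the generating code.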

enjoy !

--
Cordialement,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !
 
Joseph Kesselman

The most standards-oriented solution I can think of offhand would be to
export/extract the relational data to XML in a fairly straightforward
manner, and then run that through XSLT to get your user-defined
formatting layer.
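
A rough sketch of the first half of that approach, the "fairly straightforward" export, assuming an SQLite database and Python's standard library. The function name table_to_xml and the in-memory customer table are illustrative assumptions. The XSLT step is not shown here; it would be run on the resulting file with whatever XSLT processor is available.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(conn, table):
    """Dump every row of `table` as <table><column>value</column>...</table>."""
    # The table name is interpolated into the SQL text, so it must come
    # from trusted code, never from user input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    root = ET.Element("root")
    for row in cursor:
        row_el = ET.SubElement(root, table)
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = "" if value is None else str(value)
    return root

# Hypothetical sample data matching the table in the original question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(101, "Clark Kent"), (102, "Peter Parker"), (103, "Bruce Banner")],
)

flat_xml = ET.tostring(table_to_xml(conn, "customer"), encoding="unicode")
print(flat_xml)
```

The output deliberately mirrors the table one-to-one; all decisions about attributes versus child elements are left to the stylesheet.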
 
Stefan Ram

Joseph Kesselman said:
The most standards-oriented solution I can think of offhand would be to
export/extract the relational data to XML in a fairly straightforward
manner, and then run that through XSLT to get your user-defined
formatting layer.

I would consider using one document per table, with an ID attribute
for the primary key and IDREF attributes for foreign keys.

I believe the common opinion that XML favors hierarchical data
to be wrong. It might also be used for relational data quite
easily, as shown above, and since relations can model any
other kind of structure, so can XML.
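
A minimal Python sketch of that layout, under some loud assumptions: the orders table and its rows are invented for illustration, and ElementTree does not enforce ID or IDREF typing (that would need a DTD or schema), so the attributes below only follow the naming pattern. ID values must be XML names, which cannot start with a digit, hence the letter prefixes.

```python
import xml.etree.ElementTree as ET

customers = [(101, "Clark Kent"), (102, "Peter Parker")]
# Hypothetical second table: (order id, customer id, item)
orders = [(1, 101, "cape"), (2, 102, "web fluid"), (3, 101, "glasses")]

def build_customer_doc(rows):
    """One document for the customer table, primary key as an id attribute."""
    root = ET.Element("customers")
    for cid, name in rows:
        ET.SubElement(root, "customer", {"id": f"c{cid}", "name": name})
    return root

def build_order_doc(rows):
    """One document for the orders table, foreign key as a reference attribute."""
    root = ET.Element("orders")
    for oid, cid, item in rows:
        ET.SubElement(
            root, "order", {"id": f"o{oid}", "customer": f"c{cid}", "item": item}
        )
    return root

customer_doc = build_customer_doc(customers)
order_doc = build_order_doc(orders)

# Resolving a reference is a lookup by id, roughly what a join does.
by_id = {c.get("id"): c for c in customer_doc}
names = [by_id[o.get("customer")].get("name") for o in order_doc]
print(names)
```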
 
Andy Dingley

Stefan said:
I would consider using one document per table, with an ID attribute
for the primary key and IDREF attributes for foreign keys.

Why would you want to model the tables so exactly in the XML document ?
Aspects like keys are quite specific to a specific implementation
through an RDBMS and it's not necessarily important to preserve them.

Assuming that we're talking about some "application" purpose here,
rather than simply replicating an entire database, then the likelihood
is that we care less about "tables" and more about an
application-centred denormalised view across multiple tables. This view
doesn't require foreign keys (it's now a single view) and XML's
hierarchical nature can represent it easily.

I believe the common opinion that XML favors hierarchical data
to be wrong.

ID & IDREF suck. Therefore XML _favours_ hierarchical data. It's not
hierarchical to the exclusion of all else, but it's a strong
favouritism.

It might also be used for relational data quite
easily, as shown above, and since relations can model any
other kind of structure, so can XML.

Tapeworms are Turing complete, therefore I can compute anything with
them.
But it's hardly practical, is it ?
 
Stefan Ram

Andy Dingley said:
Why would you want to model the tables so exactly in the XML document ?
Aspects like keys are quite specific to a specific implementation
through an RDBMS and it's not necessarily important to preserve them.

Keys are inherent to the relational model of the /data/ -
they are not an implementation detail of a specific RDBMS.

Assuming that we're talking about some "application" purpose
here, rather than simply replicating an entire database, then
the likelihood is that we care less about "tables" and more
about an application-centred denormalised view across multiple
tables. This view doesn't require foreign keys (it's now a
single view) and XML's hierarchical nature can represent it
easily.

In specific applications, there might well be reasons to
diverge from my suggestion. I am just not aware of them right now.

ID & IDREF suck. Therefore XML _favours_ hierarchical data.
It's not hierarchical to the exclusion of all else, but it's a
strong favouritism.

I only know about them from the XML-application "XHTML 1.1",
where the id-attribute has ID-type and "for" and "usemap" have
IDREF-type, and was not aware of problems with this approach.

But then, I really have no other experience with ID and IDREF,
so you might know more about this than I do.

But it's hardly practical, is it ?

It's not so hard storing relational data in XML with one
element per set and one element per tuple, in fact, this
seems quite natural to me.
 
Joseph Kesselman

Andy said:
ID & IDREF suck. Therefore XML _favours_ hierarchical data.

There's a lot more than ID/IDREF available once you move to schemas. Not
to mention the option of simply using XPaths in your own document syntax.

XML's native syntax is certainly tree-structured. But what you read that
into for manipulation is up to the application. DOM and SAX and such are
conveniences/tools, *NOT* universal solutions for all tasks.

(Of course IBM's now added XML support to DB2, recognizing that
sometimes a dataset is best manipulated hierarchically as an XML
infoset. Tools for tasks.)
 
Peter Flynn

Don't forget that XML was designed for normal text documents,
where ID/IDREF is a useful and robust mechanism. It is not
related in any way to whether or not XML favours hierarchical
data.

If you use XML for rectangular data, you have to understand that
you are pushing the limits of what XML was designed for.

There's a lot more than ID/IDREF available once you move to schemas.

Possibly. But that's a penalty you have to pay.

///Peter
 
Andy Dingley

Stefan said:
Keys are inherent to the relational model of the /data/ -
they are not an implementation detail of a specific RDBMS.

The concept of "keys" is relevant to any current "relational"
implementation of the data.

However the _specific_ use of keys is specific to the implementation.
There's a question of how far you normalise your data when designing a
relational model for it. You don't have to normalise to the same form
each time, and you don't have to use identical key structures.

If we see this "XML output" of the database model as being application
centric, then we don't care about such design choices. No matter how
normalised the data was when it was stored internally, we want the same
denormalised view for the output. As different implementations may have
used a different data model (an Access implementation was probably
de-normalised compared to a SQL Server implementation), this difference
is now irrelevant, inappropriate and possibly misleading. Our XML
representation shouldn't preserve these keys.

I only know about them from the XML-application "XHTML 1.1",
where the id-attribute has ID-type and "for" and "usemap" have
IDREF-type, and was not aware of problems with this approach.

Read the RDF documentation. Much of RDF's work was in overcoming the
shortcomings of XML, in providing a usable data model for ID &
IDREF-like concepts.

XML has two major shortcomings here:

* To use IDREF, you must first have an ID. What happens if you want to
refer to a node that's identifiable, but not explicitly labelled ? It's
a valid requirement.

* ID & IDREF only work within a single document. To make small
applications that can inter-work in a large universe, we need tools
that can refer outside their immediate frame of reference. XPointer is
an attempt here, but there's still a lot lacking with XML in this
context.

It's not so hard storing relational data in XML with one
element per set and one element per tuple, in fact, this
seems quite natural to me.

What's a "tuple" here ? A tuple as held in a table, or a tuple as a
row in a relational view ? I have no real interest in
tuples-from-tables, they're too low level and only really useful for
"database replication between databases with identical table structures
and data models".

If we look at the more interesting case of tuples from a view, then
these will be de-normalised (i.e. they have structure that would have
been normalised into multiple tables). An appropriate XML
representation of these is also de-normalised. Now we can still say "one
element per tuple" simply, but it has to become "one parent element for
one or more tuples" and "potentially more than one level of element
hierarchy within a tuple".

I strongly recommend studying MS SQL Server 2000 and the splendid hack with
which they implemented the "FOR XML" select query, without changing
anything in the database itself. If you search the MSDN SDK for
"Universal table" then there's a good explanation of it. Basically any
"FOR XML" query produces a huge denormalised scratch table called the
"Universal table", then a trivial row scanner runs through this and
generates new element hierarchies when column values change. Quite
useful, and a splendid low-effort hack.
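
To make the row-scanner idea concrete, here is a small Python sketch. It is not how SQL Server implements it, only an illustration of the same principle: walk a sorted, denormalised result set and open a new parent element whenever the leading columns change. The rows, the order columns and the function name scan_rows are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical denormalised rows: customer columns are repeated for each
# order. The rows must already be sorted by customer, as a scanner needs.
rows = [
    (101, "Clark Kent", 1, "cape"),
    (101, "Clark Kent", 3, "glasses"),
    (102, "Peter Parker", 2, "web fluid"),
]

def scan_rows(rows):
    """Nest orders under customers by watching for changes in the key columns."""
    root = ET.Element("root")
    # groupby starts a new group each time (Id, Name) changes between rows.
    for (cid, name), group in groupby(rows, key=lambda r: (r[0], r[1])):
        customer = ET.SubElement(root, "customer", {"Id": str(cid), "Name": name})
        for _, _, oid, item in group:
            ET.SubElement(customer, "order", {"Id": str(oid)}).text = item
    return root

nested = scan_rows(rows)
print(ET.tostring(nested, encoding="unicode"))
```

Deeper nesting would follow the same pattern: sort by each level's key in turn and group again inside the inner loop.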
 
Joe Kesselman

Andy said:
* To use IDREF, you must first have an ID. What happens if you want to
refer to a node that's identifiable, but not explicitly labelled ? It's
a valid requirement.

Usual solution is XPath -- structural/content-based crossreferencing
rather than pre-tagged -- and solutions derived from it such as
XPointer, or schema's keys, or the similar capabilities in XSLT and XQuery.
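
A small Python illustration of content-based cross-referencing, with the caveat that ElementTree only supports a limited subset of XPath, and that the order element and the helper customer_for are invented for the example. The point is that the customer is found by its content (the Id child), with no ID attribute declared anywhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <order> refers to a customer by value only.
doc = ET.fromstring(
    "<root>"
    "<customer><Id>101</Id><Name>Clark Kent</Name></customer>"
    "<customer><Id>102</Id><Name>Peter Parker</Name></customer>"
    "<order><customerId>102</customerId><item>web fluid</item></order>"
    "</root>"
)

def customer_for(order, doc):
    """Find the customer whose <Id> child text equals the order's customerId."""
    key = order.findtext("customerId")
    # The key is interpolated into the path, so it must not contain quotes.
    return doc.find(f"./customer[Id='{key}']")

order = doc.find("./order")
match = customer_for(order, doc)
print(match.findtext("Name"))
```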
 
