XML and its Uses

Joe Kesselman

Geoff said:
How do other things layer on to xml?
> I mean like dtd, wsdl, soap, etc.?

Could you clarify what question you're asking?

DTDs are part of the XML spec. They're one way of defining a document
type. Schemas are another, more recent and more powerful, solution.

WSDL and SOAP are applications of XML -- document types, or languages if
you prefer, implemented in XML.
 
Geoff

Could you clarify what question you're asking?

More specifically, for example, if I have an xml spec that says:

<lastname>smith</lastname>

... and continue on expressing everything about a person, first name, age,
height, address, etc. What does a dtd do or what additional info does it
add to that? Same question for schemas.

Wsdl has to do with web services, but besides an xml file, etc., one might see
a wsdl file as well. What info does a wsdl file add to the above?

Thanks.

-g
 
Joe Kesselman

Geoff said:
... and continue on expressing everything about a person, first name, age,
height, address, etc. What does a dtd do or what additional info does it
add to that? Same question for schemas.

The DTD, or schema, provides a formal description of what the proper
structure is for that file, what values are acceptable, what default
values are.... Part documentation, part error-checking (since validation
will check to make sure the document matches this description). Schema
can also specify data types and value ranges, to further constrain this.
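To make the "formal description of the proper structure" concrete: a DTD content model such as `<!ELEMENT person (firstname, lastname, age?)>` behaves like a regular expression over the names of an element's children, and a validating parser checks every element against its declaration. Python's standard library does not validate against DTDs, so the sketch below imitates that one check by hand. The `person`/`firstname`/`lastname`/`age` vocabulary and its content models are hypothetical, loosely based on Geoff's example; mixed content, attributes and default values are not modelled.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, written as regexes over child element names.
# They mirror DTD declarations such as:
#   <!ELEMENT person (firstname, lastname, age?)>
#   <!ELEMENT firstname (#PCDATA)>   (text only, no child elements)
CONTENT_MODELS = {
    "person": r"(firstname )(lastname )(age )?",
    "firstname": r"",
    "lastname": r"",
    "age": r"",
}

def check_structure(elem):
    """Return a list of structural errors, in the spirit of DTD validation."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    else:
        # Spell out the sequence of child names, e.g. "firstname lastname "
        children = "".join(child.tag + " " for child in elem)
        if not re.fullmatch(model, children):
            errors.append(f"<{elem.tag}> has children [{children.strip()}]")
    for child in elem:
        errors.extend(check_structure(child))
    return errors

good = ET.fromstring(
    "<person><firstname>Jane</firstname><lastname>Smith</lastname></person>")
bad = ET.fromstring(
    "<person><lastname>Smith</lastname><firstname>Jane</firstname></person>")
print(check_structure(good))  # []
print(check_structure(bad))   # ['<person> has children [lastname firstname]']
```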

(DTDs are arguably obsolete since -- unlike schemas -- they don't
understand how to work with XML Namespaces. Some folks will challenge
that assertion.)
Wsdl has to do with web services, but besides an xml file, etc., one might see
a wsdl file as well. What info does a wsdl file add to the above?

Additional document structure to describe what service you're talking to
and what you want it to do for you.
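As a rough sketch of the "additional document structure" a WSDL file carries: its `portType` section names the operations a service offers and the messages each one takes and returns. A full WSDL 1.1 file would also have `types`, `binding` and `service` sections giving data types, wire protocol and endpoint address; those are omitted here. The service, message and operation names below are hypothetical; `http://schemas.xmlsoap.org/wsdl/` is the WSDL 1.1 namespace. The code simply reads the description back with Python's `ElementTree`.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# A cut-down, hypothetical WSDL 1.1 description (no types/binding/service)
# for a service that looks up person records like the one in Geoff's example.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:person"
    targetNamespace="urn:example:person" name="PersonService">
  <message name="GetPersonRequest">
    <part name="lastname" type="xsd:string"/>
  </message>
  <message name="GetPersonResponse">
    <part name="age" type="xsd:int"/>
  </message>
  <portType name="PersonPortType">
    <operation name="GetPerson">
      <input message="tns:GetPersonRequest"/>
      <output message="tns:GetPersonResponse"/>
    </operation>
  </portType>
</definitions>"""

def list_operations(wsdl_text):
    """Map each operation name to its (input, output) message names."""
    root = ET.fromstring(wsdl_text)
    ops = {}
    for op in root.iterfind("wsdl:portType/wsdl:operation", WSDL_NS):
        inp = op.find("wsdl:input", WSDL_NS).get("message")
        out = op.find("wsdl:output", WSDL_NS).get("message")
        ops[op.get("name")] = (inp, out)
    return ops

print(list_operations(WSDL))
# {'GetPerson': ('tns:GetPersonRequest', 'tns:GetPersonResponse')}
```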

(BTW, you should try to think in terms of documents, not files. A lot of
XML never actually gets stored in a file, existing only on the network
and in memory buffers.)
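The documents-versus-files point in code form, as a minimal sketch using Python's `ElementTree`: the payload is assumed to arrive as bytes from something like a network response (its content is hypothetical), and it is parsed, modified and serialized again without ever touching the file system.

```python
import xml.etree.ElementTree as ET

# Bytes as they might arrive in a network response or message queue
# (hypothetical payload); no file is ever opened.
payload = b"<person><lastname>Smith</lastname><age>42</age></person>"

doc = ET.fromstring(payload)   # parse straight from memory
doc.find("age").text = "43"    # modify the in-memory tree
reply = ET.tostring(doc)       # serialize back to bytes for the next hop

print(reply)  # b'<person><lastname>Smith</lastname><age>43</age></person>'
```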
 
Peter Flynn

Geoff said:
More specifically, for example, if I have an xml spec that says:

<lastname>smith</lastname>

...and continue on expressing everything about a person, first name, age,
height, address, etc. What does a dtd do or what additional info does it
add to that?

Whatever you specify. A DTD is a description of the names for your
element types, and how they fit together. You can write it, so you
get to decide what information you want to describe, and how.

Alternatively, you use one someone else has written.
> Same question for schemas.

Ditto, except that schemas allow much tighter specification of the
types of data you want to allow (eg surname must be alphabetic; or
date must be numeric; or whatever), which is useful for data-based
applications, not so useful for normal text documents.
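To illustrate the tighter datatype control described here: where a DTD can only say that `surname` contains `#PCDATA` (any text at all), a W3C XML Schema can attach a pattern facet or a built-in type such as `xs:date`. The sketch below hand-codes two such checks in Python. The element names and rules are hypothetical; the date check only approximates `xs:date` (four-digit years, no time zones), and the alphabetic rule would reject real surnames such as O'Brien.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical per-element type rules, standing in for what a W3C XML Schema
# would declare (an xs:pattern facet, or the built-in xs:date type).
def is_alphabetic(text):
    return re.fullmatch(r"[A-Za-z]+", text) is not None

def is_iso_date(text):
    # Roughly xs:date without time zones: YYYY-MM-DD naming a real day.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True

TYPE_RULES = {"surname": is_alphabetic, "birthdate": is_iso_date}

def check_types(root):
    """Report elements whose text violates their (hypothetical) type rule."""
    errors = []
    for elem in root.iter():
        rule = TYPE_RULES.get(elem.tag)
        if rule and not rule((elem.text or "").strip()):
            errors.append(f"<{elem.tag}> has invalid value {elem.text!r}")
    return errors

doc = ET.fromstring("<person><surname>Smith</surname>"
                    "<birthdate>1970-02-30</birthdate></person>")
print(check_types(doc))  # ["<birthdate> has invalid value '1970-02-30'"]
```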

///Peter
 
Peter Flynn

Joe said:
(DTDs are arguably obsolete since -- unlike schemas -- they don't
understand how to work with XML Namespaces. Some folks will challenge
that assertion.)

It was very clear from this summer's Extreme Markup Conference that
DTDs are anything but obsolete. Namespaces are irrelevant for large
classes of documents, especially in the publishing field, where
schemas have little if anything to offer apart from enforcing the
format of dates. Many publishing applications also require the
facilities offered by declared entities, which are not available in
W3C Schemas.
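For readers who have not used them: a declared internal entity is a named piece of replacement text, defined in the DTD and expanded by the parser wherever `&name;` appears, so it works as a simple macro for boilerplate such as an imprint line. Python's expat-based `ElementTree` parser expands entities declared in a document's internal DTD subset, which the sketch below relies on; the entity names and text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical entities declared in the internal DTD subset.
# The parser substitutes their replacement text wherever &pub; and &year;
# appear in the document body.
DOC = """<!DOCTYPE book [
  <!ENTITY pub "Example Press">
  <!ENTITY year "2004">
]>
<book>
  <title>A Sample Title</title>
  <imprint>Published by &pub;, &year;.</imprint>
</book>"""

book = ET.fromstring(DOC)
print(book.findtext("imprint"))  # Published by Example Press, 2004.
```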

///Peter
 
Joe Kesselman

Peter said:
It was very clear from this summer's Extreme Markup Conference that
DTDs are anything but obsolete.

I did say "arguably". I agree they're still being used; there's less
consensus on whether that's a good thing.
> Namespaces are irrelevant for large
classes of documents, especially in the publishing field

Definitely disagree. That's true initially when you aren't exchanging
data with other applications or working with applications smart enough
to allow plug-in extensions. As soon as those assertions start breaking
down, namespaces become valuable and DTDs start falling apart. (I've
been involved in this since before namespaces existed, so I've had the
experience of trying to design namespaces into a DTD. Even if you kluge
like mad, it's still putting lipstick on a pig -- the results are not
pretty and the pig doesn't enjoy it at all.)
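One way to see the mismatch: a DTD declares elements by their literal qualified name, prefix included (`<!ELEMENT dc:title ...>`), whereas namespace-aware processing identifies an element by namespace URI plus local name, and the prefix is an arbitrary local choice. The sketch below parses two hypothetical documents that differ only in the prefix bound to the Dublin Core namespace, once with Python's expat parser in its non-namespace mode (which, like a DTD, sees the prefix) and once with namespace processing enabled.

```python
import xml.parsers.expat

def element_names(xml_text, namespace_aware):
    """Collect element names exactly as the parser reports them."""
    names = []
    if namespace_aware:
        # Expat reports "namespace-URI local-name" (separator chosen here).
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        # Expat reports the literal qualified name, prefix and all,
        # which is how <!ELEMENT ...> declarations in a DTD match names.
        parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names

# Hypothetical documents: same namespace URI, different prefixes.
DOC_A = ('<rec xmlns:dc="http://purl.org/dc/elements/1.1/">'
         '<dc:title>Hamlet</dc:title></rec>')
DOC_B = ('<rec xmlns:d="http://purl.org/dc/elements/1.1/">'
         '<d:title>Hamlet</d:title></rec>')

print(element_names(DOC_A, False) == element_names(DOC_B, False))  # False
print(element_names(DOC_A, True) == element_names(DOC_B, True))    # True
```

A DTD written for `dc:title` would reject the second document even though the two are equivalent under the Namespaces recommendation, which is why DTD-based namespace support usually falls back on parameter-entity tricks to make the prefix configurable.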
> Many publishing applications also require the
facilities offered by declared entities, which are not available in
W3C Schemas.

Correction: They require macro/import capabilities. Entities are
certainly the traditional way of doing that, inherited from SGML, and I
agree that schema supports neither. XML uses other tools --
XInclude, simple XSLT transformations, XPointer and so on -- to provide
similar capabilities.
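XInclude is the most direct of these replacements for entity-based inclusion. Python's standard library ships an XInclude processor in `xml.etree.ElementInclude`, documented as providing only limited XInclude support, so treat this as the simple whole-document case. The sketch below uses it with a custom loader that serves hypothetical chapter documents from a dictionary instead of from files.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical chapter documents, kept in a dict so no files are needed.
SOURCES = {
    "ch1.xml": "<chapter><title>Getting Started</title></chapter>",
    "ch2.xml": "<chapter><title>Going Further</title></chapter>",
}

def dict_loader(href, parse, encoding=None):
    """Loader for ElementInclude: return an Element for parse="xml",
    or the raw string for parse="text"."""
    text = SOURCES[href]
    return ET.fromstring(text) if parse == "xml" else text

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="ch1.xml"/>
  <xi:include href="ch2.xml"/>
</book>"""

book = ET.fromstring(BOOK)
# Replaces each xi:include element in place with the loaded chapter.
ElementInclude.include(book, loader=dict_loader)
print([t.text for t in book.iter("title")])  # ['Getting Started', 'Going Further']
```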

The problem here is that the document world, understandably given their
SGML heritage, wants to treat XML as much like SGML as possible. They're
among the few who really like the DTD syntax, having "grown up with
it". They like entities because they're SGMLish. They dislike namespaces
because SGML didn't have them. I understand the resistance to change,
but I disagree that this resistance is well founded, and I feel that we
should be gently encouraging them to accept that XML is *not* SGML and
has its own (equivalent) solutions... for exactly the same reasons we
made the effort to teach C++ coders that just because they can write "as
if it was C" doesn't mean those are still the best answers in the new
environment.

I'm not dogmatic about this. People should do what's needed to get the
job done. But they should also design with an eye toward the future, and
I really don't think DTDs are it.

Your mileage may vary.
 
Andy Dingley

Peter said:
Namespaces are irrelevant for large
classes of documents, especially in the publishing field,

Assume some witty rebuttal of that on the general theme of "bollocks"
 
Andy Dingley

Peter said:
Namespaces are irrelevant for large
classes of documents, especially in the publishing field,

OK, I've got back to a keyboard where all the keys work (I was on an
evil Sony Vaio wireless thing!)

Namespaces are useful. Namespaces are _especially_ useful in a metadata
processing context, and that is particularly relevant to document
publishing.

Taking the "sliding windows" view of data and metadata, we have a
layered view of our data in terms of abstraction. The stuff "in the
window" is treated as relevant data that's needed by the operation
currently being performed. Anything "below" this abstraction is opaque
data that's merely transported unchanged, anything "above" it is
metadata that's not immediately of relevance.

The power of this model is that it allows us to build
application-independent tools. A tool that understands email (or
internal document routing) can route documents around as unopened black
boxes, just using the RFC822 headers. An indexing tool can use Dublin
Core properties, without needing to know how to edit the document. The editor can
display the Dublin Core metadata, and even permit some generalised
processing of it, because it's visible, it knows crude data typing of
it (just as strings or URLs) and it leaves the rest (and the meaning)
to the human user.

Gaining this sort of layered metadata handling is very easily done, if
you reference standard namespaces, such as Dublin Core or some shared
distribution vocabulary. Namespaces are ideal for providing that as a
simple implementation that allows complex documents to be aggregated
from simple known protocols.
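The layering described above can be sketched briefly: an "indexing tool" that knows only the Dublin Core namespace (`http://purl.org/dc/elements/1.1/`) can extract metadata from a document whose main vocabulary it has never seen, treating everything else as opaque. The `urn:example:acme-manuals` vocabulary and the content below are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# A hypothetical document: <manual>/<section> is private to one application,
# while the metadata uses the shared Dublin Core namespace.
DOC = f"""<manual xmlns="urn:example:acme-manuals" xmlns:dc="{DC}">
  <meta>
    <dc:title>Pump Maintenance</dc:title>
    <dc:creator>J. Smith</dc:creator>
    <dc:date>2004-09-01</dc:date>
  </meta>
  <section><para>Torque the bolts to spec.</para></section>
</manual>"""

def index_record(xml_text):
    """Pull out Dublin Core properties, ignoring every other vocabulary."""
    record = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree writes namespaced tags as "{uri}local".
        if elem.tag.startswith("{" + DC + "}"):
            local_name = elem.tag[len(DC) + 2:]
            record.setdefault(local_name, []).append(elem.text)
    return record

print(index_record(DOC))
# {'title': ['Pump Maintenance'], 'creator': ['J. Smith'], 'date': ['2004-09-01']}
```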
 
Juergen Kahrs

Peter is describing the de facto situation, as observed.
Namespaces are useful. Namespaces are _especially_ useful in a metadata
processing context, and that is particularly relevant to document
publishing.

Peter is not expressing doubts about conceptual usefulness of namespaces.
 
Andy Dingley

Juergen said:
Peter is not expressing doubts about conceptual usefulness of namespaces.

"Namespaces are irrelevant for large classes of documents, especially
in the publishing field, where schemas have little if anything to offer
apart from enforcing the
format of dates. "

"Irrelevant for" and "little [...] to offer" (future tense) seem, IMHO,
to be expressing strong doubts about the conceptual usefulness, and the
future potential, of namespaces. I don't read these statements as
merely applying to a current situation.
 
Juergen Kahrs

Andy said:
Juergen said:
Peter is not expressing doubts about conceptual usefulness of namespaces.

"Namespaces are irrelevant for large classes of documents, especially
in the publishing field, where schemas have little if anything to offer
apart from enforcing the
format of dates. "

"Irrelevant for" and "little [...] to offer" (future tense) seem, IMHO,
to be expressing strong doubts about the conceptual usefulness, and the
future potential, of namespaces. I don't read these statements as
merely applying to a current situation.

Peter talked about the relevance of namespaces in a special context.
It was in this special context (the publishing field) that he
mentioned his doubts. He did not talk about namespaces in general or
in all fields.

To summarize: I see no contradiction between his statement and yours.
 
Joe Kesselman

Juergen said:
Peter talked about the relevance of namespaces in a special context.
It was in this special context (the publishing field) that he
mentioned his doubts.

Agreed. I think he's accurately reporting the current state of practice
(for the reasons I discussed, chiefly that publishing's adoption of XML
has largely been by minimal-effort migration from SGML), but I still
believe that his implication that this is a *desirable* state of affairs
is unjustified and misguided.

Namespaces do have value in that space, whether they're being applied or
not. Schemas could also have value in those places where you want to
constrain a value (eg, detect cases where a page-layout attribute is
outside the meaningful range) or where you're working with a document
(eg a textbook?) with structured formalisms that are outside DTD's
checking ability but within that of schema.

I agree that people have been getting by without schemas, and many are
getting by without validation at all. Heck, most HTML users have been
getting by with flagrant violation of HTML's (SGML-based) DTD, mostly
because browsers have been coded to tolerate broken documents -- but
those of us who have to write that guess-the-user's-intent code know how
much it has cost them in lost interoperability, code bloat, and so on.
 
Andy Dingley

Juergen said:
Peter talked about the relevance of namespaces in a special context.

"It was very clear from this summer's Extreme Markup Conference that
DTDs are anything but obsolete. Namespaces are irrelevant for large
classes of documents, especially in the publishing field, where
schemas have little if anything to offer apart from enforcing the
format of dates. "

This makes a number of statements:

1. DTDs are not obsolete
2. Data typing by Schema is not significantly useful
3. Schema is not useful (in this area)
4. Namespacing is not useful.

5. These current observations are unlikely to change in the
foreseeable future.
6. All of these observations are noted for, and apply to the
publishing field.

Obviously 1 is true generally if it is true for any special case, so {
1 } is just as valid as { 1 if and only if 6 }. Also if any of 1..4 are
already untrue by now, then 5 would be irrelevant.

I don't happen to agree with any of these 1..4. I see 1 and 3 as untrue
simply because Schema supersedes DTD for simple usability reasons.
However I don't feel particularly strongly over it.

My specific point is that 4 is emphatically incorrect, even in the case
of 6. My own background is in publishing, in particular the use of
complicated and powerful content assembly engines that can work
independently of an application-specific content vocabulary [Dingley &
Shabajee, 2003]. These work almost entirely by taking three common and
"well known" vocabularies (RDF for a structural model, HTML / DocBook
for text structure and Dublin Core / MPEG-7 for metadata
representation) and combining them with custom application-specific
vocabularies to instantiate a powerful content-domain-aware publishing
engine "from nowhere". Namespaces are very powerful and publishing is a
central target field that benefits from them already.
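The kind of multi-vocabulary assembly described here rests on dispatching by namespace URI rather than by element name. A minimal sketch, assuming a small hypothetical RDF/XML fragment that mixes RDF, Dublin Core and DocBook 5 with an application vocabulary (`urn:example:recipes`): each namespace is routed to a category, and anything unrecognised is treated as opaque. It illustrates the dispatch idea only and is not a model of the cited engine.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs of the shared vocabularies named above (RDF, DocBook 5,
# Dublin Core). "urn:example:recipes" below is a hypothetical app vocabulary.
VOCABULARIES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "structure (RDF)",
    "http://docbook.org/ns/docbook": "text (DocBook)",
    "http://purl.org/dc/elements/1.1/": "metadata (Dublin Core)",
}

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:db="http://docbook.org/ns/docbook"
         xmlns:r="urn:example:recipes">
  <rdf:Description rdf:about="urn:example:item1">
    <dc:title>Bread</dc:title>
    <r:ovenTemp>220</r:ovenTemp>
    <db:para>Knead for ten minutes.</db:para>
  </rdf:Description>
</rdf:RDF>"""

def namespace_of(tag):
    # ElementTree writes namespaced tags as "{uri}local".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def route(root):
    """Count elements per vocabulary; unknown namespaces count as opaque
    data to be passed through unchanged."""
    counts = Counter()
    for elem in root.iter():
        counts[VOCABULARIES.get(namespace_of(elem.tag), "opaque")] += 1
    return counts

print(route(ET.fromstring(DOC)))
```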
 
Peter Flynn

Andy said:
Juergen said:
Peter is not expressing doubts about conceptual usefulness of namespaces.

"Namespaces are irrelevant for large classes of documents, especially
in the publishing field, where schemas have little if anything to offer
apart from enforcing the format of dates. "

"Irrelevant for" and "little [...] to offer" (future tense) seem, IMHO,
to be expressing strong doubts about the conceptual usefulness, and the
future potential of, namespaces. I don't read these statements as
merely applying to a current situation.

I should perhaps clarify this (thanks to both of you for picking up on
what I wrote). I was looking at it from the position of editing a text
document for publication (either authoring or editing). During this
process, namespaces are not particularly relevant, and while a schema
offers some more validation, there is often little to validate except
ID/IDREF links, and perhaps a few data formats (dates are a known
problem). In deeply technical documents (math is one obvious example),
schema-level validation may offer more. In preparing a novel for press,
less.

After publication, where you are preparing other derivative material,
I don't dispute that namespaces offer more, and metadata is a good
example. But at this stage the text is fixed (by definition, in most
cases, since the document has now been published), at least for the
time being, and if the document structure management has been done
with (for example) RNG, then a schema can replace the DTD for these
tasks -- Andy's second example is a case in point (the first can be
done just as easily with effectivities).

(And yes, it would be nice if the author had access to namespaces and
would use them to add value to the work being written, but we're not
yet even able to provide authors with a usable XML editor to write
with, so such aspirations are pie in the sky for the moment.)

In any event, in my business we have to deal with things as they are,
and hope that we can persuade the client to move gently towards things
as we might wish them to be. I have three current clients with their
text in SGML and no intention of moving it to XML yet (one day) because
(as Joe pointed out) it means moving from something they know to some
new state, and that's a business decision, with costs and benefits.
I had one client who moved prematurely from SGML to XML against the
recommendation of one senior editor (and mine) and is now paying
a stiff price, not just in cash but in human terms.

The discussion on DTDs in Montreal was interesting. The argument for
entities is unfortunately not completely resolved by the tools Joe
mentions, although they help. Images are a case in point. The problem
with XSLT 1.0 is that in the headlong rush to support data-based
applications it ignored some major features which publishers rely on
(such as the handling of white-space in Mixed Content, and the
processing of ENTITIES attribute values). Some of the other arguments
were regarded as non-prohibitive, but still constitute significant
cost-bearing blockages to using XML to its fullest for document
publishing.
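The white-space point is easiest to see with a concrete case. In mixed content, a text node consisting only of a space can be the only thing separating two words; any processing step that treats whitespace-only text nodes as insignificant (as `xsl:strip-space elements="*"` does) silently joins them. The sketch below imitates such a step in Python; it illustrates the hazard and is not a model of any particular XSLT processor.

```python
import xml.etree.ElementTree as ET

# Mixed content: the single space between </i> and <b> is a text node
# made only of whitespace, yet it separates two words.
para = ET.fromstring("<p>Compare <i>this</i> <b>that</b>.</p>")

def strip_whitespace_only_text(elem):
    """Drop whitespace-only text nodes throughout the subtree,
    as xsl:strip-space elements="*" would."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_whitespace_only_text(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None

print("".join(para.itertext()))  # Compare this that.
strip_whitespace_only_text(para)
print("".join(para.itertext()))  # Compare thisthat.
```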

Some of the blame must also rest with publishers: if they are so
anxious to have things done a certain way, they need to get together,
join the W3C, nominate people to the relevant committees, and
ultimately cut some code. Sitting there and whining about it post
hoc is unconvincing, when those who cared were trying to argue
the case unsupported and unfunded.

The dislike of namespaces is not because they weren't there in SGML
(actually, CONCUR, LINK and entities between them provide a large
amount of their functionality) but because they were implemented as
static labels, and the implementations don't provide for proper
inheritance or persistence. But we argued long over this at the time,
and the view was ultimately that the namespace URI did not need to
be dereferenced, nor that there should be anything useful at it to
resolve, which may turn out to be a mistake.

Joe's comparison between C and C++ coding style raised a smile: at
one stage mid-way through the discussions about XML Data, as it was
at the time, I saw a post from someone wishing "that XML would have
a GOTO statement" :)

///Peter
 
Joseph Kesselman

Peter said:
But we argued long over this at the time,
and the view was ultimately that the namespace URI did not need to
be dereferenced, nor that there should be anything useful at it to
resolve, which may turn out to be a mistake.

Having been involved in this debate:

The conclusion was that there was no consensus on what, if anything,
should be retrieved via a dereferenced namespace URI. There were too
many different things people wanted to hang off that reference. It
quickly became evident that what was needed was some way to associate
*multiple* data items with a namespace URI, and that this fan-out was
going to have to be addressed by a separate specification. Tim B-L is
supposed to be tackling that as part of the Semantic Web effort, or at
least that was where we left it at the close of the namespace debate.
 
Peter Flynn

Joseph said:
Having been involved in this debate:

The conclusion was that there was no consensus on what, if anything,
should be retrieved via a dereferenced namespace URI. There were too
many different things people wanted to hang off that reference. It
quickly became evident that what was needed was some way to associate
*multiple* data items with a namespace URI, and that this fan-out
was going to have to be addressed by a separate specification.

XML Link might have resolved this issue...
Tim B-L is supposed to be tackling that as part of the Semantic Web
effort, or at least that was where we left it at the close of the
namespace debate.

We'll get there one day :)

///Peter
 
Magnus Henriksson

Peter said:
XML Link might have resolved this issue...

Well, RDDL (http://www.rddl.org/) is one attempt at using XLinks to
identify various resources that can be associated with a namespace. RDDL
makes it possible to, for example, automatically identify one or more schemas
that can be used to validate documents (and much more).

However, I've never seen a tool or API that makes use of RDDL in this way.
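The RDDL lookup described above can be made concrete. An RDDL document is XHTML in which `rddl:resource` elements carry XLink attributes: `xlink:role` gives the "nature" of the linked resource and `xlink:arcrole` its "purpose". The sketch below finds the schemas listed for a namespace, the kind of lookup no tool seems to perform. The page content and example.org URLs are hypothetical, and the nature and purpose URIs shown should be checked against the RDDL specification before relying on them.

```python
import xml.etree.ElementTree as ET

RDDL = "http://www.rddl.org/"
XLINK = "http://www.w3.org/1999/xlink"
XHTML = "http://www.w3.org/1999/xhtml"

# Nature URI assumed here for W3C XML Schema; treat it (and the purpose URI
# in the sample page) as illustrative and confirm against the RDDL spec.
XSD_NATURE = "http://www.w3.org/2001/XMLSchema"

# A hypothetical RDDL page describing a person-record namespace.
PAGE = f"""<html xmlns="{XHTML}" xmlns:rddl="{RDDL}" xmlns:xlink="{XLINK}">
  <body>
    <rddl:resource xlink:type="simple"
        xlink:role="{XSD_NATURE}"
        xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
        xlink:href="http://example.org/person.xsd">
      <p>W3C XML Schema for person records</p>
    </rddl:resource>
    <rddl:resource xlink:type="simple"
        xlink:role="http://www.w3.org/1999/xhtml"
        xlink:href="http://example.org/person-guide.html">
      <p>Human-readable documentation</p>
    </rddl:resource>
  </body>
</html>"""

def schemas_for(rddl_text):
    """Return hrefs of resources whose nature (xlink:role) is XML Schema."""
    root = ET.fromstring(rddl_text)
    return [
        res.get(f"{{{XLINK}}}href")
        for res in root.iter(f"{{{RDDL}}}resource")
        if res.get(f"{{{XLINK}}}role") == XSD_NATURE
    ]

print(schemas_for(PAGE))  # ['http://example.org/person.xsd']
```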

// Magnus
 
