Joe said:
Uhm. I agree that schemas are taking longer to find their way in than
might have been expected, partly because they're a syntax only a
database expert or computer science geek could love. (Though frankly the
DTD syntax is also pretty hideous.)
Only a syntax geek would love it, but it has the advantage of being very
terse, and once learned, quite expressive. RelaxNG seems to be the way
forward, but I still feel we did the community a disservice by not
properly investigating the possibility of adding datatyping to DTDs
before running amok with W3C Schemas. Ah well. Another time.
However, entities are definitely on the way out. The problem is that
they really aren't all that useful unless there's a fragment that will
appear in a huge number of instances of this kind of document, and even
then they're only a significant advantage when producing the document by
hand;
Actually there is rather a lot of stuff out there that does this.
it is a significant pain for software to recognize that the
opportunity exists to take advantage of a parsed entity, and there
usually isn't much to be gained by doing so.
For parsed entities, yes. Legal boilerplate, tech doc, and chapter
files for long documents are the only real candidates.
Parameter entities are a different matter.
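For the boilerplate case, here is a minimal sketch in Python of what a parsed (general) entity buys you. The document, the entity name `legal`, and the element names are all made up for illustration. It assumes the entity is declared in the internal subset, which the standard library's expat-based parser expands on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one internal parsed entity reused in two places.
DOC = """<!DOCTYPE manual [
  <!ENTITY legal "Copyright Example Corp. All rights reserved.">
]>
<manual>
  <chapter><title>One</title><footer>&legal;</footer></chapter>
  <chapter><title>Two</title><footer>&legal;</footer></chapter>
</manual>"""

# The parser replaces each &legal; reference with the declared text,
# so the application only ever sees the expanded boilerplate.
root = ET.fromstring(DOC)
footers = [footer.text for footer in root.iter("footer")]

for text in footers:
    print(text)
```

The author types the boilerplate once. Software reading the document never sees the entity at all, which is partly why tools have so little incentive to write one back out.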
Entities had value when most docs were produced by humans pounding on
raw XML text; they really aren't useful for docs produced by smarter
editors. Most of the things you might still want to use them for can be
handled better by an appropriate tool -- an editor that lets you see and
enter the actual characters rather than their named equivalents, for
This refers to character entities. Sadly, editors are still in their
infancy when it comes to the interface (hence my thesis topic), and
there are still a gazillion so-called plaintext editors (non-XML) out
there that XML beginners use, which seriously screws up their chances
when they start editing UTF-8. For this reason, several companies and
projects I have been dealing with have made it policy for the moment
to create ISO-8859-1 files only, and ALL other characters go in as
character entity references or numeric references (fortunately for them
they deal only with western languages in Latin scripts).
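A minimal sketch of that policy in Python, assuming the text is already well-formed character data (markup characters such as `&` and `<` escaped elsewhere) and that the file's XML declaration says `encoding="ISO-8859-1"`. The function name is hypothetical. The `xmlcharrefreplace` error handler is part of the standard codec machinery.

```python
def to_latin1_with_char_refs(text: str) -> bytes:
    """Encode text as ISO-8859-1 bytes.

    Characters that exist in Latin-1 are written as single bytes.
    Everything else becomes a decimal numeric character reference
    (&#NNNN;), so a plaintext editor that knows nothing about UTF-8
    cannot mangle it.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# U+00E9 (e-acute) is in Latin-1; U+20AC (euro sign) is not.
sample = "caf\u00e9 5 \u20ac"
encoded = to_latin1_with_char_refs(sample)
print(encoded)
```

This only works in character data and attribute values. Numeric references are not allowed in element or attribute names, so those still have to stay within Latin-1.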
example, or a syntax that's actually defined in the document rather than
in a non-tag-language secondary file. Among other things, that permits
different documents to reference different resources rather than having
only a single set, hard-wired into the DTD, that they can name.
I did put it in the imperfect tense...
Sorry, I was being deliberately provocative.
Part of the problem is that we're
finding that the need for a portable syntax for documents referencing
other documents isn't as universal as we expected. Or at least isn't so
right now.
Ahead of the curve as usual
Although the demand for a syntax to
refer from one document to another is slowly approaching FAQ-level.
It's just embarrassing that we had multi-way bidirectional 3rd-party
linking in the Panorama plugin a decade ago, and still nothing to
replace it.
If we'd designed XML completely before releasing it to the public,
We'd still be discussing it.
we would have started with the infoset (including namespaces and schemas
and includes and links), then designed the syntax and APIs from that.
Instead the W3C started with the syntax and a known-inadequate schema
language (DTDs), and has built everything out from there. The upside is
that folks had a chance to start using XML much earlier, and we've
gotten some benefit from seeing which directions everyone has gone with
I like the description, although I disagree about the infoset. Coming
from the tech doc background, I would have preferred to see some of the
useful SGML features retained and more attention paid to the usability
of markup. Pretending that a document is a tree when it's not (it's a
document!) was a mistake we are still paying for. Starting with the
syntax was OK, IMHO, and pretty much 99% of what we did was right. But
schemas were a later development, a bolt-on which only came when the
XML-Data folks saw the market for the syntax (and that's something else
we'll end up paying for -- I see way too many slabs of data being done
into XML when CSV would be much more sensible).
it. The downside is that there have been some warts and hiccups and
direction changes along the way, and tools have not always been quick to
catch up -- and even when they have, folks who have working solutions
using the old stopgaps are often reluctant to make the effort to move
over.
This is going to be the interesting bit. New tools -- *really good* new
tools -- are few and far between. And there are too many good old tools
which have become unavailable just at the point when they were most
needed, because of corporate buyouts resulting in technically-unaware
people dropping the ball.
Which leaves all of us with the job of supporting multiple ways of
doing things and trying to gently push folks toward the ones that will
make their life -- and ours -- easier in the long run.
It does work eventually. I've only had one breakage so far, and that was
due to sabotage.
Oh well. The cutting edge usually has a few nicks in it.
Mind that axe, Eugene.
///Peter