DTD or Schema -- Ignore Undefined Tags

gregmcmullinjr

I am wondering if there is a way to use a DTD or Schema to instruct an
XML parser to ignore tags that are not defined.

That is, if my list of acceptable tags is <body> and <content>, then in
the following example:

<body>
We may have some text <b>and some <u>other tags</u></b>
<content> but I want the text and undefined tags to be part of the
text-node
of the body tag.
</content>
</body>

So the tree would be like:
<body>
#Text
<content>
#Text
</content>
</body>

I want the first text node to contain "We may have some text <b>and
some <u>other tags</u></b>"

Is there some way of doing this with Schemas or DTDs? Or perhaps using
a stylesheet?

Using a stylesheet I would need to find a way of matching all tags
that aren't in a certain list and then rewriting them with &lt;
entities, I suppose, but I'm really not sure what the best way to do
this is.

Any help is appreciated,

Greg
 
Joseph Kesselman

> Using a stylesheet I would need to find a way of matching all tags
> that aren't in a certain list and then rewriting them with &lt;
> entities, I suppose

Not a good solution. Elements are semantically meaningful; &lt;foo&gt;
is NOT the same thing as <foo>.
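
To make the difference concrete, here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser (my choice for illustration, nobody in the thread named a parser). It parses the same content once as escaped text and once as real markup:

```python
import xml.etree.ElementTree as ET

# The same "foo" content, once escaped and once as real markup.
escaped = ET.fromstring("<body>&lt;foo&gt;text&lt;/foo&gt;</body>")
markup = ET.fromstring("<body><foo>text</foo></body>")

# Escaped: the parser sees only character data, so body has no children
# and its text is the literal string "<foo>text</foo>".
print(len(escaped), repr(escaped.text))

# Real markup: body has one child element named "foo" and no text of its own.
print(len(markup), markup[0].tag, repr(markup.text))
```

Once the angle brackets are escaped, downstream tools can no longer tell that foo was ever an element.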

If you're working with schemas, you can use xsd:any with lax validation
to indicate that the contents of certain elements should be accepted
even if not valid.

Another alternative, of course, is to insist only on well-formed
documents and not attempt to validate them.
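
As a rough illustration of the well-formed-only route, the sketch below parses Greg's example with Python's xml.etree.ElementTree, which checks well-formedness but does not validate against a DTD or schema. The choice of Python and of this parser is an assumption for the example only:

```python
import xml.etree.ElementTree as ET

# Greg's example document, with the line break inside "text-node of" closed up.
DOC = """<body>
We may have some text <b>and some <u>other tags</u></b>
<content> but I want the text and undefined tags to be part of the
text-node of the body tag.
</content>
</body>"""

# No DTD or schema is consulted, so undeclared elements are simply accepted.
root = ET.fromstring(DOC)

# Every element, "defined" or not, appears in the tree in document order.
tags = [el.tag for el in root.iter()]
print(tags)  # ['body', 'b', 'u', 'content']
```

Note that b and u are still elements in the resulting tree. Skipping validation makes the parser accept them; it does not turn them into text.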
 
Peter Flynn

> I am wondering if there is a way to use a DTD or Schema to instruct an
> XML parser to ignore tags that are not defined.

No. A schema or DTD is for doing exactly the reverse: enforcing the use
only of elements that have been declared.

BTW elements, not "tags": see http://xml.silmaril.ie/authors/makeup/

> That is, if my list of acceptable tags is <body> and <content>, then in
> the following example:
>
> <body>
> We may have some text <b>and some <u>other tags</u></b>
> <content> but I want the text and undefined tags to be part of the
> text-node
> of the body tag.
> </content>
> </body>
>
> So the tree would be like:
> <body>
> #Text
> <content>
> #Text
> </content>
> </body>

If you want to do this, process the XML in non-validated mode, just
well-formed but with no DTD or schema.

> I want the first text node to contain "We may have some text <b>and
> some <u>other tags</u></b>"
>
> Is there some way of doing this with Schemas or DTDs? Or perhaps using
> a stylesheet?

XSLT is your friend.

> Using a stylesheet I would need to find a way of matching all tags
> that aren't in a certain list and then rewriting them with &lt;
> entities, I suppose, but I'm really not sure what the best way to do
> this is.

Whoah! This is a different question entirely. Are you implying that you
still want to *keep* the otherwise unrecognised element markup? Your
example above implied that you wanted to discard it.

You definitely don't want to fiddle with making them all &lt;...&gt; --
that way madness lies. See http://xml.silmaril.ie/authors/html/

///Peter
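
For the "discard the unrecognised markup but keep its text" reading that Peter describes, here is a minimal sketch. It uses Python's xml.etree.ElementTree rather than the XSLT suggested above, purely for illustration; the function names flatten, _append_text and _append_content and the allowed set are hypothetical, not from the thread:

```python
import xml.etree.ElementTree as ET

DOC = """<body>
We may have some text <b>and some <u>other tags</u></b>
<content> but I want the text and undefined tags to be part of the
text-node of the body tag.
</content>
</body>"""


def _append_text(parent, text):
    """Add text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        # Text after the last child is stored as that child's tail.
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _append_content(target, source, allowed):
    """Copy source's content into target, unwrapping elements not in allowed."""
    _append_text(target, source.text)
    for child in source:
        if child.tag in allowed:
            target.append(flatten(child, allowed))
        else:
            # Unknown element: keep its content, drop the element itself.
            _append_content(target, child, allowed)
        _append_text(target, child.tail)


def flatten(element, allowed):
    """Return a copy of element in which only allowed elements survive."""
    copy = ET.Element(element.tag, element.attrib)
    _append_content(copy, element, allowed)
    return copy


flat = flatten(ET.fromstring(DOC), allowed={"body", "content"})
print(ET.tostring(flat, encoding="unicode"))
```

The result has the tree shape from the original question (body, text, content, text), with the text of b and u merged into the body's first text node. The b and u markup itself is gone; this sketch deliberately does not rewrite it as &lt;...&gt;.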
 
