Tuomas Rannikko
Hello,
I'm currently writing an XML processor for the fun of it. There is
something I don't understand in the spec though. I'm obviously missing
something important.
The spec states that both Internal General and Character references are
included when referenced in content. And "included" means:
<quote>
4.4.2 Included
[Definition: An entity is included when its replacement text is
retrieved and processed, in place of the reference itself, as though it
were part of the document at the location the reference was recognized.]
The replacement text MAY contain both character data and (except for
parameter entities) markup, which MUST be recognized in the usual way.
(The string "AT&amp;amp;T;" expands to "AT&amp;T;" and the remaining ampersand
is not recognized as an entity-reference delimiter.) A character
reference is included when the indicated character is processed in place
of the reference itself.
</quote>
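To make the quoted example concrete, here is a minimal sketch that feeds the spec's string "AT&amp;amp;T;" to Python's standard-library `xml.etree.ElementTree`. That parser (expat-based) is an illustrative choice, not something the spec prescribes, and the sketch assumes it conforms on this point. The first `&amp;` is included and yields a literal '&' as character data. That '&' is not re-scanned, so the element text comes out as "AT&amp;T;".

```python
import xml.etree.ElementTree as ET

# The spec's example string, used as the content of an element.
doc = "<p>AT&amp;amp;T;</p>"
root = ET.fromstring(doc)

# "&amp;" is included once, producing a literal '&' as character data;
# the remaining "amp;T;" is plain text, and the produced '&' is not
# re-scanned as the start of another entity reference.
print(root.text)  # AT&amp;T;
```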
If I understand correctly, the specification contradicts itself when it
says that the replacement text is processed in place of the reference
itself and that markup MUST be recognized. Shouldn't the "&amp;T;" in
"AT&amp;T;" then actually BE recognized? I understand that if it were
recognized, the character '&' could not be expressed in XML (nor could
'<', for that matter). The question, then, is: when should markup in the
replacement text be recognized, and when should it not?
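The contrast being asked about can be observed with a small sketch. It again assumes that Python's expat-based `xml.etree.ElementTree` behaves as a conforming processor here. The entity name `e` and the element names are hypothetical. Markup in an internal entity's replacement text is recognized, so `&e;` produces a real `<b>` element. Characters produced by character references such as `&#60;` are treated as data and never as markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal entity "e" whose replacement text contains tags.
# When &e; is included in content, that markup is recognized.
doc1 = '<!DOCTYPE d [<!ENTITY e "<b>x</b>">]><d>&e;</d>'
d1 = ET.fromstring(doc1)
print([child.tag for child in d1])  # ['b']

# Character references: the characters they produce ('<' and '>')
# are character data, so no child element is created.
doc2 = "<d>&#60;b&#62;x&#60;/b&#62;</d>"
d2 = ET.fromstring(doc2)
print(d2.text, len(d2))  # <b>x</b> 0
```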
Thank you in advance for your reply.
- Tuomas