Whitespace in Canonicalized XML

Celedor

If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is defined so that two
documents that are written in different ways, but contain the same
information, normalize to the same form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).
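
To make that concrete, here is a minimal sketch of the idea. It assumes Python 3.8 or later, whose standard library offers `xml.etree.ElementTree.canonicalize` (an implementation of C14N 2.0, not the 1.0 Recommendation cited in this thread). The two sample documents and the `digest` helper are made up for illustration.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Two spellings of the same information: different attribute order,
# different quote characters, extra space in the tag, and an empty-element tag.
doc_a = "<doc  b='2' a=\"1\"><empty/></doc>"
doc_b = '<doc a="1" b="2"><empty></empty></doc>'

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)


def digest(text):
    """Hypothetical helper: hash the canonical text, as a signature scheme might."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(canon_a)                             # <doc a="1" b="2"><empty></empty></doc>
print(canon_a == canon_b)                  # True
print(digest(canon_a) == digest(canon_b))  # True
```

Both inputs collapse to the same text, so anything computed over the canonical form (such as a digest) agrees for both.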

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Isn't that whitespace only useful for formatting purposes (i.e., so that
it will look pretty in your text viewer)? Or am I missing something
important?

Thank you for your reply...
 
Douglas A. Gwyn

Celedor said:
If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
Isn't that whitespace only useful for formatting purposes (ie. so that
it will look pretty on your text viewer)? Or am I missing something
important?

Anything that affects how the image will appear is obviously part of
the information.
 
Kenneth Stephen

Celedor said:
If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in such a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Hi,

The characteristics and properties of a "presentation" depend very much
on who / what the intended recipient is. In the case of XML, by design,
humans are not the only possible recipients. XML is also intended to convey
data to machines, and these machines should be capable of processing XML
without any ambiguity messing up the works. To accomplish this, XML has
defined a very simple rule: anything in "tags" is XML markup, and
everything else is data.

If you look at the XML spec, you can see that there are different XML
node types defined. One of them is the text node. Consider the example
below:

<a>This is a text node
<ThisIsAnElementNode x="this is an attribute node">This is also a text
node</ThisIsAnElementNode></a>

This is perfectly valid XML. There are no assumptions that you can make
in general about the content of the text nodes. They may be completely
whitespace, or not, and only the receiving application / entity can tell you
if the whitespace is significant. When writing a spec, obviously, the
general case is what needs to be catered to, and hence, pure whitespace text
nodes cannot be "normalized" away.
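
A small sketch of what this looks like in practice, assuming Python 3.8+ and its `xml.etree.ElementTree.canonicalize` function (which follows C14N 2.0; the sample strings are invented for illustration). The `strip_text` flag is a non-default option of that Python function, used here only to stand in for an application that has decided the whitespace is insignificant.

```python
from xml.etree.ElementTree import canonicalize

compact = "<a><b>text</b></a>"
indented = "<a>\n  <b>text</b>\n</a>"

canon_compact = canonicalize(compact)
canon_indented = canonicalize(indented)

# By default the whitespace-only text between the tags is kept,
# so the two canonical forms differ.
print(repr(canon_indented))
print(canon_compact == canon_indented)  # False

# An application that knows the whitespace is insignificant can opt in
# to stripping it; this is a choice made by the caller, not the default.
stripped = canonicalize(indented, strip_text=True)
print(stripped == canon_compact)        # True
```

The canonicalizer itself has no way to know which of the two behaviours is right for a given vocabulary, which is why preserving is the default.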

That being said, the "xml:space" attribute exists to signal whether
whitespace inside an element should be preserved. When the XML processor or
a higher-level application (for example, an XSL processor) encounters
xml:space, it may or may not normalize pure whitespace nodes; that depends
on the application.

Regards,
Kenneth
 
Peter Flynn

Celedor said:
If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in such a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Isn't that whitespace only useful for formatting purposes (ie. so that
it will look pretty on your text viewer)? Or am I missing something
important?

Only if you have a DTD or Schema that tells you where PCDATA is allowed.

Without one, you have to assume character data can occur anywhere, which
makes *all* white-space significant.
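
As a rough illustration, assuming Python's standard `xml.dom.minidom` (a non-validating parser) and a made-up `<list>` document with no DTD: every run of indentation is reported as a text node, because nothing tells the parser that `<list>` may contain only elements. The `describe` helper is hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# No DTD or schema: the parser cannot know that <list> holds only elements.
doc = parseString("<list>\n  <item>one</item>\n  <item>two</item>\n</list>")
root = doc.documentElement


def describe(node):
    """Hypothetical helper: label a child node as text or element."""
    if node.nodeType == Node.TEXT_NODE:
        return ("text", node.data)
    return ("element", node.tagName)


children = [describe(child) for child in root.childNodes]
for child in children:
    print(child)
# ('text', '\n  ')
# ('element', 'item')
# ('text', '\n  ')
# ('element', 'item')
# ('text', '\n')
```

Three of the five children are whitespace-only text nodes, and each one is part of the data as far as the parser can tell.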

///Peter
 
