Velocity Reviews - Computer Hardware Reviews


Whitespace in Canonicalized XML

 
 
Celedor
 
      12-25-2003
If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).
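
A minimal sketch of the idea in the paragraph above, assuming Python 3.8+
and its standard-library canonicalize() function. That function implements
Canonical XML 2.0, not the version of the Recommendation cited in this
thread, and the two sample documents are made up for illustration. They
differ only in attribute order, quote style, and empty-element syntax:

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents that carry the same information
# but are written differently.
doc_a = "<order id='7' currency=\"USD\"><item/></order>"
doc_b = '<order currency="USD"   id="7"><item></item></order>'

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# Both normalize to one form: attributes sorted, double quotes,
# explicit start and end tags.
print(canon_a)             # <order currency="USD" id="7"><item></item></order>
print(canon_a == canon_b)  # True
```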

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Isn't that whitespace only useful for formatting purposes (i.e. so that
it will look pretty on your text viewer)? Or am I missing something
important?

Thank you for your reply...
 
 
 
 
 
Douglas A. Gwyn
 
      12-25-2003
"Celedor" wrote...
> If this is the case, then why is whitespace outside of any tags still
> preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
> Isn't that whitespace only useful for formatting purposes (i.e. so that
> it will look pretty on your text viewer)? Or am I missing something
> important?


Anything that affects how the image will appear is obviously part of
the information.
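
A small sketch of what that means for a signature, assuming Python 3.8+
(whose canonicalize() implements Canonical XML 2.0) and two made-up
documents. A single space between two inline elements changes what a
reader would see, so it survives canonicalization and changes the digest:

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Hypothetical documents: the only difference is one space between tags.
compact = "<p><b>bold</b><i>italic</i></p>"
spaced = "<p><b>bold</b> <i>italic</i></p>"

canon_compact = canonicalize(compact)
canon_spaced = canonicalize(spaced)

def digest(text):
    """SHA-256 of the UTF-8 bytes, standing in for a signature digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

print(canon_spaced)                                  # the space is kept
print(digest(canon_compact) == digest(canon_spaced)) # False
```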


 
 
 
 
 
Kenneth Stephen
 
      12-29-2003

"Celedor" wrote:
> If I understand correctly, canonicalized XML is a simplified, or
> rather, "standardized" form of XML. It is in a form such that
> two documents that are written in different ways, but contain the same
> information, will normalize towards one form. This standard form can
> then be used as the basis for encryption or digital verification (such
> as XML Digital Signature).
>
> If this is the case, then why is whitespace outside of any tags still
> preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
>

Hi,

The characteristics and properties of a "presentation" depend very much
on who / what the intended recipient is. In the case of XML, by design,
humans are not the only possible recipients. XML is also intended to convey
data to machines, and these machines should be capable of processing XML
without any ambiguity messing up the works. To accomplish this, XML has
defined a very simple rule: anything in "tags" is XML markup, and
everything else is data.

If you look at the XPath data model, which the Canonical XML spec is
defined on, you can see that there are different node types defined. One
of them is the text node. Consider the example below:

<a>This is a text node
<ThisIsAnElementNode x="this is an attribute node">This is also a text
node</ThisIsAnElementNode></a>

This is perfectly well-formed XML. There are no assumptions that you can
make in general about the content of the text nodes. They may be completely
whitespace, or not, and only the receiving application / entity can tell you
if the whitespace is significant. When writing a spec, obviously, the
general case is what needs to be catered to, and hence, pure whitespace text
nodes cannot be "normalized" away.
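
To make the pure whitespace text nodes visible, here is a minimal sketch
using Python's standard-library xml.dom.minidom on a made-up, indented
document (not the example above). The line breaks and indentation between
the tags show up as ordinary text nodes, which a generic processor cannot
tell apart from data:

```python
from xml.dom import Node, minidom

# Hypothetical pretty-printed document.
doc = minidom.parseString("<a>\n  <b>text</b>\n</a>")
root = doc.documentElement

# Record each child of <a> as (kind, content).
children = []
for node in root.childNodes:
    if node.nodeType == Node.TEXT_NODE:
        children.append(("text", node.data))
    elif node.nodeType == Node.ELEMENT_NODE:
        children.append(("element", node.tagName))

# Text nodes that consist of whitespace only.
whitespace_only = [data for kind, data in children
                   if kind == "text" and not data.strip()]

print(children)              # [('text', '\n  '), ('element', 'b'), ('text', '\n')]
print(len(whitespace_only))  # 2
```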

That being said, the "xml:space" attribute exists to let a document signal
whether whitespace should be preserved ("preserve") or left to the
application's default handling ("default"). When the XML / higher-level
application processor (for example an XSL processor) encounters xml:space,
it may or may not normalize - it depends on the application.
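
One possible application-level policy, sketched in Python with the
standard-library ElementTree. The function name strip_ignorable and the
sample document are hypothetical, and the rule it applies (drop
whitespace-only text unless xml:space="preserve" is in effect) is just one
choice an application might make, not something the XML spec mandates:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def strip_ignorable(elem, preserve=False):
    """Drop whitespace-only text unless xml:space="preserve" applies."""
    setting = elem.get(XML_SPACE)
    if setting == "preserve":
        preserve = True
    elif setting == "default":
        preserve = False
    if not preserve and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_ignorable(child, preserve)
        # A child's tail is character data inside *this* element,
        # so this element's setting decides whether it is kept.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None

doc = ET.fromstring(
    '<root>\n  <a> </a>\n  <b xml:space="preserve"> <c/> </b>\n</root>'
)
strip_ignorable(doc)

print(repr(doc.text))               # None: indentation dropped
print(repr(doc.find("b").text))     # ' ': kept under xml:space="preserve"
print(repr(doc.find("b/c").tail))   # ' ': kept under xml:space="preserve"
```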

Regards,
Kenneth


 
 
Peter Flynn
 
      01-24-2004
Celedor wrote:
> If I understand correctly, canonicalized XML is a simplified, or
> rather, "standardized" form of XML. It is in a form such that
> two documents that are written in different ways, but contain the same
> information, will normalize towards one form. This standard form can
> then be used as the basis for encryption or digital verification (such
> as XML Digital Signature).
>
> If this is the case, then why is whitespace outside of any tags still
> preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
>
> Isn't that whitespace only useful for formatting purposes (i.e. so that
> it will look pretty on your text viewer)? Or am I missing something
> important?


Only if you have a DTD or Schema that tells you where PCDATA is allowed.

Without one, you have to assume character data can occur anywhere, which
makes *all* white-space significant.
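
A rough sketch of that distinction in Python. The standard-library
ElementTree does not validate against a DTD, so the ELEMENT_ONLY set below
is a hand-written, hypothetical stand-in for what a DTD would declare
(elements whose content model has no #PCDATA), and the function name and
sample document are likewise made up:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD knowledge: <list> may contain only
# child elements, while <item> has mixed content.
ELEMENT_ONLY = {"list"}

def drop_element_content_whitespace(elem):
    """Remove whitespace-only text inside element-only content models."""
    if elem.tag in ELEMENT_ONLY:
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        drop_element_content_whitespace(child)

doc = ET.fromstring(
    "<list>\n  <item>a <em>b</em> <em>c</em></item>\n</list>"
)
drop_element_content_whitespace(doc)
result = ET.tostring(doc, encoding="unicode")

# Indentation inside <list> is gone; the spaces inside <item> stay,
# because character data is allowed there.
print(result)  # <list><item>a <em>b</em> <em>c</em></item></list>
```

Without the ELEMENT_ONLY information, the function has no basis for
removing anything, which is the situation a canonicalizer is in.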

///Peter

 
 
 
 