Velocity Reviews - Computer Hardware Reviews

Quick question on the presence of CDATA

 
 
Dilip
 
      10-25-2006

I have been out of the XML world for a while and have sort of forgotten
the exact difference between:

<Symbol><![CDATA[IBM]]></Symbol>

and just:

<Symbol>IBM</Symbol>

Can anyone tell me why one is preferred over the other?

thanks!

 
Joseph Kesselman
 
      10-25-2006
Followup to the Microsoft list doesn't work through my servers, so
answering here...


Dilip wrote:
> <Symbol><![CDATA[IBM]]></Symbol>
> <Symbol>IBM</Symbol>


Identical meaning, since there aren't any special characters in the value.

<![CDATA[...]]> sections are an alternative to character-by-character
escaping of characters that would otherwise confuse XML syntax (such as
"<" and "&"). A CDATA section escapes its entire contents -- with the
exception of any ]]> sequences, which require special handling.

Generally the only time you care about this is when you're hand-editing
XML, want to drop non-XML text into the value of an XML element (note
that you can't use this kluge for attribute values), and are too lazy to
fix it up by hand. If you build your XML using any XML-aware tool, it
should take care of the escaping for you and you don't have to care
whether it escapes individual characters or uses <![CDATA[...]]>.
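
As a rough illustration of the two escaping styles, here is a sketch in Python using only the standard library. The helper name to_cdata and the sample string are made up for this example; the ]]> handling shown (closing the section and reopening a new one) is one common workaround, not the only one.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical sample value containing '<', '&' and the awkward ']]>' sequence.
raw = "if (a[b[0]]>1 && x<y)"

# Style 1: character-by-character escaping (&amp; &lt; &gt;).
escaped_doc = "<Code>" + escape(raw) + "</Code>"

# Style 2: one or more CDATA sections around the unescaped text.
cdata_doc = "<Code>" + to_cdata(raw) + "</Code>"

# Both documents carry the same character data once parsed.
escaped_text = ET.fromstring(escaped_doc).text
cdata_text = ET.fromstring(cdata_doc).text
```

Either way the parser hands back the original string, which is why an XML-aware writer is free to pick whichever form it likes.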


--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
 
Dilip
 
      10-25-2006

Just so that I have this straight: from the standpoint of the XML parser,
do the two forms of the element make a difference? I mean, if I use XPath
to locate that element and retrieve its value, will I get back IBM or
something else?

Sorry if the question sounds stupid. I remember what CDATA is about
but I have forgotten what happens when a parser encounters it. (It
probably just treats whatever is inside as plain text, right?)

 
 
Joseph Kesselman
 
      10-25-2006
Dilip wrote:
> Just so that I got this straight, from the standpoint of the XML parser
> does the 2 forms of elements make a difference? I mean, if I use XPath
> to locate that element to retrieve its value, will I get back IBM or
> something else?


XPath doesn't distinguish the two; both yield IBM.
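
A quick way to check this, sketched with Python's xml.etree.ElementTree (which supports only a subset of XPath; the Quote wrapper element is an assumption added so there is something to navigate from):

```python
import xml.etree.ElementTree as ET

plain = ET.fromstring("<Quote><Symbol>IBM</Symbol></Quote>")
cdata = ET.fromstring("<Quote><Symbol><![CDATA[IBM]]></Symbol></Quote>")

# The same path expression returns the same string for both documents.
plain_value = plain.findtext("./Symbol")
cdata_value = cdata.findtext("./Symbol")

print(plain_value, cdata_value)  # prints "IBM IBM"
```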

Parsers *CAN* distinguish the two, for the convenience of editors and
other tools which want to be able to display syntax as well as semantics
-- but aren't required to and often don't unless you ask them to.

> probably just treats whatever is inside as plain text, right?)


Modulo the difference in how escaping is handled, yes, pretty much. A
SAX parser may tell the application that it's now inside the bounds of a
CDATA section; the app needs to decide whether to listen for lexical
events and whether it cares about this one. A DOM (depending on how the
builder is configured) may represent the data using a CDATASection Node
rather than a Text Node, but the former is a subclass of the latter so
again that doesn't matter unless the application cares about the difference.
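
The DOM case can be sketched with Python's xml.dom.minidom, whose default builder happens to keep CDATA sections as separate nodes. Other DOM builders may be configured to merge them into plain text, so treat this as one possible behaviour rather than the rule.

```python
from xml.dom import Node, minidom

plain_node = minidom.parseString(
    "<Symbol>IBM</Symbol>").documentElement.firstChild
cdata_node = minidom.parseString(
    "<Symbol><![CDATA[IBM]]></Symbol>").documentElement.firstChild

# The node types differ: Text in one case, CDATASection in the other ...
is_text = plain_node.nodeType == Node.TEXT_NODE
is_cdata = cdata_node.nodeType == Node.CDATA_SECTION_NODE

# ... but the character data is the same, and in minidom a CDATASection
# is also an instance of Text, so code written against Text handles both.
same_data = plain_node.data == cdata_node.data
cdata_is_text = isinstance(cdata_node, minidom.Text)
```

An application that only reads .data never notices which kind of node it was given.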

As far as the XML Infoset is concerned, <![CDATA[&a<]]> is just a
representation of the character sequence &a< and is identical to
&amp;a&lt; or &#38;a&#60; or &#x26;a&#x3c; or any of the other possible
combinations. The Infoset considers the differences between these to be
No Difference.
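
To see that equivalence from the application side, here is a small sketch with Python's xml.etree.ElementTree. The element name v is arbitrary, and the list covers a CDATA section, named entity references, and decimal and hexadecimal character references.

```python
import xml.etree.ElementTree as ET

# Four different spellings of the same three characters: & a <
forms = [
    "<v><![CDATA[&a<]]></v>",
    "<v>&amp;a&lt;</v>",
    "<v>&#38;a&#60;</v>",
    "<v>&#x26;a&#x3c;</v>",
]

# After parsing, the application sees only the character data.
values = [ET.fromstring(doc).text for doc in forms]

print(values)  # prints ['&a<', '&a<', '&a<', '&a<']
```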

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
 