May a CDATA section appear in an attribute value?

 
 
Jon Noring
      11-14-2005
Out of curiosity, may a CDATA section appear within an attribute
value with datatype CDATA? And if so, how about other attribute
value datatypes which accept the XML markup characters?

To me, the XML specification seems a little ambiguous on this, so
I defer to the XML authorities. Refer to sections 2.4 and 2.7 (it all
hinges on if CDATA attribute values are part of markup or not.)

Thanks.

Jon
 
 
mgungora
      11-14-2005
As I understand from the XML 1.0 spec, an attribute value is a kind of
literal that cannot contain < at all, or & unless it starts a
reference. So, the answer is "no".

Regards,
-murat

 
 
Peter Flynn
      11-14-2005
Jon Noring wrote:

> Out of curiosity, may a CDATA section appear within an attribute
> value with datatype CDATA?


No. You can't have declaration markup in attribute values.

> And if so, how about other attribute
> value datatypes which accept the XML markup characters?


No attribute types allow element or declaration markup in their values.

> To me, the XML specification seems a little ambiguous on this,


No, it's quite specific: Production 41, Well-Formedness Constraint:
"No < in Attribute Values"

> I defer to the XML authorities. Refer to sections 2.4 and 2.7 (it all
> hinges on if CDATA attribute values are part of markup or not.)


It doesn't really have anything at all to do with CDATA attribute
values. There is an unfortunate (hereditary) semantic distinction
between what CDATA means in attribute declarations and what CDATA
means in Marked Sections, which you probably don't want to investigate
unless you're a masochist (but it doesn't even have much to do with
that either).

It's a restriction in XML that you cannot have the open-angle bracket
in an attribute value. Period. Not for any reason. (You *could* do
this in SGML, but this was one of the sacrifices we had to make to
get a more extensible and easily-programmed language).
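
One way to check this restriction is to hand both forms to a parser. The following is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module (an illustrative choice; nothing in this thread depends on it). The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A CDATA section used as element content is fine.
in_content = '<note><![CDATA[a < b]]></note>'
# The same construct inside an attribute value begins with a literal "<",
# which the "No < in Attribute Values" constraint forbids.
in_attribute = '<note text="<![CDATA[a < b]]>"/>'

print(is_well_formed(in_content))
print(is_well_formed(in_attribute))
```

The first document is accepted and the second is rejected, regardless of the declared type of the attribute.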

If you could give us some idea of what you wanted this for, perhaps
there is another way to solve the problem.

///Peter
--
XML FAQ: http://xml.silmaril.ie/

 
Jon Noring
      11-15-2005
Peter Flynn answered:
> Jon Noring asked:


>> Out of curiosity, may a CDATA section appear within an attribute
>> value with datatype CDATA?


> ... No, it's quite specific: Production 41, Well-Formedness
> Constraint: "No < in Attribute Values"
>
> It's a restriction in XML that you cannot have the open-angle
> bracket in an attribute value. Period. Not for any reason. (You
> *could* do this in SGML, but this was one of the sacrifices we had
> to make to get a more extensible and easily-programmed language).


Thanks! Somehow I missed that particular well-formedness constraint
given in production 41. This constraint clearly trumps any other
ambiguities that there may be about using a CDATA section within
attribute values. No question about it -- CDATA sections must not
appear in attribute values.

Now, to address a slightly different issue, in my reading of that
constraint, it seems like the "<" character may not literally appear
(not as part of any markup) in an attribute value, whether directly
encoded, as a numeric character reference, or as part of a defined
general entity. It leaves out the ability of XML document authors to
use that character, in a literal fashion, within attribute values of
datatype CDATA. For example, this appears to not be allowed (where
"<" == "&#x003C;"):

<header title="Is A &#x003C; B?"> ... </header>


> If you could give us some idea of what you wanted this for, perhaps
> there is another way to solve the problem.


I don't have a particular problem. Rather it's simply trying to gain
a thorough understanding of using CDATA sections in XML documents
from an XML document authoring perspective.

But since you mention it, I am curious to know how an XML document
author may include the literal "<" character in a CDATA attribute
value. As noted above, it does not appear it is possible. Assuming
this indeed is the case, then the only way I can think of to get
around this would be to use a similar Unicode character. For example,
from the Unicode Basic Latin script chart the following are similar
characters:

x2039 single left-pointing angle quotation
x2329 left-pointing angle bracket
x27E8 mathematical left angle bracket
x3008 left angle bracket

But this kludge is still not very satisfying and has presentational
issues.

Thanks.

Jon Noring

 
Andrew Thompson
      11-15-2005
Jon Noring wrote:

> But since you mention it, I am curious to know how an XML document
> author may include the literal "<" character in a CDATA attribute
> value. As noted above, it does not appear it is possible.


<WAG>
Convert to an HTML entity? E.G. < = &lt;
</WAG>
 
Jon Noring
      11-15-2005
Andrew Thompson wrote:
>Jon Noring wrote:


>> But since you mention it, I am curious to know how an XML document
>> author may include the literal "<" character in a CDATA attribute
>> value. As noted above, it does not appear it is possible.


> <WAG>
> Convert to an HTML entity? E.G. < = &lt;
> </WAG>


My prior message noted what the XML 1.0 Spec seems to say about
putting a literal "<" character into an attribute value: it appears
that it can't be done, even with an entity reference.

Here's the relevant section in XML 1.0:

http://www.w3.org/TR/REC-xml/#sec-starttags

Which says:

"Well-formedness constraint: No < in Attribute Values

"The replacement text of any entity referred to directly or
indirectly in an attribute value MUST NOT contain a <."


Now, being a little dense at times, maybe I'm misinterpreting what
the XML spec is saying, but it seems to me that the "<" character may
*never* appear in the attribute value of a well-formed XML document no
matter how it is done, encoded, directly and indirectly.

Am I right?

Jon
 
Richard Tobin
      11-15-2005
Jon Noring wrote:

> "The replacement text of any entity referred to directly or
> indirectly in an attribute value MUST NOT contain a <."


The replacement text of the lt entity is &#60; which does
not contain a <. Note that &#60; is a character reference,
not an entity reference. You can also use &#60; directly in
attributes.

-- Richard
 
Jon Noring
      11-15-2005
Richard Tobin wrote:
> Jon Noring wrote:


>> "The replacement text of any entity referred to directly or
>> indirectly in an attribute value MUST NOT contain a <."


> The replacement text of the lt entity is &#60; which does
> not contain a <. Note that &#60; is a character reference,
> not an entity reference. You can also use &#60; directly in
> attributes.


Yes, the XML spec does note that a numeric character reference is not
an entity, nor is "&lt;", which is called a "string" even though its
structure suggests an entity reference.

In addition, the original 1998 XML spec, in rule 41, specifically
notes the following:

"The replacement text of any entity referred to directly or
indirectly in an attribute value (other than "&lt;") must not
contain a <."

So, the original intent was to allow "&lt;" to represent the "<"
character in attribute values (and by section 2.4 also allow the
numeric character reference of &#x003C; / &#60;). Tim Bray
commented on the above constraint in his well-known Annotated XML
Specification: http://www.xml.com/axml/notes/NoLTinAtt.html

"Banishing the < ... This rule might seem a bit unnecessary, on
the face of it. Since you can't have tags in attribute values,
having an < can hardly be confusing, so why ban it?

"This is another attempt to make life easy for the DPH ["Desperate
Perl Hacker"]. The rule in XML is simple: when you're reading text,
and you hit a <, then that's a markup delimiter. Not just
sometimes, always. When you want one in the data, you have to use
&lt;. Not just sometimes, always. In attribute values too.

"This rule has another unintended beneficial side-effect; it makes
the catching of certain errors much easier. Suppose you have a
chunk of XML as follows:

<a href="notes.html> <img src='notes.gif'></a>

"Notice that the notes.html is missing its closing quote. Without
the no-&lt; rule, it would be really hard to detect this problem
and issue a reasonable error message. Since attribute values can
contain almost anything, no error would be detected until the
processor finds the next quotation mark. Instead, you get an error
message the first time you hit a <, which in the example above, as
in many cases, is almost immediately."
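
The error-reporting benefit described in the quotation can be observed with a parser. This is a rough sketch only, assuming Python's xml.etree.ElementTree; the img element is written here as an empty-element tag (an adjustment to the quoted example) so that the corrected document is well-formed XML.

```python
import xml.etree.ElementTree as ET

def first_error_position(xml_text):
    """Return (line, column) of the first parse error, or None if there is none."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return err.position

# The href value is missing its closing quote, so the next "<" ends up
# inside the attribute value and triggers the error.
broken = "<a href=\"notes.html> <img src='notes.gif'/></a>"
fixed = "<a href=\"notes.html\"> <img src='notes.gif'/></a>"

print(first_error_position(broken))
print(first_error_position(fixed))
```

The broken document is reported as an error on its first line rather than much later, and the corrected one parses cleanly.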


So, from the possibilities list I previously posted:

1) <foo bar="is x < y ?">

2) <foo bar="is x &lt; y ?">

3) <foo bar="is x &#x003C; y ?">

4) <foo bar="is x &lessthan; y ?">

a) where in the DTD we have <!ENTITY lessthan "<">

b) where in the DTD we have <!ENTITY lessthan "&lt;">

c) where in the DTD we have <!ENTITY lessthan "&#x003C;">


It would seem like all are permissible except for #1 and #4a since
they involve the literal "<" character.

Am I right on this?
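
The list above can be checked by running each case through a parser. What follows is a sketch under assumptions: it uses Python's xml.etree.ElementTree (backed by expat), the foo, bar and lessthan names come from the list, and each element is written as an empty-element tag so the documents are complete. One point worth knowing before reading the results: character references in an entity declaration are expanded when the declaration is read, so the replacement text in 4c is a literal <, just as in 4a. That is the reason the spec declares lt itself with the doubly escaped "&#38;#60;".

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the hypothetical "lessthan" entity.
DTD = '<!DOCTYPE foo [<!ENTITY lessthan "%s">]>'
REF = '<foo bar="is x &lessthan; y ?"/>'

CASES = {
    "1": '<foo bar="is x < y ?"/>',
    "2": '<foo bar="is x &lt; y ?"/>',
    "3": '<foo bar="is x &#x003C; y ?"/>',
    "4a": DTD % "<" + REF,
    "4b": DTD % "&lt;" + REF,
    "4c": DTD % "&#x003C;" + REF,
}

def attribute_value(xml_text):
    """Return the parsed value of bar, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_text).get("bar")
    except ET.ParseError:
        return None

results = {name: attribute_value(doc) for name, doc in CASES.items()}
for name, value in results.items():
    print(name, "rejected" if value is None else repr(value))
```

With this parser, cases 2, 3 and 4b yield the value "is x < y ?", while 1, 4a and 4c are rejected.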

Thanks.

Jon
 
Peter Flynn
      11-15-2005
Jon Noring wrote:

> Peter Flynn answered:
>> Jon Noring asked:

>
>>> Out of curiosity, may a CDATA section appear within an attribute
>>> value with datatype CDATA?

>
>> ... No, it's quite specific: Production 41, Well-Formedness
>> Constraint: "No < in Attribute Values"
>>
>> It's a restriction in XML that you cannot have the open-angle
>> bracket in an attribute value. Period. Not for any reason. (You
>> *could* do this in SGML, but this was one of the sacrifices we had
>> to make to get a more extensible and easily-programmed language).

>
> Thanks! Somehow I missed that particular well-formedness constraint
> given in production 41. This constraint clearly trumps any other
> ambiguities that there may be about using a CDATA section within
> attribute values. No question about it -- CDATA sections must not
> appear in attribute values.


It's more fundamental than that: CDATA sections are for enclosing
pieces of your document *text* that contain markup characters < and &
that you do not want to be interpreted as markup. For example:

<para>To create the header of your web page, type the following:</para>
<programlisting><![CDATA[
<html>
<head>
<title>My first web page</title>
</head>
]]></programlisting>

I'm curious to know how the question could arise of such data appearing
in an attribute value. It's always very helpful to documentation writers
to understand the thought-processes or reading experiences that lie
behind people's acquisition of knowledge, because it's something that
rarely comes to light, and it can help make documentation more useful.
(If you have the time to explain...offline.)

> Now, to address a slightly different issue, in my reading of that
> constraint, it seems like the "<" character may not literally appear
> (not as part of any markup) in an attribute value,


Correct.

> whether directly
> encoded, as a numeric character reference, or as part of a defined
> general entity.


The restriction is only on the literal < character itself. The character
entity reference &lt; and the decimal or hexadecimal equivalent are
perfectly valid in CDATA attribute values (indeed some document types
actually rely on this).

> It leaves out the ability of XML document authors to
> use that character, in a literal fashion, within attribute values of
> datatype CDATA. For example, this appears to not be allowed (where
> "<" == "&#x003C;"):
>
> <header title="Is A &#x003C; B?"> ... </header>


No, that's perfectly valid. So is title="Is A&lt;C" (assuming lt is
declared, either explicitly or implicitly).

As I mentioned, SGML allowed markup start characters in attributes,
so <header title="Is A < B?">...</> would be OK in SGML. But to make
it easier to write software for XML this feature was withdrawn in XML.

>> If you could give us some idea of what you wanted this for, perhaps
>> there is another way to solve the problem.

>
> I don't have a particular problem. Rather it's simply trying to gain
> a thorough understanding of using CDATA sections in XML documents
> from an XML document authoring perspective.


OK...the objective is as above: to stop the parser from interpreting
markup characters as markup. In a CDATA section, < and & are just
text.
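
As a small illustration of that point (a sketch assuming Python's xml.etree.ElementTree, with made-up listing content), the markup characters inside a CDATA section arrive as ordinary character data and create no child elements:

```python
import xml.etree.ElementTree as ET

doc = "<programlisting><![CDATA[<title>Q & A</title>]]></programlisting>"
listing = ET.fromstring(doc)

# The "<", ">" and "&" inside the CDATA section are plain text.
print(listing.text)
# No child elements were created from the <title> tags.
print(len(listing))
```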

> But since you mention it, I am curious to know how an XML document
> author may include the literal "<" character in a CDATA attribute
> value.


As &lt; or the numeric equivalent.
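
When attribute values are generated by a program, the escaping need not be done by hand. A minimal sketch, assuming Python's standard xml.sax.saxutils.quoteattr helper and reusing the header/title example from earlier in the thread:

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

title = "Is A < B?"

# quoteattr escapes markup characters and adds the surrounding quotes.
quoted = quoteattr(title)
doc = "<header title=%s/>" % quoted
print(doc)

# Parsing the document gives back the original string.
parsed = ET.fromstring(doc).get("title")
print(parsed)
```

The serialized form carries &lt;, and the application still sees the literal < character after parsing.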

///Peter

 
Jon Noring
      11-16-2005
Peter Flynn wrote:

> [explaining about the issue of "<" in attribute values]


Peter, thanks! You've clarified the issue very well. Very
valuable information.

Jon
 