Velocity Reviews

CarlosRivera 01-08-2005 10:30 PM

losing carriage returns in CDATA section - how do I prevent this?
 
I am using Apache Xerces-J 2.5.0. I have \r\n combinations in my CDATA
sections that get converted to \n (or rather, the \r gets lost). I am
using SAX parsing. In the buffer that is passed to my handler, I can see
that wherever I have a \n, the \r is still there one character back, but
the start offset points at the \n. The source is an XML string, so the \r
did not get lost while reading a file. In any case, it seems to me that
the parser should not be removing the \r in a CDATA section during my SAX
events. I am running this on Windows, so the conversion of \r\n to \n
might be platform-related. If it is, the code would not be portable
between Unix and Windows. The parser should give me the text as is.
Isn't that one of the purposes of CDATA? I know that one can put
character entities in the XML and that works, but it is really ugly. We
just want to take some text from a source location and put it into the
XML without having to replace \r with &#13;.

Richard Tobin 01-08-2005 10:37 PM

Re: losing carriage returns in CDATA section - how do I prevent this?
 
In article <xaZDd.499$JI3.119@newssvr14.news.prodigy.com>,
CarlosRivera <CarlosRivera@badnamefornospam.to> wrote:

>I am using apache xerces J 2.5.0. I have \r\n feed combinations in the
>CDATA sections that get converted to \n (or rather \r gets lost.


XML parsers convert CR-LF and CR to LF, so that you don't have to worry
about what platform you're using.

If you really want to preserve CRs, you have to use a character
reference, but think carefully before doing this: XML is a text
format, and dependence on platform-specific line-end sequences
is not usually a good idea.

-- Richard
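
The behaviour described above can be checked with a short SAX program.
This is only a sketch: it uses the JAXP parser bundled with the JDK
rather than Xerces-J 2.5.0 specifically, and the class name
CrInCdataDemo and the helpers textOf and show are made up for
illustration. It also shows one catch with the character-reference
approach: references are not recognized inside a CDATA section, so the
&#13; has to sit in ordinary element content.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CrInCdataDemo {

    /** Parses the XML string with SAX and returns all character data reported. */
    static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver text in several chunks, so append each one.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    /** Makes CR and LF visible when printing. */
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        // Literal CR LF inside CDATA: normalized to LF before parsing.
        System.out.println(show(textOf("<doc><![CDATA[one\r\ntwo]]></doc>")));
        // CR written as a character reference outside CDATA: preserved.
        System.out.println(show(textOf("<doc>one&#13;\ntwo</doc>")));
        // Inside CDATA, "&#13;" is not a reference, just five literal characters.
        System.out.println(show(textOf("<doc><![CDATA[one&#13;\ntwo]]></doc>")));
    }
}
```

With a conforming parser, the first call should report the text with a
bare \n, the second should keep the \r, and the third should hand back
the characters &#13; unchanged.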

John C. Bollinger 01-10-2005 01:52 PM

Re: losing carriage returns in CDATA section - how do I prevent this?
 
Richard Tobin wrote:

> In article <xaZDd.499$JI3.119@newssvr14.news.prodigy.com>,
> CarlosRivera <CarlosRivera@badnamefornospam.to> wrote:
>
>
>>I am using apache xerces J 2.5.0. I have \r\n feed combinations in the
>>CDATA sections that get converted to \n (or rather \r gets lost.

>
>
> XML parsers convert CR-LF and CR to LF, so that you don't have to worry
> about what platform you're using.


To be more specific, here is an excerpt from the XML 1.0 spec:

====

2.11 End-of-Line Handling

XML parsed entities are often stored in computer files which, for
editing convenience, are organized into lines. These lines are typically
separated by some combination of the characters CARRIAGE RETURN (#xD)
and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor MUST behave as
if it normalized all line breaks in external parsed entities (including
the document entity) on input, before parsing, by translating both the
two-character sequence #xD #xA and any #xD that is not followed by #xA
to a single #xA character.

====

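XML 1.1 generalizes that requirement a bit: it also translates the
sequence #xD #x85, a lone #x85 (NEL), and #x2028 (LINE SEPARATOR) to a
single #xA.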


John Bollinger
jobollin@indiana.edu
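
The XML 1.0 rule quoted above amounts to a simple translation over the
input. The following is a minimal sketch of that translation, not code
taken from any parser; the class name EolNormalizer and the method
normalize are made up for illustration. It can be handy for predicting
what a parser will hand back, or for normalizing expected values in
tests.

```java
public class EolNormalizer {

    /** Translates CR LF and any lone CR to a single LF (XML 1.0, section 2.11). */
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Skip the LF of a CR LF pair so it is not emitted twice.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String normalized = normalize("a\r\nb\rc\nd");
        // CR LF, lone CR, and LF all end up as a single LF.
        System.out.println(normalized.equals("a\nb\nc\nd"));
    }
}
```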

