Velocity Reviews - Computer Hardware Reviews

Escapes Sequences Not Working?

 
 
Rick Brandt
      08-25-2004
If you examine the complete XML below you will see an element "Notes"
consisting of...

<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>

As you can see I have properly (I think) escaped the ampersand (&) with
"&amp;". If I place this XML in a file and open it with Internet Explorer
the ampersand is properly dealt with. In my Java servlet I am using a SAX
parser to parse the XML and write it to a database. When that parser gets
to the "Notes" element all that is returned is the characters up to (not
including) the ampersand in the escape sequence. Everything after that is
truncated. I have found that this will happen with any escape sequence
(since they all start with the ampersand).

I get no errors and the record is written to the database, just with a
truncated Notes field.

Any ideas what I can look for?



<?xml version="1.0"?>
<MBO>
<Record>
<ID>-49781293</ID>
<OrderDate>2004-08-24 15:19:31</OrderDate>
<MemoBillType>5</MemoBillType>
<AccountNum>1</AccountNum>
<BillToAddress>TEST</BillToAddress>
<ShipToAddress>Same as Bill To Address</ShipToAddress>
<RegMgr>John Doe</RegMgr>
<SecCode>308040-860602</SecCode>
<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>
<RequireDate>TEST</RequireDate>
<RackInfo>TEST</RackInfo>
<CallPhoneNumber>TEST TEST</CallPhoneNumber>
<SubRecord_A>
<LineNum>1</LineNum>
<Quantity>1</Quantity>
<PartNum>TEST</PartNum>
<ShipDesignation>TEST</ShipDesignation>
<Price>NULL_VALUE</Price>
<Discount>NULL_VALUE</Discount>
<Notes>TEST TEST TEST</Notes>
</SubRecord_A>
</Record>
</MBO>


 
Martin Honnen
      08-25-2004


Rick Brandt wrote:

> If you examine the complete XML below you will see an element "Notes"
> consisting of...
>
> <Notes>test replace test[LINE]&amp;[LINE]replace</Notes>
>
> As you can see I have properly (I think) escaped the ampersand (&) with
> "&amp;". If I place this XML in a file and open it with Internet Explorer
> the ampersand is properly dealt with. In my Java servlet I am using a SAX
> parser to parse the XML and write it to a database. When that parser gets
> to the "Notes" element all that is returned is the characters up to (not
> including) the ampersand in the escape sequence. Everything after that is
> truncated. I have found that this will happen with any escape sequence
> (since they all start with the ampersand).


How does your SAX code look? You might get several chunks of character
data as the content of the <Notes> element.

--

Martin Honnen
http://JavaScript.FAQTs.com/
 
Rick Brandt
      08-25-2004
"Martin Honnen" <(E-Mail Removed)> wrote in message
news:412cab43$0$19550$(E-Mail Removed)-online.net...
> How does your SAX code look? You might get several chunks of character
> data as the content of the <Notes> element.


public void characters(char[] ch, int start, int length)
        throws SAXException, DataSetException {
    try {
        if (elementStart) {
            elementStart = false;
            String s = new String(ch, start, length);

I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.


--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
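
A minimal diagnostic sketch for the behaviour described above, assuming the JAXP default parser on the classpath. The class name ChunkLogger and the helper chunksOf are hypothetical and not part of the original servlet. It records every characters() callback separately, so you can see whether the parser delivers the "Notes" text in one piece or in several; the number of pieces depends on the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler, not taken from the thread.
class ChunkLogger extends DefaultHandler {
    // One entry per characters() callback, in document order.
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this callback;
        // the rest of ch[] is the parser's internal buffer.
        chunks.add(new String(ch, start, length));
    }

    // Parses the XML and returns the text of every characters() callback.
    static List<String> chunksOf(String xml) {
        try {
            ChunkLogger handler = new ChunkLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.chunks;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>";
        List<String> chunks = chunksOf(xml);
        // How many chunks appear here is up to the parser implementation.
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println(i + ": \"" + chunks.get(i) + "\"");
        }
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

If the output shows more than one chunk, the parser is not truncating anything; the remaining text simply arrives in later callbacks.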




 
Richard Tobin
      08-25-2004
In article <(E-Mail Removed)>,
Rick Brandt <(E-Mail Removed)> wrote:
>I'm using JBuilder 7 and it has a built in SAX parser object template that
>extends DefaultHandler. The problem seems to be with the length argument
>on the last line above. If I examine the ch[] array in debug mode it still
>has all of the text from the "Notes" element, but the length argument being
>passed from the parser is (for some reason) being set to the first
>occurrence of an ampersand instead of extending to the element close tag.
>So the String s that I use for insertion to the database is truncated.


And you don't get more calls to characters() with the rest of the string?
There's no guarantee you will get it all at once.

-- Richard
 
William Park
      08-25-2004
In <comp.text.xml> Rick Brandt <(E-Mail Removed)> wrote:
> If you examine the complete XML below you will see an element "Notes"
> consisting of...
>
> <Notes>test replace test[LINE]&amp;[LINE]replace</Notes>
>
> As you can see I have properly (I think) escaped the ampersand (&)
> with "&amp;". If I place this XML in a file and open it with Internet
> Explorer the ampersand is properly dealt with. In my Java servlet I am
> using a SAX parser to parse the XML and write it to a database. When
> that parser gets to the "Notes" element all that is returned is the
> characters up to (not including) the ampersand in the escape sequence.
> Everything after that is truncated. I have found that this will
> happen with any escape sequence (since they all start with the
> ampersand).
>
> I get no errors and the record is written to the database, just with a
> truncated Notes field.
>
> Any ideas what I can look for?


At least with Expat XML parser, I get 3 calls, ie.
test replace test[LINE]
&
[LINE]replace
So, collect all data until end of <Notes> element.

--
William Park <(E-Mail Removed)>
Open Geometry Consulting, Toronto, Canada
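
The "collect all data until the end of the element" advice can be sketched in Java roughly as follows. This is an illustration under assumptions, not the original poster's code: the class name NotesCollector and the helper notesOf are made up, and the handler only looks at elements named "Notes" using the default (non-namespace-aware) JAXP parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers text in characters(), uses it in endElement().
class NotesCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inNotes = false;
    String notes; // complete text of the most recent Notes element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Notes".equals(qName)) {
            inNotes = true;
            buffer.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element; append every chunk.
        if (inNotes) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Notes".equals(qName)) {
            inNotes = false;
            notes = buffer.toString(); // only now is the text complete
        }
    }

    // Parses the XML and returns the text of the last Notes element seen.
    static String notesOf(String xml) {
        try {
            NotesCollector handler = new NotesCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.notes;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Record><Notes>test replace test[LINE]&amp;[LINE]replace</Notes></Record>";
        System.out.println(notesOf(xml));
    }
}
```

The database write would then move from characters() to the point in endElement() where the buffer is read.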
 
Rick Brandt
      08-25-2004
"Richard Tobin" <(E-Mail Removed)> wrote in message
news:cgigsq$26st$(E-Mail Removed)...
> In article <(E-Mail Removed)>,
> Rick Brandt <(E-Mail Removed)> wrote:
> >I'm using JBuilder 7 and it has a built in SAX parser object template that
> >extends DefaultHandler. The problem seems to be with the length argument
> >on the last line above. If I examine the ch[] array in debug mode it still
> >has all of the text from the "Notes" element, but the length argument being
> >passed from the parser is (for some reason) being set to the first
> >occurrence of an ampersand instead of extending to the element close tag.
> >So the String s that I use for insertion to the database is truncated.
>
> And you don't get more calls to characters() with the rest of the string?
> There's no guarantee you will get it all at once.


Should I get those "more calls" automatically or do I have to put in some
kind of loop? Why wouldn't characters() return ALL characters between the
<> and </>? Isn't that the parser's job?

I was originally wrapping all of my text elements in CDATA sections, but I
ran into a problem where any CDATA section with the string "replace" in it
raised a Parse Error (previous newsgroup thread where I received no
answers).

I decided I would just escape all of the illegal XML characters instead of
using CDATA and now I have this truncation issue.

I appreciate the help.


--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
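
For the "escape instead of CDATA" approach mentioned above, here is a minimal sketch of an escaping helper. The class XmlTextEscaper and its methods are hypothetical; the thread does not show how the XML is actually generated. It replaces the five predefined XML entities and does nothing about control characters that are not allowed in XML 1.0 at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not taken from the thread.
class XmlTextEscaper {
    // Escapes the five predefined XML entities in element or attribute text.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps the escaped text in a Notes element, parses it with DOM,
    // and returns the text the parser reports for that element.
    static String roundTrip(String text) {
        try {
            String xml = "<Notes>" + escape(text) + "</Notes>";
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String raw = "test replace test[LINE]&[LINE]replace";
        System.out.println(escape(raw));
        System.out.println(roundTrip(raw).equals(raw));
    }
}
```

Escaping this way produces well-formed XML; whether the reader sees the text in one piece is a separate question that depends on how the SAX callbacks are handled.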



 
Rick Brandt
      08-25-2004
"William Park" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> At least with Expat XML parser, I get 3 calls, ie.
> test replace test[LINE]
> &
> [LINE]replace
> So, collect all data until end of <Notes> element.


OK I found this at a SAX FAQ site...

*****************************************
The ContentHandler.characters() callback is missing data!

Please read the JavaDoc for this method. A parser may split text into any
number of separate chunks, and some characters may be reported using
ignorableWhitespace() instead of this callback. If you want all the text
inside an element, you need to collect the text from the various characters
callbacks into a buffer. Only when you see the endElement event can you be
sure that you have seen all the text, and some of it may really "belong" to
child elements.
******************************************

This appears to say that I am using the wrong event. It would be a major
rewrite to move my code to the endElement() event, but if I have to, I
guess I have to. But then I might have child element characters included
that I don't want. How do I avoid the child element characters? The FAQ
doesn't go into that at all.


--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
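
One common way to keep child element text out of the parent's buffer is to keep a stack of buffers, one per open element. The sketch below is an assumption-laden illustration, not something proposed in the thread: the class OwnTextHandler and the "name=text" result format are invented for the example. Each element gets its own StringBuilder in startElement(), characters() appends only to the innermost one, and endElement() pops it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: separates each element's own text from its children's.
class OwnTextHandler extends DefaultHandler {
    // Top of the stack is the buffer of the innermost open element.
    private final Deque<StringBuilder> stack = new ArrayDeque<>();
    // "name=text" for every element that has non-blank text of its own.
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            stack.peek().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The popped buffer holds only text directly inside this element.
        String own = stack.pop().toString().trim();
        if (!own.isEmpty()) {
            results.add(qName + "=" + own);
        }
    }

    static List<String> ownTextOf(String xml) {
        try {
            OwnTextHandler handler = new OwnTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Record><Notes>a &amp; b</Notes>"
                + "<SubRecord_A><Notes>c</Notes></SubRecord_A></Record>";
        System.out.println(ownTextOf(xml));
    }
}
```

In a document like the one in this thread, where text only appears in leaf elements, a single buffer reset in startElement() would also be enough; the stack matters only for mixed content.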



 
Richard Tobin
      08-25-2004
In article <(E-Mail Removed)>,
Rick Brandt <(E-Mail Removed)> wrote:

>Should I get those "more calls" automatically


Yes. Quite likely you will get three calls in this case.

>I was originally wrapping all of my text elements in CDATA sections, but I
>ran into a problem where any CDATA section with the string "replace" in it
>raised a Parse Error (previous newsgroup thread where I received no
>answers).


Maybe you should try a different parser!

-- Richard
 
Rick Brandt
      08-25-2004
"Richard Tobin" <(E-Mail Removed)> wrote in message
news:cgipvu$29ml$(E-Mail Removed)...
> In article <(E-Mail Removed)>,
> Rick Brandt <(E-Mail Removed)> wrote:
>
> >Should I get those "more calls" automatically
>
> Yes. Quite likely you will get three calls in this case.
>
> >I was originally wrapping all of my text elements in CDATA sections, but I
> >ran into a problem where any CDATA section with the string "replace" in it
> >raised a Parse Error (previous newsgroup thread where I received no
> >answers).
>
> Maybe you should try a different parser!


AFAIK I am using the one that comes with Java 1.4.2_04-b05. The import
statements in my SAX class are...

import org.xml.sax.*;
import org.xml.sax.helpers.*;
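
The org.xml.sax packages contain only the SAX API; the concrete parser is chosen by JAXP at runtime. A small sketch for finding out which implementation is actually in use (the class name WhichParser is hypothetical, and the name printed will vary with the JDK and classpath):

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper, not taken from the thread.
class WhichParser {
    // Returns the class name of the XMLReader behind the default JAXP factory.
    static String readerClassName() {
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser()
                    .getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SAX implementation: " + readerClassName());
    }
}
```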




 
Rick Brandt
      08-25-2004
"Rick Brandt" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "William Park" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
> > At least with Expat XML parser, I get 3 calls, ie.
> > test replace test[LINE]
> > &
> > [LINE]replace
> > So, collect all data until end of <Notes> element.

>
> OK I found this at a SAX FAQ site...
>
> *****************************************
> The ContentHandler.characters() callback is missing data!
>
> Please read the JavaDoc for this method. A parser may split text into any
> number of separate chunks, and some characters may be reported using
> ignorableWhitespace() instead of this callback. If you want all the text
> inside an element, you need to collect the text from the various characters
> callbacks into a buffer. Only when you see the endElement event can you be
> sure that you have seen all the text, and some of it may really "belong" to
> child elements.
> ******************************************
>
> This appears to say that I am using the wrong event. It would be a major
> rewrite to move my code to the endElement() event, but if I have to, I
> guess I have to. But then I might have child element characters included
> that I don't want. How do I avoid the child element characters? The FAQ
> doesn't go into that at all.


Ok, I found yet another reference...

*********************************************
Note that a SAX driver is free to chunk the character data any way it
wants, so you cannot count on all of the character data content of an
element arriving in a single characters event.
*********************************************

So it appears that this is working "as designed", yet none of the examples I
see on these same pages show how to deal properly with the characters()
event.

Immediately prior to the statement above the site uses an example for
pulling the data from the characters event that clearly will NOT work if
the parser decides to "chunk" the data into multiple pieces.

I guess I will look at collecting the pieces in characters and not writing
them until endElement(). I just wish I could fix the CDATA bug as this was
working fine for 3 or 4 years before that started happening. Either CDATA
forces all of the text in the characters event to be pulled in a single
block or we just got really lucky for all that time because I never saw any
truncation until the CDATA section was removed.


--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
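
Whether a CDATA section arrives in a single characters() call is up to the parser, so it is safer not to rely on it. The sketch below (hypothetical class CdataVersusEscape, default JAXP parser assumed) counts the callbacks for a CDATA version and an escaped version of the same text and shows that, once the chunks are appended to a buffer, both yield the same string.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison handler, not taken from the thread.
class CdataVersusEscape extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int callbacks = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        callbacks++;                    // count how often the parser calls us
        text.append(ch, start, length); // and keep every chunk
    }

    static CdataVersusEscape parse(String xml) {
        try {
            CdataVersusEscape handler = new CdataVersusEscape();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Returns all character data in the document, joined in order.
    static String textOf(String xml) {
        return parse(xml).text.toString();
    }

    public static void main(String[] args) {
        String[] docs = {
            "<Notes><![CDATA[test & replace]]></Notes>",
            "<Notes>test &amp; replace</Notes>"
        };
        for (String doc : docs) {
            CdataVersusEscape h = parse(doc);
            // The callback count is parser-dependent; the joined text is not.
            System.out.println(h.callbacks + " callback(s): " + h.text);
        }
    }
}
```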


 