SAX parsing problem

 
 
anon
 
      03-16-2005
So I've encountered a strange behavior that I'm hoping someone can fill
me in on. I've written a simple handler that works with one small
exception: when the parser encounters a line with '&amp;' in it, it
only returns the portion that follows the occurrence.

For example, parsing a file with the line:
<key>mykey</key><value>some%20&amp;%20value</value>

results in getting "%20value" back from the characters method, rather
than "some%20&%20value".

After looking into this a bit, I found that SAX supports entities, so
it is probably treating the &amp; as an entity and processing
it in some way that I'm unaware of. I'm using the default
EntityResolver.

Any help/info would be much appreciated.

gh
 
 
 
 
 
David M. Cooke
 
      03-16-2005
anon writes:

> So I've encountered a strange behavior that I'm hoping someone can fill
> me in on. I've written a simple handler that works with one small
> exception: when the parser encounters a line with '&amp;' in it, it
> only returns the portion that follows the occurrence.
>
> For example, parsing a file with the line:
> <key>mykey</key><value>some%20&amp;%20value</value>
>
> results in getting "%20value" back from the characters method, rather
> than "some%20&%20value".
>
> After looking into this a bit, I found that SAX supports entities, so
> it is probably treating the &amp; as an entity and processing
> it in some way that I'm unaware of. I'm using the default
> EntityResolver.


Are you sure you're not actually getting three chunks: "some%20", "&",
and "%20value"? The xml.sax.handler.ContentHandler.characters method
(which I presume you're using for SAX, as you don't mention!) is not
guaranteed to get all contiguous character data in one call. Also check
if .skippedEntity() methods are firing.
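
To see the chunking for yourself, a minimal sketch along these lines may help. It assumes Python 3 and the standard-library xml.sax module; the class name ChunkRecorder and the one-element sample document are made up for illustration. How many chunks you get depends on the underlying parser, so the only thing you can rely on is that the chunks, joined in order, give the full text.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkRecorder(ContentHandler):
    """Hypothetical handler that records each piece of text it is given."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # May be called several times for one run of character data.
        self.chunks.append(content)


# Assumed sample input: the <value> element from the question on its own.
doc = b"<value>some%20&amp;%20value</value>"

handler = ChunkRecorder()
xml.sax.parseString(doc, handler)

print(handler.chunks)            # one or more pieces, parser-dependent
print("".join(handler.chunks))   # the complete text, reassembled
```

If the handler only looks at the most recent call to characters(), it will see just the last piece, which would match the "%20value" result described in the question.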

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
 
 
 
 
 
gh
 
      03-16-2005
David M. Cooke wrote:

>
> Are you sure you're not actually getting three chunks: "some%20", "&",
> and "%20value"? The xml.sax.handler.ContentHandler.characters method
> (which I presume you're using for SAX, as you don't mention!) is not
> guaranteed to get all contiguous character data in one call. Also check
> if .skippedEntity() methods are firing.


Ya, skippedEntity() wasn't firing, but you are correct about receiving
three chunks. The characters handler routine is fired 3 times for a
single text block. Why does it do this? Is there a way to prevent
doing this?

Much thanks.

gh
 
 
Uche Ogbuji
 
      03-23-2005
On Wed, 2005-03-16 at 00:14 -0800, gh wrote:
> The characters handler routine is fired 3 times for a
> single text block. Why does it do this? Is there a way to prevent
> doing this?


Continuing in the vein of closing matters cross-posted to XML-SIG:

http://mail.python.org/pipermail/xml...ch/011013.html
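
For readers who cannot follow the link: one common way to cope with split character data (a sketch only, and not necessarily what the linked message recommends) is to buffer the chunks in characters() and join them when the element ends. The example assumes Python 3 and the standard-library xml.sax module. The class name ValueCollector and the <root> wrapper element are assumptions, since the original file's structure isn't shown.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ValueCollector(ContentHandler):
    """Hypothetical handler that collects the full text of each <value>."""

    def __init__(self):
        super().__init__()
        self.values = []      # finished text, one entry per <value> element
        self._buffer = None   # pending chunks, or None when outside <value>

    def startElement(self, name, attrs):
        if name == "value":
            self._buffer = []  # start collecting text for this element

    def characters(self, content):
        # Called zero or more times per text run; keep every piece.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "value" and self._buffer is not None:
            self.values.append("".join(self._buffer))
            self._buffer = None


# Assumed sample input; the <root> wrapper is not from the original post.
doc = b"<root><key>mykey</key><value>some%20&amp;%20value</value></root>"

collector = ValueCollector()
xml.sax.parseString(doc, collector)
print(collector.values)  # ['some%20&%20value']
```

This does not stop the parser from splitting the text. It just makes the handler indifferent to how many calls it receives.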

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com

 