sax EntityResolver problem (expat?)

 
 
chris
      06-10-2004
hi,
sax beginner question, i must admit:

i'm trying to filter a simple XHTML document that has a standard DTD declaration
(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">) in it.
sax gives the following error:

>>> xml.sax._exceptions.SAXParseException: <unknown>:53:8: undefined entity


the undefined entity is &nbsp;.
so i thought i would just implement the EntityResolver class and use a local
copy of the DTD:

# ========================
import xml.sax
import xml.sax.handler

class XHTMLResolver(xml.sax.handler.EntityResolver, object):

    def resolveEntity(self, publicId, systemId):
        # point the parser at a copy of the DTD instead of w3.org
        return 'http://localhost/xhtml1-transitional.dtd'

reader = xml.sax.make_parser()
reader.setEntityResolver(XHTMLResolver())
# ========================

the problem is that expat does not seem to use this resolver, as i get the same
error again. i also tried the following, which turns out not to be supported anyhow:

reader.setFeature('http://xml.org/sax/features/external-parameter-entities', True)
>>> xml.sax._exceptions.SAXNotSupportedException: expat does not read external parameter entities

is the XHTMLResolver class not written the way it should be? or do i have to set
another feature/property?


ultimately i do not want to use the http://localhost copy but i would
like to read the local file (just with open(...) or something) and go
from there. is that possible? do i have to


thanks a lot
chris
 
 
 
 
 
Ralf Schmitt
      06-11-2004
chris writes:

> hi,
> sax beginner question i must admit:
>
> i try to filter a simple XHTML document with a standard DTD
> declaration (<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
> Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">) in it.
> sax gives the following error
>
> >>> xml.sax._exceptions.SAXParseException: <unknown>:53:8: undefined entity
>
> which is an &nbsp; entity.
> so i thought i just implement the EntityResolver class and use a local
> copy of the DTD
>
> # ========================
> class XHTMLResolver(xml.sax.handler.EntityResolver, object):
>
>     def resolveEntity(self, publicId, systemId):
>         return 'http://localhost/xhtml1-transitional.dtd'
>
> reader = xml.sax.make_parser()
> reader.setEntityResolver(XHTMLResolver())
> # ========================
>
> problem is, it seems expat does not use this resolver as i get the
> same error again. i also tried the following, which is not supported
> anyhow:
>
> reader.setFeature('http://xml.org/sax/features/external-parameter-entities', True)
> >>> xml.sax._exceptions.SAXNotSupportedException: expat does not read external parameter entities
>
> is the XHTMLResolver class not the way it should be? or do i have to
> set another feature/property?


That's the way it works for me. You can also just open() your DTD
files and return an open file handle. Note that with the above DTD,
your resolveEntity will be called more than once, with different ids:
once for the DTD itself and once for each entity file it references.

--------------------------------
from xml.sax import saxutils, handler, make_parser, xmlreader

class Handler(handler.ContentHandler):
    # the same object is also registered as the entity resolver below
    def resolveEntity(self, publicid, systemid):
        print "RESOLVE:", publicid, systemid
        # open the local copy named after the last path component
        return open(systemid[systemid.rfind('/')+1:], "rb")

    def characters(self, s):
        print repr(s)

doc = r'''<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<HTML>
&nbsp;&auml;
</HTML>
'''

h = Handler()
parser = make_parser()
parser.setContentHandler(h)
parser.setEntityResolver(h)

parser.feed(doc)
parser.close()
-------
Output:

RESOLVE: -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
RESOLVE: -//W3C//ENTITIES Latin 1 for XHTML//EN xhtml-lat1.ent
RESOLVE: -//W3C//ENTITIES Symbols for XHTML//EN xhtml-symbol.ent
RESOLVE: -//W3C//ENTITIES Special for XHTML//EN xhtml-special.ent
u'\n'
u'\xa0'
u'\xe4'
u'\n'
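
For anyone reading this on a current Python 3, the same idea can be sketched roughly as follows. This is a port under stated assumptions, not the code from the post above: since Python 3.7.1 the SAX parser no longer processes external general entities by default, so the sketch switches on feature_external_ges explicitly. The two-entity DTD_STUB, the LocalResolver and TextCollector classes and the parse_with_local_dtd helper are made up for illustration, and an in-memory stream stands in for the open() call.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

# Hypothetical stand-in for a local copy of the DTD: it declares only the
# two entities that the sample document uses.
DTD_STUB = b'<!ENTITY nbsp "&#160;">\n<!ENTITY auml "&#228;">\n'


class LocalResolver(EntityResolver):
    """Answers every external entity request from memory (illustrative)."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        # Any object with a read() method is accepted here, so a handle from
        # open("xhtml1-transitional.dtd", "rb") could be returned instead.
        return io.BytesIO(DTD_STUB)


class TextCollector(ContentHandler):
    """Collects character data so it can be inspected after parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtd(document):
    """Parse `document` (bytes); return (collected text, resolver requests)."""
    resolver = LocalResolver()
    collector = TextCollector()
    parser = make_parser()
    # Off by default since Python 3.7.1; while it is off, external entity
    # requests are skipped and the resolver is never consulted.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(collector)
    parser.setEntityResolver(resolver)
    parser.parse(io.BytesIO(document))
    return "".join(collector.chunks), resolver.requests


DOC = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
       b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
       b'<html>&nbsp;&auml;</html>\n')

text, requests = parse_with_local_dtd(DOC)
```

Because the stub contains no references to further entity files, the resolver is asked only once here, for the DTD itself. With the real DTD it would also be asked for the three .ent files listed in the output above.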

>
>
> ultimately i do not want to use the http://localhost copy but i would
> like to read the local file (just with open(...) or something) and go
> from there. is that possible? do i have to
>
>
> thanks a lot
> chris
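
Reading local files instead of the http://localhost copy could look roughly like the sketch below, written for Python 3. It assumes you keep your own table from public ids to local paths; the CatalogResolver name, the catalog argument and the temporary one-line DTD file are hypothetical and only there to make the example self-contained. Ids that are not in the table are returned unchanged, which is what the default EntityResolver does.

```python
import os
import tempfile
from xml.sax.handler import EntityResolver

XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN"
XHTML_SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"


class CatalogResolver(EntityResolver):
    """Maps public ids to local files; the mapping is supplied by the caller."""

    def __init__(self, catalog):
        self.catalog = dict(catalog)  # public id -> local path

    def resolveEntity(self, publicId, systemId):
        path = self.catalog.get(publicId)
        if path is None:
            # Unknown id: hand the system id back and let the parser
            # fetch it itself, as the default resolver would.
            return systemId
        # A binary file handle is enough; the parser reads from it.
        return open(path, "rb")


# Demonstration with a throwaway file standing in for the real DTD copy.
with tempfile.TemporaryDirectory() as tmp:
    dtd_path = os.path.join(tmp, "xhtml1-transitional.dtd")
    with open(dtd_path, "wb") as f:
        f.write(b'<!ENTITY nbsp "&#160;">\n')

    resolver = CatalogResolver({XHTML_PUBLIC_ID: dtd_path})
    with resolver.resolveEntity(XHTML_PUBLIC_ID, XHTML_SYSTEM_ID) as handle:
        first_bytes = handle.read()
    fallback = resolver.resolveEntity("-//EXAMPLE//UNKNOWN//EN", "other.ent")
```

Keying on the public id rather than on the file name means the lookup does not depend on how the system id happens to be spelled in a given document.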


--
brainbot technologies ag
boppstrasse 64 . 55118 mainz . germany
fon +49 6131 211639-1 . fax +49 6131 211639-2
http://brainbot.com/
 