Help with xml.parsers.expat please?

 
 
Will Stuyvesant
      07-04-2003
There seems to be no XML parser in the Python standard
library that can do validation. And I am stuck with Python
2.1.1 until my webmaster upgrades (I use Python for
CGI). I know PyXML has validating parsers, but I cannot
compile things on the (Unix) web server. And even if I
could, the compiler I have access to would be different
from the one used to compile Python for CGI.

I need to write a CGI script that does XML validation (and
then later also does other things). It does not have to
be fully standards-compliant validation, but it should at
least check that elements are declared and allowed at
particular places in the XML tree.

I tried to understand SAX and DOM but I gave up, and
effbot advises to avoid them anyway. So I am studying
xml.parsers.expat now, but I am stuck.

The program below *does* print information about DOCTYPE
declarations but nothing about the element definitions in
the DTD. I feed it an XML file with a DOCTYPE declaration
like <!DOCTYPE ROOTTAG SYSTEM "MYDTD.DTD"> and the DTD is
in the same directory. I also tried feeding the DTD
itself to this program, but that doesn't work either
(ExpatError: syntax error at the first element definition).

Please help if you can.




# file: minimal_validate.py
#
import sys
import xml.parsers.expat

def element_decl_handler(name, model):
    print 'ELEMENT definition: ', name, ' model: ', model

def doctype_decl_handler(doctypeName, systemId, publicId, has_internal_subset):
    print 'DOCTYPE declaration: '
    print ' doctypeName: ', doctypeName
    print ' systemId: ', systemId
    print ' publicId:', publicId
    print ' internal subset:', has_internal_subset

p = xml.parsers.expat.ParserCreate()

p.ElementDeclHandler = element_decl_handler
p.StartDoctypeDeclHandler = doctype_decl_handler

input = file(sys.argv[1]).read()
p.Parse(input)
 
Alan Kennedy
      07-04-2003
Will Stuyvesant wrote:

> There seems to be no XML parser in the Python standard
> library that can do validation. And I am stuck with Python
> 2.1.1 until my webmaster upgrades (I use Python for
> CGI). I know PyXML has validating parsers, but I cannot
> compile things on the (Unix) web server. And even if I
> could, the compiler I have access to would be different
> from the one used to compile Python for CGI.


So it didn't work out with xmlproc? Isn't xmlproc a pure-Python
parser that you should be able to drop in and run without
compiling anything?

> I need to write a CGI script that does XML validation (and
> then later also does other things). It does not have to
> be fully standards-compliant validation, but it should at
> least check that elements are declared and allowed at
> particular places in the XML tree.


I think you would be much more likely to get constructive help
if you posted some examples of the tree structures and data
that you're processing.

> I tried to understand SAX and DOM but I gave up, and
> effbot advises to avoid them anyway. So I am studying
> xml.parsers.expat now, but I am stuck.


SAX and DOM aren't solutions, they're tools. They are simply
different ways of accessing the contents of an XML document.
They may or may not be suitable for your problem, depending
on a wide variety of considerations.

I think the problem needs to be clearly defined before an
appropriate solution can be reached.

> The program below *does* print information about DOCTYPE
> declarations but nothing about the element definitions in
> the DTD. I feed it an XML file with a DOCTYPE declaration
> like <!DOCTYPE ROOTTAG SYSTEM "MYDTD.DTD"> and the DTD is
> in the same directory. I also tried feeding the DTD
> itself to this program, but that doesn't work either
> (ExpatError: syntax error at the first element definition).
>
> Please help if you can.
>
> # file: minimal_validate.py
> #
> import xml.parsers.expat
>
> def element_decl_handler(name, model):
>     print 'ELEMENT definition: ', name, ' model: ', model
>
> def doctype_decl_handler(doctypeName, systemId, publicId, has_internal_subset):
>     print 'DOCTYPE declaration: '
>     print ' doctypeName: ', doctypeName
>     print ' systemId: ', systemId
>     print ' publicId:', publicId
>     print ' internal subset:', has_internal_subset
>
> p = xml.parsers.expat.ParserCreate()
>
> p.ElementDeclHandler = element_decl_handler
> p.StartDoctypeDeclHandler = doctype_decl_handler
>
> import sys
> input = file(sys.argv[1]).read()
> p.Parse(input)
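
A note on why the quoted program prints nothing for the element
declarations: as far as I know, expat skips an external DTD subset
unless parameter entity parsing is switched on and an
ExternalEntityRefHandler hands it the DTD text. Feeding the DTD file
on its own fails because a bare DTD is not a well-formed XML document.
The sketch below is written for current Python 3, not 2.1.1, and
whether the same calls are available in the expat module shipped with
2.1.1 would need checking. collect_element_decls and the dtd_lookup
dictionary are hypothetical names made up for this illustration, and
the DTD is passed in as a string instead of being read from disk.

```python
import xml.parsers.expat as expat


def collect_element_decls(xml_text, dtd_lookup):
    """Return (name, model) pairs for every <!ELEMENT ...> declaration seen.

    dtd_lookup is a hypothetical dict mapping a system identifier
    (for example "MYDTD.DTD") to the DTD text; a real script would
    read the file named by the system identifier instead.
    """
    decls = []

    def element_decl(name, model):
        decls.append((name, model))

    def external_entity_ref(context, base, system_id, public_id):
        # Called for the external DTD subset once parameter entity
        # parsing is enabled; a child parser has to parse the DTD text.
        child = parser.ExternalEntityParserCreate(context)
        child.ElementDeclHandler = element_decl
        child.Parse(dtd_lookup[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser = expat.ParserCreate()
    parser.SetParamEntityParsing(
        expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    parser.ElementDeclHandler = element_decl
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(xml_text, True)
    return decls
```

Even with the declarations reported, expat remains a non-validating
parser: it hands over the content models, and checking the document
against them is still up to the script.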


I think you need to do some reading on what SAX does. In summary, it
gives you the pieces of an XML document, in a series of function
callbacks. You've got to do something with the pieces that you're
given. SAX won't solve your problem any more than anything else unless
you know what pieces you are receiving, and are doing something with
them.
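
To make "doing something with the pieces" concrete, here is a minimal
sketch of a SAX handler that keeps a stack of open elements and checks
each new element against a table of permitted children. It is written
for current Python 3. The ALLOWED_CHILDREN table, the element names in
it, and the StructureChecker and check_structure names are all
hypothetical, invented for the example, and the check is far short of
real DTD validation (no ordering, no repetition counts).

```python
import xml.sax

# Hypothetical rules: parent element name -> set of permitted child names.
# The key None stands for the position of the document's root element.
ALLOWED_CHILDREN = {
    None: {"order"},
    "order": {"customer", "item"},
    "customer": set(),
    "item": set(),
}


class StructureChecker(xml.sax.ContentHandler):
    """Collect an error message for each element found in the wrong place."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.stack = []    # names of the elements currently open
        self.errors = []

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        if name not in self.allowed:
            self.errors.append("undeclared element: %s" % name)
        elif name not in self.allowed.get(parent, set()):
            self.errors.append("%s not allowed inside %s" % (name, parent))
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def check_structure(xml_text, allowed=ALLOWED_CHILDREN):
    """Parse xml_text and return the list of structure errors found."""
    checker = StructureChecker(allowed)
    xml.sax.parseString(xml_text.encode("utf-8"), checker)
    return checker.errors
```

Calling check_structure on a document returns an empty list when every
element sits where the table allows it.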

One memory-efficient way of building up a document in memory is to
create a Python object to represent every element, with each
"element object" being a (Python) attribute of its parent. It's a lot
easier than it sounds, and you can read about it here:

http://aspn.activestate.com/ASPN/Coo.../Recipe/149368
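
This is not the recipe's code, just a rough sketch of the general idea
under my own assumptions, written for current Python 3 and using
expat callbacks. The Element class and build_tree function are
hypothetical names. Each child is stored in its parent's children
list and also exposed as an attribute named after the element; when a
name repeats, only the first child gets the attribute.

```python
import xml.parsers.expat


class Element:
    """One XML element: its name, attributes, text and child elements."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.text = ""
        self.children = []


def build_tree(xml_text):
    """Parse xml_text and return a holder object whose attribute is the root."""
    document = Element("#document", {})
    stack = [document]

    def start_element(name, attrs):
        node = Element(name, attrs)
        parent = stack[-1]
        parent.children.append(node)
        # Expose the child as parent.<name>. The first child wins, and
        # names that clash with Element's own fields are not exposed.
        if not hasattr(parent, name):
            setattr(parent, name, node)
        stack.append(node)

    def end_element(name):
        stack.pop()

    def char_data(data):
        stack[-1].text += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)
    return document
```

With a tree built this way, a value can be reached with plain
attribute access such as doc.order.customer.text, while repeated
elements are still available through the children list.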

And you can read about SAX in general here:

http://www.devarticles.com/art/1/383/2
http://www-106.ibm.com/developerwork...ipsaxflex.html

The latter is a good example from Uche Ogbuji about extracting pieces
of a document from a SAX stream, which might be easily adaptable to
your problem.

But I still think you'd do better to describe the problem as simply as
you can here, rather than fumbling around.

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
 