Re: XML

 
 
Paul Boddie
 
      06-24-2003
"A.M. Kuchling" wrote:
>
> I've come to the conclusion that the initial concern for supporting APIs
> such as SAX and DOM in Python was a mistake; many bugs stem from trying to
> support interfaces that don't map to Python very well. Instead we should
> have made nice Pythonic interfaces such as effbot's ElementTree which are
> simpler to implement and to use, and ignore the W3C's APIs.


The thing is that Python and its developers quite often have to live
(and work) alongside other technologies; having a set of common APIs
is important if you consider them in that context. The other issue is
that the Python community isn't always supreme at standardising
things, working through the edge cases, and so on, and one might well
argue that the DOM specification has at least had enough attention
in most areas to be considered generally robust. I haven't seriously
looked at most of the "Pythonic" APIs, but I would be quite concerned
about interoperability, how comprehensive they are (with respect to
representing all XML details), and what the options are for increasing
performance without pervasive source code changes.

> SAX isn't too bad in this respect[1] -- pull APIs look pretty similar,
> differing only in interface and method names -- but the DOM has been a mess,
> spawning several implementations all of which are fairly complex.


I like the work that Andrew Clover did with regard to testing the
different DOM implementations for Python:

http://mail.python.org/pipermail/xml...ne/009560.html

I'll agree that it's probably quite demanding to write a DOM
implementation and support various levels of compliance. However, I'd
argue that as a user of such implementations, one does gain from the
broad compatibility between implementations - certainly, I've used
cDomlette and minidom interchangeably for some time with the only
major issue being a library "collision" around Expat and mod_python
with cDomlette.
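
As a rough illustration of that interchangeability, here is a minimal sketch of a function written only against W3C DOM methods (getElementsByTagName, hasAttribute, getAttribute). It is demonstrated with minidom from the standard library. The function name collect_links and the sample document are made up for this example, and whether another implementation such as cDomlette accepts it unchanged depends on that implementation's level of DOM support.

```python
import xml.dom.minidom


def collect_links(document):
    """Return the href values of all <a> elements, using only W3C DOM calls.

    collect_links is a hypothetical helper; it assumes nothing about the
    document object beyond the standard DOM methods it calls.
    """
    links = []
    for anchor in document.getElementsByTagName("a"):
        # Skip anchors that carry no href attribute at all.
        if anchor.hasAttribute("href"):
            links.append(anchor.getAttribute("href"))
    return links


# Made-up sample input for the demonstration.
SAMPLE = (
    '<doc>'
    '<a href="http://example.org/one">1</a>'
    '<a>no href</a>'
    '<a href="two.html">2</a>'
    '</doc>'
)

document = xml.dom.minidom.parseString(SAMPLE)
print(collect_links(document))
```

Because the function only touches the document through DOM method names, the parsing line is the only part that would need to change when a different implementation is tried.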

> [1] SAX is a _de facto_ API, not a W3C one; perhaps that explains why it's
> not too bad.


Certainly, the W3C DOM had dubious beginnings, but I don't personally
buy into the widespread arguments that it is seriously lacking
in a number of supposedly key criteria. Or at least, I don't really
see many of the supposedly better alternatives as being noticeably
better, especially when DOM as a "platform" supports some very useful
technologies indeed.

Paul
 
 
 
 
 
A.M. Kuchling
 
      06-24-2003
On 24 Jun 2003 01:52:27 -0700,
Paul Boddie wrote:
> The thing is that Python and its developers quite often have to live
> (and work) alongside other technologies; having a set of common APIs
> is important if you consider them in that context.


In practice, there's no way to access those other technologies from Python.
There's a Python wrapper for the Xerces DOM implementation, but I never hear
about anyone using it; there's a wrapper for libxml2, but it has its own
API that's somewhat similar to ElementTree (but not as nice to use --
someone should fix that, because libxml2 is blazingly fast). So any DOM
implementation you use will likely have been built by the Python world, and
could have been written to a standard interface. Jython users could use the
Python interface or use the Jython mapping of Java interfaces.

When you think about it: how useful is it that the Python DOM interface uses
the same method names as the Java or Perl interface? What's gained by this?
I initially thought there might be some gain from being able to use material
written for other languages to learn the API, but don't know how most users
learn the DOM; do they read the DOM Recommendation, look at tutorials, read
the implementation source, or just copy existing code?

In this context, I find the existence of jDOM, a Java-centric DOM-like API,
to support this view. There's even a jDOM JSR, the Java world's equivalent
of a PEP.

--amk
 
 
 
 
 
Alan Kennedy
 
      06-24-2003
"A.M. Kuchling" wrote:

> When you think about it: how useful is it that the Python DOM
> interface uses the same method names as the Java or Perl interface?
> What's gained by this?


Code interoperability. This is very important, given the "glue"-like
nature of many uses of Python, for scripting COM, Java, .NET, etc. So
I can do things like this (off the top of my head, not tested):

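def loadDOM(filename):
    try:
        # Use MSXML through COM when pywin32 is available.
        from win32com.client import Dispatch
        domtree = Dispatch('Msxml2.DOMDocument.4.0')
        # 'async' is a reserved word in modern Python, hence setattr.
        setattr(domtree, 'async', False)
        # load() returns a success flag, not the document itself.
        if not domtree.load(filename):
            raise IOError("MSXML could not load %s" % filename)
    except ImportError:
        # Otherwise fall back to the standard library's minidom.
        import xml.dom.minidom
        domtree = xml.dom.minidom.parse(filename)
    return domtree

dom = loadDOM('myfile.xml')
for anchor in dom.getElementsByTagName('a'):
    print("Link: %s" % anchor.getAttribute('href'))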

> I initially thought there might be some gain from being able to use
> material written for other languages to learn the API, but don't
> know how most users learn the DOM; do they read the DOM Recommendation,
> look at tutorials, read the implementation source, or just copy
> existing code?


I read the DOM Recommendation. But I did all of the others as well,
at different stages, and I think most people end up doing more than
one.

> In this context, I find the existence of jDOM, a Java-centric
> DOM-like API, to support this view. There's even a jDOM JSR,
> the Java world's equivalent of a PEP.


It's a pity that JDOM isn't very well designed for extensibility,
as opposed to DOM4J, which is so extensible that it has a steep
learning curve. (I actually ended up writing my own minimal read-only
XOM for a Java app, because it was quicker than trying to get JDOM or
DOM4J to do what I needed. I must find the time to open source that
one of these days, with its jaxen adapter).

Which I think illustrates the simple point that many interfaces are
needed in different scenarios: it's "horses for courses". Sometimes
one needs interoperability, as per the first example above. Sometimes one
only needs simplicity, so something Pythonic like ElementTree or Pyxie
is suitable. Other times, requirements are somewhere in the middle of
those two.
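
For comparison, here is a minimal sketch of the same link extraction done the "simple" way with ElementTree. It assumes the modern standard-library module xml.etree.ElementTree (at the time of this thread ElementTree was a separate download); the function name link_targets and the sample input are made up for the example.

```python
import xml.etree.ElementTree as ET


def link_targets(xml_text):
    """Return the href values of all <a> elements in an XML string.

    link_targets is a hypothetical helper used only for this comparison.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree; get() returns None for a missing attribute.
    return [a.get("href") for a in root.iter("a") if a.get("href") is not None]


# Made-up sample input for the demonstration.
SAMPLE = (
    '<doc>'
    '<a href="http://example.org/one">1</a>'
    '<p><a href="two.html">2</a></p>'
    '<a>no href</a>'
    '</doc>'
)

print(link_targets(SAMPLE))
```

The code is shorter than the DOM version, but it is tied to ElementTree's own element objects, which is exactly the trade-off between simplicity and interoperability described above.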

I generally find that interoperability is almost always worth the
pain, if the code is going to be used for any period of time. When use
cases change and, for example, processing volumes increase, or I need
to schema-validate documents, then interoperable code greatly simplifies
the problem, because I can switch seamlessly to a high-performance DOM,
or one that does validation, or supports
xpath/relaxng/xpointer/events/whatever.

regards,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
 
 
Paul Boddie
 
      06-27-2003
Paul Boddie wrote:
>


[libxml2]

> Yes, it's very tempting to write a PyXML-style DOM API for it. Then we
> can use XPath (whether it be from PyXML or 4Suite) on our documents
> without having to port our source code just because some underlying
> implementation detail has changed.


Minor correction on my part, here: since libxml2 provides an XPath
implementation, the use of the PyXML/4Suite XPath implementations on
top of a libxml2 DOM layer wouldn't be strictly necessary, but it
might be nice to harmonise the APIs so that XPath contexts and queries
are accessed in the same way for all available implementations. Having
tried libxml2 and libxslt out recently, I certainly agree that they
seem very fast in comparison to other XML processing libraries.
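
One way to picture such harmonisation is a thin adapter that gives every backend the same query method. The sketch below is only an illustration under stated assumptions: the class ElementTreeQuery, its select() method, and the helper hrefs() are hypothetical names, and the single backend shown is the standard library's ElementTree, which understands only a limited subset of XPath. A libxml2-backed adapter would have to offer the same select() method for the calling code to stay unchanged.

```python
import xml.etree.ElementTree as ET


class ElementTreeQuery:
    """Hypothetical adapter exposing one select(path) method over ElementTree."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def select(self, path):
        # ElementTree supports only a limited XPath subset, so the paths
        # passed in here must stay within that subset.
        return self._root.findall(path)


def hrefs(query, path=".//a[@href]"):
    """Collect href values through the adapter's select() method only."""
    return [node.get("href") for node in query.select(path)]


# Made-up sample input for the demonstration.
SAMPLE = (
    '<doc>'
    '<a href="http://example.org/one">1</a>'
    '<p><a href="two.html">2</a></p>'
    '<a>no href</a>'
    '</doc>'
)

print(hrefs(ElementTreeQuery(SAMPLE)))
```

The calling code depends only on select(), so swapping the underlying implementation would mean writing another adapter class, not porting every query.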

Paul
 
 
 
 