DOM implementation
Hi everybody,
I just spent the past hour or so trying to get a better understanding of how the various DOM-supporting libraries (xml.dom, xml.dom.minidom) work. I've used etree and lxml successfully before, but I wanted to understand how close I can get to the W3C DOM standards. OK, I think I more or less got it all. A few questions emerged:

1) Classes in xml.dom.minidom (e.g. Element) seem to be old-style classes. Is there a good reason they are kept that way, or has simply nobody had the time/will to update the library to use new-style classes?

2) For a lightweight implementation, xml.dom.minidom comes with a lot of methods that aren't part of the W3C standards. I'm referring to toxml, toprettyxml, writexml and the _get_* family. Would it be better if there were a package offering W3C-faithful classes only, on top of which convenience and compatibility methods are added by another package (or two!) through subclassing?

Manu
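For readers following along, here is a minimal sketch of the split raised in question 2, using only the standard library's xml.dom.minidom. The sample document and the variable names are made up for illustration. Note that the old-style/new-style distinction from question 1 exists only in Python 2; under Python 3 every class is new-style.

```python
from io import StringIO
from xml.dom import minidom

# A tiny, made-up document to work on.
doc = minidom.parseString("<root><item id='a'>text</item></root>")

# Calls that follow the W3C DOM Core interfaces.
item = doc.getElementsByTagName("item")[0]
item_id = item.getAttribute("id")
doc.documentElement.appendChild(doc.createElement("extra"))

# minidom conveniences that are not part of the W3C DOM Core.
compact = doc.documentElement.toxml()
pretty = doc.documentElement.toprettyxml(indent="  ")
buffer = StringIO()
doc.documentElement.writexml(buffer)

print(item_id)
print(compact)
```

The first group of calls should carry over to any implementation that follows the W3C interfaces, while the second group ties the code to minidom.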
Re: DOM implementation
On 13 May, 18:08, "Emanuele D'Arrigo" <man...@gmail.com> wrote:
> I just spent the past hour or so trying to have a better understanding
> of how the various DOM-supporting libraries (xml.dom, xml.dom.minidom)
> work. I've used etree and lxml successfully before but I wanted to
> understand how close I can get to the W3C DOM standards.

You might want to look at pxdom if you want a high level of compliance with W3C DOM standards:

http://www.doxdesk.com/software/py/pxdom.html

> Ok, I think more or less I got it all. A few questions emerged:
>
> 1) classes in xml.dom.minidom (i.e. Element) seem to be old style
> classes. Is there a good reason they are kept that way or simply
> nobody had the time/will to update the library to use new-style
> classes?

I imagine that no-one bothered to update the code. The built-in modules like minidom do get maintenance, but not much further development. (PyXML, which seemed to accumulate code from 4Suite, possibly contributed code to the standard library, but it doesn't seem to be actively maintained or developed any more.)

> 2) for a lightweight implementation xml.dom.minidom comes with a lot
> of methods that aren't part of the W3C standards. I'm referring to
> toxml, toprettyxml, writexml and the _get_* family. Would it be better
> if there was a package offering W3C-faithful classes only, on top of
> which convenience and compatibility methods are added by another
> package (or two!) through subclassing?

Those methods probably don't add that much weight, considering the weight that the W3C facilities already necessitate. I attempted to make a somewhat W3C-compliant implementation with the libxml2dom package (http://pypi.python.org/pypi/libxml2dom), although I felt that providing PyXML-like conveniences (similar to those you describe) was beneficial: some of the W3C APIs for parsing and serialisation are baroque, and although I've tried to implement some of those, too, I feel that it isn't a good use of my time.

Paul
Re: DOM implementation
Thank you Paul for your reply!
I'm looking into pxdom right now and it looks very good and useful! Thank you again!

Manu
Re: DOM implementation
Hey Paul,
would you mind continuing this thread on Python + DOM? I'm trying to implement a DOM Events-like set of classes and I could use another brain that has some familiarity with the DOM to bounce ideas off. If you are too busy, never mind. Also, I thought of keeping the discussion here rather than via email, for the benefit of current and future readers.

Manu
Re: DOM implementation
On 15 May, 15:23, "Emanuele D'Arrigo" <man...@gmail.com> wrote:
> Hey Paul,
>
> would you mind continuing this thread on Python + DOM? I'm trying to
> implement a DOM Events-like set of classes and I could use another
> brain that has some familiarity with the DOM to bounce ideas off. If
> you are too busy never mind. Also, I thought of keeping the discussion
> here rather than via email, for the benefit of current and future
> readers.

Sure! Just keep your observations coming!

I've made a very lazy attempt at DOM Events support in libxml2dom, since it looked as if it might be necessary when providing elementary SVG Tiny support (which also isn't finished). I find these things quite hard to figure out, though, given the usual vagueness of the specifications on certain crucial implementation-related details (and the mountain of specifications that one has to navigate). One of my tests tries to exercise the code, but I might be doing it all completely wrong:

https://hg.boddie.org.uk/libxml2dom/.../svg_events.py

It occurs to me that various PyQt- and PyKDE-related bindings might also provide some exposure to DOM Events, although I had heard that WebKit, which should have support for lots of DOM features, currently exposes some pretty useless interfaces to languages like Python. The situation with Mozilla and PyXPCOM may well be similar.

Paul
Re: DOM implementation
Hi Paul, thank you for your swift reply!
On May 15, 3:42 pm, Paul Boddie <p...@boddie.org.uk> wrote:
> Sure! Just keep your observations coming! I've made a very lazy
> attempt at DOM Events support in libxml2dom,

I just had a look at libxml2dom, in particular its events.py file. Given that we are working from a standard, your implementation is exceedingly similar to mine, and had I known before I started writing my own classes I would have started from it instead! =)

Browsing through the code, the EventTarget class docstring reads:

    The listeners for a node are accessed through the global object. This common collection is consequently accessed by all nodes in a document, meaning that distinct objects representing the same node can still obtain the set of listeners registered for that node. In contrast, any attempt to directly store listeners on particular objects would result in the specific object which registered the listeners holding the record of such objects, whereas other objects obtained independently for the same node would hold no such record.

Naively, I implemented my EventTarget class storing its own listeners rather than global ones. Nevertheless, I don't quite understand this issue. Why shouldn't the listeners be stored directly on the EventTarget? I have a glimpse of understanding that if the DOMImplementation keeps EventTarget and Nodes (or Elements? which entity is supposed to support Events?) separate, this might be necessary. But besides the fact that it's just a fuzzy and potentially incorrect intuition, I think that the appropriate way to proceed would be for the DOMImplementation to provide a Node class that also inherits from EventTarget. That way the listeners would be immediately accessible as soon as one has a handle to a Node.

Furthermore, your code finds the bubbling route with the line:

    bubble_route = target.xpath("ancestor::*")

That xpath method is a libxml method, right?

> (...) although I find these things quite hard to
> figure out with the usual vagueness of the specifications on certain
> crucial implementation-related details (and that there's a mountain of
> specifications that one has to navigate).

Indeed, there is some vagueness in the W3C recommendations, and the various documents offer very little redundancy with each other but require you to be knowledgeable about them all! I'm only managing to piece the puzzle together after a couple of days spent on an in-depth read-through of DOM, DOM Events and a little bit of XML Events to see how it all works in practice. XML Events is also what's prompting me to think that the Node/Element classes of the implementation should also inherit from EventTarget, as they can all be event targets.

> One of my tests tries to exercise the code, but I might be doing it
> all completely wrong:
>
> https://hg.boddie.org.uk/libxml2dom/...ests/svg_event...

Before I can comment I'd like to better understand what you are aiming for with libxml2dom. It seems to be providing some kind of conversion service from the XML structure generated by libxml to a DOM-like structure (implemented by pxdom?). Is that correct?

> It occurs to me that various PyQt- and PyKDE-related bindings might
> also provide some exposure to DOM Events, although I had heard that
> WebKit, which should have support for lots of DOM features, exposes
> some pretty useless interfaces to languages like Python, currently.
> The situation with Mozilla and PyXPCOM may well be similar.

PyKDE is off-limits because it's Unix-only while I'm trying to be cross-platform. PyQt is interesting. Very. Further investigation is required. =)

Manu
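The design proposed in this post, a Node class that inherits from EventTarget and keeps its own listeners, could be sketched roughly as follows. Only the method names addEventListener and removeEventListener come from the DOM Events specification; the class bodies, the listeners_for helper and the sample node are illustrative assumptions, not code from pxdom or libxml2dom.

```python
class EventTarget:
    """Hypothetical mixin: listeners are stored on the target object itself."""

    def __init__(self):
        # (event type, use_capture) -> list of callables
        self._listeners = {}

    def addEventListener(self, event_type, listener, use_capture=False):
        bucket = self._listeners.setdefault((event_type, use_capture), [])
        # DOM Events discards duplicate registrations of the same listener.
        if listener not in bucket:
            bucket.append(listener)

    def removeEventListener(self, event_type, listener, use_capture=False):
        bucket = self._listeners.get((event_type, use_capture), [])
        if listener in bucket:
            bucket.remove(listener)

    def listeners_for(self, event_type, use_capture=False):
        # Hypothetical helper, not a DOM method: return a copy for inspection.
        return list(self._listeners.get((event_type, use_capture), []))


class Node(EventTarget):
    """Hypothetical node: anyone holding the node can reach its listeners."""

    def __init__(self, name, parent=None):
        super().__init__()
        self.nodeName = name
        self.parentNode = parent


def on_click(event):
    pass


circle = Node("circle")
circle.addEventListener("click", on_click)
circle.addEventListener("click", on_click)  # ignored as a duplicate
print(len(circle.listeners_for("click")))
```

This works as long as each node in the document is always represented by the same Python object.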
Re: DOM implementation
On 15 May, 18:27, "Emanuele D'Arrigo" <man...@gmail.com> wrote:
> I just had a look at libxml2dom, in particular its events.py file.
> Given that we are working from a standard your implementation is
> exceedingly similar to mine and had I known before I started writing my
> own classes I would have started from it instead! =)

Another implementation is probably a good thing, though, since I don't trust my own interpretation of the specifications. ;-)

> Browsing through the code, the EventTarget class docstring reads:

[Long docstring cut]

> Naively, I implemented my EventTarget class storing its own listeners
> rather than global ones. Nevertheless, I'm not quite understanding
> this issue. Why shouldn't the listeners be stored directly on the
> EventTarget?

One reason for this might well be the behaviour of libxml2 and libxml2dom: if I visit the same node in a document twice, obtaining a node instance each time, these two instances will be different. Storing listeners on such instances is therefore not very helpful, because the expectation that you will automatically see previously added listeners on a node will not generally be fulfilled.

With pxdom, it may be a different situation, but libxml2dom is constrained by the behaviour of libxml2: I don't attempt to check node equivalence and then expose the structures representing a single node using a single object; I generally try to instantiate as few Python objects wrapping libxml2 structures as I can.

> I have a glimpse of understanding that if the
> DOMImplementation keeps EventTarget and Nodes (or Elements? which
> entity is supposed to support Events?) separate this might be
> necessary. But beside the fact that it's just a fuzzy and potentially
> incorrect intuition, I seem to think that the appropriate way to
> proceed would be for the DOMImplementation to provide a Node class
> that also inherits from EventTarget. In so doing the listeners would
> be immediately accessible as soon as one has a handle to a Node.

The libxml2dom.svg module has classes which inherit from EventTarget. What I've tried to do is to make submodules to address particular formats and document models.

> Furthermore, your code finds the bubbling route with the line:
>
> bubble_route = target.xpath("ancestor::*")
>
> That xpath method is a libxml method right?

I use libxml2's XPath support exposed via libxml2dom.Node.

> Indeed there is some vagueness in the W3C recommendations and the
> various documents offer very little redundancy with each other but
> require you to be knowledgeable about them all! I'm managing to piece
> together the pieces of the puzzle only after a couple of days having an
> in-depth read-through of DOM, DOM Events and a little bit of XML
> events to see how it all works in practice. XML events is also what's
> prompting me to think that Node/Elements classes of the implementation
> should also inherit from EventTarget as they can all be event
> targets.

I think that if I were to expose an event-capable DOM, other than that provided for SVG, I would just have a specific submodule for that purpose.

> > One of my tests tries to exercise the code, but I might be doing it
> > all completely wrong:
> >
> > https://hg.boddie.org.uk/libxml2dom/...ests/svg_event...
>
> Before I can comment I'd like to better understand what you are aiming
> for with libxml2dom. It seems to be providing some kind of conversion
> services from the xml structure generated by libxml to a dom-like
> structure (implemented by pxdom?).
> Is that correct?

Yes. The aim is to provide a PyXML DOM API on top of libxml2 documents.

Paul
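The problem described in this reply, where two wrapper objects obtained for the same underlying node would not share listeners stored on the wrappers themselves, can be illustrated with a small sketch. Everything here is hypothetical: NodeWrapper, the module-level _LISTENERS dictionary and the path-like node_key string are stand-ins chosen for the example and do not reflect how libxml2dom actually identifies nodes or stores listeners.

```python
# Shared registry: node key -> {event type: [listeners]}.
_LISTENERS = {}


class NodeWrapper:
    """Hypothetical short-lived wrapper around a lower-level node.

    node_key plays the role of the underlying node's identity; a new
    wrapper may be created every time the same node is visited.
    """

    def __init__(self, node_key):
        self.node_key = node_key

    def addEventListener(self, event_type, listener):
        per_node = _LISTENERS.setdefault(self.node_key, {})
        per_node.setdefault(event_type, []).append(listener)

    def listeners(self, event_type):
        # Look up by node identity, not by wrapper identity.
        return list(_LISTENERS.get(self.node_key, {}).get(event_type, []))


def handler(event):
    pass


# Two distinct wrapper objects standing for the same underlying node.
first = NodeWrapper("/svg/g[1]")
second = NodeWrapper("/svg/g[1]")
first.addEventListener("click", handler)

print(first is second)
print(second.listeners("click") == [handler])
```

Because the registry is keyed by the node rather than by the wrapper, the second wrapper sees the listener registered through the first one.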
Re: DOM implementation
Hello Paul, sorry for the long delay, I was trying to wrap my mind around DOM and Events implementations...

On May 15, 7:08 pm, Paul Boddie <p...@boddie.org.uk> wrote:
> Another implementation is probably a good thing, though, since I don't
> trust my own interpretation of the specifications. ;-)

Tell me about it. In general I like the work the W3C is doing, but some things could use a little less freedom and a little more clarity. =) But then again, maybe it's for the best to leave things as they are so that we can figure it out for ourselves.

> > Why shouldn't the listeners be stored directly on the EventTarget?
>
> One reason for this might well be due to the behaviour of libxml2 and
> libxml2dom: if I visit the same node in a document twice, obtaining a
> node instance each time, these two instances will be different;

Mmmm.... I don't know the specifics of libxml... are you saying that once the object tree is created out of an XML file, requesting the same node object twice -does not- result in a pointer to the same instance in memory? How's that possible?

> The libxml2dom.svg module has classes which inherit from EventTarget.

And what does the EventTarget inherit from? Or are those classes inheriting from both Nodes and EventTargets?

> What I've tried to do is to make submodules to address particular
> formats and document models.

I think the issue to consider there is that the DOM does not restrict a document from being a mash-up of multiple formats. I.e. it should be possible to have XHTML and SVG tags in the same document. As long as those modules work at element/tag level and do not obstruct each other, I think you are on the right track!

> I think that if I were to expose an event-capable DOM, other than that
> provided for SVG, I would just have a specific submodule for that
> purpose.

Ultimately I found it moderately easier to modify pxdom with the intention of releasing "pxdome", a fork of pxdom. Monkey-patching pxdom from a separate module seemed a little too tricky and error-prone.
> > > One of my tests tries to exercise the code, but I might be doing it
> > > all completely wrong:
> > >
> > > https://hg.boddie.org.uk/libxml2dom/...ests/svg_event....

I had a more in-depth look after having spent the weekend trying to wrap my head around all sorts of implementation issues. My understanding, also after a few exchanges on the www-dom@w3.org mailing list, is that initialization of an event can happen wherever you feel like doing it, except in Document.createEvent(). I.e. it could be a method on the event itself or an external function. In your code, however, I believe the initialization method should be initMouseEventNS() rather than initEventNS(), and the namespace for DOM 3 Events should be -None-. Between the two implementations the first one seems to be more aligned with the DOM documentation.

The way I'm doing it is that I invoke Document.createEvent(eventType), I initialize the resulting event in part manually and in part with type-related default settings, and I finally use Document.pxdomTriggerEvent(event) to create a propagation path and iterate through its targets. I.e.:

    def _trigger_DOMSubtreeModified(target):

        # Only these node types are relevant targets for the event.
        relevantTargetTypes = (Node.DOCUMENT_NODE,
                               Node.DOCUMENT_FRAGMENT_NODE,
                               Node.ELEMENT_NODE,
                               Node.ATTRIBUTE_NODE)

        if target.nodeType not in relevantTargetTypes:
            return

        # The document creates, initializes and triggers the event.
        if target.ownerDocument:
            event = target.ownerDocument.createEvent("MutationEvent")
            event._target = target
            target.ownerDocument.pxdomEventDefaultInitNS(None, "DOMSubtreeModified", event)
            target.ownerDocument.pxdomTriggerEvent(event)

Notice that I'm currently keeping this as a loose function, but it could very well be placed as a method in the Document class or in each of the relevant classes. I'm not sure why one option would be better than the others, and the DOM doesn't specify it.

The dispatch of the event to each target on the propagation path is also a matter of implementation.
In the discussion in www-dom three options have emerged:

1) the Document node establishes the propagation path and iterates through the targets listed to dispatch the event to each
2) an unspecified, external object does the same job
3) the propagation path is established, stored on the event and each event target is responsible for recursively dispatching the event to the next target if propagation hasn't been stopped.

Apparently an earlier version of Mozilla's Gecko used option 3 but they eventually switched to option 1. Again, it's unclear in what circumstances to use one option or the other.

What I don't know at this time is how to merge all this with the specific file formats such as SVG and HTML. I.e. in an SVG example, do I create a GroupElement(Element) class and override the Document.createElement() method to create an instance of it any time a <g> element is found in the input file? Or do I first create an application-neutral DOM tree out of the input file and then instantiate a parallel application-specific structure, holding the objects that provide methods to actually draw and group shapes?

If I get an answer from www-dom I'll report it here...

Manu
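Option 1 from the list above, where a single place builds the propagation path and walks it, might look roughly like the sketch below. The Event and Node classes, the dispatch function and the sample tree are all hypothetical and heavily simplified; they are not taken from pxdom, libxml2dom or Gecko. At the target itself only non-capturing listeners are run, which follows the DOM Level 2 Events rule and is a simplification relative to later specifications.

```python
class Event:
    """Hypothetical, stripped-down event object."""

    def __init__(self, event_type, bubbles=True):
        self.type = event_type
        self.bubbles = bubbles
        self.target = None
        self.currentTarget = None
        self.stopped = False

    def stopPropagation(self):
        self.stopped = True


class Node:
    """Hypothetical node holding (type, use_capture, listener) triples."""

    def __init__(self, name, parent=None):
        self.nodeName = name
        self.parentNode = parent
        self.listeners = []

    def addEventListener(self, event_type, listener, use_capture=False):
        self.listeners.append((event_type, use_capture, listener))


def dispatch(event, target):
    """Build the propagation path once, then walk it phase by phase."""
    event.target = target
    path = []  # ancestors of the target, outermost first
    node = target.parentNode
    while node is not None:
        path.insert(0, node)
        node = node.parentNode

    def deliver(node, capturing):
        event.currentTarget = node
        for event_type, use_capture, listener in list(node.listeners):
            if event_type == event.type and use_capture == capturing:
                listener(event)

    for node in path:  # capture phase
        deliver(node, True)
        if event.stopped:
            return
    deliver(target, False)  # target phase
    if event.stopped or not event.bubbles:
        return
    for node in reversed(path):  # bubbling phase
        deliver(node, False)
        if event.stopped:
            return


log = []
doc = Node("#document")
g = Node("g", doc)
rect = Node("rect", g)
doc.addEventListener("click", lambda e: log.append("capture:doc"), True)
g.addEventListener("click", lambda e: log.append("bubble:g"))
rect.addEventListener("click", lambda e: log.append("target:rect"))
dispatch(Event("click"), rect)
print(log)
```

Option 3 would instead store the path on the event and let each target hand the event on to the next one.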