DOM question

 
 
Richard Lewis
      06-02-2005
Hi there,

I have an XML document which contains a mixture of structural nodes
(called 'section', each with a unique 'id' attribute) and non-structural
nodes (called anything else). The structural elements ('section's) can
contain other structural elements as well as non-structural elements.
I'm working on this document with the Python DOM API and have got
stuck with something.

I want to be able to get all the non-structural elements which are
children of a given 'section' element (identified by its 'id' attribute)
but not children of any child 'section' elements of the given 'section'.

e.g.:

<section id="a">
<foo>bar</foo>
</section>
<section id="b">
<foo>baz</foo>
<section id="c">
<bar>foo</bar>
</section>
</section>

Given this document, the working function would return "<foo>baz</foo>"
for id='b' and "<bar>foo</bar>" for id='c'.

Normally, recursion is used for DOM traversals. I've tried this function,
which uses recursion with a generator (can the two be mixed?):

def content_elements(node):
    if node.hasChildNodes():
        node = node.firstChild

        if not page_node(node):
            yield node

            for e in self.content_elements(node):
                yield e

        node = node.nextSibling

which didn't work. So I tried it without using a generator:

def content_elements(node, elements):
    if node.hasChildNodes():
        node = node.firstChild

        if node.nodeType == Node.ELEMENT_NODE: print node.tagName
        if not page_node(node):
            elements.append(node)

            self.content_elements(node, elements)

        node = node.nextSibling

    return elements

However, I got exactly the same problem: each time I use this function I
just get a DOM Text node containing some whitespace (tabs and returns).
I guess this is the indentation in my source document? But why do I
not get the proper element nodes?

Cheers,
Richard
 
Diez B. Roggisch
      06-02-2005
> However, I got exactly the same problem: each time I use this function I
> just get a DOM Text node with a few white space (tabs and returns) in
> it. I guess this is the indentation in my source document? But why do I
> not get the proper element nodes?


Welcome to the wonderful world of DOM, where insignificant whitespace
becomes a first-class citizen!
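
To make the whitespace point concrete, here is a minimal sketch that uses only xml.dom.minidom. It rests on a few assumptions: the fragment from the question is wrapped in a hypothetical <doc> root (on its own it has two top-level elements and is not well-formed), "non-structural" is taken to mean any element whose tag is not 'section', and find_section is a helper invented for this example. It is written for current Python 3; on 2005-era Python the `yield from` line would be spelled as a `for ... yield` loop. Recursion and generators do mix, as long as the results of the recursive call are re-yielded.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Fragment from the question, wrapped in a hypothetical <doc> root.
SAMPLE = """<doc>
<section id="a">
  <foo>bar</foo>
</section>
<section id="b">
  <foo>baz</foo>
  <section id="c">
    <bar>foo</bar>
  </section>
</section>
</doc>"""


def find_section(node, section_id):
    """Depth-first search for the 'section' element with the given id."""
    for child in node.childNodes:
        if child.nodeType != Node.ELEMENT_NODE:
            continue  # skip whitespace text nodes, comments, etc.
        if child.tagName == "section" and child.getAttribute("id") == section_id:
            return child
        found = find_section(child, section_id)
        if found is not None:
            return found
    return None


def content_elements(node):
    """Yield non-'section' elements below node, without entering nested sections."""
    for child in node.childNodes:  # loop over ALL children, not just the first
        if child.nodeType != Node.ELEMENT_NODE:
            continue  # indentation shows up as Text nodes; ignore them
        if child.tagName == "section":
            continue  # a nested section owns its own content
        yield child
        yield from content_elements(child)  # recursion inside a generator


doc = minidom.parseString(SAMPLE)
section_b = find_section(doc, "b")

# The first child of the section is the indentation, not <foo>.
print(section_b.firstChild.nodeType == Node.TEXT_NODE)  # True

print([e.toxml() for e in content_elements(section_b)])  # ['<foo>baz</foo>']
```

The functions in the question look only at firstChild and never loop over the siblings, which would explain why a single whitespace Text node is all that comes back.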

Use XPath. Really. It's well worth the effort, as it is suited to exactly
the kind of task you presented, and allows a concise formulation of it.
Yours would be (untested):

//section[id==$id_param]/node()[!name() == section]

Starting from the root, it looks through all descendants

//

for nodes named section

section

that fulfill the predicate

[id==$id_param]

From there we collect all immediate children

/node()

that are not named section

[!name() == section]


--
Regards,

Diez B. Roggisch
 
Richard Lewis
      06-02-2005

On Thu, 02 Jun 2005 14:34:47 +0200, "Diez B. Roggisch" said:
> > However, I got exactly the same problem: each time I use this function I
> > just get a DOM Text node with a few white space (tabs and returns) in
> > it. I guess this is the indentation in my source document? But why do I
> > not get the proper element nodes?

>
> Welcome to the wonderful world of DOM, where insignificant whitespace
> becomes a first-class citizen!
>
> Use XPath. Really. It's well worth the effort, as it is suited for
> exactly the tasks you presented us, and allows for a concise
> formulation of these. Yours would be (untested)
>
> //section[id==$id_param]/node()[!name() == section]
>
>

Yes, in fact:

//section[@id=$id_param]//*[name()!='section']

would do the trick.

I was trying to avoid using anything not in the standard Python
distribution if I could help it; I need to be able to use my code on
Linux, OS X and Windows.
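
For readers who want to stay inside the standard library: later Python versions (2.5 onwards) ship xml.etree.ElementTree, which supports a limited XPath subset. The sketch below is one possible approach, not the method used in this thread. It assumes the same hypothetical <doc> wrapper around the sample fragment, and it formats the id straight into the path because ElementTree has no XPath variables, so it also assumes ids contain no quote characters. It also shows that a descendant step like the `//*` in the expression above reaches into nested sections, whereas a child step does not.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, wrapped in a hypothetical <doc> root.
SAMPLE = """<doc>
<section id="a"><foo>bar</foo></section>
<section id="b">
  <foo>baz</foo>
  <section id="c"><bar>foo</bar></section>
</section>
</doc>"""


def content_elements(root, section_id):
    """Return the direct non-'section' children of the section with this id."""
    # No XPath variables in ElementTree: the id is formatted into the path
    # (assumption: ids never contain a single quote).
    section = root.find(".//section[@id='%s']" % section_id)
    if section is None:
        return []
    # Iterating an Element yields only its child elements, so whitespace
    # between tags never appears here.
    return [child for child in section if child.tag != "section"]


root = ET.fromstring(SAMPLE)
print([e.tag for e in content_elements(root, "b")])  # ['foo']
print([e.tag for e in content_elements(root, "c")])  # ['bar']

# Descendant step for comparison: this also picks up <bar> inside section 'c'.
section_b = root.find(".//section[@id='b']")
descendants = [e.tag for e in section_b.iter()
               if e is not section_b and e.tag != "section"]
print(descendants)  # ['foo', 'bar']
```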

The xml.xpath package is from PyXML, yes? I'll just have to battle with
installing PyXML on OS X.

Cheers,
Richard
 
Diez B. Roggisch
      06-02-2005
>
> Yes, in fact:
>
> //section[@id=$id_param]//*[name()!='section']
>
> would do the trick.
>
> I was trying to avoid using anything not in the standard Python
> distribution if I could help it; I need to be able to use my code on
> Linux, OS X and Windows.
>
> The xml.xpath package is from PyXML, yes? I'll just have to battle with
> installing PyXML on OS X.


As a new member of the Mac OS X community, I can say that so far I've got
everything running except pygame, so I don't expect that to be much of
a problem.

Diez
 