get element text in DOM?

 
 
Juliano Freitas
      11-10-2004
How can I get the text between the <teste> tags?

>>> xml = """<root><teste> texto </teste></root>"""
>>> from xml.dom import minidom
>>> document = minidom.parseString(xml)
>>> document
<xml.dom.minidom.Document instance at 0x4181df0c>
>>> minidom.getElementsByTagName('teste')

>>> element = document.getElementsByTagName('teste')
>>> element
[<DOM Element: teste at 0x418e110c>]
>>> element[0].nodeType
1

Juliano Freitas


 
Irmen de Jong
      11-10-2004
Juliano Freitas wrote:
> How can i get the text between the <teste> tags??
>
>
> >>> xml = """<root><teste> texto </teste></root>"""


You must know that the text between the tags is a DOM node
in its own right, namely a TEXT node, which is a child of the
element node formed by the tag.

So try:

xml = """<root><teste> texto </teste></root>"""
from xml.dom import minidom
document = minidom.parseString(xml)
element = document.getElementsByTagName('teste')
textelt=element[0].firstChild
print textelt.nodeType, textelt.nodeValue

and it will print:

3 texto
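
For anyone running this under Python 3, where print is a function, the same steps might look like the sketch below. The names elements, text_node and text are only illustrative, and the strip() call is an addition: nodeValue keeps the spaces around "texto", so stripping is only wanted if that whitespace does not matter.

```python
from xml.dom import minidom

xml = """<root><teste> texto </teste></root>"""
document = minidom.parseString(xml)
elements = document.getElementsByTagName('teste')

# The first child of the <teste> element is the text node itself.
text_node = elements[0].firstChild
print(text_node.nodeType, repr(text_node.nodeValue))

# nodeValue preserves surrounding whitespace; strip() drops it if unwanted.
text = text_node.nodeValue.strip()
print(text)
```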

--Irmen
 
Uche Ogbuji
      11-12-2004
Juliano Freitas <> wrote in message news:<mailman.6228.1100113797.5135.python->...
> How can i get the text between the <teste> tags??
>
> >>> xml = """<root><teste> texto </teste></root>"""
> >>> from xml.dom import minidom
> >>> document = minidom.parseString(xml)
> >>> document
> <xml.dom.minidom.Document instance at 0x4181df0c>
> >>> minidom.getElementsByTagName('teste')
>
> >>> element = document.getElementsByTagName('teste')
> >>> element
> [<DOM Element: teste at 0x418e110c>]
> >>> element[0].nodeType
> 1
>
> Juliano Freitas


http://lists.fourthought.com/piperma...er/013027.html

Verbatim:

"""
Or, ObTopic, for 4Suite recent CVS:

>>> from Ft.Xml.Domlette import NonvalidatingReader
>>> doc = NonvalidatingReader.parseString("<root><teste> texto </teste></root>", 'urn:dummy')
>>> print doc.xpath('string(/root/teste)')
texto

Simple and sweet IMHO.
"""

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
 
Manlio Perillo
      11-13-2004
On Wed, 10 Nov 2004 17:11:09 -0200, Juliano Freitas
<> wrote:

>How can i get the text between the <teste> tags??
>
>>>> xml = """<root><teste> texto </teste></root>"""
>>>> from xml.dom import minidom
>>>> document = minidom.parseString(xml)
>>>> document
><xml.dom.minidom.Document instance at 0x4181df0c>
>>>> minidom.getElementsByTagName('teste')
>
>>>> element = document.getElementsByTagName('teste')
>>>> element
>[<DOM Element: teste at 0x418e110c>]
>>>> element[0].nodeType
>1
>



Here is a useful function I have written:

# assumed import: the snippet uses dom.Node without showing where dom comes from
from xml import dom

def getText(node, recursive = False):
    """
    Get all the text associated with this node.
    With recursive == True, all text from child nodes is retrieved
    """
    L = ['']
    for n in node.childNodes:
        if n.nodeType in (dom.Node.TEXT_NODE,
                          dom.Node.CDATA_SECTION_NODE):
            L.append(n.data)
        else:
            if not recursive:
                return None
            # recurse under the function's real name, keeping the flag
            L.append( getText(n, recursive) )

    return ''.join(L)



>>> print getText(element[0])





Regards Manlio Perillo
 
Andrew Clover
      11-14-2004
Manlio Perillo <> wrote:

>     for n in node.childNodes:
>         if n.nodeType in (dom.Node.TEXT_NODE, dom.Node.CDATA_SECTION_NODE):


(Aside: node.TEXT_NODE would probably be better here. Can't guarantee
that a DOM's implementation of the 'Node' interface is available as a
class called 'Node' inside its module.)

>             L.append(n.data)
>         else:
>             if not recursive:
>                 return None


Surely 'continue'? This will exit the function (returning None instead
of the expected empty string) the first time a non-Text node is met.

Incidentally, DOM Level 3 Core defines the property 'textContent' to
return pretty much exactly this (although it removes the ignorable
whitespace). Not in minidom yet, but... <insert usual plug here>
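
Putting the two suggestions above together (constants read from the node itself, and skipping non-text children instead of returning early), a revised helper might look like the following sketch. It is written for Python 3. The name get_text and the sample document with a nested <b> element are made up for illustration, and unlike textContent it makes no attempt to drop ignorable whitespace.

```python
from xml.dom import minidom


def get_text(node, recursive=False):
    """Return the text directly inside node; with recursive=True,
    also include text from descendant elements."""
    parts = []
    for child in node.childNodes:
        # Node-type constants are available on every node instance.
        if child.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif recursive and child.nodeType == node.ELEMENT_NODE:
            parts.append(get_text(child, recursive))
        # Anything else (or an element when not recursive) is skipped,
        # so the function always returns a string.
    return ''.join(parts)


# Hypothetical sample input with a nested element.
doc = minidom.parseString("<root><teste> texto <b>bold</b> fim</teste></root>")
teste = doc.getElementsByTagName('teste')[0]
print(repr(get_text(teste)))
print(repr(get_text(teste, recursive=True)))
```

With recursive left at False the nested element is simply passed over, so the caller gets a string back (possibly empty) rather than None.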

--
Andrew Clover
http://www.doxdesk.com/
 