Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Python (http://www.velocityreviews.com/forums/f43-python.html)
-   -   Parsing xml file using python (http://www.velocityreviews.com/forums/t329219-parsing-xml-file-using-python.html)

chad 03-05-2004 12:49 AM

Parsing xml file using python
 
Hello, all,

I am new to Python.

I need to read an XML document and ignore all XML tags and write only
those between the tags to a text file. In other words, if I have an
XML document like so:

<tag1>This</tag1>
<tag2>is</tag2>
<tag3>a</tag3>
<tag1>test</tag1>

I need to write "This is a test" to a text file. How do I achieve
this? Thanks.

C GIllespie 03-05-2004 08:17 AM

Re: Parsing xml file using python
 
> I need to read an XML document and ignore all XML tags and write only
> those between the tags to a text file. In other words, if I have an
> XML document like so:

See
http://diveintopython.org/xml_processing/index.html

for a nice introduction

Colin



Peter Hansen 03-05-2004 03:13 PM

Re: Parsing xml file using python
 
chad wrote:

> I am new to Python.


And XML?

> I need to read an XML document and ignore all XML tags and write only
> those between the tags to a text file. In other words, if I have an
> XML document like so:
>
> <tag1>This</tag1>
> <tag2>is</tag2>
> <tag3>a</tag3>
> <tag1>test</tag1>
>
> I need to write "This is a test" to a text file. How do I achieve
> this? Thanks.


Note that what you have above is _not_ an XML file. That is, it looks
like XML but it's not well-formed, as it doesn't have a single enclosing
element.

You probably meant that just as a quickie example, but in case that's
like what your actual data format looks like, you'll have trouble using
any of the Python XML parsers. (The fix is pretty trivial though.)

-Peter

Georgy 03-05-2004 03:50 PM

Re: Parsing xml file using python
 
There's standard xml modules with built-in SAX2 parser.

http://pyxml.sourceforge.net/topics/...xml-howto.html
http://www.python.org/doc/current/lib/markup.html
http://www.devarticles.com/c/a/Pytho...-and-Python/2/

For more samples: http://www.google.com/search?hl=en&i...=Google+Search

And the code you're looking for will be like this (not tested):

import sys
from xml.sax import make_parser, handler
class BodyOnly(handler.ContentHandler):
def characters( self, content ):
print content,
parser = make_parser()
parser.setContentHandler(BodyOnly())
parser.parse( "input.xml" )



"chad" <antonyliu2002@yahoo.com> wrote in message news:14b36d18.0403041649.252d2e7c@posting.google.c om...
| Hello, all,
|
| I am new to Python.
|
| I need to read an XML document and ignore all XML tags and write only
| those between the tags to a text file. In other words, if I have an
| XML document like so:
|
| <tag1>This</tag1>
| <tag2>is</tag2>
| <tag3>a</tag3>
| <tag1>test</tag1>
|
| I need to write "This is a test" to a text file. How do I achieve
| this? Thanks.



Andrew Clover 03-05-2004 05:30 PM

Re: Parsing xml file using python
 
antonyliu2002@yahoo.com (chad) wrote:

> <tag1>This</tag1>
> <tag2>is</tag2>
> <tag3>a</tag3>
> <tag1>test</tag1>


> I need to write "This is a test"


Assuming no nested tags (in which case you'd have to specify the problem
more completely), and no entity reference issues, any DOM Level 1
implementation can do this, eg. with minidom:

from xml.dom import minidom
doc= minidom.parse(inputFilename)

parent= doc.documentElement
children= [child for child in parent.childNodes if child.nodeType==1]
content= ' '.join([child.firstChild.nodeValue for child in children])

fp= open(outputFilename, 'wb')
fp.write(content)
fp.close()

For more complicated structures, the 'textContent' property in DOM Level 3
might be of use. (Insert standard pxdom plug here.)

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/


All times are GMT. The time now is 01:54 PM.

Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.