![]() |
Parsing xml file using python
Hello, all,
I am new to Python. I need to read an XML document and ignore all XML tags and write only those between the tags to a text file. In other words, if I have an XML document like so: <tag1>This</tag1> <tag2>is</tag2> <tag3>a</tag3> <tag1>test</tag1> I need to write "This is a test" to a text file. How do I achieve this? Thanks. |
Re: Parsing xml file using python
> I need to read an XML document and ignore all XML tags and write only
> those between the tags to a text file. In other words, if I have an > XML document like so: See http://diveintopython.org/xml_processing/index.html for a nice introduction Colin |
Re: Parsing xml file using python
chad wrote:
> I am new to Python. And XML? > I need to read an XML document and ignore all XML tags and write only > those between the tags to a text file. In other words, if I have an > XML document like so: > > <tag1>This</tag1> > <tag2>is</tag2> > <tag3>a</tag3> > <tag1>test</tag1> > > I need to write "This is a test" to a text file. How do I achieve > this? Thanks. Note that what you have above is _not_ an XML file. That is, it looks like XML but it's not well-formed, as it doesn't have a single enclosing element. You probably meant that just as a quickie example, but in case that's like what your actual data format looks like, you'll have trouble using any of the Python XML parsers. (The fix is pretty trivial though.) -Peter |
Re: Parsing xml file using python
There's standard xml modules with built-in SAX2 parser.
http://pyxml.sourceforge.net/topics/...xml-howto.html http://www.python.org/doc/current/lib/markup.html http://www.devarticles.com/c/a/Pytho...-and-Python/2/ For more samples: http://www.google.com/search?hl=en&i...=Google+Search And the code you're looking for will be like this (not tested): import sys from xml.sax import make_parser, handler class BodyOnly(handler.ContentHandler): def characters( self, content ): print content, parser = make_parser() parser.setContentHandler(BodyOnly()) parser.parse( "input.xml" ) "chad" <antonyliu2002@yahoo.com> wrote in message news:14b36d18.0403041649.252d2e7c@posting.google.c om... | Hello, all, | | I am new to Python. | | I need to read an XML document and ignore all XML tags and write only | those between the tags to a text file. In other words, if I have an | XML document like so: | | <tag1>This</tag1> | <tag2>is</tag2> | <tag3>a</tag3> | <tag1>test</tag1> | | I need to write "This is a test" to a text file. How do I achieve | this? Thanks. |
Re: Parsing xml file using python
antonyliu2002@yahoo.com (chad) wrote:
> <tag1>This</tag1> > <tag2>is</tag2> > <tag3>a</tag3> > <tag1>test</tag1> > I need to write "This is a test" Assuming no nested tags (in which case you'd have to specify the problem more completely), and no entity reference issues, any DOM Level 1 implementation can do this, eg. with minidom: from xml.dom import minidom doc= minidom.parse(inputFilename) parent= doc.documentElement children= [child for child in parent.childNodes if child.nodeType==1] content= ' '.join([child.firstChild.nodeValue for child in children]) fp= open(outputFilename, 'wb') fp.write(content) fp.close() For more complicated structures, the 'textContent' property in DOM Level 3 might be of use. (Insert standard pxdom plug here.) -- Andrew Clover mailto:and@doxdesk.com http://www.doxdesk.com/ |
| All times are GMT. The time now is 01:54 PM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.