Parsing xml file using python

 
 
chad
      03-05-2004
Hello, all,

I am new to Python.

I need to read an XML document and ignore all XML tags and write only
those between the tags to a text file. In other words, if I have an
XML document like so:

<tag1>This</tag1>
<tag2>is</tag2>
<tag3>a</tag3>
<tag1>test</tag1>

I need to write "This is a test" to a text file. How do I achieve
this? Thanks.
 
C GIllespie
      03-05-2004
> I need to read an XML document and ignore all XML tags and write only
> those between the tags to a text file. In other words, if I have an
> XML document like so:

See http://diveintopython.org/xml_processing/index.html for a nice
introduction.

Colin


 
Peter Hansen
      03-05-2004
chad wrote:

> I am new to Python.


And XML?

> I need to read an XML document and ignore all XML tags and write only
> those between the tags to a text file. In other words, if I have an
> XML document like so:
>
> <tag1>This</tag1>
> <tag2>is</tag2>
> <tag3>a</tag3>
> <tag1>test</tag1>
>
> I need to write "This is a test" to a text file. How do I achieve
> this? Thanks.


Note that what you have above is _not_ an XML file. That is, it looks
like XML but it's not well-formed, as it doesn't have a single enclosing
element.

You probably meant that just as a quickie example, but in case that's
like what your actual data format looks like, you'll have trouble using
any of the Python XML parsers. (The fix is pretty trivial though.)
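
One plausible reading of the trivial fix mentioned above is to wrap the fragment in a single enclosing element before parsing it. The sketch below shows that idea only; the helper names wrap_in_root and is_well_formed and the element name "root" are made up for illustration, and it assumes the input carries no XML declaration or DOCTYPE of its own.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# The example from the original question: four top-level elements, no root.
FRAGMENT = "<tag1>This</tag1>\n<tag2>is</tag2>\n<tag3>a</tag3>\n<tag1>test</tag1>\n"


def wrap_in_root(text, root_name="root"):
    """Enclose text in one element so the result has a single document element."""
    return "<%s>%s</%s>" % (root_name, text, root_name)


def is_well_formed(text):
    """Return True if minidom can parse text, False if expat reports an error."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(FRAGMENT))                # several top-level elements
    print(is_well_formed(wrap_in_root(FRAGMENT)))  # one enclosing element
```

If the real data already has a single enclosing element, no wrapping is needed and the parsers can be used directly.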

-Peter
 
Georgy
      03-05-2004
There are standard xml modules with a built-in SAX2 parser.

http://pyxml.sourceforge.net/topics/...xml-howto.html
http://www.python.org/doc/current/lib/markup.html
http://www.devarticles.com/c/a/Pytho...-and-Python/2/

For more samples: http://www.google.com/search?hl=en&i...=Google+Search

And the code you're looking for will be like this (not tested):

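import sys
from xml.sax import make_parser, handler

class BodyOnly(handler.ContentHandler):
    # The parser calls this for each chunk of text found between tags.
    def characters(self, content):
        # Write the chunk as-is; whitespace between tags is passed through too.
        sys.stdout.write(content)

parser = make_parser()
parser.setContentHandler(BodyOnly())
parser.parse("input.xml")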
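
Since the question asks for the text to end up in a file, here is a minimal sketch of the same SAX approach that collects the character data and writes it out. The names TextCollector, extract_text and output.txt are assumptions for illustration, and the sample input is wrapped in a hypothetical <doc> root element so that it is well-formed.

```python
from xml.sax import parseString, handler


class TextCollector(handler.ContentHandler):
    """Gather character data and ignore all tags."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # The parser may deliver one text run in several pieces, so accumulate.
        self.chunks.append(content)

    def text(self):
        # Collapse runs of whitespace, including the newlines between tags.
        return " ".join("".join(self.chunks).split())


def extract_text(xml_bytes):
    """Parse an XML document given as bytes and return its text content."""
    collector = TextCollector()
    parseString(xml_bytes, collector)
    return collector.text()


if __name__ == "__main__":
    sample = (b"<doc><tag1>This</tag1>\n<tag2>is</tag2>\n"
              b"<tag3>a</tag3>\n<tag1>test</tag1></doc>")
    with open("output.txt", "w") as out:
        out.write(extract_text(sample) + "\n")
```

Note that words are separated only by whitespace already present in the document; adjacent elements with nothing between them would run together.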




 
Andrew Clover
      03-05-2004
chad wrote:

> <tag1>This</tag1>
> <tag2>is</tag2>
> <tag3>a</tag3>
> <tag1>test</tag1>


> I need to write "This is a test"


Assuming no nested tags (in which case you'd have to specify the problem
more completely), and no entity reference issues, any DOM Level 1
implementation can do this, e.g. with minidom:

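from xml.dom import minidom

# inputFilename and outputFilename are paths supplied by the caller.
doc = minidom.parse(inputFilename)

# Keep only the element children of the root; skip whitespace text nodes.
parent = doc.documentElement
children = [child for child in parent.childNodes
            if child.nodeType == child.ELEMENT_NODE]
# Each element is assumed to contain exactly one text node.
content = ' '.join([child.firstChild.nodeValue for child in children])

# Open in text mode, since content is a string.
fp = open(outputFilename, 'w')
fp.write(content)
fp.close()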

For more complicated structures, the 'textContent' property in DOM Level 3
might be of use. (Insert standard pxdom plug here.)
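
For nested tags, something close to what 'textContent' returns can be approximated by walking the tree recursively. This is only a sketch under that assumption: text_content is a made-up helper name, not a DOM or minidom API, and the sample document is invented for illustration.

```python
from xml.dom import minidom


def text_content(node):
    """Concatenate all text beneath node, in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Elements and the document recurse into their children; comments and
    # processing instructions have no children and contribute nothing.
    return "".join(text_content(child) for child in node.childNodes)


if __name__ == "__main__":
    doc = minidom.parseString(
        "<doc><tag1>This <b>is</b></tag1> <tag2>a <i>nested</i> test</tag2></doc>"
    )
    print(text_content(doc.documentElement))
```

Unlike the firstChild.nodeValue approach, this does not depend on each element holding exactly one text node.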

--
Andrew Clover
http://www.doxdesk.com/
 