![]() |
RE: Parsing xml file using python
> I need to read an XML document and ignore all XML tags and
> write only those between the tags to a text file. In other > words, if I have an XML document like so: > > <tag1>This</tag1> > <tag2>is</tag2> > <tag3>a</tag3> > <tag1>test</tag1> > > I need to write "This is a test" to a text file. How do I achieve > this? Apart from the XML parsing solutions that have already been posted, if you really don't care about the tags, then you don't actually need to parse the XML. Just dump anything between < and > (assumes that the XML is valid, and so doesn't have < or > except for as a tag). For example (this is pretty basic): >>> import re >>> s = "<tag1>This</tag1>\n<tag2>is</tag2>\n<tag3>a</tag3>\n<tag1>test</tag1>" >>> re.sub(r"<.*?>", "", s) 'This\nis\na\ntest' You'd still have to deal with entities, but IIRC, you'll have to with most parsers, too. It's certainly simpler. =Tony Meyer |
| All times are GMT. The time now is 02:46 AM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.