Velocity Reviews - Computer Hardware Reviews


elementtree: line numbers and iterparse

 
 
Stuart McGraw
 
      09-13-2006
I have a broad (~200K nodes) but shallow XML file
I want to parse with ElementTree. There are too many
nodes to read into memory simultaneously, so I use
iterparse() to process each node sequentially.
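For reference, the streaming pattern described above might look like this in Python 3 with the standard-library xml.etree.ElementTree (the thread itself uses the older standalone elementtree package). This is a minimal sketch, not code from the thread: the generator name iter_records and the sample data are hypothetical, and it assumes the records of interest are the direct children of the document element. Each child is handed out once its end tag has been parsed and is then detached from the root, so finished records do not accumulate in memory.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source):
    """Yield each direct child of the document element once it is complete.

    iter_records is a hypothetical helper: once the caller has finished with
    a child, it is removed from the root so finished records do not pile up.
    """
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first start event is the document element
            depth += 1
        else:
            depth -= 1
            if depth == 1:
                # elem is a complete, direct child of the root
                yield elem
                root.remove(elem)  # drop it before parsing further records


# Hypothetical sample data: two records, the second with one sub-element.
data = b"<doc><rec id='1'/><rec id='2'><x/></rec></doc>"
records = [(rec.get("id"), len(rec)) for rec in iter_records(io.BytesIO(data))]
print(records)  # [('1', 0), ('2', 1)]
```

Removing each child rather than calling clear() on it also keeps the root's own child list from growing to 200K empty elements.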

Now I find I need to get and save the input file line
number of each node. Googling turned up a way
to do it by subclassing FancyTreeBuilder
(http://groups.google.com/group/comp....9553b4b?hl=en&),
but that tries to read everything at once.

Is there a way to do something similar with iterparse()?

 
Fredrik Lundh
 
      09-13-2006
Stuart McGraw wrote:

> I have a broad (~200K nodes) but shallow XML file
> I want to parse with ElementTree. There are too many
> nodes to read into memory simultaneously, so I use
> iterparse() to process each node sequentially.
>
> Now I find I need to get and save the input file line
> number of each node. Googling turned up a way
> to do it by subclassing FancyTreeBuilder
> (http://groups.google.com/group/comp....9553b4b?hl=en&),
> but that tries to read everything at once.
>
> Is there a way to do something similar with iterparse()?


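Something like this could work:

import elementtree.ElementTree as ET
import StringIO

data = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""

class FileWrapper:
    # File-like wrapper that hands iterparse() one line per read()
    # call and counts how many lines it has handed out so far.
    def __init__(self, source):
        self.source = source
        self.lineno = 0
    def read(self, size):
        # ignore the requested size; return a single line instead
        s = self.source.readline()
        self.lineno += 1
        return s

# f = FileWrapper(open("source.xml"))
f = FileWrapper(StringIO.StringIO(data))

for event, elem in ET.iterparse(f, events=["start", "end"]):
    if event == "start":
        # lineno is the last line fed to the parser, i.e. the line
        # where this start tag ends
        print f.lineno, event, elem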

</F>
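The same wrapper idea should carry over to Python 3's standard-library xml.etree.ElementTree: in CPython, iterparse() also pulls data through the source's read() method and yields the events produced by each chunk before reading the next, so handing it one line per call still works. The port below is a sketch under that assumption rather than code from the thread; the helper name start_lines is hypothetical, and the sample document is the one from the post above.

```python
import io
import xml.etree.ElementTree as ET

data = """\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""


class FileWrapper:
    """File-like object that feeds the parser one line per read() call."""

    def __init__(self, source):
        self.source = source
        self.lineno = 0

    def read(self, size=-1):
        # iterparse() asks for a large chunk; return one line instead so
        # that each batch of events comes from a single input line.
        line = self.source.readline()
        if line:
            self.lineno += 1
        return line


def start_lines(source):
    """Return (line number, tag) for every start tag in source.

    start_lines is a hypothetical helper used for illustration.
    """
    f = FileWrapper(source)
    found = []
    for event, elem in ET.iterparse(f, events=("start",)):
        # f.lineno is the last line fed so far: the line where this tag ends
        found.append((f.lineno, elem.tag))
    return found


for lineno, tag in start_lines(io.StringIO(data)):
    print(lineno, tag)
```

With the sample data this reports lines 1 to 4 for doc, tag and the two subtag elements. Because lineno is only the last line handed to the parser, a start tag spread over several lines is reported at the line where it ends.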

 
Stuart McGraw
 
      09-13-2006

"Fredrik Lundh" wrote:
> Stuart McGraw wrote:
> > Now I find I need to get and save the input file line
> > number of each node. Googling turned up a way
> > to do it by subclassing FancyTreeBuilder
> > (http://groups.google.com/group/comp....9553b4b?hl=en&),
> > but that tries to read everything at once.
> >
> > Is there a way to do something similar with iterparse()?
>
> Something like this could work:
> ...snip...


Indeed it does. Many thanks!

 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Issue with xml iterparse | bfrederi | Python | 4 | 06-13-2010 12:13 PM
Read a file line by line and write each line to a file based on the 5th byte | scad | C++ | 23 | 05-17-2009 06:11 PM
Re: iterparse and unicode | George Sakkis | Python | 6 | 08-27-2008 03:00 PM
How to read a text file line by line and remove some line | kaushikshome | C++ | 4 | 09-10-2006 10:12 PM
Iterparse and ElementTree confusion | paul.sherwood@gmail.com | Python | 4 | 08-18-2005 07:58 AM


