4DOM eating all my memory

 
 
ewan
02-01-2004
hello all -

I'm looping over a set of URLs pulled from a database, fetching the
corresponding web page, and building a DOM tree for it using
xml.dom.ext.reader.HtmlLib (then trying to match titles in a web library
catalogue). All the trees seem to be kept in memory: by the time I get
through fifty or so iterations, the program has used about half my memory
and slowed the system to a crawl.

I tried turning on all the gc debugging flags. They produce lots of
output, but it all says 'collectable', which sounds fine to me.

I even tried calling gc.collect() at the end of every iteration; nothing
changed. Everything seems to be collected, so why does each iteration
increase the memory usage by several megabytes?
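
For reference, the gc calls I mean look roughly like this (a minimal
sketch, not my exact code; the real thing is below):

import gc

# report collectable and uncollectable objects as the collector finds them
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE)

for row in rows:
    # ... build the DOM tree for this row's URL and search it ...
    n = gc.collect()    # force a full collection every iteration
    print 'gc.collect() found %d unreachable objects' % n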

Below is some code (and by the way, do I have those 'global's in the right
places?)

Any suggestions would be appreciated immeasurably...
ewan



import MySQLdb

....

cursor = db.cursor()
result = cursor.execute("""SELECT CALLNO, TITLE FROM %s""" % table)
rows = cursor.fetchall()
cursor.close()

for row in rows:
    current_callno = row[0]
    title = row[1]
    url = construct_url(title)
    cf = callno_finder()
    cf.find(title.decode('latin-1'), url)
    ...

(meanwhile, in another file)
....

from xml.dom.ext.reader import HtmlLib

class callno_finder:
    def __init__(self):
        global root
        root = None

    def find(self, title, uri):
        global root

        reader = HtmlLib.Reader()
        root = reader.fromUri(uri)

        # find what we're looking for
        ...
 
John J. Lee
02-02-2004
ewan <(E-Mail Removed)> writes:

> I'm looping over a set of URLs pulled from a database, fetching the
> corresponding web page, and building a DOM tree for it using
> xml.dom.ext.reader.HtmlLib (then trying to match titles in a web library
> catalogue).


Hmm, if this is open-source and it's more than a quick hack, let me
know when you have it working; I maintain a page on open-source stuff
of this nature (bibliographic and cataloguing).


> All the trees seem to be kept in memory: by the time I get through fifty
> or so iterations, the program has used about half my memory and slowed
> the system to a crawl.
>
> I tried turning on all the gc debugging flags. They produce lots of
> output, but it all says 'collectable', which sounds fine to me.


I've never had to resort to this... Does it tell you what types/classes
are involved? IIRC, there was some code posted to python-dev to give
hints about this (though I guess that was mostly/always for debugging
leaks at the C level).
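
Something like this (an untested sketch) gives a rough census of the
live objects the collector knows about; diffing it across iterations
should show which types are piling up:

import gc

def type_census():
    # count gc-tracked objects, keyed by type name
    counts = {}
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts

before = type_census()
# ... run one iteration of the fetch/parse loop ...
after = type_census()
for name, n in after.items():
    if n > before.get(name, 0):
        print name, '+%d' % (n - before.get(name, 0))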


> I even tried calling gc.collect() at the end of every iteration; nothing
> changed. Everything seems to be collected, so why does each iteration
> increase the memory usage by several megabytes?
>
> Below is some code (and by the way, do I have those 'global's in the
> right places?)


Yes, they're in the right places. Not sure a global is really needed,
though...
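
E.g. an instance attribute would do the same job without a module-level
name (untested):

class callno_finder:
    def __init__(self):
        self.root = None

    def find(self, title, uri):
        reader = HtmlLib.Reader()
        self.root = reader.fromUri(uri)
        # find what we're looking for
        # ...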


> Any suggestions would be appreciated immeasurably...

[...]
> def find(self, title, uri):
>     global root
>
>     reader = HtmlLib.Reader()
>     root = reader.fromUri(uri)
>
>     # find what we're looking for
>     ...


+ reader.releaseNode(root)

?
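
That is, release each tree explicitly once you're done with it: 4DOM
nodes hold parent/child reference cycles, and as I understand it
releaseNode() is there to break them so the tree's memory can be
reclaimed promptly. Something like this (an untested sketch) in your
find():

def find(self, title, uri):
    reader = HtmlLib.Reader()
    root = reader.fromUri(uri)
    try:
        # find what we're looking for
        # ...
        pass
    finally:
        # break the tree's internal reference cycles so the
        # memory can actually be freed
        reader.releaseNode(root)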


John
 