Re: python resource management

 
 
Philip Semanchuk
      01-19-2009

On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote:

> Hi all,
>
> I am running a Python script that uses BeautifulSoup to parse nearly
> 22,000 locally stored HTML files.
> The problem is that memory usage increases linearly as the files are
> parsed.
> Once the script has parsed 200 files or so, it consumes all the
> available RAM and CPU usage drops to 0% (possibly due to excessive
> paging).
>
> We tried 'del soup_object' and 'gc.collect()', but saw no
> improvement.
>
> Please advise how to limit Python's memory usage, or the proper way
> to handle BeautifulSoup objects in a resource-efficient manner.


You need to figure out where the memory is disappearing. Try
commenting out parts of your script. For instance, maybe start with a
minimalist script: open and close the files but don't process them.
See if the memory usage continues to be a problem. Then add elements
back in, making your minimalist script more and more like the real
one. If the extreme memory usage problem is isolated to one component
or section, you'll find it this way.
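
One way to put numbers on each step of that narrowing-down is sketched below. It assumes a current Python 3, whose standard library includes tracemalloc (which postdates this thread). The names measure_stage and read_only are made up for illustration. Note that tracemalloc only sees memory allocated through Python's own allocators, so memory taken directly by a C extension would not show up here.

```python
import gc
import tracemalloc


def read_only(path):
    """Minimal stage: open and close the file, keep nothing."""
    with open(path, "rb") as handle:
        handle.read()


def measure_stage(paths, stage):
    """Run stage(path) for every path and return the net number of bytes
    that are still allocated once the loop has finished."""
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for path in paths:
        stage(path)
    gc.collect()  # discard anything that was only kept alive by cycles
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

Calling measure_stage(paths, read_only) on a few hundred files gives a baseline. Swapping in a stage that also parses, and then one that parses and extracts, should show which step makes the retained byte count grow with the number of files.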

HTH
Philip
 
Tim Arnold
      01-19-2009

Also, are you creating a separate soup object for each file or reusing one
object over and over?
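
For comparison, here is a rough sketch of the "separate object per file" pattern, where each parse tree is dropped before the next file is read. The function name extract_from_files and its parse/extract parameters are hypothetical, and the parser is passed in so the sketch does not depend on any particular BeautifulSoup version.

```python
def extract_from_files(paths, parse, extract):
    """Yield one small result per file, building a fresh parse tree each time.

    parse:   callable taking the raw bytes and returning a parse tree
    extract: callable taking the tree and returning plain data (str, int, ...)
    """
    for path in paths:
        with open(path, "rb") as handle:
            tree = parse(handle.read())  # new object for every file
        result = extract(tree)           # should not hold references into the tree
        del tree                         # drop this function's only reference
        yield result
```

With the current bs4 package (an assumption, since the thread does not say which version is in use), parse could be lambda data: BeautifulSoup(data, "html.parser"). The extract step matters as much as the loop: if it returns BeautifulSoup objects such as NavigableString instead of plain str values, each result keeps its whole tree alive, and memory grows with every file even though a new soup object is created each time.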
--Tim


 