Velocity Reviews - Computer Hardware Reviews


Your thoughts/philosophies on manual garbage collection

 
 
dkmd_nielsen
 
      03-08-2007
The process that initiated my message earlier (about deleting array
elements) is a rather long-running process of rebuilding and
reconfiguring parameter files. There are hundreds of files, each with as
many as 22,000 parameters to process. For example, four small test
files ran in about two minutes. There is a ton of string manipulation
going on, which probably translates into lots of leftover string fragments
and pointers lying around in RAM...clogging it up. I was thinking of
manually initiating garbage collection after every five or ten files
processed. Is that a smart thing?

What are your thoughts on manually initiated garbage collection?
What kinds of practices result in bits and pieces of objects and
pointers being left lying around in the ether of RAM? Are there
tools that help show what happens to RAM while a process runs, the way
a debugger does with variables?
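One way to see what the heap is doing from inside Ruby is to log a few interpreter counters around each file. This is a minimal sketch, assuming Ruby 1.9 or later for `ObjectSpace.count_objects` and `GC.count`; `heap_snapshot` and `log_heap` are hypothetical helper names, not part of any library:

```ruby
# Sketch only: assumes Ruby 1.9+ for ObjectSpace.count_objects and GC.count.
# heap_snapshot and log_heap are hypothetical helpers.

# Returns a few interpreter-level counters describing the object heap.
def heap_snapshot
  counts = ObjectSpace.count_objects # slot counts keyed by object type
  {
    gc_runs:    GC.count,                      # GC runs since process start
    strings:    counts[:T_STRING] || 0,        # String slots, incl. unswept garbage
    live_slots: counts[:TOTAL] - counts[:FREE] # slots not on the free list
  }
end

# Prints one line of counters to stderr, tagged with a label.
def log_heap(label)
  snap = heap_snapshot
  warn format("%-10s gc_runs=%d strings=%d live_slots=%d",
              label, snap[:gc_runs], snap[:strings], snap[:live_slots])
end

# Usage: compare the counters before and after a forced collection.
log_heap("before")
10_000.times { |i| "param#{i}".upcase } # garbage: nothing keeps these strings
GC.start
log_heap("after GC")
```

Calling `log_heap` after every file shows whether string counts keep climbing (something is holding on to them) or drop back after each automatic collection.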

Thanks for everything
dvn

 
ara.t.howard@noaa.gov
 
      03-08-2007
On Fri, 9 Mar 2007, dkmd_nielsen wrote:

> I was thinking of manually initiating garbage collection after every
> five or ten files processed. Is that a smart thing?


if you can fork - that's the best - then you just let each child's death clean
up that sub-segment of work's memory.
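A minimal sketch of the one-child-per-file approach, assuming a Unix-like platform (`fork` is not available on Windows) and a hypothetical `process_file` that stands in for the real rebuilding work. Everything the child allocates is handed back to the operating system when the child exits, so the parent never accumulates per-file garbage:

```ruby
# Hypothetical stand-in for the real per-file work: read the parameters,
# transform them, and return how many were seen.
def process_file(path)
  File.readlines(path).map { |line| line.strip.upcase }.size
end

# Forks one child per file and waits for it before starting the next.
# Returns an array of booleans: true if that file's child exited cleanly.
def process_each_in_child(paths)
  paths.map do |path|
    pid = Process.fork do
      ok = begin
        process_file(path)
        true
      rescue StandardError => e
        warn "#{path}: #{e.message}"
        false
      end
      # exit! skips at_exit handlers and ensure blocks inherited from the parent
      exit!(ok ? 0 : 1)
    end
    _pid, status = Process.wait2(pid) # block until this child is gone
    status.success?
  end
end
```

Using `exit!` rather than `exit` in the child matters: it keeps the child from running cleanup code (temp-file removal, loggers) that belongs to the parent.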

-a
--
be kind whenever possible... it is always possible.
- the dalai lama

 
Robert Klemme
 
      03-09-2007
On 08.03.2007 23:18, wrote:
> On Fri, 9 Mar 2007, dkmd_nielsen wrote:
>
>> I was thinking of manually initiating garbage collection after every
>> five or ten files processed. Is that a smart thing?


To OP: generally "manual" GC is considered bad since it interferes with
the automatic mechanism.

>> What kinds of practices result in bits and pieces of objects and
>> pointers being left lying around in the ether of RAM?

>
> if you can fork - that's the best - then you just let each child's death
> clean up that sub-segment of work's memory.


Also, forking has the added advantage of better utilizing multi-core CPUs.
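Robert's multi-core point can be sketched as a small pool that keeps at most `max_children` workers alive at once. This assumes a Unix-like platform; `process_in_parallel` is a hypothetical name and the block stands in for the per-file work:

```ruby
# Runs the given block once per path, each in its own child process,
# with at most max_children children alive at any time.
# Returns a Hash mapping each path to true (clean exit) or false.
def process_in_parallel(paths, max_children)
  results = {}
  running = {} # pid => path for children that have not been reaped yet

  reap_one = lambda do
    pid, status = Process.wait2 # wait for whichever child finishes first
    path = running.delete(pid)
    results[path] = status.success? if path # ignore unrelated children
  end

  paths.each do |path|
    reap_one.call while running.size >= max_children
    pid = Process.fork do
      ok = begin
        yield(path)
        true
      rescue StandardError => e
        warn "#{path}: #{e.message}"
        false
      end
      exit!(ok ? 0 : 1)
    end
    running[pid] = path
  end

  reap_one.call until running.empty?
  results
end
```

Setting `max_children` to the number of cores is a reasonable starting point; each child still frees its memory on exit.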

If you do encounter excessive memory usage, then you should:

a) make sure you do not hold onto stuff longer than needed

b) check your algorithms for inefficient handling of objects; since you
mention string processing, this is a typical gotcha:

s = "abc"
s += "foo" # builds a new String and rebinds s to it
s << "foo" # appends to the existing String in place

Another one:

a = []
a += ["foo", "bar"] # builds a new Array and rebinds a to it
a << "foo" << "bar" # appends in place
a.concat(["foo", "bar"]) # appends in place

c) If the files you are processing are large, you might also try some
kind of stream processing where you do not have to keep the whole
file's content in memory (if that's applicable to your problem domain).
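Points b) and c) combined: a sketch that streams a parameter file line by line instead of loading it whole, and builds each output line by appending to one reused buffer with `<<`. The `key=value` format and the normalisation rule (upper-case keys, trimmed whitespace) are hypothetical, since the thread does not describe the real file format:

```ruby
# Hypothetical rewrite rule: "key = value" becomes "KEY=value".
# Lines without '=' (comments, blanks) are dropped.
# Returns the number of parameters written.
def rewrite_params(in_path, out_path)
  count = 0
  buf = +"" # one mutable String reused for every output line
  File.open(out_path, "w") do |out|
    File.foreach(in_path) do |line| # reads one line at a time
      key, value = line.chomp.split("=", 2)
      next if value.nil?            # no '=' on this line
      buf.clear                     # same String object, contents reset
      buf << key.strip.upcase << "=" << value.strip << "\n" # in-place appends
      out.write(buf)
      count += 1
    end
  end
  count
end
```

Memory use then depends on the longest line rather than the file size, and the loop allocates no intermediate strings from `+=` concatenation.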

Kind regards

robert
 
Joel VanderWerf
 
      03-11-2007
wrote:
> if you can fork - that's the best - then you just let each child's death
> clean up that sub-segment of work's memory.


One caution: mark-and-sweep GC and fork don't always play well together,
in terms of sharing memory pages. The mark algorithm needs to touch all
live objects in the heap. The child inherits the parent's heap, with
copy on write. If the parent has a large heap, and the child does a GC,
all those pages are copied into the child's address space. Memory
usage will scale badly as the number of child processes grows. (Perhaps
you would factor your process into one child for each of the hundreds of files?)

It can be a good idea to GC.disable in the child, in some cases:

- parent has large heap, and

- child lifespan and allocation rate are such that it does not need to GC
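Joel's `GC.disable` suggestion applied to a forked child, as a sketch assuming a Unix-like platform; `run_child_without_gc` is a hypothetical helper. This only pays off when the child finishes before its heap grows uncomfortably large, because nothing it allocates is reclaimed until it exits:

```ruby
# Runs the block in a forked child with the garbage collector switched off.
# Disabling GC in the child does not affect the parent process.
# Returns true if the block completed without raising.
def run_child_without_gc
  pid = Process.fork do
    GC.disable # no GC runs, so marking never touches pages shared with the parent
    ok = begin
      yield
      true
    rescue StandardError => e
      warn e.message
      false
    end
    exit!(ok ? 0 : 1)
  end
  _pid, status = Process.wait2(pid)
  status.success?
end
```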

Some benchmarks:

http://blade.nagaokaut.ac.jp/cgi-bin...by-talk/186561

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

 
Daniel DeLorme
 
      03-13-2007
Joel VanderWerf wrote:
> One caution: mark-and-sweep GC and fork don't always play well together,
> in terms of sharing memory pages. The mark algorithm needs to touch all
> live objects in the heap. The child inherits the parent's heap, with
> copy on write. If the parent has a large heap, and the child does a GC,
> all those pages are copied into the child's address space. Memory usage
> will scale badly as the number of child processes grows. (Perhaps you
> factor your process into one child for each of the hundreds of files?)


I looked for some extra information on this topic and found:
http://blog.beaver.net/2005/03/ruby_...pyonwrite.html

That's pretty disheartening news to me. I had plans to make a fcgi-like
process manager that would take advantage of copy-on-write to reduce the
memory footprint of a webapp by pre-loading all libraries in the parent
process. But if ruby's GC renders COW useless... there's not much point
anymore.

Are there any plans to optimize ruby to make it fork-friendly?

Daniel

 
Gary Wright
 
      03-13-2007

On Mar 11, 2007, at 3:26 PM, Joel VanderWerf wrote:

> wrote:
>> if you can fork - that's the best - then you just let each child's
>> death clean up that sub-segment of work's memory.

>
> One caution: mark-and-sweep GC and fork don't always play well
> together, in terms of sharing memory pages. The mark algorithm
> needs to touch all live objects in the heap. The child inherits the
> parent's heap, with copy on write.


I think you are describing a different situation than the OP and Ara.

If you've got hundreds of files to process and the processing is
sufficiently complex to justify forking for each file, then the parent
just iterates over the file list, forking and waiting for each child to
process its file. The parent's address space won't have all the stale
objects generated by the child's processing, so each new child starts
with a reasonable memory footprint.

One fork per file is the easiest to program, but if that is problematic
for some reason you could batch things up pretty easily.
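The batching variant might look like this: each child handles `batch_size` items, so there are fewer forks while the parent still never holds per-file garbage. A sketch assuming a Unix-like platform; `process_in_batches` is a hypothetical name and the block stands in for the per-file work:

```ruby
# Forks one child per batch of items and waits for it before the next.
# A failure inside a batch stops the rest of that batch.
# Returns one boolean per batch: true if its child exited cleanly.
def process_in_batches(items, batch_size)
  items.each_slice(batch_size).map do |batch|
    pid = Process.fork do
      ok = begin
        batch.each { |item| yield(item) }
        true
      rescue StandardError => e
        warn e.message
        false
      end
      exit!(ok ? 0 : 1)
    end
    _pid, status = Process.wait2(pid)
    status.success?
  end
end
```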


Gary Wright




 