creating garbage collectable objects (caching objects)

 
 
News123
Guest
Posts: n/a
06-28-2009
Hi.

I started playing with PIL.

I'm performing operations on multiple images and would like to find a
compromise between speed and memory requirements.

The fast approach would load all images upfront and then create the
multiple result files. The problem is that I do not have enough memory
to load all the files.

The slow approach is to load each potential source file only when it is
needed and to release it immediately afterwards (leaving it up to the gc
to free the memory when needed).



The question I have is whether there is any way to tell Python that
certain objects could be garbage collected if needed, and then to ask
Python at a later time whether the object has been collected (the image
has to be reloaded) or not (the image would not have to be reloaded).


# Fastest approach:
imgs = {}
for fname in all_image_files:
    imgs[fname] = Image.open(fname)
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        img = do_somethingwith(img, imgs[img_file])
    img.save()


# Slowest approach:
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        src_img = Image.open(img_file)
        img = do_somethingwith(img, src_img)
    img.save()



# What I'd like to do is something like:
imgs = GarbageCollectable_dict()
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        if img_file in imgs:  # if I'm lucky the object is still there
            src_img = imgs[img_file]
        else:
            src_img = Image.open(img_file)
        img = do_somethingwith(img, src_img)
    img.save()



Is this possible?

Thanks in advance for an answer, or for any other ideas on how I could
do smart caching without hogging all the system's memory.





 
Terry Reedy
Guest
Posts: n/a
06-28-2009
News123 wrote:
> Hi.
>
> I started playing with PIL.
> . . .
> The question I have is whether there is any way to tell Python that
> certain objects could be garbage collected if needed, and then to ask
> Python at a later time whether the object has been collected (the image
> has to be reloaded) or not (the image would not have to be reloaded).


See the weakref module. But note that in CPython, objects are collected
as soon as there are no normal references left, not when 'needed'.
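
For illustration, a minimal untested sketch of the check-and-reload
pattern weakref gives you, reusing your names (img, fname):

import weakref

ref = weakref.ref(img)   # a weak reference; does not keep img alive
del img                  # drop the last normal reference

img = ref()              # the object, or None if it has been collected
if img is None:
    img = Image.open(fname)   # image has to be reloaded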

 
Simon Forman
Guest
Posts: n/a
06-28-2009
On Jun 28, 11:03 am, News123 <(E-Mail Removed)> wrote:
> Hi.
>
> I started playing with PIL.
> . . .
> Thanks in advance for an answer, or for any other ideas on how I could
> do smart caching without hogging all the system's memory.


Maybe I'm just being thick today, but why would the "slow" approach be
slow? The same amount of I/O and processing would be done either way,
no?
Have you timed both methods?
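
For example, with a quick timeit harness (hypothetical names; assumes
each variant from your post is wrapped in a zero-argument function):

import timeit

# build_all_preloaded / build_all_on_demand are stand-ins for the two
# approaches in the original post
t_fast = timeit.timeit(lambda: build_all_preloaded(), number=3)
t_slow = timeit.timeit(lambda: build_all_on_demand(), number=3)
print("preloaded: %.1fs  on demand: %.1fs" % (t_fast, t_slow))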

That said, take a look at the weakref module Terry Reedy already
mentioned, and maybe the gc (garbage collector) module too (although
that might just lead to wasting a lot of time fiddling with stuff that
the gc is supposed to handle transparently for you in the first place.)
 
Dave Angel
Guest
Posts: n/a
06-28-2009
News123 wrote:
> Hi.
>
> I started playing with PIL.
>
> I'm performing operations on multiple images and would like to find a
> compromise between speed and memory requirements.
> . . .
> The question I have is whether there is any way to tell Python that
> certain objects could be garbage collected if needed, and then to ask
> Python at a later time whether the object has been collected (the image
> has to be reloaded) or not (the image would not have to be reloaded).

You don't say what implementation of Python, nor on what OS platform.
Yet you're asking how to influence that implementation.

In CPython, version 2.6 (and probably most other versions, but somebody
else would have to chime in) an object is freed as soon as its reference
count goes to zero. So the garbage collector is only there to catch
cycles, and it runs relatively infrequently.

So, if you keep a reference to an object, it won't be freed.
Theoretically, you can use the weakref module to keep a reference
without inhibiting the garbage collection, but I don't have any
experience with the module. You could start by studying its
documentation. But probably you want a weakref.WeakValueDictionary.
Use that in your third approach to store the cache.
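
A rough, untested sketch of the third approach with such a dictionary
(names taken from the original post; note that a cache hit is only
possible while something else still holds a normal reference to the
image):

import weakref
from PIL import Image   # old PIL installs use plain "import Image"

imgs = weakref.WeakValueDictionary()   # entries may vanish at any time

def get_image(fname):
    img = imgs.get(fname)   # None if never loaded or already collected
    if img is None:
        img = Image.open(fname)
        imgs[fname] = img   # cached without adding a strong reference
    return img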

If you're using IronPython or Jython, or one of many other
implementations, the rules will be different.

The real key to efficiency is usually managing locality of reference.
If a given image is going to be used for many output files, you might
try to do all the work with it before going on to the next image. In
practice that might mean searching all_creation_rules for rules which
reference the file you've currently loaded. As always, measurement is key.
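
For illustration only (my sketch, reusing the hypothetical names from
the original post), the rules could be grouped by input file up front,
so that each image is loaded once and every rule needing it is handled
while it is in memory:

from collections import defaultdict

# map each input file to the rules that use it; a rule with several
# inputs will show up under each of its files
rules_by_file = defaultdict(list)
for rule in all_creation_rules():
    for fname in rule.input_files():
        rules_by_file[fname].append(rule)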


 
News123
Guest
Posts: n/a
06-29-2009
Dave Angel wrote:
> News123 wrote:
>> Hi.
>>
>> I started playing with PIL.
>> . . .

> You don't say what implementation of Python, nor on what OS platform.
> Yet you're asking how to influence that implementation.


Sorry, my fault. I'm using CPython, under both Windows and Linux.
>
> In CPython, version 2.6 (and probably most other versions, but somebody
> else would have to chime in) an object is freed as soon as its reference
> count goes to zero. So the garbage collector is only there to catch
> cycles, and it runs relatively infrequently.


If CPython frees objects as early as possible (as soon as the refcount
is 0), then weakref will not really help me. In that case I'd have to
come up with a cache-like structure.
> . . .
>
> The real key to efficiency is usually managing locality of reference.
> If a given image is going to be used for many output files, you might
> try to do all the work with it before going on to the next image. In
> practice that might mean searching all_creation_rules for rules which
> reference the file you've currently loaded. As always, measurement is key.


Changing the order in which the images are calculated is key, and I'm
working on that.

As a first step I can reorder the image creation such that all output
images that depend on only one input image are calculated one after the
other.

So for this case I can transform

# Slowest approach:
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        src_img = Image.open(img_file)
        img = do_somethingwith(img, src_img)
    img.save()

into

src_img = Image.open(img_file)
for creation_rule in all_creation_rules_with_one_src_img():
    img = Image.new(...)
    img = do_somethingwith(img, src_img)
    img.save()


What I'm more concerned about is a group of output images depending on
TWO or more input images.

Depending on the platform (and the images) I might not be able to
preload both (or all) of the input images.

So, as CPython's garbage collection always takes place immediately, I'd
like to pursue something else: I can create a cache which keeps input
files cached as long as Python leaves at least n MB available for the
rest of the system.

For this I have to know how much RAM is still available on a system.

I'll start looking into this.
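
Roughly what I have in mind is this (untested sketch; the free-memory
check is a placeholder, e.g. the third-party psutil module could
provide it):

import psutil                 # third-party; only used for the memory check
from PIL import Image         # old PIL installs use plain "import Image"

MIN_FREE = 200 * 1024 * 1024  # leave at least 200 MB for the rest of the system

cache = {}                    # fname -> Image; strong refs, evicted explicitly

def get_image(fname):
    if fname in cache:
        return cache[fname]
    # evict entries until enough memory is free (arbitrary order;
    # LRU bookkeeping would be smarter)
    while cache and psutil.virtual_memory().available < MIN_FREE:
        cache.popitem()
    img = Image.open(fname)
    cache[fname] = img
    return img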

thanks again



N


 
Dave Angel
Guest
Posts: n/a
06-29-2009
News123 wrote:
> Dave Angel wrote:
>> . . .
>
> What I'm more concerned about is a group of output images depending on
> TWO or more input images.
>
> Depending on the platform (and the images) I might not be able to
> preload both (or all) of the input images.
>
> So, as CPython's garbage collection always takes place immediately, I'd
> like to pursue something else: I can create a cache which keeps input
> files cached as long as Python leaves at least n MB available for the
> rest of the system.

As I said earlier, I think weakref is probably what you need. A weakref
is still a reference from the point of view of the ref-counting, but not
from the point of view of the garbage collector. Have you read the help
on the weakref module? In particular, did you read PEP 205?
http://www.python.org/dev/peps/pep-0205/

Object cache is one of the two reasons for the weakref module.

 
Gabriel Genellina
Guest
Posts: n/a
06-29-2009
On Mon, 29 Jun 2009 08:01:20 -0300, Dave Angel <(E-Mail Removed)> wrote:
> As I said earlier, I think weakref is probably what you need. A weakref
> is still a reference from the point of view of the ref-counting, but not
> from the point of view of the garbage collector. Have you read the help
> on the weakref module? In particular, did you read PEP 205?
> http://www.python.org/dev/peps/pep-0205/


You've misunderstood something. A weakref is NOT "a reference from the
point of view of the ref-counting", it adds zero to the reference count.
When the last "real" reference to some object is lost, the object is
destroyed, even if there exist weak references to it. That's the whole
point of a weak reference. The garbage collector isn't directly related.

py> from sys import getrefcount as rc
py> class X(object): pass
...
py> x=X()
py> rc(x)
2
py> y=x
py> rc(x)
3
py> import weakref
py> r=weakref.ref(x)
py> r
<weakref at 00BE56C0; to 'X' at 00BE4F30>
py> rc(x)
3
py> del y
py> rc(x)
2
py> del x
py> r
<weakref at 00BE56C0; dead>

(remember that getrefcount, like any function, holds a temporary
reference to its argument, so the number it returns is one more than
the expected value)

> Object cache is one of the two reasons for the weakref module.


...when you don't want the object to stay artificially alive just because
it's referenced in the cache. But the OP wants different behavior, it
seems: a standard dictionary from which images are removed when they're
no longer needed (or when a memory limit is hit).

--
Gabriel Genellina

 
Dave Angel
Guest
Posts: n/a
06-29-2009
Gabriel Genellina wrote:
> . . .
>
> You've misunderstood something. A weakref is NOT "a reference from the
> point of view of the ref-counting", it adds zero to the reference count.
> When the last "real" reference to some object is lost, the object is
> destroyed, even if there exist weak references to it. That's the whole
> point of a weak reference. The garbage collector isn't directly related.
> . . .
> ...when you don't want the object to stay artificially alive just because
> it's referenced in the cache. But the OP wants different behavior, it
> seems: a standard dictionary from which images are removed when they're
> no longer needed (or when a memory limit is hit).

Thanks for correcting me. As I said earlier, I have no experience with
weakref. The help and the PEP did sound to me like it would work for
his needs.

So how about adding an attribute to the large object that refers to the
object itself? Then the ref count will never go to zero, but since that
self-reference is a cycle, the object can still be freed by the gc. Also
store a ref in a WeakValueDictionary, and you can find the object without
blocking its collection.

And no, I haven't tried it, and wouldn't unless a machine had nothing
important running on it. Clearly, the gc might not be able to keep up
with this kind of abuse. But if gc is triggered by any attempt to make
too-large an object, it might work.
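
An untested sketch of that idea (CachedImage is a hypothetical wrapper;
the deliberate cycle keeps the refcount above zero until the cycle
collector runs):

import gc
import weakref

class CachedImage(object):
    """Hypothetical wrapper standing in for a PIL image."""
    def __init__(self, fname):
        self.data = open(fname, 'rb').read()   # stand-in for Image.open()
        self._self = self   # deliberate cycle: refcount never reaches zero

cache = weakref.WeakValueDictionary()

def get_image(fname):
    img = cache.get(fname)   # None once the cycle collector has freed it
    if img is None:
        img = CachedImage(fname)
        cache[fname] = img
    return img

def flush_unused():
    gc.collect()   # run the cycle collector; entries for unused images vanish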

DaveA
 