Reclaiming (lots of) memory

 
 
Thomas Rast
10-02-2004
Hello everyone

My scenario is somewhat involved, so if you're in a hurry, here's an
abstract: I have a daemon that needs about 80MB of RAM to build its
internal data structures, but can pack them to 20MB for the rest of
its lifetime. The other 60MB are never returned to the operating
system. I've thought of building and packing the structures in a
fork()ed process, then piping them over to the main part; is there an
easier way to get my RAM back?

Explanation:

The program is part of a toy IRC bot which reads roughly 100'000 lines
from logfiles and builds a huge dict of dicts which is then used to
build sentences much like word-based dissociated press would. All
entries are ints representing words or, as values of the inner
dictionaries, frequencies.

The final dictionary has about 325'000 entries, and the Python process
uses around 80MB of RAM. See the test case at the end of my post:
fill() creates a similar data structure. If you run it as a script
and compare the 'ps' outputs, you'll see that 'del d' frees only a
small part of the memory to the OS; on my Linux system, about 10%.

Using the standard 'struct' module, I can pack the keys and values of
the outer dictionary into strings, which brings memory usage down to
about 20M. Unfortunately, this has to be done as a second step (as
opposed to always keeping the dictionary in that format), otherwise it
would slow down things too much. Writing everything to disk (using
e.g. 'bsddb3') suffers from the same problem; I really have to do the
initial work in RAM to get acceptable speed.
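A minimal sketch of the packing step described above, assuming word IDs and
frequencies both fit in unsigned 32-bit ints (the helper names here are
illustrative, not taken from the actual bot):

#!/usr/bin/python
import struct

def pack_entry(k1, inner):
    # Pack a (w1, w2, w3) key and its {word: count} dict into two byte strings.
    key = struct.pack('>3I', *k1)          # 12 bytes instead of a tuple of 3 int objects
    flat = []
    for word, count in inner.iteritems():
        flat.extend((word, count))
    value = struct.pack('>%dI' % len(flat), *flat)   # (word, count) pairs, 8 bytes each
    return key, value

def unpack_value(value):
    # Rebuild the {word: count} dict from a packed value string.
    flat = struct.unpack('>%dI' % (len(value) // 4), value)
    return dict(zip(flat[0::2], flat[1::2]))

Stored in a plain dict of strings, each outer entry is then just two string
objects instead of a tuple key plus an inner dict full of int objects.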

So, do I really have to do this in two separate processes? Would it
help if I implemented the data storage part as a C extension module?
Or am I worrying too much about a problem that is better left to the
OS's memory management (i.e. swapping)?

Of course I'd also appreciate ideas for more efficient data layouts.


Thomas

---8<---
#!/usr/bin/python

import random
import os

def fill(map):
    random.seed(0)
    for i in xrange(300000):
        k1 = (random.randrange(100),
              random.randrange(100),
              random.randrange(100))
        k2 = random.randrange(100)
        if k1 in map:
            d = map[k1]
        else:
            d = dict()
        d[k2] = d.get(k2, 0) + 1
        map[k1] = d

if __name__ == "__main__":
    os.system('ps v')
    d = dict()
    fill(d)
    os.system('ps v')
    del d
    os.system('ps v')
--->8---

 
 
 
 
Paul Rubin
10-02-2004
Thomas Rast writes:
> So, do I really have to do this in two separate processes? Would it
> help if I implemented the data storage part as a C extension module?
> Or am I worrying too much about a problem that is better left to the
> OS's memory management (i.e. swapping)?
>
> Of course I'd also appreciate ideas for more efficient data layouts


When faced with such problems I always find it worthwhile to remember
that the phone company managed to issue itemized long-distance bills
to each of its customers, even in the 1960's using mainframe computers
with a few hundred kb of memory and a room full of magtape drives like
you see in old movies. I ask myself "how would a 1960's mainframe
programmer do this, without megabytes of ram?".

I'm not sure what your dictionary of dictionaries is supposed to do,
but from your dissociated press description it sounds like you want to
look at pairs of adjacent words and build up a
probability table. In that case I'd do something like:

1. Sequentially read the log files, and for each pair of adjacent words,
write that pair to a temp file tmp1.
2. Sort tmp1 with os.system("sort").
3. Make a sequential pass through tmp1 to build a second temp file tmp2,
which is a compressed tmp1. For example, if tmp1 has the lines
    apple pie
    apple computer
    apple pie
    apple sauce

then make a single line in tmp2:

    apple: pie:2,computer:1,sauce:1

Do that in the obvious way by building a temporary dict for the words
that follow "apple". Also, while doing this, build a Python dict
giving the offset in tmp2 for the line for each word, i.e.

    d['apple'] = ofile.tell()
and similarly for all the other words (you end up with a Python dict of
325k entries whose values are single ints; that shouldn't be too large).
There are of course lots of tradeoffs you can make to give a smaller
dict, if you need to. You could also use the array module to implement
an int-valued hash table without the overhead of a Python object in each slot.

4. Use the mmap module to map the file tmp2 into memory. Then to get the
successor words to 'apple', simply use the in-memory dict to get the
appropriate offset in tmp2, read and parse the line from that offset,
and emit the appropriate word.

5. There's lots of other optimization and compression you can do to
make tmp2 smaller, which can reduce your paging load, but the above
should be enough to get you started.
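A rough sketch of steps 3 and 4, assuming tmp1 has already been written and
sorted (one "word successor" pair per line); the function names are made up
for illustration:

#!/usr/bin/python
import mmap

def write_entry(ofile, offsets, word, counts):
    # Remember where this word's line starts, then write "word: succ:count,..."
    offsets[word] = ofile.tell()
    body = ','.join('%s:%d' % item for item in counts.iteritems())
    ofile.write('%s: %s\n' % (word, body))

def build_tmp2(tmp1_sorted, tmp2_path):
    # Step 3: collapse the sorted pair file into one line per word,
    # keeping only a small {word: file offset} dict in memory.
    offsets = {}
    ofile = open(tmp2_path, 'w')
    current, counts = None, {}
    for line in open(tmp1_sorted):
        word, succ = line.split()
        if word != current:
            if current is not None:
                write_entry(ofile, offsets, current, counts)
            current, counts = word, {}
        counts[succ] = counts.get(succ, 0) + 1
    if current is not None:
        write_entry(ofile, offsets, current, counts)
    ofile.close()
    return offsets

def successors(tmp2_path, offsets, word):
    # Step 4: mmap tmp2 and parse only the one line we need.
    f = open(tmp2_path, 'r+b')
    m = mmap.mmap(f.fileno(), 0)
    try:
        m.seek(offsets[word])
        body = m.readline().rstrip('\n').split(': ', 1)[1]
        return dict((succ, int(n)) for succ, n in
                    (item.split(':') for item in body.split(',')))
    finally:
        m.close()
        f.close()

In the bot, successors() would then feed whatever weighted random choice
picks the next word.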
 
 
 
 
 
Paul Rubin
10-02-2004
Paul Rubin writes:
> apple pie
> apple computer
> apple pie
> apple sauce


That of course should have said

> apple computer
> apple pie
> apple pie
> apple sauce


since the file is sorted.
 
Thomas Rast
10-03-2004
Paul Rubin writes:

> I'm not sure what your dictionary of dictionaries is supposed to do,
> but from your dissociated press description it sounds like you want to
> look at the pairs of adjacent pairs of words and build up a
> probability table. In that case I'd do something like:
>

[... use the 'sort' utility on a temporary file ...]

Thanks a lot for your input! I'm still not sure what exactly I'm
going to do, since I'd also like to update the tables as new lines
come in, but I may just have to regenerate them periodically. In any
case, your post certainly gives me some ideas.

- Thomas

Nick Craig-Wood
10-04-2004
Thomas Rast wrote:
> My scenario is somewhat involved, so if you're in a hurry, here's an
> abstract: I have a daemon that needs about 80MB of RAM to build its
> internal data structures, but can pack them to 20MB for the rest of
> its lifetime. The other 60MB are never returned to the operating
> system.


First a recap on how memory allocation works! This is general - I
don't know exactly how the Python memory allocation works, but there
aren't too many ways of doing it under Linux.

The way malloc() and friends work under Linux is to claim large blocks
of memory from the OS. This is either done with brk()/sbrk() or with
mmap("/dev/zero") (depending on exactly which version of glibc you
are using etc.).

For brk()/sbrk() malloc keeps a single area which grows and grows.
This area *can* be shrunk, but only if there is nothing that was
allocated in the way.

mmap("/dev/zero") behaves in pretty much the same way (ie it shrinks
and grows), except for large allocation (>4k I think) the allocation
is done in its own mmap() area. This means that when it is freed it
completely disappears.

So for either of these methods freeing memory back to the OS relies on
the fact that there has been nothing allocated that will stop the
memory shrinking back down all the way. In a dynamic object language
like Python that's pretty difficult to guarantee - perhaps impossible!

The one ray of hope is the mmap("/dev/zero") for blocks bigger than
4k. I'm pretty sure this is how modern glibc's work by default.
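A quick way to watch this from Python on a Linux box (the rss() helper just
shells out to ps, like the test script earlier in the thread):

#!/usr/bin/python
import os

def rss():
    # Print the interpreter's resident set size in KB (Linux ps).
    os.system('ps -o rss= -p %d' % os.getpid())

rss()
big = [0] * 10000000        # one large allocation: typically its own mmap() region
rss()
del big                     # ...so it can be munmap()ed and handed straight back
rss()
small = [dict() for i in xrange(200000)]   # lots of small objects on the heap
rss()
del small                   # the heap usually can't shrink past live allocations
rss()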

What you really want to do is get the allocation of the initial dict
and all of its data to be in its own mmap() block. No idea whether
this is possible in Python though! It's the kind of thing you could do
in C++ if you could be bothered to wade through all the tedious syntax.

When I've had this sort of problem in the past I've usually resorted
to an on-disk database (dbm of some kind). These are usually fast
enough. It would have the advantage that you'd only have to parse the
log files once too. They are reasonably fast to access too as all the
hot pages soon get into the OS page cache.

Also as suggested by another poster /usr/bin/sort is very, very good
at huge datasets using minimal memory.

> I've thought of building and packing the structures in a fork()ed
> process, then piping them over to the main part;


That's a neat idea. You could also build a dbm in the forked process.

> is there an easier way to get my RAM back?


I don't think so, but hopefully an expert who knows something about
how python allocates its objects will speak up now!

....

Just for fun I converted your program to use anydbm. It then took 135
seconds to run (vs 12 for your original code), but only used 4 MB of
memory, tops. The values of the dbm are pickled hashes. The keys could
be too, but I decided to just use a tab-separated string...

import random
import os
import anydbm
import cPickle

def fill(map):
    random.seed(0)
    for i in xrange(300000):
        k1 = "%s\t%s\t%s" % (random.randrange(100),
                             random.randrange(100),
                             random.randrange(100))
        k2 = random.randrange(100)
        if k1 in map:
            d = cPickle.loads(map[k1])
        else:
            d = dict()
        d[k2] = d.get(k2, 0) + 1
        map[k1] = cPickle.dumps(d, -1)

if __name__ == "__main__":
    show_me = 'ps v %d' % os.getpid()
    os.system(show_me)
    d = anydbm.open("ztest", 'n')
    fill(d)
    os.system(show_me)
    del d
    os.system(show_me)

Paul Rubin
10-04-2004
Nick Craig-Wood writes:
> Just for fun I converted your program to use anydbm. It then took 135
> seconds to run (vs 12 for your original code), but only used 4 MB of
> memory top. The values of the dbm are pickled hashes. The keys could
> be too, but I decided to just use a tab seperated string...


But I think it used much more than 4 MB of memory if you count the
amount of system cache that held that dbm file. Without the caching
there'd have had to be real disk seeks for all those dbm updates. So
I think if you had a lot more data, enough that you couldn't keep the
dbm in cache any more, you'd get a horrible slowdown. That's why you
may be better off with a sorting-based method, that only needs
sequential disk operations and not a lot of random seeks.
 
Nick Craig-Wood
10-04-2004
Paul Rubin wrote:
> Nick Craig-Wood writes:
> > Just for fun I converted your program to use anydbm. It then took 135
> > seconds to run (vs 12 for your original code), but only used 4 MB of
> > memory top. The values of the dbm are pickled hashes. The keys could
> > be too, but I decided to just use a tab seperated string...

>
> But I think it used much more than 4 MB of memory if you count the
> amount of system cache that held that dbm file.


The dbm file is only 10 MB though, so that's 10 MB of extra memory used,
which *is* reclaimable!

> Without the caching there'd have had to be real disk seeks for all
> those dbm updates. So I think if you had a lot more data, enough
> that you couldn't keep the dbm in cache any more, you'd get a
> horrible slowdown.


You've only got to keep the index of the dbm in cache, and it's designed
for that purpose.

You do get lots of disk I/O when making the dbm though - I suspect it
does a lot of fsync() - you may be able to turn that off.

> That's why you may be better off with a sorting-based method, that
> only needs sequential disk operations and not a lot of random
> seeks.


A sorting based method will be better for huge datasets that are only
compiled once. I would have thought a dbm would win for lots of
updates and medium sized datasets.

Thomas Rast
10-04-2004
Nick Craig-Wood writes:

> mmap("/dev/zero") behaves in pretty much the same way (ie it shrinks
> and grows), except for large allocation (>4k I think) the allocation
> is done in its own mmap() area. This means that when it is freed it
> completely disappears.


Ok. That explains why e.g.

>>> l = [0] * 10000000 # 10 million
>>> del l


frees the memory used for l, as expected.

> Thats a neat idea. You could also build a dbm in the forked
> process.


I've implemented it that way now. Using a pipe to transfer the data
to the dbm appears to essentially double the I/O overhead and is very
inefficient.
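Something along those lines, as a minimal sketch: do the memory-hungry build
in a fork()ed child that writes the dbm directly (no pipe), then reopen the
finished file in the parent. fill_from_logs() is a hypothetical stand-in for
the real builder:

#!/usr/bin/python
import anydbm
import os

def build_in_child(dbfile, logfiles):
    pid = os.fork()
    if pid == 0:
        # Child: pays the 80MB peak, writes the dbm, then exits,
        # returning all of its memory to the OS.
        db = anydbm.open(dbfile, 'n')
        fill_from_logs(db, logfiles)   # hypothetical builder, not shown
        db.close()
        os._exit(0)
    # Parent: wait for the child, then open the result read-only.
    os.waitpid(pid, 0)
    return anydbm.open(dbfile, 'r')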

> > is there an easier way to get my RAM back?

>
> I don't think so, but hopefully an expert who knows something about
> how python allocates its objects will speak up now!


Well, can't blame Python if you'd need black magic even in C.

Thanks for the explanations!

Thomas

Thomas Rast
10-04-2004
Nick Craig-Wood writes:

> Paul Rubin wrote:
> > But I think it used much more than 4 MB of memory if you count the
> > amount of system cache that held that dbm file.

>
> The dbm file is only 10 MB though, so thats 10MB of memory extra used
> which *is* reclaimable!


Just to clear this up, the problem isn't that the process takes too
much memory while initializing. The machine that runs it has 256MB
RAM, and for all I care it can fill all available memory (when the
logfiles have grown to about four times what they are today, it
will...). The problem was that it wouldn't free *unused* RAM after
initialization.

Anyway, it works now. Thanks for your help Nick and Paul!

- Thomas

Andrew Dalke
10-04-2004
Thomas Rast wrote:
> I've implemented it that way now. Using a pipe to transfer the data
> to the dbm appears to essentially double the I/O overhead and is very
> inefficient.

...
> Well, can't blame Python if you'd need black magic even in C


You could also try using shared memory. There's the raw
layer, like http://0pointer.de/lennart/projects/libshbuf/
or a higher layer like POSH ( http://poshmodule.sourceforge.net/ ).

I've used neither. The latter only works on Intel (and
perhaps only Linux) because it does locking with some inline
asm.

Andrew