Go Back   Velocity Reviews > Newsgroups > Python
User Name
Password
Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply

Python - Python dictionary size/entry limit?

 
Thread Tools Search this Thread
Old 02-21-2009, 09:35 AM   #1
Default Python dictionary size/entry limit?


I wrote a script to process textual data and extract phrases from
them, storing these phrases in a dictionary. It encounters a
MemoryError when there are about 11.18M keys in the dictionary, and
the size is about 1.5GB. I tried multiple times, and the error occurs
everytime at exactly the same place (with the same number of keys in
the dict). I then split the dictionary into two using a simple
algorithm:

if str[0]<='m':
dict=dict1
else:
dict=dict2

#use dict...

And it worked fine. The total size of the two dictionaries well
exceeded 2GB yet no MemoryError occured.

I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to
the size or number of entries that a single dictionary can possess? By
searching on the web I can't find a clue why this problem occurs.


intelliminer@gmail.com
  Reply With Quote
Old 02-21-2009, 10:45 AM   #2
intelliminer@gmail.com
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
On Feb 21, 6:25*pm, Tino Wildenhain <t...@wildenhain.de> wrote:
> intellimi...@gmail.com wrote:
> > I wrote a script to process textual data and extract phrases from
> > them, storing these phrases in a dictionary. It encounters a
> > MemoryError when there are about 11.18M keys in the dictionary, and
> > the size is about 1.5GB. I tried multiple times, and the error occurs
> > everytime at exactly the same place (with the same number of keys in
> > the dict). I then split the dictionary into two using a simple
> > algorithm:

>
> > if str[0]<='m':
> > * * dict=dict1
> > else:
> > * * dict=dict2

>
> > #use dict...

>
> > And it worked fine. The total size of the two dictionaries well
> > exceeded 2GB yet no MemoryError occured.

>
> > I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to
> > the size or number of entries that a single dictionary can possess? By
> > searching on the web I can't find a clue why this problem occurs.

>
> *From what can be deducted from the headers of your message:
> X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 5.1;...
> you are using windows?
> It seems either python or windows memory management somehow prevent
> the use of continuous memory areas that large.
> We've got such an example somewhere down the list which was similar
> (iirc it was a large string in memory) which runned perfectly
> with linux. You can try yourself maybe by installing ubuntu
> on the same host. (If you feel fit you can even skip the install
> and run it off life CD but then you need to fiddle a little to
> get swap space on disk)
>
> Regards
> Tino
>
> > --
> >http://mail.python.org/mailman/listinfo/python-list

>
>
>
> *smime.p7s
> 4KViewDownload


Yes, it's winxp, I forgot to mention it.
Thanks for the reply.


intelliminer@gmail.com
  Reply With Quote
Old 02-21-2009, 10:47 AM   #3
Stefan Behnel
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
wrote:
> I wrote a script to process textual data and extract phrases from
> them, storing these phrases in a dictionary. It encounters a
> MemoryError when there are about 11.18M keys in the dictionary, and
> the size is about 1.5GB.
> [...]
> I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to
> the size or number of entries that a single dictionary can possess? By
> searching on the web I can't find a clue why this problem occurs.


Python dicts are only limited by what your OS returns as free memory.
However, when a dict grows, it needs to resize, which means that it has to
create a bigger copy of itself and redistribute the keys. For a dict that
is already 1.5GB big, this can temporarily eat a lot more memory than you
have, at least more than two times as much as the size of the dict itself.

You may be better served with one of the dbm databases that come with
Python. They live on-disk but do the usual in-memory caching. They'll
likely perform a lot better than your OS level swap file.

Stefan


Stefan Behnel
  Reply With Quote
Old 02-22-2009, 02:03 AM   #4
Sean
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
Stefan Behnel wrote:
> wrote:
> You may be better served with one of the dbm databases that come with
> Python. They live on-disk but do the usual in-memory caching. They'll
> likely perform a lot better than your OS level swap file.
>
> Stefan


the bsddb module has the feature that access to the database uses the
same methods as python dictionaries.

Sean


Sean
  Reply With Quote
Old 02-22-2009, 07:08 AM   #5
intelliminer@gmail.com
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
On Feb 21, 6:47*pm, Stefan Behnel <stefan...@behnel.de> wrote:
> intellimi...@gmail.com wrote:
> > I wrote a script to process textual data and extract phrases from
> > them, storing these phrases in a dictionary. It encounters a
> > MemoryError when there are about 11.18M keys in the dictionary, and
> > the size is about 1.5GB.
> > [...]
> > I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to
> > the size or number of entries that a single dictionary can possess? By
> > searching on the web I can't find a clue why this problem occurs.

>
> Python dicts are only limited by what your OS returns as free memory.
> However, when a dict grows, it needs to resize, which means that it has to
> create a bigger copy of itself and redistribute the keys. For a dict that
> is already 1.5GB big, this can temporarily eat a lot more memory than you
> have, at least more than two times as much as the size of the dict itself..
>
> You may be better served with one of the dbm databases that come with
> Python. They live on-disk but do the usual in-memory caching. They'll
> likely perform a lot better than your OS level swap file.
>
> Stefan


Ummm, I didn't know about the dbm databases. It seems there are many
different
modules for this kind of tasks: gdbm, berkeley db, cdb, etc. I'm
needing to implement
a constant hashtable with a large number of keys, but only a small
fraction of them
will be accessed frequently, the read speed is crucial. It would be
ideal if
the implementation caches all the frequently used key/value pairs in
memory. Which
module should I use? And is there a way to specify the amount of
memory it uses for caching?
BTW, the target platform is Linux.

Thank you.


intelliminer@gmail.com
  Reply With Quote
Old 02-22-2009, 07:21 PM   #6
Stefan Behnel
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
wrote:
> Ummm, I didn't know about the dbm databases. It seems there are many
> different
> modules for this kind of tasks: gdbm, berkeley db, cdb, etc. I'm
> needing to implement
> a constant hashtable with a large number of keys, but only a small
> fraction of them
> will be accessed frequently, the read speed is crucial. It would be
> ideal if
> the implementation caches all the frequently used key/value pairs in
> memory. Which
> module should I use? And is there a way to specify the amount of
> memory it uses for caching?
> BTW, the target platform is Linux.


Their interfaces are mostly compatible to Python dicts. Just keep your code
independent at the beginning and benchmark it against all of them.

Stefan


Stefan Behnel
  Reply With Quote
Old 02-22-2009, 08:11 PM   #7
Martin v. Löwis
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
wrote:
> Is there a limit to the size or number of entries that a single
> dictionary can possess?


On a 32-bit system, the dictionary can have up to 2**31 slots,
meaning that the maximum number of keys is slightly smaller
(about 2**30).

As others have pointed out, Python's or Windows' memory management
might have led to fragmentation, so that no contiguous piece of
memory to hold the resized dictionary can be found.

One solution to that problem would be to use a 64-bit system.

Regards,
Martin



Martin v. Löwis
  Reply With Quote
Old 02-25-2009, 02:39 PM   #8
Stefan Behnel
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
Martin v. Löwis wrote:
> wrote:
>> Is there a limit to the size or number of entries that a single
>> dictionary can possess?

>
> On a 32-bit system, the dictionary can have up to 2**31 slots,
> meaning that the maximum number of keys is slightly smaller
> (about 2**30).


Which, in practice, means that the size is limited by the available memory.

Stefan


Stefan Behnel
  Reply With Quote
Old 02-25-2009, 08:19 PM   #9
Martin v. Löwis
 
Posts: n/a
Default Re: Python dictionary size/entry limit?
>> On a 32-bit system, the dictionary can have up to 2**31 slots,
>> meaning that the maximum number of keys is slightly smaller
>> (about 2**30).

>
> Which, in practice, means that the size is limited by the available memory.


Right. Each slot takes 12 bytes, so the storage for the slots alone
would consume all available address space.

From that point of view, you can't possibly have more than 314M slots
in a 32-bit address space (roughly 2**2.

Regards,
Martin


Martin v. Löwis
  Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




SEO by vBSEO 3.3.2 ©2009, Crawlability, Inc.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46