![]() |
|
|
|||||||
![]() |
Python - Python dictionary size/entry limit? |
|
|
Thread Tools | Search this Thread |
|
|
#1 |
|
I wrote a script to process textual data and extract phrases from
them, storing these phrases in a dictionary. It encounters a MemoryError when there are about 11.18M keys in the dictionary, and the size is about 1.5GB. I tried multiple times, and the error occurs everytime at exactly the same place (with the same number of keys in the dict). I then split the dictionary into two using a simple algorithm: if str[0]<='m': dict=dict1 else: dict=dict2 #use dict... And it worked fine. The total size of the two dictionaries well exceeded 2GB yet no MemoryError occured. I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to the size or number of entries that a single dictionary can possess? By searching on the web I can't find a clue why this problem occurs. intelliminer@gmail.com |
|
|
|
|
#2 |
|
Posts: n/a
|
On Feb 21, 6:25*pm, Tino Wildenhain <t...@wildenhain.de> wrote:
> intellimi...@gmail.com wrote: > > I wrote a script to process textual data and extract phrases from > > them, storing these phrases in a dictionary. It encounters a > > MemoryError when there are about 11.18M keys in the dictionary, and > > the size is about 1.5GB. I tried multiple times, and the error occurs > > everytime at exactly the same place (with the same number of keys in > > the dict). I then split the dictionary into two using a simple > > algorithm: > > > if str[0]<='m': > > * * dict=dict1 > > else: > > * * dict=dict2 > > > #use dict... > > > And it worked fine. The total size of the two dictionaries well > > exceeded 2GB yet no MemoryError occured. > > > I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to > > the size or number of entries that a single dictionary can possess? By > > searching on the web I can't find a clue why this problem occurs. > > *From what can be deducted from the headers of your message: > X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 5.1;... > you are using windows? > It seems either python or windows memory management somehow prevent > the use of continuous memory areas that large. > We've got such an example somewhere down the list which was similar > (iirc it was a large string in memory) which runned perfectly > with linux. You can try yourself maybe by installing ubuntu > on the same host. (If you feel fit you can even skip the install > and run it off life CD but then you need to fiddle a little to > get swap space on disk) > > Regards > Tino > > > -- > >http://mail.python.org/mailman/listinfo/python-list > > > > *smime.p7s > 4KViewDownload Yes, it's winxp, I forgot to mention it. Thanks for the reply. intelliminer@gmail.com |
|
|
|
#3 |
|
Posts: n/a
|
wrote:
> I wrote a script to process textual data and extract phrases from > them, storing these phrases in a dictionary. It encounters a > MemoryError when there are about 11.18M keys in the dictionary, and > the size is about 1.5GB. > [...] > I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to > the size or number of entries that a single dictionary can possess? By > searching on the web I can't find a clue why this problem occurs. Python dicts are only limited by what your OS returns as free memory. However, when a dict grows, it needs to resize, which means that it has to create a bigger copy of itself and redistribute the keys. For a dict that is already 1.5GB big, this can temporarily eat a lot more memory than you have, at least more than two times as much as the size of the dict itself. You may be better served with one of the dbm databases that come with Python. They live on-disk but do the usual in-memory caching. They'll likely perform a lot better than your OS level swap file. Stefan Stefan Behnel |
|
|
|
#4 |
|
Posts: n/a
|
Stefan Behnel wrote:
> wrote: > You may be better served with one of the dbm databases that come with > Python. They live on-disk but do the usual in-memory caching. They'll > likely perform a lot better than your OS level swap file. > > Stefan the bsddb module has the feature that access to the database uses the same methods as python dictionaries. Sean Sean |
|
|
|
#5 |
|
Posts: n/a
|
On Feb 21, 6:47*pm, Stefan Behnel <stefan...@behnel.de> wrote:
> intellimi...@gmail.com wrote: > > I wrote a script to process textual data and extract phrases from > > them, storing these phrases in a dictionary. It encounters a > > MemoryError when there are about 11.18M keys in the dictionary, and > > the size is about 1.5GB. > > [...] > > I have 1GB of pysical memory and 3GB in pagefile. Is there a limit to > > the size or number of entries that a single dictionary can possess? By > > searching on the web I can't find a clue why this problem occurs. > > Python dicts are only limited by what your OS returns as free memory. > However, when a dict grows, it needs to resize, which means that it has to > create a bigger copy of itself and redistribute the keys. For a dict that > is already 1.5GB big, this can temporarily eat a lot more memory than you > have, at least more than two times as much as the size of the dict itself.. > > You may be better served with one of the dbm databases that come with > Python. They live on-disk but do the usual in-memory caching. They'll > likely perform a lot better than your OS level swap file. > > Stefan Ummm, I didn't know about the dbm databases. It seems there are many different modules for this kind of tasks: gdbm, berkeley db, cdb, etc. I'm needing to implement a constant hashtable with a large number of keys, but only a small fraction of them will be accessed frequently, the read speed is crucial. It would be ideal if the implementation caches all the frequently used key/value pairs in memory. Which module should I use? And is there a way to specify the amount of memory it uses for caching? BTW, the target platform is Linux. Thank you. intelliminer@gmail.com |
|
|
|
#6 |
|
Posts: n/a
|
wrote:
> Ummm, I didn't know about the dbm databases. It seems there are many > different > modules for this kind of tasks: gdbm, berkeley db, cdb, etc. I'm > needing to implement > a constant hashtable with a large number of keys, but only a small > fraction of them > will be accessed frequently, the read speed is crucial. It would be > ideal if > the implementation caches all the frequently used key/value pairs in > memory. Which > module should I use? And is there a way to specify the amount of > memory it uses for caching? > BTW, the target platform is Linux. Their interfaces are mostly compatible to Python dicts. Just keep your code independent at the beginning and benchmark it against all of them. Stefan Stefan Behnel |
|
|
|
#7 |
|
Posts: n/a
|
wrote:
> Is there a limit to the size or number of entries that a single > dictionary can possess? On a 32-bit system, the dictionary can have up to 2**31 slots, meaning that the maximum number of keys is slightly smaller (about 2**30). As others have pointed out, Python's or Windows' memory management might have led to fragmentation, so that no contiguous piece of memory to hold the resized dictionary can be found. One solution to that problem would be to use a 64-bit system. Regards, Martin Martin v. Löwis |
|
|
|
#8 |
|
Posts: n/a
|
Martin v. Löwis wrote:
> wrote: >> Is there a limit to the size or number of entries that a single >> dictionary can possess? > > On a 32-bit system, the dictionary can have up to 2**31 slots, > meaning that the maximum number of keys is slightly smaller > (about 2**30). Which, in practice, means that the size is limited by the available memory. Stefan Stefan Behnel |
|
|
|
#9 |
|
Posts: n/a
|
>> On a 32-bit system, the dictionary can have up to 2**31 slots,
>> meaning that the maximum number of keys is slightly smaller >> (about 2**30). > > Which, in practice, means that the size is limited by the available memory. Right. Each slot takes 12 bytes, so the storage for the slots alone would consume all available address space. From that point of view, you can't possibly have more than 314M slots in a 32-bit address space (roughly 2**2 Regards, Martin Martin v. Löwis |
|