Velocity Reviews - Computer Hardware Reviews


RE: Populating a dictionary, fast [SOLVED SOLVED]

 
 
Michael Bacarella
 
      11-12-2007
> > You can download the list of keys from here, it's 43M gzipped:
> > http://www.sendspace.com/file/9530i7
> >
> > and see it take about 45 minutes with this:
> >
> > $ cat cache-keys.py
> > #!/usr/bin/python
> > v = {}
> > for line in open('keys.txt'):
> >     v[long(line.strip())] = True
> >
> >

> It takes about 20 seconds for me. It's possible it's related to
> int/long unification - try using Python 2.5. If you can't switch to
> 2.5, try using string keys instead of longs.


Yes, this was it. It ran *very* fast on Python v2.5.

Terribly on v2.4, v2.3.

(I thought I had already evaluated v2.5, but I see now that the server
with 2.5 on it invokes 2.3 for 'python'.)
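Since the surprise here was which interpreter the name `python` actually launched, a quick check at the top of a script can catch it. A minimal sketch; `describe_interpreter` and `require_version` are hypothetical helper names, not something from this thread, and the code sticks to constructs that run on both old 2.x and current 3.x interpreters.

```python
import sys


def describe_interpreter():
    """Return 'MAJOR.MINOR at /path/to/python' for the running interpreter."""
    major, minor = sys.version_info[:2]
    return "%d.%d at %s" % (major, minor, sys.executable)


def require_version(minimum):
    """Raise RuntimeError if the running interpreter is older than `minimum`.

    `minimum` is a (major, minor) tuple such as (2, 5).
    """
    if tuple(sys.version_info[:2]) < tuple(minimum):
        raise RuntimeError("need Python %d.%d or newer, running %s"
                           % (minimum[0], minimum[1], describe_interpreter()))


if __name__ == "__main__":
    print(describe_interpreter())
    require_version((2, 5))
```

Running this with the exact command the shell or cron job uses shows which binary that name resolves to.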

Thanks!
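The workaround suggested above (string keys instead of longs) is easy to measure directly. A minimal Python 3 sketch, assuming a synthetic list of large integers stands in for the real `keys.txt`; in Python 3 `long` no longer exists and `int` covers arbitrary sizes, so the comparison becomes `int` versus `str`. `build_index` and `time_build` are hypothetical helper names.

```python
import time


def build_index(lines, key=int):
    """Map each stripped line, converted with `key`, to True (as in cache-keys.py)."""
    v = {}
    for line in lines:
        v[key(line.strip())] = True
    return v


def time_build(lines, key):
    """Return (seconds, dict) for building the index with the given key type."""
    start = time.perf_counter()
    d = build_index(lines, key)
    return time.perf_counter() - start, d


if __name__ == "__main__":
    # Synthetic stand-in for keys.txt: large integers, one per line.
    lines = ["%d\n" % (10 ** 15 + i * 7919) for i in range(200000)]
    for key in (int, str):
        secs, d = time_build(lines, key)
        print("%-3s keys: %d entries in %.3f s" % (key.__name__, len(d), secs))
```

To time the real file instead, pass the open file object, e.g. `build_index(open('keys.txt'))`, since iterating over a file yields its lines.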

 
Aaron Watters
 
      11-14-2007
On Nov 12, 12:46 pm, "Michael Bacarella" <(E-Mail Removed)> wrote:
>
> > It takes about 20 seconds for me. It's possible it's related to
> > int/long unification - try using Python 2.5. If you can't switch to
> > 2.5, try using string keys instead of longs.
>
> Yes, this was it. It ran *very* fast on Python v2.5.


Um. Is this the takeaway from this thread? Longs as dictionary
keys are bad? Only for older versions of Python?

This could be a problem for people like me who build
lots of structures using seek values, which are longs, as done in
http://nucular.sourceforge.net and http://bplusdotnet.sourceforge.net
and elsewhere. Someone please summarize.
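For structures like the ones described above, here is a minimal Python 3 sketch of a dict keyed by file seek offsets. In Python 3 the int/long split is gone, so offsets past 2**31 are ordinary `int` keys. `offset_index` and `read_at` are hypothetical names, and an in-memory `io.BytesIO` stands in for a real file; binary mode is assumed because text-mode `tell()` returns an opaque cookie rather than a byte offset.

```python
import io


def offset_index(f):
    """Map the byte offset at which each line starts to that line's bytes.

    `f` must be opened in binary mode, where tell() returns a plain int.
    """
    index = {}
    while True:
        pos = f.tell()          # offset of the line about to be read
        line = f.readline()
        if not line:            # empty bytes means end of file
            break
        index[pos] = line.rstrip(b"\n")
    return index


def read_at(f, pos):
    """Seek back to a stored offset and return the line found there."""
    f.seek(pos)
    return f.readline().rstrip(b"\n")


if __name__ == "__main__":
    buf = io.BytesIO(b"alpha\nbeta\ngamma\n")
    idx = offset_index(buf)
    print(idx)                  # keys are plain int offsets
    print(read_at(buf, 6))
```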

-- Aaron Watters
===
http://www.xfeedme.com/nucular/pydis...=white%20trash

 
Hrvoje Niksic
 
      11-14-2007
Aaron Watters <(E-Mail Removed)> writes:

> On Nov 12, 12:46 pm, "Michael Bacarella" <(E-Mail Removed)> wrote:
>>
>> > It takes about 20 seconds for me. It's possible it's related to
>> > int/long unification - try using Python 2.5. If you can't switch to
>> > 2.5, try using string keys instead of longs.
>>
>> Yes, this was it. It ran *very* fast on Python v2.5.
>
> Um. Is this the takeaway from this thread? Longs as dictionary
> keys are bad? Only for older versions of Python?


It sounds like Python 2.4 (and previous versions) had a bug when
populating large dicts on 64-bit architectures.

> Someone please summarize.


Yes, that would be good.
 
Steven D'Aprano
 
      11-14-2007
On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote:

> Aaron Watters <(E-Mail Removed)> writes:
>
>> On Nov 12, 12:46 pm, "Michael Bacarella" <(E-Mail Removed)> wrote:
>>>
>>> > It takes about 20 seconds for me. It's possible it's related to
>>> > int/long unification - try using Python 2.5. If you can't switch to
>>> > 2.5, try using string keys instead of longs.
>>>
>>> Yes, this was it. It ran *very* fast on Python v2.5.
>>
>> Um. Is this the takeaway from this thread? Longs as dictionary keys
>> are bad? Only for older versions of Python?
>
> It sounds like Python 2.4 (and previous versions) had a bug when
> populating large dicts on 64-bit architectures.


No, I found very similar behaviour with Python 2.5.


>> Someone please summarize.
>
> Yes, that would be good.



On systems with multiple CPUs or 64-bit systems, or both, creating and/or
deleting a multi-megabyte dictionary in recent versions of Python (2.3,
2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
compared to seconds if the system only has a single CPU. Turning garbage
collection off doesn't help.
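The claim that disabling garbage collection does not help can be checked on a given machine with a small harness. A minimal Python 3 sketch; `timed_build` is a hypothetical helper and `n` is an arbitrary example size, to be scaled to the workload being investigated. One relevant detail of modern CPython (not of the 2.x versions discussed here): a dict holding only ints is not tracked by the cyclic collector at all (see `gc.is_tracked`), which is consistent with gc settings making little difference for this kind of workload.

```python
import gc
import time


def timed_build(n, disable_gc):
    """Return the seconds taken to build {i: i for i in range(n)} in a loop.

    If `disable_gc` is true, the cyclic garbage collector is switched off
    while building and restored to its previous state afterwards.
    """
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        d = {}
        for i in range(n):
            d[i] = i
        return time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    n = 10 ** 6
    # In CPython 3 a dict of plain ints is not tracked by the cyclic collector.
    print("dict of ints tracked by gc:", gc.is_tracked({1: 1}))
    for disable in (False, True):
        label = "off" if disable else "on"
        print("gc %-3s: build %.3f s" % (label, timed_build(n, disable)))
```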


--
Steven.
 
Aaron Watters
 
      11-15-2007
On Nov 14, 6:26 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> >> Someone please summarize.
>
> > Yes, that would be good.
>
> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU. Turning garbage
> collection off doesn't help.
>
> --
> Steven.


criminy... Any root cause? patch?

btw, I think I've seen this, but I think you need
to get into 10s of megs or more before it becomes
critical.

Note: I know someone will say "don't scare off the newbies"
but in my experience most Python programmers are highly
experienced professionals who need to know this sort of thing.
The bulk of the newbies are either off in VB land
or struggling with java.

-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...EXT=silly+walk
 
Aaron Watters
 
      11-15-2007
On Nov 14, 6:26 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU. Turning garbage
> collection off doesn't help.


FWIW, testing on a 2-CPU 64-bit machine with 1 GB of real memory, I
consistently run out of real memory before I see this effect, so I guess
it kicks in for dicts that consume more than that. That's better than I
feared, at any rate...
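If the effect only appears once a dict outgrows physical memory, it helps to estimate how much memory a dict needs per entry. `sys.getsizeof` reports an object's own size but not the objects it refers to, so the keys and values have to be added separately. A minimal Python 3 sketch; `approx_dict_bytes` is a hypothetical helper and its result is only an approximation (it ignores allocator overhead and does not recurse into nested containers).

```python
import sys


def approx_dict_bytes(d):
    """Rough footprint: the dict's own table plus each distinct key/value.

    Objects shared between entries are counted once (by id); nested
    containers are not recursed into; allocator overhead is ignored.
    """
    seen = set()
    total = sys.getsizeof(d)
    for k, v in d.items():
        for obj in (k, v):
            if id(obj) not in seen:
                seen.add(id(obj))
                total += sys.getsizeof(obj)
    return total


if __name__ == "__main__":
    for n in (10 ** 4, 10 ** 5, 10 ** 6):
        d = {i: i for i in range(n)}
        size = approx_dict_bytes(d)
        print("%8d entries: ~%.1f MB (%.0f bytes/entry)"
              % (n, size / 1e6, size / n))
```

Dividing the available RAM by the bytes-per-entry figure gives a rough ceiling on how many entries fit before the machine starts swapping.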

-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...+nasty+windows
 
Chris Mellon
 
      11-15-2007
On Nov 14, 2007 5:26 PM, Steven D'Aprano
<(E-Mail Removed)> wrote:
> On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote:
>
> > Aaron Watters <(E-Mail Removed)> writes:
> >
> >> On Nov 12, 12:46 pm, "Michael Bacarella" <(E-Mail Removed)> wrote:
> >>>
> >>> > It takes about 20 seconds for me. It's possible it's related to
> >>> > int/long unification - try using Python 2.5. If you can't switch to
> >>> > 2.5, try using string keys instead of longs.
> >>>
> >>> Yes, this was it. It ran *very* fast on Python v2.5.
> >>
> >> Um. Is this the takeaway from this thread? Longs as dictionary keys
> >> are bad? Only for older versions of Python?
> >
> > It sounds like Python 2.4 (and previous versions) had a bug when
> > populating large dicts on 64-bit architectures.
>
> No, I found very similar behaviour with Python 2.5.
>
>
> >> Someone please summarize.
> >
> > Yes, that would be good.
>
>
> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU. Turning garbage
> collection off doesn't help.
>
>


I can't duplicate this in a dual CPU (64 bit, but running in 32 bit
mode with a 32 bit OS) system. I added keys to a dict until I ran out
of memory (a bit over 22 million keys) and deleting the dict took
about 8 seconds (with a stopwatch, so not very precise, but obviously
less than 30 minutes).

>>> d = {}
>>> idx = 0
>>> while idx < 1e10:
...     d[idx] = idx
...     idx += 1
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
MemoryError
>>> len(d)
22369622
>>> del d
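The deletion above was timed with a stopwatch. The same measurement can be taken in-process with `time.perf_counter()`, using a fixed entry count rather than growing the dict until `MemoryError`. A minimal Python 3 sketch; `time_delete` is a hypothetical helper and the sizes are arbitrary examples. It relies on CPython's reference counting, under which deleting the only reference frees the dict immediately.

```python
import time


def time_delete(n):
    """Build {i: i for i in range(n)}, then return the seconds spent deleting it."""
    d = {}
    for i in range(n):
        d[i] = i
    start = time.perf_counter()
    del d   # drops the only reference; CPython deallocates the dict here
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (10 ** 5, 10 ** 6):
        print("%8d entries: del took %.4f s" % (n, time_delete(n)))
```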

 
Istvan Albert
 
      11-15-2007
On Nov 14, 6:26 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU.


Please don't propagate this nonsense. If you see this then the problem
exists between the chair and monitor.

There is nothing wrong with either creating or deleting
dictionaries.

i.
 
Aaron Watters
 
      11-15-2007
On Nov 15, 2:11 pm, Istvan Albert <(E-Mail Removed)> wrote:
> There is nothing wrong with either creating or deleting
> dictionaries.


I suspect what happened is this: on 64 bit
machines the data structures for creating dictionaries
are larger (because pointers take twice as much space),
so you run into memory contention issues sooner than
on 32 bit machines, for similar memory sizes.
If there is something deeper going
on please correct me, I would very much like to know.
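The hypothesis above turns on pointer width. `struct.calcsize("P")` gives the size of a C `void *` in the running interpreter, which shows whether a build is 32- or 64-bit. The per-entry figure below is a back-of-the-envelope estimate that assumes CPython's layout, where each dict entry stores a hash plus key and value pointers, and that the hash field is pointer-sized (as on mainstream platforms); it excludes the key and value objects themselves and any unused table slots. `pointer_bits` and `entry_bytes` are hypothetical helper names.

```python
import struct
import sys


def pointer_bits():
    """Width of a C pointer in this interpreter build, in bits."""
    return struct.calcsize("P") * 8


def entry_bytes():
    """Bytes for one dict entry's hash + key pointer + value pointer.

    Assumes CPython's entry layout and a pointer-sized hash field; excludes
    the key/value objects themselves and any unused table slots.
    """
    return 3 * struct.calcsize("P")


if __name__ == "__main__":
    print("%d-bit build (sys.maxsize = %d)" % (pointer_bits(), sys.maxsize))
    print("about %d bytes per dict entry before counting keys and values"
          % entry_bytes())
```

On a 64-bit build this gives 24 bytes per entry versus 12 on a 32-bit build, the doubling described above; whether that alone explains the reported slowdowns is not established in this thread.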

-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...T=alien+friend
 
Hrvoje Niksic
 
      11-15-2007
Steven D'Aprano <(E-Mail Removed)> writes:

>>> Someone please summarize.
>>
>> Yes, that would be good.
>
> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU.


Can you post minimal code that exhibits this behavior on Python 2.5.1?
The OP posted a lot of different versions, most of which worked just
fine for most people.
 