Velocity Reviews


Michael Bacarella 11-12-2007 05:46 PM

RE: Populating a dictionary, fast [SOLVED SOLVED]
 
> > You can download the list of keys from here, it's 43M gzipped:
> > http://www.sendspace.com/file/9530i7
> >
> > and see it take about 45 minutes with this:
> >
> > $ cat cache-keys.py
> > #!/usr/bin/python
> > v = {}
> > for line in open('keys.txt'):
> >     v[long(line.strip())] = True
> >
> >

> It takes about 20 seconds for me. It's possible it's related to
> int/long unification - try using Python 2.5. If you can't switch to
> 2.5, try using string keys instead of longs.


Yes, this was it. It ran *very* fast on Python v2.5.

Terribly on v2.4, v2.3.

(I thought I had already evaluated v2.5, but I see now that on the server
with 2.5 installed, 'python' invokes 2.3.)

Thanks!
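For readers on a current interpreter, here is a minimal Python 3 sketch of the cache-keys.py loop quoted above, with the string-key workaround as an option. It assumes one integer key per line; `load_keys` and its `key_type` parameter are hypothetical names, and in Python 3 `int` already covers what Python 2 called `long`.

```python
import io


def load_keys(lines, key_type=int):
    """Return a dict mapping every non-blank line in `lines` to True.

    key_type=int parses each line as an integer (Python 3's int also covers
    Python 2's long); key_type=str keeps the stripped text as the key, which
    is the string-key workaround suggested in the thread.
    """
    v = {}
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines such as a trailing empty line
            v[key_type(line)] = True
    return v


if __name__ == "__main__":
    # Stand-in for open('keys.txt'); the real key file is not reproduced here.
    sample = io.StringIO("12345678901234\n98765432109876\n\n42\n")
    print(len(load_keys(sample)))  # 3
    sample.seek(0)
    print(sorted(load_keys(sample, key_type=str)))  # ['12345678901234', '42', '98765432109876']
```

With the real data you would pass `open('keys.txt')` instead of the `StringIO` stand-in.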


Aaron Watters 11-14-2007 02:43 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
On Nov 12, 12:46 pm, "Michael Bacarella" <m...@gpshopper.com> wrote:
>
> > It takes about 20 seconds for me. It's possible it's related to
> > int/long unification - try using Python 2.5. If you can't switch to
> > 2.5, try using string keys instead of longs.

>
> Yes, this was it. It ran *very* fast on Python v2.5.


Um. Is this the takeaway from this thread? Longs as dictionary
keys are bad? Only for older versions of Python?

This could be a problem for people like me who build
lots of structures using seek values, which are longs, as done in
http://nucular.sourceforge.net and http://bplusdotnet.sourceforge.net
and elsewhere. Someone please summarize.

-- Aaron Watters
===
http://www.xfeedme.com/nucular/pydis...=white%20trash
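As a concrete picture of "structures using seek values": a minimal Python 3 sketch that indexes each line of a binary file by the byte offset `tell()` reports, so a record can later be reached with `seek()`. `build_offset_index` is a hypothetical helper for illustration, not code from nucular or bplusdotnet; in Python 3 the offsets are ordinary `int` keys.

```python
import io


def build_offset_index(f):
    """Map the byte offset of each line in binary file `f` to its contents.

    The offsets are the values f.tell() returns and f.seek() accepts,
    so they can be used later to jump straight to a record.
    """
    index = {}
    offset = f.tell()
    # iter(f.readline, b"") stops at EOF; readline keeps tell() usable.
    for line in iter(f.readline, b""):
        index[offset] = line.rstrip(b"\n")
        offset = f.tell()
    return index


if __name__ == "__main__":
    data = io.BytesIO(b"alpha\nbeta\ngamma\n")
    print(build_offset_index(data))  # {0: b'alpha', 6: b'beta', 11: b'gamma'}
    data.seek(6)
    print(data.readline())  # b'beta\n'
```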


Hrvoje Niksic 11-14-2007 05:16 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
Aaron Watters <aaron.watters@gmail.com> writes:

> On Nov 12, 12:46 pm, "Michael Bacarella" <m...@gpshopper.com> wrote:
>>
>> > It takes about 20 seconds for me. It's possible it's related to
>> > int/long unification - try using Python 2.5. If you can't switch to
>> > 2.5, try using string keys instead of longs.

>>
>> Yes, this was it. It ran *very* fast on Python v2.5.

>
> Um. Is this the takeaway from this thread? Longs as dictionary
> keys are bad? Only for older versions of Python?


It sounds like Python 2.4 (and previous versions) had a bug when
populating large dicts on 64-bit architectures.

> Someone please summarize.


Yes, that would be good.
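Since both the interpreter version (Michael's server launched 2.3 for 'python') and the word size (the 64-bit theory) matter here, a script can report them itself. A minimal Python 3 sketch; `interpreter_info` and `require_version` are hypothetical helper names, and the (2, 5) default simply mirrors the thread.

```python
import struct
import sys


def interpreter_info():
    """Report which interpreter is running and whether it is a 64-bit build."""
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
        # Size of a C pointer in bits: 64 on a 64-bit build, 32 on a 32-bit one.
        "pointer_bits": struct.calcsize("P") * 8,
        # Equivalent check: sys.maxsize exceeds 2**32 only on 64-bit builds.
        "is_64bit": sys.maxsize > 2**32,
    }


def require_version(minimum=(2, 5)):
    """Exit early if an older interpreter than expected started the script."""
    if sys.version_info < minimum:
        raise SystemExit("need Python %d.%d+, got %s (%s)" % (
            minimum[0], minimum[1], sys.version.split()[0], sys.executable))


if __name__ == "__main__":
    require_version()
    print(interpreter_info())
```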

Steven D'Aprano 11-14-2007 11:26 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote:

> Aaron Watters <aaron.watters@gmail.com> writes:
>
>> On Nov 12, 12:46 pm, "Michael Bacarella" <m...@gpshopper.com> wrote:
>>>
>>> > It takes about 20 seconds for me. It's possible it's related to
>>> > int/long unification - try using Python 2.5. If you can't switch to
>>> > 2.5, try using string keys instead of longs.
>>>
>>> Yes, this was it. It ran *very* fast on Python v2.5.

>>
>> Um. Is this the takeaway from this thread? Longs as dictionary keys
>> are bad? Only for older versions of Python?

>
> It sounds like Python 2.4 (and previous versions) had a bug when
> populating large dicts on 64-bit architectures.


No, I found very similar behaviour with Python 2.5.


>> Someone please summarize.

>
> Yes, that would be good.



On systems with multiple CPUs or 64-bit systems, or both, creating and/or
deleting a multi-megabyte dictionary in recent versions of Python (2.3,
2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
compared to seconds if the system only has a single CPU. Turning garbage
collection off doesn't help.


--
Steven.
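To check the creation/deletion claim (including the remark about garbage collection) on a given machine, here is a minimal Python 3 timing sketch. `time_build_and_delete` is a hypothetical helper and the key count is arbitrary; `gc.disable()` only switches off CPython's cyclic collector, so reference counting still frees the dict on `del`.

```python
import gc
import time


def time_build_and_delete(n, disable_gc=False):
    """Return (build_seconds, delete_seconds) for a dict of n int keys.

    With disable_gc=True the cyclic garbage collector is switched off
    during the measurement and its previous state is restored afterwards.
    """
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        d = {}
        for i in range(n):
            d[i] = i
        built = time.perf_counter()
        del d
        deleted = time.perf_counter()
    finally:
        if was_enabled:
            gc.enable()
    return built - start, deleted - built


if __name__ == "__main__":
    for flag in (False, True):
        build, delete = time_build_and_delete(1_000_000, disable_gc=flag)
        print(f"gc disabled={flag}: build {build:.3f}s, delete {delete:.3f}s")
```

Raising `n` step by step shows whether times grow roughly linearly or suddenly jump, which is the kind of minimal reproduction the thread is asking for.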

Aaron Watters 11-15-2007 02:40 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
On Nov 14, 6:26 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> >> Someone please summarize.

>
> > Yes, that would be good.

>
> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU. Turning garbage
> collection off doesn't help.
>
> --
> Steven.


Criminy... Any root cause? Patch?

BTW, I think I've seen this, but I think you need
to get into tens of megabytes or more before it becomes
critical.

Note: I know someone will say "don't scare off the newbies"
but in my experience most Python programmers are highly
experienced professionals who need to know this sort of thing.
The bulk of the newbies are either off in VB land
or struggling with Java.

-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...EXT=silly+walk

Aaron Watters 11-15-2007 04:40 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
On Nov 14, 6:26 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU. Turning garbage
> collection off doesn't help.


FWIW, testing on a 2-CPU 64-bit machine with 1 GB of real memory, I
consistently run out of real memory before I see this effect, so I guess
it only kicks in for dicts that consume more than that. That's better
than I feared, at any rate...

-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...+nasty+windows

Chris Mellon 11-15-2007 04:51 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
On Nov 14, 2007 5:26 PM, Steven D'Aprano
<steve@remove-this-cybersource.com.au> wrote:
> On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote:
>
> > Aaron Watters <aaron.watters@gmail.com> writes:
> >
> >> On Nov 12, 12:46 pm, "Michael Bacarella" <m...@gpshopper.com> wrote:
> >>>
> >>> > It takes about 20 seconds for me. It's possible it's related to
> >>> > int/long unification - try using Python 2.5. If you can't switch to
> >>> > 2.5, try using string keys instead of longs.
> >>>
> >>> Yes, this was it. It ran *very* fast on Python v2.5.
> >>
> >> Um. Is this the takeaway from this thread? Longs as dictionary keys
> >> are bad? Only for older versions of Python?

> >
> > It sounds like Python 2.4 (and previous versions) had a bug when
> > populating large dicts on 64-bit architectures.

>
> No, I found very similar behaviour with Python 2.5.
>
>
> >> Someone please summarize.

> >
> > Yes, that would be good.

>
>
> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU. Turning garbage
> collection off doesn't help.
>
>


I can't duplicate this on a dual-CPU system (64-bit, but running in
32-bit mode with a 32-bit OS). I added keys to a dict until I ran out
of memory (a bit over 22 million keys), and deleting the dict took
about 8 seconds (timed with a stopwatch, so not very precise, but
obviously less than 30 minutes).

>>> d = {}
>>> idx = 0
>>> while idx < 1e10:
...     d[idx] = idx
...     idx += 1
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
MemoryError
>>> len(d)
22369622
>>> del d
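Rather than filling memory until `MemoryError`, one can measure a bounded dict and extrapolate. A minimal Python 3 sketch using the standard `tracemalloc` module (3.4+); `bytes_per_entry` is a hypothetical helper, and since tracemalloc only sees allocations made through Python's allocators, the figure is an approximation that varies by version and build.

```python
import tracemalloc


def bytes_per_entry(n):
    """Estimate traced memory per entry for a dict of n int keys/values.

    Keys and values are the same int objects, so this counts one int
    plus the dict's table space per entry.
    """
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        d = {i: i for i in range(n)}
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    size = (after - before) / n
    del d
    return size


if __name__ == "__main__":
    per_entry = bytes_per_entry(1_000_000)
    print(f"~{per_entry:.0f} bytes per entry")
    # Rough ceiling for a hypothetical 1 GiB budget available to the process.
    print(f"~{(1 << 30) / per_entry / 1e6:.1f} million entries per GiB")
```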


Istvan Albert 11-15-2007 07:11 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
On Nov 14, 6:26 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU.


Please don't propagate this nonsense. If you see this, then the problem
exists between the chair and the monitor.

There is nothing wrong with either creating or deleting
dictionaries.

i.

Aaron Watters 11-15-2007 07:53 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
On Nov 15, 2:11 pm, Istvan Albert <istvan.alb...@gmail.com> wrote:
> There is nothing wrong with either creating or deleting
> dictionaries.


I suspect what happened is this: on 64-bit
machines the data structures for creating dictionaries
are larger (because pointers take twice as much space),
so you run into memory contention issues sooner than
on 32-bit machines with similar memory sizes.
If there is something deeper going
on, please correct me; I would very much like to know.

-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...T=alien+friend
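One way to look at the pointer-size hypothesis is to compare the size of the dict object itself on 32-bit and 64-bit builds. A minimal Python 3 sketch; `dict_container_bytes` is a hypothetical helper, and `sys.getsizeof` counts only the container (its hash table of hashes and pointers), not the key and value objects.

```python
import struct
import sys


def dict_container_bytes(n):
    """Size in bytes of the dict object itself for n int keys.

    Only the container and its table are counted; the int objects
    referenced by the table are not included.
    """
    return sys.getsizeof({i: None for i in range(n)})


if __name__ == "__main__":
    print("pointer size:", struct.calcsize("P"), "bytes")
    for n in (0, 1_000, 1_000_000):
        print(n, dict_container_bytes(n))
```

Running it under a 32-bit and a 64-bit interpreter of the same version shows how much of the growth comes from word size alone.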

Hrvoje Niksic 11-15-2007 08:51 PM

Re: Populating a dictionary, fast [SOLVED SOLVED]
 
Steven D'Aprano <steve@REMOVE-THIS-cybersource.com.au> writes:

>>> Someone please summarize.

>>
>> Yes, that would be good.

>
> On systems with multiple CPUs or 64-bit systems, or both, creating and/or
> deleting a multi-megabyte dictionary in recent versions of Python (2.3,
> 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,
> compared to seconds if the system only has a single CPU.


Can you post minimal code that exhibits this behavior on Python 2.5.1?
The OP posted a lot of different versions, most of which worked just
fine for most people.


All times are GMT.