RE: Populating a dictionary, fast [SOLVED SOLVED]

 
 
Hrvoje Niksic
 
      11-16-2007
Steven D'Aprano <(E-Mail Removed)> writes:

>> Can you post minimal code that exhibits this behavior on Python 2.5.1?
>> The OP posted a lot of different versions, most of which worked just
>> fine for most people.

>
> Who were testing it on single-CPU, 32 bit systems.


Still, I'd like to see a test case that fails (works slowly) for you,
so that I (and others) can try it on different machines. As I said,
the OP posted several versions of his code, which tended to depend on a
specific dataset. A minimal test case would help more people debug
it.
 
Michael Bacarella
 
      11-16-2007
> Do you really believe that you cannot create or delete a large
> dictionary with python versions less than 2.5 (on a 64 bit or multi-
> cpu system)? That a bug of this magnitude has not been noticed until
> someone posted on clp?


You're right, it is completely inappropriate for us to be showing our
dirty laundry to the public. From now on we will try to deal with our
troubles privately.

I am so sorry to have ruined the decorum.

 
Istvan Albert
 
      11-16-2007
On Nov 16, 1:18 pm, "Michael Bacarella" <(E-Mail Removed)> wrote:

> You're right, it is completely inappropriate for us to be showing our
> dirty laundry to the public.


You are misinterpreting my words on many levels

(and I, of course, could have refrained from the "chair-monitor" jab as
well).

Anyhow, it is what it is. I could not reproduce any of the weird
behaviors myself, and I have nothing more to add to this discussion.
 
 
Jeffrey Froman
 
      11-16-2007
Steven D'Aprano wrote:

> Can you try it running in 64-bit mode?


Here are my results using the following test.py:
$ cat test.py
#!/usr/bin/python
import time
print "Starting: %s" % time.ctime()
v = {}
for line in open('keys.txt'):
    v[long(line.strip())] = True
print "Finished: %s" % time.ctime()


32-bit architecture:
-----------------------------------------
[machine1]$ python2.3 test.py
Starting: Fri Nov 16 11:51:22 2007
Finished: Fri Nov 16 11:52:39 2007

[machine2]$ python2.5 test.py
Starting: Fri Nov 16 11:57:57 2007
Finished: Fri Nov 16 11:58:39 2007


64-bit architecture (64-bit mode):
-----------------------------------------
[machine3]$ python2.3 test.py
Starting: Fri Nov 16 11:51:44 2007
Finished: Fri Nov 16 12:31:54 2007

[machine3]$ python2.5 test.py
Starting: Fri Nov 16 11:50:03 2007
Finished: Fri Nov 16 11:50:31 2007
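
The start and finish stamps above are easier to compare as elapsed seconds. The following is only a small sketch in modern Python 3 (not the Python 2.3 to 2.5 interpreters under test); the `runs` labels and the `elapsed_seconds` helper are names made up for this illustration, and the timestamps are copied from the output above.

```python
from datetime import datetime

# Format produced by time.ctime(), e.g. "Fri Nov 16 11:51:22 2007".
# Parsing the day and month names assumes an English/C locale.
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"

# (start, finish) pairs copied from the results above; labels are illustrative.
runs = {
    "32-bit, python2.3 (machine1)": ("Fri Nov 16 11:51:22 2007", "Fri Nov 16 11:52:39 2007"),
    "32-bit, python2.5 (machine2)": ("Fri Nov 16 11:57:57 2007", "Fri Nov 16 11:58:39 2007"),
    "64-bit, python2.3 (machine3)": ("Fri Nov 16 11:51:44 2007", "Fri Nov 16 12:31:54 2007"),
    "64-bit, python2.5 (machine3)": ("Fri Nov 16 11:50:03 2007", "Fri Nov 16 11:50:31 2007"),
}


def elapsed_seconds(start, finish):
    """Return the whole seconds between two ctime-style timestamps."""
    delta = datetime.strptime(finish, CTIME_FORMAT) - datetime.strptime(start, CTIME_FORMAT)
    return int(delta.total_seconds())


elapsed = {label: elapsed_seconds(start, finish) for label, (start, finish) in runs.items()}

for label, seconds in elapsed.items():
    print("%-30s %5d s" % (label, seconds))
```

This works out to 77 s and 42 s on the 32-bit machines, and 2410 s (about 40 minutes) versus 28 s for python2.3 and python2.5 on the 64-bit machine.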


Jeffrey
 
Steven D'Aprano
 
      11-16-2007
On Fri, 16 Nov 2007 11:24:24 +0100, Hrvoje Niksic wrote:

> Steven D'Aprano <(E-Mail Removed)> writes:
>
>>> Can you post minimal code that exhibits this behavior on Python 2.5.1?
>>> The OP posted a lot of different versions, most of which worked just
>>> fine for most people.

>>
>> Who were testing it on single-CPU, 32 bit systems.

>
> Still, I'd like to see a test case that fails (works slowly) for you, so
> that I (and others) can try it on different machines. As I said, the OP
> posted several versions of his code, which tended to depend on a specific
> dataset. A minimal test case would help more people debug it.



http://groups.google.com.au/group/co...3ceaf01db10a86


#!/usr/bin/python
"""Read a big file into a dict."""

import gc
import time
print "Starting at %s" % time.asctime()
flag = gc.isenabled()
gc.disable()
id2name = {}
for n, line in enumerate(open('id2name.txt', 'r')):
    if n % 1000000 == 0:
        # Give feedback.
        print "Line %d" % n
    id,name = line.strip().split(':', 1)
    id = long(id)
    id2name[id] = name
print "Items in dict:", len(id2name)
print "Completed import at %s" % time.asctime()
print "Starting to delete dict..."
del id2name
print "Completed deletion at %s" % time.asctime()
if flag:
    gc.enable()
print "Finishing at %s" % time.asctime()


I've since tried two variants. Keeping the dict keys as strings made no
difference to the speed, while keeping the data as a list of
(key, value) tuples made a HUGE difference.
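
For anyone without the original data file, the dict and list-of-tuples variants can be compared with something like the sketch below. It is written for modern Python 3, not the Python 2.x versions discussed in this thread, and it uses synthetic "id:name" lines generated in memory because id2name.txt is not available; `make_lines`, `timed`, `build_dict` and `build_pairs` are hypothetical helper names, and the line count is an arbitrary small value. It is not expected to reproduce the slowdown reported here, only to show the shape of the comparison.

```python
import gc
import random
import time


def make_lines(count, seed=0):
    """Return synthetic 'id:name' lines with unique ids (stand-in for id2name.txt)."""
    rng = random.Random(seed)
    ids = rng.sample(range(10 ** 12), count)
    return ["%d:name%d\n" % (key, n) for n, key in enumerate(ids)]


def parse(line):
    """Split one 'id:name' line into an (int, str) pair."""
    key, name = line.strip().split(":", 1)
    return int(key), name


def build_dict(lines):
    """Populate a dict keyed by integer id."""
    table = {}
    for line in lines:
        key, name = parse(line)
        table[key] = name
    return table


def build_pairs(lines):
    """Keep the same data as a list of (key, value) tuples."""
    return [parse(line) for line in lines]


def timed(build, lines):
    """Run build(lines) with the garbage collector off; return (result, seconds)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        start = time.perf_counter()
        result = build(lines)
        return result, time.perf_counter() - start
    finally:
        # Restore the collector only if it was on before, as in the script above.
        if was_enabled:
            gc.enable()


lines = make_lines(100000)
as_dict, dict_seconds = timed(build_dict, lines)
as_pairs, pairs_seconds = timed(build_pairs, lines)

print("dict: %d items in %.3f s" % (len(as_dict), dict_seconds))
print("list: %d items in %.3f s" % (len(as_pairs), pairs_seconds))
```

Raising the count passed to `make_lines`, or replacing it with lines read from a real file, is the obvious next step on a machine that shows the problem.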

You should also read this post here:
http://groups.google.com.au/group/co...35dc213bc45f84


showing Perl running very fast on the same machine that Python was
running like a one-legged sloth.



--
Steven.
 
 
Hendrik van Rooyen
 
      11-17-2007

"Michael Bacarella" <mb...opper.com> wrote:

> I am so sorry to have ruined the decorum.


Oh dear!

I confidently predict that this thread will now
degenerate to include such things as dignitas
and gravitas.

- Hendrik

 
 
harri
 
      11-20-2007
On Nov 15, 9:51 pm, "Michael Bacarella" <(E-Mail Removed)> wrote:
>
> Since some people missed the EUREKA!, here's the executive summary:
>
> Python2.3: about 45 minutes
> Python2.4: about 45 minutes
> Python2.5: about _30 seconds_



FYI, I tried on two 64-bit SMP machines (4-way and 2-way), running
Mandriva 2007 and 2008:


2.6.17-6mdv #1 SMP Wed Oct 25 12:17:57 MDT 2006 x86_64 AMD Opteron(tm)
Processor 280 GNU/Linux
Python 2.4.3 (#2, May 7 2007, 15:15:17)
[GCC 4.1.1 20060724 (prerelease) (4.1.1-3mdk)] on linux2


2.6.23.1-1mdvsmp #1 SMP Sat Oct 20 18:04:52 EDT 2007 x86_64 Intel(R)
Core(TM)2 CPU 6400 @ 2.13GHz GNU/Linux
Python 2.5.1 (r251:54863, Sep 13 2007, 09:02:56)
[GCC 4.2.1 20070828 (prerelease) (4.2.1-6mdv2008.0)] on linux2


I could not reproduce the problem with Steven's test file id2name.txt
on either.
sendspace seems to be over capacity, so I haven't been able to download
your keys.txt.

I wonder if this might be a glibc issue?
Is your own build of Python 2.5 linked with the same glibc version as the
system python?

What does
ldd /usr/bin/python
say vs.
ldd /usr/local/bin/python

?
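
A rough complement to the ldd comparison is to ask each interpreter what it reports about itself. The sketch below is for modern Python 3, and `describe_interpreter` is a name invented for this example. `platform.libc_ver()` only inspects the interpreter binary and can return empty strings, so it is a hint rather than a replacement for ldd.

```python
import platform
import sys


def describe_interpreter():
    """Collect a few facts about the running interpreter and its C library."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "executable": sys.executable,
        "python": platform.python_version(),
        "architecture": platform.architecture()[0],  # e.g. "64bit"
        "libc": libc_name,                           # may be "" if undetected
        "libc_version": libc_version,                # may be "" if undetected
    }


info = describe_interpreter()
for key in sorted(info):
    print("%-13s %s" % (key + ":", info[key]))
```

Running the same file with both /usr/bin/python and /usr/local/bin/python and comparing the output would show whether the two builds report different libc versions.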

Harri



 