cPickle alternative?

 
 
Drochom
08-15-2003
Hello,
I have a huge problem with loading a very simple structure into memory.
It is a list of tuples; it takes about 6 MB and consists of 100,000 elements.

import cPickle

plik = open("mealy", "r")
mealy = cPickle.load(plik)
plik.close()

this takes about 30 seconds!
How can I accelerate it?

Thanks in adv.



 
Alex Martelli
08-15-2003
Drochom wrote:

> Hello,
> I have a huge problem with loading a very simple structure into memory.
> It is a list of tuples; it takes about 6 MB and consists of 100,000 elements.
>
> import cPickle
>
> plik = open("mealy", "r")
> mealy = cPickle.load(plik)
> plik.close()
>
> this takes about 30 seconds!
> How can I accelerate it?
>
> Thanks in adv.

What protocol did you pickle your data with? The default (protocol 0,
ASCII text) is the slowest. I suggest you upgrade to Python 2.3 and
save your data with the new protocol 2 -- it's likely to be fastest.
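
For example (a minimal sketch, not Alex's code; it assumes the list is bound to a variable named mealy and reuses the "mealy" file name from the original post):

import cPickle

# protocol 2 is new in Python 2.3; binary protocols need a binary file mode
f = open("mealy", "wb")
cPickle.dump(mealy, f, 2)
f.close()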


Alex

 
Michael Peuser
08-15-2003
Hi,

I have no idea! I used a similar scheme the other day and made some
benchmarks (I *like* benchmarks!)

About 6 MB took 4 seconds to dump, and about the same to load, on an 800 MHz P3 laptop.
When using binary mode it went down to about 1.5 seconds (and the space to 2 MB).

This is OK, because I generally have trouble going faster than 1 MB/sec
with my 2" drive, processor and Python.

Python 2.3 seems to have an even more effective "protocol 2" mode.

Maybe your structures are *very* complex?

Kindly
Michael P



"Drochom" <(E-Mail Removed)> schrieb im Newsbeitrag
news:bhiqlg$9qj$(E-Mail Removed)...
> Hello,
> I have a huge problem with loading very simple structure into memory
> it is a list of tuples, it has 6MB and consists of 100000 elements
>
> >import cPickle

>
> >plik = open("mealy","r")
> >mealy = cPickle.load(plik)
> >plik.close()

>
> this takes about 30 seconds!
> How can I accelerate it?
>
> Thanks in adv.
>
>
>



 
Irmen de Jong
08-15-2003
Drochom wrote:
>> What protocol did you pickle your data with? The default (protocol 0,
>> ASCII text) is the slowest. I suggest you upgrade to Python 2.3 and
>> save your data with the new protocol 2 -- it's likely to be fastest.
>>
>> Alex
>
> Thanks
> I'm using the default protocol; I'm not sure if I can upgrade so simply,
> because I'm using many modules for Py2.2

Then use protocol 1 instead -- that has been the binary pickle protocol
for a long time, and works perfectly on Python 2.2.x
(and it's much faster than protocol 0 -- the text protocol)
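
For instance (a sketch, reusing the "mealy" file name from the original post; note that binary pickles should be written and read in 'wb'/'rb' mode, not 'w'/'r'):

import cPickle

f = open("mealy", "wb")
cPickle.dump(mealy, f, 1)     # protocol 1: binary, available on Python 2.2
f.close()

f = open("mealy", "rb")
mealy = cPickle.load(f)
f.close()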

--Irmen

 
Scott David Daniels
08-15-2003
Drochom wrote:
> Thanks for help
> Here is simple example:
> frankly speaking it's a graph with 100000 nodes:
> STRUCTURE:
> [(('k', 5, 0),), (('*', 0, 0),), (('t', 1, 1),), (('o', 2, 0),),
>  (('t', 3, 0),), (('a', 4, 0), ('o', 2, 0))]


Perhaps this matches your spec:

from random import randrange
import pickle, cPickle, time

source = [(chr(randrange(33, 127)), randrange(100000), randrange(i+50))
          for i in range(100000)]


def timed(module, flag, name='file.tmp'):
    start = time.time()
    dest = file(name, 'wb')
    module.dump(source, dest, flag)
    dest.close()
    mid = time.time()
    dest = file(name, 'rb')
    result = module.load(dest)
    dest.close()
    stop = time.time()
    assert source == result
    return mid-start, stop-mid

On 2.2:
timed(pickle, 0): (7.8, 5.5)
timed(pickle, 1): (9.5, 6.2)
timed(cPickle, 0): (0.41, 4.9)
timed(cPickle, 1): (0.15, .53)

On 2.3:
timed(pickle, 0): (6.2, 5.3)
timed(pickle, 1): (6.6, 5.4)
timed(pickle, 2): (6.5, 3.9)

timed(cPickle, 0): (6.2, 5.3)
timed(cPickle, 1): (.88, .69)
timed(cPickle, 2): (.80, .67)

(Not tightly controlled -- I'd guess 1.5 digits)

-Scott David Daniels

 
Drochom
08-15-2003

"Michael Peuser" <(E-Mail Removed)> wrote in message
news:bhj56t$1d8$03$(E-Mail Removed)-online.com...
> O.k. - I modified my test program - run it on your machine.
> It took 1.5 seconds - I made it 11 Million records to get to 2 Mbyte.
> Kindly
> Michael
> ------------------
> import cPickle as Pickle
> from time import clock
>
> # generate 1.000.000 records
> r=[(('k', 5, 0),), (('*', 0, 0),), (('t', 1, 1),), (('o', 2, 0),),
>    (('t', 3, 0),), (('a', 4, 0), ('o', 2, 0))]
>
> x=[]
>
> for i in xrange(1000000):
>     x.append(r)
>
> print len(x), "records"
>
> t0=clock()
> f=open("test","w")
> Pickle.dump(x,f,1)
> f.close()
> print "out=", clock()-t0
>
> t0=clock()
> f=open("test")
> x=Pickle.load(f)
> f.close()
> print "in=", clock()-t0
> ---------------------


Hi, I'm really grateful for your help.
I've modified your code a bit; check your times and tell me what they are.

TRY THIS:

import cPickle as Pickle
from time import clock
from random import randrange


x=[]

for i in xrange(20000):
    c = []
    for j in xrange(randrange(2,25)):
        c.append((chr(randrange(33,120)),randrange(1,100000),randrange(1,3)))
    c = tuple(c)
    x.append(c)
    if i%1000==0: print i  #it will help you to survive waiting...
print len(x), "records"

t0=clock()
f=open("test","w")
Pickle.dump(x,f,0)
f.close()
print "out=", clock()-t0


t0=clock()
f=open("test")
x=Pickle.load(f)
f.close()
print "in=", clock()-t0

Thanks once again



 
Drochom
08-15-2003
Hello,

> If speed is important, you may want to do different things depending on e.g.,
> what is in those tuples, and whether they are all the same length, etc. E.g.,
> if they were all fixed length tuples of integers, you could do hugely better
> than store the data as a list of tuples.

Those tuples have different lengths indeed.

> You could store the whole thing in a mmap image, with a length-prefixed pickle
> string in the front representing index info.

If I only knew how to do it...
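
One rough reading of that suggestion, as a sketch (this is not Bengt's code; the file layout, the function names and the use of record positions as dictionary keys are made up for illustration). The point is that only the small index is unpickled up front, and individual records are unpickled lazily when asked for:

import mmap, struct, cPickle

def save(records, path):
    # pickle each record separately and remember its (offset, size) in an index
    blobs, index, pos, key = [], {}, 0, 0
    for rec in records:
        blob = cPickle.dumps(rec, 1)
        index[key] = (pos, len(blob))
        blobs.append(blob)
        pos += len(blob)
        key += 1
    head = cPickle.dumps(index, 1)
    f = open(path, "wb")
    f.write(struct.pack(">I", len(head)))   # 4-byte length prefix
    f.write(head)
    f.write("".join(blobs))
    f.close()

def load(path):
    # map the file, unpickle only the index, fetch records on demand
    f = open(path, "rb")
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    head_len = struct.unpack(">I", m[:4])[0]
    index = cPickle.loads(m[4:4 + head_len])
    base = 4 + head_len
    def get(key):
        off, size = index[key]
        return cPickle.loads(m[base + off:base + off + size])
    return index, get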

> Find a way to avoid doing it? Or doing much of it?
> What are your access needs once the data is accessible?

My structure stores a finite state automaton with a Polish dictionary (a lexicon,
to be more precise) and it should be loaded once, but fast!

Thx
Regards,
Przemo Drochomirecki



 
Drochom
08-15-2003
I forgot to explain why I use tuples instead of lists:
I was squeezing a lexicon => minimization of the automaton => using a
dictionary => using hashable objects => using tuples (lists aren't hashable).
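
A tiny illustration of that constraint, with made-up values:

d = {}
d[('a', 4, 0)] = 'some state'    # fine: tuples are hashable
d[['a', 4, 0]] = 'some state'    # raises TypeError: lists are not hashable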


Regards,
Przemo Drochomirecki


 
Drochom
08-15-2003


> [Scott David Daniels' benchmark code and timings snipped]

Hello, and thanks, your code was extremely helpful.

Regards
Przemo Drochomirecki


 
Bengt Richter
08-16-2003
On Sat, 16 Aug 2003 00:41:42 +0200, "Drochom" <(E-Mail Removed)> wrote:

> [...]
> My structure stores a finite state automaton with a Polish dictionary (a lexicon,
> to be more precise) and it should be loaded once, but fast!

I wonder how much space it would take to store the complete Polish word list,
one entry per word, in a Python dictionary. 300k words of 6-7 characters avg?
Say 2 MB plus the dict hash stuff. I bet it would be fast.

Is that in effect what you are doing, except sort of like a regex state machine
to match words character by character?
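
Something like this would give a quick answer (a sketch; the word-list file name is made up, and it assumes a plain-text list with one word per line):

import time

t0 = time.time()
words = {}
f = open("polish_words.txt")          # hypothetical plain-text word list
for line in f:
    words[line.strip()] = None
f.close()
print len(words), "words loaded in", time.time() - t0, "seconds"

t0 = time.time()
for i in xrange(100000):
    "przyklad" in words               # one dict lookup per membership test
print "100000 lookups in", time.time() - t0, "seconds"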

Regards,
Bengt Richter
 