Velocity Reviews - Computer Hardware Reviews

Segmenting a pickle stream without unpickling

 
 
Boris Borcic
 
      05-19-2006
Assuming that the items of my_stream share no content (they are
dumps of db cursor fetches), is there a simple way to do the
equivalent of

def pickles(my_stream):
    from cPickle import load, dumps
    while 1:
        yield dumps(load(my_stream))

without the overhead associated with unpickling objects
just to pickle them again ?

TIA, Boris Borcic
 
Paul Rubin
 
      05-19-2006
Boris Borcic writes:
> def pickles(my_stream):
>     from cPickle import load, dumps
>     while 1:
>         yield dumps(load(my_stream))
>
> without the overhead associated with unpickling objects
> just to pickle them again ?


I think you'd have to write something special. The unpickler parses
as it goes along, and all the dispatch actions build up objects.
You'd have to write a set of actions that just read past the
representations. I think there's no way to know where an object ends
without parsing it, including parsing any objects nested inside it.
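As a rough sketch of what "reading past the representations" could look like, the standard pickletools module already has an opcode walker, pickletools.genops(), that decodes each opcode and its argument without building any objects. The helper name first_pickle_length below is hypothetical, and the sketch assumes Python 3 and an in-memory bytes buffer rather than the cPickle stream from the question:

```python
import pickle
import pickletools


def first_pickle_length(data):
    """Return the byte length of the first pickle in `data` without loading it."""
    # genops() only decodes opcodes and their arguments; it builds no objects
    # and imports no modules.
    for opcode, _arg, pos in pickletools.genops(data):
        if opcode.name == "STOP":
            return pos + 1  # STOP is a single byte that ends the pickle
    raise ValueError("no STOP opcode found")


# Two independent pickles back to back, as in a stream of dumps.
first = pickle.dumps([3, 4])
blob = first + pickle.dumps("all done")
n = first_pickle_length(blob)
print(n == len(first), pickle.loads(blob[:n]))
```

This still parses every opcode, nested objects included, so it only saves the cost of constructing the objects, not the cost of scanning.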

 
Tim Peters
 
      05-19-2006
[Boris Borcic]
> Assuming that the items of my_stream share no content (they are
> dumps of db cursor fetches), is there a simple way to do the
> equivalent of
>
> def pickles(my_stream):
>     from cPickle import load, dumps
>     while 1:
>         yield dumps(load(my_stream))
>
> without the overhead associated with unpickling objects
> just to pickle them again ?


cPickle (but not pickle.py) Unpickler objects have a barely documented
noload() method. This "acts like" load(), except that it doesn't import
modules or construct objects of user-defined classes. The return
value of noload() is undocumented and usually useless. ZODB uses it a
lot.

Anyway, that can go much faster than load(), and works even if the
classes and modules referenced by pickles aren't available in the
unpickling environment. It doesn't return the individual pickle
strings, but they're easy to get at by paying attention to the file
position between noload() calls. For example,

import cPickle as pickle
import os

# Build a pickle file with 4 pickles.

PICKLEFILE = "temp.pck"

class C:
    pass

f = open(PICKLEFILE, "wb")
p = pickle.Pickler(f, 1)

p.dump(2)
p.dump([3, 4])
p.dump(C())
p.dump("all done")

f.close()

# Now use noload() to extract the 4 pickle
# strings in that file.

f = open(PICKLEFILE, "rb")
limit = os.path.getsize(PICKLEFILE)
u = pickle.Unpickler(f)
pickles = []
pos = 0
while pos < limit:
    u.noload()              # parse one pickle without building its objects
    thispos = f.tell()      # file position just past that pickle
    f.seek(pos)
    pickles.append(f.read(thispos - pos))
    pos = thispos

from pprint import pprint
pprint(pickles)

That prints a list containing the 4 pickle strings:

['K\x02.',
 ']q\x01(K\x03K\x04e.',
 '(c__main__\nC\nq\x02o}q\x03b.',
 'U\x08all doneq\x04.']

You could do much the same by calling pickletools.dis() and ignoring
its output, but that's likely to be slower.
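For comparison, here is a minimal sketch of the pickletools.dis() variant. It assumes Python 3 (where cPickle no longer exists) and a seekable binary stream; the function name split_with_dis is made up for illustration. The disassembly text is written to a throwaway io.StringIO, and only the file position after each call is used:

```python
import io
import pickle
import pickletools


def split_with_dis(stream):
    """Yield the raw bytes of each pickle in a seekable binary stream."""
    while True:
        start = stream.tell()
        if not stream.read(1):  # nothing left: end of stream
            return
        stream.seek(start)
        # dis() stops right after the STOP opcode; its output is discarded.
        pickletools.dis(stream, out=io.StringIO())
        end = stream.tell()
        stream.seek(start)
        yield stream.read(end - start)  # leaves the position at `end`


# Four independent pickles written back to back (assumed test data).
items = [2, [3, 4], {"k": "v"}, "all done"]
blob = b"".join(pickle.dumps(x) for x in items)
parts = list(split_with_dis(io.BytesIO(blob)))
print(len(parts), [pickle.loads(p) for p in parts] == items)
```

Each dis() call here starts with an empty memo, which fits the assumption that the items share no content. If one Pickler wrote the whole stream and later pickles refer back to earlier memo entries, dis() accepts a memo dictionary that can be shared across calls.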
 