Velocity Reviews - Computer Hardware Reviews

Sequential Object Store

 
 
GZ
 
      08-07-2010
Hi All,

I need to store a large number of large objects in a file and then
access them sequentially. I am talking about a few thousand objects,
each a few hundred kilobytes in size, for a total file size of a few
gigabytes. I tried shelve, but it is not good at accessing the data
sequentially: in essence, shelve.keys() takes forever.

I am wondering if there is a module that can persist a stream of
objects without having to load everything into memory. (For this
reason, I think pickle is out, too, because it needs everything to be
in memory.)

Thanks,
GZ
 
Alex Willmer
 
      08-07-2010
On Aug 7, 5:26 pm, GZ wrote:
> I am wondering if there is a module that can persist a stream of
> objects without having to load everything into memory. (For this
> reason, I think Pickle is out, too, because it needs everything to be
> in memory.)


From the pickle docs it looks like you could do something like:

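try:
    import cPickle as pickle  # faster C implementation on Python 2
except ImportError:
    import pickle  # fall back to the standard module

file_obj = open('whatever', 'wb')
p = pickle.Pickler(file_obj)

for x in stream_of_objects:
    p.dump(x)
    # Drop the pickler's references to x so it can be garbage collected
    p.memo.clear()

del p
file_obj.close()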

then later

file_obj = open('whatever', 'rb')
p = pickle.Unpickler(file_obj)

while True:
    try:
        x = p.load()
    except EOFError:
        # No more pickled objects in the file
        break
    do_something_with(x)

file_obj.close()

Your loading loop could be wrapped in a generator function, so only
one object should be held in memory at once.
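
For what it's worth, here is a rough sketch of that generator idea, written for Python 3. The names dump_stream and iter_pickled are just made up for illustration, and clear_memo() is the documented way to do what p.memo.clear() does above. I am assuming each object fits in memory on its own; the unpickler may keep a few references between loads, so treat "one object at a time" as approximate.

```python
import pickle


def dump_stream(path, objects):
    """Pickle each object from an iterable to `path`, one after another."""
    with open(path, "wb") as file_obj:
        pickler = pickle.Pickler(file_obj, protocol=pickle.HIGHEST_PROTOCOL)
        for obj in objects:
            pickler.dump(obj)
            # Forget the objects seen so far so they can be freed
            pickler.clear_memo()


def iter_pickled(path):
    """Yield the objects stored in `path` lazily, in the order written."""
    with open(path, "rb") as file_obj:
        unpickler = pickle.Unpickler(file_obj)
        while True:
            try:
                obj = unpickler.load()
            except EOFError:
                # End of file: no more pickled objects
                return
            yield obj
```

You would then write with dump_stream('whatever', stream_of_objects) and read back with "for x in iter_pickled('whatever'): do_something_with(x)". The file is closed when the generator is exhausted or closed.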
 
GZ
 
      08-09-2010
Hi Alex,

This totally works!

Thanks!
 