Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > marshal.dumps quadratic growth and marshal.dump not allowingfile-like objects

Reply
Thread Tools

marshal.dumps quadratic growth and marshal.dump not allowingfile-like objects

 
 
bkustel@gmail.com
Guest
Posts: n/a
 
      06-15-2008
I'm stuck on a problem where I want to use marshal for serialization
(yes, yes, I know (c)Pickle is normally recommended here). I favor
marshal for speed for the types of data I use.

However it seems that marshal.dumps() for large objects has a
quadratic performance issue which I'm assuming is that it grows its
memory buffer in constant increments. This causes a nasty slowdown for
marshaling large objects. I thought I would get around this by passing
a cStringIO.StringIO object to marshal.dump() instead but I quickly
learned this is not supported (only true file objects are supported).

Any ideas about how to get around the marshal quadratic issue? Any
hope for a fix for that on the horizon? Thanks for any information.
 
Reply With Quote
 
 
 
 
TheSaint
Guest
Posts: n/a
 
      06-15-2008
On 16:04, domenica 15 giugno 2008 wrote:

> cStringIO.StringIO object to marshal.dump() instead but I quickly
> learned this is not supported (only true file objects are supported).
>
> Any ideas about how to get around the marshal quadratic issue? Any
> hope for a fix for that on the horizon?

If you zip the cStringIO.StringIO object, would it be possible?

--
Mailsweeper Home : http://it.geocities.com/call_me_not_now/index.html
 
Reply With Quote
 
 
 
 
Peter Otten
Guest
Posts: n/a
 
      06-15-2008
wrote:

> I'm stuck on a problem where I want to use marshal for serialization
> (yes, yes, I know (c)Pickle is normally recommended here). I favor
> marshal for speed for the types of data I use.
>
> However it seems that marshal.dumps() for large objects has a
> quadratic performance issue which I'm assuming is that it grows its
> memory buffer in constant increments. This causes a nasty slowdown for
> marshaling large objects. I thought I would get around this by passing
> a cStringIO.StringIO object to marshal.dump() instead but I quickly
> learned this is not supported (only true file objects are supported).
>
> Any ideas about how to get around the marshal quadratic issue? Any
> hope for a fix for that on the horizon? Thanks for any information.


Here's how marshal resizes the string:

newsize = size + size + 1024;
if (newsize > 32*1024*1024) {
newsize = size + 1024*1024;
}

Maybe you can split your large objects and marshal multiple objects to keep
the size below the 32MB limit.

Peter
 
Reply With Quote
 
Raymond Hettinger
Guest
Posts: n/a
 
      06-15-2008
On Jun 15, 1:04*am, bkus...@gmail.com wrote:
> However it seems that marshal.dumps() for large objects has a
> quadratic performance issue which I'm assuming is that it grows its
> memory buffer in constant increments.


Looking at the source in http://svn.python.org/projects/pytho...thon/marshal.c
, it looks like the relevant fragment is in w_more():

. . .
size = PyString_Size(p->str);
newsize = size + size + 1024;
if (newsize > 32*1024*1024) {
newsize = size + 1024*1024;
}
if (_PyString_Resize(&p->str, newsize) != 0) {
. . .

When more space is needed, the resize operation over-allocates by
double the previous need plus 1K. This should give amortized O(1)
performance just like list.append().

However, when that strategy requests more than 32Mb, the resizing
becomes less aggressive and grows only in 1MB blocks and giving your
observed nasty quadratic behavior.

Raymond
 
Reply With Quote
 
John Machin
Guest
Posts: n/a
 
      06-15-2008
On Jun 15, 7:47 pm, Peter Otten <__pete...@web.de> wrote:
> bkus...@gmail.com wrote:
> > I'm stuck on a problem where I want to use marshal for serialization
> > (yes, yes, I know (c)Pickle is normally recommended here). I favor
> > marshal for speed for the types of data I use.

>
> > However it seems that marshal.dumps() for large objects has a
> > quadratic performance issue which I'm assuming is that it grows its
> > memory buffer in constant increments. This causes a nasty slowdown for
> > marshaling large objects. I thought I would get around this by passing
> > a cStringIO.StringIO object to marshal.dump() instead but I quickly
> > learned this is not supported (only true file objects are supported).

>
> > Any ideas about how to get around the marshal quadratic issue? Any
> > hope for a fix for that on the horizon? Thanks for any information.

>
> Here's how marshal resizes the string:
>
> newsize = size + size + 1024;
> if (newsize > 32*1024*1024) {
> newsize = size + 1024*1024;
> }
>
> Maybe you can split your large objects and marshal multiple objects to keep
> the size below the 32MB limit.
>


But that change went into the svn trunk on 11-May-2008; perhaps the OP
is using a production release which would have the previous version,
which is merely "newsize = size + 1024;".

Do people really generate 32MB pyc files, or is stopping doubling at
32MB just a safety valve in case someone/something runs amok?

Cheers,
John
 
Reply With Quote
 
Peter Otten
Guest
Posts: n/a
 
      06-15-2008
John Machin wrote:

>> Here's how marshal resizes the string:
>>
>> newsize = size + size + 1024;
>> if (newsize > 32*1024*1024) {
>> newsize = size + 1024*1024;
>> }
>>
>> Maybe you can split your large objects and marshal multiple objects to
>> keep the size below the 32MB limit.
>>

>
> But that change went into the svn trunk on 11-May-2008; perhaps the OP
> is using a production release which would have the previous version,
> which is merely "newsize = size + 1024;".


That is indeed much worse. Depending on what the OP means by "large objects"
the problem may be fixed in subversion then.

> Do people really generate 32MB pyc files, or is stopping doubling at
> 32MB just a safety valve in case someone/something runs amok?


A 32MB pyc would correspond to a module of roughly the same size. So
someone/something runs amok in either case.

Peter
 
Reply With Quote
 
Christian Heimes
Guest
Posts: n/a
 
      06-15-2008
Raymond Hettinger wrote:
> When more space is needed, the resize operation over-allocates by
> double the previous need plus 1K. This should give amortized O(1)
> performance just like list.append().
>
> However, when that strategy requests more than 32Mb, the resizing
> becomes less aggressive and grows only in 1MB blocks and giving your
> observed nasty quadratic behavior.


The marshal code has been revamped in Python 2.6. The old code in Python
2.5 uses a linear growth strategy:

size = PyString_Size(p->str);
newsize = size + 1024;
if (_PyString_Resize(&p->str, newsize) != 0) {
p->ptr = p->end = NULL;
}

Anyway marshal should not be used by user code to serialize objects.
It's only meant for Python byte code. Please use the pickle/cPickle
module instead.

Christian

 
Reply With Quote
 
bkustel@gmail.com
Guest
Posts: n/a
 
      06-15-2008
On Jun 15, 3:16*am, John Machin <sjmac...@lexicon.net> wrote:
> But that change went into the svn trunk on 11-May-2008; perhaps the OP
> is using a production release which would have the previous version,
> which is merely "newsize = size + 1024;".
>
> Do people really generate 32MB pyc files, or is stopping doubling at
> 32MB just a safety valve in case someone/something runs amok?


Indeed. I (the OP) am using a production release which has the 1k
linear growth.
I am seeing the problems with ~5MB and ~10MB sizes.
Apparently this will be improved greatly in Python 2.6, at least up to
the 32MB limit.

Thanks all for responding.
 
Reply With Quote
 
John Machin
Guest
Posts: n/a
 
      06-15-2008
On Jun 16, 1:08 am, bkus...@gmail.com wrote:
> On Jun 15, 3:16 am, John Machin <sjmac...@lexicon.net> wrote:
>
> > But that change went into the svn trunk on 11-May-2008; perhaps the OP
> > is using a production release which would have the previous version,
> > which is merely "newsize = size + 1024;".

>
> > Do people really generate 32MB pyc files, or is stopping doubling at
> > 32MB just a safety valve in case someone/something runs amok?

>
> Indeed. I (the OP) am using a production release which has the 1k
> linear growth.
> I am seeing the problems with ~5MB and ~10MB sizes.
> Apparently this will be improved greatly in Python 2.6, at least up to
> the 32MB limit.


Apparently you intend to resist good advice and persist [accidental
pun!] with marshal -- how much slower is cPickle for various sizes of
data? What kinds of objects are you persisting?

 
Reply With Quote
 
Raymond Hettinger
Guest
Posts: n/a
 
      06-16-2008
On Jun 15, 8:08*am, bkus...@gmail.com wrote:
> Indeed. I (the OP) am using a production release which has the 1k
> linear growth.
> I am seeing the problems with ~5MB and ~10MB sizes.
> Apparently this will be improved greatly in Python 2.6, at least up to
> the 32MB limit.


I've just fixed this for Py2.5.3 and Py2.6. No more quadratic
behavior.


Raymond
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Quadratic curve fitting! Vinodh Kumar C++ 6 12-01-2010 12:30 PM
Quadratic Optimization Problem amitsoni.1984@gmail.com Python 3 11-22-2006 10:45 PM
Multiple Polynomial Quadratic Sieve Philip Smith Python 2 05-30-2006 10:29 PM
Quadratic Solutions in C++ Sunny Computer Support 12 02-26-2006 11:14 AM
Quadratic Formula Woes fb C Programming 6 09-01-2004 08:38 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57