Velocity Reviews

ocschwar@gmail.com 01-28-2009 08:38 PM

dicts,instances,containers, slotted instances, et cetera.
 
Hi, all.

I have an application that creates, manipulates, and finally
archives on disk 10^6 instances of an object that in CS/DB terms is
best described as a relation.

It has 8 members, all of them common Python datatypes. 6 of these are
set once and then not modified. 2 are modified around 4 times before
the instance's archiving. Large collections (of small lists) of these
objects are created, iterated through, and sorted using any and all of
the 8 members as sorting keys.

It neither has nor needs custom methods.

I used a simple dictionary to create the application prototype. Now I
need to speed things up.
I first tried changing to a new style class, with __slots__, __init__,
__getstate__& __setstate__ (for pickling) and was shocked to see
things SLOW down over dictionaries.

So of these options, where should I go first to satisfy my need for
speed?

0. Back to dict
1. old style class
2. new style class
3. new style class, with __slots__, with or without some nuance I'm
missing.
4. tuple, with constants to mark the indices
5. namedTuple
6. other...
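
For reference, a minimal sketch of three of the representations listed above, each sorted on an arbitrary member. The field names (`name`, `score`) and the cut-down two-field record are hypothetical and chosen only for illustration; the real record has 8 members. It is written for Python 3, where every class is new-style.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

# Hypothetical two-field record; the real one has 8 members.
FIELDS = ("name", "score")

# Option 0: plain dict
as_dicts = [{"name": "b", "score": 2}, {"name": "a", "score": 3}]


# Option 3: class with __slots__ (no per-instance __dict__)
class Row(object):
    __slots__ = FIELDS

    def __init__(self, name, score):
        self.name = name
        self.score = score


as_slotted = [Row("b", 2), Row("a", 3)]

# Option 5: namedtuple (immutable; _replace() builds a modified copy)
RowT = namedtuple("RowT", FIELDS)
as_tuples = [RowT("b", 2), RowT("a", 3)]

# Sorting on any member uses a key function rather than a comparison function.
by_name_d = sorted(as_dicts, key=itemgetter("name"))
by_name_s = sorted(as_slotted, key=attrgetter("name"))
by_score_t = sorted(as_tuples, key=attrgetter("score"))
```

Because two of the eight members are modified a few times, the immutable namedtuple would need `_replace()` (a new object per update), which may or may not matter for this workload.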

Aaron Brady 01-28-2009 09:50 PM

Re: dicts,instances,containers, slotted instances, et cetera.
 
On Jan 28, 2:38 pm, ocsch...@gmail.com wrote:
> Hi, all.
>
> I have an application that creates, manipulates, and finally
> archives on disk 10^6 instances of an object that in CS/DB terms is
> best described as a relation.
>
> It has 8 members, all of them common Python datatypes. 6 of these are
> set once and then not modified. 2 are modified around 4 times before
> the instance's archiving. Large collections (of small lists) of these
> objects are created, iterated through, and sorted using any and all of
> the 8 members as sorting keys.
>
> It neither has nor needs custom methods.
>
> I used a simple dictionary to create the application prototype. Now I
> need to speed things up.
> I first tried changing to a new style class, with __slots__, __init__,
> __getstate__& __setstate__ (for pickling) and was shocked to see
> things SLOW down over dictionaries.
>
> So of these options, where should I go first to satisfy my need for
> speed?
>
> 0. Back to dict
> 1. old style class
> 2. new style class
> 3. new style class, with __slots__, with or without some nuance I'm
> missing.
> 4. tuple, with constants to mark the indices
> 5. namedTuple
> 6. other...


Hello, quoting myself from another thread today:

There is the 'shelve' module. You could create a shelf that tells you
the filename of the 5 other ones. A million keys should be no
problem, I guess. (It's standard library.) All your keys have to be
strings, though, and all your values have to be pickleable. If that's
a problem, yes you will need ZODB or Django (I understand), or another
relational DB.

There is currently no way to store live objects.
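
A minimal sketch of the `shelve` approach described above, assuming Python 3 and a throwaway temporary directory. The record contents and the string ids used as keys are hypothetical.

```python
import os
import shelve
import tempfile

# Hypothetical records keyed by a string id; values only need to be picklable.
records = {
    "1": {"name": "a", "score": 3},
    "2": {"name": "b", "score": 2},
}

path = os.path.join(tempfile.mkdtemp(), "archive")

# Write: the shelf behaves like a dict whose values are pickled to disk.
with shelve.open(path) as shelf:
    for key, value in records.items():
        shelf[key] = value

# Read back after reopening the shelf.
with shelve.open(path) as shelf:
    restored = {key: shelf[key] for key in shelf}
```

Note that this addresses archiving only; it does nothing for in-memory attribute access speed.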

Diez B. Roggisch 01-28-2009 10:21 PM

Re: dicts,instances,containers, slotted instances, et cetera.
 
ocschwar@gmail.com wrote:
> Hi, all.
>
> I have an application that creates, manipulates, and finally
> archives on disk 10^6 instances of an object that in CS/DB terms is
> best described as a relation.
>
> It has 8 members, all of them common Python datatypes. 6 of these are
> set once and then not modified. 2 are modified around 4 times before
> the instance's archiving. Large collections (of small lists) of these
> objects are created, iterated through, and sorted using any and all of
> the 8 members as sorting keys.
>
> It neither has nor needs custom methods.
>
> I used a simple dictionary to create the application prototype. Now I
> need to speed things up.
> I first tried changing to a new style class, with __slots__, __init__,
> __getstate__& __setstate__ (for pickling) and was shocked to see
> things SLOW down over dictionaries.
>
> So of these options, where should I go first to satisfy my need for
> speed?
>
> 0. Back to dict
> 1. old style class
> 2. new style class
> 3. new style class, with __slots__, with or without some nuance I'm
> missing.
> 4. tuple, with constants to mark the indices
> 5. namedTuple
> 6. other...


Use a database? Or *maybe* a C-extension wrapped by ctypes.

Diez

ocschwar@gmail.com 01-28-2009 11:20 PM

Re: dicts,instances,containers, slotted instances, et cetera.
 
On Jan 28, 4:50 pm, Aaron Brady <castiro...@gmail.com> wrote:
> On Jan 28, 2:38 pm, ocsch...@gmail.com wrote:
>
> Hello, quoting myself from another thread today:
>
> There is the 'shelve' module. You could create a shelf that tells you
> the filename of the 5 other ones. A million keys should be no
> problem, I guess. (It's standard library.) All your keys have to be
> strings, though, and all your values have to be pickleable. If that's
> a problem, yes you will need ZODB or Django (I understand), or another
> relational DB.
>
> There is currently no way to store live objects.



The problem is NOT archiving these objects. That works fine.

It's the computations I'm using these things for that are slow, and
that failed to speed up using __slots__.

What I need is something that will speed up getattr() or its
equivalent, and to a lesser degree setattr() or its equivalent.

ocschwar@gmail.com 01-28-2009 11:23 PM

Re: dicts,instances,containers, slotted instances, et cetera.
 
On Jan 28, 5:21 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> ocsch...@gmail.com wrote:
>
> > Hi, all.
> >
> > I have an application that creates, manipulates, and finally
> > archives on disk 10^6 instances of an object that in CS/DB terms is
> > best described as a relation.
> >
> > It has 8 members, all of them common Python datatypes. 6 of these are
> > set once and then not modified. 2 are modified around 4 times before
> > the instance's archiving. Large collections (of small lists) of these
> > objects are created, iterated through, and sorted using any and all of
> > the 8 members as sorting keys.
> >
> > It neither has nor needs custom methods.
> >
> > I used a simple dictionary to create the application prototype. Now I
> > need to speed things up.
> > I first tried changing to a new style class, with __slots__, __init__,
> > __getstate__& __setstate__ (for pickling) and was shocked to see
> > things SLOW down over dictionaries.
> >
> > So of these options, where should I go first to satisfy my need for
> > speed?
> >
> > 0. Back to dict
> > 1. old style class
> > 2. new style class
> > 3. new style class, with __slots__, with or without some nuance I'm
> > missing.
> > 4. tuple, with constants to mark the indices
> > 5. namedTuple
> > 6. other...
>
> Use a database? Or *maybe* a C-extension wrapped by ctypes.
>
> Diez


I can't port the entire app to be a stored database procedure.

ctypes, maybe. I just find it odd that there's no quick answer on the
fastest way in Python to implement a mapping in this context.

Diez B. Roggisch 01-28-2009 11:24 PM

Re: dicts,instances,containers, slotted instances, et cetera.
 
ocschwar@gmail.com wrote:
> On Jan 28, 4:50 pm, Aaron Brady <castiro...@gmail.com> wrote:
>> On Jan 28, 2:38 pm, ocsch...@gmail.com wrote:
>>
>> Hello, quoting myself from another thread today:
>>
>> There is the 'shelve' module. You could create a shelf that tells you
>> the filename of the 5 other ones. A million keys should be no
>> problem, I guess. (It's standard library.) All your keys have to be
>> strings, though, and all your values have to be pickleable. If that's
>> a problem, yes you will need ZODB or Django (I understand), or another
>> relational DB.
>>
>> There is currently no way to store live objects.
>
> The problem is NOT archiving these objects. That works fine.


I know. But if they are sorted to various criteria, doing that inside a
DB might also be faster. That was the point I wanted to make.
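
To illustrate the idea of letting a database do the sorting, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The table name, column names, and rows are hypothetical, and whether this beats sorting Python objects for the real workload would have to be measured.

```python
import sqlite3

# Hypothetical rows: (name, score)
rows = [("b", 2), ("a", 3), ("c", 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

# An index on a frequently used sort column can let the database
# return rows in order without sorting them on every query.
conn.execute("CREATE INDEX idx_score ON records (score)")

by_score = conn.execute(
    "SELECT name, score FROM records ORDER BY score"
).fetchall()
by_name = conn.execute(
    "SELECT name, score FROM records ORDER BY name"
).fetchall()
conn.close()
```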

Diez

Steven D'Aprano 01-29-2009 04:44 AM

Re: dicts,instances,containers, slotted instances, et cetera.
 
On Wed, 28 Jan 2009 15:20:41 -0800, ocschwar wrote:

> On Jan 28, 4:50 pm, Aaron Brady <castiro...@gmail.com> wrote:
>> On Jan 28, 2:38 pm, ocsch...@gmail.com wrote:
>>
>> Hello, quoting myself from another thread today:
>>
>> There is the 'shelve' module. You could create a shelf that tells you
>> the filename of the 5 other ones. A million keys should be no problem,
>> I guess. (It's standard library.) All your keys have to be strings,
>> though, and all your values have to be pickleable. If that's a
>> problem, yes you will need ZODB or Django (I understand), or another
>> relational DB.
>>
>> There is currently no way to store live objects.
>
> The problem is NOT archiving these objects. That works fine.
>
> It's the computations I'm using these things for that are slow, and that
> failed to speed up using __slots__.


You've profiled and discovered that the computations are slow, not the
archiving?

What parts of the computations are slow?


> What I need is something that will speed up getattr() or its equivalent,
> and to a lesser degree setattr() or its equivalent.


As you've found, __slots__ is not that thing.

>>> from timeit import Timer
>>> class Slotted(object):
...     __slots__ = ('a',)
...     def __init__(self):
...         self.a = 1
...
>>> class Unslotted(object):
...     def __init__(self):
...         self.a = 1
...
>>> t1 = Timer('x.a', 'from __main__ import Slotted; x = Slotted()')
>>> t2 = Timer('x.a', 'from __main__ import Unslotted; x = Unslotted()')
>>>
>>> min(t1.repeat(10))
0.1138761043548584
>>> min(t2.repeat(10))
0.11414718627929688


One micro-optimization you can do is something like this:

for i in xrange(1000000):
    obj.y = obj.x + 3*obj.x**2
    obj.x = obj.y - obj.x
    # 12 name lookups per iteration


Becomes:


y = None
x = obj.x
try:
    for i in xrange(1000000):
        y = x + 3*x**2
        x = y - x
        # 6 name lookups per iteration
finally:
    obj.y = y
    obj.x = x
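
A self-contained Python 3 sketch of the same local-variable hoisting, for anyone who wants to run it. The `Obj` class, the function names, and the modulus are hypothetical additions; the modulus is there only to keep the numbers bounded, since the original formula grows without limit over many iterations.

```python
MODULUS = 1000003  # hypothetical bound, only to keep the values small


class Obj(object):
    def __init__(self, x):
        self.x = x
        self.y = None


def with_attributes(obj, n):
    # Every read and write goes through attribute lookup on obj.
    for _ in range(n):
        obj.y = (obj.x + 3 * obj.x ** 2) % MODULUS
        obj.x = (obj.y - obj.x) % MODULUS


def with_locals(obj, n):
    # Copy the attributes into locals, loop, then write them back once.
    x = obj.x
    y = obj.y
    try:
        for _ in range(n):
            y = (x + 3 * x ** 2) % MODULUS
            x = (y - x) % MODULUS
    finally:
        obj.x = x
        obj.y = y


a, b = Obj(2), Obj(2)
with_attributes(a, 1000)
with_locals(b, 1000)
```

Both functions leave the object in the same state; timing them with `timeit` on the real workload would show whether the saving is worth the less direct code.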


Unless you've profiled and have evidence that the bottleneck is attribute
access, my bet is that the problem is some other aspect of the
computation. In general, your intuition about what's fast and what's slow
in Python will be misleading if you're used to other languages. E.g. in C
comparisons are fast and moving data is slow, but in Python comparisons
are slow and moving data is fast.
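
One way to gather that evidence is the standard-library `cProfile` module. The sketch below is written for Python 3; `workload` is a hypothetical stand-in for the real computation, and the row layout is invented for illustration.

```python
import cProfile
import io
import pstats
from operator import itemgetter


def workload():
    # Hypothetical stand-in for the real computation.
    rows = [{"a": i % 7, "b": -i} for i in range(20000)]
    rows.sort(key=itemgetter("a"))
    return sum(row["b"] for row in rows)


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists where the time goes function by function, which shows whether attribute access or something else (sorting, object creation, pickling) dominates.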


--
Steven

Michele Simionato 01-29-2009 06:17 AM

Re: dicts,instances,containers, slotted instances, et cetera.
 
On Jan 29, 12:23 am, ocsch...@gmail.com wrote:

> I just find it odd that there's no quick answer on the
> fastest way in Python to implement a mapping in this context.


A Python dict is as fast as you can get. If that is not enough, your
only choice is to try something at the C level, which may give the
desired speedup or not. Good luck!
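
Rather than taking anyone's word for it, the lookup cost of each representation can be measured directly. This is a minimal Python 3 sketch using `timeit`; the class names and the single-field record are hypothetical, and the numbers will vary by interpreter version and machine, so no expected ranking is shown.

```python
import timeit
from collections import namedtuple


class Plain(object):
    def __init__(self):
        self.a = 1


class Slotted(object):
    __slots__ = ("a",)

    def __init__(self):
        self.a = 1


Pair = namedtuple("Pair", ["a"])

# label -> (object under test, lookup statement run against the name x)
candidates = {
    "dict": ({"a": 1}, "x['a']"),
    "instance": (Plain(), "x.a"),
    "slotted": (Slotted(), "x.a"),
    "tuple index": ((1,), "x[0]"),
    "namedtuple": (Pair(1), "x.a"),
}

timings = {}
for label, (obj, stmt) in candidates.items():
    timer = timeit.Timer(stmt, globals={"x": obj})
    # Best of 3 runs of 100,000 lookups each, in seconds.
    timings[label] = min(timer.repeat(repeat=3, number=100000))

for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print("%-12s %.4f" % (label, seconds))
```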

Michele Simionato

James Stroud 01-29-2009 09:52 AM

Re: dicts,instances,containers, slotted instances, et cetera.
 
ocschwar@gmail.com wrote:
> I can't port the entire app to be a stored database procedure.


Perhaps I underestimate what you mean by this, but you may want to look
at pyTables (http://www.pytables.org/moin/HowToUse).

> ctypes, maybe. I just find it odd that there's no quick answer on the
> fastest way in Python to implement a mapping in this context.


Your explanation of where your prototype is slow is a little unclear. If
your data is largely numerical, you may want to rethink your
organization and use a numeric package. I did something similar and saw
an order of magnitude speed increase by switching from python data types
to numpy combined with careful tuning of how I managed the data.

You may have to spend more time on this than you would like, but if you
really put some thought into it and grind at your organization, you can
probably get a significant performance increase.
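
As a rough illustration of that reorganization, here is a sketch that stores records in a NumPy structured array instead of one Python object per record. It assumes NumPy is installed and that the members are numeric; the field names and values are hypothetical.

```python
import numpy as np

# Hypothetical two-field record; the real one has 8 members.
record_type = np.dtype([("ident", np.int64), ("score", np.float64)])

table = np.zeros(5, dtype=record_type)
table["ident"] = [4, 2, 5, 1, 3]
table["score"] = [0.5, 2.0, 1.5, 1.0, 2.5]

# Whole-column update instead of a Python-level loop over objects.
table["score"] *= 2.0

# Sort by any field; `order` names the field used as the key.
by_ident = np.sort(table, order="ident")
by_score = np.sort(table, order="score")
```

The gain, if any, comes from doing per-field work in bulk; code that still touches records one at a time from Python is unlikely to benefit.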

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com


All times are GMT.