Persistent objects

 
 
Paul Rubin
 
      12-12-2004
I've had this recurring half-baked desire for long enough that I
thought I'd post about it, even though I don't have any concrete
proposals and the whole idea is fraught with hazards.

Basically I wish there was a way to have persistent in-memory objects
in a Python app, maybe a multi-process one. So you could have a
persistent dictionary d, and if you say
d[x] = Frob(foo=9, bar=23)
that creates a Frob instance and stores it in d[x]. Then if you
exit the app and restart it later, there'd be a way to bring d back
into the process and have that Frob instance be there.

Please don't suggest using a pickle or shelve; I know about those
already. I'm after something higher-performance. Basically d would
live in a region of memory that could be mmap'd to a disk file as well
as shared with other processes. Once d was rooted into that region,
any entries created in it would also be in that region, and any
objects assigned to the entries would also get moved to that region.
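
To make that concrete at the byte level, here is roughly the building
block I have in mind: an ordinary file mapped into memory with Python's
mmap module. This is only a sketch (the file name and fixed-size layout
are made up), and it moves raw bytes rather than live objects, which is
of course the hard part:

import mmap
import os
import struct

PATH = "region.dat"      # hypothetical backing file
SIZE = 1024 * 1024       # fixed-size region, 1 MB

# Create the backing file at full size the first time through.
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.truncate(SIZE)

with open(PATH, "r+b") as f:
    region = mmap.mmap(f.fileno(), SIZE)
    # Treat offset 0 as a 64-bit counter.  Any process mapping the same
    # file sees the same bytes, and they survive a restart.
    count, = struct.unpack_from("Q", region, 0)
    struct.pack_into("Q", region, 0, count + 1)
    region.flush()       # push dirty pages back to the file
    region.close()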

There'd probably have to be a way to lock the region for update, using
semaphores. Ordinary subscript assignments would lock automatically,
but there might be times when you want to update several structures in
a single transaction.

A thing like this could save a heck of a lot of SQL traffic in a busy
server app. There are all kinds of bogus limitations you see on web
sites, where you can't see more than 10 items per HTML page or
whatever, because they didn't want loading a page to cause too many
database hits. With the in-memory approach, all that data could be
right there in the process, no TCP messages needed and no context
switches needed, just ordinary in-memory dictionary references. Lots
of machines now have multi-GB of physical memory which is enough to
hold all the stuff from all but the largest sites. A site like
Slashdot, for example, might get 100,000 logins and 10,000 message
posts per day. At 1k bytes per login (way too much) and 10k bytes
per message post (also way too much), that's still just 200 megabytes
for a full day of activity. Even a low-end laptop these days comes
with more RAM than that, and multi-GB workstations are no big deal any
more. Occasionally someone might look at a several-day-old thread and
that might cause some disk traffic, but even that can be left in
memory (the paging system can handle it).

On the other hand, there'd either have to be interpreter hair to
separate the persistent objects from the non-persistent ones, or else
make everything persistent and then have some way to keep processes
sharing memory from stepping on each other. Maybe the abstraction
machinery in PyPy can make this easy.

Well, as you can see, this idea leaves a lot of details not yet
thought out. But it's alluring enough that I thought I'd ask if
anyone else sees something to pursue here.
 
Max M
 
      12-12-2004
Paul Rubin wrote:

> Basically I wish there was a way to have persistent in-memory objects
> in a Python app, maybe a multi-process one. So you could have a
> persistent dictionary d, and if you say
> d[x] = Frob(foo=9, bar=23)
> that creates a Frob instance and stores it in d[x]. Then if you
> exit the app and restart it later, there'd be a way to bring d back
> into the process and have that Frob instance be there.


Have you considered using the standalone ZODB from Zope?


--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
Paul Rubin
 
      12-12-2004
Max M <(E-Mail Removed)> writes:
> > Basically I wish there was a way to have persistent in-memory objects
> > in a Python app, maybe a multi-process one. So you could have a
> > persistent dictionary d, and if you say d[x] = Frob(foo=9, bar=23)
> > that creates a Frob instance and stores it in d[x]. Then if you
> > exit the app and restart it later, there'd be a way to bring d back
> > into the process and have that Frob instance be there.

>
> Have you considered using the standalone ZODB from Zope?


No. I've heard that it's quite slow, and works sort of the way shelve
does. Am I mistaken? I want the objects to never leave memory except
through mmap.
 
Duncan Booth
 
      12-12-2004
Paul Rubin wrote:

> Well, as you can see, this idea leaves a lot of details not yet
> thought out. But it's alluring enough that I thought I'd ask if
> anyone else sees something to pursue here.
>


Have you looked at ZODB and ZEO? It does most of what you ask for, although
not necessarily in the way you suggest.

It doesn't attempt to hold everything in memory, but so long as most of
your objects are cache hits this shouldn't matter. Nor does it use shared
memory: using ZEO you can have a client server approach so you aren't
restricted to a single machine.

Instead of a locking scheme, each thread works within a transaction, and
only when the transaction is committed do you find out whether your changes
are accepted or rejected. If they are rejected then you simply try again.
So long as most of your accesses are read-only and transactions are
committed quickly, this scheme can work better than locking.
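
Very roughly, the pattern looks like this (a sketch only, assuming the
FileStorage backend and the 'transaction' package; with ZEO you would
open a ClientStorage instead):

import transaction
from ZODB import FileStorage, DB
from ZODB.POSException import ConflictError

storage = FileStorage.FileStorage('Data.fs')    # or a ZEO ClientStorage
db = DB(storage)
conn = db.open()
root = conn.root()

for attempt in range(3):
    try:
        root['hits'] = root.get('hits', 0) + 1  # ordinary object manipulation
        transaction.commit()                    # conflicts are detected here
        break
    except ConflictError:
        transaction.abort()                     # rejected: roll back and retry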
 
Paul Rubin
 
      12-12-2004
Duncan Booth <(E-Mail Removed)> writes:
> Have you looked at ZODB and ZEO? It does most of what you ask for,
> although not necessarily in the way you suggest.


You're the second person to mention these, so maybe I should check into
them more. But I thought they were garden-variety persistent object
schemes that wrote pickles into disk files. That's orders of magnitude
slower than what I had in mind.

> It doesn't attempt to hold everything in memory, but so long as most of
> your objects are cache hits this shouldn't matter. Nor does it use shared
> memory: using ZEO you can have a client server approach so you aren't
> restricted to a single machine.


Well, if it doesn't use shared memory, what does it do instead? If
every access has to go through the TCP stack, you're going to get
creamed speed-wise. The mmap scheme should be able to do millions of
operations per second. Are there any measurements of how many
ops/second you can get through ZODB?
 
Nigel Rowe
 
      12-12-2004
Paul Rubin wrote:

> I've had this recurring half-baked desire for long enough that I
> thought I'd post about it, even though I don't have any concrete
> proposals and the whole idea is fraught with hazards.
>
> Basically I wish there was a way to have persistent in-memory objects
> in a Python app, maybe a multi-process one.

<<snip>>

Maybe POSH (http://poshmodule.sourceforge.net/) is what you want.

From the "About POSH"

Python Object Sharing, or POSH for short, is an extension module to Python
that allows objects to be placed in shared memory. Objects in shared memory
can be accessed transparently, and most types of objects, including
instances of user-defined classes, can be shared. POSH allows concurrent
processes to communicate simply by assigning objects to shared container
objects.

--
Nigel Rowe
A pox upon the spammers that make me write my address like..
rho (snail) swiftdsl (stop) com (stop) au
 
Paul Rubin
 
      12-12-2004
Nigel Rowe <(E-Mail Removed)> writes:
> Maybe POSH (http://poshmodule.sourceforge.net/) is what you want.


Thanks, that is great. The motivation was somewhat different but it's
clear that the authors faced and dealt with most of the same issues
that were bugging me. I had hoped to avoid the use of those proxy
objects but I guess there's no decent way around them in a
multi-process setting. The authors similarly had to reimplement the
basic Python container types, which I'd also hoped could be avoided,
but I guess what they did was straightforward if messier than I'd like.

POSH also makes no attempt to implement persistence, but maybe that's
a fairly simple matter of mmap'ing the shared memory region and
storing some serialized representation of the proxy objects. If I
correctly understand how POSH works, the number of proxies active at
any moment should be fairly low.
 
Alan Kennedy
 
      12-12-2004
Hi Paul,

[Paul Rubin]
> Basically I wish there was a way to have persistent in-memory objects
> in a Python app, maybe a multi-process one. So you could have a
> persistent dictionary d, and if you say
> d[x] = Frob(foo=9, bar=23)
> that creates a Frob instance and stores it in d[x]. Then if you
> exit the app and restart it later, there'd be a way to bring d back
> into the process and have that Frob instance be there.


Have you looked at Ian Bicking's SQLObject?

http://sqlobject.org/

To define a class:

class MyPersistentObj(SQLObject):
    foo = IntCol()
    bar = IntCol()

To instantiate a new object:

my_new_object = MyPersistentObj(foo=9, bar=23)

Once the new object has been created, it has already been persisted into
an RDBMS table automatically. To reload it from the table/database, e.g.
after a system restart, simply supply its id:

my_existing_object = MyPersistentObj.get(id=42)

Select a subset of your persistent objects using SQL-style queries:

my_foo_9_objects = MyPersistentObj.select(MyPersistentObj.q.foo == 9)
for o in my_foo_9_objects:
    process(o)

SQLObject also takes care of caching, in that objects are optionally
cached, associated with a specific connection to the database. (This
means that it is possible to have different versions of the same object
cached with different connections, but that's easy to solve with good
application architecture). So in your case, if your (web?) app is
persistent/long-running, then you can simply have SQLObject cache all
your objects, assuming you've got enough memory. (Hmm, I wonder if
SQLObject could be made to work with weak-references?). Lastly, caching
can be disabled.

I've found performance of SQLObject to be pretty good, but since you
haven't specified particular requirements for performance, it's not
possible to say if it meets your criteria. Although I feel comfortable
in saying that SQLObject combined with an SQLite in-memory database
should give pretty good performance, if you've got the memory to spare
for the large databases you describe.
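
As a rough illustration of that in-memory setup (a sketch only, assuming
SQLObject's connectionForURI/sqlhub interface; the class is the same one
defined above):

from sqlobject import SQLObject, IntCol, sqlhub, connectionForURI

# Point SQLObject's default connection at an in-memory SQLite database.
sqlhub.processConnection = connectionForURI('sqlite:/:memory:')

class MyPersistentObj(SQLObject):
    foo = IntCol()
    bar = IntCol()

MyPersistentObj.createTable()             # create the backing table
obj = MyPersistentObj(foo=9, bar=23)      # persisted on creation
assert MyPersistentObj.get(obj.id).bar == 23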

Other nice features include

1. RDBMS independent: currently supported are PostgreSQL, Firebird, MySQL,
SQLite, Oracle, Sybase, DBM. SQL Server support is in the pipeline.
SQLObject code should be completely portable between such backend stores.

2. Full support for ACID transactional updates to data.

3. A nice facility for building SQL queries using python syntax.

4. Automated creation of tables and databases. Table structure
modification supported on most databases.

5. Full support for one-to-one, one-to-many and many-to-many
relationships between objects.

All in all, a great little package. I recommend that you take a close look.

Regards,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan
 
Paul Rubin
 
      12-12-2004
Alan Kennedy <(E-Mail Removed)> writes:
> Have you looked at Ian Bicking's SQLObject?
>
> http://sqlobject.org/


That sounds like Python object wrappers around SQL transactions.
That's the opposite of what I want. I'm imagining a future version of
Python with native compilation. A snippet like

user_history[username].append(time())

where user_history is an ordinary Python dict, would take a few dozen
machine instructions. If user_history is a shared memory object of
the type I'm imagining, there might be a few dozen additional
instructions of overhead dealing with the proxy objects. But if SQL
databases are involved, that's thousands of instructions, context
switches, TCP messages, and whatever. That's orders of magnitude
difference.
 
Irmen de Jong
 
      12-12-2004
Paul Rubin wrote:
> Basically I wish there was a way to have persistent in-memory objects
> in a Python app, maybe a multi-process one. So you could have a
> persistent dictionary d, and if you say
> d[x] = Frob(foo=9, bar=23)
> that creates a Frob instance and stores it in d[x]. Then if you
> exit the app and restart it later, there'd be a way to bring d back
> into the process and have that Frob instance be there.


If I'm not mistaken, PyPersyst (http://www.pypersyst.org/) is
something like this. Haven't used it though...

--Irmen
 