Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Minimalistic Software Transactional Memory

Reply
Thread Tools

Minimalistic Software Transactional Memory

 
 
Michael Sparks
Guest
Posts: n/a
 
      12-08-2007
Hi,


I'm interested in writing a simple, minimalistic, non persistent (at this
stage) software transactional memory (STM) module. The idea being it should
be possible to write such a beast in a way that can be made threadsafe fair
easily.

For those who don't know, STM is a really fancy way of saying variables
with version control (as far as I can tell designed to enable threadsafe
shared data.

I'm starting with the caveat here that the following code is almost
certainly not threadsafe (not put any real thought into that as yet),
and I'm interested in any feedback on the following:

* Does the API look simple enough?
* Are there any glaring mistakes in the code ? (It's always harder to see
your own bugs)
* What key areas appear least threadsafe, and any general suggestions
around that.

If I get no feedback I hope this is of interest. Since these things get
archived, if you're reading this a month, 6 months, a year or more from
now, I'll still be interested in feedback...

OK, API.

First of all we need to initialise the store:

S = Store()

We then want to get a value from the store such that we can use the value,
and do stuff with it:

greeting = S.using("hello")

Access the value:

print repr(greeting.value)

Update the value:

greeting.set("Hello World")

Commit the value back to the store:

greeting.commit()

If you have concurrent updates of the same value, the following exception
gets thrown:
ConcurrentUpdate

cf:
>>> S = Store()
>>> greeting = S.using("hello")
>>> par = S.using("hello")
>>> greeting.set("Hello World")
>>> par.set("Woo")
>>> greeting.commit()
>>> par.commit()

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 12, in commit
File "<stdin>", line 11, in set
__main__.ConcurrentUpdate

That's pretty much the simplest API I can come up with. (I've tried a few
others but didn't really like them)

The way this works is we have a Store that manages Values. (perhaps a better
name may be variables to avoid clashing with pythons parlance of labels and
values?)

Anyhow, you can ask the store for Value, which it will give you. The Value
knows what it's called and where it's from, so when you tell it to commit,
it can try to do so. The little detail here is you get a /copy/ of the
Value not the stored Value. (This is to avoid accidental concurrent update
of the same actual object)

As a result, Store looks like this:

class Store(object):
def __init__(self):
self.store = {}

def get(self, key):
return self.store[key].clone()

def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1, value.value, self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

def using(self, key):
try:
return self.get(key)
except KeyError:
self.store[key] = Value(0, None,self,key)
return self.get(key)

def dump(self):
for k in self.store:
print k, ":", self.store[k]

You'll note that the set method is the greatest candidate for any possible
race hazard here - though I can see a possible boundary issue in "using".
(I think

Otherwise I think the above code is relatively straightforward. You'll note
that this API allows this:

greeting.set("Hello")
greeting.commit()
greeting.set("Hello World")
greeting.commit()
greeting.set("Hello World. Game")
greeting.commit()
greeting.set("Hello World. Game Over")
greeting.commit()

The other class is value that looks like this:

class Value(object):
def __init__(self, version, value,store,key):
self.version = version
self.value = value
self.store = store
self.key = key

def __repr__(self):
return "Value"+repr((self.version,self.value,self.store,s elf.key))

def set(self, value):
self.value = value

def commit(self):
self.store.set(self.key, self)

def clone(self):
return Value(self.version, self.value,self.store,self.key)

To me this looks like a pretty complete minimalistic thing, which does seem
to work OK, but I'm interested in the three points I mention above - if
anyone is willing to comment - specifcally:

* Does the API look simple enough?
* Are there any glaring mistakes in the code ? (It's always harder to see
your own bugs)
* What key areas appear least threadsafe, and any general suggestions
around that.

Full code below.

Many thanks for any comments in advance,


Michael
--
Michael Sparks, Kamaelia Project Lead
http://kamaelia.sourceforge.net/Developers/
http://yeoldeclue.com/blog

#!/usr/bin/python

class ConcurrentUpdate(Exception): pass

class Value(object):
def __init__(self, version, value,store,key):
self.version = version
self.value = value
self.store = store
self.key = key

def __repr__(self):
return "Value"+repr((self.version,self.value))

def set(self, value):
self.value = value

def commit(self):
self.store.set(self.key, self)

def clone(self):
return Value(self.version, self.value,self.store,self.key)

class Store(object):
def __init__(self):
self.store = {}

def get(self, key):
return self.store[key].clone()

def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1, value.value, self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

def using(self, key):
try:
return self.get(key)
except KeyError:
self.store[key] = Value(0, None,self,key)
return self.get(key)

def dump(self):
for k in self.store:
print k, ":", self.store[k]

S = Store()
greeting = S.using("hello")
print repr(greeting.value)
greeting.set("Hello World")
greeting.commit()
# ------------------------------------------------------
print greeting
S.dump()
# ------------------------------------------------------
par = S.using("hello")
par.set("Woo")
par.commit()
# ------------------------------------------------------
print greeting
S.dump()
# ------------------------------------------------------
greeting.set("Woo")
greeting.commit()

print repr(greeting), repr(greeting.value)

S.dump()

 
Reply With Quote
 
 
 
 
Fuzzyman
Guest
Posts: n/a
 
      12-08-2007
On Dec 8, 10:53 pm, Michael Sparks <(E-Mail Removed)> wrote:
> Hi,
>
> I'm interested in writing a simple, minimalistic, non persistent (at this
> stage) software transactional memory (STM) module. The idea being it should
> be possible to write such a beast in a way that can be made threadsafe fair
> easily.
>
> For those who don't know, STM is a really fancy way of saying variables
> with version control (as far as I can tell designed to enable threadsafe
> shared data.
>
> I'm starting with the caveat here that the following code is almost
> certainly not threadsafe (not put any real thought into that as yet),
> and I'm interested in any feedback on the following:
>
> * Does the API look simple enough?
> * Are there any glaring mistakes in the code ? (It's always harder to see
> your own bugs)
> * What key areas appear least threadsafe, and any general suggestions
> around that.
>
> If I get no feedback I hope this is of interest. Since these things get
> archived, if you're reading this a month, 6 months, a year or more from
> now, I'll still be interested in feedback...
>
> OK, API.
>
> First of all we need to initialise the store:
>
> S = Store()
>
> We then want to get a value from the store such that we can use the value,
> and do stuff with it:
>
> greeting = S.using("hello")
>
> Access the value:
>
> print repr(greeting.value)
>
> Update the value:
>
> greeting.set("Hello World")
>
> Commit the value back to the store:
>
> greeting.commit()
>
> If you have concurrent updates of the same value, the following exception
> gets thrown:
> ConcurrentUpdate
>
> cf:
> >>> S = Store()
> >>> greeting = S.using("hello")
> >>> par = S.using("hello")
> >>> greeting.set("Hello World")
> >>> par.set("Woo")
> >>> greeting.commit()
> >>> par.commit()

> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "<stdin>", line 12, in commit
> File "<stdin>", line 11, in set
> __main__.ConcurrentUpdate
>
> That's pretty much the simplest API I can come up with. (I've tried a few
> others but didn't really like them)
>
> The way this works is we have a Store that manages Values. (perhaps a better
> name may be variables to avoid clashing with pythons parlance of labels and
> values?)
>
> Anyhow, you can ask the store for Value, which it will give you. The Value
> knows what it's called and where it's from, so when you tell it to commit,
> it can try to do so. The little detail here is you get a /copy/ of the
> Value not the stored Value. (This is to avoid accidental concurrent update
> of the same actual object)
>


Wouldn't it be more useful having a store (possibly with a mapping
API?) that stored several values. Then a single commit can be done at
the end of the transaction.

If the transaction fails, then the whole process can be repeated with
none of the changes having been committed.

The tricky part is that other threads asking for the value should get
the *old* value until the commit has been done - so you need to know
which thread 'value' is being asked for from.

Michael
 
Reply With Quote
 
 
 
 
Michael Sparks
Guest
Posts: n/a
 
      12-09-2007
Fuzzyman wrote:

> Wouldn't it be more useful having a store (possibly with a mapping
> API?) that stored several values. Then a single commit can be done at
> the end of the transaction.
>
> If the transaction fails, then the whole process can be repeated with
> none of the changes having been committed.


Quite probably yes

However initially I thought I'd start off with trying to get single values
right (A single value can be a dict of course, but your point holds for
arbitrary unrelated values)

I suspect the way to deal with that might be to do something like:

S = Store()
D = S.using("account_one", "account_two", "myaccount")
D["myaccount"].set(D["account_one"].value+D["account_two"].value)
D["account_one"].set(0)
D["account_two"].set(0)
D.commit()

Which seems pretty nice/simple. It'd be useful for the commit failure to
indicate which value was broken by a concurrent update, but not necessary.

Context:

At present Kamaelia has something which tracks keys and values - primarily
services ("where's the person who manages the pygame display?"). At present
those are only really accessible by generator based components which means
that key/value context is actually accessed in a single threaded way making
services safe for non-threaded components.

However, making accessing that store safe from threads requires changing to
something more advanced, either locking, or using STM. STM seems more in
keeping with Kamaelia being generally lock-free.

The way this store is currently used is exclusively in a single key/value
lookup, which is why I was thinking of that first - though I can see to
be more generally useful allowing multi-value updates would be useful

> The tricky part is that other threads asking for the value should get
> the old value until the commit has been done - so you need to know
> which thread 'value' is being asked for from.


I think this approach deals with this naturally. cf:
def get(self, key): # in Store
return self.store[key].clone()
....
def clone(self): # in Value
return Value(self.version, self.value,self.store,self.key)
....
def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1, value.value, self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

This means that the value in 2 threads for the same checked out value names
can be different, but they get warning of this when they try to commit.

eg suppose the following interleaving of commands:

S = Store()
D = S.using("account_one", "account_two", "myaccount")
D["myaccount"].set(D["account_one"].value+D["account_two"].value)
D["account_one"].set(0)
E = S.using("account_one", "myaccount")
E["myaccount"].set(E["myaccount"].value-100)
E["account_one"].set(100)
E.commit()
D["account_two"].set(0)
D.commit()

I did think that this would work with the current code but changing the
value to something more complex - like this:

S = Store()
D = S.using("accounts")
D.set({"account_one":50, "account_two":100, "myaccount":0})
D.commit()
print "here"
S.dump()
X = D.value
X["myaccount"] = X["account_one"] + X["account_two"]
X["account_one"] = 0
X["account_two"] = 0
D.set(X)
D.commit()
S.dump()
print "Committed", D.value["myaccount"]

But in practice, because I'm not doing a _copy_ here of self.value:
def clone(self): # in Value
return Value(self.version, self.value,self.store,self.key)

I end up with accidental concurrent access, which is easy to fix, for
moderately complex datatypes:
import copy
.....
def clone(self): # in Value
return Value(self.version, copy.deepcopy(self.value),
self.store, self.key)

As well as here:
def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1,
copy.deepcopy(value.value), self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

If I do that, then the following code fails as would be expected:
S = Store()
D = S.using("accounts")
D.set({"account_one":50, "account_two":100, "myaccount":0})
D.commit() # First
S.dump()
X = D.value
X["myaccount"] = X["account_one"] + X["account_two"]
X["account_one"] = 0

E = S.using("accounts")
Y = E.value
Y["myaccount"] = Y["myaccount"]-100
Y["account_one"]= 100
E.set(Y)
E.commit() # Second
S.dump()

X["account_two"] = 0
D.set(X)
D.commit() # Third
S.dump()
print "Committed", D.value["myaccount"]

Fails as follows:

accounts : Value(1, {'account_two': 100, 'myaccount': 0, 'account_one': 50})
accounts : Value(2, {'account_two': 100, 'myaccount': -100, 'account_one':
100})
Traceback (most recent call last):
File "./NewSTM.py", line 70, in <module>
D.commit() # Third
File "./NewSTM.py", line 20, in commit
self.store.set(self.key, self)
File "./NewSTM.py", line 37, in set
raise ConcurrentUpdate
__main__.ConcurrentUpdate

(which is of course the error wanted - since we want to be able to detect
failure)

It's probably useful to know for the more general approach you suggest.

Thanks!


Michael.
--
Michael Sparks, Kamaelia Project Lead
http://kamaelia.sourceforge.net/Developers/
http://yeoldeclue.com/blog

 
Reply With Quote
 
Duncan Booth
Guest
Posts: n/a
 
      12-09-2007
Michael Sparks <(E-Mail Removed)> wrote:

> I'm interested in writing a simple, minimalistic, non persistent (at
> this stage) software transactional memory (STM) module. The idea being
> it should be possible to write such a beast in a way that can be made
> threadsafe fair easily.
>
> For those who don't know, STM is a really fancy way of saying
> variables with version control (as far as I can tell designed to
> enable threadsafe shared data.
>
> I'm starting with the caveat here that the following code is almost
> certainly not threadsafe (not put any real thought into that as yet),
> and I'm interested in any feedback on the following:
>
> * Does the API look simple enough?
> * Are there any glaring mistakes in the code ? (It's always harder
> to see
> your own bugs)
> * What key areas appear least threadsafe, and any general
> suggestions
> around that.
>
> If I get no feedback I hope this is of interest. Since these things
> get archived, if you're reading this a month, 6 months, a year or more
> from now, I'll still be interested in feedback...


Unless you really are desperate to reinvent the wheel, have you looked at
ZODB? https://launchpad.net/zodb

ZODB gives you the transactional model you want. It also gives you
persistence, but if you don't want that you can simply connect to a non-
persistent store.
 
Reply With Quote
 
Fuzzyman
Guest
Posts: n/a
 
      12-09-2007
> STM seems more in
> keeping with Kamaelia being generally lock-free.


STM isn't lock free - it just abstracts the locks away from the
'user'. You still need to lock around committing the transaction.
Other threads accessing the data during the transaction should see the
old values until the commit is successful.

Michael
http://www.manning.com/foord
 
Reply With Quote
 
Michael Sparks
Guest
Posts: n/a
 
      12-09-2007
Duncan Booth wrote:

> Michael Sparks <(E-Mail Removed)> wrote:
>
>> I'm interested in writing a simple, minimalistic, non persistent (at
>> this stage) software transactional memory (STM) module. The idea being
>> it should be possible to write such a beast in a way that can be made
>> threadsafe fair easily.
>>
>> For those who don't know, STM is a really fancy way of saying
>> variables with version control (as far as I can tell designed to
>> enable threadsafe shared data.
>>
>> I'm starting with the caveat here that the following code is almost
>> certainly not threadsafe (not put any real thought into that as yet),
>> and I'm interested in any feedback on the following:
>>
>> * Does the API look simple enough?
>> * Are there any glaring mistakes in the code ? (It's always harder
>> to see
>> your own bugs)
>> * What key areas appear least threadsafe, and any general
>> suggestions
>> around that.
>>
>> If I get no feedback I hope this is of interest. Since these things
>> get archived, if you're reading this a month, 6 months, a year or more
>> from now, I'll still be interested in feedback...

>
> Unless you really are desperate to reinvent the wheel, have you looked at
> ZODB? https://launchpad.net/zodb
>
> ZODB gives you the transactional model you want. It also gives you
> persistence, but if you don't want that you can simply connect to a non-
> persistent store.


That seems somewhat overkill for my needs. ZODB's distribution is 3.3MB in
size, whereas the system I want a minimalistic, non-persistant[1]
transactional memory for is 345K in size. (The Axon part of kamaelia)
I'll take a look though.

[1] I said "at this stage" because its something I *may* want in future
for some possible usecases, but generally I'm not really very interested
in persistence. (At least not where I'm putting this

Context: I want thing that use the following class threadsafe, and can think
of a few ways of doing this, and STM seems one of the more "nicer" ways of
doing so. (it's clients are currently just generators which makes it safe at
the moment)

https://kamaelia.svn.sourceforge.net...tantTracker.py

It's used to provide an ENVironment like facility (similar to os.environ)
for generator based Kamaelia components.

Thanks for the feedback


Michael.


 
Reply With Quote
 
Michael Sparks
Guest
Posts: n/a
 
      12-09-2007
Fuzzyman wrote:

>> STM seems more in
>> keeping with Kamaelia being generally lock-free.

>
> STM isn't lock free - it just abstracts the locks away from the
> 'user'. You still need to lock around committing the transaction.
>


I perhaps phrased what I meant too tersely.

Kamaelia isn't lock free either. It's generally lock free. User code
however, doesn't ever see locks... (Though threaded components do
communicate over thread safe Queues which contain locks internally.
Generator components don't use locking)

I suppose the reason why I like STM (conceptually) is because it appears to
follow the same sort of idea - the user of the code doesn't have to worry
about locks - the just have to worry about whether they're using the system
correctly or not.

Based on this, I think I'm right in thinking then that given the code I
posted, that at minimum I need locking here, since it is the critical
section.

def set(self, key, value):
# Attempt to acquire lock
if not (self.store[key].version > value.version):
# Also assuming we have lock
self.store[key] = Value(value.version+1,
copy.deepcopy(value.value), self, key)
value.version= value.version+1
# Release lock
else:
# Also do this if we can't get access to the lock
raise ConcurrentUpdate

I know I then have a choice of locking the value (ie a lock specific to
self.store[key]) or the self.store as a whole. Depends on how antisocial
you want the locking to be I suppose

> Other threads accessing the data during the transaction should see the
> old values until the commit is successful.


The implementation I'm suggesting means that other threads would see the
old values until the try to commit their changes. (I'm after something
simple since I'm expecting to deal with simple cases)

Many thanks


Michael.

 
Reply With Quote
 
John J. Lee
Guest
Posts: n/a
 
      12-09-2007
Michael Sparks <(E-Mail Removed)> writes:

> Duncan Booth wrote:
>
>> Michael Sparks <(E-Mail Removed)> wrote:
>>
>>> I'm interested in writing a simple, minimalistic, non persistent (at
>>> this stage) software transactional memory (STM) module. The idea being
>>> it should be possible to write such a beast in a way that can be made
>>> threadsafe fair easily.

[...]
>> Unless you really are desperate to reinvent the wheel, have you looked at
>> ZODB? https://launchpad.net/zodb

[...]
> That seems somewhat overkill for my needs. ZODB's distribution is 3.3MB in
> size, whereas the system I want a minimalistic, non-persistant[1]
> transactional memory for is 345K in size. (The Axon part of kamaelia)
> I'll take a look though.

[...]

Durus might be worth a look too (though I doubt it's suitable for your
situation):

http://www.mems-exchange.org/software/durus/


The link to their paper about it seems to be broken, but I think it
was based somewhat on ZODB, but is simpler (67k tarball .


John
 
Reply With Quote
 
Michael Sparks
Guest
Posts: n/a
 
      12-09-2007
John J. Lee wrote:

> Durus might be worth a look too (though I doubt it's suitable for your
> situation):
>
> http://www.mems-exchange.org/software/durus/
>
> The link to their paper about it seems to be broken, but I think it
> was based somewhat on ZODB, but is simpler (67k tarball .


Much appreciated. I've downloaded this and it looks more suitable,
however it still looks like overkill... After all, I'm just after a
simple system for concurrent update of in-memory shared values by
multiple threads - how hard can that be ?[1]

[1] Yes, yes, for those who don't know me, I know, I know...

Thanks to Fuzzyman's comments and a code review on IRC I think I've got what
I think is a minimally sufficient system - though I'll extend to include
Fuzzyman's suggestion regarding concurrent update of multiple independent
values

Code is here for those curious/wanting something similar/similarly
lightweight:

https://kamaelia.svn.sourceforge.net...ents/NewSTM.py

I'll package it up independently of the rest of Kamaelia as well probably
once I've made the addition noted above

Regards,


Michael
 
Reply With Quote
 
Paul Rubin
Guest
Posts: n/a
 
      01-11-2008
Fuzzyman <(E-Mail Removed)> writes:
> STM isn't lock free - it just abstracts the locks away from the
> 'user'. You still need to lock around committing the transaction.


The idea is that readers don't need locks. They just look at the
version number before they start reading and after they finish. If
the number didn't change, the read was successful. If it changed,
they have to retry.

On multiprocessor systems this relies on an instruction like CMPXCHG
for writers to increment the version number. Otherwise it requires a
lock separate from the version number, and multiple operations on the
lock.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Announcement: depikt - the minimalistic python gate to gtk DreiJane Python 0 11-20-2009 10:51 AM
MiLi: minimalistic libraries dgutson C++ 0 04-01-2009 12:29 PM
New interesting application of Transactional Memory forsingle-threading Dmitriy V'jukov C++ 3 10-23-2008 08:24 PM
ANN: Axon.STM 1.0.1 (release) Minimalistic Software TransactionalMemory (with examples) Michael Sparks Python 0 12-24-2007 11:56 AM
ANN: Axon.STM 1.0.0 (beta) Minimalistic Software Transactional Memory Michael Sparks Python 0 12-10-2007 01:17 AM



Advertisments