Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > The future of Python immutability

Reply
Thread Tools

The future of Python immutability

 
 
Francesco Bochicchio
Guest
Posts: n/a
 
      09-04-2009
On Sep 3, 9:07*pm, Nigel Rantor <(E-Mail Removed)> wrote:
>
> Right, this is where I would love to have had more experience with Haksell.
>
> Yes, as soon as you get to a situation where no thread can access shared
> state that is mutable your problems go away, you're also getting no work
> done becasue the threads, whilst they may be performing lots of
> interesting calculations have no way of allowing the rest of the
> program, or operating system, know about it.
>


Threads could communicate only with channels, message queue, or
equivalent. Is what
I do that as much as I can, to avoid the headache of sharing data
between threads. It is
less efficient than the shared data model and adds latency, but ensure
that each thread
is self-contained, making for safer programming and opening the way to
better parallelization.
AFAIK erlang Processes and scala Actors implement a similar model at
language level.

In python, there is kamaelia that implements a similar paradigm,
although it is more concerned
with logical parallelization than with multitheading performance
issue.

I believe this kind of paradigms will bring us to the massive
multicore world easier than FP.
Consider that most FP languages have accepted a compromise and become
'unpure' (i.e. have
constructs to change variable in place). Even haskell, the only pure
FP
language I know (sort of), has things like mutable arrays.
All these features expose current FP languages at the same 'shared
resource' risks of imperative one,
although admittedly at a less level. And FP languages have their own
crop of problems - e.g how to deal
efficiently with deep recursion levels, parameters passed by copy,
huge list built in memory (if you use eager evaluation)
or build-up of thunks (if you use lazy evaluation).

But then, I'm just a programmer, so I could be easily wrong

Ciao
-----
FB


 
Reply With Quote
 
 
 
 
Ulrich Eckhardt
Guest
Posts: n/a
 
      09-04-2009
Nigel Rantor wrote:
> John Nagle wrote:
>> Immutability is interesting for threaded programs, because
>> immutable objects can be shared without risk. Consider a programming
>> model where objects shared between threads must be either immutable or
>> "synchronized" in the sense that Java uses the term.

>
> Yes, this is one of the reasons I am currently learning Haskell, I am
> not yet anywhwere near proficient but the reason I am looking into FP is
> because of some of the claims of the FP community, particularly Erlang,
> regarding the benefits of pure FP with respect to multi-threading.


Actually, I wouldn't say that FP itself changes anything there. However, FP
usually (correct me if I'm wrong) implies immutability of objects, i.e. no
variable assignment.


> > Such programs are free of most race conditions, without much
> > programmer effort to make them so.

>
> I disagree. They are not free of most race conditions, and it still
> takes a lot of effort. Where did you get this idea from? Have you been
> reading some Java primer that attempts to make it sound easy?

[...]
>> With this mechanism, multi-thread programs with shared data
>> structures can be written with little or no explicit locking by
>> the programmer. If the restrictions are made a bit stricter,
>> strict enough that threads cannot share mutable unsynchronized data,
>> removal of the "global interpreter lock" is potentially possible.
>> This is a route to improved performance on modern multi-core CPUs.

>
> Right, this is where I would love to have had more experience with
> Haksell.
>
> Yes, as soon as you get to a situation where no thread can access shared
> state that is mutable your problems go away, you're also getting no work
> done becasue the threads, whilst they may be performing lots of
> interesting calculations have no way of allowing the rest of the
> program, or operating system, know about it.


I think it is the combination of immutable objects and message passing
(which he omitted mentioning) that is the point. I initially encountered
this principle in Erlang, but it also applies in other languages. For
example in C++, you can create MT-program using std::auto_ptr<T> for
mutable, exclusively owned objects and boost::shared_pointer<T const> for
shared, immutable objects.


> So, in response to your point of trying to get an immutable API so that
> Python can easily have multi-threaded programs that do not present race
> conditions I would say the following:
>
> That is not the challenge, that's the easy part. The challenge is
> getting useful information out of a system that has only been fed
> immutable objects.


You're overly pessimistic. Immutable objects are a tool, not the holy grail.
Having them available can help you solve some problems, not all of them.
You obviously still need mutable data, but if you can restrict access to
them to just one thread, there is no need for synchronisation for them.
Lastly, for the message passing, you also need shared, mutable structures
(queues), so you can't live completely without conventional locking.

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

 
Reply With Quote
 
 
 
 
Paul Rubin
Guest
Posts: n/a
 
      09-04-2009
Ulrich Eckhardt <(E-Mail Removed)> writes:
> Lastly, for the message passing, you also need shared, mutable structures
> (queues), so you can't live completely without conventional locking.


But that can be completely behind the scenes in the language or
library implementation. The application programmer doesn't have to
think about those locks.
 
Reply With Quote
 
Hendrik van Rooyen
Guest
Posts: n/a
 
      09-04-2009
On Thursday 03 September 2009 21:07:21 Nigel Rantor wrote:

> That is not the challenge, that's the easy part. The challenge is
> getting useful information out of a system that has only been fed
> immutable objects.


Is it really that difficult? (completely speculative):

class MyAnswer(object):
def __init__(self)
self.finished = False
self.Answer = None
self.collected = False

Then when you start a thread, you give it an instance:

ans = MyAnswer()
list_of_answers.append(c)

worker_thread = thread.start_new_thread(worker, (ans, immutable_stuff ))

def worker(ans):
ans.Answer = do_some_stuff()
ans.finished = True

Then to see if everybody is finished:

runbool = True
While runbool:
finished = False
runbool = False
for answer in list_of_answers:
if answer.finished and not answer.collected:
do_something(answer.Answer)
answer.collected = True
else:
runbool = True

This scheme gives only one thread the license to make a change to the answer
instance, so how can it go wrong?

You can also extend it for long running threads by playing ping pong with the
two booleans - the collector sets the collected boolean and clears the
finished, and the worker must clear the collected boolean before starting a
new cycle. ( the worker waits for not finished, then clears collected)

I do similar stuff in assembler and I have no problems - Am I missing
something subtle?

Of course - working with immutable objects only is a bit like working with
read only memory, or worse, write only memory - not easy.

- Hendrik


 
Reply With Quote
 
Adam Skutt
Guest
Posts: n/a
 
      09-04-2009
On Sep 3, 2:03*pm, John Nagle <(E-Mail Removed)> wrote:
> * * *Suppose, for discussion purposes, we had general "immutable objects".
> Objects inherited from "immutableobject" instead of "object" would be
> unchangeable once "__init__" had returned. *Where does this take us?


You can create this in various ways through metaclasses. I've done
it, mostly because I was curious to see how hard it would be and if it
actually gained me anything useful.

> * * *With this mechanism, multi-thread programs with shared data
> structures can be written with little or no explicit locking by
> the programmer. *If the restrictions are made a bit stricter,
> strict enough that threads cannot share mutable unsynchronized data,
> removal of the "global interpreter lock" is potentially possible.
> This is a route to improved performance on modern multi-core CPUs.

Nope, preventing mutation of the objects themselves is not enough.
You also have to forbid variables from being rebound (pointed at
another object). Consider this simple example:

---------- Thread 1 ---------- | ---------- Thread 2 ----------
a = "Foo"
spawn Thread 2
a = "Bar" print "thread 2: %s" % a
print "thread 1: %s" % a

You could see (ignoring the fact the output isn't ordered):
"thread 1: Bar"
"thread 2: Foo"
or:
"thread 1: Bar"
"thread 2: Bar"

so the fact "Foo" and "Bar" are immutable isn't enough to solve the
problem. The variables themselves, since they obey pointer semantics,
must also be forbidden from being reseated (i.e., they must be
references in the C++ sense or become 'T const * const' pointers).

Thanks,
Adam
 
Reply With Quote
 
sturlamolden
Guest
Posts: n/a
 
      09-05-2009
On 4 Sep, 06:20, John Nagle <(E-Mail Removed)> wrote:

> > In the current CPython implementation, every object has a reference
> > count, even immutable ones. This must be a writable field - and here you
> > have your race condition, even for immutable objects.

>
> * * That's an implementation problem with CPython. *I'm looking at this as
> a language design issue. *Shed Skin, which is garbage-collected, doesn't
> have that problem. *Look ahead to a new generation of Python implementations
> that go fast and use multiprocessors effectively.


But in that case, the problem is reference counting, not mutable
objects. If you get rid of reference counts you get rid of the GIL. At
the same time you introduce far worse problems: Memory use will
increase (garbage pile up), long pauses while garbage are collected
(very bad for servers), deallocator methods of extension types not
called when you expect them to. Java's VM may be faster than CPython's
VM, but it runs less smooth.










 
Reply With Quote
 
sturlamolden
Guest
Posts: n/a
 
      09-05-2009
On 3 Sep, 20:03, John Nagle <(E-Mail Removed)> wrote:

> * * *Python doesn't have immutable objects as a general concept, but
> it may be headed in that direction. *There was some fooling around
> with an "immmutability API" associated with NumPy back in 2007, but
> that was removed. *As more immutable types are added, a more general
> approach may be useful.


I one did a test of NumPy's mutable arrays against Matlab's immutable
arrays on D4 wavelet transforms. On an array of 64 MB of double
precision floats, the Python/NumPy version was faster by an order of
magnitude. On the other hand, immutable arrays does make
multithreading easier. They are particularly interesting for GPGPUs
(OpenCL/CUDA) where multithreading is pervasive. Also they allow
removal of temporary arrays, which are created by NumPy's binary
operators.










 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      09-05-2009
On Fri, 04 Sep 2009 19:23:06 -0700, sturlamolden wrote:

> I one did a test of NumPy's mutable arrays against Matlab's immutable
> arrays on D4 wavelet transforms. On an array of 64 MB of double
> precision floats, the Python/NumPy version was faster by an order of
> magnitude.


Is the difference because of mutability versus immutability, or because
of C code in Numpy versus Matlab code? Are you comparing bananas and
pears?

Without knowing what the test consisted of, and what it actually
measures, this comparison is meaningless.


--
Steven
 
Reply With Quote
 
sturlamolden
Guest
Posts: n/a
 
      09-05-2009
On 5 Sep, 05:12, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> Is the difference because of mutability versus immutability, or because
> of C code in Numpy versus Matlab code? Are you comparing bananas and
> pears?



It consisted of something like this


import numpy

def D4_Transform(x, s1=None, d1=None, d2=None):
C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:
d1 = numpy.zeros(x.size/2)
s1 = numpy.zeros(x.size/2)
d2 = numpy.zeros(x.size/2)
odd = x[1::2]
even = x[:-1:2]
d1[:] = odd[:] - C1*even[:]
s1[:-1] = even[:-1] + C2*d1[:-1] + C3*d1[1:]
s1[-1] = even[-1] + C2*d1[-1] + C3*d1[0]
d2[0] = d1[0] + s1[-1]
d2[1:] = d1[1:] + s1[:-1]
even[:] = C5 * s1[:]
odd[:] = C4 * d2[:]
if x.size > 2:
D4_Transform(even,s1[0:even.size/2],
d1[0:even.size/2],d2[0:even.size/2])


against Matlab:

function x = D4_Transform(x)
C1 = 1.7320508075688772;
C2 = 0.4330127018922193;
C3 = -0.066987298107780702;
C4 = 0.51763809020504137;
C5 = 1.9318516525781364;
s1 = zeros(ceil(size(x)/2));
d1 = zeros(ceil(size(x)/2));
d2 = zeros(ceil(size(x)/2));
odd = x(2:2:end);
even = x(1:2:end-1);
d1( = odd - C1*even;
s1(1:end-1) = even(1:end-1) + C2*d1(1:end-1) + C3*d1(2:end);
s1(end) = even(end) + C2*d1(end) + C3*d1(1);
d2(1) = d1(1) + s1(end);
d2(2:end) = d1(2:end) + s1(1:end-1);
x(1:2:end-1) = C5*s1;
x(2:2:end) = C4*d2;
if (length(x) > 2)
x(1:2:end-1) = D4_Transform(x(1:2:end-1));
end


I wasn't comparing bananas against pears. Mathworks informed me they
were using my code to investigate why Matlab was showing such slow-
downs. I am not sure what they found out, eventially. I have also
wondered if immutability vs. mutability was the reason, as NumPy
generates temporary arrays. But I have not found a better explanation
either. Anyhow, ~30 seconds for Matlab vs. ~3 seconds for Python is a
major difference. (After I did this test, Matlab has acquired a JIT
compiler which might change the timings. I haven't tested as I don't
have access to it.)


Sturla Molden



 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      09-05-2009
On Fri, 04 Sep 2009 06:36:59 -0700, Adam Skutt wrote:

> Nope, preventing mutation of the objects themselves is not enough. You
> also have to forbid variables from being rebound (pointed at another
> object). Consider this simple example:
>
> ---------- Thread 1 ---------- | ---------- Thread 2 ----------
> a = "Foo"
> spawn Thread 2
> a = "Bar" print "thread 2: %s" % a
> print "thread 1: %s" % a
>
> You could see (ignoring the fact the output isn't ordered):
> "thread 1: Bar"
> "thread 2: Foo"
> or:
> "thread 1: Bar"
> "thread 2: Bar"
>
> so the fact "Foo" and "Bar" are immutable isn't enough to solve the
> problem.


This is a side-effect of writing code that relies on global variables.
Global variables are generally a bad idea. Global constants are fine.



> The variables themselves, since they obey pointer semantics,


What do you mean by "variables"? Do you mean names?

What are pointer semantics?

> must also be forbidden from being reseated (i.e., they must be
> references in the C++ sense or become 'T const * const' pointers).


Assuming you mean names must be forbidden from being rebound, no,
incorrect. It's only names which are shared between both threads which
must not be re-bound. If you have names local to the thread, you can
change them all you want without affecting any other thread.



--
Steven
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Immutability and Python andrea crotti Python 8 11-08-2012 07:38 AM
Re: Immutability and Python andrea crotti Python 0 10-29-2012 03:44 PM
Python String Immutability Broken! Roy Smith Python 2 08-25-2008 02:41 AM
Easy immutability in python? Terry Hancock Python 3 03-04-2006 08:17 PM
Re: Easy immutability in python? Terry Hancock Python 2 03-04-2006 08:17 PM



Advertisments