Velocity Reviews - Computer Hardware Reviews


Towards faster Python implementations - theory

 
 
John Nagle
 
      05-08-2007
Some faster Python implementations are under development.
JPython has been around for a while, and PyPy and ShedSkin
continue to move forward. It's worth thinking about what slows
down Python implementations.

It isn't the dynamism, really. As others have pointed out
in the Python literature, most of the time, the more elaborate
dynamic features aren't being used for any given variable or
object. If code has something like "x = 1.0", the odds are that
"x" is going to stay a floating point number, and not suddenly turn
into a list or an object reference. The type inference of Shed Skin
builds on that assumption, adding some restrictions about changing of
variable types.
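A minimal sketch of the kind of type-stable code this inference relies on (the function and names here are illustrative, not from Shed Skin itself):

```python
def mean(values):
    # "total" is initialized as a float and only ever holds floats,
    # so an inferring compiler could unbox it as a plain C double.
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

# The "surprise" case would be rebinding total to, say, a list
# mid-loop; type-stable code like the above never does that.
result = mean([1.0, 2.0, 3.0])
```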

The Shed Skin effort indicates that explicit typing, via 'decorators'
or otherwise, isn't really necessary. What's necessary is the avoidance
of "surprises" in the language. In this context, a "surprise" is
the use of a dynamic feature in a way that can't be seen at compile time.

A typical "surprise" would be the use of "setattr" on an object from
outside the compilation unit that defines the object. Within a module,
"setattr" on an object in that module is no problem; the compiler can see
it and generate the extra machinery needed to make an object dynamically
alterable at run time. But if an object doesn't need that extra machinery
and associated dictionary, it's a big win to discard the excess baggage
and use a simpler fixed-size representation, comparable to a C struct,
for the object.
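CPython already offers an opt-in version of that fixed-size layout via `__slots__`; a sketch of the trade-off being described:

```python
class Point:
    # __slots__ replaces the per-instance __dict__ with a fixed,
    # C-struct-like layout; setattr with a new name then fails.
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
try:
    setattr(p, 'z', 3.0)   # the "surprise" the compiler would otherwise allow for
    surprised = True
except AttributeError:
    surprised = False
```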

On the typing front, the neatest way to express typing is via
initialization. With the Shed Skin restrictions, if all variables are
initialized before use (preferably in __init__), there's no need to
maintain an "undefined" flag for a variable. And, of course, if
the type of a variable is simple and can't change, it doesn't have to
be "boxed" (enclosed in an object), which is a big win.
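A small illustration of typing via initialization (class and attribute names are made up for the example):

```python
class Particle:
    def __init__(self):
        # Every attribute is bound before use, each to a single type,
        # so no "undefined" flag is needed and the floats could stay
        # unboxed under Shed-Skin-style restrictions.
        self.mass = 1.0
        self.velocity = 0.0

    def kick(self, dv):
        self.velocity += dv   # stays a float: no type change, no surprise

p = Particle()
p.kick(2.5)
```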

The point here is that we don't need language changes or declarations
to make Python much faster. All we need are a few restrictions that
ensure that, when you're doing something unusual, the compiler can
tell.

John Nagle
 
 
 
 
 
Paul Boddie
 
      05-08-2007
On 8th May, 17:53, John Nagle <(E-Mail Removed)> wrote:
> Some faster Python implementations are under development.
> JPython


Ahem: Jython!

> has been around for a while, and PyPy and ShedSkin
> continue to move forward. It's worth thinking about what slows
> down Python implementations.


It's the dynamicity! But things like clever memory allocation can
pay dividends, too, and I wouldn't be surprised if this explained
Python's better-than-expected showing when compared to other languages
- such as that comparison you provoked with those claims of superior
JavaScript performance.

> It isn't the dynamism, really.


In theory, no, but in practice CPython doesn't optimise away the
dynamism. Recent experiments with method caches seem to have shown
modest performance improvements, and I'd guess that such things are
fairly established in other dynamic language implementations.

> As others have pointed out
> in the Python literature, most of the time, the more elaborate
> dynamic features aren't being used for any given variable or
> object. If code has something like "x = 1.0", the odds are that
> "x" is going to stay a floating point number, and not suddenly turn
> into a list or an object reference. The type inference of Shed Skin
> builds on that assumption, adding some restrictions about changing of
> variable types.


The problem here, amongst many others, is knowing for sure whether the
more elaborate features have been avoided. Approaches which attempt to
avoid knowing such things include just-in-time compilation (you avoid
knowing in advance) and type declarations (you give up thinking about
whether it's possible and have the programmer do all the work).

> The Shed Skin effort indicates that explicit typing, via 'decorators'
> or otherwise, isn't really necessary. What's necessary is the avoidance
> of "surprises" in the language. In this context, a "surprise" is
> the use of a dynamic feature in a way that can't be seen at compile time.


I concur with your assessment of the supposed necessity of explicit
typing. However, you might be surprised as to what constitutes a
"surprise" in Python...

> A typical "surprise" would be the use of "setattr" on an object from
> outside the compilation unit that defines the object. Within a module,
> "setattr" on an object in that module is no problem; the compiler can see
> it and generate the extra machinery needed to make an object dynamically
> alterable at run time. But if an object doesn't need that extra machinery
> and associated dictionary, it's a big win to discard the excess baggage
> and use a simpler fixed-size representation, comparable to a C struct,
> for the object.


You don't even need to bring out setattr to make the inference
activity a difficult one. Even straightforward attribute access needs
to be repeatedly checked to see if the target of a normal attribute
assignment or query has changed. Traditionally, people have shown some
trivial function in isolation...

def f(x):
    return x.a

...and said, "We don't know anything! It's all impossible!" But
context is everything, as you know, and whole program analysis is the
only way to go with the aforementioned goals in mind. What if the
parameter to the above itself comes from attribute access?

def g(y):
    return f(y.b)

You can descend into some fairly demanding situations with respect to
keeping track of all the possibilities.
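A runnable version of the two fragments above makes the difficulty concrete (the surrounding classes are invented for illustration): to type f's result, the analysis has to chase every type that can flow in through .b.

```python
def f(x):
    return x.a

def g(y):
    return f(y.b)

class Inner:
    def __init__(self, a):
        self.a = a

class Outer:
    def __init__(self, inner):
        self.b = inner

# Here .a happens to be an int, but any caller of g anywhere in the
# whole program could route a different type through .b, so f cannot
# be typed in isolation.
value = g(Outer(Inner(42)))
```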

> On the typing front, the neatest way to express typing is via
> initialization. With the Shed Skin restrictions, if all variables are
> initialized before use (preferably in __init__), there's no need to
> maintain an "undefined" flag for a variable. And, of course, if
> the type of a variable is simple and can't change, it doesn't have to
> be "boxed" (enclosed in an object), which is a big win.


I'm fairly sure, although my intuition unfortunately doesn't provide a
ready example right now, that typing via initialisation, whilst an
important tool, may either impose unacceptable restrictions if
enforced too rigidly or limit the precision that one might desire in
determining type information in the resulting system. But it is a
worthwhile objective to identify fixed-size structures and unboxed
values, in my opinion.

> The point here is that we don't need language changes or declarations
> to make Python much faster. All we need are a few restrictions that
> ensure that, when you're doing something unusual, the compiler can
> tell.


Agreed. And I don't buy into the negative "lesser Python" labelling of
such work, either. People seem to have forgotten how to use older,
conservative Python features, preferring to show off with metaclasses
and closures even for problems that could be solved using simple
classes and objects. If that's "greater Python" then call me a
minimalist!

Paul

 
 
 
 
 
Marc 'BlackJack' Rintsch
 
      05-08-2007
In <(E-Mail Removed). com>, Paul Boddie
wrote:

>> On the typing front, the neatest way to express typing is via
>> initialization. With the Shed Skin restrictions, if all variables are
>> initialized before use (preferably in __init__), there's no need to
>> maintain an "undefined" flag for a variable. And, of course, if
>> the type of a variable is simple and can't change, it doesn't have to
>> be "boxed" (enclosed in an object), which is a big win.


A variable?

Maybe that last sentence should start with: "And, of course, if the type
of the objects bound to a name won't change…"

> I'm fairly sure, although my intuition unfortunately doesn't provide a
> ready example right now, that typing via initialisation, whilst an
> important tool, may either impose unacceptable restrictions if
> enforced too rigidly or limit the precision that one might desire in
> determining type information in the resulting system.


I often bind a name to `None` first if it is going to be bound to its
"real" value later on. So this initialization doesn't say anything about
the type(s) that will be bound to it later.
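A minimal example of that idiom, showing why the initializing type tells the compiler nothing (the class is invented for illustration):

```python
class Config:
    def __init__(self):
        # Common idiom: bind to None now, to the "real" value later.
        # The initializing type (NoneType) says nothing about the
        # eventual type, which defeats typing-via-initialization.
        self.connection = None

    def connect(self, conn):
        self.connection = conn

cfg = Config()
cfg.connect("tcp://localhost")   # now a str, not NoneType
```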

I don't see how this type inference for static types will work unless some
of the dynamism of the language will get restricted. But is this really
necessary? Isn't a JIT compiler and maybe type hinting good enough? By
type hinting I really mean just recommendations from the programmer. So
you can say this name should be an `int` and the JIT compiler produces
code in advance, but it's okay to pass another object instead.
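What such a non-binding hint might feel like, sketched with an ordinary function annotation (annotations existed for functions even then only as a proposal; this is purely illustrative of "recommendation, not contract"):

```python
def double(x: int):      # the hint: a JIT could compile an int fast path
    return x * 2

# The hinted fast path:
hinted = double(21)
# ...but passing another object is still okay, as proposed above;
# the system would simply fall back to the generic code path.
fallback = double("ab")
```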

Ciao,
Marc 'BlackJack' Rintsch
 
 
Bruno Desthuilliers
 
      05-08-2007
John Nagle a écrit :
> Some faster Python implementations are under development.
> JPython has been around for a while,


s/JP/J/

And FWIW, I'm not sure Jython is really faster than CPython...
 
 
John Nagle
 
      05-09-2007
Marc 'BlackJack' Rintsch wrote:

> I don't see how this type inference for static types will work unless some
> of the dynamism of the language will get restricted. But is this really
> necessary? Isn't a JIT compiler and maybe type hinting good enough?


Not necessarily. One of the more powerful optimizations is to optimize
reference count updates. Often, you can hoist reference count updates
out of loops, which is a big win. But to do that, you need to be sure
that the code executed by the loop won't change unexpectedly.
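Reference counting itself isn't visible from Python source, but the same hoisting idea can be shown with attribute lookups, which a compiler could likewise lift out of a loop once it knows the loop body can't rebind them; a hand-hoisted sketch:

```python
def collect(n):
    out = []
    # Hoisted: the bound method is looked up once instead of n times.
    # This is safe only because nothing in the loop can rebind "out"
    # or change its type - exactly the guarantee being asked for.
    append = out.append
    for i in range(n):
        append(i * i)
    return out

squares = collect(4)
```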

My point is that while all the dynamism in Python is occasionally
useful, most of the time for most of the variables most of it isn't
being used. If we can discard the excess baggage, unnecessary
dictionary lookups, and unnecessary reference count updates a
large fraction of the time, it's a big win. This requires
"surprise-free" language semantics, so that when something unusual
is going on, it's visibly reflected in the code.

Surprise-free semantics also make code easier to debug.


John Nagle
 
 
Alex Martelli
 
      05-09-2007
Bruno Desthuilliers <(E-Mail Removed)> wrote:

> John Nagle a écrit :
> > Some faster Python implementations are under development.
> > JPython has been around for a while,

>
> s/JP/J/


These days, yes, but it WAS originally called JPython (it was renamed to
Jython later). So, "has been around a while" is an appropriate context
for using the good old name.


> And FWIW, I'm not sure Jython is really faster than CPython...


Definitely not, last I measured. Not IronPython either, overall (though
it's much better than Jython and does get to beat CPython on some
specific benchmarks).


Alex
 
 
Kay Schluehr
 
      05-09-2007
On May 9, 1:25 pm, John Nagle <(E-Mail Removed)> wrote:
> Marc 'BlackJack' Rintsch wrote:
> > I don't see how this type inference for static types will work unless some
> > of the dynamism of the language will get restricted. But is this really
> > necessary? Isn't a JIT compiler and maybe type hinting good enough?

>
> Not necessarily. One of the more powerful optimizations is to optimize
> reference count updates. Often, you can hoist reference count updates
> out of loops, which is a big win. But to do that, you need to be sure
> that the code executed by the loop won't change unexpectedly.


The advantage of having a JIT is just that it can record data at
runtime and respond flexibly to them. It doesn't run into the global
static analysis problems mentioned by Paul. A "semi-dynamical"
compromise would be to use a profile of samples of runtime data and
assert that they reflect "typical" behaviour. Then the system needs an
ability to fall back to usual bytecode interpretation. Psyco does that
by analyzing the next opcode and a very clever dispatch mechanism.

A more code oriented, profile driven approach would factor source into
natively compilable parts and those that have to be bytecode
interpreted. I wonder whether bridges to Pyrex, Boost.Python or the
celerid bridge to D could be used and what the performance penalties
are for all the argument/return value wrapping and unwrapping.
Unfortunately ShedSkin lacks CPython integration. We talked about this
here recently.

Kay

 
 
Hendrik van Rooyen
 
      05-09-2007
"John Nagle" <(E-Mail Removed)> wrote:

8<---------------- summary of existing work and thinking ------------------

> The point here is that we don't need language changes or declarations
> to make Python much faster. All we need are a few restrictions that
> ensure that, when you're doing something unusual, the compiler can
> tell.
>


I am relatively new on this turf, and from what I have seen so far, it
would not bother me at all to tie a name's type to its first use, so that
the name can only be bound to objects of the same type as the type
of the object that it was originally bound to.

But maybe I am missing the point of dynamism.

Would an implementation of the above break lots of stuff in practice?

It seems to me that it could have the effect of a declaration without
the wart of actually doing it.
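One way to get a feel for such a restriction is to enforce it at runtime, purely as an illustration (a real implementation would check this statically, and this toy version ignores the None-first idiom mentioned elsewhere in the thread):

```python
class TypeLocked:
    # Once a name is bound, later bindings must keep the same type -
    # roughly "tie a name's type to its first use".
    def __setattr__(self, name, value):
        if name in self.__dict__ and type(self.__dict__[name]) is not type(value):
            raise TypeError(f"{name} is locked to {type(self.__dict__[name]).__name__}")
        self.__dict__[name] = value

obj = TypeLocked()
obj.count = 1
obj.count = 2            # same type: fine
try:
    obj.count = "three"  # rebinding to a str would be the forbidden "surprise"
    rejected = False
except TypeError:
    rejected = True
```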

- Hendrik


 
 
Paul Boddie
 
      05-09-2007
On 9 May, 08:09, "Hendrik van Rooyen" <(E-Mail Removed)> wrote:
>
> I am relatively new on this turf, and from what I have seen so far, it
> would not bother me at all to tie a name's type to its first use, so that
> the name can only be bound to objects of the same type as the type
> of the object that it was originally bound to.


But it's interesting to consider the kinds of names you could restrict
in this manner and what the effects would be. In Python, the only kind
of name that can be considered difficult to modify arbitrarily "at a
distance" - in other words, from outside the same scope - is a local,
and even then there are things like closures and perverse
implementation-dependent stack hacks which can expose local namespaces
to modification, although any reasonable "conservative Python"
implementation would disallow the latter.

In a local namespace you could restrict names in this way, although
I'd argue that with the limitations on locals, you don't gain as much
as you would by restricting other names similarly. However, by
restricting other kinds of names (eg. instance attributes) you have to
start thinking about polymorphism: what if attribute x on instances of
class C can have different types? If you aren't careful, you've
introduced interfaces as the primary mechanism permitting some kind of
polymorphism.
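The polymorphism problem in its smallest form (class and attribute names invented for the example):

```python
class C:
    def __init__(self, x):
        self.x = x

# Perfectly legal Python: the same attribute holds different types on
# different instances, so "the type of C().x" is not well defined
# without interfaces or per-instance analysis.
a = C(1)
b = C("one")
kinds = {type(a.x).__name__, type(b.x).__name__}
```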

Paul

 
 
Terry Reedy
 
      05-09-2007

"Hendrik van Rooyen" <(E-Mail Removed)> wrote in message
news:013d01c79210$5e441280$03000080@hendrik...
| I am relatively new on this turf, and from what I have seen so far, it
| would not bother me at all to tie a name's type to its first use, so that
| the name can only be bound to objects of the same type as the type
| of the object that it was originally bound to.
|
| But maybe I am missing the point of dynamism.
|
| Would an implementation of the above break lots of stuff in practice?

For function local variables, if you mean 'originally bound to' in the
current call invocation, that would sometimes be ok (and that is sort of
what Psyco does). But if you mean the original binding in the first
call invocation, then that would cripple many functions.
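A function like the following illustrates the difference (the function is invented for the example): per-invocation specialization, roughly what Psyco does, handles it fine, whereas a once-for-all lock at the first call ever would reject every later call with new types.

```python
def scale(seq, factor):
    # On one call the locals hold ints, on another floats or strings.
    # Psyco-style specialization compiles a version per type profile;
    # a first-call-forever type lock would cripple this function.
    return [item * factor for item in seq]

a = scale([1, 2, 3], 2)    # ints
b = scale([1.5], 2.0)      # floats
c = scale(["ab"], 2)       # even strings
```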


tjr



 