Velocity Reviews > Re: Python speed vs csharp

# Re: Python speed vs csharp

David M. Cooke
Guest
Posts: n/a

 07-31-2003
At some point, Mike <(E-Mail Removed)> wrote:
> Here's the chunk of code that I'm spending most of my time executing:
>
> # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> # Fifth order approximation. |error| <= 1.5e-7 for all x
> #
> def erfc( x ):
>     p = 0.3275911
>     a1 = 0.254829592
>     a2 = -0.284496736
>     a3 = 1.421413741
>     a4 = -1.453152027
>     a5 = 1.061405429
>
>     t = 1.0 / (1.0 + p*float(x))
>     erfcx = ( (a1 + (a2 + (a3 +
>               (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
>     return erfcx

With python2.2, a million iterations takes 9.93 s on my machine.
With python2.3, 8.13 s.

Replacing float(x) with just x (multiplying by p (a float) is enough to
coerce x to a float), with python 2.3 I'm down to 6.98 s. Assigning
math.exp to a global variable exp cuts this down to 6.44 s:

exp = math.exp
def erfc(x):
    [...]
    t = 1.0 / (1.0 + p*x)
    erfcx = ... * exp(-x**2)
    return erfcx

You can go a little further and look up ways of optimising the
polynomial evaluation -- Horner's method is a good, general-purpose
technique, but you can get faster if you're willing to put some effort
into it. I believe Ralston and Rabinowitz's numerical analysis book
has a discussion of this (sorry, I don't have the complete reference;
the book's at home).

That's about the best you can do in pure python.
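For reference, timings like the ones quoted in this thread can be reproduced with the timeit module. This harness is a sketch of how such measurements might be taken, not the original poster's setup; it times the pure-Python version with the float(x) call dropped and math.exp bound to a module-level name:

```python
import math
import timeit

# Pure-Python erfc with the two optimizations discussed above:
# no float(x) coercion, and math.exp pre-bound to a global name.
exp = math.exp

def erfc(x):
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p*x)
    return ((a1 + (a2 + (a3 + (a4 + a5*t)*t)*t)*t)*t) * exp(-x**2)

# Time a million calls, matching the figures quoted above.
n = 1000000
elapsed = timeit.timeit("erfc(1.5)", globals={"erfc": erfc}, number=n)
print("%d calls: %.2f s" % (n, elapsed))
```

On a modern interpreter the absolute numbers will be far lower than the 2003 figures, but the relative effect of each tweak can still be compared this way.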

Now look at how you're calling this routine. Can you call it on a
bunch of numbers at once? Using Numeric arrays (http://numpy.org/),
calling the above routine on an array of a million numbers takes
1.6 s. Calling it 1000 times on an array of 1000 numbers takes 0.61 s.

Using erfc from scipy.special, this takes 0.65 s on an array of a
million numbers. Scipy (from http://scipy.org/) is a library of
scientific tools, supplementing Numeric. I believe the routine it uses
for erfc comes from the Cephes library (http://netlib.org/), which has
an error < 3.4e-14 for x in (-13,0). Calling it 1000 times on an array
of 1000 numbers takes 0.43 s.

(Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
fast as once on an array of 1000*1000 numbers.)

scipy.special.erfc on moderate-sized arrays is 18.9x faster. That's
pretty good.

For numerical codes in Python, I *strongly* suggest that you use
Numeric (or numarray, its successor). It's _much_ better to pass 1000
numbers around as an aggregate than 1 number 1000 times.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

Terry Reedy
Guest
Posts: n/a

 08-01-2003

"David M. Cooke" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> At some point, Mike <(E-Mail Removed)> wrote:
> > Here's the chunk of code that I'm spending most of my time executing:
> >
> > # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> > # Fifth order approximation. |error| <= 1.5e-7 for all x
> > #
> > def erfc( x ):
> >     p = 0.3275911
> >     a1 = 0.254829592
> >     a2 = -0.284496736
> >     a3 = 1.421413741
> >     a4 = -1.453152027
> >     a5 = 1.061405429
> >
> >     t = 1.0 / (1.0 + p*float(x))
> >     erfcx = ( (a1 + (a2 + (a3 +
> >               (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
> >     return erfcx

Does inlining the constants give any speedup? Or is local lookup so
fast that it does not matter?

TJR

David M. Cooke
Guest
Posts: n/a

 08-01-2003
At some point, "Terry Reedy" <(E-Mail Removed)> wrote:

> "David M. Cooke" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> At some point, Mike <(E-Mail Removed)> wrote:
>>> Here's the chunk of code that I'm spending most of my time executing:
>>> [erfc code snipped]
>
> Does inlining the constants give any speedup? Or is local lookup so
> fast that it does not matter?

Should've thought of that too. It runs in 6.07 s now (my previous
version was 6.45 s, and the unmodified version above was 8.23 s).

Here's my best version:

exp = math.exp
def erfc( x ):
    t = 1.0 + 0.3275911*x
    return ( (0.254829592 + (
        (1.421413741 + (-1.453152027 + 1.061405429/t)/t)/t - 0.284496736)
        /t)/t ) * exp(-(x**2))

The reversal of the order of the a2 evaluation is because, looking at
the bytecodes (using the dis module), it turns out that -0.284 was
being stored as 0.284 and then negated at run time, whereas -1.45 is
stored as a negative constant directly. This runs in 5.77 s. You could
speed it up a little more if you're willing to manipulate the bytecode
directly.
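The dis inspection described above is easy to reproduce. Note that modern CPython folds the negation at compile time, so this quirk only bites on old interpreters like the 2.2/2.3 ones in this thread; the sketch below shows the modern behaviour:

```python
import dis

def f():
    return -0.284496736

# On Python 2.3 this disassembled to LOAD_CONST 0.284496736 followed
# by UNARY_NEGATIVE; on modern CPython the negation is folded at
# compile time, so the negative value sits directly in co_consts.
dis.dis(f)
print(-0.284496736 in f.__code__.co_consts)
```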


Brendan Hahn
Guest
Posts: n/a

 08-01-2003
(E-Mail Removed) (David M. Cooke) wrote:
>(Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
>fast as once on an array of 1000*1000 numbers.)

Probably cache effects.
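If cache effects are the cause, the fast case can be reproduced deliberately by walking a long array in fixed-size blocks, so the temporaries stay cache-resident. A hypothetical sketch (erfc_chunked is my name, and the block size of 1000 matches the thread's fast case):

```python
import numpy as np

def erfc_chunked(x, block=1000):
    # Evaluate the A&S 7.1.26 approximation block by block, so each
    # slice and its temporaries (t, the partial sums) fit in cache.
    p = 0.3275911
    a1, a2, a3, a4, a5 = (0.254829592, -0.284496736, 1.421413741,
                          -1.453152027, 1.061405429)
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        s = x[i:i+block]
        t = 1.0 / (1.0 + p*s)
        out[i:i+block] = ((a1 + (a2 + (a3 +
                          (a4 + a5*t)*t)*t)*t)*t) * np.exp(-s**2)
    return out

xs = np.linspace(0.0, 5.0, 1000000)
ys = erfc_chunked(xs)                      # blocked evaluation
whole = erfc_chunked(xs, block=len(xs))    # single-pass, for comparison
```

Whether blocking actually wins depends on the machine's cache sizes; timing both calls is the only way to know.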

--
brendan DOT hahn AT hp DOT com

Andrew Dalke
Guest
Posts: n/a

 08-01-2003
David M. Cooke:
> Here's my best version:
>
> exp = math.exp
> def erfc( x ):

Change that to

def erfc(x, exp=math.exp):

so there's a local namespace lookup instead of a global one. This
is considered a hack.
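The trick works because default values are evaluated once, at def time, and stored on the function object, so the body does a fast local-variable lookup instead of searching the global and builtin namespaces on every call. A sketch comparing the two (the function names here are mine):

```python
import math
import timeit

def use_global(x):
    # math.exp is resolved via a global lookup plus an attribute
    # lookup on every single call.
    return math.exp(-x*x)

def use_default(x, exp=math.exp):
    # exp was bound once at definition time; inside the body it is
    # just a local parameter.
    return exp(-x*x)

for f in (use_global, use_default):
    t = timeit.timeit("f(1.5)", globals={"f": f}, number=1000000)
    print(f.__name__, "%.3f s" % t)
```

Both functions compute the same value; only the lookup cost differs, which is why the idiom is a (widely used) hack rather than a semantic change.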

Andrew