# Monte Carlo Method and pi

beliavsky@aol.com
07-12-2004
Dan Christensen wrote:
> something like:
>
> from random import random
> from math import sqrt
>
> def v1(n = 500000):
>     rand = random
>     sqr = sqrt
>     sm = 0
>     for i in xrange(n):
>         sm += sqr(1-rand()**2)
>     return 4*sm/n
>
> This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
> about 0.29s with psyco (using Python 2.3.4).
>
> One of the great things about python is that it's possible (and fun)
> to move loops into the C level, e.g. with this implementation:
>
> def v2(n = 500000):
>     a = numarray.random_array.uniform(0, 1, n)
>     return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
>
> which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)

For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
simulation runs in about 0.09s (measured using timethis.exe on Windows
XP), REGARDLESS of whether the calculation is done with a loop or an
array. If 5e6 instead of 5e5 random numbers are used, the time taken
is about 0.4s, and the scaling with #iterations is about linear beyond
that. Here are the codes, which were compiled with the -optimize:4
option using Compaq Visual Fortran 6.6c.

program xpi_sim
! compute pi with simulation, using an array
  implicit none
  integer, parameter :: n = 500000
  real (kind=kind(1.0d0)) :: xx(n), pi
  call random_seed()
  call random_number(xx)
  pi = 4*sum(sqrt(1-xx**2))/n
  print*,pi
end program xpi_sim

program xpi_loop
! compute pi with simulation, using a loop
  implicit none
  integer :: i
  integer, parameter :: n = 500000
  real (kind=kind(1.0d0)) :: xx, pi
  call random_seed()
  pi = 0
  do i=1,n
    call random_number(xx)
    pi = pi + sqrt(1-xx**2)
  end do
  pi = 4*pi/n
  print*,pi
end program xpi_loop

Tim Hochberg
07-12-2004
beliavsky@aol.com wrote:
> Dan Christensen wrote:
>
>> something like:
>>
>> from random import random
>> from math import sqrt
>>
>> def v1(n = 500000):
>>     rand = random
>>     sqr = sqrt
>>     sm = 0
>>     for i in xrange(n):
>>         sm += sqr(1-rand()**2)
>>     return 4*sm/n
>>
>>This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
>>about 0.29s with psyco (using Python 2.3.4).

I can more than double the speed of this under psyco by replacing **2
with x*x. I have inside information that pow is extra slow on psyco(*);
plus, as I understand it, x*x is preferred over **2 anyway.

import psyco
from math import sqrt
from random import random

def v3(n = 500000):
    sm = 0
    for i in range(n):
        x = random()
        sm += sqrt(1-x*x)
    return 4*sm/n
psyco.bind(v3)

(*) Psyco's support for floating point operations is considerably slower
than its support for integer operations. The latter are optimized all
the way to assembly when possible, but the former are, at best,
optimized into C function calls that perform the operation. Thus there
is always some function call overhead. Fixing this would require someone
who groks both the guts of Psyco and the intricacies of x86 floating point
machine code to step forward. In the case of pow, I believe there
is always a callback into the Python interpreter, so it's extra slow.
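The x*x-versus-**2 claim is easy to spot-check with timeit in plain CPython; a rough sketch (it says nothing about Psyco itself, and the gap varies a lot by interpreter version):

```python
import timeit

# Time a float multiply against the ** operator; in older CPythons the
# generic power machinery made x**2 noticeably dearer than x*x.
t_mul = timeit.timeit('x*x', setup='x = 0.5', number=10**6)
t_pow = timeit.timeit('x**2', setup='x = 0.5', number=10**6)
print('x*x: %.4fs   x**2: %.4fs' % (t_mul, t_pow))
```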

>>
>> One of the great things about python is that it's possible (and fun)
>> to move loops into the C level, e.g. with this implementation:
>>
>> def v2(n = 500000):
>>     a = numarray.random_array.uniform(0, 1, n)
>>     return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
>>
>> which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)

The same trick (replacing a**2 with a*a) also doubles the speed of this
version.

def v4(n = 500000):
    a = numarray.random_array.uniform(0, 1, n)
    return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n
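For anyone rerunning this today: numarray was long ago folded into NumPy, and a rough modern equivalent of v4, using NumPy's Generator API (the function name and seed here are mine, not from the thread), would be:

```python
import numpy as np

def v4_numpy(n=500000, seed=1):
    # Same quarter-circle estimator: E[sqrt(1 - U**2)] = pi/4 for U
    # uniform on [0, 1), so 4 times the sample mean estimates pi.
    rng = np.random.default_rng(seed)  # modern NumPy generator (PCG64)
    a = rng.random(n)
    return 4 * np.sqrt(1 - a * a).sum() / n

print(v4_numpy())  # close to 3.1416; the error shrinks like 1/sqrt(n)
```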

>
> For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
> simulation runs in about 0.09s (measured using timethis.exe on Windows
> XP), REGARDLESS of whether the calculation is done with a loop or an
> array. If 5e6 instead of 5e5 random numbers are used, the time taken
> is about 0.4s, and the scaling with #iterations is about linear beyond
> that. Here are the codes, which were compiled with the -optimize:4
> option using Compaq Visual Fortran 6.6c.

For yet more comparison, here are times on a 2.4 GHz P4, timed with
timeit (10 loops):

v1 0.36123797857
v2 0.289118589094
v3 0.158556464579
v4 0.148470238536

If one takes into account the speed difference of the two CPUs, this
puts both the numarray and psyco solutions within about 50% of the
Fortran solution, which I'm impressed by. Of course, the Fortran code
uses x**2 instead of x*x, so it too might be sped up by some tweaks.

-tim

beliavsky@aol.com
07-12-2004

Tim Hochberg wrote:

> If one takes into account the speed difference of the two CPUs, this
> puts both the numarray and psyco solutions within about 50% of the
> Fortran solution, which I'm impressed by. Of course, the Fortran code
> uses x**2 instead of x*x, so it too might be sped up by some tweaks.

With Compaq Visual Fortran, the speed with x**2 and x*x is the same. Recently
on comp.lang.fortran, a Fortran expert opined that a compiler that did not
optimize the calculation of integer powers to be as fast as the hand-coded
equivalent would not be taken seriously.

Having to write out integer powers in Python is awkward, although the
performance improvement may be worth it. The code x*x*x*x is less legible
than x**4, especially if 'x' is replaced by 'longname[1,2,3]'.

Often, what you want to compute powers of is itself an expression. Is Python
with Psyco able to run code such as

z = sin(x)*sin(x)

as fast as

y = sin(x)
z = y*y ?

Does it make two calls to sin()? There is a performance hit for 'y = sin(x)*sin(x)'
when not using Psyco.
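That hit is easy to measure directly in plain CPython (no Psyco); a rough sketch:

```python
import timeit

setup = 'from math import sin; x = 0.7'
# CPython cannot assume sin is pure -- the name could be rebound at any
# time -- so sin(x)*sin(x) genuinely calls sin twice.
t_twice = timeit.timeit('sin(x)*sin(x)', setup=setup, number=10**6)
t_once = timeit.timeit('y = sin(x); z = y*y', setup=setup, number=10**6)
print('sin(x)*sin(x): %.4fs   y = sin(x); y*y: %.4fs' % (t_twice, t_once))
```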


Dan Christensen
07-13-2004
Tim Hochberg writes:

> I can more than double the speed of this under psyco by replacing **2
> with x*x. I have inside information that pow is extra slow on psyco(*)

Thanks for pointing this out. This change sped up all of the
methods I've tried. It's too bad that Python doesn't optimize this,
but I guess that's because pow could be redefined at runtime.

> (*) Psyco's support for floating point operations is considerably
> slower than its support for integer operations. The latter are
> optimized all the way to assembly when possible, but the former are,
> at best, optimized into C function calls that perform the
> operation.

That's unfortunate.

> Thus there is always some function call overhead. Fixing
> this would require someone who groks both the guts of Psyco and the
> intracies of x86 floating point machine code to step forward.

If someone volunteers, that would be great. I don't know anything
about x86 floating point; is it a real mess?

> If one takes into account the speed difference of the two CPUs, this
> puts both the numarray and psyco solutions within about 50% of the
> Fortran solution, which I'm impressed by.

I've now run a bunch of timings of all the methods that have been
proposed (except that I used C instead of Fortran) on one machine,
a 2.66GHz P4. Here are the results, in seconds:

                   Python 2.3.4    Python 2.4a1

naive loop:          0.657765        0.589741
naive loop psyco:    0.159085
naive loop weave:    0.084898 *
numarray:            0.115775 **
imap lots:           1.359815        0.994505
imap fn:             0.979589        0.758308
custom gen:          0.841540        0.681277
gen expr:                            0.694567

naive loop C:        0.040 *

Only one of the tests used psyco. I did run the others with psyco
too, but the times were about the same or slower.

For me, psyco was about four times slower than C, and numarray was
almost three times slower. I'm surprised by how close psyco and numarray
were in your runs, and how slow Fortran was in beliavsky's test.
My C program measures user+sys cpu time for the loop itself, but
not start-up time for the program. In any case, getting within
a factor of four of C, with a random number generator that is probably
better, is pretty good!

One really should compare the random number generators more carefully,
since they could take a significant fraction of the time. The lines
with one * use C's rand(). The line with ** uses numarray's random
array function. Does anyone know what random number generator is used
by numarray?
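(CPython's own random() has been the Mersenne Twister since Python 2.3; I don't know what numarray uses. One way to estimate the generator's share of the loop is to time it on its own; a sketch in modern Python, with names of my choosing:)

```python
import timeit

n = 500000
# Time the generator alone, then the full loop body, to see what
# fraction of the naive loop is spent just producing random numbers.
t_rng = timeit.timeit('random()',
                      setup='from random import random', number=n)
t_body = timeit.timeit('sqrt(1 - (x := random()) * x)',
                       setup='from random import random\n'
                             'from math import sqrt', number=n)
print('random() alone: %.4fs   full body: %.4fs' % (t_rng, t_body))
```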

By the way, note how well Python 2.4 performs compared with 2.3.4.
(Psyco, weave, numarray not shown because I don't have them for 2.4.)

I'm still curious about whether it could be possible to get
really fast loops in Python using iterators and expressions like
sum(1 + it1 - 2 * it2), where it1 and it2 are iterators that produce
numbers. Could Python be clever enough to implement that as a
for loop in C with just two or three C function calls in the loop?
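Nothing in the language does that automatically, but the syntax itself is reachable with a small wrapper class whose operators build new lazy iterators (a toy sketch of my own; the arithmetic still runs in Python, so it would not be fast):

```python
from itertools import repeat

class It:
    """Wrap an iterator so arithmetic on the wrapper builds a new lazy
    iterator, making expressions like sum(1 + it1 - 2*it2) legal."""
    def __init__(self, it):
        self.it = iter(it)
    def __iter__(self):
        return self.it
    def _lift(self, other):
        # promote a plain number to an (infinite) constant stream
        return other.it if isinstance(other, It) else repeat(other)
    def __add__(self, other):
        return It(a + b for a, b in zip(self.it, self._lift(other)))
    __radd__ = __add__
    def __sub__(self, other):
        return It(a - b for a, b in zip(self.it, self._lift(other)))
    def __rsub__(self, other):
        return It(b - a for a, b in zip(self.it, self._lift(other)))
    def __mul__(self, other):
        return It(a * b for a, b in zip(self.it, self._lift(other)))
    __rmul__ = __mul__

it1 = It([10, 20, 30])
it2 = It([1, 2, 3])
print(sum(1 + it1 - 2 * it2))  # (1+10-2) + (1+20-4) + (1+30-6) = 51
```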

Dan

Here's the code I used:

-----pi.py-----
from random import random
from itertools import *
from math import sqrt
from operator import pow, sub, add

from timeTest import *

try:
    import weave
    haveWeave = True
except ImportError:
    haveWeave = False

try:
    import numarray
    import numarray.random_array
    haveNumarray = True
except ImportError:
    haveNumarray = False

def v1(n = 500000):
    "naive loop"
    rand = random
    sqr = sqrt
    sm = 0
    for i in range(n):
        r = rand()
        sm += sqr(1-r*r)
    return 4*sm/n

def v1weave(n = 500000):
    "naive loop weave"

    support = "#include <stdlib.h>\n#include <math.h>"

    code = """
    double sm;
    float rnd;
    srand(1); // seed random number generator
    sm = 0.0;
    for(int i=0;i<n;++i) {
        rnd = rand()/(RAND_MAX+1.0);
        sm += sqrt(1.0-rnd*rnd);
    }
    return_val = 4.0*sm/n;"""
    return weave.inline(code, ['n'], support_code=support)

if haveNumarray:
    def v2(n = 500000):
        "numarray"
        a = numarray.random_array.uniform(0, 1, n)
        return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n

def v3(n = 500000):
    "imap lots"
    rand = random
    pw = pow
    sqr = sqrt
    sq = lambda x: x*x
    sm = sum(imap(sqr,
                  imap(sub, repeat(1),
                       imap(sq,
                            imap(lambda dummy: rand(), repeat(None, n))))))
    return 4*sm/n

def v4(n = 500000):
    "imap fn"
    rand = random
    pw = pow
    sqr = sqrt

    def fn(dummy):
        r = rand()
        return sqr(1-r*r)

    sm = sum(imap(fn, repeat(None, n)))
    return 4*sm/n

def custom(n):
    rand = random
    for i in xrange(n):
        yield sqrt(1-rand()**2)

def v5(n = 500000):
    "custom gen"
    sm = sum(custom(n))
    return 4*sm/n

# The commented out psyco runs don't show significant improvement
# (or are slower).

if haveWeave:
    v1weave()

timeTest(v1)
timeTestWithPsyco(v1)
if haveWeave:
    timeTest(v1weave)
if haveNumarray:
    timeTest(v2)
    # timeTestWithPsyco(v2)
timeTest(v3)
#timeTestWithPsyco(v3)
timeTest(v4)
#timeTestWithPsyco(v4)
timeTest(v5)
#psyco.bind(custom)
#timeTestWithPsyco(v5)

-----pi2.py----- (only for Python 2.4)
from random import random
from math import sqrt

from timeTest import *

# Even if this is not executed, it is parsed, so Python 2.3 complains.

def v6(n = 500000):
    "gen expr"
    rand = random
    sqr = sqrt
    sm = sum(sqr(1-rand()**2) for i in xrange(n))
    return 4*sm/n

timeTest(v6)

-----timeTest.py-----
import time

try:
    import psyco
    havePsyco = True
except ImportError:
    havePsyco = False

def timeTest(f):
    s = time.time()
    f()
    print "%18s: %f" % (f.__doc__, time.time()-s)

def timeTestWithPsyco(f):
    if havePsyco:
        g = psyco.proxy(f)
        g.__doc__ = f.__doc__ + " psyco"
        g()
        timeTest(g)

-----pi.c-----
#include <stdlib.h>
#include <math.h>

#include "/home/spin/C/spin_time.h"

int main()
{
    double sm;
    float rnd;
    int i;
    int n = 500000;
    spin_time start, end;

    spin_time_set(start);
    srand(1); // seed random number generator
    sm = 0.0;

    for(i=0;i<n;++i) {
        rnd = rand()/(RAND_MAX+1.0);
        sm += sqrt(1.0-rnd*rnd);
    }
    spin_time_set(end);
    printf("%f\n", 4.0*sm/n);
    printf("%18s: %.3f\n", "naive loop C", spin_time_both(start, end));
    return 0;
}

-----spin_time.h-----
#include <time.h>
#include <sys/times.h>
#include <stdio.h>

#define FACTOR (1.0/100.0)

typedef struct {
    clock_t real;
    struct tms t;
} spin_time;

#define spin_time_set(st) st.real = times(&(st.t))

// floating point number of seconds:
#define spin_time_both(st_start, st_end) \
    ((st_end.t.tms_utime-st_start.t.tms_utime)*FACTOR + \
     (st_end.t.tms_stime-st_start.t.tms_stime)*FACTOR)

Tim Hochberg
07-13-2004
beliavsky@aol.com wrote:
> Tim Hochberg wrote:

>> If one takes into account the speed difference of the two CPUs, this
>> puts both the numarray and psyco solutions within about 50% of the
>> Fortran solution, which I'm impressed by. Of course, the Fortran code
>> uses x**2 instead of x*x, so it too might be sped up by some tweaks.
>>

>
>
> With Compaq Visual Fortran, the speed with x**2 and x*x is the same. Recently
> on comp.lang.fortran, a Fortran expert opined that a compiler that did not
> optimize the calculation of integer powers to be as fast as the hand-coded
> equivalent would not be taken seriously.

I suppose that's not surprising. In that case, I'm more impressed with
the psyco results than I was, particularly since Psyco's floating point
support is incomplete.

> Having to write out integer powers in Python is awkward, although the
> performance improvement may be worth it. The code x*x*x*x is less legible
> than x**4, especially if 'x' is replaced by 'longname[1,2,3]'.

I think that this is something that Psyco could optimize. However,
currently psyco just punts on pow, passing it off to Python.

> Often, what you want to compute powers of is itself an expression. Is Python
> with Psyco able to run code such as
>
> z = sin(x)*sin(x)
>
> as fast as
>
> y = sin(x)
> z = y*y ?

Dunno. I'll check...

> Does it make two calls to sin()? There is a performance hit for 'y = sin(x)*sin(x)'
> when not using Psyco.

It appears that it does not make this optimization. I suppose that's not
surprising, since my understanding is that Psyco generates really simple
machine code.

-tim

Tim Hochberg
07-13-2004
Dan Christensen wrote:

> Tim Hochberg <(E-Mail Removed)> writes:
>
>
>>I can more than double the speed of this under psyco by replacing **2
>>with x*x. I have inside information that pow is extra slow on psyco(*)

>
>
> Thanks for pointing this out. This change sped up all of the
> methods I've tried. It's too bad that Python doesn't optimize this,
> but I guess that's because pow could be redefined at runtime.

>> (*) Psyco's support for floating point operations is considerably
>> slower than its support for integer operations. The latter are
>> optimized all the way to assembly when possible, but the former are,
>> at best, optimized into C function calls that perform the
>> operation.

>
>
> That's unfortunate.

Yes. There's also still no psyco support for complex numbers at all,
meaning that they run at standard Python speed. However, I expect that
to change any day now.

>> Thus there is always some function call overhead. Fixing
>> this would require someone who groks both the guts of Psyco and the
>> intricacies of x86 floating point machine code to step forward.

>
>
> If someone volunteers, that would be great. I don't know anything
> about x86 floating point; is it a real mess?

From what I understand, yes, but I haven't looked at assembly, let alone
machine code, for many, many years, and I was never particularly adept at
doing anything with it.

>
>> If one takes into account the speed difference of the two CPUs, this
>> puts both the numarray and psyco solutions within about 50% of the
>> Fortran solution, which I'm impressed by.

>
>
> I've now run a bunch of timings of all the methods that have been
> proposed (except that I used C instead of Fortran) on one machine,
> a 2.66GHz P4. Here are the results, in seconds:
>
>                    Python 2.3.4    Python 2.4a1
>
> naive loop:          0.657765        0.589741
> naive loop psyco:    0.159085
> naive loop weave:    0.084898 *
> numarray:            0.115775 **
> imap lots:           1.359815        0.994505
> imap fn:             0.979589        0.758308
> custom gen:          0.841540        0.681277
> gen expr:                            0.694567
>
> naive loop C:        0.040 *
>
> Only one of the tests used psyco. I did run the others with psyco
> too, but the times were about the same or slower.

FYI, generators are something that is not currently supported by Psyco.
For a more complete list, see
http://psyco.sourceforge.net/psycogu...supported.html. If you
bug^H^H^H, plead with Armin, maybe he'll implement them.

> For me, psyco was about four times slower than C, and numarray was
> almost 3 times slower. I'm surprised by how close psyco and numarray
> were in your runs, and how slow Fortran was in beliavsky's test.
> My C program measures user+sys cpu time for the loop itself, but
> not start-up time for the program. In any case, getting within
> a factor of four of C, with a random number generator that is probably
> better, is pretty good!

I'm using a somewhat old version of numarray, which could be part of it.

>
> One really should compare the random number generators more carefully,
> since they could take a significant fraction of the time. The lines
> with one * use C's rand(). The line with ** uses numarray's random
> array function. Does anyone know what random number generator is used
> by numarray?
>
> By the way, note how well Python 2.4 performs compared with 2.3.4.
> (Psyco, weave, numarray not shown because I don't have them for 2.4.)
>
> I'm still curious about whether it could be possible to get
> really fast loops in Python using iterators and expressions like
> sum(1 + it1 - 2 * it2), where it1 and it2 are iterators that produce
> numbers. Could Python be clever enough to implement that as a
> for loop in C with just two or three C function calls in the loop?

That would require new syntax, would it not? That seems unlikely. It
seems that in 2.4, sum(1 + a - 2*b for (a, b) in izip(it1, it2)) would
work. It also seems that with sufficient work psyco could be made to
optimize that in the manner you suggest. I wouldn't hold your breath, though.

-tim


Paul Rubin
07-13-2004
Tim Hochberg writes:
> > If someone volunteers, that would be great. I don't know anything
> > about x86 floating point; is it a real mess?

>
> From what I understand yes, but I haven't looked at assembly, let
> alone machine code for many, many years and I was never particularly
> adept at doing anything with it.

x86 floating point is a simple stack-based instruction set (like an HP
calculator), plus some hacks that let you improve efficiency by
shuffling the elements of the stack around in parallel with doing
other operations (requiring careful instruction scheduling to use
effectively), plus several completely different and incompatible
vector-register oriented instruction sets (SSE2, 3DNow!) that are only
on some processor models and are sort of a bag on the side of the x86.
So x86 floating point is a mess in the sense that generating really
optimal code on every cpu model requires understanding all this crap.
But if you just use the basic stack-based stuff, it's pretty simple to
generate code for, and that code will certainly outperform interpreted
code by a wide margin.

Peter Hansen
07-13-2004
Tim Hochberg wrote:

> beliavsky@aol.com wrote:
>> Does it make two calls to sin()? There is a performance hit for
>> 'y = sin(x)*sin(x)' when not using Psyco.

>
> It appears that it does not make this optimization. I suppose that's not
> suprising since my understanding is that Psyco generates really simple
> machine code.

I suppose technically, since this is Python, two consecutive calls
to sin(x) could easily return different values... It would be
insane to write code that did this, but the dynamic nature of
Python surely allows it and it's even possible such a dynamic
substitution would be of value (or even improve performance) in
some other situations.
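A contrived sketch of exactly that, rebinding the name between two textually identical calls:

```python
import math

calls = []
real_sin = math.sin

def counting_sin(x):
    # a stand-in for sin that records every invocation
    calls.append(x)
    return real_sin(x)

sin = counting_sin
z = sin(0.5) * sin(0.5)
print(len(calls))  # 2 -- a compiler could not have merged the two calls
```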

-Peter

beliavsky@aol.com
07-13-2004

Dan Christensen wrote:

> naive loop C: 0.040 *
>
>Only one of the tests used psyco. I did run the others with psyco
>too, but the times were about the same or slower.
>
>For me, psyco was about four times slower than C, and numarray was
>almost 3 times slower. I'm surprised by how close psyco and numarray
>were in your runs, and how slow Fortran was in beliavsky's test.
>My C program measures user+sys cpu time for the loop itself, but
>not start-up time for the program.

When I insert timing code within the Fortran program, the measured time is
0.04 s, so the C and Fortran speeds are the same, although different random
number generators are used.
