Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Why is this loop heavy code so slow in Python? Possible Project Euler spoilers

Reply
Thread Tools

Why is this loop heavy code so slow in Python? Possible Project Euler spoilers

 
 
Bruno Desthuilliers
Guest
Posts: n/a
 
      09-01-2007
Ivan Wang a écrit :
> On Sep 2, 9:45 pm, (E-Mail Removed) wrote:
>
>>[snip code]
>>
>>Thanks for that. I realise that improving the algorithm will speed
>>things up. I wanted to know why my less than perfect algorithm was so
>>much slower in python than exactly the same algorithm in C. Even when
>>turning off gcc's optimiser with the -O0 flag, the C version is still
>>
>>
>>
>>
>>>100 times quicker.- Hide quoted text -

>>
>>- Show quoted text -

>
> Maybe Python is the slowest programming language in the world.
> So there is a joke: some python hater said that "python" can only
> crawl rather than run.
>
> Python is slow because:
> (1) dynamic binding

Yes.
> (2) it is a interpretation language

Not quite. It's compiled to byte-code - just like Java (would you call
Java an 'interpreted language' ?)
 
Reply With Quote
 
 
 
 
jwrweatherley@gmail.com
Guest
Posts: n/a
 
      09-02-2007
I'm pretty new to python, but am very happy with it. As well as using
it at work I've been using it to solve various puzzles on the Project
Euler site - http://projecteuler.net. So far it has not let me down,
but it has proved surprisingly slow on one puzzle.

The puzzle is: p is the perimeter of a right angle triangle with
integral length sides, {a,b,c}. which value of p < 1000, is the
number of solutions {a,b,c} maximised?

Here's my python code:

#!/usr/local/bin/python

solutions = [0] * 1001
p = 0

for a in xrange(1, 1000):
for b in xrange(1, 1000 - a):
for c in xrange(1, 1000 - a - b):
p = a + b + c
if p < 1000:
if a ** 2 + b ** 2 == c ** 2:
solutions[p] += 1

max = 0
maxIndex = 0
index = 0
for solution in solutions:
if solution > max:
max = solution
maxIndex = index
index += 1

print maxIndex


It takes 2 minutes and twelve seconds on a 2.4GHz Core2Duo MacBook
Pro. Surprised at how slow it was I implemented the same algorithm in
C:

#include <stdio.h>
#include <stdlib.h>

int main() {
int* solutions = calloc(1000, sizeof(int));

int p;
for(int a = 1; a < 1000; ++a) {
for(int b = 1; b < 1000 - a; ++b) {
for(int c = 1; c < 1000 - a - b; ++c) {
p = a + b + c;
if(p < 1000) {
if(a * a + b * b == c * c) {
solutions[p] += 1;
}
}
}
}
}

int max = 0;
int maxIndex = 0;

for(int i = 0; i < 1000; ++i) {
if(solutions[i] > max) {
max = solutions[i];
maxIndex = i;
}
}
printf("%d\n", maxIndex);
return 0;
}


gcc -o 39 -std=c99 -O3 39.c

The resulting executable takes 0.24 seconds to run. I'm not expecting
a scripting language to run faster than native code, but I was
surprised at how much slower it was in this case. Any ideas as to what
is causing python so much trouble in the above code?

 
Reply With Quote
 
 
 
 
Arnaud Delobelle
Guest
Posts: n/a
 
      09-02-2007
On Sep 2, 12:51 pm, (E-Mail Removed) wrote:
> I'm pretty new to python, but am very happy with it. As well as using
> it at work I've been using it to solve various puzzles on the Project
> Euler site -http://projecteuler.net. So far it has not let me down,
> but it has proved surprisingly slow on one puzzle.
>
> The puzzle is: p is the perimeter of a right angle triangle with
> integral length sides, {a,b,c}. which value of p < 1000, is the
> number of solutions {a,b,c} maximised?
>
> Here's my python code:
>
> #!/usr/local/bin/python
>
> solutions = [0] * 1001
> p = 0
>
> for a in xrange(1, 1000):
> for b in xrange(1, 1000 - a):
> for c in xrange(1, 1000 - a - b):
> p = a + b + c
> if p < 1000:
> if a ** 2 + b ** 2 == c ** 2:
> solutions[p] += 1
>
> max = 0
> maxIndex = 0
> index = 0
> for solution in solutions:
> if solution > max:
> max = solution
> maxIndex = index
> index += 1
>
> print maxIndex
>
> It takes 2 minutes and twelve seconds on a 2.4GHz Core2Duo MacBook
> Pro. Surprised at how slow it was I implemented the same algorithm in
> C:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main() {
> int* solutions = calloc(1000, sizeof(int));
>
> int p;
> for(int a = 1; a < 1000; ++a) {
> for(int b = 1; b < 1000 - a; ++b) {
> for(int c = 1; c < 1000 - a - b; ++c) {
> p = a + b + c;
> if(p < 1000) {
> if(a * a + b * b == c * c) {
> solutions[p] += 1;
> }
> }
> }
> }
> }
>
> int max = 0;
> int maxIndex = 0;
>
> for(int i = 0; i < 1000; ++i) {
> if(solutions[i] > max) {
> max = solutions[i];
> maxIndex = i;
> }
> }
> printf("%d\n", maxIndex);
> return 0;
>
> }
>
> gcc -o 39 -std=c99 -O3 39.c
>
> The resulting executable takes 0.24 seconds to run. I'm not expecting
> a scripting language to run faster than native code, but I was
> surprised at how much slower it was in this case. Any ideas as to what
> is causing python so much trouble in the above code?


from math import sqrt

solutions = [0] * 1001
p = 0

for a in xrange(1, 1000):
a2 = a*a
for b in xrange(1, 1000 - a):
c = sqrt(a2 + b*b)
if c == int(c) and a+b+c < 1000:
solutions[a+b+int(c)] += 1

max = 0
maxIndex = 0
index = 0
for solution in solutions:
if solution > max:
max = solution
maxIndex = index
index += 1

print maxIndex

--
Arnaud


 
Reply With Quote
 
Greg Copeland
Guest
Posts: n/a
 
      09-02-2007
On Sep 2, 7:20 am, Arnaud Delobelle <(E-Mail Removed)> wrote:
> On Sep 2, 12:51 pm, (E-Mail Removed) wrote:
>
>
>
> > I'm pretty new to python, but am very happy with it. As well as using
> > it at work I've been using it to solve various puzzles on the Project
> > Euler site -http://projecteuler.net. So far it has not let me down,
> > but it has proved surprisingly slow on one puzzle.

>
> > The puzzle is: p is the perimeter of a right angle triangle with
> > integral length sides, {a,b,c}. which value of p < 1000, is the
> > number of solutions {a,b,c} maximised?

>
> > Here's my python code:

>
> > #!/usr/local/bin/python

>
> > solutions = [0] * 1001
> > p = 0

>
> > for a in xrange(1, 1000):
> > for b in xrange(1, 1000 - a):
> > for c in xrange(1, 1000 - a - b):
> > p = a + b + c
> > if p < 1000:
> > if a ** 2 + b ** 2 == c ** 2:
> > solutions[p] += 1

>
> > max = 0
> > maxIndex = 0
> > index = 0
> > for solution in solutions:
> > if solution > max:
> > max = solution
> > maxIndex = index
> > index += 1

>
> > print maxIndex

>
> > It takes 2 minutes and twelve seconds on a 2.4GHz Core2Duo MacBook
> > Pro. Surprised at how slow it was I implemented the same algorithm in
> > C:

>
> > #include <stdio.h>
> > #include <stdlib.h>

>
> > int main() {
> > int* solutions = calloc(1000, sizeof(int));

>
> > int p;
> > for(int a = 1; a < 1000; ++a) {
> > for(int b = 1; b < 1000 - a; ++b) {
> > for(int c = 1; c < 1000 - a - b; ++c) {
> > p = a + b + c;
> > if(p < 1000) {
> > if(a * a + b * b == c * c) {
> > solutions[p] += 1;
> > }
> > }
> > }
> > }
> > }

>
> > int max = 0;
> > int maxIndex = 0;

>
> > for(int i = 0; i < 1000; ++i) {
> > if(solutions[i] > max) {
> > max = solutions[i];
> > maxIndex = i;
> > }
> > }
> > printf("%d\n", maxIndex);
> > return 0;

>
> > }

>
> > gcc -o 39 -std=c99 -O3 39.c

>
> > The resulting executable takes 0.24 seconds to run. I'm not expecting
> > a scripting language to run faster than native code, but I was
> > surprised at how much slower it was in this case. Any ideas as to what
> > is causing python so much trouble in the above code?

>
> from math import sqrt
>
> solutions = [0] * 1001
> p = 0
>
> for a in xrange(1, 1000):
> a2 = a*a
> for b in xrange(1, 1000 - a):
> c = sqrt(a2 + b*b)
> if c == int(c) and a+b+c < 1000:
> solutions[a+b+int(c)] += 1
>
> max = 0
> maxIndex = 0
> index = 0
> for solution in solutions:
> if solution > max:
> max = solution
> maxIndex = index
> index += 1
>
> print maxIndex
>
> --
> Arnaud


For the curious:

O O + P A A + P
======= ======= ======= =======
2:22.56 0:25.65 0:00.75 0:00.20

O = Original Implementation
P = Psyco (psyco.full())
A = Arnaud's Revised Implementation

 
Reply With Quote
 
rzed
Guest
Posts: n/a
 
      09-02-2007
http://www.velocityreviews.com/forums/(E-Mail Removed) wrote in
news:(E-Mail Removed) ups.com:

> The puzzle is: p is the perimeter of a right angle triangle with
> integral length sides, {a,b,c}. which value of p < 1000, is the
> number of solutions {a,b,c} maximised?
>
> Here's my python code:
>
> #!/usr/local/bin/python
>
> solutions = [0] * 1001
> p = 0
>
> for a in xrange(1, 1000):
> for b in xrange(1, 1000 - a):
> for c in xrange(1, 1000 - a - b):
> p = a + b + c
> if p < 1000:
> if a ** 2 + b ** 2 == c ** 2:
> solutions[p] += 1
>


Once p >= 1000, it ain't goin' back. If you break out of the
innermost loop here after that happens, you'll save a bunch of
time.

> max = 0
> maxIndex = 0
> index = 0
> for solution in solutions:
> if solution > max:
> max = solution
> maxIndex = index
> index += 1
>
> print maxIndex
>
>
> It takes 2 minutes and twelve seconds on a 2.4GHz Core2Duo
> MacBook Pro.
>

[...]

> The resulting executable takes 0.24 seconds to run. I'm not
> expecting a scripting language to run faster than native code,
> but I was surprised at how much slower it was in this case. Any
> ideas as to what is causing python so much trouble in the above
> code?
>


 
Reply With Quote
 
jwrweatherley@gmail.com
Guest
Posts: n/a
 
      09-02-2007
[snip code]

Thanks for that. I realise that improving the algorithm will speed
things up. I wanted to know why my less than perfect algorithm was so
much slower in python than exactly the same algorithm in C. Even when
turning off gcc's optimiser with the -O0 flag, the C version is still
> 100 times quicker.


 
Reply With Quote
 
Mark Dickinson
Guest
Posts: n/a
 
      09-02-2007
On Sep 2, 9:45 am, (E-Mail Removed) wrote:
> [snip code]
>
> Thanks for that. I realise that improving the algorithm will speed
> things up. I wanted to know why my less than perfect algorithm was so
> much slower in python than exactly the same algorithm in C. Even when
> turning off gcc's optimiser with the -O0 flag, the C version is still
>
> > 100 times quicker.


Well, for one thing, you're creating half a million xrange objects in
the course of the search. All the C code has
to do is increment a few integers.

Mark

 
Reply With Quote
 
Ivan Wang
Guest
Posts: n/a
 
      09-02-2007
On Sep 2, 9:45 pm, (E-Mail Removed) wrote:
> [snip code]
>
> Thanks for that. I realise that improving the algorithm will speed
> things up. I wanted to know why my less than perfect algorithm was so
> much slower in python than exactly the same algorithm in C. Even when
> turning off gcc's optimiser with the -O0 flag, the C version is still
>
>
>
> > 100 times quicker.- Hide quoted text -

>
> - Show quoted text -

Maybe Python is the slowest programming language in the world.
So there is a joke: some python hater said that "python" can only
crawl rather than run.

Python is slow because:
(1) dynamic binding
(2) it is a interpretation language
For example, in C code, an interger expression "a+b" will directly
generate an assembly code "add" for x86 processors.
A python interpreter, on the other side, need detect the type of a and
b first, then choose the right "+" operation for them and use a
evaluation stack to get the result.

Psyco is a JIT compiler with just in time specialization which can
somehow solve these two problems


 
Reply With Quote
 
Arnaud Delobelle
Guest
Posts: n/a
 
      09-02-2007
On Sep 2, 12:51 pm, (E-Mail Removed) wrote:
[...]
> The resulting executable takes 0.24 seconds to run. I'm not expecting
> a scripting language to run faster than native code, but I was
> surprised at how much slower it was in this case. Any ideas as to what
> is causing python so much trouble in the above code?


Sorry I didn't answer your question at all in my previous post
(although my modified method gets the answer in about 6s on a measly
PB G4 1GHz .
Your little code snippet is just about the worst bit of code for
python to be compared to C. Three loops, only integer arithmetic:
that's not going to make Python shine!

Nevertheless as you optimise your C snippet (-O3), there are probably
a few things that the compiler does for you:

* as in the inner loop, a*a + b*b always remain the same, it is
probably stored in a register once and for all
* in the second loop, a*a remains the same so it might be calculated
once and for all as well.

It gives this:

M = 1000
solutions = [0] * M

def f1():
"Your original implementation"
for a in xrange(1, M):
for b in xrange(1, M - a):
for c in xrange(1, M - a - b):
if a**2 + b**2 == c**2:
solutions[a+b+c] += 1

def f2():
"a*a + b*b precalculated"
for a in xrange(1, M):
a2 = a*a
for b in xrange(1, M - a):
s = a2 + b*b
for c in xrange(1, M - a - b):
if s == c*c:
solutions[a+b+c] += 1

It doesn't make it as quick as C of course, but more than halves the
execution time.

--
Arnaud


 
Reply With Quote
 
Cameron Laird
Guest
Posts: n/a
 
      09-02-2007
In article <(E-Mail Removed) om>,
Mark Dickinson <(E-Mail Removed)> wrote:
>On Sep 2, 9:45 am, (E-Mail Removed) wrote:
>> [snip code]
>>
>> Thanks for that. I realise that improving the algorithm will speed
>> things up. I wanted to know why my less than perfect algorithm was so
>> much slower in python than exactly the same algorithm in C. Even when
>> turning off gcc's optimiser with the -O0 flag, the C version is still
>>
>> > 100 times quicker.

>
>Well, for one thing, you're creating half a million xrange objects in
>the course of the search. All the C code has
>to do is increment a few integers.
>
>Mark
>


Right: Mr. Dickinson's original question is entirely
legitimate, and it's not adequate to respond, as some
follow-ups did, with ways to improve the Python-coded
algorithm.

The correct answer, which I want to reinforce, is that
the exhibited Python and C versions are NOT "exactly
the same algorithm", at least not without more quali-
fication. Part of Python expertise is to recognize
that creation of xrange objects, mentioned above, is
far from free. Also, -O3 gives C the opportunity,
also remarked in a follow-up, to factor calculations
outside their loops.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Triple nested loop python (While loop insde of for loop inside ofwhile loop) Isaac Won Python 9 03-04-2013 10:08 AM
number theory libraries / project euler eliben Python 3 02-18-2009 10:36 PM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
How heavy is a litre of heavy water? anthonyberet Computer Support 19 04-10-2004 04:07 PM
Why are heavy-weight components heavy? Minti Java 4 02-12-2004 06:50 PM



Advertisments