Velocity Reviews > Faster for() loops?

# Faster for() loops?

Neo
Guest
Posts: n/a

 09-26-2005
Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
states:for( i=0; i<10; i++){ ... }i loops through the values
0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster. This
works because it is quicker to process "i--" as the test condition, which
says "is i non-zero? If so, decrement it and continue.". For the original
code, the processor has to calculate "subtract i from 10. Is the result
non-zero? if so, increment i and continue.". In tight loops, this make a
considerable difference.
How far it holds true.. in the light of modern optimizing compilers? and
will it make a significant difference in case of embedded systems???

Thanks,
-Neo
"Do U really think, what U think real is really real?"

Al Borowski
Guest
Posts: n/a

 09-26-2005
Hi,

Neo wrote:
[...] In tight loops, this make a
> considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???

There is nothing like an experiment to test a theory. I just tried with
AVRGCC

void countDown(void){
int i;
for(i=10; i!=0; i--) doSomething();
}
void countUp(void){
int i;
for(i=0;i<10;i++) doSomething();
}

The generated code is

000000ce <countDown>:
}

void countDown(void){
ce: cf 93 push r28
d0: df 93 push r29
int i;
for(i=10; i!=0; i--) doSomething();
d2: ca e0 ldi r28, 0x0A ; 10
d4: d0 e0 ldi r29, 0x00 ; 0
d6: 0e 94 5d 00 call 0xba
da: 21 97 sbiw r28, 0x01 ; 1
dc: e1 f7 brne .-8 ; 0xd6
de: df 91 pop r29
e0: cf 91 pop r28
e2: 08 95 ret

000000e4 <countUp>:
}
void countUp(void){
e4: cf 93 push r28
e6: df 93 push r29
e8: c9 e0 ldi r28, 0x09 ; 9
ea: d0 e0 ldi r29, 0x00 ; 0
int i;
for(i=0;i<10;i++) doSomething();
ec: 0e 94 5d 00 call 0xba
f0: 21 97 sbiw r28, 0x01 ; 1
f2: d7 ff sbrs r29, 7
f4: fb cf rjmp .-10 ; 0xec
f6: df 91 pop r29
f8: cf 91 pop r28
fa: 08 95 ret

Counting down instead of up saves one whole instruction. It could make a
difference I suppose.

However, the compiler cannot optimise as well if anything in the loop
depends on the value of 'i'.

void countDown(void){
int i;
for(i=10; i!=0; i--) doSomething(i);
}
void countUp(void){
int i;
for(i=0;i<10;i++) doSomething(i);
}

Becomes

void countDown(void){
ce: cf 93 push r28
d0: df 93 push r29
int i;
for(i=10; i!=0; i--) doSomething(i);
d2: ca e0 ldi r28, 0x0A ; 10
d4: d0 e0 ldi r29, 0x00 ; 0
d6: ce 01 movw r24, r28
d8: 0e 94 5d 00 call 0xba
dc: 21 97 sbiw r28, 0x01 ; 1
de: d9 f7 brne .-10 ; 0xd6
e0: df 91 pop r29
e2: cf 91 pop r28
e4: 08 95 ret

000000e6 <countUp>:
}
void countUp(void){
e6: cf 93 push r28
e8: df 93 push r29
int i;
for(i=0;i<10;i++) doSomething(i);
ea: c0 e0 ldi r28, 0x00 ; 0
ec: d0 e0 ldi r29, 0x00 ; 0
ee: ce 01 movw r24, r28
f0: 0e 94 5d 00 call 0xba
f4: 21 96 adiw r28, 0x01 ; 1
f6: ca 30 cpi r28, 0x0A ; 10
f8: d1 05 cpc r29, r1
fa: cc f3 brlt .-14 ; 0xee
fc: df 91 pop r29
fe: cf 91 pop r28
100: 08 95 ret

This time there are a whole 2 extra instructions. I don't think this is
such a big deal. Unrolling the loop would give a better result.

cheers,

Al

Ian Bell
Guest
Posts: n/a

 09-26-2005
Neo wrote:

> Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
> states:for( i=0; i<10; i++){ ... }i loops through the values
> 0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
> you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
> through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster.
> This works because it is quicker to process "i--" as the test condition,
> which says "is i non-zero? If so, decrement it and continue.". For the
> original code, the processor has to calculate "subtract i from 10. Is the
> result non-zero? if so, increment i and continue.". In tight loops, this
> make a considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???
>

Many micros have a decrement jmp if zero (or non zero) machine instruction
so a decent optimising compiler should know this and use it in count down
to zero loops. Counting up often needs a compare followed by a jmp zero (or
non zero) which will be a tad slower.

Ian

Peter Bushell
Guest
Posts: n/a

 09-26-2005
"Neo" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
> states:for( i=0; i<10; i++){ ... }i loops through the values
> 0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
> you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
> through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster.
> This works because it is quicker to process "i--" as the test condition,
> which says "is i non-zero? If so, decrement it and continue.". For the
> original code, the processor has to calculate "subtract i from 10. Is the
> result non-zero? if so, increment i and continue.". In tight loops, this
> make a considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???
>
> Thanks,
> -Neo
> "Do U really think, what U think real is really real?"
>

A major advantage of writing in C is that you can, if you choose, write
understandable, maintainable code. This kind of hand-optimisation has the
opposite effect. If you really need to care about exactly how many
instruction cycle a loop takes, code it in assembly language. Otherwise, for
leave the compiler to do the optimisation. These days, most compilers can
optimise almost as well as you can, for most "normal" operations.

Regards,
--
Peter Bushell
http://www.software-integrity.com/

Skarmander
Guest
Posts: n/a

 09-26-2005
Neo wrote:
> Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
> states:for( i=0; i<10; i++){ ... }i loops through the values
> 0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
> you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
> through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster. This
> works because it is quicker to process "i--" as the test condition, which
> says "is i non-zero? If so, decrement it and continue.". For the original
> code, the processor has to calculate "subtract i from 10. Is the result
> non-zero? if so, increment i and continue.". In tight loops, this make a
> considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???
>

Regardless of the performance issue, I'd like to point out that after
for( i=10; i--; ) finishes, i will have the value -1, since the
decrement is performed even if i is zero. This is counterintuitive, so
it's worth noting. It also means the following is not equivalent:

for (i = 10; i != 0; --i)

Since here one less decrement is performed. Incidentally, my
compiler/platform generates better code with this version -- it compares
i to -1 in the other, which is no better than comparing it to 10! If you
want to count down, I suggest writing what you mean and separating the
test and decrement parts -- it has the added bonus of making things more
readable. The rest is best left to the compiler.

S.

Scott Moore
Guest
Posts: n/a

 09-26-2005
Neo wrote On 09/25/05 23:41,:
> Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
> states:for( i=0; i<10; i++){ ... }i loops through the values
> 0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
> you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
> through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster. This
> works because it is quicker to process "i--" as the test condition, which
> says "is i non-zero? If so, decrement it and continue.". For the original
> code, the processor has to calculate "subtract i from 10. Is the result
> non-zero? if so, increment i and continue.". In tight loops, this make a
> considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???
>
> Thanks,
> -Neo
> "Do U really think, what U think real is really real?"
>
>

Unroll it completely.

Roberto Waltman
Guest
Posts: n/a

 09-26-2005
"Neo" wrote:

>Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
>states:for( i=0; i<10; i++){ ... }i loops through the values
>0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
>you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
>through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster.
>....
>How far it holds true.. in the light of modern optimizing compilers? and
>will it make a significant difference in case of embedded systems???

It may or not save a couple of assembly language instructions, (of
course depending on the compiler and processor used,) but I doubt this
"noptimization" will make any noticeable change in the performance of
a program, unless your code consist mainly of empty for() loops.

What impact can a minuscule reduction in the time required to decide
if the loop has ended or not have, if the body of the loop, for
example, call functions that format a CAN message, deliver it, wait
for a response, retry if there were errors or timeouts, decode the
response, store the values in a serial EEPROM, and based on them start
a few motors, open pneumatic valves, optionally sending an email
message to Katmandu.

That is not an optimization, but a total waste of time. Read the first
example in "Elements of programming style" and learn...

Roberto Waltman

[ return address is invalid. ]

Joe Butler
Guest
Posts: n/a

 09-26-2005
>
> That is not an optimization, but a total waste of time. Read the first
> example in "Elements of programming style" and learn...

What if the difference is between fitting into memory and not?

Mark McIntyre
Guest
Posts: n/a

 09-26-2005
On Mon, 26 Sep 2005 12:11:23 +0530, in comp.lang.c , "Neo"
<(E-Mail Removed)> wrote:

>Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
>states

(that reversing loop order is faster)

The page is talking rot. It *may* be faster. It *may* be slower. The
only way to know is to benchmark your particular implementation in the
specific case you're examining.

>How far it holds true.. in the light of modern optimizing compilers? and
>will it make a significant difference in case of embedded systems???

Benchmark.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----

Kevin D. Quitt
Guest
Posts: n/a

 09-26-2005
Depends what you're doing. If you're accessing a large chunk of memory on a system with
cache, you want to go through incrementing addresses to maximize the use of cache.
Decrementing through memory is generally pessimal.

--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up