Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > unrolling nested for-loop

Reply
Thread Tools

unrolling nested for-loop

 
 
V
Guest
Posts: n/a
 
      05-10-2008
Hello:

Consider the following nested for loop:

uint64 TABLE[8][256];

for (i=0; i<=7; i++)
for (j=1; j<=7; j++)
for (k=1; k<=(1<<j)-1; k++)
TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]);

I understand that unrolling just the inner most for-loop would give me
best performance and this is required for my project. But I'm unable
to figure out how to unroll just the innermost for-loop.

Can someone please help!

Thanks!
 
Reply With Quote
 
 
 
 
Ian Collins
Guest
Posts: n/a
 
      05-10-2008
V wrote:
> Hello:
>
> Consider the following nested for loop:
>
> uint64 TABLE[8][256];
>

Better to use standard types, uint64_t.

All caps for variable names isn't a good idea, all caps is usually used
for macros.

> for (i=0; i<=7; i++)
> for (j=1; j<=7; j++)
> for (k=1; k<=(1<<j)-1; k++)
> TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]);
>
> I understand that unrolling just the inner most for-loop would give me
> best performance and this is required for my project. But I'm unable
> to figure out how to unroll just the innermost for-loop.
>

Have you profiled the code?

> Can someone please help!
>

Your compiler probably can. Loop unrolling is a common optimisation.
Check the assembly output for your code with various optimisation settings.

--
Ian Collins.
 
Reply With Quote
 
 
 
 
V
Guest
Posts: n/a
 
      05-10-2008
I have tried to use compile time option to unroll this and checked its
assembly code, but it doesn't seem to have unrolled the loop.

Any suggestions?

Thanks.
 
Reply With Quote
 
Ian Collins
Guest
Posts: n/a
 
      05-10-2008
V wrote:
> I have tried to use compile time option to unroll this and checked its
> assembly code, but it doesn't seem to have unrolled the loop.
>
> Any suggestions?
>

A better compiler?

The most important thing to do is to profile your application and see if
there really is a bottleneck in this code. At the moment, manual
unrolling of this loop looks like premature micro-optimisation.

If there is an issue and your compiler isn't up the job, just repeat the
inner assignment N times and change the inner loop to increment its
counter by N. This assumes the limit is a multiple of N. Something
like (untested)

for (k=1; k<=(1<<j)-1; k += 2)
{
TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]);
TABLE [i][((1<<j)+k+1)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k+1]);
}

--
Ian Collins.
 
Reply With Quote
 
Antoninus Twink
Guest
Posts: n/a
 
      05-10-2008
On 10 May 2008 at 8:02, Ian Collins wrote:
> This assumes the limit is a multiple of N. Something like (untested)
>
> for (k=1; k<=(1<<j)-1; k += 2)
> {
> TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]);
> TABLE [i][((1<<j)+k+1)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k+1]);
> }


This fails when j=1.

The problem is that the number of iterations is variable, and ranges
between 1 and 127. Finding common divisors of (powers of 2 minus 1) as
an unrolling factor is problematic... (Since the number of iterations is
growing exponentially, even just unrolling the last loop is attractive,
but 127 is prime, so there'll be some fiddling needed.

Just unpacking what on earth the table looks like is an effort. The
outer index i is irrelevant, as there's no interaction between these
outer "layers". The inner 256-long arrays are arranged like this:

j
-1 | 0 |
0 | 1 |
1 | 2 | 3 |
2 | 4 | 5 | 6 | 7 |
3 | 8 | 9 |10 |11 |12 |13 |14 |15 |
4 ...
5 ...
6 ...
7 |128|129| ... |255|

The left-hand column is an input column. To fill in a given row, look at
the value in the left-hand column and xor it in turn with the elements
earlier in the array, in the order 0,1,2,3,...

Given the way these dependencies work, it is also possible to fill in
columns in turn from the left, rather than working row-wise. This
offers some potential for unrolling, but would still be complicated and
the resulting code would be large.

 
Reply With Quote
 
Bart
Guest
Posts: n/a
 
      05-10-2008
On May 10, 7:44*am, V <(E-Mail Removed)> wrote:
> I have tried to use compile time option to unroll this and checked its
> assembly code, but it doesn't seem to have unrolled the loop.
>
> Any suggestions?
>
> Thanks.


I'm not surprised. The inner loop is very complicated. Is there any
way of simplifying it?

Perhaps a rethink of what you're trying to achieve might help.

What factor of speedup do you need?

--
Bartc
 
Reply With Quote
 
Ian Collins
Guest
Posts: n/a
 
      05-10-2008
Bart wrote:
> On May 10, 7:44 am, V <(E-Mail Removed)> wrote:
>> I have tried to use compile time option to unroll this and checked its
>> assembly code, but it doesn't seem to have unrolled the loop.
>>
>> Any suggestions?
>>
>> Thanks.

>
> I'm not surprised. The inner loop is very complicated. Is there any
> way of simplifying it?
>

The first compiler I tried was happy to unroll the loop, it's not that
hard (for a machine!).

--
Ian Collins.
 
Reply With Quote
 
Ian Collins
Guest
Posts: n/a
 
      05-10-2008
Antoninus Twink wrote:
> On 10 May 2008 at 8:02, Ian Collins wrote:
>> This assumes the limit is a multiple of N. Something like (untested)
>>
>> for (k=1; k<=(1<<j)-1; k += 2)
>> {
>> TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]);
>> TABLE [i][((1<<j)+k+1)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k+1]);
>> }

>
> This fails when j=1.
>

I know, that why I said "This assumes the limit is a multiple of N", N
== 2 in this case.

--
Ian Collins.
 
Reply With Quote
 
Bart
Guest
Posts: n/a
 
      05-10-2008
On May 10, 12:35*pm, Ian Collins <(E-Mail Removed)> wrote:
> Bart wrote:
> > On May 10, 7:44 am, V <(E-Mail Removed)> wrote:
> >> I have tried to use compile time option to unroll this and checked its
> >> assembly code, but it doesn't seem to have unrolled the loop.

>
> >> Any suggestions?

>
> >> Thanks.

>
> > I'm not surprised. The inner loop is very complicated. Is there any
> > way of simplifying it?

>
> The first compiler I tried was happy to unroll the loop, it's not that
> hard (for a machine!).


Looking at the code again, it just initialies a 2048-element table (of
64-bit values).

If that's all it ever does, then it only needs to be done once, and
speed should not be an issue.

I think more context is needed.

--
Bartc
 
Reply With Quote
 
Willem
Guest
Posts: n/a
 
      05-10-2008
Bart wrote:
) Looking at the code again, it just initialies a 2048-element table (of
) 64-bit values).

Not quite. If you look closer you will see that it depends on the first
elements of each of the 8 arrays.

However, unrolling the outermost loop might improve efficiency
because it is a constant (8-fold) unrolling and could give rise to better
pipelining.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Unrolling loops using templates Ashley C++ 6 03-06-2011 07:04 AM
ultra-fast loop unrolling with g++ -O3 mark C Programming 2 06-12-2008 05:14 PM
Unrolling a loop Richard Cavell C++ 3 02-23-2005 02:45 PM
g++ loop unrolling performance =?ISO-8859-1?Q?Per_Nordl=F6w?= C++ 1 09-01-2004 01:52 AM
Loop unrolling question. Does this make sense? John Edwards C++ 5 07-07-2003 02:37 PM



Advertisments