Velocity Reviews > Newsgroups > Programming > C Programming
removing a loop causes it to go at half the speed?

 
 
tom fredriksen
 
      03-15-2006
Hi

I was doing a simple test of the speed of a "maths" operation and when I
tested it I found that removing the loop that initialises the data array
for the operation caused the whole program to spend twice the time to
complete. If the loop is included it takes about 7.48 seconds to
complete, but when removed it takes about 11.48 seconds.

Does anybody have a suggestion as to why this is so and whether I can
trust the results of the code as it is below?

/tom


The code was compiled on Linux 2.6.3 with GCC 3.3.2 and glibc 2.2.3.


#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
    int total = 0;
    int count = 65500;
    signed int data[count];

    /* Initialising array loop */
    for(int c=0; c<count; c++) {
        data[c] = 1 + (int) (2000000000.0*rand()/(RAND_MAX+1.0));
    }

    struct timeval start_time;
    struct timeval end_time;

    gettimeofday(&start_time, NULL);

    for(int d=0; d<50000; d++) {
        for(int c=0; c<count; c++) {
            total += data[c];
        }
    }
    gettimeofday(&end_time, NULL);

    double t1 = (start_time.tv_sec*1000) + (start_time.tv_usec/1000.0);
    double t2 = (end_time.tv_sec*1000) + (end_time.tv_usec/1000.0);

    printf("Elapsed time (ms): %.6lf\n", t2-t1);
    printf("Total: %u\n", total);

    return(0);
}
 
tom fredriksen
 
      03-15-2006
Rod Pemberton wrote:
> Okay, I compiled your code for DJGPP and I'm not seeing this issue. There
> is a slight variation in execution times, but nothing like what you
> describe. Nor are they different enough to tell which is faster or slower.
> The only way I got a huge difference was compiling one as gcc -O and the
> other as gcc -O2.


Interesting. I compiled with no optimisation and with -O, which gave me
times of 22 seconds and 11 seconds respectively. Removing the loop made
no difference in either case. Previously, with -O2 and the loop
included, I got 7 seconds.

Out of curiosity, what times did you get and what system are you using?

>If not an optimization problem, I did notice that GCC
> shifted more work to the floating point instructions when the loop was
> present. Perhaps, due to instruction pairing/pipelineing, caching, etc.,
> the floating point init just runs faster for you...


Since that happens before the measured loop, that's not an issue. The
time print statement confirms that the operation takes double the time.

/tom
 
tom fredriksen
 
      03-15-2006
Jan-Hinnerk Dumjahn wrote:
> The time you measure is the real time. It also contains time spent by other
> applications or the system. Perhaps the array needs to be swapped in on the
> first access.
>
> Try running both versions of the program with "time myProg".


It gave me, for the 7-second run:
6.35user 0.00system 0:07.14elapsed 88%CPU

and for the 11-second run:
9.98user 0.00system 0:11.07elapsed 90%CPU

> Try a loop that reads through the array once instead of the initialisation.


I changed the loop to the following:

for(int c=0; c<count; c++) {
    data2[c] = data[c];
}

and then printed some of the elements at the end of the program so as
not to have it optimised away. That made the execution time the same in
both cases. So it seems that using floating point in the rand()
statement makes it faster. Maybe GCC optimised the program to use
floating-point instead of integer operations, which is then faster.

Since it must be an integer operation, that leads to the following
question: how do I modify this statement to use integers instead?

data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0))

Just removing the ".0" suffixes does not make it slower, as one would
expect it to.

/tom
 
tom fredriksen
 
      03-15-2006
Vladimir S. Oka wrote:
> On Wednesday 15 March 2006 16:14, tom fredriksen opined (in
> <44183d6e$(E-Mail Removed)>):
>> I was doing a simple test of the speed of a "maths" operation and when
>> I tested it I found that removing the loop that initialises the data
>> array for the operation caused the whole program to spend twice the
>> time to complete. If the loop is included it takes about 7.48 seconds
>> to complete, but when removed it takes about 11.48 seconds.
>>
>> Does anybody have a suggestion as to why this is so and whether I can
>> trust the results of the code as it is below?

>
> I'd suggest brushing up on your algebra. Since when is 11.48 "twice the"
> 7.48? In my book it's closer to 50% larger ("twice larger" being 100%).


I did not say "twice larger", I said twice the time :/

/tom
 
Rod Pemberton
 
      03-15-2006

"tom fredriksen" <(E-Mail Removed)> wrote in message
news:441871a0$(E-Mail Removed)...
> Rod Pemberton wrote:
> > Okay, I compiled your code for DJGPP and I'm not seeing this issue. There
> > is a slight variation in execution times, but nothing like what you
> > describe. Nor are they different enough to tell which is faster or slower.
> > The only way I got a huge difference was compiling one as gcc -O and the
> > other as gcc -O2.

>
> Interesting. I compiled with no optimisation and with -O, which gave me
> times of 22 seconds and 11 seconds respectively. Removing the loop made
> no difference in either case. Previously, with -O2 and the loop
> included, I got 7 seconds.
>
> Out of curiosity, what times did you get and what system are you using?


I didn't record the times, but the -O2 runs were in the 45-second range
and -O was in the 90-second range.
I ran them under Win98SE, 500MHz K6-2, DJGPP v2.03 w/GCC v3.41.

> > If not an optimization problem, I did notice that GCC
> > shifted more work to the floating point instructions when the loop was
> > present. Perhaps, due to instruction pairing/pipelining, caching, etc.,
> > the floating point init just runs faster for you...
>
> Since that happens before the measured loop, that's not an issue. The
> time print statement confirms that the operation takes double the time.


Do you have a dual-core CPU? I've seen recent complaints in the assembly
language NGs that RDTSC returns incorrect information for dual-core
CPUs. It might be that your timing routine is using RDTSC.

I had to restructure the code slightly since you were using some C99
features, but I didn't think it was critical. I moved the declarations
and init of c and d out of the for loops to the top of main. Also, I
moved the declarations of start_time and end_time to the top of main.
For the version I ran, I looked at the assembly from gcc -O2 -S for
three versions of your program: no loop, a loop setting everything to
zero, and the loop with the rand() stuff. The assembly code was very
different. The no-loop version used mostly integer instructions. The
loop-zero version used about half integer and half floating point. And
the one with rand() used maybe 70% floating-point instructions.


Rod Pemberton


 
tom fredriksen
 
      03-15-2006
Rod Pemberton wrote:
> "tom fredriksen" <(E-Mail Removed)> wrote in message
>
>> > If not an optimization problem, I did notice that GCC
>> > shifted more work to the floating point instructions when the loop was
>> > present. Perhaps, due to instruction pairing/pipelining, caching, etc.,
>> > the floating point init just runs faster for you...
>>
>> Since that happens before the measured loop, that's not an issue. The
>> time print statement confirms that the operation takes double the time.

>
> Do you have a dual-core CPU? I've seen recent complaints in the assembly
> language NG's that the RDTSC returns incorrect information for dual-core
> CPUs. It might be that timing routine is using RDTSC.


No, it's a single-core AMD Athlon 2GHz from a couple of years ago.

> I had to restructure the code slightly since you were using some C99
> features, but I didn't think it was critical. I moved the declarations
> and init of c and d out of the for loops to the top of main. Also, I
> moved the declarations of start_time and end_time to the top of main.
> For the version I ran, I looked at the assembly from gcc -O2 -S for
> three versions of your program: no loop, a loop setting everything to
> zero, and the loop with the rand() stuff. The assembly code was very
> different. The no-loop version used mostly integer instructions. The
> loop-zero version used about half integer and half floating point. And
> the one with rand() used maybe 70% floating-point instructions.


I tested that but it gave no difference either.

>> Out of curiosity, what times did you get and what system are you using?
>
> I didn't record the times, but the -O2 runs were in the 45-second range
> and -O was in the 90-second range.
> I ran them under Win98SE, 500MHz K6-2, DJGPP v2.03 w/GCC v3.41.


This is highly suspicious... a K6 is slower than an Athlon 2GHz, is it
not? So how come you are getting such low times?
Could you post the code you used, and check your times?

BTW did you use gcc? I used these args with gcc

-O2 -Wall -D_LARGEFILE64_SOURCE -std=gnu99

/tom
 
Rod Pemberton
 
      03-15-2006

"tom fredriksen" <(E-Mail Removed)> wrote in message
news:44188153$(E-Mail Removed)...
> Rod Pemberton wrote:
> > "tom fredriksen" <(E-Mail Removed)> wrote in message
> >
> >> Out of curiosity, what times did you get and what system are you using?
> >
> > I didn't record the times, but the -O2 runs were in the 45-second range
> > and -O was in the 90-second range.
> > I ran them under Win98SE, 500MHz K6-2, DJGPP v2.03 w/GCC v3.41.
>
> This is highly suspicious... a K6 is slower than an Athlon 2GHz, is it
> not? So how come you are getting such low times?
> Could you post the code you used, and check your times?


Times are below. I deleted the code, but just copied it again. The only
difference is that the declarations of c and d are no longer in the
for() loops, i.e.,

int c,d;
for(c=0; c<count; c++) {
for(d=0; d<50000; d++) {
for(c=0; c<count; c++) {

> BTW did you use gcc? I used these args with gcc
>
> -O2 -Wall -D_LARGEFILE64_SOURCE -std=gnu99


GCC v3.41, yes, but DJGPP uses its own libc. I keep forgetting that DJGPP
has its own libc which is different from GNU libc. That could _easily_ be
a factor. Also, CPU speeds increase by roughly a factor of 2 per
generation, and yours is about two generations newer, so 1/4 of 44 ~ 11.

(2 runs, w/loop)
gcc -O2
Elapsed time (ms): 42790.000000
Elapsed time (ms): 43450.000000

(2 runs, no loop)
gcc -O2
Elapsed time (ms): 44160.000000
Elapsed time (ms): 44050.000000

(2 runs, w/loop)
gcc -Wall -O2 -D_LARGEFILE64_SOURCE -std=gnu99
Elapsed time (ms): 44600.000000
Elapsed time (ms): 43340.000000

(3 runs, no loop)
gcc -Wall -O2 -D_LARGEFILE64_SOURCE -std=gnu99
Elapsed time (ms): 39600.000000
Elapsed time (ms): 42840.000000
Elapsed time (ms): 40970.000000


It might be time to try a Linux NG to see if someone with a similar system
to yours is getting the same issue...


Rod Pemberton


 
laura fairhead
 
      03-15-2006
On Wed, 15 Mar 2006 17:14:38 +0100, tom fredriksen <(E-Mail Removed)> wrote:

>Hi
>
>I was doing a simple test of the speed of a "maths" operation and when I
>tested it I found that removing the loop that initialises the data array
>for the operation caused the whole program to spend twice the time to
>complete. If the loop is included it takes about 7.48 seconds to
>complete, but when removed it takes about 11.48 seconds.


Hello Tom,

This is most likely because your initialisation is outside of the
timed code. When you initialise the data, the memory is pulled into the
cache (main-memory accesses cost more time than cached accesses), so
the later accesses in the timed loop are much faster. When you don't
initialise your data, all the time it takes to move the array (65500
ints, about 256 KB) from main memory into the cache is added to the
loop instead, and you get a corresponding slow-down.

byefornow
laura

--
echo moc.12klat@daehriaf_arual|sed 's/\(.\)\(.\),*/\2,\1/g;h;s/,//g;/@t/q;G;D'
 
Mark McIntyre
 
      03-15-2006
On Wed, 15 Mar 2006 21:15:11 +0100, in comp.lang.c , tom fredriksen
<(E-Mail Removed)> wrote:

>> On Wednesday 15 March 2006 16:14, tom fredriksen opined (in
>> <44183d6e$(E-Mail Removed)>):
>>> I was doing a simple test of the speed of a "maths" operation and when
>>> I tested it I found that removing the loop that initialises the data
>>> array for the operation caused the whole program to spend twice the
>>> time to complete. If the loop is included it takes about 7.48 seconds
>>> to complete, but when removed it takes about 11.48 seconds.
>>>

>I did not say twice the larger, I said twice the time:/


Assuming you're using base 10, twice 7.48 isn't 11.48.

And to answer your question, who knows? In the "loopless" case you're
invoking undefined behaviour by accessing uninitialised memory, so the
extra time could be spent phoning the Kremlin or counting your
disk-drive sectors, or pretty much anything.

Mark McIntyre
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
David Holland
 
      03-15-2006
On 2006-03-15, tom fredriksen <(E-Mail Removed)> wrote:
> I was doing a simple test of the speed of a "maths" operation and when I
> tested it I found that removing the loop that initialises the data array
> for the operation caused the whole program to spend twice the time to
> complete. If the loop is included it takes about 7.48 seconds to
> complete, but when removed it takes about 11.48 seconds.
>
> Does anybody have a suggestion as to why this is so and whether I can
> trust the results of the code as it is below?
>
> [...]
> int total = 0;
> int count = 65500;
> signed int data[count];


<OT purpose=stop_wild_speculation>

It is because of the virtual memory system. When you initialize the
array beforehand, it forces the virtual memory system to materialize
memory for the pages backing the array. When you do not, this
materialization happens in the timed loop, and then of course it's
slower.

You should be able to confirm this by touching some but not all of the
pages beforehand and observing the resulting timings. Hint: page size
for i386 family processors is 4096 bytes. And you won't see it at all
using DJGPP, which is DOS-based and doesn't have a virtual memory
system.

</OT>

I feel compelled to observe that your program exhibits undefined
behavior... and so strictly speaking, reasoning about its performance
properties is meaningless. And you absolutely cannot trust the
results.

Perhaps you meant "total" to be unsigned?

--
- David A. Holland
(the above address works if unscrambled but isn't checked often)
 