Velocity Reviews > GGC's Machine Code Production ?

# GGC's Machine Code Production ?

Taygun Kekec
Guest
Posts: n/a

 05-01-2008
Hello , I have been curious about the speed of 2 version of a matris
filling program and got suprised by the results.
Actually , I am having Data Structures lecture on university though
our professor claimed "using IF structures are costly , so you better
use immediate data addressing " so he gave us the pseudo codes that i
implemented in language C.

Here are the source codes :
/* Optimized.c * /
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define ELEMENTS 15000
#define KOSEGEN 1
#define UPPER 2
#define LOWER 3

int main()
{
int i,j;
int **A;
/* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
A = (int **) malloc( ELEMENTS * sizeof(int*) );
for(i = 0; i < ELEMENTS; i++){
if ( (A[i] = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
printf(" yer yok \n");
return -1;
}
}

time_t start,end;

time (&start); /* gerekli sistem saatini cekme */
for(i=0;i<ELEMENTS-1;i++){
A[i][i] = KOSEGEN;
for(j= i+1;j<ELEMENTS;j++) {
A[i][j]= UPPER;
A[j][i]= LOWER;
}
}
A[ELEMENTS-1][ELEMENTS-1] = 1;
time (&end); /* gerekli sistem saatini cekme */
double dif = difftime (end,start); /* gecen sureyi dif degiskenine
atma */
printf ("%.4lf seconds for optimized algo.\n", dif );
free(A);
scanf("%d");
}
/* Optimized.c * /
----------------------
/* Unoptimized.c * /
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define ELEMENTS 15000
#define KOSEGEN 1
#define UPPER 2
#define LOWER 3

int main()
{
int i,j;
int **A;
/* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
A = (int **) malloc( ELEMENTS * sizeof(int*) );
for(i = 0; i < ELEMENTS; i++){
if ( (A[i] = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
printf(" yer yok \n");
return -1;
}
}

time_t start,end;

time (&start); /* gerekli sistem saatini cekme */
for(i=0;i<ELEMENTS;i++){
for(j=0;j<ELEMENTS;j++) {
if ( i<j)
A[i][j] = UPPER;
else if ( i>j)
A[i][j] = LOWER;
else
A[i][j] = KOSEGEN;
}
}
time (&end); /* gerekli sistem saatini cekme */
double dif = difftime (end,start); /* gecen sureyi dif
degiskenine atma */
printf ("%.4lf seconds for unoptimized algo.\n", dif );
free(A);
scanf("%d");
}
/* Unoptimized.c * /

Whats the reason for this ? Processing unit prefetching or something

Taygun Kekec
Guest
Posts: n/a

 05-01-2008
By the way , the algorithm aims to fill the matris like this :
Upper side of matris to 2
Lower side of matris to 3
for 4x4 :
[1] [2] [2] [2]
[3] [1] [2] [2]
[3] [3] [1] [2]
[3] [3] [3] [1]

Guest
Posts: n/a

 05-01-2008
Taygun Kekec wrote:
> Hello , I have been curious about the speed of 2 version of a matris
> filling program and got suprised by the results.
> Actually , I am having Data Structures lecture on university though
> our professor claimed "using IF structures are costly , so you better
> use immediate data addressing " so he gave us the pseudo codes that i
> implemented in language C.
>
> Here are the source codes :
> /* Optimized.c * /

....
> for(i=0;i<ELEMENTS-1;i++){
> A[i][i] = KOSEGEN;
> for(j= i+1;j<ELEMENTS;j++) {
> A[i][j]= UPPER;
> A[j][i]= LOWER;
> }
> }
> A[ELEMENTS-1][ELEMENTS-1] = 1;

....
> /* Unoptimized.c * /
> for(i=0;i<ELEMENTS;i++){
> for(j=0;j<ELEMENTS;j++) {
> if ( i<j)
> A[i][j] = UPPER;
> else if ( i>j)
> A[i][j] = LOWER;
> else
> A[i][j] = KOSEGEN;
> }
> }

Just looking at the source, the first version executes the inner loop
n(n-1)/2 times, while the last does this n*n times, about double. Also the
which the first doesn't. In general, fewer branches mean faster code.

--

Taygun Kekec
Guest
Posts: n/a

 05-01-2008
You are totally right ,I also think the optimized one should finish
filling quicker than Unoptimized.c does.
But Unoptimized.c finishes quicker...
Compile & Run and you will see.

christian.bau
Guest
Posts: n/a

 05-01-2008
On May 1, 3:20*pm, Taygun Kekec <(E-Mail Removed)> wrote:
> You are totally right ,I also think the optimized one should finish
> filling quicker than Unoptimized.c does.
> But Unoptimized.c *finishes quicker...
> Compile & Run and you will see.

On a typical 32 bit system, the memory that you are filling is about
900 Megabyte.

Now have a look in which order you are storing the values: In the
"unoptimised" version, you are storing to array elements in sequential
order. In the so-called "optimised" version, half of the stores are in
sequential order, but then your code will have to read up to 15000
different pointers, and store a single four byte value. Each value
stored in a completely different location.

Modern processors usually transfer data from and to memory in units of
whole cache lines, often 64 byte or more per cache line. The code
accessing the data in a linear way only writes whole cache lines as
needed. The "optimised" code has to read and write a whole cache line,
just to store a single integer.

Now if you want to really, really optimise the code: Consider that on
most implementations, memcpy is highly optimised in ways that you
could never think of. How could you achieve the task with 99% of the
work done in calls to memcpy?

Willem
Guest
Posts: n/a

 05-01-2008
christian.bau wrote:
) Modern processors usually transfer data from and to memory in units of
) whole cache lines, often 64 byte or more per cache line. The code
) accessing the data in a linear way only writes whole cache lines as
) needed. The "optimised" code has to read and write a whole cache line,
) just to store a single integer.

I think on modern computers, if you read a line of memory, the next line
will be 'prefetched' and will be available a lot quicker than when you
read an other (random) piece of memory.

In other words: the memory is optimized for sequential access.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

A. Sinan Unur
Guest
Posts: n/a

 05-01-2008
Taygun Kekec <(E-Mail Removed)> wrote in news:3e8bc817-a55a-4c0d-
http://www.velocityreviews.com/forums/(E-Mail Removed):

> Hello , I have been curious about the speed of 2 version of a matris
> filling program and got suprised by the results.
> Actually , I am having Data Structures lecture on university though
> our professor claimed "using IF structures are costly , so you better
> use immediate data addressing " so he gave us the pseudo codes that i
> implemented in language C.

As others have pointed out, a branch in code that is already in the
instruction cache is much cheaper than constantly invalidating data in
the cache.

There are some fundamental style issues in the code. I am going to
comment on those.

> /* Optimized.c * /
> #include <stdio.h>
> #include <time.h>
> #include <stdlib.h>
>
> #define ELEMENTS 15000
> #define KOSEGEN 1
> #define UPPER 2
> #define LOWER 3

Naming some of the symbols in Turkish and others in English is not a
good idea if you want others to comprehend easily what you are doing.

> int main()

int main(void)

> {
> int i,j;
> int **A;
> /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
> A = (int **) malloc( ELEMENTS * sizeof(int*) );

A = malloc( ELEMENTS * sizeof(*A) );

Casting the return value of malloc can hide failure to include stdlib.h.

Using sizeof(*A) means one less thing to worry about if the type of
elements of the matrix changes.

You do not check if malloc succeded.

> for(i = 0; i < ELEMENTS; i++){
> if ( (A[i] = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
> printf(" yer yok \n");
> return -1;
> }
> }

Ditto for malloc. Also, error messages should go to stderr. Finally,
exit with EXIT_FAILURE status is preferable to -1.

At the end of the program, you free A, but do not free the individual
row pointers.

I would use an OS provided utility to benchmark, such as time on *nix
systems or timethis on Windows.

How does the following version perform?

#include <stdio.h>
#include <stdlib.h>

enum MATRIX_VALUE { DIAGONAL = 1, UPPER = 2, LOWER = 3 };
enum DIMENSION { ROWS = 15000, COLS = 15000 };

int main(void) {
int r, c;
int *A;

A = malloc( ROWS * COLS * sizeof( *A ) );

if ( !A ) {
fputs( "Insufficient memory\n", stderr );
exit( EXIT_FAILURE );
}

for ( r = 0; r < ROWS; ++r ) {
int base = r * COLS;
for ( c = 0; c < COLS; ++c ) {
int index = base + c;
if ( r < c ) {
A[ index ] = UPPER;
}
else if ( r > c ) {
A[ index ] = LOWER;
}
else {
A[ index ] = DIAGONAL;
}
}
}

free( A );
return EXIT_SUCCESS;
}

Note that the disadvantage of this version is that it requires about 900
MB of contiguous memory.

On my laptop, there is no appreciable difference between this and the
unoptimized version above (after getting rid of the extraneous calls to
time and including code to free individual pointers) measured with
timethis.exe on Windows. On the other hand, your code would continue to
work even when memory is fragmented.

Sinan

--
A. Sinan Unur <(E-Mail Removed)>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Taygun Kekec
Guest
Posts: n/a

 05-01-2008
Thanks for you assistance
That cache logic makes total sense.

Also for the style suggestions.

Barry Schwarz
Guest
Posts: n/a

 05-02-2008
On Thu, 1 May 2008 06:21:21 -0700 (PDT), Taygun Kekec
<(E-Mail Removed)> wrote:

>Hello , I have been curious about the speed of 2 version of a matris
>filling program and got suprised by the results.
>Actually , I am having Data Structures lecture on university though
>our professor claimed "using IF structures are costly , so you better
>use immediate data addressing " so he gave us the pseudo codes that i
>implemented in language C.
>
>Here are the source codes :
>/* Optimized.c * /
>#include <stdio.h>
>#include <time.h>
>#include <stdlib.h>
>
>#define ELEMENTS 15000
>#define KOSEGEN 1
>#define UPPER 2
>#define LOWER 3
>
>
>int main()
>{
> int i,j;
> int **A;
> /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
> A = (int **) malloc( ELEMENTS * sizeof(int*) );

Don't cast the return from malloc. It cannot help but may cause the
compiler to suppress a diagnostic you would really want to see.

> for(i = 0; i < ELEMENTS; i++){
> if ( (A[i] = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
> printf(" yer yok \n");
> return -1;

The only portable return values are 0, EXIT_SUCCESS, and EXIT_FAILURE.

> }
> }
>
> time_t start,end;
>
> time (&start); /* gerekli sistem saatini cekme */
> for(i=0;i<ELEMENTS-1;i++){
> A[i][i] = KOSEGEN;
> for(j= i+1;j<ELEMENTS;j++) {
> A[i][j]= UPPER;
> A[j][i]= LOWER;
> }
> }
> A[ELEMENTS-1][ELEMENTS-1] = 1;

> time (&end); /* gerekli sistem saatini cekme */
> double dif = difftime (end,start); /* gecen sureyi dif degiskenine
>atma */

Unless you have a C99 compiler, declarations need to precede
executable statements.

> printf ("%.4lf seconds for optimized algo.\n", dif );
> free(A);

This causes several memory leaks. The memory allocated to each A[i]
is still allocated but now can never be released.

> scanf("%d");

This invokes undefined behavior. The %d requires an argument of type
int*. What was your intent here? Did you mean to use getchar

>}
>/* Optimized.c * /
>----------------------
>/* Unoptimized.c * /
>#include <stdio.h>
>#include <time.h>
>#include <stdlib.h>
>
>#define ELEMENTS 15000
>#define KOSEGEN 1
>#define UPPER 2
>#define LOWER 3
>
>
>int main()
>{
> int i,j;
> int **A;
> /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
> A = (int **) malloc( ELEMENTS * sizeof(int*) );
> for(i = 0; i < ELEMENTS; i++){
> if ( (A[i] = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
> printf(" yer yok \n");
> return -1;
> }
> }
>
> time_t start,end;
>
> time (&start); /* gerekli sistem saatini cekme */
> for(i=0;i<ELEMENTS;i++){
> for(j=0;j<ELEMENTS;j++) {
> if ( i<j)
> A[i][j] = UPPER;
> else if ( i>j)
> A[i][j] = LOWER;
> else
> A[i][j] = KOSEGEN;
> }
> }
> time (&end); /* gerekli sistem saatini cekme */
> double dif = difftime (end,start); /* gecen sureyi dif
>degiskenine atma */
> printf ("%.4lf seconds for unoptimized algo.\n", dif );
> free(A);
> scanf("%d");
>}
>/* Unoptimized.c * /
>
>
>Whats the reason for this ? Processing unit prefetching or something

Remove del for email