Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Superbasic efficiency question

Reply
Thread Tools

Superbasic efficiency question

 
 
aaronfude@gmail.com
Guest
Posts: n/a
 
      01-24-2005
I'm working on a scientific computing application. I need a class
called Element which is no more than a collection of integers, or
"nodes" and has only on method int getNode(int i).

I would like to implement in the most efficient was possible. So I
summoned up my programming intellect and asked myself: Do I want to
have members such as int a, b, c, d, e or a single member such as int
a[5]. So I wrote the following snippet and compiled it with a -O3 flag:

int main(char *argv[], int argc) {

/*
int a, b, c, d, e;
for (int i = 0; i < 1000000000; i++) {
a = 1;
b = 2;
c = 3;
d = 4;
e = 5;
}

*/


int a[5];
for (int i = 0; i < 1000000000; i++) {
a[0] = 1;
a[1] = 2;
a[2] = 3;
a[3] = 4;
a[4] = 5;
}

return 0;
}

The first (commented out) version ran twice as fast. (For doubles
instead of ints, it was a factor of 4). So the simpleton part of me
thinks that that answers my question. The remaining part tells me that
it is never that simple. Finally, the cynical part of me thinks that it
all doesn't matter and other parts of the program are bound to be far
more time consuming.

Please help me resolve my internal struggle.
Many thanks in advance!

Aaron Fude

 
Reply With Quote
 
 
 
 
Joseph Turian
Guest
Posts: n/a
 
      01-24-2005
Aaron,

Performance on the toy code snippet is not informative of anything more
than performance on the toy code snippet.

> Please help me resolve my internal struggle.


10 Write the code the most clear, maintable way possible.
20 Run the code under a profiler.
30 Optimize the actual performance bottleneck (as opposed to, say,
imaginary performance bottlenecks).
40 GOTO 20

If you have any problems with 30, get back to the group. Until then,
both you and us are just shooting in the dark.

Joseph

 
Reply With Quote
 
 
 
 
David White
Guest
Posts: n/a
 
      01-24-2005
"Joseph Turian" <> wrote in message
news: oups.com...
> Aaron,
>
> Performance on the toy code snippet is not informative of anything more
> than performance on the toy code snippet.
>
> > Please help me resolve my internal struggle.

>
> 10 Write the code the most clear, maintable way possible.
> 20 Run the code under a profiler.
> 30 Optimize the actual performance bottleneck (as opposed to, say,
> imaginary performance bottlenecks).


Optimizing might require significant changes, even major design changes. If
you know from the outset that speed is important, then consider it from the
outset. Toy code snippets can help you determine early on how you'll need to
go about doing certain things.

> 40 GOTO 20
>
> If you have any problems with 30, get back to the group. Until then,
> both you and us are just shooting in the dark.


DW


 
Reply With Quote
 
Sethalicious
Guest
Posts: n/a
 
      01-24-2005
I'm thinking that your getting a silent error on the code. I don't
believe an integer can handle the value 1000000000 in the following:

> for (int i = 0; i < 1000000000; i++)


So what's happening is your integer "i" is overflowing and getting set
to an unpredictable value (possibly negative). You might have better
luck using an "unsigned long" or even "std::size_t".

I'm not sure what the standard says, but I think most compilers
implement 32 bit integers. So an "unsigned" int is 0 to 2^32, and the
"signed" int is -2^16 to 2^16. (Subtract or add a 1 where needed).

Check your compiler's docs to see the sizes of the integral data types.


Otherwise, there should be no real difference in performance of the
code with optimizations on. When you measure the time, remember to
record cpu clock cycles instead of time in seconds.
Hope this helps!

Chris J. (aka Sethalicious)

 
Reply With Quote
 
davidrubin@warpmail.net
Guest
Posts: n/a
 
      01-24-2005
My feeling is that when you try to implement 'getNode', you will find
that the array-based implementation is faster than a switch statement.
You might be able to squeeze out some more performance though with a
variable-based implementation if you templatize 'getNode' on i, and
specialize for values of i. (Presumably there is some more generic code
in your application which makes 'getNodeA', 'getNodeB', etc.,
prohibitive.)

 
Reply With Quote
 
Gianni Mariani
Guest
Posts: n/a
 
      01-24-2005
wrote:
> I'm working on a scientific computing application. I need a class
> called Element which is no more than a collection of integers, or
> "nodes" and has only on method int getNode(int i).
>
> I would like to implement in the most efficient was possible. So I
> summoned up my programming intellect and asked myself: Do I want to
> have members such as int a, b, c, d, e or a single member such as int
> a[5]. So I wrote the following snippet and compiled it with a -O3 flag:
>
> int main(char *argv[], int argc) {
>
> /*
> int a, b, c, d, e;
> for (int i = 0; i < 1000000000; i++) {
> a = 1;
> b = 2;
> c = 3;
> d = 4;
> e = 5;
> }
>
> */
>
>
> int a[5];
> for (int i = 0; i < 1000000000; i++) {
> a[0] = 1;
> a[1] = 2;
> a[2] = 3;
> a[3] = 4;
> a[4] = 5;
> }
>
> return 0;
> }
>
> The first (commented out) version ran twice as fast. (For doubles
> instead of ints, it was a factor of 4). So the simpleton part of me
> thinks that that answers my question. The remaining part tells me that
> it is never that simple. Finally, the cynical part of me thinks that it
> all doesn't matter and other parts of the program are bound to be far
> more time consuming.


It is more than likely that the compiler re-arranged your code

int a, b, c, d, e;
a = 1;
b = 2;
c = 3;
d = 4;
e = 5;
for (int i = 0; i < 1000000000; i++) {
}

Or, perhaps the compiler placed the values in registers.

There is a deeper design question for you.

Are these values really related ? Do you do operations on them in
tandem ? Would you ever think that it might be interesting to write a
template function with a "pointer to member" of one of these values ?

I would go with the 5 separate values if they are truly separate. That
way it will be harder to run into other problems like going past array
bounds or issues with using the wrong index etc.

Anyhow, below is an example where the compiler can't (easily) make the
optimization above. The results are essentially identical with:
gcc version 4.0.0 20050102 (experimental)


#include <ostream>
#include <iostream>

struct X
{
virtual void F() = 0; // hard for compiler to optimize this
};

struct A
{
int a, b, c, d, e;
};

struct B
{
int a[5];
};


struct Av
: A, X
{
Av()
: A()
{
}

virtual void F()
{
a = 1;
b = 2;
c = 3;
d = 4;
e = 5;
}
};

struct Bv
: B, X
{
Bv()
: B()
{
}

virtual void F()
{
a[0] = 1;
a[1] = 2;
a[2] = 3;
a[3] = 4;
a[4] = 5;
a[5] = 6;
}
};

int main( int argc, char ** argv )
{
X * x;
if ( argc >= 2 )
{
std::cout << "Making an A\n";
x = new Av;
}
else
{
std::cout << "Making a B\n";
x = new Bv;
}

for (int i = 0; i < 1000000000; i++)
{
x->F();
}

}


$ g++ -O3 -o ./optimal_array_or_members ./optimal_array_or_members.cpp
$ time ./optimal_array_or_members
Making a B
6.900u 0.000s 0:06.92 99.7% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members d
Making an A
6.770u 0.000s 0:06.78 99.8% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members
Making a B
6.960u 0.010s 0:06.96 100.1% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members s
Making an A
6.920u 0.000s 0:06.92 100.0% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members
Making a B
7.010u 0.000s 0:07.00 100.1% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members s
Making an A
6.950u 0.000s 0:06.95 100.0% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members
Making a B
6.770u 0.000s 0:06.76 100.1% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members s
Making an A
6.850u 0.000s 0:06.84 100.1% 0+0k 0+0io 216pf+0w

 
Reply With Quote
 
David White
Guest
Posts: n/a
 
      01-24-2005
<> wrote in message
news: oups.com...

[snip]

> int main(char *argv[], int argc) {
>
> /*
> int a, b, c, d, e;
> for (int i = 0; i < 1000000000; i++) {
> a = 1;
> b = 2;
> c = 3;
> d = 4;
> e = 5;
> }
>
> */
>
>
> int a[5];
> for (int i = 0; i < 1000000000; i++) {
> a[0] = 1;
> a[1] = 2;
> a[2] = 3;
> a[3] = 4;
> a[4] = 5;
> }
>
> return 0;
> }
>
> The first (commented out) version ran twice as fast. (For doubles
> instead of ints, it was a factor of 4). So the simpleton part of me
> thinks that that answers my question. The remaining part tells me that
> it is never that simple. Finally, the cynical part of me thinks that it
> all doesn't matter and other parts of the program are bound to be far
> more time consuming.


The most likely reason for the difference is that the compiler automatically
places arrays in main memory but individual objects in registers when
possible. I don't see how this test helps for your particular problem. You
need your collection to be indexable, so you don't really have the option of
using individual named variables. Although test programs like this can be
useful, you have to determine the reason for the performance difference and
consider whether the results will hold for your program. If the reason is
register use, then the test probably doesn't mean much.

DW


 
Reply With Quote
 
Joseph Turian
Guest
Posts: n/a
 
      01-24-2005
Aaron,

Performance on the toy code snippet is not informative of anything more
than performance on the toy code snippet.

> Please help me resolve my internal struggle.


10 Write the code the most clear, maintable way possible.
20 Run the code under a profiler.
30 Optimize the actual performance bottleneck (as opposed to, say,
imaginary performance bottlenecks).
40 GOTO 20

If you have any problems with 30, get back to the group. Until then,
both you and us are just shooting in the dark.

Joseph

 
Reply With Quote
 
davidrubin@warpmail.net
Guest
Posts: n/a
 
      01-24-2005
My feeling is that when you try to implement 'getNode', you will find
that an array-based implemenation is faster than a switch statement.
OTOH, you might find that templatizing 'getNode' on i, and then
specializing for values of i is faster yet. Presumably, there is some
more generic part of your application that makes 'getNodeA',
'getNodeB', etc., prohibitive. /david

 
Reply With Quote
 
davidrubin@warpmail.net
Guest
Posts: n/a
 
      01-24-2005
My feeling is that when you try to implement 'getNode', you will find
that an array-based implemenation is faster than a switch statement.
OTOH, you might find that templatizing 'getNode' on i, and then
specializing for values of i is faster yet. Presumably, there is some
more generic part of your application that makes 'getNodeA',
'getNodeB', etc., prohibitive.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
DataGrid paging - a question of efficiency Shawn ASP .Net 6 02-13-2005 08:44 PM
Superbasic efficiency question aaronfude@gmail.com C Programming 38 01-26-2005 02:13 AM
Huge SQL query and ASP.NET...Question of efficiency The Eeediot ASP .Net 3 11-16-2004 10:12 PM
Performance/efficiency question Edward A Thompson Java 11 01-27-2004 03:00 AM
dataset efficiency question Trevor Hartman ASP .Net 0 07-03-2003 08:20 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57