Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Re: C/C++ speed optimization bible/resources/pointers needed!

Reply
Thread Tools

Re: C/C++ speed optimization bible/resources/pointers needed!

 
 
galathaea
Guest
Posts: n/a
 
      07-26-2007
In article <. com>, wrote:

!! C/C++ speed optimization bible/resources/pointers needed!
!!
!! Hi all,
!!
!! I am in the middle of programming to solve an engineering problem
!! where the speed is huge concern. The project involving lots of
!! numerical integration and then there are several loops/levels of
!! optimization on top of the function evaluation engine. As you probably
!! know, the key to a successful optimization is a fast underlying
!! objective function evaluator. The faster it is, the more promising the
!! optimization result(perhaps global optimal). However our project
!! requires many numerical integrations which prohibits us from making it
!! super fast. At the heart of the numerical integration is a smart
!! integrator and a super-fast integrand function evaluator. Even worse,
!! our function evaluation is in complex-domain. So the kay point is how
!! to arrange our C/C++ code to make it highly efficient in every aspect.
!! Could anybody give some advice/pointers on how to improve the speed of
!! C/C++ program? How to arrange code? How to make it highly efficient
!! and super fast? What options do I have if I don't have luxury to use
!! multi-threaded, multi-core or distributed computing? But I do have a
!! P4 at least. Please recommend some good bibles and resources! Thank
!! you!

your best friend is a good optimising compiler

in my experience
codewarrior was top dog when metrowerks owned it
but now it looks like intel has the best reputation

you have to run benchmarks and profile code execution paths
but just choosing the right compiler has increased code speed 40% in tight loops for me in the past

( in fact - always verify with profiling what needs work
otherwise you will almost certainly waste time "optimising" nonessential areas )

avoid unnecessary copies

that may seem obvious
but that means all nonfundamental types should be passed as a const reference for inputs
and you may need to use "expression templates" to prevent certain intermediate copies in looped evaluations

see "c++ templates" by vandevoorde and josuttis for a description of expression templates
as applied to matrix computations and related problems

experiment with unrolling loops
you don't want to unroll too much and have your code walking all over your cache
but too little wastes unnecessary time messing with the iterator
there is usually a sweet spot

in general
any calculation that can be done during translationtime
saves you runtime

the compiler can usually do much of this
but look at other techniques like metaprogramming if profiling shows you are still wasting a lot of time here

cf. abrahams and gurtovoy's "c++ template metaprogramming"
or czarnecki and eisenecker's "generative programming" for similar methods

use the bit width of your processor
if you have (as you mention with p4) a 32-bit processor
be wary of data structures that use 8-bit chars (if they are that on your architecture)

i found once that a cryptographic routine that looped over chars for encryption
began running 10x faster (not just 4!) when i 32-bitised all the operations
because address space alignment is precious at the hardware level

also
you may benefit from branch prediction hints
compilers like gcc have hints you can add to if statements (likely/unlikely)
that can optimise the most used branches

i have not found that as useful in my work but it is used extensively in the linux kernel

and ask the right groups for help
fortran users are less likely to know c++ tricks than c++ users
( probably because they still think its faster despite the optimisations
that expression templates allow over it! )

i hope this helps some

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
galathaea: prankster, fablist, magician, liar
 
Reply With Quote
 
 
 
 
Luna Moon
Guest
Posts: n/a
 
      07-27-2007

"galathaea" <> wrote in message
news:galathaea-...
> In article <. com>,
> wrote:
>
> !! C/C++ speed optimization bible/resources/pointers needed!
> !!
> !! Hi all,
> !!
> !! I am in the middle of programming to solve an engineering problem
> !! where the speed is huge concern. The project involving lots of
> !! numerical integration and then there are several loops/levels of
> !! optimization on top of the function evaluation engine. As you probably
> !! know, the key to a successful optimization is a fast underlying
> !! objective function evaluator. The faster it is, the more promising the
> !! optimization result(perhaps global optimal). However our project
> !! requires many numerical integrations which prohibits us from making it
> !! super fast. At the heart of the numerical integration is a smart
> !! integrator and a super-fast integrand function evaluator. Even worse,
> !! our function evaluation is in complex-domain. So the kay point is how
> !! to arrange our C/C++ code to make it highly efficient in every aspect.
> !! Could anybody give some advice/pointers on how to improve the speed of
> !! C/C++ program? How to arrange code? How to make it highly efficient
> !! and super fast? What options do I have if I don't have luxury to use
> !! multi-threaded, multi-core or distributed computing? But I do have a
> !! P4 at least. Please recommend some good bibles and resources! Thank
> !! you!
>
> your best friend is a good optimising compiler
>
> in my experience
> codewarrior was top dog when metrowerks owned it
> but now it looks like intel has the best reputation
>
> you have to run benchmarks and profile code execution paths
> but just choosing the right compiler has increased code speed 40% in
> tight loops for me in the past
>
> ( in fact - always verify with profiling what needs work
> otherwise you will almost certainly waste time "optimising" nonessential
> areas )
>
> avoid unnecessary copies
>
> that may seem obvious
> but that means all nonfundamental types should be passed as a const
> reference for inputs
> and you may need to use "expression templates" to prevent certain
> intermediate copies in looped evaluations
>
> see "c++ templates" by vandevoorde and josuttis for a description of
> expression templates
> as applied to matrix computations and related problems
>
> experiment with unrolling loops
> you don't want to unroll too much and have your code walking all over
> your cache
> but too little wastes unnecessary time messing with the iterator
> there is usually a sweet spot
>
> in general
> any calculation that can be done during translationtime
> saves you runtime
>
> the compiler can usually do much of this
> but look at other techniques like metaprogramming if profiling shows you
> are still wasting a lot of time here
>
> cf. abrahams and gurtovoy's "c++ template metaprogramming"
> or czarnecki and eisenecker's "generative programming" for similar methods
>
> use the bit width of your processor
> if you have (as you mention with p4) a 32-bit processor
> be wary of data structures that use 8-bit chars (if they are that on your
> architecture)
>
> i found once that a cryptographic routine that looped over chars for
> encryption
> began running 10x faster (not just 4!) when i 32-bitised all the
> operations
> because address space alignment is precious at the hardware level
>
> also
> you may benefit from branch prediction hints
> compilers like gcc have hints you can add to if statements
> (likely/unlikely)
> that can optimise the most used branches
>
> i have not found that as useful in my work but it is used extensively in
> the linux kernel
>
> and ask the right groups for help
> fortran users are less likely to know c++ tricks than c++ users
> ( probably because they still think its faster despite the optimisations
> that expression templates allow over it! )
>



Indeed, it is very helpful! Thank you!

I am dealing with double and floating points most of the time, since I am
doing numerical programming. Do you and the other experts on these boards
have more to say and more pointers along those lines?

Thanks!


 
Reply With Quote
 
 
 
 
Victor Bazarov
Guest
Posts: n/a
 
      07-27-2007
Luna Moon wrote:
> [..]
> I am dealing with double and floating points most of the time, since
> I am doing numerical programming. Do you and the other experts on
> these boards have more to say and more pointers along those lines?


First off, please don't cross-post so wildly. Since I don't read the
'sci.*' hierarchy, I've removed those from my reply. If you think
the readers of those are going to be interested, post a summary of
your findings there later.

That aside... The secret to optimizing is not doing all those small
things that you're fiding all over the place (loop unrolling, etc.)
The secret is *not to perform unnecessary work*. That's it. The
rest is all about how you learn what is unnecessary and what you can
do to void perfoming it.

I am not sure you can find it in those books you were recommended.
I am not sure experimenting will do you good. If you don't know what
loop to unroll, are you going to unroll each separately, then all
pairs, then all triples, then... (you get the idea) to find which
combination buys you the best fraction of a percent of improvement?
Come on! Use the proper tools to find out what parts of your code
are in fact bottlenecks, and work on those. Once you use the best
algorithms and cache as much data as you can, you might want to try
all those loop unrollings or rearranging your data so you get fewer
page faults when iterating over the array or containers... But
without the tools you're shooting in the dark.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask


 
Reply With Quote
 
Hendrik van der Heijden
Guest
Posts: n/a
 
      07-27-2007
galathaea schrieb:

> you have to run benchmarks and profile code execution paths
> but just choosing the right compiler has increased code
> speed 40% in tight loops for me in the past


I've seen a 500% boost just from adding a compiler option (-mfpmath).
With this option, gcc uses SSE1/2 units for floating point calculations
(SISD, not vector code) instead of traditional x87 code.


Hendrik vdH
 
Reply With Quote
 
Luna Moon
Guest
Posts: n/a
 
      07-27-2007

"Victor Bazarov" <> wrote in message
news:. ..
> Luna Moon wrote:
>> [..]
>> I am dealing with double and floating points most of the time, since
>> I am doing numerical programming. Do you and the other experts on
>> these boards have more to say and more pointers along those lines?

>
> First off, please don't cross-post so wildly. Since I don't read the
> 'sci.*' hierarchy, I've removed those from my reply. If you think
> the readers of those are going to be interested, post a summary of
> your findings there later.
>
> That aside... The secret to optimizing is not doing all those small
> things that you're fiding all over the place (loop unrolling, etc.)
> The secret is *not to perform unnecessary work*. That's it. The
> rest is all about how you learn what is unnecessary and what you can
> do to void perfoming it.
>


I tell you the truth, at the algorithm level, I've been working on it for 1+
year. So we've already push in all what we can do. Now I just want to make
sure that at the implementation level, I won't get hurt by not paying
attention to doing "small" things. From past experience, not paying
attention to "small" things can easily make the implementation 1000% slower,
even with a super smart algorithm. So please focus on giving more detail
suggestions on implementation. Thank you!

> I am not sure you can find it in those books you were recommended.
> I am not sure experimenting will do you good. If you don't know what
> loop to unroll, are you going to unroll each separately, then all
> pairs, then all triples, then... (you get the idea) to find which
> combination buys you the best fraction of a percent of improvement?


I do need books and pointers, because I don't even know what is loop
unrolling.

> Come on! Use the proper tools to find out what parts of your code
> are in fact bottlenecks, and work on those.


How? The details or pointers are what I really need.

> Once you use the best
> algorithms and cache as much data as you can, you might want to try
> all those loop unrollings or rearranging your data so you get fewer
> page faults when iterating over the array or containers... But
> without the tools you're shooting in the dark.
>


Which tools? Please elaborate... Thanks a lot!


 
Reply With Quote
 
Luna Moon
Guest
Posts: n/a
 
      07-27-2007

"Hendrik van der Heijden" <> wrote in message
news:46a99140$0$31634$...
> galathaea schrieb:
>
>> you have to run benchmarks and profile code execution paths
>> but just choosing the right compiler has increased code
> > speed 40% in tight loops for me in the past

>
> I've seen a 500% boost just from adding a compiler option (-mfpmath).
> With this option, gcc uses SSE1/2 units for floating point calculations
> (SISD, not vector code) instead of traditional x87 code.
>
>
> Hendrik vdH


Great! This is exactly what I want... Thanks so much I will try it out!
Which complier?

We have done everything at the algorithm level, now we just want to make
sure our data structure, caches, and code organization don't do stupid
things to slow down and we try various tricks to squeeze up speed further.

Any more pointers? Hopefully there are some books/articles/resource
somewhere on this planet talking about highly efficient C++ code and
complier, etc.



 
Reply With Quote
 
Michael DOUBEZ
Guest
Posts: n/a
 
      07-27-2007
Luna Moon a écrit :
> "Victor Bazarov" <> wrote in message
> news:. ..
>> Luna Moon wrote:
>>> [..]
>>> I am dealing with double and floating points most of the time, since
>>> I am doing numerical programming. Do you and the other experts on
>>> these boards have more to say and more pointers along those lines?

>> First off, please don't cross-post so wildly. Since I don't read the
>> 'sci.*' hierarchy, I've removed those from my reply. If you think
>> the readers of those are going to be interested, post a summary of
>> your findings there later.
>>
>> That aside... The secret to optimizing is not doing all those small
>> things that you're fiding all over the place (loop unrolling, etc.)
>> The secret is *not to perform unnecessary work*. That's it. The
>> rest is all about how you learn what is unnecessary and what you can
>> do to void perfoming it.
>>

>
> I tell you the truth, at the algorithm level, I've been working on it for 1+
> year. So we've already push in all what we can do. Now I just want to make
> sure that at the implementation level, I won't get hurt by not paying
> attention to doing "small" things. From past experience, not paying
> attention to "small" things can easily make the implementation 1000% slower,
> even with a super smart algorithm. So please focus on giving more detail
> suggestions on implementation. Thank you!


Since we don't known your implementation/architecture/envirronement, it
would be hard to give you implementation suggestion except of the broad one:
- tune finely your compiler (use options that fit your needs)
- perhaps try different compilers
- make benchmark to see the impact
- Mark out parts that are bottleneck and find a solution
- avoid too frequent heap allocation
....

In all cases, you'd better validate the correctness and the robustness
of your implementation before any optimisation (good test coverage helps
a lot). Nitty gritty diference between float,double or other floating
point library you might use is irrelevant if you can't evaluate the
performances and the correctness.

>> I am not sure you can find it in those books you were recommended.
>> I am not sure experimenting will do you good. If you don't know what
>> loop to unroll, are you going to unroll each separately, then all
>> pairs, then all triples, then... (you get the idea) to find which
>> combination buys you the best fraction of a percent of improvement?

>
> I do need books and pointers, because I don't even know what is loop
> unrolling.


Loop unrolling is a solution for a problem you don't have. Find the
problem first. Don't worry, you'll find plenty

>
>> Come on! Use the proper tools to find out what parts of your code
>> are in fact bottlenecks, and work on those.

>
> How? The details or pointers are what I really need.
>
>> Once you use the best
>> algorithms and cache as much data as you can, you might want to try
>> all those loop unrollings or rearranging your data so you get fewer
>> page faults when iterating over the array or containers... But
>> without the tools you're shooting in the dark.
>>

>
> Which tools? Please elaborate... Thanks a lot!


You have been working on it for 1+ year and you've never ever performed
any benchmark to see if a lesser implementation would fit the need or
how perform conccurent algorithm ? Or to measure the improovement you
target compared to existing practice ? It must be a terrible algorithm.

The tools depend on your plateform. gprof under linux is widespread.
Google for "performance analysis tool".

Michael
 
Reply With Quote
 
Victor Bazarov
Guest
Posts: n/a
 
      07-27-2007
Luna Moon wrote:
> "Hendrik van der Heijden" <> wrote in message
> news:46a99140$0$31634$...
>> galathaea schrieb:
>>
>>> you have to run benchmarks and profile code execution paths
>>> but just choosing the right compiler has increased code
>>> speed 40% in tight loops for me in the past

>>
>> I've seen a 500% boost just from adding a compiler option (-mfpmath).
>> With this option, gcc uses SSE1/2 units for floating point
>> calculations (SISD, not vector code) instead of traditional x87 code.
>>
>>
>> Hendrik vdH

>
> Great! This is exactly what I want... Thanks so much I will try it
> out! Which complier?


'gcc', didn't you read the sentence?

>
> We have done everything at the algorithm level,


That's what I'd heard many times when folks asked me to find if it
was possible to improve anything. Then I'd find they have been using
'std::set<SomeType*>' to collect things "already touched" in an array
when processing it out of sequence. Some things one can find in some
codebases are unbelievable.

> now we just want to
> make sure our data structure, caches, and code organization don't do
> stupid things to slow down and we try various tricks to squeeze up
> speed further.
> Any more pointers? Hopefully there are some books/articles/resource
> somewhere on this planet talking about highly efficient C++ code and
> complier, etc.


Books: "Efficient C++" by Bulka & Mayhew, "High Performance Computing"
by Wadleigh & Crawford, "Effective" series by Meyers. I am not going
to suggest Knuth's "TACP", it may be too late. But when you have time
after stopping working on your current stuff, do give it a read, at
least leaf through the Contents section.

Tools: Intel VTune, AutomatedQA AQtime.

Web: www.google.com

Don't expect to find the instruction manual on how to speed up *your*
code. Do expect to find recommendations and speculations based on
a whole lot of assumptions. Nobody knows your code better than you,
and no thought process can replace measuring. So, do the measuring
first, then do the thinking.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask


 
Reply With Quote
 
Sherm Pendley
Guest
Posts: n/a
 
      07-27-2007
"Luna Moon" <> writes:

> "Victor Bazarov" <> wrote in message
> news:. ..
>
>> Come on! Use the proper tools to find out what parts of your code
>> are in fact bottlenecks, and work on those.

>
> How? The details or pointers are what I really need.


Use a profiler to measure your code's performance, then follow the 90/10
rule. A profiler will tell you what relative amount of time is being spent
in each part of your code. Most often, you'll find that 90% of the total
execution time is being spent in 10% of the code, which means that focusing
your optimization efforts on that 10% will give the best results.

You'll need to consult your tool vendor's documentation to see how to use
their profiler - for example, "man gprof" if you're using GCC.

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net
 
Reply With Quote
 
Michael Press
Guest
Posts: n/a
 
      07-28-2007
In article <f8clov$qh$>,
"Luna Moon" <> wrote:

> "Hendrik van der Heijden" <> wrote in message
> news:46a99140$0$31634$...
> > galathaea schrieb:
> >
> >> you have to run benchmarks and profile code execution paths
> >> but just choosing the right compiler has increased code
> > > speed 40% in tight loops for me in the past

> >
> > I've seen a 500% boost just from adding a compiler option (-mfpmath).
> > With this option, gcc uses SSE1/2 units for floating point calculations
> > (SISD, not vector code) instead of traditional x87 code.
> >
> >
> > Hendrik vdH

>
> Great! This is exactly what I want... Thanks so much I will try it out!
> Which complier?
>
> We have done everything at the algorithm level, now we just want to make
> sure our data structure, caches, and code organization don't do stupid
> things to slow down and we try various tricks to squeeze up speed further.
>
> Any more pointers? Hopefully there are some books/articles/resource
> somewhere on this planet talking about highly efficient C++ code and
> complier, etc.


Have you profiled your current implementation?

--
Michael Press
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Help with C++ Speed Optimization required BravoCharlie Software 0 05-09-2011 03:00 PM
Zero Optimization and Sign Optimization??? Ravikiran C Programming 22 11-24-2008 03:19 AM
Re: C/C++ speed optimization bible/resources/pointers needed! Luna Moon C++ 1 07-27-2007 02:50 PM
problem with microsoft speed optimization mjbackues at yahoo C++ 21 01-02-2006 06:56 PM
speed speed speed a.metselaar Computer Support 14 12-30-2003 03:34 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57