In article <. com>,
wrote:
!! C/C++ speed optimization bible/resources/pointers needed!
!!
!! Hi all,
!!
!! I am in the middle of programming to solve an engineering problem
!! where speed is a huge concern. The project involves lots of
!! numerical integration and then there are several loops/levels of
!! optimization on top of the function evaluation engine. As you probably
!! know, the key to a successful optimization is a fast underlying
!! objective function evaluator. The faster it is, the more promising the
!! optimization result (perhaps a global optimum). However our project
!! requires many numerical integrations which prohibits us from making it
!! super fast. At the heart of the numerical integration is a smart
!! integrator and a super-fast integrand function evaluator. Even worse,
!! our function evaluation is in the complex domain. So the key point is how
!! to arrange our C/C++ code to make it highly efficient in every aspect.
!! Could anybody give some advice/pointers on how to improve the speed of
!! C/C++ program? How to arrange code? How to make it highly efficient
!! and super fast? What options do I have if I don't have the luxury to use
!! multi-threaded, multi-core or distributed computing? But I do have a
!! P4 at least. Please recommend some good bibles and resources! Thank
!! you!
your best friend is a good optimising compiler
in my experience
codewarrior was top dog when metrowerks owned it
but now it looks like intel has the best reputation
you have to run benchmarks and profile code execution paths
but just choosing the right compiler has increased code speed 40% in tight loops for me in the past
( in fact - always verify with profiling what needs work
otherwise you will almost certainly waste time "optimising" nonessential areas )
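a real profiler is the right tool for finding hot spots
but for comparing two versions of one routine a small timing harness can be enough
here is a minimal sketch, assuming c++11 <chrono> is available
( seconds_per_call is a made-up helper name, not something from a library )

```cpp
#include <chrono>

// hypothetical helper: average wall-clock seconds per call of f
template <typename F>
double seconds_per_call(F&& f, int repeats)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i)
        f();  // the routine under test
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}
```

make sure the timed code has a visible side effect (write its result somewhere)
or the optimiser may remove the very work you are trying to measure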
avoid unnecessary copies
that may seem obvious
but that means input parameters of nonfundamental type should generally be passed by const reference
and you may need to use "expression templates" to prevent certain intermediate copies in looped evaluations
see "c++ templates" by vandevoorde and josuttis for a description of expression templates
as applied to matrix computations and related problems
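to give a rough idea of the technique, here is a stripped-down sketch
it is not the code from the book, and Vec and Sum are names i made up for illustration
operator+ builds a lightweight node holding references instead of a temporary vector
and the assignment then evaluates the whole expression in a single loop

```cpp
#include <cstddef>
#include <vector>

// node representing "l + r"; holds references, computes nothing until indexed
template <typename L, typename R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// hypothetical vector type, just enough to show the idea
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // one loop evaluates the whole expression, with no intermediate Vec
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& e)
    {
        for (std::size_t i = 0; i < size(); ++i)
            data[i] = e[i];
        return *this;
    }
};

inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) { return {a, b}; }
```

with this, d = a + b + c; runs one pass over the data
note the nodes only hold references, so they must not outlive the statement that created them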
experiment with unrolling loops
you don't want to unroll too much and have your code walking all over your cache
but too little wastes time on loop counter and branch overhead
there is usually a sweet spot
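as an illustration, here is a sum unrolled by four
the factor 4 is only an assumption for the sketch, the right factor has to come from measurement
and sum_unrolled4 is a made-up name

```cpp
#include <cstddef>

// sums n doubles, four per iteration, using independent accumulators
double sum_unrolled4(const double* x, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)  // leftover elements when n is not a multiple of 4
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

be aware this changes the order of the floating point additions
so the result can differ in the last bits from a plain loop
and a good compiler may already unroll the plain loop for you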
in general
any calculation that can be done at translation time (compile time)
saves you runtime
the compiler can usually do much of this
but look at other techniques like metaprogramming if profiling shows you are still wasting a lot of time here
cf. abrahams and gurtovoy's "c++ template metaprogramming"
or czarnecki and eisenecker's "generative programming" for similar methods
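a toy example of the idea, in the classic recursive template style
Pow is a name i made up, it is not taken from either book
the static_assert needs c++11, the rest works on older compilers too

```cpp
// compile-time integer power via recursive template instantiation
template <unsigned long Base, unsigned Exp>
struct Pow {
    static const unsigned long value = Base * Pow<Base, Exp - 1>::value;
};

// the recursion stops at exponent 0
template <unsigned long Base>
struct Pow<Base, 0> {
    static const unsigned long value = 1;
};

// checked by the compiler; no multiplication is left for run time
static_assert(Pow<2, 10>::value == 1024, "computed during translation");
```

Pow<2, 10>::value is a constant expression
so it can size arrays or feed other templates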
use the bit width of your processor
if you have (as you mention with p4) a 32-bit processor
be wary of data structures that use 8-bit chars (if they are that on your architecture)
i found once that a cryptographic routine that looped over chars for encryption
began running 10x faster (not just 4!) when i 32-bitised all the operations
because word-sized, aligned memory accesses are what the hardware handles best
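this is not the routine i mentioned, just a sketch of the same kind of change
a repeating 4-byte xor, first one char at a time, then one 32-bit word at a time
xor_bytes and xor_words are hypothetical names
and memcpy is used for the loads and stores so the sketch makes no alignment assumptions

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// byte-at-a-time: one load, xor and store per char
void xor_bytes(unsigned char* buf, std::size_t n, const unsigned char key[4])
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] ^= key[i % 4];
}

// word-at-a-time: the same result, four bytes per operation
void xor_words(unsigned char* buf, std::size_t n, const unsigned char key[4])
{
    std::uint32_t k;
    std::memcpy(&k, key, 4);          // key word in the same byte order as the buffer
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        std::uint32_t w;
        std::memcpy(&w, buf + i, 4);  // alignment-safe load
        w ^= k;
        std::memcpy(buf + i, &w, 4);  // alignment-safe store
    }
    for (; i < n; ++i)                // trailing bytes; i is a multiple of 4 here
        buf[i] ^= key[i % 4];
}
```

whether the second version is actually faster on your machine is something to measure, not assume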
also
you may benefit from branch prediction hints
compilers like gcc have hints you can add to if statements (__builtin_expect, usually wrapped in likely/unlikely macros)
that can optimise the most used branches
i have not found that as useful in my work but it is used extensively in the linux kernel
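for reference, a sketch of how such a hint looks with gcc's __builtin_expect
the LIKELY/UNLIKELY macro names and count_negative are my own choices for the example
and the fallback branch simply drops the hint on compilers without the builtin

```cpp
#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

// hypothetical example: count negative entries, assumed here to be rare
int count_negative(const double* x, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (UNLIKELY(x[i] < 0.0))  // tells the compiler this branch is seldom taken
            ++count;
    }
    return count;
}
```

the hint only affects code layout, never the result
and a wrong hint can make things slower, so profile before and after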
and ask the right groups for help
fortran users are less likely to know c++ tricks than c++ users
( probably because they still think fortran is faster despite the optimisations
that expression templates allow over it! )
i hope this helps some
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
galathaea: prankster, fablist, magician, liar