New wikibook about efficient C++

 
 
Carlo Milanesi
 
      06-08-2008
Hello,
I just completed writing an online book about developing efficient
software using the C++ language.
You can find it here: http://en.wikibooks.org/wiki/Optimizing_C%2B%2B
It is a wiki, so anyone can edit it or just add critical
comments to the pages.
Everyone is invited to improve it. But before applying major changes,
please read these guidelines:
http://en.wikibooks.org/wiki/Optimiz...es_for_editors

--
Carlo Milanesi
http://digilander.libero.it/carlmila
 
Juha Nieminen
 
      06-09-2008
Carlo Milanesi wrote:
> I just completed writing an online book about developing efficient
> software using the C++ language.


The text contains mostly irrelevant and sometimes erroneous
suggestions. Most of the suggested micro-optimizations will usually not
make your program any faster at all and thus are completely irrelevant.
Some examples:

"Don't check that a pointer is non-null before calling delete on it."

While it's true that you don't have to check for null before deleting,
speedwise that's mostly irrelevant. In most systems with most compilers
the 'delete' itself is a rather heavy operation, and the extra clock
cycles added by the conditional will not make the program noticeably
slower. It might become more relevant if you have a super-fast
specialized memory allocator where a 'delete' takes next to no
time, and huge numbers of objects are deleted in a short time.
However, in normal situations it's irrelevant.
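
To make the point concrete, here is a minimal sketch (not taken from the wikibook) showing that deleting with and without the null test has the same observable behaviour. `Widget`, `release`, `release_checked` and `self_check` are hypothetical names invented for this illustration.

```cpp
#include <cassert>

// Hypothetical type that counts how many instances are alive.
struct Widget {
    static int live;
    Widget()  { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Releases a Widget without testing the pointer first.
// Applying `delete` to a null pointer is defined to do nothing.
void release(Widget* p) {
    delete p;
}

// Same behaviour, with the redundant test discussed above.
void release_checked(Widget* p) {
    if (p != nullptr) {
        delete p;
    }
}

// Exercises both helpers, including the null case.
void self_check() {
    release(nullptr);          // no effect
    release_checked(nullptr);  // no effect either
    release(new Widget);       // constructed, then destroyed
    assert(Widget::live == 0);
}
```

Whether the extra branch costs anything measurable depends on the allocator and the compiler, so this only demonstrates equivalence, not speed.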

"Declare const every member function that does not change the state of
the object on which it is applied."

Mostly irrelevant speedwise.

"Instead of writing a for loop over an STL container, use an STL
algorithm with a function-object or a lambda expression"

Why is that any faster than the for loop? (In fact, it might even be
slower if, for whatever reason, the compiler is unable to inline the
lambda function.)

"Though, every time a function containing substantial code is inlined,
the machine code is duplicated, and therefore the total size of the
program is increased, causing a general slowing down."

Mostly not true. There is no necessary correlation between code size
and speed. In fact, sometimes a longer piece of code may perform faster
than a shorter one (for example loop unrolling performed by the compiler
sometimes produces faster code, even in modern processors).

"Among non-tiny functions, only the performance critical ones are to be
declared inline at the optimization stage."

The main reason to decide whether to declare a larger function
'inline' or not is not about speed. The compiler has heuristics for this
and will not inline the function if it estimates that it would be
counter-productive. The main reason to use or avoid 'inline' has more to
do with the quality of the source code.

"In addition, every virtual member function occupies some more space"

Irrelevant, unless you are developing for an embedded system with a
*very* small amount of memory.

"Do not null a pointer after having called the delete operator on it, if
you are sure that this pointer will no more be used."

Irrelevant. The 'delete' itself will usually be so slow that the
additional assignment won't change anything.

"Garbage collection, that is automatic reclamation of unreferenced
memory, provides the ease to forget about memory deallocation, and
prevents memory leaks. Such feature is not provided by the standard
library, but is provided by non-standard libraries. Though, such memory
management technique causes a performance worse than explicit
deallocation (that is when the delete operator is explicitly called)."

This is simply not true. In fact, GC can be made faster than explicit
deallocation, at least compared to the default memory allocator used by
most C++ compilers.

"To perform input/output operations, instead of calling the standard
library functions, call directly the operating system primitives."

Dubious advice. The general (and portable) advice for fast I/O is to
use fread() and fwrite() for large blocks of data (the C++ equivalents
may actually be equally fast, if called rarely). If very small amounts
of data (such as characters) need to be read or written individually,
use the corresponding C I/O functions.

In places where I/O speed is irrelevant, this advice is
counter-productive.
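
A minimal sketch of the block-wise reading suggested above, using only standard C I/O. The function name `count_bytes_blockwise` and the 64 KiB default block size are assumptions chosen for illustration; a suitable block size has to be found by measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Counts the bytes in a file by reading it in large blocks with fread().
// Returns -1 if the file cannot be opened.
long count_bytes_blockwise(const char* path, std::size_t block_size = 1 << 16) {
    assert(block_size > 0);
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return -1;
    }
    std::vector<unsigned char> buffer(block_size);
    long total = 0;
    std::size_t n;
    // One library call per block instead of one per character.
    while ((n = std::fread(buffer.data(), 1, buffer.size(), f)) > 0) {
        total += static_cast<long>(n);
    }
    std::fclose(f);
    return total;
}
```

A real program would process `buffer` inside the loop instead of just counting; the point is only that each call moves a large block.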

"Look-up table"

This was relevant in the early 90's. Nowadays it's less evident. With
modern CPUs sometimes using a lookup table instead of a seemingly "slow"
function might actually be slower, depending on a ton of factors.

"Instead of doing case-insensitive comparisons between a strings,
transform all the letters to uppercase (or to lowercase), and then do
case-sensitive comparisons."

Yeah, because converting the string does not take time?

 
Carlo Milanesi
 
      06-09-2008
Juha Nieminen wrote:
> Carlo Milanesi wrote:
>> I just completed writing an online book about developing efficient
>> software using the C++ language.

>
> The text contains mostly irrelevant and sometimes erroneous
> suggestions. Most of the suggested micro-optimizations will usually not
> make your program any faster at all and thus are completely irrelevant.
> Some examples:


You are being too harsh! The book contains 98 pieces of advice, and your
critique covers only 11 of them. Are you sure that the others are mostly
irrelevant to a non-expert programmer?

> "Don't check that a pointer is non-null before calling delete on it."
>
> While it's true that you don't have to check for null before deleting,
> speedwise that's mostly irrelevant. In most systems with most compilers
> the 'delete' itself is a rather heavy operation, and the extra clock
> cycles added by the conditional will not make the program relevantly
> slower. It might become more relevant if you have a super-fast
> specialized memory allocator where a 'delete' takes next to nothing of
> time, and humongous amounts of objects are deleted in a short time.
> However, in normal situations it's irrelevant.


I agree that in normal situations it's almost irrelevant, but it is
nevertheless a useless operation that I have seen done by some programmers.

> "Declare const every member function that does not change the state of
> the object on which it is applied."
>
> Mostly irrelevant speedwise.


Actually, I never found this advice useful, but I was told that some
compilers could exploit constness to optimize the code.
I am going to remove this advice.

> "Instead of writing a for loop over an STL container, use an STL
> algorithm with a function-object or a lambda expression"
>
> Why is that any faster than the for loop? (In fact, it might even be
> slower if, for whatever reason, the compiler is unable to inline the
> lambda function.)


In the book "C++ Coding Standards" it is written:
"algorithms are also often more efficient than naked loops".
It is explained that they avoid minor inefficiencies introduced by
non-expert programmers, that they exploit the inside knowledge of the
standard containers, and some of them implement sophisticated algorithms
that the average programmer does not know or does not have time to
implement.
Do you think it is better to remove this advice altogether, or to
change it?

> "Though, every time a function containing substantial code is inlined,
> the machine code is duplicated, and therefore the total size of the
> program is increased, causing a general slowing down."
>
> Mostly not true. There is no necessary correlation between code size
> and speed. In fact, sometimes a longer piece of code may perform faster
> than a shorter one (for example loop unrolling performed by the compiler
> sometimes produces faster code, even in modern processors).


The correlation comes from the code cache size. If you inline almost all
the functions, you get code bloat, i.e. the code no longer fits in the
code caches. Even compilers do not completely unroll a loop of 1000
iterations.
What guideline do you suggest for the initial coding (optimizations are
considered later)?

> "Among non-tiny functions, only the performance critical ones are to be
> declared inline at the optimization stage."
>
> The main reason to decide whether to declare a larger function
> 'inline' or not is not about speed. The compiler has heuristics for this
> and will not inline the function if it estimates that it would be
> counter-productive. The main reason to use or avoid 'inline' has more to
> do with the quality of the source code.


Before that, the book says that if the compiler can decide which
functions to inline, there is no need to declare them "inline".
This guideline applies to compilers that need explicit inlining.

> "In addition, every virtual member function occupies some more space"
>
> Irrelevant, unless you are developing for an embedded system with a
> *very* small amount of memory.


OK, I am going to remove this.

> "Do not null a pointer after having called the delete operator on it, if
> you are sure that this pointer will no more be used."
>
> Irrelevant. The 'delete' itself will usually be so slow that the
> additional assignment won't change the anything.


Analogous to the check-before-delete.

> "Garbage collection, that is automatic reclamation of unreferenced
> memory, provides the ease to forget about memory deallocation, and
> prevents memory leaks. Such feature is not provided by the standard
> library, but is provided by non-standard libraries. Though, such memory
> management technique causes a performance worse than explicit
> deallocation (that is when the delete operator is explicitly called)."
>
> This is simply not true. In fact, GC can be made faster than explicit
> deallocation, at least compared to the default memory allocator used by
> most C++ compilers.


Then why isn't everyone using it, and why doesn't every guru recommend it?
I have never measured GC performance. Are there any research papers
around about its performance in C++ projects?

> "To perform input/output operations, instead of calling the standard
> library functions, call directly the operating system primitives."
>
> Dubious advice. The general (and portable) advice for fast I/O is to
> use fread() and fwrite() for large blocks of data (the C++ equivalents
> may actually be equally fast, if called rarely). If very small amounts
> of data (such as characters) need to be read or written individually,
> use the correspondent C I/O functions.
>
> In places where I/O speed is irrelevant, this advice is
> counter-productive.


This is a "bottleneck" optimization, like every one in chapters 4 and 5.
Anyway, I will add a guideline about big buffers and another one about
keeping files open.

> "Look-up table"
>
> This was relevant in the early 90's. Nowadays it's less evident. With
> modern CPUs sometimes using a lookup table instead of a seemingly "slow"
> function might actually be slower, depending on a ton of factors.


This is a "possible" optimization, like every one in chapters 4 and 5.
If the cost of the computation is bigger than the cost of retrieving the
pre-computed result, then the look-up table is faster.
Some functions take a long time to compute, even with modern CPUs.
Even in the example of "sqrt", which is quite fast, the look-up table
routine, if inlined, is more than twice as fast.
With a function like pow(x, 1./3), it is 13 times as fast on my computer.
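
As an illustration of the trade-off described above, here is a minimal sketch of a look-up table for sqrt over a small integer domain. The domain size `kTableSize` and the function names are assumptions made for this example; it is not the routine measured in this post, and whether it beats calling std::sqrt directly has to be measured on the target machine.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical domain: integer arguments 0..1023.
const std::size_t kTableSize = 1024;

// Builds the table once; each entry holds sqrt(i) computed the ordinary way.
std::array<double, kTableSize> build_sqrt_table() {
    std::array<double, kTableSize> table{};
    for (std::size_t i = 0; i < kTableSize; ++i) {
        table[i] = std::sqrt(static_cast<double>(i));
    }
    return table;
}

// Returns sqrt(n) for 0 <= n < kTableSize by indexing the precomputed table.
double table_sqrt(std::size_t n) {
    static const std::array<double, kTableSize> table = build_sqrt_table();
    assert(n < kTableSize);
    return table[n];
}
```

The table occupies 8 KiB here, so it competes with other data for cache space, which is one of the factors Juha alludes to.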

> "Instead of doing case-insensitive comparisons between a strings,
> transform all the letters to uppercase (or to lowercase), and then do
> case-sensitive comparisons."
>
> Yeah, because converting the string does not take time?


I meant the following.
- When loading a collection, convert the case of all the strings.
- When searching the collection for a string, convert the case of that
string before searching.
This makes the loading slower, but for a large enough collection, it
makes the search faster.
Many databases are actually case-insensitive. Why?
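
The two steps above can be sketched as follows. This is only an illustration: `to_lower` and `CaseInsensitiveSet` are hypothetical names, the conversion handles single-byte characters in the current C locale only, and std::set is just one possible container.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Returns a lowercase copy of `s` (single-byte characters, current C locale).
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical collection that normalizes case once, at load time.
class CaseInsensitiveSet {
public:
    explicit CaseInsensitiveSet(const std::vector<std::string>& words) {
        for (const std::string& w : words) {
            keys_.insert(to_lower(w));  // conversion cost paid once per stored string
        }
    }

    // Converts only the query, then performs an ordinary case-sensitive lookup.
    bool contains(const std::string& query) const {
        return keys_.count(to_lower(query)) != 0;
    }

private:
    std::set<std::string> keys_;
};
```

Each search then costs one conversion of the query plus case-sensitive comparisons, instead of a case-insensitive comparison against every candidate.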

Thank you for your comments.
Do you have any guidelines to suggest for inclusion in the book?

--
Carlo Milanesi
http://digilander.libero.it/carlmila
 
Kai-Uwe Bux
 
      06-09-2008
Carlo Milanesi wrote:

> Juha Nieminen wrote:
>> Carlo Milanesi wrote:

[snip]
>> "Instead of writing a for loop over an STL container, use an STL
>> algorithm with a function-object or a lambda expression"
>>
>> Why is that any faster than the for loop? (In fact, it might even be
>> slower if, for whatever reason, the compiler is unable to inline the
>> lambda function.)

>
> In the book "C++ Coding Standards" it is written:
> "algorithms are also often more efficient than naked loops".
> It is explained that they avoid minor inefficiencies introduced by
> non-expert programmers, that they exploit the inside knowledge of the
> standard containers, and some of them implement sophisticated algorithms
> that the average programmer does not know or does not have time to
> implement.
> Do you think it is better to remove altogether this advice, or it is
> better to change it?


In principle, algorithms could make use of special knowledge about
implementation details of containers such as deque and create faster code
that way. Also, such specializations could be provided for stream and
streambuf iterators. I think Dietmar Kuehl had some code in that direction.
However, it is far from clear that STL implementations in widespread use
have such optimizations built in.

As for the wiki, I would leave the item but add a word of caution. After
all, if you are stuck with a compiler that does a poor job at optimizing
away the abstraction overhead of functors, it could lead to worse
performance; but if you have a library that uses special trickery inside,
it could boost performance. It's one of the many cases where measurement is
paramount and awareness of issues is what is required of the programmer.
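
For reference, a minimal sketch of the two styles being compared, so that they can be timed against each other with a given compiler and library. The function names are hypothetical, and the lambda requires a compiler that supports that language feature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-written loop: counts the elements greater than `limit`.
std::size_t count_above_loop(const std::vector<int>& v, int limit) {
    std::size_t count = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
        if (*it > limit) {
            ++count;
        }
    }
    return count;
}

// Same computation through a standard algorithm and a lambda.
std::size_t count_above_algorithm(const std::vector<int>& v, int limit) {
    return static_cast<std::size_t>(
        std::count_if(v.begin(), v.end(), [limit](int x) { return x > limit; }));
}

// Both versions must agree; which one is faster has to be measured.
void check_agreement(const std::vector<int>& v, int limit) {
    assert(count_above_loop(v, limit) == count_above_algorithm(v, limit));
}
```

Nothing here says which version wins; that depends on how well the compiler inlines the lambda and on what the library does internally.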


Best

Kai-Uwe Bux
 
Noah Roberts
 
      06-09-2008
Juha Nieminen wrote:

> "Instead of writing a for loop over an STL container, use an STL
> algorithm with a function-object or a lambda expression"
>
> Why is that any faster than the for loop? (In fact, it might even be
> slower if, for whatever reason, the compiler is unable to inline the
> lambda function.)


Probably based on the fact that the algorithm can take advantage of
implementation specific knowledge whereas your for loop can't, or at
least shouldn't.

I don't know that any implementation does this though.
 
Bo Persson
 
      06-09-2008
Carlo Milanesi wrote:
> Juha Nieminen wrote:
>> Carlo Milanesi wrote:
>>> I just completed writing an online book about developing efficient
>>> software using the C++ language.

>
>> "Don't check that a pointer is non-null before calling delete on
>> it." While it's true that you don't have to check for null before
>> deleting, speedwise that's mostly irrelevant. In most systems with
>> most compilers the 'delete' itself is a rather heavy operation,
>> and the extra clock cycles added by the conditional will not make
>> the program relevantly slower. It might become more relevant if
>> you have a super-fast specialized memory allocator where a
>> 'delete' takes next to nothing of time, and humongous amounts of
>> objects are deleted in a short time. However, in normal situations
>> it's irrelevant.

>
> I agree that in normal situations it's almost irrelevant, but it is
> nevertheless a useless operation that I have seen done by some
> programmers.


Some of us, and some of the code, have been around since before the
standard was set. Portable code once had to have the checks.

Nowadays, compilers are smarter and at least one optimizes for this
case by removing its own check if it is not needed:

if (_MyWords != nullptr)
0041E0CC mov esi,dword ptr [ebx+14h]
0041E0CF test esi,esi
0041E0D1 je xxx::~xxx+9Ch (41E0ECh)
delete _MyWords;
0041E0D3 mov eax,dword ptr [esi+4]
0041E0D6 test eax,eax
0041E0D8 je xxx::~xxx+93h (41E0E3h)
0041E0DA push eax
0041E0DB call operator delete (4201CEh)
0041E0E0 add esp,4
0041E0E3 push esi
0041E0E4 call operator delete (4201CEh)
0041E0E9 add esp,4


// if (_MyWords != nullptr)
delete _MyWords;
0041E0CC mov esi,dword ptr [ebx+14h]
0041E0CF test esi,esi
0041E0D1 je xxx::~xxx+9Ch (41E0ECh)
0041E0D3 mov eax,dword ptr [esi+4]
0041E0D6 test eax,eax
0041E0D8 je xxx::~xxx+93h (41E0E3h)
0041E0DA push eax
0041E0DB call operator delete (4201CEh)
0041E0E0 add esp,4
0041E0E3 push esi
0041E0E4 call operator delete (4201CEh)
0041E0E9 add esp,4


>
>> "In addition, every virtual member function occupies some more
>> space" Irrelevant, unless you are developing for an embedded system
>> with a *very* small amount of memory.

>
> OK, I am going to remove this.


The rule here is of course that if you need a function to be virtual,
you just have to make it virtual. If you don't, you don't.


>
>> "Do not null a pointer after having called the delete operator on
>> it, if you are sure that this pointer will no more be used."
>>
>> Irrelevant. The 'delete' itself will usually be so slow that the
>> additional assignment won't change the anything.

>
> Analogous to the check-before-delete.


Also for the compiler. If the nulled pointer isn't actually used, the
compiler is likely to optimize away the assignment anyway.

A better rule is to use delete just before the pointer goes out of
scope. Then there is no problem.

Even better is to use a smart pointer or a container that manages
everything for you.
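
A minimal sketch of the two rules above. It uses std::unique_ptr, which was standardized after this thread was written, so treat it as one possible modern spelling of "a smart pointer"; `Tracked` and the function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical type that counts how many instances are alive.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Manual management: delete right before the pointer goes out of scope,
// so there is no later use to guard against and nothing to null.
void manual_lifetime() {
    Tracked* p = new Tracked;
    assert(Tracked::live == 1);
    delete p;
}

// Managed: the smart pointer and the container release everything themselves.
void managed_lifetime() {
    std::unique_ptr<Tracked> single(new Tracked);
    std::vector<std::unique_ptr<Tracked>> many;
    many.push_back(std::unique_ptr<Tracked>(new Tracked));
    assert(Tracked::live == 2);
}  // both objects are destroyed here, without any explicit delete
```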


>
>> "Garbage collection, that is automatic reclamation of unreferenced
>> memory, provides the ease to forget about memory deallocation, and
>> prevents memory leaks. Such feature is not provided by the standard
>> library, but is provided by non-standard libraries. Though, such
>> memory management technique causes a performance worse than
>> explicit deallocation (that is when the delete operator is
>> explicitly called)." This is simply not true. In fact, GC can be
>> made faster than
>> explicit deallocation, at least compared to the default memory
>> allocator used by most C++ compilers.

>
> Then why not everyone is using it, and not every guru is
> recommending it? I have never measured GC performance. Are there
> any research papers aroun about its performance in C++ projects?


As usual, it depends.

Some "gurus" actually do use GC when there is an advantage. This one,
for example:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/



Bo Persson


 
Bo Persson
 
      06-09-2008
Kai-Uwe Bux wrote:
> Carlo Milanesi wrote:
>
>> Juha Nieminen wrote:
>>> Carlo Milanesi wrote:

> [snip]
>>> "Instead of writing a for loop over an STL container, use an STL
>>> algorithm with a function-object or a lambda expression"
>>>
>>> Why is that any faster than the for loop? (In fact, it might
>>> even be slower if, for whatever reason, the compiler is unable to
>>> inline the lambda function.)

>>
>> In the book "C++ Coding Standards" it is written:
>> "algorithms are also often more efficient than naked loops".
>> It is explained that they avoid minor inefficiencies introduced by
>> non-expert programmers, that they exploit the inside knowledge of
>> the standard containers, and some of them implement sophisticated
>> algorithms that the average programmer does not know or does not
>> have time to implement.
>> Do you think it is better to remove altogether this advice, or it
>> is better to change it?

>
> In principle, algorithms could make use of special knowledge about
> implementation details of containers such a deque and create faster
> code that way. Also, such specializations could be provided for
> stream and streambuf iterators. I think Dietmar Kuehl had some code
> in that direction. However, it is far from clear that STL
> implementations in widespread use have such optimizations built in.


It seems like the compilers are now smart enough to do most of this
work on their own.

Benchmarking a vector against a deque shows little difference in
traversal speed. Some of this is a cache effect win for the contiguous
vector, leaving very little to gain for an improved deque iterator.
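
A minimal sketch of how such a traversal comparison might be timed. This is not the benchmark referred to above; `Timing` and `time_traversal` are hypothetical names, and a serious measurement would repeat the runs and use large containers.

```cpp
#include <chrono>
#include <deque>
#include <numeric>
#include <vector>

// Result of one timed traversal.
struct Timing {
    long long sum;        // returned so the traversal cannot be discarded
    double milliseconds;  // wall-clock time of the traversal
};

// Sums all elements of a container and reports how long that took.
template <typename Container>
Timing time_traversal(const Container& c) {
    const auto start = std::chrono::steady_clock::now();
    const long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return Timing{sum, elapsed.count()};
}
```

Calling `time_traversal` on a `std::vector<int>` and on a `std::deque<int>` holding the same values gives two timings to compare for a given compiler and optimization level.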

>
> As for the wiki, I would leave the item but add a word of caution.
> After all, if you are stuck with a compiler that does a poor job at
> optimizing away the abstraction overhead of functors, it could lead
> to worse performance; but if you have a library that uses special
> trickery inside, it could boost performance. It's one of the many
> cases where measurement is paramount and awareness of issues is
> what is required of the programmer.
>


Yes, optimizing for weak compilers is very tricky. Getting another
compiler might be a better idea, but perhaps not possible.

Perhaps this kind of advice should be tagged with compiler version?



Bo Persson


 
peter koch
 
      06-09-2008
On 9 Jun., 19:08, Carlo Milanesi <carlo.milanesi.no.s...@libero.it>
wrote:
> Juha Nieminen wrote:
>
> > Carlo Milanesi wrote:
> >> I just completed writing an online book about developing efficient
> >> software using the C++ language.

>
> > The text contains mostly irrelevant and sometimes erroneous
> > suggestions. Most of the suggested micro-optimizations will usually not
> > make your program any faster at all and thus are completely irrelevant.
> > Some examples:

>
> You look too harsh! The book contains 98 advices, your critiques regard
> only 11 of them. Are you sure that there others are mostly completely
> irrelevant to a non-expert programmer?


I do not find your recommendations too bad. While I mostly
agree with Juha, much of the advice you give is good even if it is not
related to (program) performance. The first advice of not checking a
pointer for null before deleting it, for example, is very good advice
if you wish to increase programmer performance.

The worst advice I saw (and I only read Juha's post) was to avoid the
standard library for I/O. And yet, today it is unfortunately quite
relevant if your program's bottleneck is formatted I/O.

/Peter
 
coal@mailvault.com
 
      06-09-2008
On Jun 9, 11:08 am, Carlo Milanesi <carlo.milanesi.no.s...@libero.it>
wrote:
> Juha Nieminen wrote:
>
> > Carlo Milanesi wrote:
> >> I just completed writing an online book about developing efficient
> >> software using the C++ language.

>
> > The text contains mostly irrelevant and sometimes erroneous
> > suggestions. Most of the suggested micro-optimizations will usually not
> > make your program any faster at all and thus are completely irrelevant.
> > Some examples:

>
> You look too harsh! The book contains 98 advices, your critiques regard
> only 11 of them. Are you sure that there others are mostly completely
> irrelevant to a non-expert programmer?
>
> > "Don't check that a pointer is non-null before calling delete on it."

>
> > While it's true that you don't have to check for null before deleting,
> > speedwise that's mostly irrelevant. In most systems with most compilers
> > the 'delete' itself is a rather heavy operation, and the extra clock
> > cycles added by the conditional will not make the program relevantly
> > slower. It might become more relevant if you have a super-fast
> > specialized memory allocator where a 'delete' takes next to nothing of
> > time, and humongous amounts of objects are deleted in a short time.
> > However, in normal situations it's irrelevant.

>
> I agree that in normal situations it's almost irrelevant, but it is
> nevertheless a useless operation that I have seen done by some programmers.
>
> > "Declare const every member function that does not change the state of
> > the object on which it is applied."

>
> > Mostly irrelevant speedwise.

>
> Actually, I never found useful this advice, but I was told that some
> compilers could exploit the constness to optimize the code.
> I am going to remove this advice.
>
> > "Instead of writing a for loop over an STL container, use an STL
> > algorithm with a function-object or a lambda expression"

>
> > Why is that any faster than the for loop? (In fact, it might even be
> > slower if, for whatever reason, the compiler is unable to inline the
> > lambda function.)

>
> In the book "C++ Coding Standards" it is written:
> "algorithms are also often more efficient than naked loops".
> It is explained that they avoid minor inefficiencies introduced by
> non-expert programmers, that they exploit the inside knowledge of the
> standard containers, and some of them implement sophisticated algorithms
> that the average programmer does not know or does not have time to
> implement.



I favor the "naked loops" in an automated context. If there are
inefficiencies in this code
http://webEbenezer.net/comp/Msgs.hh

I'm interested in knowing what they are.

Brian Wood
Ebenezer Enterprises
www.webEbenezer.net

"A wise man is strong; yea, a man of knowledge increaseth strength."
Proverbs 24:5
 
James Kanze
 
      06-10-2008
On Jun 9, 7:08 pm, Carlo Milanesi <carlo.milanesi.no.s...@libero.it>
wrote:
> Juha Nieminen wrote:


[...]
> > "Garbage collection, that is automatic reclamation of
> > unreferenced memory, provides the ease to forget about
> > memory deallocation, and prevents memory leaks. Such feature
> > is not provided by the standard library, but is provided by
> > non-standard libraries. Though, such memory management
> > technique causes a performance worse than explicit
> > deallocation (that is when the delete operator is explicitly
> > called)."


> > This is simply not true. In fact, GC can be made faster than
> > explicit deallocation, at least compared to the default
> > memory allocator used by most C++ compilers.


> Then why not everyone is using it, and not every guru is
> recommending it?


Many do (Stroustrup, for example). And the gurus who
recommend against it don't do so on performance grounds. The
one thing I think all gurus agree on is that performancewise, it
all depends. There are programs where garbage collection will
speed the program up, and there are programs which will run
slower with it. And in all cases, it depends on the actual
implementation of the manual management or the garbage
collection.

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 