Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Pointers to member functions are NOT useless


Pointers to member functions are NOT useless

 
 
Francesco S. Carta
      07-25-2010
Jonathan Lee <(E-Mail Removed)>, on 24/07/2010 21:10:25, wrote:

> On Jul 24, 7:57 pm, "Francesco S. Carta"<(E-Mail Removed)> wrote:
>> You wasted cycles, and potentially, you heavily slowed down the program:
>> imagine if there are hundreds or thousands of such objects in a
>> simulation, would you then try to save as much as possible? I know I would.

>
> This is conjecture, of course. Nor is your assessment about the
> performance obvious. In my code, the switch could easily be replaced
> with a jump table, which might not be any slower. Also, the use of
> a pointer to member function removes the possibility of profile
> guided optimization and will probably ruin cache prediction.
>
> Testing, of course, would be necessary. But this example has gone from
> "generally useful" to "niche optimization" :/


Sorry, I was of course sure of what I said, and I completely ignored
the possibility of that optimization - it's not the first time I've
read about it, but that just means I failed to understand it... the
next time something similar comes up, I'll phrase those statements as
questions.

I should get a better grip on this jump table optimization (and the
other things you mentioned), because I still can't see how a table
lookup /plus/ a call can be as fast as a direct call alone; any
further detail will be welcome.

--
FSC - http://userscripts.org/scripts/show/59948
http://fscode.altervista.org - http://sardinias.com
 
Francesco S. Carta
      07-25-2010
Jonathan Lee <(E-Mail Removed)>, on 24/07/2010 21:10:25, wrote:

> On Jul 24, 7:57 pm, "Francesco S. Carta"<(E-Mail Removed)> wrote:
>> You wasted cycles, and potentially, you heavily slowed down the program:
>> imagine if there are hundreds or thousands of such objects in a
>> simulation, would you then try to save as much as possible? I know I would.

>
> This is conjecture, of course. Nor is your assessment about the
> performance obvious. In my code, the switch could easily be replaced
> with a jump table, which might not be any slower. Also, the use of
> a pointer to member function removes the possibility of profile
> guided optimization and will probably ruin cache prediction.


Indeed, my ignorance led me to assert something that cannot reasonably
be true in all cases - especially if the code is open to
optimizations, in which case my assertion is likely false, as I
understood from your words and have now seen with my own eyes: I've
created a test program to compare the performance of the two
approaches, and the version using pointers to member functions turns
out to be somewhat faster only if, in the switch version, I prevent
the compiler from inlining the called functions.

If I allow the compiler to inline them, the switch version becomes
significantly faster.

> Testing, of course, would be necessary. But this example has gone from
> "generally useful" to "niche optimization" :/


My fault, of course, but fortunately not my intention.

I'm still interested in the general question, and any information on
how to do something similar (i.e. context-dependent member function
choice) using templates will be welcome - I've been thinking about it
since yesterday and still cannot even imagine where to start.

Thank you for your patience and for your feedback

//-------
#include <iostream>
#include <iomanip>
#include <ctime>

//#define PREVENT_FUNCTION_INLINING

using namespace std;

enum Context {
    get_sum,
    get_difference,
    get_average
};

struct Base {
#ifdef PREVENT_FUNCTION_INLINING
    double add(double a, double b) const;
    double subtract(double a, double b) const;
    double mean(double a, double b) const;
#else
    double add(double a, double b) const {
        return a + b;
    }
    double subtract(double a, double b) const {
        return a - b;
    }
    double mean(double a, double b) const {
        return (a + b) / 2;
    }
#endif
    virtual void set_context(Context) = 0;
    virtual double execute(double, double) const = 0;
    virtual ~Base() {}
    double test(int iterations, Context c) {
        set_context(c);
        double a = 42;
        double b = 78;
        double temp = 0;
        for (int i = 0; i < iterations; ++i) {
            temp = a;
            a = execute(a, b);
            b = execute(b, temp);
        }
        return a + b;
    }
};

struct UseSwitch : Base {
    Context context;
    void set_context(Context c) {
        context = c;
    }
    double execute(double a, double b) const {
        switch (context) {
        case get_sum:
            return add(a, b);
        case get_difference:
            return subtract(a, b);
        case get_average:
            return mean(a, b);
        default:
            return -42;
        }
    }
};

struct UsePointer : Base {
    UsePointer() : ptr(&Base::add) {}
    void set_context(Context c) {
        switch (c) {
        case get_sum:
            ptr = &Base::add;
            break;
        case get_difference:
            ptr = &Base::subtract;
            break;
        case get_average:
            ptr = &Base::mean;
        }
    }
    double execute(double a, double b) const {
        return (this->*ptr)(a, b);
    }

    double (Base::*ptr)(double, double) const;
};

#ifdef PREVENT_FUNCTION_INLINING
double Base::add(double a, double b) const {
    return a + b;
}
double Base::subtract(double a, double b) const {
    return a - b;
}
double Base::mean(double a, double b) const {
    return (a + b) / 2;
}
#endif

clock_t test(Base* obj, int iterations) {
    double a, b, c; // double, not int: Base::test() returns a double
    clock_t start, total;

    start = clock();
    a = obj->test(iterations, get_sum);
    b = obj->test(iterations, get_difference);
    c = obj->test(iterations, get_average);
    total = clock() - start;

    cout << a << " " << b << " " << c << endl;
    cout << "clocks: " << total << endl;

    return total;
}

int main() {

    Base* use_switch = new UseSwitch;
    Base* use_pointer = new UsePointer;

    const int iterations = 10000000;
    const int loops = 10;

    clock_t switch_clocks = 0;
    clock_t pointer_clocks = 0;

    for (int i = 0; i < loops; ++i) {
        switch_clocks += test(use_switch, iterations);
        pointer_clocks += test(use_pointer, iterations);
    }

    // cast before dividing to avoid integer division
    double switch_per_loop = double(switch_clocks) / loops;
    double pointer_per_loop = double(pointer_clocks) / loops;

    double switch_per_iteration = switch_per_loop / iterations / 3;
    double pointer_per_iteration = pointer_per_loop / iterations / 3;

    cout << setprecision(6) << fixed;
    cout << "switch, per loop: " << switch_per_loop << endl;
    cout << "switch, per iteration: " << switch_per_iteration << endl;
    cout << endl;
    cout << "pointer, per loop: " << pointer_per_loop << endl;
    cout << "pointer, per iteration: " << pointer_per_iteration << endl;
    cout << endl;
    cout << "switch - pointer, per iteration difference: ";
    cout << (switch_per_iteration - pointer_per_iteration) << endl;

    delete use_switch;
    delete use_pointer;

    return 0;
}
//-------

/*

Output with function inlining:

switch, per loop: 940.000000
switch, per iteration: 0.000031

pointer, per loop: 1480.000000
pointer, per iteration: 0.000049

switch - pointer, per iteration difference: -0.000018

Output without function inlining:

switch, per loop: 1726.000000
switch, per iteration: 0.000058

pointer, per loop: 1463.000000
pointer, per iteration: 0.000049

switch - pointer, per iteration difference: 0.000009

*/

--
FSC - http://userscripts.org/scripts/show/59948
http://fscode.altervista.org - http://sardinias.com
 
Jonathan Lee
      07-25-2010
On Jul 25, 5:09 am, "Francesco S. Carta" <(E-Mail Removed)> wrote:
> I should get a better grip on this jump table optimization (and
> the other things you mentioned), because I still can't see how a
> table lookup /plus/ a call can be as fast as a direct call alone;
> any further detail will be welcome.


At the risk of being slightly off topic... the basic idea is that the
pointer to member function isn't a direct call. At least, not at the
processor level. Or... at least not for x86 processors which I'm
familiar with. It's a call through a pointer, versus a call to an
address.

The difference is that when there's a call to an address coming up,
the processor can look ahead and get the instructions at that new
address and start decoding them. Take a look here for what's going
on: http://en.wikipedia.org/wiki/Instruction_pipeline

When jumping through a pointer, the CPU really has no idea where
the instructions will be coming from next. So it can't prefetch
the next block of code. It has to wait until the actual call
instruction is issued. Then it will grab the memory, and start
moving instructions into the pipeline.

It's all sorta similar to a cache miss, but for instructions.

As for the jump table, since they're small they tend to fit
in cache. And after they run a few times, branch prediction
eliminates a lot of the penalties you would expect.
http://en.wikipedia.org/wiki/Branch_predictor

--Jonathan
 
James Kanze
      07-25-2010
On Jul 24, 4:10 pm, "Francesco S. Carta" <(E-Mail Removed)> wrote:

> after reading an article complaining that pointers to member
> functions are misconceived in C++, and after reading a comment
> about them saying that they happen to be a rarely necessary
> feature - if ever necessary at all - I decided to investigate
> them to see how I could take advantage of them, and I also
> decided to post this message to get the group's feedback.


> A pointer to member function is a pointer that can memorize
> the displacement of any member function so that we can call
> that function on an appropriate object without having to know
> which function we are calling in particular - though, the
> object and the member function must belong to the same class
> (please forgive me if this description is oversimplified, but
> also please correct me if it happens to be wrong).


They don't have to belong to the same class, although the
classes do have to be related by inheritance.

And there's no real "offset" involved. There are two common
techniques for implementing them: in the most obvious, the
pointer stores either the actual address of the function (if
non-virtual) or the index into the vtable (if virtual), some
means of distinguishing between the two, and any correction
necessary to the this pointer. Alternatively, the compiler
generates a trampoline function, which does whatever is
necessary, and stores a pointer to it.

> I've come to think that this feature can be really useful if
> we have objects that can perform different actions in case
> they are used in a context or another.


> The approach to take (not having pointers to member functions at our
> disposal) would be to use a switch or some ifs any time we have to
> perform the action, so that we can choose the right function for the
> current context or input.


> If we happen to have several operations to perform, but the context /
> input rarely changes, having to decide which action to perform at each
> call is a waste of cycles.


> With pointers to member functions we can store our decision about which
> member function to call and speed up the whole process.


A more frequent approach to this problem involves using an
instance of a forwarding class.

The most frequent use of pointers to member functions is in
instantiating templates, where the pointer to member function is
in fact resolved at compile time.

> Since I'm biased towards interfaces and games, the example I
> post here below is just a stub of a game where we have a
> common interface for several different moving objects - I
> think the approach I've taken in that example could lead to a
> nice infrastructure where different objects must interact (or
> refuse to interact) with different environments.


Some early GUI frameworks made extensive use of them for
callbacks.

--
James Kanze
 
James Kanze
      07-25-2010
On Jul 24, 4:36 pm, Öö Tiib <(E-Mail Removed)> wrote:
> On 24 juuli, 18:10, "Francesco S. Carta" <(E-Mail Removed)> wrote:


> > I'd like to ask you if you can come up with some real life examples
> > where you have taken advantage of pointers to member functions -
> > unfortunately, being a hobbyist, my best example comes straight from
> > the top of my head


> For loosely coupling something.


> For example typical silly observer pattern implementation expects my
> observer to have virtual handleEvent(Observed& ref) method. Virtual
> from IObserver base interface. Why? Looks like direct translation from
> java with no brain applied. I can just boost::bind any member function
> as an observer functor. Only limit is that it has to have unique
> member function name in class (may not have overloads).


For boost::bind. (Actually, there's no requirement that there
be no overloads, but if there are overloads, you have to
explicitly select which one to use, which is a little bit
awkward.) Earlier GUI implementations simply took a pointer to
member function directly.

> That way I need no useless interface base classes and virtuals and lots
> of other bloat and nuisance usually connected to loose coupling
> patterns. Patterns look too tightly coupled, for my taste, after such a
> thing is applied. The calling and binding of member function
> pointers should be templatized so as not to confuse novices. Also I
> usually suggest considering boost::signals2 first when someone needs a
> multithreaded version of such.


Earlier GUI frameworks used the pattern you suggest. It was
quickly abandoned in favor of a more explicit base interface class;
the explicit interface results in significantly more readable, more
maintainable code.

--
James Kanze
 
Francesco S. Carta
      07-25-2010
Jonathan Lee <(E-Mail Removed)>, on 25/07/2010 08:30:06, wrote:

> On Jul 25, 5:09 am, "Francesco S. Carta"<(E-Mail Removed)> wrote:
>> I should get a better grip on this jump table optimization (and
>> the other things you mentioned), because I still can't see how a
>> table lookup /plus/ a call can be as fast as a direct call alone;
>> any further detail will be welcome.

>
> At the risk of being slightly off topic... the basic idea is that the
> pointer to member function isn't a direct call. At least, not at the
> processor level. Or... at least not for x86 processors which I'm
> familiar with. It's a call through a pointer, versus a call to an
> address.
>
> The difference is that when there's a call to an address coming up,
> the processor can look ahead and get the instructions at that new
> address and start decoding them. Take a look here for what's going
> on: http://en.wikipedia.org/wiki/Instruction_pipeline
>
> When jumping through a pointer, the CPU really has no idea where
> the instructions will be coming from next. So it can't prefetch
> the next block of code. It has to wait until the actual call
> instruction is issued. Then it will grab the memory, and start
> moving instructions into the pipeline.
>
> It's all sorta similar to a cache miss, but for instructions.
>
> As for the jump table, since they're small they tend to fit
> in cache. And after they run a few times, branch prediction
> eliminates a lot of the penalties you would expect.
> http://en.wikipedia.org/wiki/Branch_predictor


All of the above definitely clears up a lot of things; thanks for the
additional details and the links, Jonathan.

--
FSC - http://userscripts.org/scripts/show/59948
http://fscode.altervista.org - http://sardinias.com
 
Jonathan Lee
      07-25-2010
On Jul 25, 9:41 am, "Francesco S. Carta" <(E-Mail Removed)> wrote:
> Indeed, my ignorance led me to assert something that cannot reasonably
> be true in all cases - especially if the code is open to
> optimizations, in which case my assertion is likely false, as I
> understood from your words and have now seen with my own eyes:


Meh. I don't even know that it is "likely false" or that it cannot
"reasonably be true in all cases". I only wanted to impress on you
that it certainly wasn't obvious. For instance, when I run your
program on my machine, I get nearly identical results.

> > Testing, of course, would be necessary. But this example has gone from
> > "generally useful" to "niche optimization" :/

>
> My fault, of course, but fortunately not my intention.


No, I know it wasn't your intention. On the bright side, it brought
to my attention an area of C++ which I usually ignore completely.
It never hurts to visit those topics with a fresh pair of eyes.

> I'm still interested in the general question, and any information on
> how to do something similar (i.e. context-dependent member function
> choice) using templates will be welcome - I've been thinking about it
> since yesterday and still cannot even imagine where to start.


Actually I'd be interested in seeing an example as well. It's not
something in my tool bag.

> Thank you for your patience and for your feedback


No problem.

--Jonathan

 
James Kanze
      07-25-2010
On Jul 25, 4:30 pm, Jonathan Lee <(E-Mail Removed)> wrote:
> On Jul 25, 5:09 am, "Francesco S. Carta" <(E-Mail Removed)> wrote:


> > I should get a better grip on this jump table
> > optimization (and the other things you mentioned),
> > because I still can't see how a table lookup /plus/
> > a call can be as fast as a direct call alone; any
> > further detail will be welcome.


> At the risk of being slightly off topic... the basic idea is
> that the pointer to member function isn't a direct call. At
> least, not at the processor level. Or... at least not for x86
> processors which I'm familiar with. It's a call through a
> pointer, versus a call to an address.


It's more than just a call through a pointer. The calling code
first has to test whether the called function is virtual or not,
then either call through a pointer or find the address in the
vtable, using the index in the pointer to member. It then has
to adjust the this pointer---it might have to do this anyway,
but with a direct call, the adjustment is with a compile time
constant; with a pointer to member function, it is with a
runtime variable (which means that in the most frequent case,
where the adjustment is 0, the compiler doesn't do anything for
a direct call, but the generated code still does the add,
because the compiler didn't know that it would always be 0).

Alternatively, some compilers use trampolines: they generate a
small snippet of code which does whatever is necessary to call
the function when you take the address of the member function
(at which time, the compiler knows, for example, whether the
function is virtual or not), and assign this to the pointer to
member function.

> The difference is that when there's a call to an address coming up,
> the processor can look ahead and get the instructions at that new
> address and start decoding them. Take a look here for what's going
> on:http://en.wikipedia.org/wiki/Instruction_pipeline


> When jumping through a pointer, the CPU really has no idea where
> the instructions will be coming from next. So it can't prefetch
> the next block of code.


This depends very much on the processor, and affects all
indirect jumps (virtual functions, function returns, etc.). I've
never noticed it to cause any real problems on a Sparc, for
example, but I know that it is an issue on HP's PA.

On processors where it is an issue, of course, the switch
definitely wins over a pointer to member function, because there
won't be any indirect calls in the code generated for the
switch.

> It has to wait until the actual call instruction is issued.
> Then it will grab the memory, and start moving instructions
> into the pipeline.


Or it knows how to recognize such instructions in the pipeline,
and start fetching the target address way in advance. Since
this affects function returns, a number of processors have added
the necessary logic to handle it.

> It's all sorta similar to a cache miss, but for instructions.


> As for the jump table, since they're small they tend to fit
> in cache. And after they run a few times, branch prediction
> eliminates a lot of the penalties you would
> expect. http://en.wikipedia.org/wiki/Branch_predictor


Branch prediction often doesn't give good results on indirect
calls. On such processors, the compiler will generate an inline
binary search for the value, just as it would if the switch
values were not dense. This sort of handling can often be
quicker than an indirect jump.

--
James Kanze
 
Francesco S. Carta
      07-25-2010
Jonathan Lee <(E-Mail Removed)>, on 25/07/2010 09:16:31, wrote:

> On Jul 25, 9:41 am, "Francesco S. Carta"<(E-Mail Removed)> wrote:
>> Indeed, my ignorance led me to assert something that cannot reasonably
>> be true in all cases - especially if the code is open to
>> optimizations, in which case my assertion is likely false, as I
>> understood from your words and have now seen with my own eyes:

>
> Meh. I don't even know that it is "likely false" or that it cannot
> "reasonably be true in all cases". I only wanted to impress on you
> that
> it certainly wasn't obvious. For instance, when I run your program on
> my machine, I get nearly identical results.


Have you tried toggling the macro that controls function inlining?
With inlining allowed, I think it would show a significant gain in
favor of your version (the switch one).

I suspect my way out of it (for the very purpose of my niche
digression) would be via "normal" pointers to functions taking
references to instances of my class; that would ensure the speed of
"switch + inlining" (since inlining can't always be achieved).

>> I'm still interested in the general question, and any information on
>> how to do something similar (i.e. context-dependent member function
>> choice) using templates will be welcome - I've been thinking about it
>> since yesterday and still cannot even imagine where to start.

>
> Actually I'd be interested in seeing an example as well. It's not
> something in my tool bag.


Let's hope somebody comes up with something new to see

--
FSC - http://userscripts.org/scripts/show/59948
http://fscode.altervista.org - http://sardinias.com
 
Jonathan Lee
      07-25-2010
On Jul 25, 12:30 pm, James Kanze <(E-Mail Removed)> wrote:
> It's more than just a call through a pointer. The calling code
> first has to test whether the called function is virtual or not,
> then either call through a pointer or find the address in the
> vtable, using the index in the pointer to member. It then has
> to adjust the this pointer---it might have to do this anyway,
> but with a direct call, the adjustment is with a compile time
> constant; with a pointer to member function, it is with a
> runtime variable (which means that in the most frequent case,
> where the adjustment is 0, the compiler doesn't do anything for
> a direct call, but the generated code still does the add,
> because the compiler didn't know that it would always be 0).
>
> Alternatively, some compilers use trampolines: they generate a
> small snippet of code which does whatever is necessary to call
> the function when you take the address of the member function
> (at which time, the compiler knows, for example, whether the
> function is virtual or not), and assign this to the pointer to
> member function.


Thanks for all the additional info, James. If I didn't make it
clear, I'm very much looking at this from an x86 assembly
language perspective. Didn't even think of the effect that
virtual functions, etc. would have.

--Jonathan
 