Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > function pointer, (automatic) inlining, performance

Reply
Thread Tools

function pointer, (automatic) inlining, performance

 
 
David Mathog
Guest
Posts: n/a
 
      03-14-2011
Assume there are N (around 10) functions like:
func1(void *, void*, int )
...
funcN(void*,void*,int)

A subset of these are to be called in the inner loop of a program,
conditionally, but always in numerical order. That is, it might be
1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
ways to code this (here "args" is filled in with the appropriate
values using simple code that isn't
relevant to the point under discussion here):

1),
if(test1)func1(args);
if(test2)func2(args);
...
if(testN)funcN(args);

2)
store function pointers in a list and do:
for(i=0;i<listlen;i++){
list[i](args); /* list[I] holds function pointer to funcI */
}

3.
use an enum selector with entries {USEFUNC1,USEFUNC2,... USEFUNCN}
and store those selectors in a list:

for(i=0;i<listlen;i++){
switch (list[i]){
case USEFUNC1: func1(args); break;
case USEFUNC2: func2(args); break;
...
case USEFUNCN: funcN(args); break;
}
}

The functions all perform relatively small operations and could be
inlined. Except, I don't think they can be for (2), because of the
function pointers. Since this is in the inner loop, the resulting
function call overhead is likely to be significant compared to the
inlined access. If case (3) always reduced to the equivalent of a
Fortran computed goto it would be faster than (1), since it could skip
the unnecessary if tests. However, I don't think the standard
guarantees that, and it might well reduce to the same code. In any
case, it looks like (3) is likely to be the fastest, and in any case,
should not be slower than (1). (2) would be fastest (never any extra
logic tests) if the function pointers could be inlined, but is that
possible in this case?

Are there any other options for this situation?

Thanks,

David Mathog
 
Reply With Quote
 
 
 
 
Nobody
Guest
Posts: n/a
 
      03-14-2011
On Mon, 14 Mar 2011 09:10:18 -0700, David Mathog wrote:

> Assume there are N (around 10) functions like:
> func1(void *, void*, int )
> ...
> funcN(void*,void*,int)
>
> A subset of these are to be called in the inner loop of a program,
> conditionally, but always in numerical order. That is, it might be
> 1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
> ways to code this (here "args" is filled in with the appropriate
> values using simple code that isn't relevant to the point under
> discussion here):


In the absence of additional information, I would assume that option #1
would be the fastest, as it is the most restrictive case. The compiler
knows which functions can be called and in which order, so it enables
optimisations such as allowing later functions to use values computed in
earlier functions.

In theory, a call through a function pointer could be inlined if the
compiler can "see" all of the possible values which the pointer could
take, but I've never observed this optimisation in practice.

With regard to #3, even if the compiler could determine that there are
certain constraints upon the ordering of the list elements, it's much
harder for it to actually use this information than is the case for #1.

As for your comment about "computed goto": unless the set of cases is
huge, a tree of branch instructions is likely to be more efficient. A
modern CPU will speculatively execute a branch instruction and the
instructions following it, but if cannot calculate the destination of a
computed goto immediately, it will simply stall until the data is
available.

 
Reply With Quote
 
 
 
 
Dr Nick
Guest
Posts: n/a
 
      03-15-2011
Nobody <> writes:

> On Mon, 14 Mar 2011 09:10:18 -0700, David Mathog wrote:
>
>> Assume there are N (around 10) functions like:
>> func1(void *, void*, int )
>> ...
>> funcN(void*,void*,int)
>>
>> A subset of these are to be called in the inner loop of a program,
>> conditionally, but always in numerical order. That is, it might be
>> 1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
>> ways to code this (here "args" is filled in with the appropriate
>> values using simple code that isn't relevant to the point under
>> discussion here):


How much of the order do you know at compile time?

And - orthogonal to this - are you better including the actual code in
the if-ladder or the switch rather than relying on functions and inlining?
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk
 
Reply With Quote
 
David Mathog
Guest
Posts: n/a
 
      03-15-2011
On Mar 14, 5:52*pm, Dr Nick <3-nos...@temporary-address.org.uk> wrote:
> > On Mon, 14 Mar 2011 09:10:18 -0700, David Mathog wrote:
> >> A subset of these are to be called in the inner loop of a program,
> >> conditionally, but always in numerical order. That is, it might be
> >> 1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
> >> ways to code this (here "args" is filled in with the appropriate
> >> values using simple code that isn't relevant to the point under
> >> discussion here):

>
> How much of the order do you know at compile time?


Well the order is 1 before 2 before 3 ... before 10. but the sequence
can skip some positions. The situation is like dropping a ball down
stairs, it will always land on a lower stair, but can and usually will
skip some in between. The exact sequence is not knowable at compile
time. However, once specified at run time it is fixed for all further
processing. It would be nice to be able to exploit this information.
The sequence of operations is known, it really should not require any
sort of a conditional to get to it. (And it wouldn't in, say,
assembler.) Using the gcc goto extension I guess one could implement
the conditional free function along these line (pseudocode, for the
case that hits 3, 6, and 7):

/* in another function prepare a list describing the operation order
*/
list []=
({&&label3, parameter3},
{&&label6, parameter6},
{&&label7,parameter7},
(&&label11,0}};

/* run the list in this function */
goto list->label;
label1: func1(list->param); list++; goto list->label;
label2: func2(list->param); list++; goto list->label;
label3: func3(list->param); list++; goto list->label;
....
label9: func9(list->param); list++; goto list->label;
label10: func10(list->param); list++; goto list->label;
label11: return;

On second thought, that gcc goto extension may not be an option, since
the function where the list would be created is not the same as the
pseudocode function shown above, so using the && to generate the label
addresses would be a problem. And that type of goto isn't standard in
any case.

> And - orthogonal to this - are you better including the actual code in
> the if-ladder or the switch rather than relying on functions and inlining?


Possibly. It would definitely be a lot harder to read. Each of these
code blocks (whether as a separate function or in an if ladder) is an
operation on the same string, and the N+1th operation cannot do
anything much until the Nth has completed. So while the compiler can
optimize within each code block, there probably isn't much it could do
between them. As Nobody suggested, the CPU might be able to make some
headway with speculative execution within the if ladder.

Regards,

David Mathog

 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      03-15-2011
On 3/14/2011 12:10 PM, David Mathog wrote:
> Assume there are N (around 10) functions like:
> func1(void *, void*, int )
> ...
> funcN(void*,void*,int)
>
> A subset of these are to be called in the inner loop of a program,
> conditionally, but always in numerical order. That is, it might be
> 1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
> ways to code this (here "args" is filled in with the appropriate
> values using simple code that isn't
> relevant to the point under discussion here):
>
> 1),
> if(test1)func1(args);
> if(test2)func2(args);
> ...
> if(testN)funcN(args);
>
> 2)
> store function pointers in a list and do:
> for(i=0;i<listlen;i++){
> list[i](args); /* list[I] holds function pointer to funcI */
> }
>
> 3.
> use an enum selector with entries {USEFUNC1,USEFUNC2,... USEFUNCN}
> and store those selectors in a list:
>
> for(i=0;i<listlen;i++){
> switch (list[i]){
> case USEFUNC1: func1(args); break;
> case USEFUNC2: func2(args); break;
> ...
> case USEFUNCN: funcN(args); break;
> }
> }
>
> The functions all perform relatively small operations and could be
> inlined. Except, I don't think they can be for (2), because of the
> function pointers. Since this is in the inner loop, the resulting
> function call overhead is likely to be significant compared to the
> inlined access. If case (3) always reduced to the equivalent of a
> Fortran computed goto it would be faster than (1), since it could skip
> the unnecessary if tests. However, I don't think the standard
> guarantees that, and it might well reduce to the same code. In any
> case, it looks like (3) is likely to be the fastest, and in any case,
> should not be slower than (1). (2) would be fastest (never any extra
> logic tests) if the function pointers could be inlined, but is that
> possible in this case?
>
> Are there any other options for this situation?


Sure. For example,

switch (bitmask_of_funcs_to_call) {
case 1:
func1(args);
break;
case 2:
func2(args);
break;
case 3:
func1(args);
func2(args);
break;
case 4:
func3(args);
break;
...

Of course, with "around 10" functions the number of cases in the
switch might stress the compiler a little ... You could hammer
it back down to semi-reasonable size with a sequence of switch'es
on subsets of the bits:

switch (bitmask_of_funcs_to_call & 07) ...
switch (bitmask_of_funcs_to_call & 070) ...
switch (bitmask_of_funcs_to_call & 0700) ...
...

The technical term for this kind of transformation is "Dumb Idea."

Stick with your original #1. If the tests are expensive and their
results don't change during the loop, you might precompute them:

int flag1 = test1();
int flag2 = test2();
...
while (looping) {
if (flag1) func1(args);
if (flag2) func2(args);
...

If the test results *do* change during the loop, then please don't
overlook the cost of maintaining list[] in proposals #2 and #3.
Also, note that #2 and #3 eliminate the specific tests you're so
worried about, but introduce new tests of their own like "Should
I iterate again?" and "Is the switch value in range of the cases?"

--
Eric Sosman
d
 
Reply With Quote
 
Phil Carmody
Guest
Posts: n/a
 
      03-15-2011
David Mathog <> writes:
> Assume there are N (around 10) functions like:
> func1(void *, void*, int )
> ...
> funcN(void*,void*,int)
>
> A subset of these are to be called in the inner loop of a program,
> conditionally, but always in numerical order. That is, it might be
> 1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
> ways to code this

....
> Are there any other options for this situation?


Yes. Profile and find what's slowing you down before you attempt
to optimise anything.

Phil
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1
 
Reply With Quote
 
David Mathog
Guest
Posts: n/a
 
      03-15-2011
On Mar 15, 5:04*am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:

>
> * * *Stick with your original #1. *If the tests are expensive andtheir


The tests are not expensive. In the current incarnation each is just
a bit test on the same integer.

The consensus here seems to be to stick with #1, so that's what I will
do.

Thanks all,

David Mathog
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Performance Tutorials Services - Boosting Performance by DisablingUnnecessary Services on Windows XP Home Edition Software Engineer Javascript 0 06-10-2011 02:18 AM
write a function such that when ever i call this function in some other function .it should give me tha data type and value of calling function parameter komal C++ 6 01-25-2005 11:13 AM
Passing a C++ object's member function to a C function expecing a function pointer! James Vanns C++ 7 01-21-2004 02:39 AM
Web Form Performance Versus Single File Performance jm ASP .Net 1 12-12-2003 11:14 PM
performance of static member function vs. instance member function 0to60 C++ 4 11-21-2003 05:25 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57