Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > optimizing the switch statement in Duff's Device (casting a label, label abuse)

Reply
Thread Tools

optimizing the switch statement in Duff's Device (casting a label, label abuse)

 
 
anon.asdf@gmail.com
Guest
Posts: n/a
 
      10-10-2007
Here's a reminder of duff's device:

/*************************************/
#include <stdio.h>

#define STEP 8
#define MAX_LEN STEP*4+1
#define SOURCE_LEN 28


int main(void)
{
char source[SOURCE_LEN] =
"abcdefghijklmnopqrstuvwxyz0";
char destination[MAX_LEN] = {0};

char *src_ptr;
char *dest_ptr;
char *lim;

int user_input;

printf("Enter a number between incl. 1 and %d\n"
"To quit enter: 0<ENTER>\n", SOURCE_LEN-1);

while ((scanf("%d", &user_input) > 0)
&& (user_input > 0)
&& (user_input < SOURCE_LEN)) {

src_ptr = source;
lim = source + user_input;
dest_ptr = destination;

switch (user_input % {
case 0: do {*dest_ptr++ = *src_ptr++;
case 7: *dest_ptr++ = *src_ptr++;
case 6: *dest_ptr++ = *src_ptr++;
case 5: *dest_ptr++ = *src_ptr++;
case 4: *dest_ptr++ = *src_ptr++;
case 3: *dest_ptr++ = *src_ptr++;
case 2: *dest_ptr++ = *src_ptr++;
case 1: *dest_ptr++ = *src_ptr++;
} while (src_ptr < lim);
}

#if 0 /*** alternative ***/
switch (user_input % {
do {*dest_ptr++ = *src_ptr++;
case 7: *dest_ptr++ = *src_ptr++;
case 6: *dest_ptr++ = *src_ptr++;
case 5: *dest_ptr++ = *src_ptr++;
case 4: *dest_ptr++ = *src_ptr++;
case 3: *dest_ptr++ = *src_ptr++;
case 2: *dest_ptr++ = *src_ptr++;
case 1: *dest_ptr++ = *src_ptr++;
case 0: ; } while (src_ptr < lim);
}
#endif

*dest_ptr = '\0';
puts(destination);
}
return 0;
}


/*************************************/


The switch itself scales horribly. At worst we encounter 8 compare-
operations until we reach the desired case.

Maby there is a better way of doing the switch statement: In
particular - the argument to the switch evaluates to an integer

between 0 and 7 (incl.) and now the question is: could we not use this
number as an offset to immediately jump the code to

the desired location.

goto la_0 - (user_input %
// * sizeof("*dest_ptr++ = *src_ptr++;"instruction)
;


do {*dest_ptr++ = *src_ptr++;
la_7: *dest_ptr++ = *src_ptr++;
la_6: *dest_ptr++ = *src_ptr++;
la_5: *dest_ptr++ = *src_ptr++;
la_4: *dest_ptr++ = *src_ptr++;
la_3: *dest_ptr++ = *src_ptr++;
la_2: *dest_ptr++ = *src_ptr++;
la_1: *dest_ptr++ = *src_ptr++;
la_0: ; } while (src_ptr < lim);


The code snippet above shows the idea:
we jump to an address relative to out pivot-label la_0.

Example
If (user_input % == 1, we
want to jump to "la_0 minus a few instructions" to land up at la_1.


Can something like this be done?
Can we get a label's address and subtract (or add) an integer to it?
Or are there other alternatives that don't need a label...?

Of course, there are issues involved:
What if the compiler thinks it's "clever" ?
what does the compiler do with the empty ; located at la_0 (of course
we do not want a nop)?

Thanks a thousand... I'm getting a cold duff from the fridge!

Kind regards,
Albert

 
Reply With Quote
 
 
 
 
J. J. Farrell
Guest
Posts: n/a
 
      10-10-2007
On Oct 10, 5:25 pm, (E-Mail Removed) wrote:
> Here's a reminder of duff's device:
>
> /*************************************/
> #include <stdio.h>
>
> #define STEP 8
> #define MAX_LEN STEP*4+1
> #define SOURCE_LEN 28
>
> int main(void)
> {
> char source[SOURCE_LEN] =
> "abcdefghijklmnopqrstuvwxyz0";
> char destination[MAX_LEN] = {0};
>
> char *src_ptr;
> char *dest_ptr;
> char *lim;
>
> int user_input;
>
> printf("Enter a number between incl. 1 and %d\n"
> "To quit enter: 0<ENTER>\n", SOURCE_LEN-1);
>
> while ((scanf("%d", &user_input) > 0)
> && (user_input > 0)
> && (user_input < SOURCE_LEN)) {
>
> src_ptr = source;
> lim = source + user_input;
> dest_ptr = destination;
>
> switch (user_input % {
> case 0: do {*dest_ptr++ = *src_ptr++;
> case 7: *dest_ptr++ = *src_ptr++;
> case 6: *dest_ptr++ = *src_ptr++;
> case 5: *dest_ptr++ = *src_ptr++;
> case 4: *dest_ptr++ = *src_ptr++;
> case 3: *dest_ptr++ = *src_ptr++;
> case 2: *dest_ptr++ = *src_ptr++;
> case 1: *dest_ptr++ = *src_ptr++;
> } while (src_ptr < lim);
> }
>
> #if 0 /*** alternative ***/
> switch (user_input % {
> do {*dest_ptr++ = *src_ptr++;
> case 7: *dest_ptr++ = *src_ptr++;
> case 6: *dest_ptr++ = *src_ptr++;
> case 5: *dest_ptr++ = *src_ptr++;
> case 4: *dest_ptr++ = *src_ptr++;
> case 3: *dest_ptr++ = *src_ptr++;
> case 2: *dest_ptr++ = *src_ptr++;
> case 1: *dest_ptr++ = *src_ptr++;
> case 0: ; } while (src_ptr < lim);
> }
> #endif
>
> *dest_ptr = '\0';
> puts(destination);
> }
> return 0;
>
> }
>
> /*************************************/
>
> The switch itself scales horribly. At worst we encounter 8 compare-
> operations until we reach the desired case.


Really? What compiler are you using as a matter of interest? It must
be a pretty poor quality one.

> Maby there is a better way of doing the switch statement: In
> particular - the argument to the switch evaluates to an integer
> between 0 and 7 (incl.) and now the question is: could we not use this
> number as an offset to immediately jump the code to
> the desired location.


The compiler could obviously do it, that's what tools are for. I'm
interested to know what compiler you've got which doesn't do what you
say for the code given above.

> goto la_0 - (user_input %
> // * sizeof("*dest_ptr++ = *src_ptr++;"instruction)
> ;
>
> do {*dest_ptr++ = *src_ptr++;
> la_7: *dest_ptr++ = *src_ptr++;
> la_6: *dest_ptr++ = *src_ptr++;
> la_5: *dest_ptr++ = *src_ptr++;
> la_4: *dest_ptr++ = *src_ptr++;
> la_3: *dest_ptr++ = *src_ptr++;
> la_2: *dest_ptr++ = *src_ptr++;
> la_1: *dest_ptr++ = *src_ptr++;
> la_0: ; } while (src_ptr < lim);
>
> The code snippet above shows the idea:
> we jump to an address relative to out pivot-label la_0.
>
> Example
> If (user_input % == 1, we
> want to jump to "la_0 minus a few instructions" to land up at la_1.
>
> Can something like this be done?


Not easily or efficiently in C.

> Can we get a label's address and subtract (or add) an integer to it?
> Or are there other alternatives that don't need a label...?


Use Duff's device and a decent compiler.

> Of course, there are issues involved:
> What if the compiler thinks it's "clever" ?
> what does the compiler do with the empty ; located at la_0 (of course
> we do not want a nop)?
>
> Thanks a thousand... I'm getting a cold duff from the fridge!


I'd track down a decent compiler which does the sensible thing with
Duff's device before spending time trying to hack up some alternative.
An old gcc on x86 does the obvious thing correctly, for example.

 
Reply With Quote
 
 
 
 
Duncan Muirhead
Guest
Posts: n/a
 
      10-10-2007
On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:

<snip>
>
> The switch itself scales horribly. At worst we encounter 8 compare-
> operations until we reach the desired case.
>

<snip>
>
> Can we get a label's address and subtract (or add) an integer to it?
> Or are there other alternatives that don't need a label...?
>
> Of course, there are issues involved:
> What if the compiler thinks it's "clever" ?
> what does the compiler do with the empty ; located at la_0 (of course
> we do not want a nop)?
>
> Thanks a thousand... I'm getting a cold duff from the fridge!
>
> Kind regards,
> Albert

You shouldn't assume that a switch is always implemented by a
sequence of comparisons; it might well be implemented as a jump table.

What you seek can be done with gcc, using the labels as values
extension, (you can store the labels in an array L say, and goto
L[user_output%8]) but that isn't (standard) C.




 
Reply With Quote
 
Ben Bacarisse
Guest
Posts: n/a
 
      10-10-2007
http://www.velocityreviews.com/forums/(E-Mail Removed) writes:
> Here's a reminder of duff's device:

<snip>
> switch (user_input % {
> case 0: do {*dest_ptr++ = *src_ptr++;
> case 7: *dest_ptr++ = *src_ptr++;
> case 6: *dest_ptr++ = *src_ptr++;
> case 5: *dest_ptr++ = *src_ptr++;
> case 4: *dest_ptr++ = *src_ptr++;
> case 3: *dest_ptr++ = *src_ptr++;
> case 2: *dest_ptr++ = *src_ptr++;
> case 1: *dest_ptr++ = *src_ptr++;
> } while (src_ptr < lim);
> }

<snip>
> The switch itself scales horribly. At worst we encounter 8 compare-
> operations until we reach the desired case.
>
> Maby there is a better way of doing the switch statement: In
> particular - the argument to the switch evaluates to an integer
>
> between 0 and 7 (incl.) and now the question is: could we not use this
> number as an offset to immediately jump the code to


Yes. Most compilers will do this for you simply by giving it the
above code. A compiler might choose to use repeated compare
operations, but it seems like any unlikely choice in this case. If
yours does, I'd look to change it rather than venture into
non-standard "computed goto" territory (not that I am saying any
compilers support it, but if they did, I would avoid it).

--
Ben.
 
Reply With Quote
 
anon.asdf@gmail.com
Guest
Posts: n/a
 
      10-10-2007
On Oct 10, 7:12 pm, Duncan Muirhead <(E-Mail Removed)> wrote:
> On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:
>
> <snip>
>
>
>
> > The switch itself scales horribly. At worst we encounter 8 compare-
> > operations until we reach the desired case.

>
> <snip>
>
> > Can we get a label's address and subtract (or add) an integer to it?
> > Or are there other alternatives that don't need a label...?

>
> > Of course, there are issues involved:
> > What if the compiler thinks it's "clever" ?
> > what does the compiler do with the empty ; located at la_0 (of course
> > we do not want a nop)?

>
> > Thanks a thousand... I'm getting a cold duff from the fridge!

>
> > Kind regards,
> > Albert

>
> You shouldn't assume that a switch is always implemented by a
> sequence of comparisons; it might well be implemented as a jump table.
>



Ah great! Thanks for the info.
(I was mislead by reading http://www.geekronomicon.com/?q=node/68#switch).


> What you seek can be done with gcc, using the labels as values
> extension, (you can store the labels in an array L say, and goto
> L[user_output%8]) but that isn't (standard) C.


I'm using gcc, so I'll keep that in mind in case I need to do
something more exotic than a switch, one day.


Kind regards,
Albert

 
Reply With Quote
 
christian.bau
Guest
Posts: n/a
 
      10-10-2007
The whole thing is completely stupid.

1. If you want to make a loop fast, use a good compiler and tell it to
unroll the loop. This has many advantages: The compiler can make a
good decision how often to unroll the loop. It will know where using
more unrolling is just pointless. It knows exactly what is the best
way to divide things up between partial and complete iterations. Using
Duff's device, the compiler has to figure out what is happening and
will most likely produce suboptimal code.

For example, on a typical x86 system, with eight times unrolling,
there would be a loop looking like dst[0]=src[0];dst[1]=src[1];...;dst
+=8;src+=8; Only two operations incrementing pointers. Duff's device
has a label after each *dst++=*src++; so it needs eight separate
increments for src and dst. It would probably be possible to detect
Duff's device and transform it back to a straightforward loop, but
that would just be a waste of compiler developer's time.

Now consider what happens if you have an auto-vectorising compiler. A
straightforward loop could be transformed to vector operations and
become ten times faster. But seeing Duff's device, the compiler will
just shake his head in disgust and run away.

However, in this case you could have got ten times faster code by just
calling memcpy (). What are the advantages? I'll give you one example:
If you compiled code for an earlier version of MacOS X running on an
ancient G3 processor, memcpy () would execute hand-optimised assembler
code running at maximum speed for that processor. Take the same
executable (NOT recompiled) and run it on last years G5 processor, and
it executes completely different code, optimised for a G5 processor.
And now you take the same executable and run it on a Macintosh with an
Intel processor. While Duff's device originally generated awful code
for a PowerPC G3, which is now automatically translated to even worse
x86 code, the memcpy () call actually executes hand-optimised x86
assembler code!

Summary: You can use an error-prone, unreadable construct and feel
clever about it, while wasting your time and producing slow code, or
you can leave the hard work to the compiler, save lots of coding and
debugging time, and get code that executes a lot faster. Tough
choice.


 
Reply With Quote
 
christian.bau
Guest
Posts: n/a
 
      10-10-2007
On Oct 10, 6:14 pm, Ben Bacarisse <(E-Mail Removed)> wrote:

> Yes. Most compilers will do this for you simply by giving it the
> above code. A compiler might choose to use repeated compare
> operations, but it seems like any unlikely choice in this case. If
> yours does, I'd look to change it rather than venture into
> non-standard "computed goto" territory (not that I am saying any
> compilers support it, but if they did, I would avoid it).


For small numbers of consecutive values, it is much easier to do say
four compares and eight conditional branches. Compare to 1, branch if
less, branch if equal, compare to 3, branch if less, branch if equal,
Compare to 5 etc. etc. n/4 compares and n/2 branches on average.
Computed branches usually are much worse for branch prediction
hardware. For large numbers, use a binary branch strategy with log(n)
compares and branches in every case.

 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      10-10-2007
christian.bau wrote On 10/10/07 16:21,:
> The whole thing is completely stupid.


Yes to that part. However ...

> [...] Duff's device
> has a label after each *dst++=*src++; so it needs eight separate
> increments for src and dst. [...]
> [...] in this case you could have got ten times faster code by just
> calling memcpy ().


.... The O.P.'s code could (and should) be replaced by a
call to memcpy(), but Duff's device could not be. See,
for example,

http://en.wikipedia.org/wiki/Duff's_device

.... and let's give no more guff to Duff!

--
(E-Mail Removed)
 
Reply With Quote
 
anon.asdf@gmail.com
Guest
Posts: n/a
 
      10-11-2007
On Oct 10, 7:12 pm, Duncan Muirhead <(E-Mail Removed)> wrote:
> On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:
>
> <snip>
>
>
>
> > The switch itself scales horribly. At worst we encounter 8 compare-
> > operations until we reach the desired case.

>
> <snip>
>
> > Can we get a label's address and subtract (or add) an integer to it?
> > Or are there other alternatives that don't need a label...?

>
> > Of course, there are issues involved:
> > What if the compiler thinks it's "clever" ?
> > what does the compiler do with the empty ; located at la_0 (of course
> > we do not want a nop)?

>
> > Thanks a thousand... I'm getting a cold duff from the fridge!

>
> > Kind regards,
> > Albert

>
> You shouldn't assume that a switch is always implemented by a
> sequence of comparisons; it might well be implemented as a jump table.
>
> What you seek can be done with gcc, using the labels as values
> extension, (you can store the labels in an array L say, and goto
> L[user_output%8]) but that isn't (standard) C.


Hi!

Here's a code example using the non-standard,
but cool gcc extensions!

I cannot understand why this is not part of the C standard: it's
really powerful!!

{
//deliberately provocative statement
Ah who cares, gcc is the defacto standard anyway! //
}

Regards,
Albert

/***************************************/

#include <stdio.h>

#define STEP 8
#define MAX_LEN STEP*4+1
#define SOURCE_LEN 28

int main(void)
{
char source[SOURCE_LEN] =
"abcdefghijklmnopqrstuvwxyz0";
char destination[MAX_LEN] = {0};

char *src_ptr;
char *dest_ptr;
char *lim;

void *jmp;

int user_input;

printf("Enter a number between incl. 1 and %d\n"
"To quit enter: 0<ENTER>\n", SOURCE_LEN-1);

while ((scanf("%d", &user_input) > 0)
&& (user_input > 0)
&& (user_input < SOURCE_LEN)) {

src_ptr = source;
lim = source + user_input;
dest_ptr = destination;

goto *(&&la_0 -
(user_input % * (&&la_0 - && la_1));

do { *dest_ptr++ = *src_ptr++;
la_7: *dest_ptr++ = *src_ptr++;
la_6: *dest_ptr++ = *src_ptr++;
la_5: *dest_ptr++ = *src_ptr++;
la_4: *dest_ptr++ = *src_ptr++;
la_3: *dest_ptr++ = *src_ptr++;
la_2: *dest_ptr++ = *src_ptr++;
la_1: *dest_ptr++ = *src_ptr++;
la_0: ; } while (src_ptr < lim);

*dest_ptr = '\0';
puts(destination);
}
return 0;

}

 
Reply With Quote
 
Ben Bacarisse
Guest
Posts: n/a
 
      10-11-2007
(E-Mail Removed) writes:

> On Oct 10, 7:12 pm, Duncan Muirhead <(E-Mail Removed)> wrote:
>> On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:
>>
>> <snip>
>> > The switch itself scales horribly. At worst we encounter 8 compare-
>> > operations until we reach the desired case.

>>
>> <snip>
>> > Can we get a label's address and subtract (or add) an integer to it?
>> > Or are there other alternatives that don't need a label...?

>>

<snip>
>> You shouldn't assume that a switch is always implemented by a
>> sequence of comparisons; it might well be implemented as a jump table.
>>
>> What you seek can be done with gcc, using the labels as values
>> extension, (you can store the labels in an array L say, and goto
>> L[user_output%8]) but that isn't (standard) C.

>
> Here's a code example using the non-standard,
> but cool gcc extensions!


The GCC people are indeed "cool" -- you should take their advice:

"The switch statement is cleaner, so use that rather than an array
[of labels] unless the problem does not fit a switch statement very
well."

> I cannot understand why this is not part of the C standard: it's
> really powerful!!


I can't understand why you would obfuscate your code like this. What
is wrong with memcpy?

<snip>
> goto *(&&la_0 -
> (user_input % * (&&la_0 - && la_1));


Way to go! You've taken a non-standard, non-portable, extension and
used it in such a way as to rely on assumptions that even gcc does not
guarantee. You must be aiming for minimum portability!

> do { *dest_ptr++ = *src_ptr++;
> la_7: *dest_ptr++ = *src_ptr++;
> la_6: *dest_ptr++ = *src_ptr++;
> la_5: *dest_ptr++ = *src_ptr++;
> la_4: *dest_ptr++ = *src_ptr++;
> la_3: *dest_ptr++ = *src_ptr++;
> la_2: *dest_ptr++ = *src_ptr++;
> la_1: *dest_ptr++ = *src_ptr++;
> la_0: ; } while (src_ptr < lim);


As someone helpfully pointed out (in a reply to a post of mine) jump
tables are not even guaranteed to be the fastest was to implement a
switch.

Anyway, in this case you need memcpy.

--
Ben.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Determine the device is a router or switch given the Device IP via snmp KuttapanPro@gmail.com Cisco 0 11-05-2007 06:24 AM
Which of switch statement and if-else statement takes less time to execute? swaroophr@gmail.com C Programming 21 08-02-2005 09:24 AM
Determine the device is a router or switch given the Device IP kiranreddyd@gmail.com Cisco 14 12-26-2004 04:11 PM
Optimizing a switch statement Thomas Matthews C Programming 37 03-05-2004 08:20 PM
Optimizing a switch statement Thomas Matthews C++ 35 03-05-2004 08:20 PM



Advertisments