Velocity Reviews > Optimisation questions

# Optimisation questions

Markus Wichmann
Guest
Posts: n/a

 02-07-2012
On 06.02.2012 18:40, Dann Corbit wrote:
> In article <(E-Mail Removed)>, http://www.velocityreviews.com/forums/(E-Mail Removed) says...
>> int div8_0(int x)
>> {
>> return x >> 3;
>> }
>>
>> int div8_1(int x)
>> {
>> return x / 8;
>> }
>>
>>

>
> Using gcc 4.7 with -O3 on this program:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int div8_0(int x)
> {
> return x >> 3;
> }
>
> int div8_1(int x)
> {
> return x / 8;
> }
>
> int main(void)
> {
> int a = rand();
> int result0 = div8_0(a);
> int result1 = div8_1(a);
> printf("%d / 8 = %d, %d\n", a, result0, result1);
> return 0;
> }
>
> gave this output, which adds additional instructions for the division as
> opposed to the shift:
>
> C:\tmp>type test.s
> .file "test.c"
> .text
> .p2align 4,,15
> .globl div8_0
> .def div8_0; .scl 2; .type 32; .endef
> .seh_proc div8_0
> div8_0:
> .seh_endprologue
> movl %ecx, %eax
> sarl \$3, %eax
> ret
> .seh_endproc

So, this is really just eax = ecx >> 3.

> .p2align 4,,15
> .globl div8_1
> .def div8_1; .scl 2; .type 32; .endef
> .seh_proc div8_1
> div8_1:
> .seh_endprologue
> leal 7(%rcx), %eax
> testl %ecx, %ecx
> cmovns %ecx, %eax
> sarl \$3, %eax
> ret
> .seh_endproc

While this is:

eax = ecx + 7
if (ecx >= 0) eax = ecx
eax >>= 3

Minor difference! The added 7 is there to round towards zero in case ecx
is negative.

[...]

> With similar output for the MS compiler:
> ; Listing generated by Microsoft (R) Optimizing Compiler Version
> 16.00.40219.01
>
> include listing.inc
>
> INCLUDELIB LIBCMT
> INCLUDELIB OLDNAMES
>
> _DATA SEGMENT
> \$SG4006 DB '%d / 8 = %d, %d', 0aH, 00H
> _DATA ENDS
> PUBLIC div8_0
> ; Function compile flags: /Ogtpy
> _TEXT SEGMENT
> x\$ = 8
> div8_0 PROC
> ; File c:\tmp\test.c
> ; Line 6
> sar ecx, 3
> mov eax, ecx
> ; Line 7
> ret 0
> div8_0 ENDP

Again, this is eax = ecx >> 3.

> _TEXT ENDS
> PUBLIC div8_1
> ; Function compile flags: /Ogtpy
> _TEXT SEGMENT
> x\$ = 8
> div8_1 PROC
> ; Line 11
> mov eax, ecx
> cdq
> and edx, 7
> sar eax, 3
> ; Line 12
> ret 0
> div8_1 ENDP

Okay, that's some cool code... this is:

eax = ecx
if (eax < 0) eax += 7
eax >>= 3

Same meaning, different code.
[...]

> I didn't bother to time it, but it seems that state of the art compilers
> create faster code for the shift than for the division.
>

Ya think? Well then, let's roll out the optimization guide, shall we? My
optimization guide states that up to three decoder units can be used in
parallel, so we have:

div8_0:
Opcode DUu L
mov eax, ecx 1 1
shr eax, 3 1 1

Total Latency: 1 cycle

DUu = Decoder Units used
L = Latency

div8_1 (by gcc):

lea eax, [rcx+7] 1 1
test ecx, ecx 1 1
cmovns eax, ecx 1 1
shr eax, 3 1 1

Total Latency: 2 cycles

div8_1 (by MSVC):

mov eax, ecx 1 1
cdq 1 1
and edx, 7 1 1
shr eax, 3 1 1

Total Latency: 2 cycles

So yeah, div8_0 _is_ faster than div8_1.

> I guess that the bottom line is that if you need the last little
> smackerel of performance then you have better try all the alternatives
> and benchmark them.

You could simply tell the compiler that that number won't be negative,
in which case adding code for rounding towards zero won't be necessary.
But no, you have to do it the hard way.

HTH,
Markus

lawrence.jones@siemens.com
Guest
Posts: n/a

 02-07-2012
Nick Keighley <(E-Mail Removed)> wrote:
>
> You /can/ improve the performance of a program without measuring all
> the different bits. I know I've done it (tweaking a hash calculation
> resulted in a massive performance improvement). Sometimes educated
> guess-work /is/ good enough.

Sometimes, but not very often. I've done a lot of performance
improvement over the years and have found that the bottlenecks
are very rarely where you expect them to be.
--
Larry Jones

When I want an editorial, I'll ASK for it! -- Calvin

Dann Corbit
Guest
Posts: n/a

 02-07-2012
In article <(E-Mail Removed)>, (E-Mail Removed)
says...
>
> > I guess that the bottom line is that if you need the last little
> > smackerel of performance then you have better try all the alternatives
> > and benchmark them.

>
> You could simply tell the compiler that that number won't be negative,
> in which case adding code for rounding towards zero won't be necessary.
> But no, you have to do it the hard way.

The original poster had signed integers.
I have no idea if it matter if they are signed or not.
I do understand the opimizations available for unsigned integers, since
I write bitmap based chess programs for a hobby.

Phil Carmody
Guest
Posts: n/a

 02-21-2012
Keith Thompson <(E-Mail Removed)> writes:
> Ian Collins <(E-Mail Removed)> writes:
> [...]
> > Not really, register is only a hit and it is pretty well redundant these
> > days. Et the optimiser do its job.

>
> "hit" --> "hint"

Nah, it's a hit - like a drug. You think it's doing you good, but
you're just deluding yourself, and need help!

Phil
--
> I'd argue that there is much evidence for the existence of a God.

Pics or it didn't happen.
-- Tom (/. uid 822)