Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Performance of hand-optimised assembly

Reply
Thread Tools

Performance of hand-optimised assembly

 
 
BartC
Guest
Posts: n/a
 
      12-24-2011


"88888 Dihedral" <(E-Mail Removed)> wrote in message
news:11437228.11.1324701159954.JavaMail.geo-discussion-forums@prix23...
> http://ibiblio.org/gferg/ldp/GCC-Inl...bly-HOWTO.html
>
> Check the site to combine C and assembly for GCC.


Having to use that ghastly AT&T syntax is an argument *against* using
assembly, even if it is faster...

--
Bartc

 
Reply With Quote
 
 
 
 
James Harris
Guest
Posts: n/a
 
      12-24-2011
On Dec 24, 1:09*pm, "BartC" <(E-Mail Removed)> wrote:

....

> Having to use that ghastly AT&T syntax is an argument *against* using
> assembly, even if it is faster...


Agreed!

Can you give me any pointers on how to link gcc and nasm or another
separate assembler? I've found a few web references with useful tips
but IIRC one of them was your first attempt at this a few years ago.
Maybe you found some more info since then. I know I can do it but
maybe there are good ways and bad ways.

James
 
Reply With Quote
 
 
 
 
BartC
Guest
Posts: n/a
 
      12-24-2011
"Ben Bacarisse" <(E-Mail Removed)> wrote in message
news:0.393ac02b1554cc6fda1d.20111224025213GMT.87eh (E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:


> Well, it's not bad that you can beat 2 of the three compilers.


I've tried one more compiler, and the results, now normalised and for the
fastest code (and using N=2500 for more accuracy), are:

gcc -O3 1.00 (using inlining)
My asm 1.12 (original without inlining or unrolling)
Pelles C -Ot 1.26
DMC -o 1.35
lcc-win32 -O 1.64

(Adjusted for loop overheads)

So while modern compilers might be ultra-sophisticated, most of these
aren't! Admittedly they are all free downloads..

--
Bartc

 
Reply With Quote
 
James Harris
Guest
Posts: n/a
 
      12-24-2011
On Dec 23, 6:43*pm, Ben Bacarisse <(E-Mail Removed)> wrote:

....

I don't understand some of your code. Queries below.

> Here is a test program:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> size_t binsearch(int a[], size_t length, int key)


Why use size_t instead of unsigned int or unsigned long int,
especially as you add it to sum in the test harness which is long int?
Why not use unsigned for sum in the test harness?

> {
> * * *size_t low = 0, high = length;


Why not high = length - 1?

> * * *while (high > low) {
> * * * * * size_t middle = low + (high - low)/2;


Why not middle = (high + low) / 2?

> * * * * * if (key == a[middle])
> * * * * * * * *return middle;
> * * * * * else if (key > a[middle])
> * * * * * * * *low = middle + 1;
> * * * * * else high = middle;


Why not high = middle - 1?

> * * *}
> * * *return -1;
>
> }


The reason for the queries is I wrote an asm binsearch but when
comparing it with your code realised that you had some differences in
the approach.

James
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      12-24-2011


"James Harris" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On Dec 23, 6:43 pm, Ben Bacarisse <(E-Mail Removed)> wrote:


>> while (high > low) {
>> size_t middle = low + (high - low)/2;

>
> Why not middle = (high + low) / 2?


That form can cause an overflow when the numbers are high. Otherwise
(high+low)/2 is marginally faster.

--
Bartc
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      12-24-2011

> Can you give me any pointers on how to link gcc and nasm or another
> separate assembler? I've found a few web references with useful tips
> but IIRC one of them was your first attempt at this a few years ago.
> Maybe you found some more info since then. I know I can do it but
> maybe there are good ways and bad ways.


I haven't made any more attempts since then. You still have the problem in
that the assembly code with the nice syntax has to be in a separate module,
not inline.

To combine, you probably know how; it means running Nasm with the right
object format (nasm -fwin32 under Windows), and presenting the resulting
..obj file to gcc in it's list of list of files.

Oh, and you have to put "_" in front any exported names in the asm module,
for some reason that I still don't know.

And you have to sort out calling conventions between the two languages (C
calling Asm, and Asm calling C).

So it's still a bit fiddly.

(I normally use assembly as inline code within my own language. The
advantages are the use of Intel/Nasm syntax, with no need to write each line
as a string literal either, and being able to write substantial amounts of
code within a high-level language framework: functions, declarations, local
namespaces in each function, etc.

Also interaction with the high-level code is simple (optimisation doesn't
get in the way, for a start!). A lot of other issues are taken care of too.

I would call that a _good_ and easy way of using assembly. If anyone knows
of a C compiler that offers the same, then I might start using C a lot
more.)

--
bartc

 
Reply With Quote
 
Jens Gustedt
Guest
Posts: n/a
 
      12-24-2011
Hello,

Am 12/24/2011 01:42 PM, schrieb BartC:
> Yes. Either using -O/-o for the first two, or not using it; and using -O3
> or -O0 for gcc.
> ...
> So for this example, I don't think gcc was demonstrating any special
> knowledge of my processor; maybe there are processor-specific switches that
> could have been used.


with modern gcc you can always use -march=native to optimize for the
processor on which the compiler is running

Jens

 
Reply With Quote
 
James Harris
Guest
Posts: n/a
 
      12-24-2011
On Dec 24, 3:30*pm, "BartC" <(E-Mail Removed)> wrote:

....

> To combine, you probably know how; it means running Nasm with the right
> object format (nasm -fwin32 under Windows), and presenting the resulting
> .obj file to gcc in it's list of list of files.
>
> Oh, and you have to put "_" in front any exported names in the asm module,
> for some reason that I still don't know.
>
> And you have to sort out calling conventions between the two languages (C
> calling Asm, and Asm calling C).
>
> So it's still a bit fiddly.


I'm aware there are various options for name mangling and parameter
passing. I'm just not sure which of the many are best, if there is
such a thing. I asked in response to Ben's initial post as you had got
a test working but I'm thinking this is too big an issue for this
subthread and needs a separate post, and a crosspost at that!
Something to come back to.

James
 
Reply With Quote
 
James Dow Allen
Guest
Posts: n/a
 
      12-24-2011
Ben Bacarisse <(E-Mail Removed)> might have writ, in
news:0.d0ecd8f477637819bed2.20111223184332GMT.87pq (E-Mail Removed):

> I'm starting a new thread, but it is prompted by this remark...
>| Do you consider to convert 5 to 10 lines of C to assembly?
>| I did that 20 years ago on X86 cpu.
>| It was a piece of cake to gain the 20-40% speed required if paid
>| well.
>
> I've felt (it's only a feeling) that hand coding for speed (rather
> than to access features not available in C like extra wide multiplies
> and so on) was a thing of the past. But maybe I'm wrong. Is it still
> possible to get a significant speed-up over C by hand coding? How
> much depends on the processor?


Don't know, but 20 years ago IIRC, good speedups should have been
common.

I did an experiment just yesterday on a snippet posted in
comp.programming. I'll report my results here. The key line of code
seems "wonderful" in its right: A bit-twiddler to replace a word with N
bits set to the next such word in lexicographical order.

Dave <(E-Mail Removed)> wrote, in
news:(E-Mail Removed):

> On Dec 17, 10:05*am, Andrew Tomazos <(E-Mail Removed)> wrote:
>> I want to loop over every number from 0 to (2^m-1) inclusive whos
>> binary representation contains exactly n bits.

>
> Try something like this:
>
> i = (1 << n) -1;
> while( !(i >> m) )
> {
> do_something_with(i);
> i = (i+(i&(-i)))|(((i^(i+(i&(-i))))>>2)/(i&(-i)));
> }
>
> The first "i =" statement sets i to the smallest number with n 1-
> bits.
>
> The second "i =" statement is some bit manipulation "magic" that sets
> i to the next sequential number that has the same number of 1-bits.
> For an explanation, see
> http://groups.google.com/group/progr...05b3c8b4042d5b
> , which points to a PowerPoint presentation that gives a
> multi-statement expression of the one-liner that I constructed from
> it.
>
> The "while" statement loops as long as i remains less than 2^m.
>
> Dave


With gcc -O{2,4} Pentium I got speedup over
> i = (i+(i&(-i)))|(((i^(i+(i&(-i))))>>2)/(i&(-i)));

just by factoring out common subexpressions and assigning them to temp
variables. (One might think gcc could do that; perhaps this line is
just too complicated.)

Buried in the expression is a '/' operator which, of course, is time
consuming. I got a big speedup replacing that divide.
while (!(lb & 15)) i >>= 4, lb >>= 4;
while (!(lb & 1)) i >>= 1, lb >>= 1;
Pentium has an OPcode, I think, accessible from gcc with `fsf()' which
seems like it *might* allow better speedup, but what I tried was slower
than the while's.

Note that neither of these speedup ideas (while or fsf()) required using
assembler instead of C. IIRC it's not unusual to be able to speedup C
by catering to the C compiler's code generator, often in non-obvious
ways.

James

 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      12-24-2011
"James Harris" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On Dec 24, 3:30 pm, "BartC" <(E-Mail Removed)> wrote:


>> And you have to sort out calling conventions between the two languages (C
>> calling Asm, and Asm calling C).
>>
>> So it's still a bit fiddly.

>
> I'm aware there are various options for name mangling and parameter
> passing. I'm just not sure which of the many are best, if there is
> such a thing.


If you're interfacing with C, the possibilities for parameter passing are
limited, usually you have to go along with what the specific C compiler
does, or whatever foreign interfaces are supported.

Apparently for x86-64, parameters would be passed in registers, which is
best for this example, but on x86-32, the stack seems to be favoured. In any
case the 'winning entry' (gcc -O3) bypassed the whole issue by inlining the
code (which I considered cheating).

> I asked in response to Ben's initial post as you had got
> a test working


The test proposed actually would have worked well written in a separate
module, but I couldn't remember all the fiddly details I needed to know, and
used inline assembly in a language I was familiar with.

> but I'm thinking this is too big an issue for this
> subthread and needs a separate post, and a crosspost at that!
> Something to come back to.


The bigger issue is not so much getting these things to work, as whether it
is still worthwhile using assembly at all, in these days of
'super-intelligent' compilers (which I haven't really come across myself). I
think that was the point of the thread.

--
Bartc

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
adding assembly to windows\assembly through bat file Grant Merwitz ASP .Net 3 09-15-2005 11:40 AM
Assembly's manifest definition does not match the assembly reference. Horatiu Margavan via .NET 247 ASP .Net 0 08-30-2004 04:14 PM
ASP.NET 2.0: What is the namespace and assembly name of generated assembly SA ASP .Net 0 08-09-2004 05:09 PM
Referencing assembly from GAC using @assembly fails Brent ASP .Net 1 01-23-2004 08:23 PM
can a strongly named assembly reference a regular assembly? Prasanna Padmanabhan ASP .Net 1 11-19-2003 06:21 AM



Advertisments