Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Performance of hand-optimised assembly

Reply
Thread Tools

Performance of hand-optimised assembly

 
 
Wolfgang.Draxinger
Guest
Posts: n/a
 
      01-10-2012
On Fri, 23 Dec 2011 18:43:32 +0000
Ben Bacarisse <(E-Mail Removed)> wrote:

> I'm starting a new thread, but it is prompted by this remark from
> 88888 Dihedral <(E-Mail Removed)> about a short binary
> search function I posted:
>
> | Do you consider to convert 5 to 10 lines of C to assembly?
> | I did that 20 years ago on X86 cpu.
> | It was a piece of cake to gain the 20-40% speed required if paid
> | well.
>
> I've felt (it's only a feeling) that hand coding for speed (rather
> than to access features not available in C like extra wide multiplies
> and so on) was a thing of the past. But maybe I'm wrong. Is it
> still possible to get a significant speed-up over C by hand coding?


There are a few corner cases, where hand written assembly makes
compiled code see only the rear lights. But those are few and
far between. The problem is, that modern CPUs have so much side state,
that it's next to impossible to keep track of all the little details,
that may seem insignificant, but have a huge effect. Just the order of
independent instruction blocks can have a large impact.

However where hand written assembler is still useful is implementing
the inner loops of complex algorithms processing (very) large
datasets. Such algorithms usually involve only shuffling numbers around
in a strict layout, so it's easy to reason about this kind of task and
find patterns, that a compiler won't. And it allows to exploit
peculiarities of the used instruction set, a compiler never could do.
Like the one described by Dark Shikari here:

http://stackoverflow.com/a/98251/524368

> How much depends on the processor? How good are the modern
> optimising compilers that would have to be beaten?


Depends on the task at hand. If it's just about the normal execution
path of a program without many loops the compiler will most likely win,
because will put the code in a lot of configurations through some
simulation and count "clock cycles". Also there's s lot of heuristics
in it. The other case are aforementioned processing loops. You as the
algorithm, writer know about the dependencies in data access, know
where pointers can possibly point and where not (the compiler doesn't),
which allows to gain significant performance boosts in such situations.


Wolfgang

 
Reply With Quote
 
 
 
 
Tim Rentsch
Guest
Posts: n/a
 
      01-26-2012
Ben Bacarisse <(E-Mail Removed)> writes:

> James Harris <(E-Mail Removed)> writes:
>
>> On Dec 25, 12:07 am, Ben Bacarisse <(E-Mail Removed)> wrote:
>>> James Harris <(E-Mail Removed)> writes:
>>> > On Dec 23, 6:43 pm, Ben Bacarisse <(E-Mail Removed)> wrote:

>>
>> ...
>>
>>> >> size_t binsearch(int a[], size_t length, int key)
>>>
>>> > Why use size_t instead of unsigned int or unsigned long int,
>>> > especially as you add it to sum in the test harness which is long int?
>>> > Why not use unsigned for sum in the test harness?
>>>
>>> int is better yet because the output will then be same across machines.
>>> Using *any* unsigned type means the "not found" -1 return becomes a
>>> value that can vary from implementation to implementation.

>>
>> You mean -1 may have a strange value on a 1s-complement machine? On a
>> 2s-complement machine wouldn't it always be all bits set even if
>> assigned to an unsigned variable?

>
> No, the machine representation of -1 has no bearing on the result of the
> C code[1]. What I mean is just that a return of -1 has a value of
> SIZE_MAX and that varies from implementation to implementation. It's a
> good value to return in a real binary search because a valid index can't
> ever be (quite) that large -- the largest C-supported object of SIZE_MAX
> bytes can have indexes no higher than SIZE_MAX - 1.
>
> However, it's not a good return for a portable test program like this
> because the sum of the returns will vary from machine to machine.
> Alternatively, changing the simple sum to something like this:
>
> size_t idx = binsearch(data, length, key)
> sum += idx == (size_t)-1 ? -1 : idx;
>
> could have made the output more portable. [snip rest]


Presumably you meant something different for that last
line of code, since in most cases the effect would be
the same as

sum += idx;

 
Reply With Quote
 
 
 
 
Tim Rentsch
Guest
Posts: n/a
 
      01-26-2012
Ben Bacarisse <(E-Mail Removed)> writes:

> [snip] There was a thread here a while back about
> whether an object can be bigger than SIZE_MAX. It probably can, so
> the compiler would not be able to make the assumption [that an
> index into an array can never exceed some threshold].


Since it is the implementation (ie, the compiler) that is making
the decision (as to whether it ever creates such objects), it is
perfectly free to assume that it made whatever choice it made!
 
Reply With Quote
 
Tim Rentsch
Guest
Posts: n/a
 
      01-26-2012
jacob navia <(E-Mail Removed)> writes:

> Le 29/12/11 02:12, Phil Carmody a @C3{A9}crit :
>>
>> Stop right there. Why can they not have their top bits set?
>> Are you presuming I can't have an array of over 2^31 ints?

>
> The original program uses "int" to index, and "int" under windows
> linux and mac is at most 32 bits. This is not a problem of assembly
> but of the original C program
>
>> Can you justify that presumption by citing a relevant C standard?

>
> Yes. In the parafraph "Translation limits" the standard says that
> an implemetation is not required to support objects with
> more than 65535 bytes.


The depth of understanding reflected in this statement
is truly awe inspiring.
 
Reply With Quote
 
Tim Rentsch
Guest
Posts: n/a
 
      01-26-2012
"BartC" <(E-Mail Removed)> writes:

> "Phil Carmody" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> James Harris <(E-Mail Removed)> writes:

>
>>> Understood. When I came to code this in 32-bit assembler I realised
>>> that, since high and low can never have their top bits set,

>>
>> Stop right there. Why can they not have their top bits set?
>> Are you presuming I can't have an array of over 2^31 ints?
>> Can you justify that presumption by citing a relevant C standard?
>> Have you failed to notice that there are x86-based boxes with
>> 256GB RAM on the market nowadays, and 8GiB arrays shouldn't
>> be considered unexpected any more.

>
> The address calculation for int-arrays usually involves multiplying by
> 2, 4 or 8.


Conceptually, but the actual semantics isn't defined in
terms of multiplication; it only says that the resulting
pointer must point to the appropriate array element.
(Incidental note: in fact the Standard even uses the
word "conceptually" in a footnote in the section on
array indexing.)

> With an index having the top bit set, that will overflow
> the size_t range. Whatever size_t happens to be.


The Standard doesn't contain any statement about the type in
which array indexing is carried out; the defining expressions
are mathematical, not given in terms of C operators. As long as
the array object referred to is big enough, array indexing has to
work even if the index is less than PTRDIFF_MIN or greater than
PTRDIFF_MAX or SIZE_MAX.
 
Reply With Quote
 
Ben Bacarisse
Guest
Posts: n/a
 
      01-26-2012
Tim Rentsch <(E-Mail Removed)> writes:

> Ben Bacarisse <(E-Mail Removed)> writes:

<snip>
>> size_t idx = binsearch(data, length, key)
>> sum += idx == (size_t)-1 ? -1 : idx;

<snip>
> Presumably you meant something different for that last
> line of code, since in most cases the effect would be
> the same as
>
> sum += idx;


Yes, I intended something like

if (idx == (size_t)-1)
sum -= 1;
else sum += idx;

but turned it into an entirely not equivalent conditional expression.

--
Ben.
 
Reply With Quote
 
Ben Bacarisse
Guest
Posts: n/a
 
      01-26-2012
Tim Rentsch <(E-Mail Removed)> writes:

> Ben Bacarisse <(E-Mail Removed)> writes:
>
>> [snip] There was a thread here a while back about
>> whether an object can be bigger than SIZE_MAX. It probably can, so
>> the compiler would not be able to make the assumption [that an
>> index into an array can never exceed some threshold].

>
> Since it is the implementation (ie, the compiler) that is making
> the decision (as to whether it ever creates such objects), it is
> perfectly free to assume that it made whatever choice it made!


I used the word "compiler" deliberately. Where there is a unified
implementation, then I agree that one part (the optimiser) can assume
facts about other parts (for example the C library), but that's not been
my experience. Most compilers have to work without knowing what
implementation they will end up part of.

--
Ben.
 
Reply With Quote
 
Tim Rentsch
Guest
Posts: n/a
 
      02-01-2012
Ben Bacarisse <(E-Mail Removed)> writes:

> Tim Rentsch <(E-Mail Removed)> writes:
>
>> Ben Bacarisse <(E-Mail Removed)> writes:
>>
>>> [snip] There was a thread here a while back about
>>> whether an object can be bigger than SIZE_MAX. It probably can, so
>>> the compiler would not be able to make the assumption [that an
>>> index into an array can never exceed some threshold].

>>
>> Since it is the implementation (ie, the compiler) that is making
>> the decision (as to whether it ever creates such objects), it is
>> perfectly free to assume that it made whatever choice it made!

>
> I used the word "compiler" deliberately.


HA! Well aren't you the crafty old dodger?

> Where there is a unified
> implementation, then I agree that one part (the optimiser) can assume
> facts about other parts (for example the C library), but that's not been
> my experience. Most compilers have to work without knowing what
> implementation they will end up part of.


Right. I assumed you were talking about a constraint imposed by
the Standard, but actually you were talking about a constraint
imposed by a desire to be included in more than one implementation,
which is rather a different kettle of fish.
 
Reply With Quote
 
Phil Carmody
Guest
Posts: n/a
 
      02-12-2012
Tim Rentsch <(E-Mail Removed)> writes:
> jacob navia <(E-Mail Removed)> writes:
>
> > Le 29/12/11 02:12, Phil Carmody a @C3{A9}crit :
> >>
> >> Stop right there. Why can they not have their top bits set?
> >> Are you presuming I can't have an array of over 2^31 ints?

> >
> > The original program uses "int" to index, and "int" under windows
> > linux and mac is at most 32 bits. This is not a problem of assembly
> > but of the original C program
> >
> >> Can you justify that presumption by citing a relevant C standard?

> >
> > Yes. In the parafraph "Translation limits" the standard says that
> > an implemetation is not required to support objects with
> > more than 65535 bytes.

>
> The depth of understanding reflected in this statement
> is truly awe inspiring.


If we can see depth being reflected, then we must be below
that statement.

Phil
--
Unix is simple. It just takes a genius to understand its simplicity
-- Dennis Ritchie (1941-2011), Unix Co-Creator
 
Reply With Quote
 
Tim Rentsch
Guest
Posts: n/a
 
      03-08-2012
Phil Carmody <(E-Mail Removed)> writes:

> Tim Rentsch <(E-Mail Removed)> writes:
>> jacob navia <(E-Mail Removed)> writes:
>>
>> > Le 29/12/11 02:12, Phil Carmody a @C3{A9}crit :
>> >>
>> >> Stop right there. Why can they not have their top bits set?
>> >> Are you presuming I can't have an array of over 2^31 ints?
>> >
>> > The original program uses "int" to index, and "int" under windows
>> > linux and mac is at most 32 bits. This is not a problem of assembly
>> > but of the original C program
>> >
>> >> Can you justify that presumption by citing a relevant C standard?
>> >
>> > Yes. In the parafraph "Translation limits" the standard says that
>> > an implemetation is not required to support objects with
>> > more than 65535 bytes.

>>
>> The depth of understanding reflected in this statement
>> is truly awe inspiring.

>
> If we can see depth being reflected, then we must be below
> that statement.


Stop it, you're making my head hurt.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
adding assembly to windows\assembly through bat file Grant Merwitz ASP .Net 3 09-15-2005 11:40 AM
Assembly's manifest definition does not match the assembly reference. Horatiu Margavan via .NET 247 ASP .Net 0 08-30-2004 04:14 PM
ASP.NET 2.0: What is the namespace and assembly name of generated assembly SA ASP .Net 0 08-09-2004 05:09 PM
Referencing assembly from GAC using @assembly fails Brent ASP .Net 1 01-23-2004 08:23 PM
can a strongly named assembly reference a regular assembly? Prasanna Padmanabhan ASP .Net 1 11-19-2003 06:21 AM



Advertisments