Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Fastest way to get minimum of four integer values?

Reply
Thread Tools

Fastest way to get minimum of four integer values?

 
 
Daniel Pitts
Guest
Posts: n/a
 
      04-06-2008
http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
> Actually the calculations are mainly table look-ups. It looks like
> this:
>
> int thismin = 100
>
> <...>
>
> <loop starts>
>
> int cell = (acceptMatrix[false][source[i-1]][target[1]]) ?
> distmatrix[i-1][1] : distmatrix[i-1][1]+1 ;
> const int above (distmatrix[i-1][2]+1);
> if (cell>above) cell=above;
> const int left (distmatrix[i][1]+1);
> if (cell>left) cell=left;
> const int trans (
> distmatrix[i-2][0] + 1 +
> (!acceptMatrix[false][source[i-2]][target[1]]) +
> (!acceptMatrix[true][source[i-1]][target[0]])
> );
> if (cell>trans) cell=trans;
> distmatrix[i][2]=cell;
> if (cell<thismin) thismin=cell;
>
> <loop ends>

Array lookups aren't terribly expensive, but they do have a cost. if
you can save a reference to distmatrix[i-1] (since its used in a few
places), that might shave a nanosecond or two. If you have nested
loops, look for calculations or references you can do outside of the
inner most loop (or outside of both loops).

For instance, if "i" is from your outer loop, you could "precalculate"
distmatrix[i], distmatrix[i-1], dismatrix[i-2], source[i-1],
source[i-2], etc...

Chances are, you'll have more luck finding a better algorithm than
you'll have shaving significant time off by optimizing this algorithm.
>
> distmatrix is a 2-D array of int.
>
> acceptmatrix is a 3D array of bool.
>
> i and j are unsigned int loop indexes
>
> source and target are const unsigned char * const
>
> An additional question is whether the use of named const temporaries
> is detrimental. Should I perhaps instead hand-inline the table
> lookups, and let the compiler find the common expressions and
> optimize, or...

It shouldn't really hurt, they would probably compile to the same thing.
>
> I don't know how smart the compiler is, but I suspect "not too
> bright". It's Borland C++ Builder version 6, so a 6 years old compiler
> at best.
>
>> Also: Every time somebody starts asking about "make it faster"
>> I have a standard response I trot out. Where's your stop-watch?
>> Where's your performance spec? First measure how much time
>> various parts of the code take to run. Then look to your spec
>> to see if that is acceptable. If you've made the spec, relax!
>> And don't sweat the stuff that takes a small part of the time.
>> Socks

>
> The "stop-watch" is outside this algorithm, but currently this code
> fragment dominates execution time. Changing the calculation of "trans"
> from an if/then based structure to the current gobbledygook decreased
> run time by approximately 15%, probably by eliminating a few branches
> and a temporary or two. I compare the output to the original
> implementation for a large dataset, so I am sure that behaviour was
> not changed in any obvious way.I am also pursuing other ways to
> decrease the data set that the algorithm iterates over, but that's an
> entirely different discussion.
>
> Of course I could just get some hotshot to hand-assy it for me, but
> then I would be locked-in to that particular "magic" implementation.
> Further, I doubt that solution would be as easy to hand-unroll as the
> C/C++ version is.
>
> Anders



--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
 
Reply With Quote
 
 
 
 
Martin York
Guest
Posts: n/a
 
      04-07-2008
On Apr 6, 2:43 pm, Daniel Pitts
<(E-Mail Removed)> wrote:

> Array lookups aren't terribly expensive, but they do have a cost. if
> you can save a reference to distmatrix[i-1] (since its used in a few
> places), that might shave a nanosecond or two.
>
> Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>


Not sure I buy that. Could be wrong!

Of course we are getting down to machine specific behavior at this
point. But accessing a memory location via an array or reference would
be just as expensive as most memory access instructions (Its been a
while so things may have changed) allow a major pointer and and offset
as part of the same instruction.

Actually the expensive bit is not in the memory retrieval but in
loading the memory from main memory into the local cache. If your code
is getting cache misses the processor stall time for this is many
orders of magnitude greater than any theoretical difference between an
array verses reference read.

As a result I think any changes to code for a perceived code
performance increase must be significant before you consider change to
the code that makes it less readable (read maintainable) as any
performance increases on a particular data size may evaporate when
used on a new data size because of changes in cache traversal/usage.

 
Reply With Quote
 
 
 
 
ppi
Guest
Posts: n/a
 
      04-07-2008
On Apr 1, 4:42 am, (E-Mail Removed) wrote:
> Hi,
>
> I have a tight loop where I basically do this over and over (millions
> of times):
>
> int cell = <calculation>
> const int above = <calculation>
> if (cell > above) cell = above;
> const int below = <calculation>
> if (cell > below) cell = below;
> const int trans = <calculation>
> if (cell > trans) cell = trans;
>
> Is there a faster way to do this?
>
> Sincerely,
> Anders


you may get better results by helping the compiler filling up the
pipeline:

int a = cell > above ? above : cell;
int b = below > trans ? trans : below;
cell = a < b ? b : a;

ideally 'a' and can 'b' be computed efficiently: they are both
independent and cached in registers.
By doing so you provide the compiler with more information regarding
the computation, namely that his free to compute a or b whenever it
wants, in this order or not.
Keeping on re-assigning the same variable again and again just make
the program instructions code stuck in the pipeline.
If you can write something like the previous code, you should notice a
speed-up.
This is nice feature off powerpc and AMD/Intel pentium processors.

You may be interested by:
- http://www.agner.org/optimize/optimizing_cpp.pdf
- http://developer.amd.com/TechnicalAr...162004127.aspx
- http://developer.amd.com/TechnicalAr...212004126.aspx

Even if the last 2 links are from the amd website, they are still
relevant for general optimization.

Unfortunately this means that you do have a good compiler and that the
code/algorithm you use can actually make this possible.

Hope it will help,
-- paulo
 
Reply With Quote
 
James Kanze
Guest
Posts: n/a
 
      04-08-2008
On Apr 7, 2:15 am, Martin York <(E-Mail Removed)> wrote:
> On Apr 6, 2:43 pm, Daniel Pitts


> <(E-Mail Removed)> wrote:
> > Array lookups aren't terribly expensive, but they do have a cost. if
> > you can save a reference to distmatrix[i-1] (since its used in a few
> > places), that might shave a nanosecond or two.


> > Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>


> Not sure I buy that. Could be wrong!


More to the point (and this applies to the last response by ppi
as well, concerning pipeline stall): the compiler knows this
better than you, and turning on optimization will normally
result in the best solution here. Compilers have been caching
array accesses and common subexpressions for several decades
now, and all modern compilers know about pipelines, memory
caches, etc., and take them into account when optimizing. All
such micro-optimizations do in source code is make it more
convoluted---more difficult for the human reader to understand,
and more difficult for the compiler to optimize.

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
Reply With Quote
 
anders.johansen@gmail.com
Guest
Posts: n/a
 
      04-09-2008
Hi,

Well, I can tell you that manual loop unrolling for one special case
gave a massive speedup, and it now runs in about 25% of the time it
used to. I may attempt the approach of helping the compiler bound the
life of temporaries by using curlies, but initial results indicate
that this is detrimental to performance. Next I will probably try to
eliminate named temporaries, on the assumption that the compiler is
able to identify the references easier withour the aliassing
introduced by named temps.

Still, so far I have gotten some nice speedups. They are not related
to the issue I brought up here, but the solutions are in many cases
inspired by the discussion, so you have all helped me in various
ways

Thank you all!

Anders
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
New Canon EIS mirrorless system - Four Thirds, but not Four Thirds! Bruce Digital Photography 31 09-25-2010 05:38 AM
Sixty Four Bit Integer Seebs C Programming 26 10-03-2009 10:04 AM
write a four byte integer to a memory region allocated as (char) rlc C Programming 10 07-01-2009 05:30 AM
Fastest way to convert a byte of integer into a list Godzilla Python 15 07-15-2007 02:37 AM
Fastest 5 mp Digital Camera ? Fastest 4 mp Digital Camera? photoguysept102004@yahoo.com Digital Photography 6 10-28-2004 11:33 AM



Advertisments