Re: Code optimisation
> Fyi, test[i] does indeed contain either 0 or 1.
> I did think about gathering the non-zero terms, but I don't believe this
> would be efficient because in reality there are several tests and they
> change during the iteration.
Rather than building a test array first and then gathering the non-zero terms,
maybe it is also possible to gather the non-zero terms directly, without
filling a test array?
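To make the suggestion concrete, here is a minimal sketch of gathering directly. It assumes the decision behind test[i] can be written as a predicate on the input; is_active(), gather_active() and accumulate_gathered() are hypothetical names, and the "value > 0.0" condition is only a placeholder for the real test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predicate standing in for whatever decides test[i].
static bool is_active(double value) { return value > 0.0; }

// Collect the indices that pass the test, without an intermediate 0/1 array.
std::vector<std::size_t> gather_active(const std::vector<double>& in) {
    std::vector<std::size_t> idx;
    idx.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        if (is_active(in[i])) idx.push_back(i);
    return idx;
}

// Accumulate only the gathered elements; this loop contains no per-element test.
void accumulate_gathered(const std::vector<std::size_t>& idx,
                         const std::vector<double>& in,
                         std::vector<double>& out) {
    assert(in.size() == out.size());
    for (std::size_t k = 0; k < idx.size(); ++k)
        out[idx[k]] += in[idx[k]];
}
```

Whether this pays off depends on how often the tests change: if the index list has to be rebuilt on every pass, the gather step may cost as much as it saves.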
> I like the idea of putting
> out[i] += test[i] ? in[i] : 0;
> but I suspect (as one response noted) that most compilers will do it
> like this anyway.
The compiler won't care how you write it, but you do. So make the code as
clear as possible.
> As a matter of interest, are compilers smart enough not to bother adding
> zero - or is it faster to add than to test?
Only if the compiler can determine at compile time that it will always be
adding zero; e.g. a += 0; will most likely be optimized away. Integer
addition is very fast (often the fastest instruction) on all processors I
know of. Floating-point addition is pretty fast on many modern processors as
well. Branches, however, seem to get slower with every new processor
generation. In most, if not all, cases testing for zero before doing an add
would actually slow things down.
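The two alternatives being compared can be sketched as follows. The function names and the choice of int for test and double for the data are assumptions, since the thread does not state the element types. Which variant is faster depends on the compiler and CPU, so this is something to measure, not a guaranteed result.

```cpp
#include <cassert>
#include <cstddef>

// Variant A: test before adding (one branch per element, unless the compiler removes it).
void add_with_branch(const int* test, const double* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (test[i]) out[i] += in[i];
}

// Variant B: always add; test[i] is 0 or 1, so inactive elements add zero.
void add_unconditional(const int* test, const double* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        assert(test[i] == 0 || test[i] == 1);  // the multiply trick relies on this
        out[i] += test[i] * in[i];
    }
}
```

For finite inputs both variants leave the same values in out. They differ when in[i] is infinite or NaN and test[i] is 0, because 0 times infinity is NaN in floating point.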
Rather than tweaking your code here and there, I think your best bet is to
reconsider the algorithm you are using. My experience is that tweaking code
typically improves performance by no more than 10%, if you are lucky. A
better algorithm can often improve performance many times over, and
sometimes by more than an order of magnitude. If you are working on large
data sets, you might also want to consider the memory access patterns,
though I would take a good look at the algorithm first, as optimizing memory
accesses can get ugly.
Peter van Merkerk
Re: Code optimisation
Many modern CPU architectures now provide predicate registers, so that the
code:
if (test[i]) out[i] = 4*in[i];
else out[i] = 0.0;
does not contain any branches and does not require any pipeline flushes.
This turns into something like (pseudo code obviously):
PredReg1 = CMP test[i]
MOVE-IF-TRUE PredReg1 out[i] <-- 4*in[i]
MOVE-IF-FALSE PredReg1 out[i] <-- 0.0
The 2 MOVE-IF instructions are often executed in parallel.
Writing it instead as
out[i] = test[i]*(4*in[i]);
might introduce a multiply instruction, which might actually slow things down
on some CPUs. So do as Peter van Merkerk says: "Rather than tweaking your
code here and there, I think your best bet is to reconsider the algorithm
you are using."
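For comparison, the select form and the multiply form might be written like this in C++. The function names and element types are assumptions. Nothing in the language guarantees that the select form compiles to predicated or conditional-move instructions; that is up to the compiler and target, so checking the generated assembly is the only way to be sure.

```cpp
#include <cassert>
#include <cstddef>

// Select form: each element gets either 4*in[i] or 0.0.
// A compiler may (but need not) turn this into a predicated move or select.
void scale_or_zero_select(const int* test, const double* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = test[i] ? 4.0 * in[i] : 0.0;
}

// Multiply form: relies on test[i] being exactly 0 or 1.
void scale_or_zero_multiply(const int* test, const double* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        assert(test[i] == 0 || test[i] == 1);
        out[i] = test[i] * (4.0 * in[i]);
    }
}
```

Both produce equal results for finite inputs, so the choice between them can be made on readability first and on measurements second.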
Rapid Realm Technology, Inc.