Velocity Reviews > Optimizing pow() function

# Optimizing pow() function

glen herrmannsfeldt
Guest
Posts: n/a

 04-22-2013
James Kuyper <(E-Mail Removed)> wrote:
> On 04/22/2013 02:34 PM, Edward A. Falk wrote:
>> <(E-Mail Removed)> wrote:
>>> I have a simple (16-bit) embedded platform and I have to
>>> make the following calculations.

>>> unsigned int
>>> func(unsigned int adc, unsigned int pwr, unsigned int pt, double exp) {
>>> return (unsigned int)(pow((double)adc, exp) *
>>> (double)pwr / pow((double)pt, exp));

>>> I noticed this calculation takes too much time, most probably
>>> for pow() function call and double conversion.

>>> I'm finding a way to simplify/optimize this function. Consider that adc and pt
>>> parameters are in the range 0..1023, exp is in the range 1.00-3.00, and pwr can be
>>> 50-10000.

>> Are you not worried about overflow?

>> Your embedded processor almost certainly does not have a floating-point
>> unit. This explains why your code is so slow.

>> I see all values are guaranteed to be integer values.

> The variable exp is double, and he didn't say that only integral values
> of exp are allowed. If that is the case, he can speed up the code a lot
> by using multiplications, rather than calls to pow().

If only a small number of different exp values occur, a look-up
table would be faster than pow. If I want an int result, I try to do
the whole calculation in integer arithmetic, even if it needs some
shifts to get the binary point in the right place.

Routines for fixed point scaled exp, log, and I believe pow are
in Knuth's "Metafont: The Program."

With only 10 bit input, I presume you don't need 53 bits of double.
With a little work doing scaling, it can all be done in fixed point
arithmetic. (Except that pow comes in as double, but either generate
the appropriate scaled fixed point value or send it in that way.)

-- glen

-- glen

Keith Thompson
Guest
Posts: n/a

 04-22-2013
"BartC" <(E-Mail Removed)> writes:
> "Keith Thompson" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> James Kuyper <(E-Mail Removed)> writes:
>>> On 04/22/2013 03:09 PM, BartC wrote:
>>> ...
>>>> Is there a version of pow() that uses float instead of double? ...
>>>
>>> C99 introduced powf(), which takes float arguments and returns a float
>>> value.

>>
>> But I wouldn't count on it being significantly faster than pow().

>
> Not on a desktop PC with floating point hardware. But the OP is using
> a 'simple' 16-bit processor.

I still wouldn't *count* on it being significantly faster. If it
matters, measure it.

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

glen herrmannsfeldt
Guest
Posts: n/a

 04-23-2013
Keith Thompson <(E-Mail Removed)> wrote:
> James Kuyper <(E-Mail Removed)> writes:
>> On 04/22/2013 03:09 PM, BartC wrote:

>>> Is there a version of pow() that uses float instead of double? ...

>> C99 introduced powf(), which takes float arguments and
>> returns a float value.

> But I wouldn't count on it being significantly faster than pow().

In a software only implementation, one might hope that it is
faster, but yes I wouldn't be surprised if it wasn't. It should
use logf() and expf(), which again might not be faster.

-- glen

pozz
Guest
Posts: n/a

 04-23-2013
Il 22/04/2013 20:22, Fred K ha scritto:
> On Monday, April 22, 2013 11:08:24 AM UTC-7, Rich Webb wrote:
>> On Mon, 22 Apr 2013 08:23:51 -0700 (PDT), (E-Mail Removed) wrote: >I have a simple (16-bit) embedded platform and I have to make the following calculations. > >unsigned int >func(unsigned int adc, unsigned int pwr, unsigned int pt, double exp) { > return (unsigned int)(pow((double)adc, exp) * > (double)pwr / pow((double)pt, exp)); >} > >I noticed this calculation takes too much time, most probably for pow() function call and double conversion. > >I'm finding a way to simplify/optimize this function. Consider that adc and pt parameters are in the range 0..1023, exp is in the range 1.00-3.00, and pwr can be 50-10000. > >Any suggestion to avoid pow() function call? You might try asking over in comp.arch.embedded, where this kind of an issue is more often addressed than here in a general C group. Depending on what you do with the result of func(), you might be able to get away with operating on the logarithms, applying Jack Crenshaw's bitlog() function as a fast approximation.

>
> If pt is indeed in the range 0...1023, you have to handle the extreme case where pt=0 somehow, since the result is infinity, unless adc is also zero, in which case the result is indeterminant.
>

I have to make another restriction. pt and adc parameters could be in
the range 0..1023, but I'm using pt=800.
At the moment I don't need other values for pt, but I'm interested to
generalize this calculation with different values for pt in a small
range around 800 (for example, 700-900).

pozz
Guest
Posts: n/a

 04-23-2013
Il 22/04/2013 17:42, Eric Sosman ha scritto:
> On 4/22/2013 11:23 AM, (E-Mail Removed) wrote:
>> I have a simple (16-bit) embedded platform and I have to make the
>> following calculations.
>>
>> unsigned int
>> func(unsigned int adc, unsigned int pwr, unsigned int pt, double exp) {
>> return (unsigned int)(pow((double)adc, exp) *
>> (double)pwr / pow((double)pt, exp));
>> }
>>
>> I noticed this calculation takes too much time, most probably for
>> pow() function call and double conversion.
>>
>> I'm finding a way to simplify/optimize this function. Consider that
>> adc and pt parameters are in the range 0..1023, exp is in the range
>> 1.00-3.00, and pwr can be 50-10000.
>>
>> Any suggestion to avoid pow() function call?

>
> Simple enough to avoid half of them:
>
> return pwr * pow( (double)adc / (double)pt, exp);
>
> This is "algebraically equivalent" to the original, although
> it probably won't give exactly the same result every time.
>

Sure, I have halfed the time in this way, but I'd like to reduce the
process time more, if possible.

pozz
Guest
Posts: n/a

 04-23-2013
Il 22/04/2013 20:49, James Kuyper ha scritto:
> On 04/22/2013 02:34 PM, Edward A. Falk wrote:
>> In article <(E-Mail Removed)>,
>> <(E-Mail Removed)> wrote:
>>> I have a simple (16-bit) embedded platform and I have to make the following calculations.
>>>
>>> unsigned int
>>> func(unsigned int adc, unsigned int pwr, unsigned int pt, double exp) {
>>> return (unsigned int)(pow((double)adc, exp) *
>>> (double)pwr / pow((double)pt, exp));
>>> }
>>>
>>> I noticed this calculation takes too much time, most probably for pow() function call
>>> and double conversion.
>>>
>>> I'm finding a way to simplify/optimize this function. Consider that adc and pt
>>> parameters are in the range 0..1023, exp is in the range 1.00-3.00, and pwr can be
>>> 50-10000.

>>
>> Are you not worried about overflow?
>>
>> Your embedded processor almost certainly does not have a floating-point
>> unit. This explains why your code is so slow.
>>
>> I see all values are guaranteed to be integer values.

>
> The variable exp is double, and he didn't say that only integral values
> of exp are allowed. If that is the case, he can speed up the code a lot
> by using multiplications, rather than calls to pow().

exp can be in the range 1.00 to 3.00 (it derives from an integer value
in the range 100-300 divided by 100).

> Note: 'exp' is a bad name for a variable, given that it hides the
> <math.h> function of the same name.
>

You are right, thank you for this observation.

pozz
Guest
Posts: n/a

 04-23-2013
Il 22/04/2013 21:09, BartC ha scritto:
> <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> I have a simple (16-bit) embedded platform and I have to make the
>> following calculations.
>>
>> unsigned int
>> func(unsigned int adc, unsigned int pwr, unsigned int pt, double exp) {
>> return (unsigned int)(pow((double)adc, exp) *
>> (double)pwr / pow((double)pt, exp));
>> }
>>
>> I noticed this calculation takes too much time, most probably for pow()
>> function call and double conversion.
>>
>> I'm finding a way to simplify/optimize this function. Consider that adc
>> and pt parameters are in the range 0..1023, exp is in the range
>> 1.00-3.00,
>> and pwr can be 50-10000.
>>
>> Any suggestion to avoid pow() function call?

>
> Is there a version of pow() that uses float instead of double? That
> might be
> a bit faster if floating point has to be done in software.

I will check.

> Can exp be anything from 1.00 to 3.00?

Yes, it is a sort of "fine calibration".

> How often will it be 1.0, 2.0, and
> 3.0? (These can be optimised.)

The same occurence of 1.03 or 2.43. Anyway, after exp value is
calibrated on the field during first stages of installation/setup, it
remains always the same.

> Will it also only be specified to two
> decimals? That suggests only 201 different values, although making use of a
> 200K lookup table indexed by adc/pt and 100*exp is probably not that
> practical, and the results will be approximate.

I already thought on this approach, but the look-up tables will be too big.

> Are the same adc/pt and exp combinations likely to recur frequently?It
> might be possible to cache the results of the calculations (in a table a
> lot
> smaller than 200K entries), but execution time might become non-linear.

In "real-time" systems, I don't like to have tasks that takes a
"non-linear" time to execute.

> If
> the combinations are predictable, then the table could be precalculated
> (it all depends how this func() function is used).
>

The exp parameter is a calibration value that is defined during first
stages. After that, it will remain always the same value.
I can pre-calculate the look-up table for that particular value. I need
a table of adc/pt combinations length (1023 integers, considering a
fixed pt value), so 2046 bytes... I don't have this space in RAM

Francois Grieu
Guest
Posts: n/a

 04-23-2013
Pozz wrote:

> I have a simple (16-bit) embedded platform and I have to make
> the following calculations.

unsigned int
func(unsigned int adc, unsigned int pwr, unsigned int pt, double exp) {
return (unsigned int)(pow((double)adc, exp) *
(double)pwr / pow((double)pt, exp));
}

> adc and pt parameters are in the range 0..1023,
> exp is in the range 1.00-3.00, and
> pwr can be 50-10000.

> I have to make another restriction. pt and adc parameters could be
> in the range 0..1023, but I'm using pt=800.
> At the moment I don't need other values for pt, but I'm interested to
> generalize this calculation with different values
> for pt in a small range around 800 (for example, 700-900).

Could you state what in the context is constant, or could be subject to
one-time precomputation, and what is variable (like, I guess, ADC?)

Also: does your environment has powf, which is to float what pow
is to double? It might be much faster.

If pt is constant we can save a little with

#define pt 800

unsigned func(unsigned adc, unsigned pwr, double exp) {
return pwr * pow( (1./pt) * adc, exp );
}

Similarly, if pt is a runtime parameter that seldom vary, we can do

double ginvpt;

// call when pt changes
void reparameterize( unsigned pt ) {
ginvpt = 1./pt;
}

unsigned func(unsigned adc, unsigned pwr, double exp) {
return pwr * pow( ginvpt * adc, exp );
}

We would do significantly better, including getting rid of pow, and perhaps
of double or float, if exp is constant (or seldom vary); and avoid the
final multiplication, and I guess double altogether, if pwr is constant
(or seldom vary).

Francois Grieu

pozzugno@gmail.com
Guest
Posts: n/a

 04-23-2013
On Monday, April 22, 2013 5:23:51 PM UTC+2, (E-Mail Removed) wrote:
> Any suggestion to avoid pow() function call?

Ok, it's better to explain what I'm trying to do.

I have a 10-bit ADC that acquires a voltage signal and convert it to a decimal number 0..1023.

This signal measures a power (Watt). The relation between voltage and poweris theorically quadratic:

P = k * x ^ 2

where k is a constant that depends on the system and x is the ADC result.

In order to calibrate k, I measure with an instrument the power at a nominal/reference value, for example I have Pr power measured ad Vr voltage that xr ADC points. So I can write:

P = Pr * (x / xr) ^ 2

Usually I make the reference measurement at xr=800pt.

The quadratic law above is simple to implement, but in practice I have to fine-tune this law, because at low power levels the voltage/power relation seems to deviate from the quadratic law (I don't know exactly why, maybe some non-linear behaviour of some components). So I changed the formula with:

P = Pr * (x / xr) ^ a

where a is in the range 1-3.

As you can see, after calibration/setup/installation/test steps I finally have three constants: Pr, xr and a. For simplicity xr could be considered constant at compile time (800pt). Pr and a is machine dependent.

So my problem is to calculate:

p = (x / 800) ^ a = exp(a * ln(x / 800))

I already tried to create a look-up table for ln(x) with x in the range 0.001 (about 1 / 800) to 1.3 (about 1023 / 800), leaving only the exp() function call. It works on desktop computer, but I couldn't find free space for the table (about 2600 bytes) in the embedded platform, both in RAM and in Flash.

The best would be to have a simple formula (polynomial function?) that approximate well enough the above equation at run-time, without too big tables.

BartC
Guest
Posts: n/a

 04-23-2013

"glen herrmannsfeldt" <(E-Mail Removed)> wrote in message
news:kl4r0b\$64f\$(E-Mail Removed)...
> Keith Thompson <(E-Mail Removed)> wrote:
>> James Kuyper <(E-Mail Removed)> writes:
>>> On 04/22/2013 03:09 PM, BartC wrote:

>
>>>> Is there a version of pow() that uses float instead of double? ...

>
>>> C99 introduced powf(), which takes float arguments and
>>> returns a float value.

>
>> But I wouldn't count on it being significantly faster than pow().

>
> In a software only implementation, one might hope that it is
> faster, but yes I wouldn't be surprised if it wasn't. It should
> use logf() and expf(), which again might not be faster.

For what reasons wouldn't it be faster to emulate only 32-bit floating point
arithmetic instead of 64-bits, on a 16-bit processor?

--
Bartc