Velocity Reviews > VHDL > fixed point math algorithms

fixed point math algorithms

sanborne
Guest
Posts: n/a

 12-04-2008
I have another question related to fixed point.

I wrote a function in MATLAB that will take an arbitrary yet well
behaved function and generate an if-then-elseif tree that can
approximate that function over a given interval. For example, the
following is the output of my function to approximate the natural
logarithm over an interval of 1e-4 to 1 (I apologize if you are
unfamiliar with MATLAB, as this is a VHDL forum, but the functionality
of this code should be relatively clear):

% begin code
% y=ln( x );
% at each interval, y is computed using a linear approximation y=a*x +
b.

if x<0.0263665
if x<0.00428133
if x<0.00206914
if x<0.00143845
y = fi(825.681,T1,F)*x + fi(-7.72178,T1,F);% here 1
else
y = fi(574.007,T1,F)*x + fi(-7.35822,T1,F);
end
elseif x>0.00297635
y = fi(277.414,T1,F)*x + fi(-6.63109,T1,F);% here 4
else
y = fi(399.046,T1,F)*x + fi(-6.99465,T1,F);
end
elseif x>0.00615848
if x<0.0127427
if x<0.00885867
y = fi(134.072,T1,F)*x + fi(-5.90395,T1,F);% here 6
else
y = fi(93.206,T1,F)*x + fi(-5.54039,T1,F);
end
elseif x>0.0183298
y = fi(45.0458,T1,F)*x + fi(-4.81325,T1,F);% here 9
else
y = fi(64.7961,T1,F)*x + fi(-5.17682,T1,F);
end
else
y = fi(192.856,T1,F)*x + fi(-6.26752,T1,F);
end
elseif x>0.0379269
if x<0.162378
if x<0.078476
if x<0.0545559
y = fi(21.7703,T1,F)*x + fi(-4.08612,T1,F);% here 11
else
y = fi(15.1346,T1,F)*x + fi(-3.72256,T1,F);
end
elseif x>0.112884
y = fi(7.31443,T1,F)*x + fi(-2.99542,T1,F);% here 14
else
y = fi(10.5214,T1,F)*x + fi(-3.35899,T1,F);
end
elseif x>0.233572
if x<0.483293
if x<0.335982
y = fi(3.53501,T1,F)*x + fi(-2.26829,T1,F);% here 16
else
y = fi(2.45752,T1,F)*x + fi(-1.90473,T1,F);
end
elseif x>0.695193
y = fi(1.1877,T1,F)*x + fi(-1.17759,T1,F);% here 19
else
y = fi(1.70845,T1,F)*x + fi(-1.54116,T1,F);
end
else
y = fi(5.08494,T1,F)*x + fi(-2.63186,T1,F);
end
else
y = fi(31.3155,T1,F)*x + fi(-4.44969,T1,F);
end
% end code

I think converting this code to VHDL should be relatively straight
forward, although I do not know very well how to deal with fixed
point. How, for example would I convert the constant coefficients
above into fixed point hardware?

But my real question is whether this is the right approach. I am
afraid the hardware implementation of the equivalent VHDL would use a
TON of multipliers, and might have other problems too. How would
someone implement a non-linear, yet common function in hardware? Note
that taylor series is definitely not the answer. After a fairly in-
depth study the above solution is the best I could come up with. Are
there any other suggestions? Is there something better than my binary
if-tree approach?

Thanks a ton for any responses.

sanborne
Guest
Posts: n/a

 12-04-2008
On Dec 3, 10:37*pm, David Bishop <(E-Mail Removed)> wrote:
> sanborne wrote:
> > I have another question related to fixed point.

>
> > I wrote a function in MATLAB that will take an arbitrary yet well
> > behaved function and generate an if-then-elseif tree that can
> > approximate that function over a given interval. For example, the
> > following is the output of my function to approximate the natural
> > logarithm over an interval of 1e-4 to 1 (I apologize if you are
> > unfamiliar with MATLAB, as this is a VHDL forum, but the functionality
> > of this code should be relatively clear):

>
> Stuff deleted....
>
> Yipe.
>
> > I think converting this code to VHDL should be relatively straight
> > forward, although I do not know very well how to deal with fixed
> > point. How, for example would I convert the constant coefficients
> > above into fixed point hardware?

>
> Yes. *You can compare a fixed point number of any width to a number of
> any other width. *I would just say:
> if x < to_sfixed (0.0263665, x'high, x'low);
>
> Without knowing your "fi" function, I couldn't say more.
>
> > But my real question is whether this is the right approach. I am
> > afraid the hardware implementation of the equivalent VHDL would use a
> > TON of multipliers, and might have other problems too. How would
> > someone implement a non-linear, yet common function in hardware? Note
> > that taylor series is definitely not the answer. After a fairly in-
> > depth study the above solution is the best I could come up with. Are
> > there any other suggestions? Is there something better than my binary
> > if-tree approach?

>
> > Thanks a ton for any responses.

>
> The wonderful thing about fixed point math is that a Taylor series will
> give you a good result in a finite number of iterations. *As to
> multipliers, they work the same fixed point or signed. *Like I said, it
> depends on what "fi" does.

Sorry, the FI function is confusing. Basically, that is just a
function in Matlab that allows for fixed point data types. The code
above sets the fixed point type of all the coefficients of the lookup
table as 32 bits wide with a fractional length of 22. I did some
analysis to determine those sizes. If you are interested, there is
documentation online at:

http://www.mathworks.com/access/help...arch_dropdown/

The problem with Taylor series is that even after a large number
(hundreds) of iterations, I was still not getting the accuracy that I
was hoping for. At part of the interval I was over accurate and at
part of the interval I way off.

Tricky
Guest
Posts: n/a

 12-04-2008
On 4 Dec, 04:14, sanborne <(E-Mail Removed)> wrote:
> On Dec 3, 10:37*pm, David Bishop <(E-Mail Removed)> wrote:
>
>
>
> > sanborne wrote:
> > > I have another question related to fixed point.

>
> > > I wrote a function in MATLAB that will take an arbitrary yet well
> > > behaved function and generate an if-then-elseif tree that can
> > > approximate that function over a given interval. For example, the
> > > following is the output of my function to approximate the natural
> > > logarithm over an interval of 1e-4 to 1 (I apologize if you are
> > > unfamiliar with MATLAB, as this is a VHDL forum, but the functionality
> > > of this code should be relatively clear):

>
> > Stuff deleted....

>
> > Yipe.

>
> > > I think converting this code to VHDL should be relatively straight
> > > forward, although I do not know very well how to deal with fixed
> > > point. How, for example would I convert the constant coefficients
> > > above into fixed point hardware?

>
> > Yes. *You can compare a fixed point number of any width to a number of
> > any other width. *I would just say:
> > if x < to_sfixed (0.0263665, x'high, x'low);

>
> > Without knowing your "fi" function, I couldn't say more.

>
> > > But my real question is whether this is the right approach. I am
> > > afraid the hardware implementation of the equivalent VHDL would use a
> > > TON of multipliers, and might have other problems too. How would
> > > someone implement a non-linear, yet common function in hardware? Note
> > > that taylor series is definitely not the answer. After a fairly in-
> > > depth study the above solution is the best I could come up with. Are
> > > there any other suggestions? Is there something better than my binary
> > > if-tree approach?

>
> > > Thanks a ton for any responses.

>
> > The wonderful thing about fixed point math is that a Taylor series will
> > give you a good result in a finite number of iterations. *As to
> > multipliers, they work the same fixed point or signed. *Like I said, it
> > depends on what "fi" does.

>
> Sorry, the FI function is confusing. Basically, that is just a
> function in Matlab that allows for fixed point data types. The code
> above sets the fixed point type of all the coefficients of the lookup
> table as 32 bits wide with a fractional length of 22. I did some
> analysis to determine those sizes. If you are interested, there is
> documentation online at:
>
> http://www.mathworks.com/access/help...ixedpoint/inde...
>
> The problem with Taylor series is that even after a large number
> (hundreds) of iterations, I was still not getting the accuracy that I
> was hoping for. At part of the interval I was over accurate and at
> part of the interval I way off.

You have to remember that fixed point is very accurate, but is limited
on range. floating point will give you the range, but with the
sacrifice of accuracy at larger values.

As I just detailed in the other thread, fixed point is no different to
"normal" arithmatic. it is just all done at normal accuracy with an
implied /2^n.

The probelm you havem especially with your Y values, is that you'd
need at least a 10 bit magnitude part and at least 23 bits (as you
said this isnt accurate enough). With x being what it is you probably
also need a large number(say 32 bits), and when you start multipltying
in fixed point, a 32 bit x 32 bit number gives a 64 bit result, before
you start rounding and approximating. Most FPGA multipliers are 18 bit
(giving 36 bit outputs) and so you would have to approximate before
you put it into another multiplier.

The trick is though to identify what kind of system you are working
in. I will give you an example. Radial correction algorithm for video
(ie. warping an image to remove the curvature of a lense) involves
warping u and v co-ordinates. In reality, these U and V co-ordinates
represent a memory lookup, and so being accurate at the sub pixel
level isnt really appropriate (if you want to do bilinear
interpolation, you need accuracy to 1/16th of a pixel, but even this
isnt that accurate compared to the overall algorithm). The function is
a taylor series of even powers. Terms beyond the 4th order have very
little effect on the output at the pixel level, and so are discarded.
You then only have 2 terms to work on, and as you know you're not very
interested in sub pixel accuracy, you just keep it as accurate as
possible as long as you can, before rounding to nearest.

So the best thing you can do is analyse the system you're trying to
apply the algorithm too and see what accuracy really is important.

sanborne
Guest
Posts: n/a

 12-04-2008
On Dec 4, 9:07*am, Brian Drummond <(E-Mail Removed)>
wrote:
> On Wed, 3 Dec 2008 17:47:40 -0800 (PST), sanborne <(E-Mail Removed)>
> wrote:
>
>
>
> >I have another question related to fixed point.
> > * *if x<0.00428133
> > * * * * * *if x<0.00206914
> > * * * * * * * * * *if x<0.00143845
> > * * * * * * * * * * * * * *y = fi(825.681,T1,F)*x + fi(-7.72178,T1,F);% here 1
> > * * * * * * * * * *else
> > * * * * * * * * * * * * * *y = fi(574.007,T1,F)*x + fi(-7.35822,T1,F);
> > * * * * * * * * * *end
> > * * * * * *elseif x>0.00297635
> > * * * * * * * * * *y = fi(277.414,T1,F)*x + fi(-6..63109,T1,F);% here 4
> > * * * * * *else
> > * * * * * * * * * *y = fi(399.046,T1,F)*x + fi(-6..99465,T1,F);
> > * * * * * *end
> >But my real question is whether this is the right approach. I am
> >afraid the hardware implementation of the equivalent VHDL would use a
> >TON of multipliers, and might have other problems too.

>
> Precompute all the coefficients in the "if" statements, then supply them
> to the same small array of multipliers...
>
> You can transform the above code into this form and test it against the
> original (expect it to round differently but otherwise be equivalent).
> This transformation can be done equally well before or after conversion
> to VHDL; that's a matter of preference (I detest C, especially for
> numerical work; YMMV)
>
> > How would
> >someone implement a non-linear, yet common function in hardware? Note
> >that taylor series is definitely not the answer. After a fairly in-
> >depth study the above solution is the best I could come up with. Are
> >there any other suggestions? Is there something better than my binary
> >if-tree approach?

>
> My preference is a LUT (single BRAM used as 256*72) feeding constant,
> function of a single variable (e.g. ln) it's typically good enough for
> 24 bits in the mantissa.
>
> - Brian

Thanks, everyone for the thoughtful responses. This is very helpful.

Brian, your comment "Precompute all the coefficients in the "if"
statements, then supply them to the same small array of multipliers...
" is interesting, but I don't follow exactly what you are suggesting.
Can you tell me a little bit more about that? Also your LUT
comment...I am new to solving problems of this type, and I want to
know how a digital professional would actually approximate a non-
linear function. I suspect my method is not optimal in some ways. How
does the LUT implementation work? Is there a document somewhere that I

Tricky, thanks for your help also. I need a high amount of accuracy
below the decimal point. I am not doing image processing where sub-
pixel accuracy does not matter. I am attempting to implement an
algorithm that will generate Gaussian random numbers in hardware. I
have done an analysis where I took into account the maximum magnitude
of those numbers, and I think that a 32 bit number with 22 bits of
precision below the decimal point will suffice for me. However I did
not know that most hardware multipliers use 18 bits. I determined that
32 bits was best because I am writing software to verify my hardware
design, and 16, 32 or 64 bit numbers are much easier to deal with in
software than more arbitrary sizes. Do you think I should re-evaluate
that conclusion?

SY

Tricky
Guest
Posts: n/a

 12-04-2008
On 4 Dec, 15:28, sanborne <(E-Mail Removed)> wrote:
> On Dec 4, 9:07*am, Brian Drummond <(E-Mail Removed)>
> wrote:
>
>
>
> > On Wed, 3 Dec 2008 17:47:40 -0800 (PST), sanborne <(E-Mail Removed)>
> > wrote:

>
> > >I have another question related to fixed point.
> > > * *if x<0.00428133
> > > * * * * * *if x<0.00206914
> > > * * * * * * * * * *if x<0.00143845
> > > * * * * * * * * * * * * * *y = fi(825.681,T1,F)*x + fi(-7.72178,T1,F);% here 1
> > > * * * * * * * * * *else
> > > * * * * * * * * * * * * * *y = fi(574.007,T1,F)*x + fi(-7.35822,T1,F);
> > > * * * * * * * * * *end
> > > * * * * * *elseif x>0.00297635
> > > * * * * * * * * * *y = fi(277.414,T1,F)*x + fi(-6.63109,T1,F);% here 4
> > > * * * * * *else
> > > * * * * * * * * * *y = fi(399.046,T1,F)*x + fi(-6.99465,T1,F);
> > > * * * * * *end
> > >But my real question is whether this is the right approach. I am
> > >afraid the hardware implementation of the equivalent VHDL would use a
> > >TON of multipliers, and might have other problems too.

>
> > Precompute all the coefficients in the "if" statements, then supply them
> > to the same small array of multipliers...

>
> > You can transform the above code into this form and test it against the
> > original (expect it to round differently but otherwise be equivalent).
> > This transformation can be done equally well before or after conversion
> > to VHDL; that's a matter of preference (I detest C, especially for
> > numerical work; YMMV)

>
> > > How would
> > >someone implement a non-linear, yet common function in hardware? Note
> > >that taylor series is definitely not the answer. After a fairly in-
> > >depth study the above solution is the best I could come up with. Are
> > >there any other suggestions? Is there something better than my binary
> > >if-tree approach?

>
> > My preference is a LUT (single BRAM used as 256*72) feeding constant,
> > function of a single variable (e.g. ln) it's typically good enough for
> > 24 bits in the mantissa.

>
> > - Brian

>
> Thanks, everyone for the thoughtful responses. This is very helpful.
>
> Brian, your comment "Precompute all the coefficients in the "if"
> statements, then supply them to the same small array of multipliers...
> " is interesting, but I don't follow exactly what you are suggesting.
> Can you tell me a little bit more about that? Also your LUT
> comment...I am new to solving problems of this type, and I want to
> know how a digital professional would actually approximate a non-
> linear function. I suspect my method is not optimal in some ways. How
> does the LUT implementation work? Is there a document somewhere that I
>
> Tricky, thanks for your help also. I need a high amount of accuracy
> below the decimal point. I am not doing image processing where sub-
> pixel accuracy does not matter. I am attempting to implement an
> algorithm that will generate Gaussian random numbers in hardware. I
> have done an analysis where I took into account the maximum magnitude
> of those numbers, and I think that a 32 bit number with 22 bits of
> precision below the decimal point will suffice for me. However I did
> not know that most hardware multipliers use 18 bits. I determined that
> 32 bits was best because I am writing software to verify my hardware
> design, and 16, 32 or 64 bit numbers are much easier to deal with in
> software than more arbitrary sizes. Do you think I should re-evaluate
> that conclusion?
>
> SY

18 bits for Xilinx and Altera. In Altera devices you can specify a 36
bit multiplier (ie, 36/36 with 72 bit output) but this uses up 4x18
bit multipliers. Across the stratix 2 family (as its the datasheet I
have infront of me) the smallest device has 12x36 bit multiupliers and
the largest has 96 (but the price grows exponentially). With clever
designing these multipliers can be used iteratively rather than
pipelined, but that often takes more care to control the iterative
process, through the storing and dataflow of partial results. So in
theory you can multiply very large numbers, but it takes a larger
number of clock cycles.

But: from a bit of quick research, arnt gaussian random numbers based
on cos and sin of psudo-random numbers? these sin/cos can norally be
achieved via a LUT, with the angle PHI being the lookup address.
Obviously, the larger the table, the greater the accuracy of the
result. You would need some sort of C/matlab program to generate the
table. Again from my stratix 2 datasheet, the large internal MRAM
blocks can store 8k x 64bit words (the sin/cos result size) or 16k x
32bit (if you dont mind losing the accuracy), but this would only give
your PHI an accuracy of 13/14 bits. For a much more serious LUT, maybe
a couple of SRAMS - 36x2M (20 bit look up address) or getting more
serious move into SRAM (~ 20 bit lookup), and then theres DDR (~ 26
bit PHI value with 64 bit result) but that isnt really optimised for
fully random access.

But, its best to analyse the range of inputs to see what you can
actually constrain the result into, and then how sub accurate you want
to be.

Eg. the range 0-0.000008 could be represented in 8 bits, with each
increment being worth ~0.0000038 (2^(-25)). In binary, if you have
constrained the range, you can throw away all leading 0's, just dont