Velocity Reviews > VHDL > divider algorithm

# divider algorithm

thunder
Guest
Posts: n/a

 05-25-2012
Hello

I am currently working on a block where i need to divide the 16-bit
byte address by 5 to generate the 5-byte aligned address for the
internal RAM.

I started off by using an actual divider (ie five_byte_ram_addr <=
byte_ram_adder /5). However, this is expensive in terms of area and
does not meet timing.

Can anyone point me to some algorithms which detail how to approximate
the divider functionality (and in particular in this instance an
alogorithm which alllows approximation of divide by 5)?

JO

Gabor
Guest
Posts: n/a

 05-25-2012
thunder wrote:
> Hello
>
> I am currently working on a block where i need to divide the 16-bit
> byte address by 5 to generate the 5-byte aligned address for the
> internal RAM.
>
> I started off by using an actual divider (ie five_byte_ram_addr <=
> byte_ram_adder /5). However, this is expensive in terms of area and
> does not meet timing.
>
> Can anyone point me to some algorithms which detail how to approximate
> the divider functionality (and in particular in this instance an
> alogorithm which alllows approximation of divide by 5)?
>
>
> JO

For dividing by a constant, the simplest method is to
multiply by the constant's inverse. If you're using an
FPGA that has a spare 18 x 18 multiplier you could just
multiply by 2^16 / 5 and then shift the result right 16.
Otherwise you'd need to code the multiplier using adders
where your constant has 50% 1 bits so the size of this
adder tree would be half that of a general purpose
multiplier for the same number of bits.

-- Gabor

Allan Herriman
Guest
Posts: n/a

 05-25-2012
On Fri, 25 May 2012 05:26:01 -0700, thunder wrote:

> Hello
>
> I am currently working on a block where i need to divide the 16-bit byte
> address by 5 to generate the 5-byte aligned address for the internal
> RAM.
>
> I started off by using an actual divider (ie five_byte_ram_addr <=
> byte_ram_adder /5). However, this is expensive in terms of area and does
> not meet timing.
>
> Can anyone point me to some algorithms which detail how to approximate
> the divider functionality (and in particular in this instance an
> alogorithm which alllows approximation of divide by 5)?
>
>
> JO

Division by 5 is equivalent to multiplication by 1/5.

1/5 in binary is 0.001100110011001100 and so on.

There's a very obvious pattern in that binary number, which suggests a

Take your number N and multiply it by binary 11 (i.e. add it to itself
shifted left by one). This will give the "11" part of the pattern.

Take your N x 11 and multiply that by 10001 (i.e. add it to itself
shifted right by 4 bits).
Now we have N x 110011

Then multiply that by 1000000001
Now we have N x 11001100110011

Then multiply that by 10000000000000001
Now we have N x 110011001100110011001100110011

etc.

Stop when you get your desired accuracy. I suspect 3 or 4 two-input
adders will be needed for your 16 bit argument, but I'm just guessing and
you should verify this for yourself.

Take the appropriate slice of the output (to position the binary point at
the right place).

Make sure your testbench covers the full input space. (I hope we learned
something from the Pentium fdiv bug.)

Regards,
Allan

Brian Davis
Guest
Posts: n/a

 05-25-2012
> Can anyone point me to some algorithms which detail how to approximate
> the divider functionality (and in particular in this instance an
> alogorithm which alllows approximation of divide by 5)?

The online "INTEGER DIVISION BY CONSTANTS" addendum to
"Hacker's Delight" has some relevant material :

www.hackersdelight.org/divcMore.pdf

Brian

On May 25, 8:26*am, thunder <(E-Mail Removed)> wrote:
> Hello
>
> I am currently working on a block where i need to divide the 16-bit
> byte address by 5 to generate the 5-byte aligned address for the
> internal RAM.
>
> I started off by using an actual divider (ie five_byte_ram_addr <=
> byte_ram_adder /5). However, this is expensive in terms of area and
> does not meet timing.
>
> Can anyone point me to some algorithms which detail how to approximate
> the divider functionality (and in particular in this instance an
> alogorithm which alllows approximation of divide by 5)?
>
>
> JO

Guest
Posts: n/a

 05-25-2012
On Fri, 25 May 2012 05:26:01 -0700 (PDT)
thunder <(E-Mail Removed)> wrote:

> Hello
>
> I am currently working on a block where i need to divide the 16-bit
> byte address by 5 to generate the 5-byte aligned address for the
> internal RAM.
>
> I started off by using an actual divider (ie five_byte_ram_addr <=
> byte_ram_adder /5). However, this is expensive in terms of area and
> does not meet timing.
>
> Can anyone point me to some algorithms which detail how to approximate
> the divider functionality (and in particular in this instance an
> alogorithm which alllows approximation of divide by 5)?
>
>
> JO

a different one out. Don't need your ram addresses to be divided by
5. Re-architect something upstream in order to make it to where your
ram addresses get divided by 8 instead, and you just waste 3 bytes of
every 8 in the address space. Address space is usually pretty cheap,
and you can often get away without even having any RAM to back up the
gaps, just constants.

I say this because, in a very similar situation, I found myself trying
to implement a /24 to translate bus addresses into local addresses into
a very wide RAM. Making the math work was easy, getting the math to
integrate with the rest of the system without blowing my timing budget
wasted several days to no avail, before I finally gave up and just
padded out the space. Padding the space out took me 30 minutes of
changing the spec, and 30 minutes of coding, worked the first time, and
that was that.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

MikeWhy
Guest
Posts: n/a

 05-25-2012
"thunder" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hello
>
> I am currently working on a block where i need to divide the 16-bit
> byte address by 5 to generate the 5-byte aligned address for the
> internal RAM.
>
> I started off by using an actual divider (ie five_byte_ram_addr <=
> byte_ram_adder /5). However, this is expensive in terms of area and
> does not meet timing.
>
> Can anyone point me to some algorithms which detail how to approximate
> the divider functionality (and in particular in this instance an
> alogorithm which alllows approximation of divide by 5)?

Divide by '1' beats everything.

"Internal RAM" could be almost anything. Make it 40-bits wide if you can.
Could also stitch 8-bit bram to a 32-bit bram.

Guest
Posts: n/a

 06-05-2012
Hi thunder

checkout my constant division generator. You can use it as a base for a VHDL implementation. It produces optimized C or generic assembly code for a given constant divisor.

http://sourceforge.net/projects/kdiv/

Best regards,

Mister_J
Junior Member
Join Date: Jun 2012
Posts: 8

 06-29-2012
I think that the solution from Rob Gaddi is the best one, but if you want to do a division, here is
how you can do :

You could use the new fixed point type from fixed_pkg and use the function reciprocal to calculate
1/5. If you use this function with a non constant then it won't synthetize well because it will do
the whole calculation in 1 clock cycle.

If we want to calculate 1/x where x is not a constant, we can use this ideal relation :
y = 1.0/x => x * y = 1.0 = x * y_m + x * y_(m-1) + ... + x * y_(n+1) + x * y_n = Mult
where y_k is either 2**k, either 0

In the reality, Mult <= 1 because sometimes the division is not exact but rounded to low.

The goal here will be to make Mult the closest possible to 1.0. First we calculate Mult = x * y_m,
then Mult = x * y_m + x * y_(m-1), then Mult = x * y_m + x * y_(m-1) + x * y_(m-2) and so on until
k=n. At each iteration, we will try to set y_k = 2**k, but if we see that Mult would become higher
than 1.0, then we will set y_k to 0. Then at each iteration, Mult will come closer to 1.0 if y_k =
2**k or stay with its current value if y_k = 0.

Here is how can be declared x and y in VHDL :
signal x : UFixed (a downto b); -- for example a = 11, b = -4
signal y : UFixed (m downto n); -- m = -b-1 = 3; n = -a-1 = -12 => (3 downto -12)

Here is some pseudo VHDL code :

First iteration :

Code:
```Mult_Next_Tmp := x * 2**m; -- We try to set y_m to 2**m
Mult_Next := Mult_Next_Tmp when Mult_Next_Tmp <= 1.0 else
0;
y(m) <= '1' when Mult_Next_Tmp <= 1.0 else
'0';```
Second iteration :

Code:
```Mult_Next_Tmp := Mult + x * 2**(m-1); -- We try to set y_(m-1) to 2**(m-1)
Mult_Next := Mult_Next_Tmp when Mult_Next_Tmp <= 1.0 else
Mult;
y(m-1) <= '1' when Mult_Next_Tmp <= 1.0 else
'0';```
Third iteration :

Code:
```Mult_Next_Tmp := Mult + x * 2**(m-2); -- We try to set y_(m-2) to 2**(m-2)
Mult_Next := Mult_Next_Tmp when Mult_Next_Tmp <= 1.0 else
Mult;
y(m-2) <= '1' when Mult_Next_Tmp <= 1.0 else
'0';```
Last iterations :

(...)

Jonas

Last edited by Mister_J; 06-29-2012 at 12:37 PM..