Velocity Reviews > VHDL > Dividing by 48

# Dividing by 48

Dave Dean
Guest
Posts: n/a

 11-09-2006
If you are targetting an FPGA, here's a different approach that you may be
able to use, depending on your current implementation and target device:

If you have Block Memory to spare, you could implement the division as a
reasonably-sized lookup table (I'll use Xilinx's BRAMs in my example - I'm
not as familiar with the Altera (or other) equivalent, but I'm sure the
basic idea can be transferred over).

An earlier poster noted that, instead of divide by 48, it would be easier to
shift right by 4 (divide by 16), then you just have to figure out how to
divide by 3.
You have 8192 possible values of ls_pos, so you'll need to represent
8192/48=171 values for ls_rowaddr (why was 191 selected as the interger
range?). So, you'll need an 8-bit output.
ls_pos is a 13 bit vector. Dividing by 16 will leave you with 9 bits, so
your address will be 9 bits wide.

You can create a memory with the following basic port assignments:
address => ls_pos(12 downto 4); --2^13 = 8192, (12 downto 4) to divide
by 16
std_logic_vector(7 downto 0);

Simply pre-initialize the memory contents with the correct values (ie:
M[0]=x00, M[1]=x00, M[2]=x00, M[3]=x01....M[422]=x8C....M[511]=xAA). The
data returned at any address is floor(address/3). Since the address is
ls_pos/16, the final equation will work out to data=floor(ls_pos/4, which
is exactly what you want.

As for the implementation, Xilinx BlockRams have a capacity of 18Kb. You
have 2^9=512 addresses and 8-bit data, so you need a capacity of 4Kb. You
can fit the entire thing in one BlockRam.

If you are targetting an FPGA, this approach will be much faster than using
an actual multiplier or divider, and because the block Memories come free
with the chip, you'll hardly use any resources. If you're targetting an
ASIC, then I'm a bit out of my league but you might be able to use an
equivelent memory structure.

Frank Buss
Guest
Posts: n/a

 11-10-2006
http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:

> I have a signal (integer). How can I describe synthesizable code
> for dividing that signal by 48 ? Result (ls_rowaddr) should
> be whole-number that is integer.

This is an interesting problem. You could use my general entity
for dividing two arbitrary numbers:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity binary_division is
generic(
bits: positive := ;
port(
dividend: in unsigned(bits-1 downto 0);
divisor: in unsigned(bits-1 downto 0);
quotient: out unsigned(bits-1 downto 0);
remainder: out unsigned(bits-1 downto 0);
division_by_zero_error: out boolean);
end entity binary_division;

architecture rtl of binary_division is
begin
divide: process(dividend, divisor)
variable temp_dividend: unsigned(bits-1 downto 0);
variable temp_divisor: unsigned(bits-1 downto 0);
variable temp_quotient: unsigned(bits-1 downto 0);
variable align_count: natural range 0 to bits;
begin
-- init
temp_dividend := dividend;
temp_divisor := divisor;
temp_quotient := (others => '0');
if temp_divisor = 0 then
division_by_zero_error <= true;
quotient <= (others => '0');
remainder <= (others => '0');
else
division_by_zero_error <= false;
if temp_divisor > temp_dividend then
quotient <= (others => '0');
remainder <= dividend;
else
-- left align
align_count := 0;
for i in 1 to bits-1 loop
exit when temp_divisor(bits-1) = '1';
temp_divisor := shift_left(temp_divisor, 1);
align_count := align_count + 1;
end loop;
-- divide
for i in 0 to bits-1 loop
if temp_divisor > temp_dividend then
temp_quotient := temp_quotient(bits-2 downto 0) & '0';
else
temp_quotient := temp_quotient(bits-2 downto 0) & '1';
temp_dividend := temp_dividend - temp_divisor;
end if;
temp_divisor := shift_right(temp_divisor, 1);
exit when align_count = 0;
align_count := align_count - 1;
end loop;
quotient <= temp_quotient;
remainder <= temp_dividend;
end if;
end if;
end process;
end architecture rtl;

But this uses a lot of LUTs, 5% of a Cyclone I (about 300 logic elements)
for 8 bit and the worst case timing between input and output is 66.6 ns
(if you tweek a bit the project settings, it can be reduced to 53 ns).

For 16 bit you can use it for benchmarking synthesizer tools, but I
would not recommend it in real designs, because it uses 21% (about 1,300
logic elements) and worst timing is 122 ns. For 32 bit the Quartus II
says "Current module quartus_fit ended unexpectedly", maybe because
it needs more logic elements than available.

There are faster algorithms ( http://en.wikipedia.org/wiki/Division_(digital) ),
but if you don't need high parallel speed and if you have a clock, you
can serialize the algorithm, which shouldn't need very many logic elements.

The testbench:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity binary_division_test is
end entity binary_division_test;

architecture rtl of binary_division_test is

constant bits: positive := 8;

signal dividend: unsigned(bits-1 downto 0);
signal divisor: unsigned(bits-1 downto 0);
signal quotient: unsigned(bits-1 downto 0);
signal remainder: unsigned(bits-1 downto 0);
signal division_by_zero_error: boolean;

begin
binary_division_inst: entity binary_division
generic map(
bits => bits)
port map(
dividend => dividend,
divisor => divisor,
quotient => quotient,
remainder => remainder,
division_by_zero_error => division_by_zero_error);

test_divide: process
variable count: positive;
begin
dividend <= to_unsigned(0, bits);
divisor <= to_unsigned(0, bits);
wait for 1 ps;
count := 1;
for i in 1 to bits loop
count := 2*count;
end loop;
count := count - 1;
for i in 0 to count loop
for j in 0 to count loop
dividend <= to_unsigned(i, bits);
divisor <= to_unsigned(j, bits);
wait for 1 ps;
if j = 0 then
assert quotient = 0 report "quotient error" severity failure;
assert remainder = 0 report "remainder error" severity failure;
assert division_by_zero_error report "division_by_zero_error error" severity failure;
else
assert quotient = i / j report "quotient error" severity failure;
assert remainder = i - i / j * j report "remainder error" severity failure;
assert not division_by_zero_error report "division_by_zero_error error" severity failure;
end if;
end loop;
end loop;
wait for 1 ps;
assert false report "No failure, simulation was successful." severity failure;
end process;
end architecture rtl;

--
Frank Buss, (E-Mail Removed)
http://www.frank-buss.de, http://www.it4-systems.de

ALuPin@web.de
Guest
Posts: n/a

 11-10-2006
Hi Dave,

what about order of dividing by 16 and 3. Where would you
place the look up table: after dividing by 16 or before ?

If you first place the look up table and then
divide by 16 THEN the RAM needed would increase.
I think the way you have proposed would lead to
a rounding error again. Not sure ...

Rgds
André

..

Dave Dean
Guest
Posts: n/a

 11-10-2006
> I have a signal (integer). How can I describe synthesizable code
> for dividing that signal by 48 ? Result (ls_rowaddr) should
> be whole-number that is integer.

From the original post, it seems to me that you just want the whole number
part of the division - the equivalent of, say, the following line of C:
You're not interested in the remainder because you need the result to be the
row of ls_pos, and nothing more. Suppose you want to (inefficiently) code
this as a big if-then-else statement:
if (ls_pos < 4 then ls_rowaddr = 0;
elseif (ls_pos < 48*2) then ls_rowaddr = 1;
elseif (ls_pos < 48*3) then ls_rowaddr = 2;
elseif (ls_pos < 48*4) then ls_rowaddr = 3;
elseif (ls_pos < 48*5) then ls_rowaddr = 4;
.....
elseif (ls_pos < 48*169) then ls_rowaddr = 168;
elseif (ls_pos < 48*170) then ls_rowaddr = 169;
else ls_rowaddr = 170;

I am working on the assumption that this would result in the desired value
of ls_rowaddr. Correct me if I'm wrong.

Now, we can shift both sides by 4, and the inequality will still hold, even
if (ls_pos>>4 != ls_pos/16). Try to make up some corner cases if you think
there will be a rounding error...
if ((ls_pos>>4) < 3) then ls_rowaddr = 0;
elseif ((ls_pos>>4) < 3*2) then ls_rowaddr = 1;
elseif ((ls_pos>>4) < 3*3) then ls_rowaddr = 2;
elseif ((ls_pos>>4) < 3*4) then ls_rowaddr = 3;
elseif ((ls_pos>>4) < 3*5) then ls_rowaddr = 4;
.....
elseif ((ls_pos>>4) < 3*169) then ls_rowaddr = 168;
elseif ((ls_pos>>4) < 3*170) then ls_rowaddr = 169;
else ls_rowaddr = 170;

The worst-case for a rounding error would occur at the end of the range of
ls_pos. Try a few values...
ls_pos = 48*169 = 8112.
With full precision, we'd expect ls_rowaddr=int(8112/4=169.
8112>>4 = 507
(507 > 3*169) && (507 < 3*170), so ls_rowaddr = 169.

ls_pos = 48*169+47 = 8159:
With full precision, we'd expect ls_rowaddr =
int(8159/4=int(169.97916)=169.
8159>>4 = 509.
(509 > 3*169) && (509 < 170), so ls_rowaddr = 169.

You're going to want to divide by 16 before the RAM, or, as you noted, the
RAM will increase in size. There won't really be any "rounding errors"
because you're not really performing any division - you're simply figuring
out which ls_rowaddr a value of ls_pos will map to. If you want to be sure,
run all 8192 possible values of ls_pos through a spreadsheet or VHDL
simulator, and be sure that all the resulting values of ls_rowaddr are
correct.

Dave Dean
Guest
Posts: n/a

 11-10-2006
My typo - the inequality below should have read:

(507 >= 3*169) && (507 < 3*170), so ls_rowaddr = 169.

"Dave Dean" <(E-Mail Removed)> wrote in message
news:ej2fu2\$(E-Mail Removed)...
>> I have a signal (integer). How can I describe synthesizable code
>> for dividing that signal by 48 ? Result (ls_rowaddr) should
>> be whole-number that is integer.

>
> From the original post, it seems to me that you just want the whole number
> part of the division - the equivalent of, say, the following line of C:
> ls_rowaddr = int(ls_pos/4
> You're not interested in the remainder because you need the result to be
> the row of ls_pos, and nothing more. Suppose you want to (inefficiently)
> code this as a big if-then-else statement:
> if (ls_pos < 4 then ls_rowaddr = 0;
> elseif (ls_pos < 48*2) then ls_rowaddr = 1;
> elseif (ls_pos < 48*3) then ls_rowaddr = 2;
> elseif (ls_pos < 48*4) then ls_rowaddr = 3;
> elseif (ls_pos < 48*5) then ls_rowaddr = 4;
> ....
> elseif (ls_pos < 48*169) then ls_rowaddr = 168;
> elseif (ls_pos < 48*170) then ls_rowaddr = 169;
> else ls_rowaddr = 170;
>
> I am working on the assumption that this would result in the desired value
> of ls_rowaddr. Correct me if I'm wrong.
>
> Now, we can shift both sides by 4, and the inequality will still hold,
> even if (ls_pos>>4 != ls_pos/16). Try to make up some corner cases if you
> think there will be a rounding error...
> if ((ls_pos>>4) < 3) then ls_rowaddr = 0;
> elseif ((ls_pos>>4) < 3*2) then ls_rowaddr = 1;
> elseif ((ls_pos>>4) < 3*3) then ls_rowaddr = 2;
> elseif ((ls_pos>>4) < 3*4) then ls_rowaddr = 3;
> elseif ((ls_pos>>4) < 3*5) then ls_rowaddr = 4;
> ....
> elseif ((ls_pos>>4) < 3*169) then ls_rowaddr = 168;
> elseif ((ls_pos>>4) < 3*170) then ls_rowaddr = 169;
> else ls_rowaddr = 170;
>
> The worst-case for a rounding error would occur at the end of the range of
> ls_pos. Try a few values...
> ls_pos = 48*169 = 8112.
> With full precision, we'd expect ls_rowaddr=int(8112/4=169.
> 8112>>4 = 507
> (507 > 3*169) && (507 < 3*170), so ls_rowaddr = 169.
>
> ls_pos = 48*169+47 = 8159:
> With full precision, we'd expect ls_rowaddr =
> int(8159/4=int(169.97916)=169.
> 8159>>4 = 509.
> (509 > 3*169) && (509 < 170), so ls_rowaddr = 169.
>
> You're going to want to divide by 16 before the RAM, or, as you noted, the
> RAM will increase in size. There won't really be any "rounding errors"
> because you're not really performing any division - you're simply figuring
> out which ls_rowaddr a value of ls_pos will map to. If you want to be
> sure, run all 8192 possible values of ls_pos through a spreadsheet or VHDL
> simulator, and be sure that all the resulting values of ls_rowaddr are
> correct.
>
>
>
>

Frank Buss
Guest
Posts: n/a

 11-12-2006
Frank Buss wrote:

> library IEEE;
> use IEEE.STD_LOGIC_1164.ALL;
> use IEEE.NUMERIC_STD.ALL;
>
> entity binary_division is
> generic(
> bits: positive := ;
> port(
> dividend: in unsigned(bits-1 downto 0);
> divisor: in unsigned(bits-1 downto 0);
> quotient: out unsigned(bits-1 downto 0);
> remainder: out unsigned(bits-1 downto 0);
> division_by_zero_error: out boolean);
> end entity binary_division;

BTW: If you don't use a port for the divisor, but define

constant divisor: unsigned(bits-1 downto 0) := to_unsigned(48, bits);

with bits=13, Quartus can optimize it down to 94 logic elements, which is
2% for my Cyclone and timing analyzer says, worst-case input to output time
is 28.1 ns, so it could be clocked with about 35 MHz.

--
Frank Buss, (E-Mail Removed)
http://www.frank-buss.de, http://www.it4-systems.de

ALuPin@web.de
Guest
Posts: n/a

 11-13-2006
Hi Dave,

>From the original post, it seems to me that you just want the whole number
>part of the division - the equivalent of, say, the following line of C:
> ls_rowaddr = int(ls_pos/4

Yes, correct!

Trying different things out ...

Rgds
André