On Mar 2, 12:38 pm, a s <nospa...@gmail.com> wrote:
> On Mar 2, 5:52 pm, Gabor <ga...@alacron.com> wrote:
>
> > I didn't catch which device you are targeting, but I
> > decided to try this myself with XST and Spartan 3A,
> > using Verilog to see if there are any significant
> > differences in synthesis performance.
>
> I am targeting Virtex4FX.
>
> > Here's the code:
> > module count_bits
> > #(
> >   parameter IN_WIDTH = 32,
> >   parameter OUT_WIDTH = 6
> > )
> > (
> >   input wire  [IN_WIDTH-1:0]  data_in,
> >   output reg  [OUT_WIDTH-1:0] data_out
> > );
>
> > always @*
> > begin : proc
> >   integer i;
> >   integer sum;
> >   sum = 0;
> >   for (i = 0; i < IN_WIDTH; i = i + 1) sum = sum + data_in[i];
> >   data_out = sum;
> > end
>
> > endmodule
>
> > And the results for the 32-bit case (XST)
>
> > Number of Slices:           41 out of 1792   2%
> > Number of 4 input LUTs:     73 out of 3584   2%
>
> > which is very close to your original unrolled result.
>
> I get the same results with XST targeting V4.
>
> But it's really interesting that XST produces better results
> with Verilog than with VHDL for basically the same input.
>
> Running your module through Synopsys again results
> in a seemingly "optimum" 57 LUTs and 34 slices.
>
> I find it pretty amusing how many options we have already come up
> with for such a "basic" problem as counting ones in a word.
>
> Regards

Eight years ago (Sept/Oct 2003), we went through this exercise in the
thread "Counting Ones" (I was posting as JustJohn back then, not
John_H). See that thread for some ASCII art of the trees. I ended up
with the following VHDL function, which produces an "optimum" 55 4-input
LUTs for a 32-bit vector input. I haven't seen anything better yet.

I liked Andy's recursion suggestion, but it'll take some thought to
figure out how to auto-distribute the carry-in bits to the adders.

Yesterday, Gabor posted 35 6-input LUTs. Gabor, what code did you use?
I think a nice challenge to the C.A.F. group mind is to beat that.

John L. Smith

-- This function counts bits = '1' in a 32-bit word, using a tree
-- structure with Full Adders at the leaves for "minimum" logic
-- utilization. The slicing below assumes in_vec is indexed 31 downto 0.
function vec32_sum2( in_vec : in UNSIGNED ) return UNSIGNED is
  type FA_Arr_Type is array ( 0 to 9 ) of UNSIGNED( 1 downto 0 );
  variable FA_Array  : FA_Arr_Type;
  variable result    : UNSIGNED( 5 downto 0 );
  variable Leaf_Bits : UNSIGNED( 2 downto 0 );
  variable Sum3_1    : UNSIGNED( 2 downto 0 );
  variable Sum3_2    : UNSIGNED( 2 downto 0 );
  variable Sum3_3    : UNSIGNED( 2 downto 0 );
  variable Sum3_4    : UNSIGNED( 2 downto 0 );
  variable Sum3_5    : UNSIGNED( 2 downto 0 );
  variable Sum4_1    : UNSIGNED( 3 downto 0 );
  variable Sum4_2    : UNSIGNED( 3 downto 0 );
  variable Sum5_1    : UNSIGNED( 4 downto 0 );
begin
  -- Leaf level: ten full adders, each summing three input bits
  -- (covers in_vec( 29 downto 0 )).
  for i in 0 to 9 loop
    Leaf_Bits := in_vec( 3 * i + 2 downto 3 * i );
    case Leaf_Bits is
      when "000"  => FA_Array( i ) := "00";
      when "001"  => FA_Array( i ) := "01";
      when "010"  => FA_Array( i ) := "01";
      when "011"  => FA_Array( i ) := "10";
      when "100"  => FA_Array( i ) := "01";
      when "101"  => FA_Array( i ) := "10";
      when "110"  => FA_Array( i ) := "10";
      when others => FA_Array( i ) := "11";
    end case;
  end loop;

  -- Adder tree. The two leftover bits, in_vec( 30 ) and in_vec( 31 ),
  -- are fed in as carry-ins to two of the 3-bit adders.
  Sum3_1 := ( "0" & FA_Array( 0 ) ) + ( "0" & FA_Array( 1 ) );
  Sum3_2 := ( "0" & FA_Array( 2 ) ) + ( "0" & FA_Array( 3 ) );
  Sum3_3 := ( "0" & FA_Array( 4 ) ) + ( "0" & FA_Array( 5 ) );
  Sum3_4 := ( "0" & FA_Array( 6 ) ) + ( "0" & FA_Array( 7 ) )
            + ( "00" & in_vec( 30 ) );
  Sum3_5 := ( "0" & FA_Array( 8 ) ) + ( "0" & FA_Array( 9 ) )
            + ( "00" & in_vec( 31 ) );
  Sum4_1 := ( "0" & Sum3_1 ) + ( "0" & Sum3_2 );
  Sum4_2 := ( "0" & Sum3_3 ) + ( "0" & Sum3_4 );
  Sum5_1 := ( "0" & Sum4_1 ) + ( "0" & Sum4_2 );
  result := ( "0" & Sum5_1 )
            + ( "000" & Sum3_5 );
  return result;
end vec32_sum2;
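
For anyone who wants to check a hand-built tree like the one above in
simulation, a plain loop-based count makes a convenient reference. The
following is only a minimal sketch: the package name popcount_ref_pkg
and the function name count_ones_ref are made up for illustration and
do not come from the thread, and the fixed 6-bit result assumes an
input of at most 63 bits. It is meant as a behavioral model, not as a
statement about what any synthesis tool will do with it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package (name chosen for this example only).
package popcount_ref_pkg is
  function count_ones_ref( in_vec : in UNSIGNED ) return UNSIGNED;
end package popcount_ref_pkg;

package body popcount_ref_pkg is

  -- Behavioral reference: walk every bit and count the '1's.
  -- Works for any index range of in_vec; the 6-bit return width
  -- assumes in_vec is no longer than 63 bits.
  function count_ones_ref( in_vec : in UNSIGNED ) return UNSIGNED is
    variable sum : natural := 0;
  begin
    for i in in_vec'range loop
      if in_vec( i ) = '1' then
        sum := sum + 1;
      end if;
    end loop;
    return to_unsigned( sum, 6 );
  end function count_ones_ref;

end package body popcount_ref_pkg;
```

In a testbench, one could then compare the two over a set of test
vectors with something like
assert vec32_sum2( v ) = count_ones_ref( v ) report "mismatch" severity error;
where v is a 32-bit UNSIGNED signal or variable indexed 31 downto 0.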