On Mar 2, 12:38 pm, a s <(E-Mail Removed)> wrote:

> On Mar 2, 5:52 pm, Gabor <(E-Mail Removed)> wrote:

>

> > I didn't catch which device you are targeting, but I

> > decided to try this myself with XST and Spartan 3A,

> > using Verilog to see if there are any significant

> > differences in synthesis performance.

>

> I am targeting Virtex4FX.

>

> > Here's the code:

> > module count_bits
> > #(
> >   parameter IN_WIDTH = 32,
> >   parameter OUT_WIDTH = 6
> > )
> > (
> >   input  wire [IN_WIDTH-1:0]  data_in,
> >   output reg  [OUT_WIDTH-1:0] data_out
> > );
>
> > always @*
> > begin : proc
> >   integer i;
> >   integer sum;
> >   sum = 0;
> >   for (i = 0; i < IN_WIDTH; i = i + 1) sum = sum + data_in[i];
> >   data_out = sum;
> > end
>
> > endmodule

>

> > And the results for the 32-bit case (XST)

>

> > Number of Slices:            41  out of  1792   2%

> > Number of 4 input LUTs:      73  out of  3584   2%

>

> > which is very close to your original unrolled result.

>

> I get the same results with XST targeting V4.

>

> It's really interesting that XST produces better results

> with Verilog than with VHDL for basically exactly the same input.

>

> Running your module through Synopsys results again

> in seemingly "optimum" 57 LUTs and 34 slices.

>

> I find it pretty amusing how many options we have already come up

> with for such a "basic" problem as counting ones in a word.

>

> Regards

Eight years ago (Sept/Oct 2003), we went through this exercise in the

thread "Counting Ones" (I was posting as JustJohn back then, not

John_H). See that thread for some ASCII art of the trees. I ended up

with the following VHDL function that produces "optimum" 55 4-input

LUTs for a 32-bit vector input. I haven't seen anything better yet. I

liked Andy's recursion suggestion; it'll take some thought to figure

out how to auto-distribute the carry-in bits to the adders.

Yesterday, Gabor posted 35 6-input LUTs.

Gabor, what code did you use?

I think a nice challenge to the C.A.F. group mind is to beat that.

John L. Smith

-- This function counts bits = '1' in a 32-bit word, using a tree
-- structure with Full Adders at the leaves for "minimum" logic
-- utilization.
function vec32_sum2( in_vec : in UNSIGNED ) return UNSIGNED is
  type FA_Arr_Type is array ( 0 to 9 ) of UNSIGNED( 1 downto 0 );
  variable FA_Array  : FA_Arr_Type;
  variable result    : UNSIGNED( 5 downto 0 );
  variable Leaf_Bits : UNSIGNED( 2 downto 0 );
  variable Sum3_1    : UNSIGNED( 2 downto 0 );
  variable Sum3_2    : UNSIGNED( 2 downto 0 );
  variable Sum3_3    : UNSIGNED( 2 downto 0 );
  variable Sum3_4    : UNSIGNED( 2 downto 0 );
  variable Sum3_5    : UNSIGNED( 2 downto 0 );
  variable Sum4_1    : UNSIGNED( 3 downto 0 );
  variable Sum4_2    : UNSIGNED( 3 downto 0 );
  variable Sum5_1    : UNSIGNED( 4 downto 0 );
begin
  for i in 0 to 9 loop
    Leaf_Bits := in_vec( 3 * i + 2 downto 3 * i );
    case Leaf_Bits is
      when "000"  => FA_Array( i ) := "00";
      when "001"  => FA_Array( i ) := "01";
      when "010"  => FA_Array( i ) := "01";
      when "011"  => FA_Array( i ) := "10";
      when "100"  => FA_Array( i ) := "01";
      when "101"  => FA_Array( i ) := "10";
      when "110"  => FA_Array( i ) := "10";
      when others => FA_Array( i ) := "11";
    end case;
  end loop;
  Sum3_1 := ( "0" & FA_Array( 0 ) ) + ( "0" & FA_Array( 1 ) );
  Sum3_2 := ( "0" & FA_Array( 2 ) ) + ( "0" & FA_Array( 3 ) );
  Sum3_3 := ( "0" & FA_Array( 4 ) ) + ( "0" & FA_Array( 5 ) );
  Sum3_4 := ( "0" & FA_Array( 6 ) ) + ( "0" & FA_Array( 7 ) )
            + ( "00" & in_vec( 30 ) );
  Sum3_5 := ( "0" & FA_Array( 8 ) ) + ( "0" & FA_Array( 9 ) )
            + ( "00" & in_vec( 31 ) );
  Sum4_1 := ( "0" & Sum3_1 ) + ( "0" & Sum3_2 );
  Sum4_2 := ( "0" & Sum3_3 ) + ( "0" & Sum3_4 );
  Sum5_1 := ( "0" & Sum4_1 ) + ( "0" & Sum4_2 );
  result := ( "0" & Sum5_1 )
            + ( "000" & Sum3_5 );
  return result;
end vec32_sum2;
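
The carry-in trick used for Sum3_4 and Sum3_5 may be easier to see at a smaller scale. What follows is only a minimal sketch, not part of the 2003 code: a 7-bit counter built from two full-adder leaves, with the seventh bit added in where the carry-in would go, checked exhaustively against a plain loop. The names count7_tb, count_ones_ref, fa_leaf and count7 are made up for illustration. It is simulation-only and says nothing about LUT counts.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical self-checking testbench; no ports, simulation only.
entity count7_tb is
end entity count7_tb;

architecture sim of count7_tb is

  -- Loop-based reference model: adds up the bits one at a time.
  function count_ones_ref( v : unsigned ) return natural is
    variable n : natural := 0;
  begin
    for i in v'range loop
      if v( i ) = '1' then
        n := n + 1;
      end if;
    end loop;
    return n;
  end function count_ones_ref;

  -- Full-adder leaf: number of ones in a 3-bit group, as a 2-bit value.
  function fa_leaf( b : unsigned( 2 downto 0 ) ) return unsigned is
  begin
    case b is
      when "000"                 => return "00";
      when "001" | "010" | "100" => return "01";
      when "011" | "101" | "110" => return "10";
      when others                => return "11";
    end case;
  end function fa_leaf;

  -- 7-bit count: two leaves summed, with bit 6 added as the carry-in.
  -- The worst case is 3 + 3 + 1 = 7, which still fits in 3 bits.
  function count7( v : unsigned( 6 downto 0 ) ) return unsigned is
    variable a, b : unsigned( 1 downto 0 );
    variable cin  : unsigned( 2 downto 0 );
  begin
    a   := fa_leaf( v( 2 downto 0 ) );
    b   := fa_leaf( v( 5 downto 3 ) );
    cin := "00" & v( 6 );
    return ( "0" & a ) + ( "0" & b ) + cin;
  end function count7;

begin

  -- Compare count7 against the reference for every 7-bit input.
  check : process
    variable v : unsigned( 6 downto 0 );
  begin
    for n in 0 to 127 loop
      v := to_unsigned( n, 7 );
      assert to_integer( count7( v ) ) = count_ones_ref( v )
        report "count7 mismatch at input " & integer'image( n )
        severity failure;
    end loop;
    report "count7 matches the reference for all 128 inputs" severity note;
    wait;
  end process check;

end architecture sim;
```

The same compare-against-a-loop check should carry over to vec32_sum2, although 32 bits is too wide to sweep exhaustively, so a set of directed or random vectors would have to stand in.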