Re: Count bits in VHDL, with loop and unrolled loop produces different results

 
 
a s
 
      03-02-2011
On Mar 2, 5:52 pm, Gabor wrote:
> I didn't catch which device you are targeting, but I
> decided to try this myself with XST and Spartan 3A,
> using Verilog to see if there are any significant
> differences in synthesis performance.


I am targeting Virtex4FX.

> Here's the code:
> module count_bits
> #(
>   parameter IN_WIDTH = 32,
>   parameter OUT_WIDTH = 6
> )
> (
>   input  wire [IN_WIDTH-1:0]  data_in,
>   output reg  [OUT_WIDTH-1:0] data_out
> );
>
> always @*
> begin : proc
>   integer i;
>   integer sum;
>   sum = 0;
>   for (i = 0; i < IN_WIDTH; i = i + 1) sum = sum + data_in[i];
>   data_out = sum;
> end
>
> endmodule
>
> And the results for the 32-bit case (XST)
>
> Number of Slices:            41  out of  1792   2%
> Number of 4 input LUTs:      73  out of  3584   2%
>
> which is very close to your original unrolled result.


I get the same results with XST targeting V4.

It's really interesting, though, that XST produces better results
with Verilog than with VHDL for essentially the same input.
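
For reference, a loop-based VHDL counterpart of Gabor's Verilog module might look like the sketch below. The VHDL that was actually fed to XST is not reproduced in this post, so the entity name, generics, and port types here are assumptions chosen to mirror the Verilog. It is not the code behind the numbers quoted above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical VHDL mirror of the quoted Verilog count_bits module.
entity count_bits is
  generic (
    IN_WIDTH  : positive := 32;
    OUT_WIDTH : positive := 6
  );
  port (
    data_in  : in  std_logic_vector(IN_WIDTH - 1 downto 0);
    data_out : out std_logic_vector(OUT_WIDTH - 1 downto 0)
  );
end entity count_bits;

architecture rtl of count_bits is
begin
  -- Purely combinational: the loop is unrolled by the synthesizer.
  process (data_in)
    variable sum : natural range 0 to IN_WIDTH;
  begin
    sum := 0;
    for i in data_in'range loop
      if data_in(i) = '1' then
        sum := sum + 1;
      end if;
    end loop;
    -- OUT_WIDTH must be wide enough to hold IN_WIDTH (6 bits for 32).
    data_out <= std_logic_vector(to_unsigned(sum, OUT_WIDTH));
  end process;
end architecture rtl;
```

Written this way the description is a serial chain of increments, so the LUT count depends on how much restructuring the synthesis tool performs on its own.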

Running your module through Synopsys again results in a seemingly
"optimum" 57 LUTs and 34 slices.

I find it pretty amusing how many options we have already come up
with for such a "basic" problem as counting ones in a word.

Regards
 
 
 
 
 
glen herrmannsfeldt
 
      03-02-2011
In comp.arch.fpga a s wrote:
(snip)

> Running your module through Synopsys again results in a seemingly
> "optimum" 57 LUTs and 34 slices.


One should probably also compare propagation delay in addition
to the number of LUTs or slices used. I don't believe it is
large, but there is some tradeoff between the two. The worst-case
delay would be a chain of (N-1) consecutive adders, increasing in
width down the line.
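
As a rough worked comparison, counting adder stages only and ignoring the carry propagation inside each adder (which also grows with width), a serial chain and a balanced binary tree over N one-bit inputs compare as follows. The symbols D_chain and D_tree are introduced here for illustration and do not come from the thread.

```latex
D_{\text{chain}} = N - 1, \qquad D_{\text{tree}} = \lceil \log_2 N \rceil
\qquad\Longrightarrow\qquad
N = 32:\quad D_{\text{chain}} = 31, \quad D_{\text{tree}} = 5
```

The delay actually reported after synthesis can differ from either figure, since the tool may restructure the logic.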

> I find it pretty amusing how many options we have already come up
> with for such a "basic" problem as counting ones in a word.


-- glen
 
 
 
 
 
JustJohn
 
      03-04-2011
On Mar 2, 12:38 pm, a s wrote:
> I get the same results with XST targeting V4.
>
> It's really interesting, though, that XST produces better results
> with Verilog than with VHDL for essentially the same input.
>
> Running your module through Synopsys again results in a seemingly
> "optimum" 57 LUTs and 34 slices.
>
> I find it pretty amusing how many options we have already come up
> with for such a "basic" problem as counting ones in a word.
>
> Regards


Eight years ago (Sept/Oct 2003), we went through this exercise in the
thread "Counting Ones" (I was posting as JustJohn back then, not
John_H). See that thread for some ASCII art of the trees. I ended up
with the following VHDL function, which produces an "optimum" 55 4-input
LUTs for a 32-bit vector input. I haven't seen anything better yet. I
liked Andy's recursion suggestion; it'll take some thought to figure
out how to auto-distribute the carry-in bits to the adders.
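
For readers who have not seen the recursive style, a generic divide-and-conquer bit count in VHDL could be sketched as below. This is not Andy's code, which is not reproduced in this post. The package and function names are hypothetical, and the sketch makes no attempt at the carry-in distribution mentioned above, so it should not be expected to match the 55-LUT figure.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package name, used only for this illustration.
package popcount_pkg is
  function count_ones(v : unsigned) return natural;
end package popcount_pkg;

package body popcount_pkg is
  function count_ones(v : unsigned) return natural is
    -- Normalise to an ascending 0-based index so any input range works.
    alias    x   : unsigned(0 to v'length - 1) is v;
    constant mid : natural := v'length / 2;
  begin
    if v'length = 0 then
      return 0;
    elsif v'length = 1 then
      if x(0) = '1' then
        return 1;
      else
        return 0;
      end if;
    else
      -- Split in half and add the two partial counts: a balanced tree.
      return count_ones(x(0 to mid - 1)) + count_ones(x(mid to x'high));
    end if;
  end function count_ones;
end package body popcount_pkg;
```

Returning natural keeps the sketch short. Whether a given synthesizer accepts recursive functions and trims the intermediate sums to their minimum widths is tool-dependent, and is assumed rather than verified here.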

Yesterday, Gabor posted 35 6-input LUTs.
Gabor, what code did you use?
I think a nice challenge to the C.A.F. group mind is to beat that.

John L. Smith

-- This function counts bits = '1' in a 32-bit word, using a tree
-- structure with Full Adders at the leaves for "minimum" logic
-- utilization. in_vec is expected to be indexed 31 downto 0.
function vec32_sum2( in_vec : in UNSIGNED ) return UNSIGNED is
  type FA_Arr_Type is array ( 0 to 9 ) of UNSIGNED( 1 downto 0 );
  variable FA_Array  : FA_Arr_Type;
  variable result    : UNSIGNED( 5 downto 0 );
  variable Leaf_Bits : UNSIGNED( 2 downto 0 );
  variable Sum3_1    : UNSIGNED( 2 downto 0 );
  variable Sum3_2    : UNSIGNED( 2 downto 0 );
  variable Sum3_3    : UNSIGNED( 2 downto 0 );
  variable Sum3_4    : UNSIGNED( 2 downto 0 );
  variable Sum3_5    : UNSIGNED( 2 downto 0 );
  variable Sum4_1    : UNSIGNED( 3 downto 0 );
  variable Sum4_2    : UNSIGNED( 3 downto 0 );
  variable Sum5_1    : UNSIGNED( 4 downto 0 );
begin
  -- Leaves: ten full adders, each reducing 3 input bits to a 2-bit count.
  for i in 0 to 9 loop
    Leaf_Bits := in_vec( 3 * i + 2 downto 3 * i );
    case Leaf_Bits is
      when "000"  => FA_Array( i ) := "00";
      when "001"  => FA_Array( i ) := "01";
      when "010"  => FA_Array( i ) := "01";
      when "011"  => FA_Array( i ) := "10";
      when "100"  => FA_Array( i ) := "01";
      when "101"  => FA_Array( i ) := "10";
      when "110"  => FA_Array( i ) := "10";
      when others => FA_Array( i ) := "11";
    end case;
  end loop;
  -- Adder tree; the two leftover bits, in_vec(30) and in_vec(31),
  -- are folded in as extra one-bit terms on Sum3_4 and Sum3_5.
  Sum3_1 := ( "0" & FA_Array( 0 ) ) + ( "0" & FA_Array( 1 ) );
  Sum3_2 := ( "0" & FA_Array( 2 ) ) + ( "0" & FA_Array( 3 ) );
  Sum3_3 := ( "0" & FA_Array( 4 ) ) + ( "0" & FA_Array( 5 ) );
  Sum3_4 := ( "0" & FA_Array( 6 ) ) + ( "0" & FA_Array( 7 ) )
            + ( "00" & in_vec( 30 ) );
  Sum3_5 := ( "0" & FA_Array( 8 ) ) + ( "0" & FA_Array( 9 ) )
            + ( "00" & in_vec( 31 ) );
  Sum4_1 := ( "0" & Sum3_1 ) + ( "0" & Sum3_2 );
  Sum4_2 := ( "0" & Sum3_3 ) + ( "0" & Sum3_4 );
  Sum5_1 := ( "0" & Sum4_1 ) + ( "0" & Sum4_2 );
  result := ( "0" & Sum5_1 )
            + ( "000" & Sum3_5 );
  return result;
end vec32_sum2;
 