![]() |
|
|
|
#1 |
|
I'm playing around with different ways to count the number of '1' bits in a vector. I'll only be dealing with 6 bit vectors, so I could just do this with a lookup table, but I started with the most readable implementation to see how it would synthesize (copied below).
The Xilinx tools are synthesizing this into a single counter with a feedback loop, and then telling me me that the design can run at 700MHz. I'm a little skeptical - anyone know if it's reasonable to expect timing analysis to unroll a feedback loop like this? Ken ENTITY test IS GENERIC ( bits : INTEGER := 6 ); PORT ( reset : IN STD_LOGIC; clk : IN STD_LOGIC; input : IN STD_LOGIC_VECTOR((bits - 1) DOWNTO 0); ones : OUT STD_LOGIC_VECTOR(log2(bits) DOWNTO 0) ); END test; ARCHITECTURE model OF test IS SIGNAL input_l : STD_LOGIC_VECTOR(input'RANGE); SIGNAL count : STD_LOGIC_VECTOR(ones'RANGE); BEGIN -- latch inputs and outputs for timing analysis PROCESS (reset, clk) BEGIN IF (reset = '1') THEN input_l <= (OTHERS => '0'); ones <= (OTHERS => '0'); ELSIF (clk'EVENT) AND (clk = '1') THEN input_l <= input; ones <= count; END IF; END PROCESS; -- count ones PROCESS (input_l, count) BEGIN FOR i IN input_l'RANGE LOOP IF (input_l(i) = '1') THEN count <= count + 1; END IF; END LOOP; END PROCESS; END; Ken Cecka |
|
|
|
|
#2 |
|
Posts: n/a
|
Jonathan Bromley wrote:
> On Wed, 11 Mar 2009 18:39:51 GMT, Ken Cecka > <> wrote: > >>I'm playing around with different ways to count the number of '1' >> bits in a vector. I'll only be dealing with 6 bit vectors, so >> I could just do this with a lookup table, but I started with >> the most readable implementation to see how it would >> synthesize (copied below). >> >>The Xilinx tools are synthesizing this into a single >> counter with a feedback loop, and then telling me >> that the design can run at 700MHz. I'm a little >> skeptical - anyone know if it's reasonable to expect >> timing analysis to unroll a feedback loop like this? >> >>Ken > [...] >> -- count ones >> PROCESS (input_l, count) >> BEGIN >> FOR i IN input_l'RANGE LOOP >> IF (input_l(i) = '1') THEN >> count <= count + 1; >> END IF; >> END LOOP; >> END PROCESS; > > OUCH! You really, really don't want to do this! > > I suggest that your s[ck]epticism is entirely > justified - there's no way that makes any sense. > Indeed, a counter with combinational feedback > makes no sense as a design - it isn't coherent. > > Of course you can re-code the loop with a variable: > > PROCESS (input_l) > variable count: unsigned... > BEGIN > FOR i IN input_l'RANGE LOOP > IF (input_l(i) = '1') THEN > count := count + 1; > END IF; > END LOOP; > count_signal <= count; > END PROCESS; > > and now synthesis will properly unroll the for-loop > (creating a chain of adders) and there is no > combinational feedback. I'm guessing you know > a load of other ways of doing it, too. I was expecting it to unroll into a chain of adders in the first place and was surprised when it used feedback, but didn't stop think about the underlying adder implementation and whether this could even work. Thanks for the reality check! Ken Ken Cecka |
|
|
|
#3 |
|
Posts: n/a
|
On Mar 11, 1:52*pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote: > > Of course you can re-code the loop with a variable: > > * PROCESS (input_l) > * * *variable count: unsigned... > * BEGIN > * * FOR i IN input_l'RANGE LOOP > * * * IF (input_l(i) = '1') THEN > * * * * count := count + 1; > * * * END IF; > * * END LOOP; > * * count_signal <= count; > * END PROCESS; Don't forget to initialize count to zero before the loop... Andy Andy |
|
|
|
#4 |
|
Posts: n/a
|
On Mar 11, 2:39*pm, Ken Cecka <cec...@alumni.washington.edu> wrote:
> I'm playing around with different ways to count the number of '1' bits in a vector. *I'll only be dealing with 6 bit vectors, so I could just do this with a lookup table, but I started with the most readable implementation to see how it would synthesize (copied below). > > The Xilinx tools are synthesizing this into a single counter with a feedback loop, and then telling me me that the design can run at 700MHz. *I'm *a little skeptical - anyone know if it's reasonable to expect timing analysis to unroll a feedback loop like this? > > Ken > > ENTITY test IS > * GENERIC > * ( > * * bits : INTEGER := 6 > * ); > * PORT > * ( > * * reset : IN *STD_LOGIC; > * * clk * : IN *STD_LOGIC; > * * input : IN *STD_LOGIC_VECTOR((bits - 1) DOWNTO 0); > * * ones *: OUT STD_LOGIC_VECTOR(log2(bits) DOWNTO 0) > * ); > END test; > > ARCHITECTURE model OF test IS > > * SIGNAL input_l : STD_LOGIC_VECTOR(input'RANGE); > * SIGNAL count : STD_LOGIC_VECTOR(ones'RANGE); > > BEGIN > > * -- latch inputs and outputs for timing analysis > * PROCESS (reset, clk) > * BEGIN > * * IF (reset = '1') THEN > * * * input_l <= (OTHERS => '0'); > * * * ones <= (OTHERS => '0'); > * * ELSIF (clk'EVENT) AND (clk = '1') THEN > * * * input_l <= input; > * * * ones <= count; > * * END IF; > * END PROCESS; > > * -- count ones > * PROCESS (input_l, count) > * BEGIN > * * FOR i IN input_l'RANGE LOOP > * * * IF (input_l(i) = '1') THEN > * * * * count <= count + 1; > * * * END IF; > * * END LOOP; > * END PROCESS; > > END; There are also lots of nifty ways to count ones that the software guys like to use. I haven't though through them enough to say whether they merely represent a way to reclaim the inherent inefficiency of launching a big ole 32-bit ALU op (the minimum unit of work in the software pipeline) for each measly one-bit, or whether you could exploit some of the insights to make faster gate-level hardware. If you want insane speed, I'm not sure how you could beat a (possibly pipelined) log adder tree from an algorithmic viewpoint. Random google hit from "count the number of ones": http://gurmeetsingh.wordpress.com/20...ting-routines/ There is also some interesting work on fast decoding of VLC codes (MPEG video for example) using combinatorial decoding of parts of the input word to switch muxes to re-align the remainder of the input word to a successive decode stage. If you had any indication of the statistics (and were willing to play the statistics game, which is not always an option), you might be able to exploit the same idea, say for fast lookahead carry generation of halves or quarters of your input width. I realize this is a slight tangent to your original question, though... sorry - Kenn kennheinrich@sympatico.ca |
|
|
|
#5 |
|
Posts: n/a
|
wrote:
> There are also lots of nifty ways to count ones that the software guys > like to use. I haven't though through them enough to say whether they > merely represent a way to reclaim the inherent inefficiency of > launching a big ole 32-bit ALU op (the minimum unit of work in the > software pipeline) for each measly one-bit, or whether you could > exploit some of the insights to make faster gate-level hardware. If > you want insane speed, I'm not sure how you could beat a (possibly > pipelined) log adder tree from an algorithmic viewpoint. > > Random google hit from "count the number of ones": > > http://gurmeetsingh.wordpress.com/20...ting-routines/ > > > There is also some interesting work on fast decoding of VLC codes > (MPEG video for example) using combinatorial decoding of parts of the > input word to switch muxes to re-align the remainder of the input word > to a successive decode stage. If you had any indication of the > statistics (and were willing to play the statistics game, which is not > always an option), you might be able to exploit the same idea, say for > fast lookahead carry generation of halves or quarters of your input > width. > > I realize this is a slight tangent to your original question, > though... sorry > > - Kenn I spent a little time looking over some of the software tricks earlier today. They seem to be mostly geared around minimizing serial instructions. The specific case I'm addressing is counting ones in a relatively small vector where I want to do it in a single clock cycle. Interestingly, I tried coding it with a chain of adders and as a lookup table, and the Xilinx compiler faithfully implemented this in the RTL schematic (adder chain and a ROM respectively), but they both reduced to the exact same logic in the technology schematic. Ken Ken Cecka |
|
![]() |
| Thread Tools | Search this Thread |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Sybex Seeking Feedback: What Can We Do Better? | JKellum | A+ Certification | 0 | 03-10-2008 11:21 PM |
| Sybex looking for feedback on upcoming A+ title | JKellum | A+ Certification | 0 | 04-05-2006 07:48 PM |
| I need feedback | mjohns2 | A+ Certification | 1 | 03-06-2004 03:14 PM |
| Re: Skillsoft Feedback? | natural_4u | A+ Certification | 0 | 01-22-2004 06:51 AM |
| DVD feedback for a very confused user | Jim Ford | DVD Video | 1 | 12-30-2003 06:43 PM |