VHDL - Latches in pipeline design and numeric logic |
#1 |
Hi All,

I'm working on a pipeline design (for Xilinx Virtex) and I'd like to get some opinions on whether it's OK to use a latch inside a combinatorial process squeezed between two sequential processes that hold the pipeline registers. As the input to the latch is registered, and the output from the logic containing the latch isn't used until it's been registered, is this a 'good' latch design? What other issues should be considered?

The same pipeline design without inferring a latch is over 50 MHz slower, so I'm keen to hitch my trailer to the 'latches can be great when they are not inferred by mistake' wagon train.

A simplified example is shown below. The COMB_B process contains the latch and does some trivial calculation, whereas COMB_A does the same calculation without a latch, and a lot slower.

Or is there a way to make XST avoid inferring two lots of (a_reg - b_reg) logic in COMB_A? It's this logic that causes the 50 MHz hit, and it's why I used a latch in COMB_B.

Thanks,
Andy.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity test_latch is
      port (clk   : in  std_logic;
            rst   : in  std_logic;
            d_in  : in  std_logic;
            a_in  : in  std_logic_vector(31 downto 0);
            b_in  : in  std_logic_vector(31 downto 0);
            x_in  : in  std_logic_vector(31 downto 0);
            r_out : out std_logic_vector(31 downto 0)
           );
    end test_latch;

    architecture Behavioral of test_latch is
      signal a_reg,b_reg,x_reg : std_logic_vector(31 downto 0);
      signal d_reg : std_logic;
      signal latch : std_logic_vector(31 downto 0);
      signal r     : std_logic_vector(31 downto 0);
    begin

      SEQ1 : process (clk,rst) is
      begin
        if (rst ='1') then
          a_reg <= (others => '0');
          b_reg <= (others => '0');
          x_reg <= (others => '0');
          d_reg <= '0';
        elsif (rising_edge(clk)) then
          if (d_reg = '1') then
            a_reg <= a_in;
            b_reg <= b_in;
            x_reg <= x_in;
            d_reg <= d_in;
          end if;
        end if;
      end process;

      -- COMB_A : process (d_reg,x_reg,a_reg,b_reg) is
      -- begin
      --   if (d_reg = '1') then
      --     if (signed(x_reg) > (signed(a_reg) - signed(b_reg))) then
      --       r <= x_reg;
      --     else
      --       r <= std_logic_vector(signed(a_reg) - signed(b_reg));
      --     end if;
      --   else
      --     r <= (others => '0');
      --   end if;
      -- end process;

      COMB_B : process (x_reg,a_reg,b_reg,d_reg) is
      begin
        if (d_reg = '1') then
          latch <= std_logic_vector(signed(a_reg) - signed(b_reg));
          if (signed(x_reg) > signed(latch)) then
            r <= x_reg;
          else
            r <= latch;
          end if;
        else
          r <= (others => '0');
          -- dont specify value for 'latch' as that's what
          -- we want to infer
        end if;
      end process;

      SEQ2 : process (clk,rst) is
      begin
        if (rst ='1') then
        elsif (rising_edge(clk)) then
          r_out <= r;
        end if;
      end process;

    end Behavioral;

andyesquire@hotmail.com
#2 |
I don't understand how you can improve your timing by putting a latch in between, and it's bad for timing analysis too. The preferred way is to go for pipelining.

Neo
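One possible reading of the pipelining suggestion, as a minimal untested sketch: give the subtraction its own register stage so that the subtract and the compare no longer share a clock period. The entity name `test_pipe` and the signals `diff_reg`, `x_reg2` and `d_reg2` are hypothetical and not from the thread. The sketch also assumes the input registers load on every clock, which differs from the posted SEQ1 (there, `d_reg` is only updated when it is already '1', so as posted it stays at '0' after reset).

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity; the ports mirror test_latch from the first post.
entity test_pipe is
  port (clk   : in  std_logic;
        rst   : in  std_logic;
        d_in  : in  std_logic;
        a_in  : in  std_logic_vector(31 downto 0);
        b_in  : in  std_logic_vector(31 downto 0);
        x_in  : in  std_logic_vector(31 downto 0);
        r_out : out std_logic_vector(31 downto 0));
end test_pipe;

architecture Behavioral of test_pipe is
  -- Stage 1 registers (same names as in the thread)
  signal a_reg, b_reg, x_reg : signed(31 downto 0);
  signal d_reg               : std_logic;
  -- Stage 2 registers (hypothetical names)
  signal diff_reg, x_reg2    : signed(31 downto 0);
  signal d_reg2              : std_logic;
begin

  PIPE : process (clk, rst) is
  begin
    if rst = '1' then
      a_reg    <= (others => '0');
      b_reg    <= (others => '0');
      x_reg    <= (others => '0');
      d_reg    <= '0';
      diff_reg <= (others => '0');
      x_reg2   <= (others => '0');
      d_reg2   <= '0';
      r_out    <= (others => '0');
    elsif rising_edge(clk) then
      -- Stage 1: capture the inputs (assumption: no load enable)
      a_reg <= signed(a_in);
      b_reg <= signed(b_in);
      x_reg <= signed(x_in);
      d_reg <= d_in;

      -- Stage 2: register the difference, delay x and d to match
      diff_reg <= a_reg - b_reg;
      x_reg2   <= x_reg;
      d_reg2   <= d_reg;

      -- Stage 3: compare and select, using only registered values
      if d_reg2 = '1' then
        if x_reg2 > diff_reg then
          r_out <= std_logic_vector(x_reg2);
        else
          r_out <= std_logic_vector(diff_reg);
        end if;
      else
        r_out <= (others => '0');
      end if;
    end if;
  end process;

end Behavioral;
```

The trade-off is one extra clock of latency compared with the two-register version in the first post, plus the flip-flops for the added stage.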
#3 |
Andy wrote:
> A simplified example is shown below, COMB_B process contains the latch
> and does some trivial calculation. Whereas COMB_A does the same
> calculation, but without a latch, and a lot slower.

Signal 'latch' is missing in the sensitivity list. This could cause differences in simulation results (RTL versus synthesis).

> Or is there a way to make XST avoid inferring two lots of (a_reg -
> b_reg) logic in COMB_A? It's this logic that causes the 50 MHz hit and
> why I used a latch in COMB_B.

Just make it a signal, assigned by a concurrent signal assignment:

    latch <= std_logic_vector(signed(a_reg) - signed(b_reg));

    COMB_B : process (x_reg,a_reg,b_reg,d_reg) is
    begin
      if d_reg = '1' then
      ...

Or make it completely local (my preference) by assigning the common expression to a variable:

    COMB_B : process (x_reg,a_reg,b_reg,d_reg) is
      variable a_min_b : std_logic_vector(31 downto 0);
    begin
      a_min_b := (others => '0'); -- Avoid latch
      if (d_reg = '1') then
        a_min_b := std_logic_vector(signed(a_reg) - signed(b_reg));
        if (signed(x_reg) > signed(a_min_b)) then
          r <= x_reg;
        else
          r <= a_min_b;
        end if;
      else
        r <= (others => '0');
      end if;
    end process;

A further (textual) optimization would be declaring r, a_reg, b_reg, x_reg and a_min_b as signed, instead of std_logic_vector. This would avoid most of the conversions.

The resulting code (untested):

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity test_latch is
      port (clk   : in  std_logic;
            rst   : in  std_logic;
            d_in  : in  std_logic;
            a_in  : in  std_logic_vector(31 downto 0);
            b_in  : in  std_logic_vector(31 downto 0);
            x_in  : in  std_logic_vector(31 downto 0);
            r_out : out std_logic_vector(31 downto 0)
           );
    end test_latch;

    architecture Behavioral of test_latch is
      signal a_reg,b_reg,x_reg : signed(31 downto 0);
      signal d_reg : std_logic;
      signal r     : signed(31 downto 0);
    begin

      SEQ1 : process (clk, rst) is
      begin
        if rst ='1' then
          a_reg <= (others => '0');
          b_reg <= (others => '0');
          x_reg <= (others => '0');
          d_reg <= '0';
        elsif rising_edge(clk) then
          if d_reg = '1' then
            a_reg <= signed(a_in);
            b_reg <= signed(b_in);
            x_reg <= signed(x_in);
            d_reg <= d_in;
          end if;
        end if;
      end process;

      COMB_B : process (x_reg, a_reg, b_reg, d_reg) is
        variable a_min_b : signed(31 downto 0);
      begin
        a_min_b := (others => '0'); -- Avoid latch
        if d_reg = '1' then
          a_min_b := a_reg - b_reg;
          if x_reg > a_min_b then
            r <= x_reg;
          else
            r <= a_min_b;
          end if;
        else
          r <= (others => '0');
        end if;
      end process;

      SEQ2 : process (clk, rst) is
      begin
        if rst ='1' then
          r_out <= (others => '0'); -- This was missing (intentional?)
        elsif rising_edge(clk) then
          r_out <= std_logic_vector(r);
        end if;
      end process;

    end Behavioral;

--
Paul Uiterlinden
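For comparison, the concurrent-assignment alternative could be written out in full roughly as follows. This is an untested sketch, not code from the thread. The entity name `test_shared_sub` is hypothetical, and the shared difference is called `a_min_b` and typed as signed rather than kept as the std_logic_vector signal `latch`. Because the process now reads `a_min_b` as a signal, it appears in the sensitivity list in place of `a_reg` and `b_reg`. The sketch also assumes the input registers load on every clock, since in the posted SEQ1 `d_reg` is only updated when it is already '1' and so stays at '0' after reset.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity; the ports mirror test_latch from the thread.
entity test_shared_sub is
  port (clk   : in  std_logic;
        rst   : in  std_logic;
        d_in  : in  std_logic;
        a_in  : in  std_logic_vector(31 downto 0);
        b_in  : in  std_logic_vector(31 downto 0);
        x_in  : in  std_logic_vector(31 downto 0);
        r_out : out std_logic_vector(31 downto 0));
end test_shared_sub;

architecture Behavioral of test_shared_sub is
  signal a_reg, b_reg, x_reg : signed(31 downto 0);
  signal d_reg               : std_logic;
  signal a_min_b             : signed(31 downto 0);
  signal r                   : signed(31 downto 0);
begin

  SEQ1 : process (clk, rst) is
  begin
    if rst = '1' then
      a_reg <= (others => '0');
      b_reg <= (others => '0');
      x_reg <= (others => '0');
      d_reg <= '0';
    elsif rising_edge(clk) then
      -- Assumption: inputs are captured every cycle (no load enable)
      a_reg <= signed(a_in);
      b_reg <= signed(b_in);
      x_reg <= signed(x_in);
      d_reg <= d_in;
    end if;
  end process;

  -- The difference is described once, outside the process,
  -- so both uses below refer to the same signal.
  a_min_b <= a_reg - b_reg;

  COMB : process (x_reg, a_min_b, d_reg) is
  begin
    -- r is assigned on every path, so no latch is inferred
    if d_reg = '1' then
      if x_reg > a_min_b then
        r <= x_reg;
      else
        r <= a_min_b;
      end if;
    else
      r <= (others => '0');
    end if;
  end process;

  SEQ2 : process (clk, rst) is
  begin
    if rst = '1' then
      r_out <= (others => '0');
    elsif rising_edge(clk) then
      r_out <= std_logic_vector(r);
    end if;
  end process;

end Behavioral;
```

Whether the signal or the variable form is used is mostly a matter of scope: the variable stays local to the process, while the signal is visible to the whole architecture and shows up by name in a waveform viewer.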
#4 |
Thanks Paul, I guess a variable was what I needed all along! I tried it out and it gives the same path delay as the latch, but the variable is obviously much nicer.

Btw, I got into the habit of using std_logic_vector everywhere because the Xilinx XST Guide (I think) says to use only slv in port declarations, so I was just being lazy (sorry for the sloppy mistakes).

Neo - The latch itself doesn't do anything to improve the timing; it's not clocked, so it cannot reduce the number of logic levels, but in this case it provided faster logic than the alternative code.

Andy