![]() |
|
|
|||||||
![]() |
VHDL - This question seems simpler than it actually is... |
|
|
Thread Tools | Search this Thread |
|
|
#1 |
|
Hello All,
Let's start with a simple problem: Given a buffer of "Buf_Depth" elements (let's say they are unsigned vectors with "Data_Width" number of bits), find the minimum element in the buffer. Our first solution is straightforward: ----------------------- PROCESS VARIABLE min : UNSIGNED(Data_Width-1 DOWNTO 0); BEGIN min := Data_Buf(0); FOR i IN 1 TO Buf_Depth-1 LOOP IF (Data_Buf(i) < min) THEN min := Data_Buf(i) END IF; END LOOP; FinalResult <= min; END PROCESS; -------------------------- The nice thing about this solution is that it will work for any value "Buf_Depth". Everything seems great until we look at the timing analysis (after synthesis). This design is painfully slow. To get better performance, we can speed up the design with some pipelining. Setting "Buf_Depth" = 4, we get: ----------------------- SIGNAL pipeA, pipeB : UNSIGNED(Data_Width-1 DOWNTO 0); PROCESS(Clk) BEGIN IF (rising_edge(Clk)) THEN IF (Data_Buf(0) < Data_Buf(1)) THEN pipeA <= Data_Buf(0); ELSE pipeA <= Data_Buf(1); END IF; IF (Data_Buf(2) < Data_Buf(3)) THEN pipeB <= Data_Buf(2); ELSE pipeB <= Data_Buf(3); END IF; IF (pipeA < pipeB) THEN FinalResult <= pipeA; ELSE FinalResult <= pipeB; END IF; END IF; END PROCESS; ----------------------- This solution (after synthesis) is very fast, but Buf_Depth is hard coded to '4' - ie it is not a generic solution. So here is my question: How can I code up a VHDL block similar to the pipelined solution that can be instantiated for any value of Buff_Depth? Thanks, Amir arubin |
|
|
|
|
#2 |
|
Posts: n/a
|
arubin wrote:
> Hello All, > > Let's start with a simple problem: Given a buffer of "Buf_Depth" > elements (let's say they are unsigned vectors with "Data_Width" number > of bits), find the minimum element in the buffer. > > Our first solution is straightforward: > > ----------------------- > PROCESS > VARIABLE min : UNSIGNED(Data_Width-1 DOWNTO 0); > BEGIN > > min := Data_Buf(0); > > FOR i IN 1 TO Buf_Depth-1 LOOP > IF (Data_Buf(i) < min) THEN > min := Data_Buf(i) > END IF; > END LOOP; > > FinalResult <= min; > > END PROCESS; > -------------------------- > > The nice thing about this solution is that it will work for any value > "Buf_Depth". Everything seems great until we look at the timing > analysis (after synthesis). This design is painfully slow. > > To get better performance, we can speed up the design with some > pipelining. Setting "Buf_Depth" = 4, we get: > > ----------------------- > > SIGNAL pipeA, pipeB : UNSIGNED(Data_Width-1 DOWNTO 0); > > PROCESS(Clk) > BEGIN > IF (rising_edge(Clk)) THEN > > IF (Data_Buf(0) < Data_Buf(1)) THEN > pipeA <= Data_Buf(0); > ELSE > pipeA <= Data_Buf(1); > END IF; > > IF (Data_Buf(2) < Data_Buf(3)) THEN > pipeB <= Data_Buf(2); > ELSE > pipeB <= Data_Buf(3); > END IF; > > IF (pipeA < pipeB) THEN > FinalResult <= pipeA; > ELSE > FinalResult <= pipeB; > END IF; > > END IF; > END PROCESS; > > ----------------------- > > This solution (after synthesis) is very fast, but Buf_Depth is hard > coded to '4' - ie it is not a generic solution. > > So here is my question: How can I code up a VHDL block similar to > the pipelined solution that can be instantiated for any value of > Buff_Depth? > > Thanks, > Amir > This sounds like a good candidate for recursion. I don't know how you'd code it in VHDL, but the theory would be to implement a function that would return the minimum element of an array. Here's some 'c' code that works: int min(int *v, int n) { int l, r, n2; if(n==1) return *v; n2 = n/2; l=min(v, n2); r=min(v+n2, n-n2); return l<r ? l : r; } Just convert this to an equivalent VHDL function declaration. -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architecture |
|
|
|
#3 |
|
Posts: n/a
|
To refine David's suggestion a bit and to incorporate the pipelining
you want to improve performance you'll probably want to make a procedure that takes as input both the array that you want to get the minimum value of along with another integer parameter (a constant, call it 'N' for now) that defines how many clock cycles you're willing to wait for this to be calculated. This procedure would then have 'N' calls to David's minimum value function each function call working on BufDepth / N elements of the array. After one clock cycle that procedure would then have an array of 'N' values which you would then find the minimum of by repeating this process (as David mentions recursion would be handy here). Obviously a few of those sticky details would need to be worked out but that would be the basic idea. KJ |
|
|
|
#4 |
|
Posts: n/a
|
KJ wrote:
> To refine David's suggestion a bit and to incorporate the pipelining > you want to improve performance you'll probably want to make a > procedure that takes as input both the array that you want to get the > minimum value of along with another integer parameter (a constant, call > it 'N' for now) that defines how many clock cycles you're willing to > wait for this to be calculated. > > This procedure would then have 'N' calls to David's minimum value > function each function call working on BufDepth / N elements of the > array. After one clock cycle that procedure would then have an array > of 'N' values which you would then find the minimum of by repeating > this process (as David mentions recursion would be handy here). > > Obviously a few of those sticky details would need to be worked out but > that would be the basic idea. I think all of you are going about this the wrong way (IMHO). First, I find a lot of logic designers forget what HDL stands for... Hardware Description Language. Instead of designing their logic to suit both the behavior required as well as the practical requirements of the problem, they often start coding the problem as if they were writing software. The result is... well I don't know what the result is and that is the point, neither do they! Of course if you have coded a problem in the past in a similar style you can anticipate what the compiler will do with your code. But if you have not written this sort of code before, you won't know! I design my logic without writing a line of HDL. Then once I am happy with the approach, I start writing HDL to "describe" the logic I want. Every HDL tool I have seen in the last 10 years has provided templates for HDL that the tool understands and explains how it will map to hardware. If you use those templates to describe your hardware, you will get what you have designed and you will understand your result completely. The difference between the two approaches above is that one chains N-1 compares in a serial fashion and the other seems to be oriented in a tree structure. I actually have no idea exactly how the logic will work out for the serial coding because the compares are all done against a variable rather than between the input elements. I have never coded in this style so I can't say how the synthesis will translate this. The tree solution can be designed using Generate statements and a few basic calculations to figure out the number of levels (log of Buf_Depth with upward rounding) and the number of compares in the first level (difference between 2 to the power of the number of levels and Buf_Depth). If this is hard to think of just consider that you are describing a logarithmic triangle, so to speak. This should be a simple geometry problem. BTW, this type of solution is not likely to be very practical for a buffer of any size since it will consume O(N) hardware resources. |
|
|
|
#5 |
|
Posts: n/a
|
rickman wrote: > KJ wrote: > > To refine David's suggestion a bit and to incorporate the pipelining > > you want to improve performance you'll probably want to make a > > procedure that takes as input both the array that you want to get the > > minimum value of along with another integer parameter (a constant, call > > it 'N' for now) that defines how many clock cycles you're willing to > > wait for this to be calculated. > > > > This procedure would then have 'N' calls to David's minimum value > > function each function call working on BufDepth / N elements of the > > array. After one clock cycle that procedure would then have an array > > of 'N' values which you would then find the minimum of by repeating > > this process (as David mentions recursion would be handy here). > > > > Obviously a few of those sticky details would need to be worked out but > > that would be the basic idea. > > I think all of you are going about this the wrong way (IMHO). I don't...in my opinion > First, I > find a lot of logic designers forget what HDL stands for... Hardware > Description Language. Yep...and the 'V' in VHDL stands for 'VHSIC' which stands for Very High Speed Integrated Circuit....not sure what you were getting at, but this just as relevant. > Instead of designing their logic to suit both > the behavior required as well as the practical requirements of the > problem, they often start coding the problem as if they were writing > software. Again, kind of missing the relevance to the original post? >The result is... well I don't know what the result is and > that is the point, neither do they! Of course if you have coded a > problem in the past in a similar style you can anticipate what the > compiler will do with your code. But if you have not written this sort > of code before, you won't know! I do. I've written min/max functions before. Many times it is simply to search through some set of constants/generics and find the min/max and return a constant....and that's exactly what is returned and synthesized...a constant. I've also written much more complicated functions to do all kinds of things and again, what got synthesized is right along the lines of what I wanted and expected to pop out. Granted the poster is looking to do min on signals and shouldn't get a constant as a result but if one can succintly code what he is looking to do and parameterize it so as to allow for multiple clock cycle for pipelining roughly along the lines of how I suggested and have it work out just fine. > > I design my logic without writing a line of HDL. Then once I am happy > with the approach, I start writing HDL to "describe" the logic I want. > Every HDL tool I have seen in the last 10 years has provided templates > for HDL that the tool understands and explains how it will map to > hardware. If you use those templates to describe your hardware, you > will get what you have designed and you will understand your result > completely. The only thing in the original post was a 'min' function which should translate into a subtractor to implement each (if X < Y)....not much of a stretch there in my opinion. > > The difference between the two approaches above is that one chains N-1 > compares in a serial fashion and the other seems to be oriented in a > tree structure. Serial versus pipelined is what I would call it but OK. > I actually have no idea exactly how the logic will > work out for the serial coding because the compares are all done > against a variable rather than between the input elements. I have > never coded in this style so I can't say how the synthesis will > translate this. As with all other code...it will translate it exactly as written....by that I mean the synthesis output will be functionally equivalent to the source code.....if you're not always getting that all the time then you have a tool problem. > > The tree solution can be designed using Generate statements and a few > basic calculations to figure out the number of levels (log of Buf_Depth > with upward rounding) and the number of compares in the first level > (difference between 2 to the power of the number of levels and > Buf_Depth). If this is hard to think of just consider that you are > describing a logarithmic triangle, so to speak. This should be a > simple geometry problem. Like I said, I did leave some of the nasty details to the original poster...I just gave an outline of how I would approach it. > > BTW, this type of solution is not likely to be very practical for a > buffer of any size since it will consume O(N) hardware resources. Not true. 'N' will be bounded and will be synthesizable into 'some' number of logic cells. Presumably the poster already has a family of parts in mind that he/she is targetting and will either get it to fit in the design or say...DANG....I can't do it this way and see about if it would be possible to stream the data in and calculate the minimum on the fly....that way only needing one comparator for the entire design. But that's not what they were asking for, maybe there is some reason that the data can not be streamed in but is just 'there' and now he needs to find the minimum of it. My 2 cents. KJ |
|
|
|
#6 |
|
Posts: n/a
|
rickman wrote:
> I think all of you are going about this the wrong way (IMHO). First, I > find a lot of logic designers forget what HDL stands for... Hardware > Description Language. Instead of designing their logic to suit both > the behavior required as well as the practical requirements of the > problem, they often start coding the problem as if they were writing > software. The result is... well I don't know what the result is and > that is the point, neither do they! Of course if you have coded a I think I understand the point you're trying to make but that's not where the approach is coming from. I'm not trying to shoehorn 'c' into VHDL, far from it. What I'm trying to do is express the concept of automatically building up the comparator tree, yet have the code be readable + intuitive. For the exercise I went ahead and did this with a 6 element array. It simulates fine, and then I synthesized it. The synthesis report indicates 5 comparators were used, which is exactly what would be required: HDL Synthesis Report Macro Statistics # Registers : 3 8-bit register : 3 # Comparators : 5 8-bit comparator greater : 2 8-bit comparator less : 3 # Multiplexers : 3 8-bit 4-to-1 multiplexer : 2 8-bit 8-to-1 multiplexer : 1 Below is the code for reference. Now, I look at the function "min" and it seems perfectly intuitive what it is doing. And it maps perfectly into VHDL, and it seems to get the job done. There is an accepted principle that any algorithm implemented using recursion can be implemented without recursion. However I can't imagine a non-recursive form being this nifty. Now, in practice, I can't see much use for this sort of thing in the first place. I think in actual digital circuit design one wouldn't have all the elements of an array built up with their values. You'd have to have logic to initialize the array elements -- I doubt if in one clock cycle you'd fill out the whole array. So if you're going to spread out the initialization of the array across multiple clock cycles, you can take advantage of that and spread out the mechanism to determine the minimum value over those very same clock cycles. I look at this as just an interesting exercise. A possible application might be if you have X devices trying to control a resource, and each has a priority associated with it, you want to pick the device with the highest priority. -Dave library IEEE; use ieee.numeric_std.all; use ieee.std_logic_1164.all; entity system is port ( clk, rst_p : inout std_logic; min_out : out std_logic_vector(7 downto 0)); end system; architecture behav of system is subtype entry is unsigned(7 downto 0); type invect is array (natural range <>) of entry; signal vec : invect(0 to 5); signal minimum : entry; type state is (c0, c1, c2, c3); signal mystate : state := c0; function min(iv : invect) return entry is variable n2 : integer; variable l, r : entry; begin if iv'length = 1 then return iv(iv'left); end if; n2 := (iv'left + iv'right + 1) / 2; l := min(iv(iv'left to n2-1)); r := min(iv(n2 to iv'right)); if l < r then return l; else return r; end if; end; begin -- pragma translate_off process begin clk <= '0'; for i in 0 to 100 loop wait for 5 ns; clk <= not clk; end loop; wait; end process; process begin rst_p <= '1'; wait for 20 ns; rst_p <= '0'; wait; end process; -- pragma translate_on process(clk) begin if clk'event and clk='1' then if rst_p = '1' then mystate <= c0; end if; case mystate is when c0 => vec(0) <= X"22"; vec(1) <= X"11"; vec(2) <= X"55"; vec(3) <= X"77"; vec(4) <= X"33"; vec(5) <= X"99"; mystate <= c1; when c1 => vec(3) <= X"05"; mystate <= c2; when c2 => vec(3) <= X"77"; vec(1) <= X"aa"; mystate <= c3; when c3 => vec(5) <= X"02"; mystate <= c0; end case; end if; end process; minimum <= min(vec); min_out <= std_logic_vector(minimum); end behav; -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architecture |
|
|
|
#7 |
|
Posts: n/a
|
"David Ashley" <> wrote in message news:5--... I agree with everything you said and pass kudos for posting code and results...as well as for the DDR thing you had posted a while ago too. > Now, I look at the function "min" and it seems perfectly > intuitive what it is doing. And it maps perfectly into VHDL, > and it seems to get the job done. And that's what makes it a 'good' implementation in my book. It maps well to the hardware and is painfully obvious to understand. > > There is an accepted principle that any algorithm implemented > using recursion can be implemented without recursion. However > I can't imagine a non-recursive form being this nifty. > > Now, in practice, I can't see much use for this sort of thing > in the first place. I think in actual digital circuit design one > wouldn't have all the elements of an array built up with their > values. You'd have to have logic to initialize the array > elements -- I doubt if in one clock cycle you'd fill out the whole > array. What about if on every clock cycle you're getting some sensor inputs and you need something to find the minimum for whatever reason as part of the signal processing. Those sensor inputs would all be in parallel. > So if you're going to spread out the initialization of the > array across multiple clock cycles, you can take advantage > of that and spread out the mechanism to determine the > minimum value over those very same clock cycles. Probably true in many cases but the original poster may have something else in mind that would preclude this....like the above mentioned sensor inputs. Although if he hadn't considered this approach the mention that had now been made a couple times in this thread may tickle a brain cell on his side to question if the serial approach might be better....since there is a logic resource issue to contend with doing it in parallel. > I look at this as just an interesting exercise. Yep, and the good thing is that sometimes you get new insights when pondering these exercises. Even if not directly applicable now, you've tickled a brain cell or two that might remember it when you need it. KJ |
|
|
|
#8 |
|
Posts: n/a
|
> What about if on every clock cycle you're getting some sensor inputs and you > need something to find the minimum for whatever reason as part of the signal > processing. Those sensor inputs would all be in parallel. This is exactly my scenario. Every clock cycle I grab data from my sensors and shift the new data into the data buffer (oldest data gets shifted out). Part of my signal processing involves getting the minimum value of this data buffer each cycle. > > So if you're going to spread out the initialization of the > > array across multiple clock cycles, you can take advantage > > of that and spread out the mechanism to determine the > > minimum value over those very same clock cycles. I actually tried a few approaches similar to this and couldn't come up with anything I liked. > > I look at this as just an interesting exercise. I do to. I could have just 'hard-coded' the data buffer size, but I want a solution that is flexible enough so that I can re-use it with different buffer sizes (and maybe eventually be able to use different buffer sizes on the fly). I also wanted to explore what I could do with VHDL a little bit more. > Yep, and the good thing is that sometimes you get new insights when > pondering these exercises. Even if not directly applicable now, you've > tickled a brain cell or two that might remember it when you need it. Yes - big thanks to everyone's insightful comments |
|
|
|
#9 |
|
Posts: n/a
|
for generate? |
|