Velocity Reviews - Computer Hardware Reviews

newbie with timing problem, is adding pipeline stages the only option to speed up?

 
 
festi.giorgio@gmail.com
 
      09-14-2012
I am implementing a FIR filter, and it has to be (in my naive opinion) very fast.
When I increase the number of coefficients, timing problems arise. At first I tried adding some pipeline stages, and that improved things somewhat.
Without going into too much detail: is adding pipeline stages the only option I have to speed up the whole system?

thanks
 
 
 
 
 
rickman
 
      09-14-2012
On 9/14/2012 2:57 AM, wrote:
> I am implementing a FIR filter, and it has to be (in my naive opinion) very fast.
> When I increase the number of coefficients, timing problems arise. At first I tried adding some pipeline stages, and that improved things somewhat.
> Without going into too much detail: is adding pipeline stages the only option I have to speed up the whole system?
>
> thanks



No, there are other options.

Rick
 
 
 
 
 
antongunman@gmail.com
 
      09-14-2012
It first depends on what you mean by "very fast".

In some devices there are dedicated multipliers or MACC blocks (e.g. the DSP48) that are supposed to work at up to roughly 400-600 MHz.
Although this is true, you would have to tweak your architecture carefully, both by inferring the multiplier components correctly and by cascading them with appropriate register (pipelining) stages. Also, the maximum achievable speed might be limited by the physical number of multipliers in an FPGA column, e.g. 7 (cascading into another column might reduce the theoretical maximum speed).
You could also use tools like the Core Generator (Xilinx) to generate such filters (I have not done that).

If, on the other hand, you do not care too much about such details and would rather just improve your design's timing, some rules of thumb are:
- Trim your bit-widths as early as possible (e.g. some dedicated multipliers or MACC blocks like the DSP48 are typically SIGNED 18x25 in and 48 out, with a carry in of 4).
- Add pipeline stages: at the inputs of the multipliers, at their outputs, and at the adder stages. (For most applications, the added latency is considered insignificant.)
- Consider using an "adder chain" instead of an "adder tree" for the final summation. (An adder is included in the DSP48 and can be efficiently cascaded as an adder chain.)
- Use a synchronous reset? (Depending on the manufacturer, dedicated multipliers can sometimes only be inferred if you use a synchronous reset; otherwise you'll end up using the slower FPGA fabric.)
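
To put a number on the bit-width rule of thumb, here is a minimal Python sketch (a behavioural model, not VHDL) that works out how wide a full-precision signed accumulator has to be for a given set of integer coefficients. The helper name `accumulator_bits` and the example values are made up for illustration; it assumes two's-complement inputs and coefficients that are already quantized to integers.

```python
def accumulator_bits(data_bits, coeffs):
    """Smallest signed width that can hold every possible FIR output.

    Assumes two's-complement inputs of `data_bits` bits and integer
    coefficients. Hypothetical helper, for illustration only.
    """
    x_min = -(1 << (data_bits - 1))
    x_max = (1 << (data_bits - 1)) - 1
    # Worst cases: every product pushes the sum in the same direction.
    y_max = sum(b * (x_max if b > 0 else x_min) for b in coeffs)
    y_min = sum(b * (x_min if b > 0 else x_max) for b in coeffs)
    width = 1
    while not (-(1 << (width - 1)) <= y_min and y_max <= (1 << (width - 1)) - 1):
        width += 1
    return width


# Two 8-bit samples added with unit coefficients need 9 bits.
print(accumulator_bits(8, [1, 1]))
```

Anything wider than this figure in the datapath is wasted carry chain; anything narrower needs a deliberate rounding or saturation step.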

In your case, you may have a huge combinatorial adder tree at the output (e.g. SUM = A+B+C+D...+Y+Z), which would explain why timing fails as you increase the number of coefficients. I'm just guessing at this point, though; I can't tell without looking at the code.
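
If a long combinatorial sum like that is indeed the culprit, one alternative is an adder chain with a register between every adder. The Python sketch below models a transposed-form FIR, which is one common way to get such a chain (my choice for illustration, not necessarily the structure a given tool or DSP block uses). It checks the chain against a plain direct-form reference. The function names are hypothetical, and one loop iteration stands for one clock cycle.

```python
def fir_direct(coeffs, samples):
    """Reference: y[t] = sum_k coeffs[k] * x[t-k], with x = 0 before t = 0."""
    out = []
    for t in range(len(samples)):
        acc = 0
        for k, b in enumerate(coeffs):
            if t - k >= 0:
                acc += b * samples[t - k]
        out.append(acc)
    return out


def fir_adder_chain(coeffs, samples):
    """Transposed-form FIR: one multiply and one add between registers."""
    regs = [0] * (len(coeffs) - 1)  # partial-sum registers, cleared at reset
    out = []
    for x in samples:  # one loop iteration = one clock cycle
        products = [b * x for b in coeffs]
        out.append(products[0] + (regs[0] if regs else 0))
        # Each register captures its own product plus the next register's old value.
        regs = [
            products[k + 1] + (regs[k + 1] if k + 1 < len(regs) else 0)
            for k in range(len(regs))
        ]
    return out


coeffs = [1, 2, 3, 4]
samples = [1, 0, 0, 0, 5, -2, 7]
print(fir_adder_chain(coeffs, samples) == fir_direct(coeffs, samples))
```

The point of the structure is that the longest register-to-register path is one multiplier plus one adder, no matter how many coefficients there are.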

You do not have to know all the details and inner workings; just a bit of understanding of your target platform can help you write code that is efficient and that the synthesizer can map well.

You might also want to take a look at the synthesizer/platform user guides (e.g. the "XST User Guide"); these sometimes contain VHDL templates for specific cases.

I hope this helps!
 
 
Robert Miles
 
      09-15-2012
On Friday, September 14, 2012 1:57:40 AM UTC-5, festi....@gmail.com wrote:
> I am implementing a FIR filter, and it has to be (in my naive opinion) very fast.
>
> When I increase the number of coefficients, timing problems arise. At first I tried adding some pipeline stages, and that improved things somewhat.
>
> Without going into too much detail: is adding pipeline stages the only option I have to speed up the whole system?
>
>
>
> thanks


Adding carefully controlled delays in the appropriate places is POSSIBLE, but hard to design, partly because ASICs and FPGAs seldom offer any way other than clocked stages to supply delays with a well-controlled range of values. It would lengthen the time to get the result, but usually by less than pipelining does.

Do you have a way to do more of the first part of the calculations in parallel, then combine them in another stage?
 
 
Benjamin Couillard
 
      09-25-2012
Off the top of my head, here are the options:

1 - Pipelining (you can pipeline the inputs and outputs of the multipliers, and also pipeline the adder tree stages)

The output of a generic FIR filter, where x[k] is the input delayed by k samples and n is the filter order:

y[n] = b[0]*x[0] + b[1]*x[1] + ... + b[n]*x[n];

You can pipeline the outputs of b[0]*x[0] to b[n]*x[n]. After that, you need to add all the intermediate outputs.

Suppose, that your filter order is 15, so you have 16 coefficients in all.

stage 1 : 8 adders (b[0]*x[0] + b[1]*x[1], b[2]*x[2] + b[3]*x[3], ...), then pipeline

stage 2 : 4 adders

stage 3 : 2 adders

stage 4 : 1 adder

The number of adder stages is ceil(log2(order+1)). Don't forget to pipeline after each stage.

If you pipeline the inputs before the multipliers, you have 1 pipeline delay; if you pipeline the multiplier outputs, you have another pipeline delay. The pipelined adder tree adds another 4 pipeline delays.

In total, you would have 6 pipeline delays for this implementation. This is not usually a big deal and would give you a speed boost.
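
The stage count and the total latency above can be checked with a small Python model (a sketch of the arithmetic only, not synthesizable code). The function name `adder_tree` is made up for this example, and it assumes at least one product is passed in.

```python
def adder_tree(products):
    """Sum `products` pairwise, one tree level per pipeline stage.

    Returns (total, number_of_stages). Expects a non-empty sequence.
    """
    level = list(products)
    stages = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:   # odd one out is just registered and passed on
            nxt.append(level[-1])
        level = nxt
        stages += 1          # in hardware, a register row would sit here
    return level[0], stages


products = list(range(16))            # stand-ins for b[k]*x[k], 16 coefficients
total, stages = adder_tree(products)
latency = 1 + 1 + stages              # input regs + multiplier output regs + tree
print(total, stages, latency)
```

For 16 products this gives 4 adder stages and a total of 6 pipeline delays, matching the count above.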

2 - If your filter is symmetrical (or anti-symmetrical), you can get rid of half the multipliers.

y[n] = b[0]*x[0] + b[1]*x[1] + ... + b[n]*x[n];

If b[0] = b[n], b[1] = b[n-1], and so on, then you can rewrite the FIR equation as

y[n] = b[0]*(x[0]+x[n]) + b[1]*(x[1]+x[n-1]) + ...
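
A quick way to convince yourself the folded form is equivalent is to compare both sums in Python. This is only a sketch: the function names are hypothetical, and x[k] stands for tap k of the delay line, as in the equations above.

```python
def fir_tap_sum(b, x):
    """Direct form: one multiplier per coefficient."""
    return sum(bk * xk for bk, xk in zip(b, x))


def fir_tap_sum_folded(b, x):
    """Folded form for symmetric b: pre-add mirrored taps, then multiply."""
    n = len(b) - 1                      # filter order, as in the post
    assert all(b[k] == b[n - k] for k in range(len(b))), "b must be symmetric"
    half = len(b) // 2
    acc = sum(b[k] * (x[k] + x[n - k]) for k in range(half))
    if len(b) % 2:                      # odd length: the centre tap has no partner
        acc += b[half] * x[half]
    return acc


b = [1, 2, 3, 4, 4, 3, 2, 1]            # symmetric, 8 coefficients
x = [5, -1, 2, 7, 0, 3, -4, 6]          # current contents of the delay line
print(fir_tap_sum(b, x), fir_tap_sum_folded(b, x))
```

The folded version does 4 multiplications instead of 8 here. The price is a pre-adder in front of each multiplier, whose output is one bit wider than the samples. For an anti-symmetrical filter the pre-add becomes a subtraction.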

3 - If you have access to the fixed-point (quantizer) toolbox in Matlab, you can simulate how many bits you actually need to implement the filter. You could then reduce the bit-width of some intermediate signals and thus reduce the combinatorial delays. You don't strictly need the toolbox to perform this analysis, but it's a good tool if you have it.
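
As an illustration of doing this kind of analysis without any toolbox, here is a rough Python sketch that quantizes the coefficients to a given number of fractional bits and bounds the resulting output error. It only covers coefficient quantization, not rounding of intermediate signals. The function names and the coefficient values are invented for the example.

```python
def quantize(coeffs, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return [round(c * scale) / scale for c in coeffs]


def worst_case_output_error(coeffs, frac_bits, x_peak=1.0):
    """Upper bound on |y_ideal - y_quantized| for inputs with |x| <= x_peak."""
    q = quantize(coeffs, frac_bits)
    return x_peak * sum(abs(c - cq) for c, cq in zip(coeffs, q))


coeffs = [0.1, 0.25, 0.3, 0.25, 0.1]    # made-up example coefficients
for frac_bits in (4, 8, 12, 16):
    print(frac_bits, worst_case_output_error(coeffs, frac_bits))
```

You would sweep the bit-width like this and keep the smallest value whose error fits your budget.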

There might be some errors in my answer, but I think the general ideas are there.
 