VHDL - Synthesis of Concurrent Statements for FIR Filter
03-24-2009, 03:46 PM #1
Synthesis of Concurrent Statements for FIR Filter


Dear List,

I am trying to implement a 16-tap FIR low-pass filter and have written
the convolution in VHDL (a language in which I am a beginner). The input
sequence 'x' is a 1-bit sequence of 1's and 0's. This is to be converted
to 1's and -1's and convolved with the impulse response 'h'. My goal is
for the convolution portion of the filter to be completely asynchronous
and parallel. That is, with each clock cycle 16 bits of the input
sequence are convolved with the impulse response, producing a single
12-bit output. Each element of the impulse response 'h' is a 10-bit
signed integer. The input sequence 'x' is a known sequence and I am
sure the output sequence 'y' will always fit into 12 bits.
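A quick worst-case check of the 12-bit claim, using only the coefficients h listed in the code (the arithmetic is the only thing added here). The largest possible |y| occurs when every x(i) matches the sign of h(i):

```latex
\max_{x}\,|y| \;=\; \sum_{i=0}^{15} |h_i|
  \;=\; 2\,(4 + 2 + 28 + 53 + 17 + 128 + 345 + 511)
  \;=\; 2 \cdot 1088 \;=\; 2176 \;>\; 2^{11} - 1 = 2047
```

Since 12-bit two's complement covers -2048 to 2047, the claim holds only because x is a known sequence that never reaches this worst case; an arbitrary input would need 13 bits (-4096 to 4095). For comparison, an all-ones input gives the plain sum of h, which is 1776 and does fit.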

Here is the code:
-- 16-tap FIR Low-Pass Filter Convolution Function
--
--
-- When convolved with the code it will produce a maximum value that
-- will fit into 12-bits

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;

entity fir_lpf_conv is
  port (
    x: in std_logic_vector(15 downto 0);
    y: out std_logic_vector(11 downto 0)
  );
end fir_lpf_conv;

architecture fir_lpf_conv_arch of fir_lpf_conv is
  type coef_type is array(0 to 15) of integer range -511 to 511;
  constant h: coef_type :=
    (4,-2,-28,-53,-17,128,345,511,511,345,128,-17,-53,-28,-2,4);
  -- start at 0 so the sum of the initial values stays inside sum's range
  signal mult: coef_type := (others => 0);
  signal sum: integer range -2047 to 2047;
begin
  blabla: for i in x'range generate
    mult(i) <= h(i) when x(i)='1' else -h(i);
  end generate;
  sum <= mult(0) + mult(1) + mult(2) + mult(3) + mult(4) + mult(5)
       + mult(6) + mult(7) + mult(8) + mult(9) + mult(10) + mult(11)
       + mult(12) + mult(13) + mult(14) + mult(15);
  y <= std_logic_vector(to_signed(sum,12));
end fir_lpf_conv_arch;

I haven't simulated it yet, but I have a sneaky feeling it will not do
what I expect. Even if it does do what I want it to, I'd like to
understand why.

The code should multiply each h element by the corresponding x element
(with zeros converted to -1s) in parallel, AND THEN sum the result
into sum AND THEN put the 'sum' result into 'y'. My use of AND THEN in
that statement makes me think I need sequential code, that is the
multiply should be done in parallel, and the sum should be done in
parallel, but the sum should use the results of the multiply. However
when I look at sequential code it is always clock or event driven and
I don't think that's what I need. All this should be done in less than
1/2 clock cycle.

I could see the compiler synthesizing the above code in two different
ways:

1. Multiply in parallel AND THEN add the results in parallel. (this
would be good)
2. Multiply in parallel and add in parallel. The parallel sum will use
the previous values stored in 'mult', and possibly some updated values
in 'mult' depending on the exact timing. (this would be bad)

So my question is: if the code is correct, then what is the rule for
synthesis? How does the compiler know that I want 'AND THEN' behavior?
If the code is incorrect, what do I write to get 'AND THEN' behavior
that is not clock driven?

I also have a couple of less important questions:
Is there a better way to write my sum using a for loop? I couldn't get
it to compile.
I really don't need the intermediate signal 'sum'. I'd like to just
sum into 'y' but I get a type error because the synthesizer doesn't
know if the stuff on the right is signed or unsigned.
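A minimal sketch answering both side questions, assuming VHDL-93 and numeric_std; the entity name fir_lpf_conv_loop and the variable acc are hypothetical. The loop lives in a combinational process and accumulates into a variable, and to_signed converts the integer straight into y, so no intermediate sum signal is needed:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_lpf_conv_loop is
  port (
    x : in  std_logic_vector(15 downto 0);
    y : out std_logic_vector(11 downto 0)
  );
end entity fir_lpf_conv_loop;

architecture rtl of fir_lpf_conv_loop is
  type coef_type is array (0 to 15) of integer range -511 to 511;
  constant h : coef_type :=
    (4, -2, -28, -53, -17, 128, 345, 511, 511, 345, 128, -17, -53, -28, -2, 4);
begin
  conv : process (x)
    -- wide enough for the worst case, sum of |h(i)| = 2176
    variable acc : integer range -4095 to 4095;
  begin
    acc := 0;
    for i in x'range loop
      -- '1' means +h(i), '0' means -h(i)
      if x(i) = '1' then
        acc := acc + h(i);
      else
        acc := acc - h(i);
      end if;
    end loop;
    -- to_signed says how the integer maps to bits; y keeps the low 12 bits
    y <= std_logic_vector(to_signed(acc, 12));
  end process conv;
end architecture rtl;
```

Note that synthesis unrolls the loop into 16 additions/subtractions in series, so this changes how the code reads, not the adder structure.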

Thank You!
Brian


heilig.brian@gmail.com
03-24-2009, 04:16 PM #2
Tricky
Re: Synthesis of Concurrent Statements for FIR Filter
On 24 Mar, 15:46, heilig.br...@gmail.com wrote:
> [snip]


What you have written contains 0 multipliers, 15 2:1 muxes, no
registers and a very long adder chain. It is very, very unlikely that
this will work. You will HAVE to break up the adder chain and pipeline
it: 16 adds just isn't going to work without pipelining. You normally
only want to add 2-3 numbers in a single clock cycle.

You also say "x" is a 1-bit sequence? Is it coming in serially, or is
it really coming in as a bus like you've written? As it stands, it
expects all the X bits to be there at the same time.

I suggest you read up on digital design. This code is in no way
synthesisable. Here is a hint (I'm going to assume that X is a
synchronous input and not asynchronous like you said).

It should give you a latency of 4 clock cycles:

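library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_lpf_conv is
  port (
    clk : in  std_logic;
    x   : in  std_logic_vector(15 downto 0);
    y   : out std_logic_vector(11 downto 0)
  );
end fir_lpf_conv;

architecture fir_lpf_conv_arch of fir_lpf_conv is
  type coef_type is array(0 to 15) of integer range -511 to 511;
  constant h: coef_type :=
    (4,-2,-28,-53,-17,128,345,511,511,345,128,-17,-53,-28,-2,4);
  signal mult: coef_type;

  subtype sum_range_t is integer range -2047 to 2047;

  -- initialised to 0 so the first clock edges cannot overflow the range
  -- stage 1: pairs
  signal sum01, sum23, sum45, sum67       : sum_range_t := 0;
  signal sum89, sum1011, sum1213, sum1415 : sum_range_t := 0;
  -- stage 2: groups of four
  signal sum0123, sum4567, sum8to11, sum12to15 : sum_range_t := 0;
  -- stage 3: groups of eight
  signal sum0to7, sum8to15 : sum_range_t := 0;
begin
  blabla: for i in x'range generate
    mult(i) <= h(i) when x(i)='1' else -h(i);
  end generate;

  sum_proc : process(clk)
    variable sum_total : integer;
  begin
    if rising_edge(clk) then
      -- stage 1
      sum01   <= mult(0)  + mult(1);
      sum23   <= mult(2)  + mult(3);
      sum45   <= mult(4)  + mult(5);
      sum67   <= mult(6)  + mult(7);
      sum89   <= mult(8)  + mult(9);
      sum1011 <= mult(10) + mult(11);
      sum1213 <= mult(12) + mult(13);
      sum1415 <= mult(14) + mult(15);
      -- stage 2
      sum0123   <= sum01   + sum23;
      sum4567   <= sum45   + sum67;
      sum8to11  <= sum89   + sum1011;
      sum12to15 <= sum1213 + sum1415;
      -- stage 3
      sum0to7  <= sum0123  + sum4567;
      sum8to15 <= sum8to11 + sum12to15;
      -- stage 4: the variable is used at once, only y is registered
      sum_total := sum0to7 + sum8to15;
      y <= std_logic_vector( to_signed( sum_total, 12) );
    end if;
  end process;
end fir_lpf_conv_arch;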


Also, delete std_logic_arith from the code. It clashes with
numeric_std. Always use the numeric_std package (which you have).
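On Brian's side question about summing straight into y: the type error comes from not telling the tools how to read the bits. A minimal numeric_std sketch, assuming the operands are held as 10-bit std_logic_vector values (the entity add4_signed and its ports are hypothetical): signed(...) fixes the interpretation, resize sign-extends each operand to 12 bits before adding so the sum cannot overflow, and the result converts back for y.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add4_signed is
  port (
    a, b, c, d : in  std_logic_vector(9 downto 0);   -- 10-bit signed operands
    y          : out std_logic_vector(11 downto 0)   -- 12-bit signed sum
  );
end entity add4_signed;

architecture rtl of add4_signed is
begin
  -- signed() chooses the interpretation; resize() sign-extends to 12 bits,
  -- and numeric_std "+" on two 12-bit signed values returns 12 bits
  y <= std_logic_vector(
         resize(signed(a), 12) + resize(signed(b), 12) +
         resize(signed(c), 12) + resize(signed(d), 12));
end architecture rtl;
```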



Tricky
03-24-2009, 04:24 PM #3
joris
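VHDL (in simulation) works with "delta" delays:
- each mult(i) is set;
- after a "delta" delay, all the mult(i) are summed;
- after another "delta" delay, y is assigned.
A concurrent assignment re-evaluates whenever any signal it reads changes, so the sum is always recomputed from the updated mult values and never left with stale ones. The synthesis tool will take care of keeping these "delta" semantics intact.

I think this will do what you want?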
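A minimal sketch of the delta-cycle behaviour described above, assuming standard VHDL-93 simulation semantics; the entity delta_chain and the signals mult_s and sum_s are hypothetical stand-ins for the x, mult, sum, y chain. Each assignment becomes visible one delta cycle after its input changes, yet the whole chain settles without any simulated time passing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical three-step chain of concurrent assignments,
-- standing in for x -> mult -> sum -> y in the filter.
entity delta_chain is
  port (
    x : in  std_logic;
    y : out std_logic
  );
end entity delta_chain;

architecture rtl of delta_chain is
  signal mult_s : std_logic := '0';
  signal sum_s  : std_logic := '0';
begin
  -- Each statement re-runs whenever a signal it reads changes,
  -- and its new value becomes visible one delta cycle later.
  mult_s <= x;
  sum_s  <= mult_s;
  y      <= sum_s;
end architecture rtl;
```

In a testbench, y lags x by three delta cycles but changes at the same simulation time, which is the "AND THEN" ordering without a clock.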

Though you could also write this as a process:

Code:
architecture fir_lpf_conv_arch of fir_lpf_conv is
begin
  process(x)
    type coef_type is array(0 to 15) of integer range -511 to 511;
    constant h: coef_type :=
      (4,-2,-28,-53,-17,128,345,511,511,345,128,-17,-53,-28,-2,4);
    variable mult: coef_type;
    variable sum: integer range -2047 to 2047;
  begin
    for i in x'range loop
      -- "when ... else" is not allowed inside a process before VHDL-2008
      if x(i) = '1' then
        mult(i) := h(i);
      else
        mult(i) := -h(i);
      end if;
    end loop;
    sum := mult(0) + mult(1) + mult(2) + mult(3) + mult(4) + mult(5)
         + mult(6) + mult(7) + mult(8) + mult(9) + mult(10) + mult(11)
         + mult(12) + mult(13) + mult(14) + mult(15);
    y <= std_logic_vector(to_signed(sum, 12));
  end process;
end fir_lpf_conv_arch;

The best way to calculate sum would be something like,
Code:
sum(0) <= mult(0) + mult(1);
sum(1) <= mult(2) + mult(3);
sum(2) <= mult(4) + mult(5);
sum(3) <= mult(6) + mult(7);
sum(4) <= mult(8) + mult(9);
sum(5) <= mult(10) + mult(11);
sum(6) <= mult(12) + mult(13);
sum(7) <= mult(14) + mult(15);
sum2(0) <= sum(0) + sum(1);
sum2(1) <= sum(2) + sum(3);
sum2(2) <= sum(4) + sum(5);
sum2(3) <= sum(6) + sum(7);
sum3(0) <= sum2(0) + sum2(1);
sum3(1) <= sum2(2) + sum2(3);
sum4 <= sum3(0) + sum3(1);
Written like this, the additions form a balanced tree: only 4 adders sit on the longest path instead of 15, which scales better.

This isn't too hard to rewrite into a few loops to make it generic.
You might even write it as a nested loop, using a doubly indexed array sum(i,j) (or sum(i)(j), depending on how you declare it).
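One way to do the loop generalisation, sketched under assumptions: instead of the sum(i,j) array suggested above, it uses a single heap-indexed array (a swapped-in layout) where node(k) = node(2k) + node(2k+1), the leaves node(N) to node(2N-1) hold the +/-h(i) values, and node(1) is the total. The entity fir_lpf_tree, the constant N and the array node are hypothetical names; the tree is perfectly balanced when N is a power of two.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_lpf_tree is
  port (
    x : in  std_logic_vector(15 downto 0);
    y : out std_logic_vector(11 downto 0)
  );
end entity fir_lpf_tree;

architecture rtl of fir_lpf_tree is
  constant N : positive := 16;  -- number of taps
  type coef_type is array (0 to N - 1) of integer range -511 to 511;
  constant h : coef_type :=
    (4, -2, -28, -53, -17, 128, 345, 511, 511, 345, 128, -17, -53, -28, -2, 4);
  -- heap layout: node(1) is the root, node(k) = node(2k) + node(2k+1)
  type node_type is array (1 to 2 * N - 1) of integer range -8191 to 8191;
  -- initial 0 keeps the sums in range during simulation start-up
  signal node : node_type := (others => 0);
begin
  -- leaves: the +/-h(i) "1-bit multiplies"
  leaves : for i in 0 to N - 1 generate
    node(N + i) <= h(i) when x(i) = '1' else -h(i);
  end generate leaves;

  -- internal nodes: one adder each, log2(N) levels deep
  adders : for k in 1 to N - 1 generate
    node(k) <= node(2 * k) + node(2 * k + 1);
  end generate adders;

  y <= std_logic_vector(to_signed(node(1), 12));
end architecture rtl;
```

With N = 16 this gives 15 adders in 4 levels, the same structure as the hand-written sum/sum2/sum3/sum4 version.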

I wrote it like that to make the pattern clear for looping. It isn't actually necessary to write it like that to get parallelism. You can simply do:
Code:
sum <= (((mult(0) + mult(1)) + (mult(2) + mult(3)))
      + ((mult(4) + mult(5)) + (mult(6) + mult(7))))
     + (((mult(8) + mult(9)) + (mult(10) + mult(11)))
      + ((mult(12) + mult(13)) + (mult(14) + mult(15))));
Though written like that, it already starts to look a bit like Lisp.


joris

Last edited by joris : 03-24-2009 at 04:28 PM.
03-24-2009, 04:53 PM #4
heilig.brian@gmail.com
Re: Synthesis of Concurrent Statements for FIR Filter
> What you have written contains 0 multipliers,

The following line...

mult(i) <= h(i) when x(i)='1' else -h(i);

...is a 1-bit multiplier where a 1 means 'multiply by 1' and a 0 means
'multiply by -1'. When x(i)='1' then mult(i) <= h(i) * 1, else mult(i)
<= h(i) * -1.

> 15 x 2-1 muxes, no
> registers and a very long adder chain. It is very very unlikely that
> this will work. you will HAVE to break up the adder chain and pipeline
> it - 16 adds just isnt going to work without pipelining. You normally
> only want to add 2-3 numbers in a single clock cycle.


Because of the propagation delay? The Quartus II software I'm using
has a parallel_add megafunction (if you're not familiar with Quartus
II a megafunction is like a parameterized logical element) that can
add up to 128 32-bit integers in parallel! Well, at least that's what
it says.

> You also say "x" is a 1 bit sequence? is it coming in serially? or is
> it really coming in as a bus like you've written. As it stands, it
> expects all the X bits to be there at the same time.


It is a 1-bit sequence that is initially serial but through a series
of external d flip flops I am converting it to 16 bits in parallel.
However each of these bits represents one element of the x sequence.
It is not converted to a 16 bit word.

> I suggest you read up on digital design. this code is no way
> synthesisable.


Ouch. Well you caught me. I bought "Circuit Design with VHDL" a few
days ago and it is on its way. I thought, "How hard can this be?"

> Here is a hint (Im going to assume that X is a
> synchronous input and not asynchronous like you said):


X is a synchronous input. The problem is I could draw a working logic
diagram that would perform the 16 1-bit multiplies in parallel and
then sum all the results in parallel. In fact I started off this way
but then figured it's a good time to learn VHDL. So if I know that it
can be represented as a bunch of logic gates then the problem is to
write VHDL code that will synthesize those gates for me.

> Also - delete std_logic_arith from the code. It clashes with
> numeric_std. always use the numeric_std package (which you have).


Ok. Thanks for the help.


03-25-2009, 09:43 AM #5
Tricky
Re: Synthesis of Concurrent Statements for FIR Filter
On 24 Mar, 16:53, heilig.br...@gmail.com wrote:
> > What you have written contains 0 multipliers,

>
> The following line...
>
> mult(i) <= h(i) when x(i)='1' else -h(i);
>
> ...is a 1 bit multiplier where a 1 means 'multiply by 1' and a 0 means
> 'multiply by -1'. When x(i)='1' then mult(i) <= h(i) * 1, else mult(i)
> <= h(i) * -1.


That's probably the way you intend it, but in reality you've just
written a mux with 2 constant inputs that are selected via the
appropriate bit of X. Looking at the RTL viewer, the constants you
have chosen make it even less complicated, making each input
just a function of X. You could completely change the constants and
you would never get a hardware multiplier; you will always get a mux.


>
> > 15 x 2-1 muxes, no
> > registers and a very long adder chain. It is very very unlikely that
> > this will work. you will HAVE to break up the adder chain and pipeline
> > it - 16 adds just isnt going to work without pipelining. You normally
> > only want to add 2-3 numbers in a single clock cycle.

>
> Because of the propagation delay? The Quartus II software I'm using
> has a parallel_add megafunction (if you're not familiar with Quartus
> II a megafunction is like a parameterized logical element) that can
> add up to 128 32-bit integers in parallel! Well, at least that's what
> it says.


What's wrong with a propagation delay? FPGAs are great for massively
parallel processing, but there is normally a latency involved.
Pipelining still means you can get 1 result per clock cycle, but you have
to wait n clock cycles of latency before the first result arrives. n
is ALWAYS fixed, so you know when the output is valid, and from then
on every clock cycle yields a valid result. I fear that if latency is your
biggest worry, you're coming at FPGA design from the wrong angle.
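To make the latency-versus-throughput point concrete, here is a sketch (hypothetical entity fir_lpf_pipe, using a heap-indexed adder tree rather than the hand-written stages from the earlier post) in which every adder output is a register. Each of the log2(16) = 4 tree levels then costs one clock, so the first result appears 4 clocks after an input is presented and a new result follows on every clock after that. It assumes x is already synchronous to clk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_lpf_pipe is
  port (
    clk : in  std_logic;
    x   : in  std_logic_vector(15 downto 0);  -- assumed synchronous to clk
    y   : out std_logic_vector(11 downto 0)
  );
end entity fir_lpf_pipe;

architecture rtl of fir_lpf_pipe is
  constant N : positive := 16;
  type coef_type is array (0 to N - 1) of integer range -511 to 511;
  constant h : coef_type :=
    (4, -2, -28, -53, -17, 128, 345, 511, 511, 345, 128, -17, -53, -28, -2, 4);
  -- heap layout: node(k) = node(2k) + node(2k+1); leaves are node(N to 2N-1)
  type node_type is array (1 to 2 * N - 1) of integer range -8191 to 8191;
  signal node : node_type := (others => 0);
begin
  -- leaves: the +/-h(i) "1-bit multiplies", combinational
  leaves : for i in 0 to N - 1 generate
    node(N + i) <= h(i) when x(i) = '1' else -h(i);
  end generate leaves;

  -- internal nodes: one registered adder each, so each level is one stage
  adders : for k in 1 to N - 1 generate
    add_reg : process (clk)
    begin
      if rising_edge(clk) then
        node(k) <= node(2 * k) + node(2 * k + 1);
      end if;
    end process add_reg;
  end generate adders;

  -- node(1) is a register, so y changes once per clock
  y <= std_logic_vector(to_signed(node(1), 12));
end architecture rtl;
```

Holding one input across the first rising edge and then changing it, y shows the first result after the 4th edge and the second result after the 5th: n = 4 cycles of latency, one result per clock.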

Yes, Altera do provide a parallel_add megafunction, but it looks
horrible to use (the data input is based on their own 2D array of
std_logic for a start, not the best way to encourage use!). But I
could do a parallel add without their megafunction, and add 256 64-bit
numbers in parallel if I wanted, just using the "+" sign. That doesn't
mean it'll make good hardware/firmware, though. Note also that
there is a "pipeline" parameter on the parallel_add megafunction.

OK, I've compiled some stuff, and here are the results:

As a quick reference, I ran your initial massive add through
TimeQuest on a Stratix II (putting registers in at the mux stage and
the output, so TimeQuest could actually work): FMax = 94 MHz.
- Doing the massive add with a parallel_add component, 0 latency: FMax = 200 MHz
- parallel_add with a pipeline length of 4: FMax = 320 MHz
- Pipelining it the way I did in my previous post: FMax = 360 MHz

Remember this has been done on a large device with no additional
logic, so the FMax reports may be artificially high. But I know which
method I'd rather use!

To get hold of the parallel add, you have to actually instantiate it.
Converting the data input into the right format is a bit of an arse:

-- needs: library altera_mf; use altera_mf.altera_mf_components.all;
signal data   : altera_mf_logic_2D(15 downto 0, 9 downto 0);
signal result : std_logic_vector(11 downto 0);
begin

i_gen : for i in data'range(1) generate
  j_gen : for j in data'range(2) generate
    -- bit j of the 10-bit two's complement form of mult(i)
    data(i, j) <= to_signed(mult(i), 10)(j);
  end generate j_gen;
end generate i_gen;

par_add : parallel_add
  generic map (
    width          => 10,
    size           => 16,
    widthr         => 12,
    pipeline       => 0,
    representation => "SIGNED"
  )
  port map (
    data   => data,
    result => result
  );

>
> > You also say "x" is a 1 bit sequence? is it coming in serially? or is
> > it really coming in as a bus like you've written. As it stands, it
> > expects all the X bits to be there at the same time.

>
> It is a 1-bit sequence that is initially serial but through a series
> of external d flip flops I am converting it to 16 bits in parallel.
> However each of these bits represents one element of the x sequence.
> It is not converted to a 16 bit word.


Well, you have x coming in as a 16-bit bus, and you have 16
"multiplies" in parallel.
Another question: how fast is the serial bus? 16x the main clock
speed? If it isn't, how do you know when any of the X bits are valid?


> > Here is a hint (Im going to assume that X is a
> > synchronous input and not asynchronous like you said):

>
> X is a synchronous input. The problem is I could draw a working logic
> diagram that would perform the 16 1-bit multiplies in parallel and
> then sum all the results in parallel. In fact I started off this way
> but then figured it's a good time to learn VHDL. So if I know that it
> can be represented as a bunch of logic gates then the problem is to
> write VHDL code that will synthesize those gates for me.


But I'd recommend you do it that way, especially as a VHDL beginner.
VHDL is a description language, not a programming language. It is
meant for describing digital hardware. You can write whatever you want
in VHDL (to a point), and it may simulate the way you intend, giving the
results you wanted in the way you specified, but that doesn't mean it's
any good as a hardware description.




Tricky
03-25-2009, 11:47 AM #6
heilig.brian@gmail.com
Re: Synthesis of Concurrent Statements for FIR Filter
> Thats probably the way you intend it, but in reality you've just
> written a mux with 2 constant inputs that are selected via the
> appropriate bit on X. on looking at the RTL viewer, that constants you
> have chosen make it even less complicated, making each input input
> just a function of X. You could completly change the constants, and
> you will never get a hardware multiply, you will always get a mux.


I think this is good. It is equivalent to a multiply by 1 or -1,
right?

> Whats wrong with a propgation delay? FPGAs are great for massive
> parrallel processing, but there is normally a latency involved.
> Pipelining still means you can get 1 result/clock cycle, but you have
> to wait n clock cycles of latency before the first result arrives. n
> is ALWAYS fixed, so you know when the output is valid, and from then
> on every clock cycle yields a valid result. I fear if latency is your
> bigest worry, you're coming at FPGA design from the wrong angle.


You are right. Latency is hardly a concern. I guess my line of
thinking was that I could imagine the logic diagram, now if I could
just write the VHDL to make that logic diagram a reality.

But I wasn't asking about throughput delay, which I think is (or I'll
define as) the time through the entire device. Rather I was asking
about the delay between when the x elements are available on the
rising edge of the clock, to when the next y outputs are available to
be sampled. If this time is greater than half a clock cycle then I
will get garbage out. I think you summarized this in your discussion
below determining FMax.

> ok, Ive compiled some stuff, and heres the results:


Again, thank you for your help.

> As a quick reference, I ran your initial massive add through
> timequest, on a stratix 2 (putting registers in at the mux stage and
> the output, so timequest could actually work) - FMax = 94Mhz
> Doing the massive add with a parallel add component, 0 latency FMax =
> 200MHz
> parallel adder, pipeline length of 4, FMax = 320Mhz
> Pipelining it the way I did in previous post : FMax = 360MHz.


I see. My sample clock is 20 MHz so that's ok. But I see your point
and will add pipelining.

> Well, you have x coming in as a 16 bit bus. And you have 16
> "multiplies" in parallel.
> Another question - how fast is the serial bus? 16x the main clock
> speed? if it isnt, how do you know when any of the X bit are valid?


The serial bus is 20 MHz as is the sample clock. Every time a new x
bit is shifted in I process the entire 16-bit sequence again. So bits
0-14 in the last interval become bits 1-15 in this one.

> But Id recommend you do it that way, especially as a VHDL beginner.
> VHDL is a description language, not a programming language. It is
> meant for describing digital hardware. You can write whatever you want
> in VHDL (to a point), and it may simulate how you intend giving the
> results you wanted in the way you specified, but that doesnt mean its
> any good as a hardware description.


You caught me again. I am a programmer with some hardware experience.
This small exercise is only the beginning, I'll soon need to know VHDL
well. So I guess I'll start reading!

Thanks,
Brian


03-27-2009, 03:13 PM #7
heilig.brian@gmail.com
Re: Synthesis of Concurrent Statements for FIR Filter
This is the code I finally settled on:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity code_filter is
port (
x0: in std_logic;
y: out std_logic_vector(11 downto 0);
clk: in std_logic
);
end code_filter;

architecture code_filter_arch of code_filter is
type coef_type is array(0 to 15) of integer range -511 to 511;
constant h: coef_type :=
(4,-2,-28,-53,-17,128,345,511,511,345,128,-17,-53,-28,-2,4);
signal mult: coef_type;
signal x: std_logic_vector(15 downto 0);
signal sum0_1, sum2_3, sum4_5, sum6_7: integer range -1023 to 1023;
signal sum8_9, sum10_11, sum12_13, sum14_15: integer range -1023 to
1023;
signal sum0_3, sum4_7, sum8_11, sum12_15: integer range -2047 to
2047;
signal sum0_7, sum8_15: integer range -4095 to 4095;
signal sum_total: integer range -8191 to 8191;
begin
process (clk)
begin
if rising_edge(clk) then
x(x'high downto 1) <= x((x'high-1) downto 0);
x(0) <= x0;
end if;
end process;

one_bit_multiply: for i in x'range generate
mult(i) <= h(i) when x(i)='1' else -h(i);
end generate;

sum0_1 <= mult(0) + mult(1);
sum2_3 <= mult(2) + mult(3);
sum4_5 <= mult(4) + mult(5);
sum6_7 <= mult(6) + mult(7);
sum8_9 <= mult(8) + mult(9);
sum10_11 <= mult(10) + mult(11);
sum12_13 <= mult(12) + mult(13);
sum14_15 <= mult(14) + mult(15);
sum0_3 <= sum0_1 + sum2_3;
sum4_7 <= sum4_5 + sum6_7;
sum8_11 <= sum8_9 + sum10_11;
sum12_15 <= sum12_13 + sum14_15;
sum0_7 <= sum0_3 + sum4_7;
sum8_15 <= sum8_11 + sum12_15;
sum_total <= sum0_7 + sum8_15;
y <= std_logic_vector(to_signed(sum_total,12));
end code_filter_arch;

The entire filter is now contained in this code, including the shift
register (which used to be in another file). It has been simulated
and it works great. The major difference between this version and what
I had before is how the additions are arranged. The lesson I learned here
is that VHDL synthesis produces a result that closely matches the structure
of the code, unlike a C compiler, which will perform aggressive optimizations.
My previous version resulted in 15 adders in one long chain (exactly as
the code was written), whereas the current version results in 15 adders
in a hierarchical tree (again exactly as it is written). This reduced
the number of adders on the longest path from 15 to log2(16) = 4.
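Counting adder levels only (ignoring that adders widen toward the output, and ignoring routing delay), the depth comparison for N = 16 taps works out as:

```latex
\text{chain: } N - 1 = 15 \text{ adders in series}, \qquad
\text{tree: } \log_2 N = \log_2 16 = 4 \text{ levels}, \qquad
\frac{N - 1}{\log_2 N} = \frac{15}{4} = 3.75
```

So the reduction in logic depth is close to, but slightly below, a factor of log2(16) = 4.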

Anyway it's good to know my initial design actually did work, even
though it wasn't optimal. After your first scathing response I felt
like I should turn in my degree and restart a career in some liberal
arts field. But your reply is greatly appreciated as I understand what
is going on much better.

Brian


03-27-2009, 05:04 PM #8
JohnDuq
Brian, thanks for closing out with your working solution! It is nice to see the design process from beginning to end.

John

