
# how to design this datapath unit for DSP using VHDL/Verilog?

walala

 08-30-2003
Dear all,

I want to design an arithmetic datapath unit for digital signal processing
using VHDL and/or Verilog.

The inputs are 5 elements (arriving either sequentially or in parallel), each
8 bits wide. Each input must be multiplied by a predefined constant 10x10
matrix (floating-point values scaled and rounded to integers). The output is
a 10x10 matrix, the sum of these five scaled matrices, with each element 12
bits wide. So I could use one MAC unit per element of the matrix. The
internal computation will be 16 bits.
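Whether 16 bits of internal precision is enough depends on the actual constants. One way to check is to compute the worst-case range of every output element directly from the coefficient matrices. The sketch below assumes signed 8-bit inputs (-128..127) and integer coefficients; `signed_bits` and `accumulator_bits` are hypothetical helper names.

```python
from typing import List

Matrix = List[List[int]]


def signed_bits(lo: int, hi: int) -> int:
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


def accumulator_bits(coeffs: List[Matrix], x_min: int = -128, x_max: int = 127) -> int:
    """Worst-case width of x1*C1[i][j] + ... + x5*C5[i][j] over all (i, j).

    Each x_k is assumed to range independently over [x_min, x_max], so the
    extremes of the sum are the sums of the per-term extremes.
    """
    rows, cols = len(coeffs[0]), len(coeffs[0][0])
    worst = 1
    for i in range(rows):
        for j in range(cols):
            lo = hi = 0
            for m in coeffs:
                c = m[i][j]
                # c*x is linear in x, so its extremes sit at x_min and x_max.
                p1, p2 = c * x_min, c * x_max
                lo += min(p1, p2)
                hi += max(p1, p2)
            worst = max(worst, signed_bits(lo, hi))
    return worst
```

For example, if every coefficient were 31 the result would be 16 bits, but if every coefficient were 127 it would be 18 bits, which would overflow a 16-bit accumulator.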

Hence, for each group of 5 inputs x1, x2, x3, x4, x5, the output matrix is

Y = x1*C1 + x2*C2 + x3*C3 + x4*C4 + x5*C5, where Y, C1, C2, C3, C4, C5 are matrices.
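Before writing any HDL, it helps to have a bit-accurate reference model of this equation to compare the simulation output against. The sketch below assumes signed arithmetic and wrap-around in a 16-bit accumulator. It also assumes, hypothetically, that the 16-to-12-bit reduction simply drops the 4 least significant bits (`SHIFT = 4`); the post does not say how that reduction is done.

```python
from typing import List

Matrix = List[List[int]]

ACC_BITS = 16  # internal width from the post
OUT_BITS = 12  # output width from the post
SHIFT = 4      # hypothetical: drop 4 LSBs to go from 16 to 12 bits


def wrap(value: int, bits: int) -> int:
    """Two's-complement wrap-around, like a fixed-width hardware register."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def reference_y(x: List[int], c: List[Matrix]) -> Matrix:
    """Y = x1*C1 + ... + x5*C5 with a 16-bit accumulator and 12-bit outputs."""
    rows, cols = len(c[0]), len(c[0][0])
    y = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for xk, ck in zip(x, c):
                acc = wrap(acc + xk * ck[i][j], ACC_BITS)
            # Arithmetic right shift truncates toward minus infinity, as in hardware.
            y[i][j] = wrap(acc >> SHIFT, OUT_BITS)
    return y
```

A VHDL or Verilog testbench can then compare its outputs against values produced by a model like this.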

If I put a MAC on every element, I get a purely parallel architecture, but
it needs 100 16-bit MAC units, which uses far too many resources.

I am considering a parallel-serial architecture instead: each step outputs
one row of 10 x 12 bits, so the output is produced row by row.
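One way the row-by-row idea could be scheduled, assuming the 5 inputs arrive one per clock: use 10 MACs, one per column. On each clock one input x_k is broadcast to all 10 MACs, and MAC j multiplies it by C_k[row][j]. A row is complete after 5 clocks, so a whole 10x10 matrix takes 50 clocks, or 2 output elements per clock on average. This model ignores bit widths and pipeline latency to keep the schedule visible; `row_serial` is a hypothetical name.

```python
from typing import Iterator, List, Tuple

Matrix = List[List[int]]


def row_serial(x: List[int], c: List[Matrix]) -> Iterator[Tuple[int, int, List[int]]]:
    """Model of a 10-MAC row-serial datapath.

    Yields (cycles_elapsed, row, row_values) each time a row finishes.
    """
    rows, cols = len(c[0]), len(c[0][0])
    cycle = 0
    for row in range(rows):
        acc = [0] * cols                 # the 10 accumulators, one per column
        for k, xk in enumerate(x):       # one input per clock
            for j in range(cols):        # all 10 MACs work in the same clock
                acc[j] += xk * c[k][row][j]
            cycle += 1
        yield cycle, row, acc            # row ready after len(x) clocks
```

With 5 serial inputs this finishes rows at cycles 5, 10, ..., 50.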

I also need to streamline the datapath operation. Since the groups of 5
input elements arrive as a non-stop stream, the output will also stream
non-stop. So after one row has been output, its storage can be reused for
computing and storing the results for the next 5 input elements.
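The storage reuse described above can be done without spending a separate clock on clearing the accumulator: on the first term of each new row, the accumulator loads the product instead of adding it, and on the same clock edge the finished sum moves to an output register. `Mac` is a hypothetical behavioral model of one such accumulator, not a specific library component.

```python
from typing import Optional


class Mac:
    """One accumulator that starts a new sum without an idle clear cycle."""

    def __init__(self) -> None:
        self.acc: Optional[int] = None  # None until the first term arrives

    def clock(self, x: int, coef: int, first: bool) -> Optional[int]:
        """One clock edge. Returns the finished sum handed to the output register, if any."""
        product = x * coef
        if first:
            # Same edge: the old sum leaves, and the new product is loaded.
            finished, self.acc = self.acc, product
            return finished
        assert self.acc is not None, "a sum must start with first=True"
        self.acc += product
        return None

    def flush(self) -> Optional[int]:
        """Sum currently held (used at the end of a simulation)."""
        return self.acc
```

Feeding two groups back to back, the first sum comes out on the same clock that the second group's first term is loaded, so there is no gap between rows.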

So far my thinking is fine, but thinking further leaves me confused: how do I
do the sequential timing control (deciding what happens in which cycle)? Do I
need pipelining? How do I design the architecture? I know pipelining in theory
from a one-semester course, but now that I have to implement it, I am totally
lost...
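For a schedule like the row-serial one, the timing control can often be just two nested counters: `k` selects the input x_k, and `row` advances every 5 clocks. Every control signal is then a simple function of the counter values. The sketch below assumes 10 rows and 5 inputs per group; the signal names and the coefficient ROM layout (row `row` of C_k stored at address `k*rows + row`) are hypothetical. In VHDL or Verilog the two counters become registers with wrap-around logic in a clocked process.

```python
from typing import Iterator, NamedTuple


class Ctrl(NamedTuple):
    cycle: int
    row: int        # which row of C_k the MACs use
    k: int          # which input x_k is selected
    rom_addr: int   # hypothetical coefficient ROM address
    first: bool     # load instead of add (start of a row)
    row_done: bool  # accumulators hold a finished row after this clock


def controller(n_groups: int, rows: int = 10, n_inputs: int = 5) -> Iterator[Ctrl]:
    """Two nested counters; yields the control values for each clock."""
    cycle = 0
    for _ in range(n_groups):
        for row in range(rows):            # outer counter: row 0..rows-1
            for k in range(n_inputs):      # inner counter: k 0..n_inputs-1
                yield Ctrl(cycle, row, k, k * rows + row,
                           k == 0, k == n_inputs - 1)
                cycle += 1
```

Each group takes 50 clocks, and the pattern simply repeats for the next group.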

Finally, how do I code this? Are there any examples of this?

Thanks a lot,

-Walala

David Jones

 08-30-2003
In article <bipblj$53j$(E-Mail Removed)>, walala <(E-Mail Removed)> wrote:
>Y=x1*C1+x2*C2+x3*C3+x4*C4+x5*C5 where Y, C1, C2, C3, C4, C5 are matrices;

What is your throughput requirement and what technology are you using?

That will determine the amount of parallelism that you need.

If the requirement is low enough, then only one MAC unit will be required.

Next, you must define the timing of the inputs. If they are serial, then
it's easy: stuff the data into the MAC unit. Being pipelined (right?),
the MAC unit will output the answer N clocks later.
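To illustrate what "N clocks later" means, here is a behavioral model of a MAC with one pipeline register between the multiplier and the adder. The 2-stage split is an assumption for illustration; a real design may use more stages. With this split, the final sum of 5 serial inputs appears one clock after the last input, so N = 1 here.

```python
from typing import Optional


class PipelinedMac:
    """MAC with a register after the multiplier: the product of the input
    applied on clock n is added to the accumulator on clock n+1."""

    def __init__(self) -> None:
        self.prod_reg: Optional[int] = None  # pipeline register (None = bubble)
        self.acc = 0

    def clock(self, x: Optional[int], coef: int = 0) -> int:
        # Stage 2: add the product registered on the previous clock.
        if self.prod_reg is not None:
            self.acc += self.prod_reg
        # Stage 1: register the new product (None means no valid input).
        self.prod_reg = None if x is None else x * coef
        return self.acc
```

Feeding 1, 2, 3, 4, 5 (with coefficient 1) and then one idle clock gives accumulator values 0, 1, 3, 6, 10, 15: the complete sum is only visible on the sixth clock.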

If you have more parallelism in your input data than you want in your
MAC units, then you will need to buffer the data. This circuit will be
easy to design once you define the timing requirements.
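For example, suppose the five inputs arrive together as one 40-bit word but only one MAC is used. A holding register plus a 5:1 multiplexer, stepped by a counter, can turn the word back into one 8-bit value per clock. The byte order (x1 in the least significant byte) and the function names are assumptions; values are shown as raw 8-bit patterns.

```python
from typing import Iterator, List


def pack(xs: List[int]) -> int:
    """Pack five 8-bit inputs into one 40-bit word, x1 in the low byte."""
    word = 0
    for i, x in enumerate(xs):
        word |= (x & 0xFF) << (8 * i)  # & 0xFF keeps the two's-complement byte
    return word


def serialize(word: int, n: int = 5) -> Iterator[int]:
    """Holding register + n:1 mux: emit one byte per clock, low byte first."""
    for sel in range(n):  # sel plays the role of the mux-select counter
        yield (word >> (8 * sel)) & 0xFF
```

The holding register must keep its value until all 5 bytes have been consumed, which is exactly the buffering described above.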

walala

 08-30-2003
Hi David,

The required output throughput is 33-50 MHz, i.e., it should output 33
million to 50 million 12-bit elements per second, and each group of 5
inputs corresponds to 10x10 = 100 such 12-bit output elements.
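These numbers are enough to size the datapath. Each output element needs 5 multiply-adds, so 50 million outputs per second means 250 million MAC operations per second. Dividing by the clock frequency gives the minimum number of MAC units, assuming each MAC does one multiply-add per clock. The clock frequencies below are assumptions for illustration, not a claim about what a 0.25 µm process will reach.

```python
import math

OUTPUTS_PER_GROUP = 100  # 10 x 10 elements per group of 5 inputs
TERMS_PER_OUTPUT = 5     # each element is a sum of 5 products


def macs_needed(output_rate_hz: float, clock_hz: float) -> int:
    """Minimum MAC count, assuming one multiply-add per MAC per clock."""
    mac_ops_per_s = output_rate_hz * TERMS_PER_OUTPUT
    return math.ceil(mac_ops_per_s / clock_hz)


def input_group_rate(output_rate_hz: float) -> float:
    """Groups of 5 inputs consumed per second at the given output rate."""
    return output_rate_hz / OUTPUTS_PER_GROUP
```

At a 50 MHz clock, 50 million outputs per second needs 5 MACs; at 100 MHz, 3. Either way it is far fewer than the 100 of the fully parallel version. The input side only has to supply 500,000 groups per second.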

The technology I am going to use is 0.25 µm.

I think the inputs are naturally serial, but again, I am not sure how to
partition the internal MACs between parallel and serial operation, or how to
pace the outputs.

It seems the inputs arrive faster than the outputs can leave; maybe I should
make the input wait after it has been fed into the unit?
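Since one group of 5 inputs produces 100 outputs, the input side does have to wait. One common way to arrange that is a ready/valid handshake: the datapath holds the current group in an input register and raises a `ready` signal only when it can take the next one. The sketch assumes 50 clocks per group (for example 10 MACs producing 2 outputs per clock) and a source that always has data; `accept_times` is a hypothetical name.

```python
from typing import List


def accept_times(cycles: int, cycles_per_group: int) -> List[int]:
    """Clock cycles on which a new input group is accepted.

    The source always has data (valid = 1); the datapath raises ready
    only once it has finished processing the previous group.
    """
    accepted = []
    busy = 0  # clocks left on the current group
    for cycle in range(cycles):
        ready = busy == 0
        if ready:  # valid and ready: the input register loads
            accepted.append(cycle)
            busy = cycles_per_group
        busy -= 1
    return accepted
```

With 50 clocks per group, new inputs are taken on clocks 0, 50, 100, and so on; in between, the source simply holds its data.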

Can you give some further advice on how to build this architecture and how to
handle the timing? I think it is really difficult. Could you also point me to
some resources?

Thanks very much,

-Walala

walala

 08-30-2003
Can we assume the inputs are all present at once (in parallel)? Since there
are only 5 inputs (5 x 8 = 40 bits), is that a reasonable assumption?
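If all five inputs can be presented at once as a 40-bit word, one output element can be computed per clock using 5 multipliers and an adder tree. This gives one output per clock without any serialization. The sketch assumes signed 8-bit inputs packed with x1 in the least significant byte; `unpack` and `element` are hypothetical names.

```python
from typing import List


def unpack(word: int) -> List[int]:
    """Split a 40-bit word into five signed 8-bit inputs, x1 in the low byte."""
    xs = []
    for i in range(5):
        b = (word >> (8 * i)) & 0xFF
        xs.append(b - 256 if b & 0x80 else b)  # sign-extend the byte
    return xs


def element(word: int, coefs: List[int]) -> int:
    """One output element: 5 parallel multipliers feeding a 3-level adder tree."""
    p = [x * c for x, c in zip(unpack(word), coefs)]  # 5 multipliers
    s01, s23 = p[0] + p[1], p[2] + p[3]               # level 1 (p[4] passes through)
    s0123 = s01 + s23                                 # level 2
    return s0123 + p[4]                               # level 3
```

In hardware, pipeline registers can be placed between the tree levels if a single clock is too short for all three adders.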