Can we assume the inputs are all present at once (in parallel)? Since there
are only 5 inputs (5 x 8 = 40 bits), is that a reasonable assumption?
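
For what it's worth, here is a rough back-of-the-envelope check of the rates
involved, using only the figures quoted below (33 to 50 million 12-bit
outputs per second, 5 inputs per 100 outputs). The function name rate_budget
and its parameters are made up for illustration; treat it as a sketch, not a
specification.

```python
def rate_budget(output_rate_hz, inputs_per_set=5, outputs_per_set=100):
    """Derive input and MAC rates from the required output element rate."""
    # One set of 5 inputs yields a full 10x10 matrix, i.e. 100 output elements.
    sets_per_sec = output_rate_hz / outputs_per_set
    # Each input set carries 5 eight-bit values.
    input_bytes_per_sec = sets_per_sec * inputs_per_set
    # Each output element is a sum of 5 products, so 5 MACs per element.
    macs_per_sec = output_rate_hz * inputs_per_set
    return sets_per_sec, input_bytes_per_sec, macs_per_sec


for rate in (33_000_000, 50_000_000):
    sets, in_bytes, macs = rate_budget(rate)
    print(f"{rate / 1e6:.0f} M outputs/s: {sets / 1e3:.0f} k input sets/s, "
          f"{in_bytes / 1e6:.2f} M input bytes/s, {macs / 1e6:.0f} M MACs/s")
```

Under these numbers a new set of 5 inputs is needed only once per 100 output
elements, so holding all 40 bits in a register while the outputs are produced
looks plausible. The MAC rate (165 to 250 million per second if everything
goes through a single unit) is what drives the amount of parallelism.
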
"walala" <(EMail Removed)> wrote in message
news:biqil7$kf7$(EMail Removed)...
> Hi David,
>
> Thanks for your answer!
>
> The required output throughput is 33-50 MHz, i.e., it should output 33
> million to 50 million 12-bit elements per second,
>
> and each set of 5 inputs corresponds to 10x10=100 such 12-bit output
> elements...
>
> The technology I am going to use is 0.25u.
>
> I think the inputs are naturally serial, but again, I am not sure how to do
> the parallel/serial partition of the internal MACs... and how to pace the
> outputs...
>
> It seems the inputs are faster than the outputs, so maybe I should let the
> inputs wait after they are fed into the unit?
>
> Can you give some further advice on how to do this architecture and how to
> do the timing? I think it is really difficult... Could you also point me to
> some resources?
>
> Thanks very much,
>
> Walala
>
> "David Jones" <(EMail Removed)> wrote in message
> news:7N14b.5257$(EMail Removed)...
> > In article <bipblj$53j$(EMail Removed)>, walala <(EMail Removed)> wrote:
> > >Dear all,
> > >
> > >I want to design an arithmetic datapath unit for digital signal
> > >processing using VHDL and/or Verilog.
> > >
> > >The inputs are 5 elements (either sequential or parallel), each having 8
> > >bits. It needs to multiply each of these 5 inputs by a predefined
> > >constant matrix (10x10, floating point scaled and rounded to integer).
> > >The output will be a 10x10 matrix summing the above five matrices, each
> > >element having 12 bits. So for each element of the matrix, I can have a
> > >MAC unit. The internal computation will be 16 bits.
> > >
> > >Hence for each set of 5 inputs x1, x2, x3, x4, x5, the output matrix is
> > >
> > >Y=x1*C1+x2*C2+x3*C3+x4*C4+x5*C5 where Y, C1, C2, C3, C4, C5 are matrices;
> >
> > What is your throughput requirement and what technology are you using?
> >
> > That will determine the amount of parallelism that you need.
> >
> > If the requirement is low enough, then only one MAC unit will be required.
> >
> > Next, you must define the timing of the inputs. If they are serial, then
> > it's easy: stuff the data into the MAC unit. Being pipelined (right?),
> > the MAC unit will output the answer N clocks later.
> >
> > If you have more parallelism in your input data than you want in your
> > MAC units, then you will need to buffer the data. This circuit will be
> > easy to design once you define the timing requirements.
>
>
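
To make the single-MAC suggestion above concrete, here is a small behavioral
model in Python (not HDL) of one MAC unit time-shared over all 100 matrix
elements. The constant matrices are placeholders I made up, and the model
ignores the 16-bit internal width and the reduction to 12 bits, since the
scaling has not been specified in this thread. It only shows the order of
operations and the step count.

```python
def serial_mac(xs, cs, rows=10, cols=10):
    """Compute Y = sum_k xs[k] * cs[k] with one multiply-accumulate per step.

    Returns the result matrix and the number of MAC steps used.
    """
    y = [[0] * cols for _ in range(rows)]
    steps = 0
    for r in range(rows):
        for c in range(cols):
            acc = 0  # accumulator cleared at the start of each output element
            for x, cmat in zip(xs, cs):
                acc += x * cmat[r][c]  # one MAC operation
                steps += 1
            y[r][c] = acc  # one output element is ready every len(xs) steps
    return y, steps


# Hypothetical constants: C_k[r][c] = k + r + c, chosen only so this runs.
cs = [[[k + r + c for c in range(10)] for r in range(10)] for k in range(5)]
xs = [1, 2, 3, 4, 5]  # five 8-bit input values

y, steps = serial_mac(xs, cs)
print(steps)             # MAC steps needed for one set of inputs
print(y[0][0], y[9][9])  # two sample output elements
```

With one MAC unit this takes 500 steps per input set, one output element
every 5 steps, so the MAC clock would have to run at 5 times the output
element rate. Adding MAC units in parallel trades area for a lower clock.
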
