VHDL - Pipelining question

 
03-31-2005, 05:29 PM   #1
Divyang M
Pipelining question


Hi,
I am looking for some insight into how I can go about pipelining my
system.

The system is an image interpolator which contains a buffer (on-chip
dual-port RAM) and an interpolator block.
The buffer stores incoming data (real-time say at X MHz). The
interpolator requires 4 data elements from this buffer to produce 1
output.

To keep the system real-time, I run it at X MHz and write to the buffer
at that rate, but I read the data out of the buffer at 4X MHz so that in
each X MHz clock cycle I have all 4 data elements that the interpolator
unit needs. This has limited X to 50 MHz because the internal block RAM
maxes out at 200 MHz (well, according to Altera it can go up to 287 MHz,
but I hit the wall at some point or another).

I was wondering if there is a way to pipeline the design so that I can
run the whole system at a single clock frequency without building up a
huge backlog of data (there is only a finite amount of on-chip storage),
or if there are examples of this in any books?

Thanks,
Divyang M



03-31-2005, 05:54 PM   #2
Kai Harrekilde-Petersen
Re: Pipelining question

If you have the storage available, write to four RAMs in parallel and
read 1 element from each RAM per clock cycle.
Other combinations are also possible.
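
As a rough illustration of the four-RAM suggestion, here is a behavioural Python model (not synthesizable HDL). It assumes four identical copies of the buffer; the class name `ReplicatedBuffer` and its methods are made up for this sketch. Every write goes to all four copies, so four independent read addresses can be served in the same clock cycle, at the cost of four times the storage.

```python
class ReplicatedBuffer:
    """Behavioural model: four RAM copies, one write port, four read ports."""

    def __init__(self, depth, copies=4):
        # Each copy models one dual-port block RAM (one write, one read port).
        self.banks = [[0] * depth for _ in range(copies)]

    def write(self, addr, value):
        # The same word is written to every copy in the same cycle.
        for bank in self.banks:
            bank[addr] = value

    def read4(self, addrs):
        # One read per copy per cycle: four arbitrary addresses at once.
        if len(addrs) != len(self.banks):
            raise ValueError("need exactly one address per RAM copy")
        return [bank[a] for bank, a in zip(self.banks, addrs)]


buf = ReplicatedBuffer(depth=16)
for addr in range(16):
    buf.write(addr, addr * 10)
# Addresses of a 2x2 block in a hypothetical 4-pixel-wide image.
neighbours = buf.read4([0, 1, 4, 5])
```

Both the write side and the read side can then run from the same clock, since no port needs more than one access per cycle.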

Regards,


Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>


03-31-2005, 07:28 PM   #3
Divyang M
Re: Pipelining question
Hello Kai,

I do not have that much storage space to store the data 4 times since
my data is a 240x320 gray level (8-bit/pixel) image.
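
The storage arithmetic behind this constraint can be checked quickly. The snippet below only multiplies out the figures given in the post (240x320 pixels, 8 bits each); the variable names are illustrative.

```python
PIXELS = 240 * 320        # image size from the post
BITS_PER_PIXEL = 8        # 8-bit gray level

frame_bits = PIXELS * BITS_PER_PIXEL
four_copies_bits = 4 * frame_bits

print(f"one frame  : {frame_bits} bits ({frame_bits // 8} bytes)")
print(f"four copies: {four_copies_bits} bits ({four_copies_bits // 8} bytes)")
```

One frame is 614,400 bits, so four full copies would need 2,457,600 bits of on-chip RAM.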

The other option I was thinking of is to delay the first of the four
outputs by 4 cycles (registers), the second by 3 cycles, the third by
2 cycles, and the fourth by 1 cycle. These four points will then be
aligned, but with this strategy I get a valid output from my system only
once every 4 cycles, so a throughput of 0.25 outputs per clock (if I'm
using the definition of throughput correctly). Ideally I would like the
throughput to be 1.
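
A small timing model makes the 0.25 figure concrete. This is only a sketch: it assumes a single read port that delivers one element per clock, and it collapses the staggered delay registers into a plain list. The function name and the example addresses are invented for illustration.

```python
def serial_fetch(memory, address_groups):
    """Read one element per clock; emit a 4-tuple once all four have arrived.

    Returns a list of (cycle, values) pairs, where cycle is the clock cycle
    (counted from 1) in which the aligned group becomes valid.
    """
    outputs = []
    cycle = 0
    for group in address_groups:
        regs = []                  # stands in for the staggered delay registers
        for addr in group:         # one RAM read per clock cycle
            cycle += 1
            regs.append(memory[addr])
        outputs.append((cycle, tuple(regs)))
    return outputs


memory = list(range(100, 116))
groups = [(0, 1, 4, 5), (1, 2, 5, 6), (2, 3, 6, 7)]
results = serial_fetch(memory, groups)
throughput = len(results) / results[-1][0]   # outputs per clock cycle
```

With three groups the aligned outputs appear in cycles 4, 8 and 12, which is where the throughput of 0.25 comes from.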

Any other suggestions you have would be welcome.

Thanks,
Divyang M



03-31-2005, 10:30 PM   #4
info_
Re: Pipelining question
Are the four elements consecutive?
If so, it's a simple pipelined FIR, and it runs at full speed,
producing data at X MS/s with an X MHz clock.

This becomes a little more interesting when you come to 2D
correlation with 8 "adjacent" pixels spread over 3 lines.
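
For the consecutive-element case, the pipelined FIR can be sketched as a 4-stage shift register that takes one sample and produces one result per clock. This is a behavioural Python model, not HDL, and the coefficients are placeholder values chosen only for the example.

```python
from collections import deque


def fir4_stream(samples, coeffs):
    """4-tap FIR over a stream: one input and one output per clock cycle."""
    taps = deque([0, 0, 0, 0], maxlen=4)   # shift register, newest sample first
    out = []
    for s in samples:
        taps.appendleft(s)                 # shift in one sample per clock
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out


# Placeholder coefficients (all ones), so each output is a running sum of 4.
outputs = fir4_stream([1, 2, 3, 4, 5], coeffs=(1, 1, 1, 1))
```

Because all four taps are held in registers, the memory is touched only once per sample and the read side never has to run faster than the write side.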

Bert Cuzeau

03-31-2005, 10:40 PM   #5
Divyang M
Re: Pipelining question
Hi Bert,

The four elements are not consecutive. I am actually working with an
image. So the four elements are essentially 2 elements each in 2 rows,
something like A11, A12, A21, A22.

But these can be any 4 "adjacent" elements in the image, so they do not
go by a particular order either.

Thanks,
Divyang



03-31-2005, 10:49 PM   #6
Ben Twijnstra
Re: Pipelining question
Hi Divyang,

Is there some scrambled addressing scheme with which you can write the data so that
the words you need to read are always consecutive? That way you could set
your write port width to 8 bits and your read port width to 32 bits.

If this is possible, your write bandwidth would be limited by the Tpd of the
logic that sets up the write address, but at least you would be able to
clock both ends at write speed.
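
One possible scrambling, sketched as a Python model of a RAM with an 8-bit write port and a 32-bit read port. It rests on a strong assumption that the thread does not confirm: the 2x2 blocks to be read start at even x and y, and the image width is even. Blocks at arbitrary positions would straddle two or four words and are not handled. The function names are hypothetical.

```python
def scrambled_write_addr(x, y, width):
    """Byte address that puts each aligned 2x2 block into one 32-bit word."""
    block = (y // 2) * (width // 2) + (x // 2)   # which 32-bit word
    lane = (y % 2) * 2 + (x % 2)                 # which byte inside that word
    return block * 4 + lane


def read_word(ram, x, y, width):
    """Model of the 32-bit read port: the 2x2 block containing (x, y)."""
    block = (y // 2) * (width // 2) + (x // 2)
    return ram[block * 4: block * 4 + 4]


WIDTH, HEIGHT = 4, 4                                  # tiny example image
image = [[10 * y + x for x in range(WIDTH)] for y in range(HEIGHT)]
ram = [0] * (WIDTH * HEIGHT)                          # byte-wide view of the RAM
for y in range(HEIGHT):
    for x in range(WIDTH):                            # 8-bit writes, raster order
        ram[scrambled_write_addr(x, y, WIDTH)] = image[y][x]

# Pixels (2,0), (3,0), (2,1), (3,1) come back from a single wide read.
block = read_word(ram, 2, 0, WIDTH)
```

The address scrambling is only a bit rearrangement of x and y, so in hardware the logic on the write-address path should be small.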

Best regards,


Ben



03-31-2005, 11:05 PM   #7
info_
Re: Pipelining question
I think you just need to pipeline almost one line (depth N-2 for N pixels per line), with two
FF stages at the entrance and two at the exit. That way you would have
your four elements available for combining on every clock cycle.

The big pipeline fits nicely in a dual port memory used as circular buffer,
or simply a ready-made Fifo.

Assuming the pixels per line is n+1 (indices 0 to n), so that the Fifo depth is (n+1)-2 = n-1:

FF  -> FF    -> [[[[[ Fifo(n-1) ]]]]]  -> FF  -> FF
D2n    D2n-1    D2n-2 ... D21 D20         D1n    D1n-1

so you have D1n, D1n-1, D2n, D2n-1 available at the same time.
Just be careful at the edges...
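
The line-delay structure can be checked with a short behavioural model in Python. It is a sketch only: the whole FF, FF, Fifo, FF, FF chain is represented by one shift register of W + 2 stages for a line of W pixels (so the Fifo part is W - 2 deep), and windows that would wrap around a line edge are simply skipped. The function name is made up.

```python
from collections import deque


def window_2x2_stream(pixels, width):
    """Yield (upper_left, upper_right, lower_left, lower_right) once per clock.

    The delay line has width + 2 stages: FF, FF, FIFO(width - 2), FF, FF.
    """
    line = deque([None] * (width + 2), maxlen=width + 2)
    for i, p in enumerate(pixels):
        line.appendleft(p)            # one pixel shifted in per clock
        x = i % width                 # column of the newest pixel
        if line[width + 1] is None or x == 0:
            continue                  # pipeline still filling, or window wraps an edge
        yield (line[width + 1], line[width], line[1], line[0])


# Three lines of four pixels, numbered 0..11 in raster order.
windows = list(window_2x2_stream(range(12), width=4))
```

After the initial fill of one line plus two pixels, every clock with a valid column yields a complete 2x2 neighbourhood from a single incoming pixel.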

What can we not do with pipelining?

It's late here, I hope I didn't goof.

Bert

03-31-2005, 11:26 PM   #8
Divyang M
Re: Pipelining question
Hi Bert,

That would work if I were doing "forward mapping" (i.e. taking 4 known
input pixels to compute an output pixel which can end up anywhere in
the image), but I am doing "inverse mapping" (i.e. I know the output
pixel I am computing, but the 4 input pixels can be located anywhere in
the input image).
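
To make the access pattern of inverse mapping concrete, here is a software sketch in Python. It assumes bilinear interpolation over the 2x2 neighbourhood, which the thread does not state explicitly, and a hypothetical `mapping` function that returns the source coordinates for each output pixel. The point is that the four read addresses depend on the mapping, not on the raster order of the input.

```python
import math


def inverse_map_bilinear(src, out_w, out_h, mapping):
    """For each output pixel, look up its source position and blend 4 neighbours.

    `mapping(xo, yo)` is a hypothetical function returning the (possibly
    fractional) source coordinates for output pixel (xo, yo).
    """
    src_h, src_w = len(src), len(src[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for yo in range(out_h):
        for xo in range(out_w):
            xs, ys = mapping(xo, yo)
            # Clamp so that the 2x2 neighbourhood stays inside the source image.
            x0 = min(max(int(math.floor(xs)), 0), src_w - 2)
            y0 = min(max(int(math.floor(ys)), 0), src_h - 2)
            fx = min(max(xs - x0, 0.0), 1.0)
            fy = min(max(ys - y0, 0.0), 1.0)
            # These four reads are the 4 data elements needed per output pixel.
            a11, a12 = src[y0][x0], src[y0][x0 + 1]
            a21, a22 = src[y0 + 1][x0], src[y0 + 1][x0 + 1]
            top = a11 * (1 - fx) + a12 * fx
            bottom = a21 * (1 - fx) + a22 * fx
            out[yo][xo] = top * (1 - fy) + bottom * fy
    return out


src = [[0, 10], [20, 30]]
# Example mapping: the single output pixel samples the centre of the source.
result = inverse_map_bilinear(src, 1, 1, lambda xo, yo: (0.5, 0.5))
```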

Thanks for your time and help.

Divyang



03-31-2005, 11:28 PM   #9
Divyang M
Re: Pipelining question
Hi Ben,

That's what I am thinking now. There seems to be no straightforward
solution to "inverse mapping" problems (due to the uncertainty about
which input pixels to access), even in the literature. So I
might have to go this way.

Thanks,
Divyang M



03-31-2005, 11:29 PM   #10
Ben Twijnstra
Re: Pipelining question
Hi Divyang M,

> The four elements are not consecutive. I am actually working with an
> image. So the four elements are essentially 2 elements each in 2 rows,
> something like A11, A12, A21, A22.
>
> But these can be any 4 "adjacent" elements in the image, so they do not
> go by a particular order either.


I once wrote a linear interpolation algorithm for a Bayer-matrix camera
interface for one of my customers. I can't publish the code due to legal
reasons, but the idea is as follows:

There is no frame buffer. You use an FSM to capture video into one of four
line buffers when writing. Thus, three buffers are available for reading,
and the fourth is being updated.

Thus, assuming that your destination pixel (xd,yd) is a function of one or
two (x+d,y-1), one or two (x+d,y) and one or two (x+d, y+1) pixels (where d
is some arbitrary X distance) you can read pixels at write speed from the
three line buffers.

In my case, the worst-case computation was
p(x-1,y-1)*C1+p(x+1,y-1)*C2+p(x-1,y+1)*C3+p(x+1,y+1)*C4, which would
normally cause 4 fetches from a frame buffer, 2*2 fetches using the
above-mentioned technique, or simply 2*1 fetches if you store the x-1 and x
fetches in 8-bit 'cache' registers.

In this case I could simply work from left to right, so I used 6*8 DFFs to
hold

(x-1, y-1), (x, y-1)
(x-1, y) , (x, y )
(x-1, y+1), (x, y+1)

and compute my formula by setting the address on the line buffers to x+1.
The data outputs would then yield

(x+1, y-1)
(x+1, y )
(x+1, y+1)

giving me all the necessary data to compute RGB values for every pixel in
the Bayer pattern, or basically any 3x3 Fourier kernel on grayscale data.

The whole idea yields a 3-scanline delay between input and output, but this
way you can run at high clock speeds and be creative about the
interpolation method.
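
Since the original code cannot be published, the following is an independent sketch of the scheme as described, written as a behavioural Python model: three readable line buffers, a bank of cache registers for columns x-1 and x, and one read per line buffer (at column x+1) for each output pixel. It evaluates the worst-case formula quoted above for interior pixels only. The function name and the coefficient tuple `c` are illustrative.

```python
def diagonal_filter(image, c):
    """Apply p(x-1,y-1)*C1 + p(x+1,y-1)*C2 + p(x-1,y+1)*C3 + p(x+1,y+1)*C4.

    Interior pixels only. Per output pixel, each of the three line buffers is
    read exactly once (at column x+1); columns x-1 and x come from registers.
    """
    height, width = len(image), len(image[0])
    out = {}
    for y in range(1, height - 1):
        # The three line buffers available for reading; in hardware a fourth
        # one would be receiving the incoming scanline at the same time.
        lines = (image[y - 1], image[y], image[y + 1])
        # 3x2 register bank, preloaded with columns 0 and 1 of each line.
        prev = [ln[0] for ln in lines]   # column x-1
        curr = [ln[1] for ln in lines]   # column x
        for x in range(1, width - 1):
            nxt = [ln[x + 1] for ln in lines]      # one read per line buffer
            out[(x, y)] = (prev[0] * c[0] + nxt[0] * c[1] +
                           prev[2] * c[2] + nxt[2] * c[3])
            prev, curr = curr, nxt                 # shift the register bank
    return out


image = [[10 * y + x for x in range(4)] for y in range(4)]
result = diagonal_filter(image, c=(1, 1, 1, 1))
```

The middle-row registers are unused by this particular formula but are kept in the model because a general 3x3 kernel would need them.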

Best regards,


Ben


