#1
Hi,

I am looking for some insight into how I can go about pipelining my system.

The system is an image interpolator which contains a buffer (on-chip dual-port RAM) and an interpolator block. The buffer stores incoming data in real time, say at X MHz. The interpolator requires 4 data elements from this buffer to produce 1 output.

To keep the system real-time, I run the system at X MHz and write to the buffer at that rate, but I read the data out of the buffer at 4X MHz so that on each system clock cycle I have all 4 data elements that the interpolator unit needs. This has limited X to 50 MHz because the internal block RAM maxes out at 200 MHz (well, according to Altera it can go up to 287 MHz, but I hit the wall at some point or another).

I was wondering if there is a way to pipeline the design so that I can run the whole system at a single clock frequency without a huge backlog of data accumulating (since there is a finite amount of on-chip storage), or if there are examples of this in any books?

Thanks,
Divyang M

#2
"Divyang M" writes:

> Hi,
> I am looking for some insight into how I can go about pipelining my
> system.
>
> The system is an image interpolator which contains a buffer (on-chip
> dual-port RAM) and an interpolator block.
> The buffer stores incoming data in real time, say at X MHz. The
> interpolator requires 4 data elements from this buffer to produce 1
> output.
>
> To keep the system real-time, I run the system at X MHz and write to
> the buffer at that rate, but I read the data out of the buffer at
> 4X MHz so that on each system clock cycle I have all 4 data elements
> that the interpolator unit needs. This has limited X to 50 MHz because
> the internal block RAM maxes out at 200 MHz (well, according to Altera
> it can go up to 287 MHz, but I hit the wall at some point or another).
>
> I was wondering if there is a way to pipeline the design so that I can
> run the whole system at a single clock frequency without a huge
> backlog of data accumulating (since there is a finite amount of
> on-chip storage), or if there are examples of this in any books?

If you have the storage available, write to four RAMs in parallel and read 1 element from each RAM per clock cycle. Other combinations are also possible.

Regards,
Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>

#3
Hello Kai,

I do not have enough storage space to store the data 4 times, since my data is a 240x320 gray-level (8 bits/pixel) image.

The other option I was thinking of is to delay the first of the four outputs by 4 cycles (registers), the second by 3 cycles, the third by 2 cycles, and the fourth by 1 cycle. These four points will then be aligned, but if I use this strategy I get a valid output from my system only once every 4 cycles, so a throughput of 0.25 (if I'm using the definition of throughput correctly). Ideally I would like the throughput to be 1.

Any other suggestions you have would be welcome.

Thanks,
Divyang M
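
For scale, one copy of the 240x320, 8-bit image is 76,800 bytes, so four full copies would need 307,200 bytes. One of the "other combinations" mentioned earlier in the thread could avoid the duplication, although nobody in the thread spells it out: split the single image across four banks by row and column parity. Any 2x2 neighbourhood then touches each bank exactly once, so four RAMs running at the pixel clock can supply all four pixels per cycle. The Python model below is only an illustration of that addressing; the names `bank_and_addr`, `write_image`, and `read_quad` are made up for this sketch, and it assumes even image dimensions.

```python
ROWS, COLS = 240, 320  # image size quoted in the thread


def bank_and_addr(r, c, cols=COLS):
    """Map pixel (r, c) to (bank, address); the bank is chosen by parity."""
    bank = (r & 1) * 2 + (c & 1)              # 0..3
    addr = (r >> 1) * (cols // 2) + (c >> 1)  # position inside that bank
    return bank, addr


def write_image(image):
    """Store ONE copy of the image, spread over four quarter-size banks."""
    rows, cols = len(image), len(image[0])
    banks = [[0] * ((rows // 2) * (cols // 2)) for _ in range(4)]
    for r in range(rows):
        for c in range(cols):
            b, a = bank_and_addr(r, c, cols)
            banks[b][a] = image[r][c]
    return banks


def read_quad(banks, r, c, cols):
    """Fetch the 2x2 block whose top-left pixel is (r, c).

    Each of the four reads lands in a different bank, so in hardware
    they could all be issued in the same clock cycle.
    """
    reads = [bank_and_addr(r + dr, c + dc, cols)
             for dr in (0, 1) for dc in (0, 1)]
    assert len({b for b, _ in reads}) == 4  # no bank is hit twice
    return [banks[b][a] for b, a in reads]
```

The price in this model is four independent read-address generators instead of one; total storage stays at a single image.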

#4
Are the four elements consecutive?

Then it's a simple pipelined FIR, and it runs at full speed, producing data at X Ms/s with an X MHz clock.

This becomes a little bit more interesting when you come to 2D correlation.

Bert Cuzeau

Divyang M wrote:
> Hello Kai,
>
> I do not have enough storage space to store the data 4 times, since
> my data is a 240x320 gray-level (8 bits/pixel) image.
>
> The other option I was thinking of is to delay the first of the four
> outputs by 4 cycles (registers), the second by 3 cycles, the third
> by 2 cycles, and the fourth by 1 cycle. These four points will then
> be aligned, but if I use this strategy I get a valid output from my
> system only once every 4 cycles, so a throughput of 0.25 (if I'm
> using the definition of throughput correctly). Ideally I would like
> the throughput to be 1.
>
> Any other suggestions you have would be welcome.
>
> Thanks,
> Divyang M
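
For the consecutive-sample case Bert describes, the reason throughput stays at 1 rather than 0.25 is that the delay registers form a sliding window: each clock shifts in one new sample and reuses the three older ones. A minimal Python model of a 4-tap shift register, with `fir4` and the coefficient values being placeholders chosen for this sketch:

```python
from collections import deque


def fir4(samples, coeffs):
    """Model a 4-tap pipelined FIR: one sample in, one result out per clock."""
    taps = deque([0, 0, 0, 0], maxlen=4)  # four delay registers, newest first
    for s in samples:
        taps.appendleft(s)                # one clock edge: everything shifts
        yield sum(c * t for c, t in zip(coeffs, taps))


# One output per input sample, after the initial fill.
print(list(fir4([1, 2, 3, 4, 5], (1, 1, 1, 1))))
```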

#5
Hi Bert,

The four elements are not consecutive. I am actually working with an image, so the four elements are essentially 2 elements from each of 2 rows, something like A11, A12, A21, A22.

But these can be any 4 "adjacent" elements in the image, so they do not arrive in any particular order either.

Thanks,
Divyang
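
The thread never names the interpolation kernel. Assuming it is ordinary bilinear interpolation over the 2x2 block A11, A12, A21, A22, the per-output arithmetic would look roughly like the sketch below. The function name `bilinear` and the fractional offsets `fx`, `fy` are introduced here for illustration only.

```python
def bilinear(a11, a12, a21, a22, fx, fy):
    """Blend a 2x2 block; fx, fy in [0, 1] are the offsets from A11.

    a11 a12   <- upper row
    a21 a22   <- lower row
    """
    top = a11 * (1 - fx) + a12 * fx        # interpolate along the upper row
    bottom = a21 * (1 - fx) + a22 * fx     # interpolate along the lower row
    return top * (1 - fy) + bottom * fy    # then between the two rows


# All four pixels are needed for a single result, which is why the
# buffer must deliver four reads per output.
print(bilinear(10, 20, 30, 40, 0.5, 0.5))
```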

#6
Hi Divyang,

Is there some scrambled addressing scheme you could use when writing the data so that the words you need to read are always consecutive? That way you could set your write port width to 8 and your read port width to 32.

If this is possible, your write bandwidth would be limited by the Tpd of the logic that sets up the write address, but at least you would be able to clock both ends at write speed.

Best regards,
Ben

#7
I think you just need to pipeline almost one line (N-2 depth), with two FF stages at the entrance and two at the exit. That way you have your four elements available for combining on every clock cycle.

The big pipeline fits nicely in a dual-port memory used as a circular buffer, or simply a ready-made FIFO.

Assuming the number of pixels per line is n+1:

FF  -> FF    -> [[[[[ Fifo(n-2) ]]]]] -> FF  -> FF
D2n    D2n-1    D2n-2 ... D21 D20        D1n    D1n-1

so you have D1n, D1n-1, D2n, D2n-1 available at the same time. Just be careful at the edges...

What can we not do with pipelining?

It's late here, I hope I didn't goof.

Bert

Divyang M wrote:
> Hi Bert,
>
> The four elements are not consecutive. I am actually working with an
> image, so the four elements are essentially 2 elements from each of
> 2 rows, something like A11, A12, A21, A22.
>
> But these can be any 4 "adjacent" elements in the image, so they do
> not arrive in any particular order either.
>
> Thanks,
> Divyang
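
Bert's delay line can be modelled in a few lines of Python to check the tap positions. With W pixels per line the chain is W + 2 stages long (two flip-flops, a FIFO of W - 2 entries, two more flip-flops), which matches the "N-2 depth" in the post if N is read as the line width; that reading is an assumption. The generator name `window_stream` is made up for this sketch, and windows that would straddle a line boundary are simply skipped as one possible way of handling the edges.

```python
from collections import deque


def window_stream(pixels, width):
    """Yield ((row, col), (up_left, up, left, cur)) once per valid clock.

    pixels: the image in raster order, one pixel per clock.
    (row, col) is the position of the newest pixel `cur`.
    """
    # Stage 0 and 1 are the input FFs, stage `width` and `width + 1`
    # are the output FFs; everything in between models the FIFO.
    chain = deque([None] * (width + 2), maxlen=width + 2)
    for i, p in enumerate(pixels):
        chain.appendleft(p)                      # one clock: shift by one stage
        row, col = divmod(i, width)
        if row >= 1 and col >= 1:                # skip first row / first column
            cur, left = chain[0], chain[1]       # current line
            up, up_left = chain[width], chain[width + 1]  # line above
            yield (row, col), (up_left, up, left, cur)


image = [[r * 10 + c for c in range(4)] for r in range(3)]
raster = [p for line in image for p in line]
for pos, quad in window_stream(raster, 4):
    print(pos, quad)
```

This only works because the four pixels sit at fixed offsets from the incoming pixel, which is the raster-order case.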

#8
Hi Bert,

That would work if I were doing "forward mapping" (i.e. taking 4 known input pixels to compute an output pixel which can end up anywhere in the image), but I am doing "inverse mapping" (i.e. I know the output pixel I am computing, but the 4 input pixels can be located anywhere in the input image).

Thanks for your time and help.

Divyang
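
To make the inverse-mapping difficulty concrete, the sketch below computes which four input pixels one output pixel needs. `inverse` is a hypothetical callable standing in for whatever geometric transform the design uses (the thread does not say), and `source_quad` is a name invented for this illustration. The point is that the read addresses depend on the transform, not on raster order.

```python
import math


def source_quad(xo, yo, inverse, width, height):
    """Return the four source (row, col) addresses and the (fx, fy)
    fractions needed for output pixel (xo, yo)."""
    xs, ys = inverse(xo, yo)                       # position in the input image
    x0 = min(max(math.floor(xs), 0), width - 2)    # clamp so the 2x2 block fits
    y0 = min(max(math.floor(ys), 0), height - 2)
    fx = min(max(xs - x0, 0.0), 1.0)
    fy = min(max(ys - y0, 0.0), 1.0)
    addrs = [(y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)]
    return addrs, (fx, fy)


# Example transform (an assumption): a transpose. Walking the output
# left to right walks the input top to bottom, so consecutive outputs
# read from different input rows.
def transpose(xo, yo):
    return float(yo), float(xo)


for xo in range(3):
    print(xo, source_quad(xo, 0, transpose, 320, 240)[0])
```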

#9
Hi Ben,

That's what I am thinking now. There seems to be no straightforward solution to "inverse mapping" problems, even in the literature, because of the uncertainty about which input pixels will be accessed. So I might have to go this way.

Thanks,
Divyang M

#10
Hi Divyang M,

> The four elements are not consecutive. I am actually working with an
> image. So the four elements are essentially 2 elements each in 2 rows,
> something like A11, A12, A21, A22.
>
> But these can be any 4 "adjacent" elements in the image, so they do not
> go by a particular order either.

I once wrote a linear interpolation algorithm for a Bayer-matrix camera interface for one of my customers. I can't publish the code for legal reasons, but the idea is as follows:

There is no frame buffer. You use an FSM to capture video into one of four line buffers when writing. Thus, three buffers are available for reading while the fourth is being updated.

Assuming that your destination pixel (xd, yd) is a function of one or two (x+d, y-1) pixels, one or two (x+d, y) pixels and one or two (x+d, y+1) pixels (where d is some arbitrary X distance), you can read pixels at write speed from the three line buffers.

In my case, the worst-case computation was

p(x-1,y-1)*C1 + p(x+1,y-1)*C2 + p(x-1,y+1)*C3 + p(x+1,y+1)*C4

which would normally cause 4 fetches from a frame buffer, 2*2 fetches using the above-mentioned technique, or simply 2*1 fetches if you store the x-1 and x fetches in 8-bit 'cache' registers.

In this case I could simply work from left to right, so I used 6*8 DFFs to hold

(x-1, y-1), (x, y-1)
(x-1, y  ), (x, y  )
(x-1, y+1), (x, y+1)

and compute my formula by setting the address on the line buffers to x+1. The data outputs would then yield

(x+1, y-1)
(x+1, y  )
(x+1, y+1)

giving me all the necessary data to compute RGB values for every pixel in the Bayer pattern, or basically any 3x3 Fourier kernel on grayscale data.

The whole idea yields a 3-scanline delay between input and output, but this way you can run at high clock speeds and be creative about the interpolation method.

Best regards,
Ben
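
Since Ben's code is not available, the following Python model is only a reconstruction of the scheme as he describes it: four rotating line buffers (one being written, three being read) plus six "cache" registers holding columns x-1 and x, so that a single read at address x+1 per buffer completes a 3x3 window. The names `windows_3x3` and `diag_interp` and the coefficient values are invented for this sketch, and the extra pass after the last scanline is an assumption about how the final rows get flushed.

```python
def windows_3x3(image):
    """Yield ((row, col), window) for every interior pixel, using four
    line buffers and six column registers instead of a frame buffer."""
    height, width = len(image), len(image[0])
    line_bufs = [None] * 4
    for y in range(height + 1):           # one extra line time to flush
        if y < height:
            line_bufs[y % 4] = list(image[y])   # buffer currently being written
        if y < 3:
            continue                      # need three completed lines to read
        top, mid, bot = (line_bufs[(y - k) % 4] for k in (3, 2, 1))
        prev = cur = None                 # the 6 registers: columns x-1 and x
        for x1 in range(width):           # read address is x+1
            fetch = (top[x1], mid[x1], bot[x1])   # one read per line buffer
            if x1 >= 2:
                # rows of the window: (x-1, x, x+1) for y-1, y, y+1
                yield (y - 2, x1 - 1), tuple(zip(prev, cur, fetch))
            prev, cur = cur, fetch        # shift the cache registers


def diag_interp(window, c):
    """The worst-case formula from the post: the four diagonal neighbours."""
    return (window[0][0] * c[0] + window[0][2] * c[1]
            + window[2][0] * c[2] + window[2][2] * c[3])


image = [[r * 10 + c for c in range(4)] for r in range(4)]
for pos, win in windows_3x3(image):
    print(pos, diag_interp(win, (0.25, 0.25, 0.25, 0.25)))
```

Like the delay-line approach, this relies on the window sliding left to right in raster order, so it suits fixed-offset kernels more readily than an arbitrary inverse mapping.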