video buffering scheme, nonsequential access (no spatial locality)

 
 
wallge
 
      01-24-2007
I am doing some embedded video processing, where I store an incoming
frame of video and then, based on calculations in another part of the
system, warp that buffered frame. When the frame goes into the buffer
(an off-FPGA SDRAM chip), it is simply written one pixel at a time
in row-major order.

The problem is that I will not be reading it back in that order. I
may want to do an arbitrary image rotation, which means the first pixel
I want to access is not the first one I put in the buffer; it might
actually be the last one. If I do full-page reads, or even burst reads,
I will fetch a bunch of pixels that I do not need to determine the
output pixel value. If I just do single reads, I waste a bunch of clock
cycles setting up the SDRAM: telling it which row to activate and which
column to read from, then, after the read is done, issuing the precharge
command to close the row. This is highly inefficient. It takes 5,
maybe 10 clock cycles just to retrieve one pixel value.
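
As a rough check on the 5 to 10 cycle figure, here is a minimal sketch of the per-read cost in Python. The parameter names (`t_rcd`, `cas_latency`, `t_rp`) and the example timings are assumptions, not values from any particular datasheet, and the model ignores tRAS, refresh, and any overlap between commands.

```python
def cycles_per_random_read(t_rcd, cas_latency, t_rp, burst_len=1):
    """Simplified cost, in clock cycles, of one read from a closed row.

    Sequence modelled: ACTIVE, wait t_rcd, READ, wait cas_latency,
    burst_len data beats, PRECHARGE, wait t_rp.
    Assumption: no command overlap, no tRAS limit, no refresh.
    """
    return t_rcd + cas_latency + burst_len + t_rp


def cycles_per_useful_pixel(t_rcd, cas_latency, t_rp, burst_len, useful_pixels):
    """Cycles spent per pixel that the warp actually needs from one burst."""
    total = cycles_per_random_read(t_rcd, cas_latency, t_rp, burst_len)
    return total / useful_pixels


if __name__ == "__main__":
    # Hypothetical 2-2-2 and 3-3-3 timings, single-word reads.
    print(cycles_per_random_read(2, 2, 2))
    print(cycles_per_random_read(3, 3, 3))
    # Hypothetical burst of 8 where only 2 of the pixels are wanted.
    print(cycles_per_useful_pixel(3, 3, 3, burst_len=8, useful_pixels=2))
```

Under this model a burst only pays off when enough of its pixels are actually used, which is the motivation for the layout question that follows.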

Does anyone know a good way to organize a frame buffer to be more
friendly (and more optimal) to nonsequential access (like the kind we
might need if we wanted to warp the input image via some
linear/nonlinear transformation)?

 
 
 
 
 
Patrick Dubois
 
      01-25-2007
I have much the same problem, and I'm using RAM that provides fast
random access, i.e. ZBT RAM. You can get ZBT RAM that runs at 200 MHz,
so you can effectively process 100 Mpixels/s. ZBT RAM has much less
capacity than SDRAM, but if you only need to store a few frames,
that shouldn't be a problem.

Adding ZBT might not be an option on your system, however. Maybe
someone can suggest a clever algorithm for your particular problem.


Patrick Dubois

 
 
 
 
Martin Thompson
 
      01-25-2007
"wallge" <(E-Mail Removed)> writes:

> I am doing some embedded video processing, where I store an incoming
> frame of video, then based on some calculations in another part of the
> system, I warp that buffered frame of video. Now when the frame goes
> into the buffer
> (an off-FPGA SDRAM chip), it is simply written in one pixel at a time
> in row major ordering.
>
> The problem with this is that I will not be accessing it in this way. I
> may want to do some arbitrary image rotation. This means
> the first pixel I want to access is not the first one I put in the
> buffer, It might actually be the last one in the buffer. If I am doing
> full page reads, or even burst reads, I will get a bunch of pixels that
> I will not need to determine the output pixel value. If i just do
> single reads, this waists a bunch of clock cycles setting up the SDRAM,
> telling it which row to activate and which column to read from. After
> the read is done, you then have to issue the precharge command to close
> the row. There is a high degree of inefficiency to this. It takes 5,
> maybe 10 clock cycles just to retrieve one
> pixel value.
>


If you are doing truly arbitrary warping, then is it not right that
you can never get an optimal organisation for all warps?

> Does anyone know a good way to organize a frame buffer to be more
> friendly (and more optimal) to nonsequential access (like the kind we
> might need if we wanted to warp the input image via some
> linear/nonlinear transformation)?
>


Could you do some kind of caching scheme where you read an entire DRAM
row in at a time, and "hope it comes in handy" later?

Failing that, can you use SSRAM for your frame buffer?

Or, can you parallelise your task so that it operates on (eg) 4 wildly
different areas of input data at a time, which means you can use the
banking mechanism of the DRAMs to hide the latency?
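
A minimal sketch of the banking idea, in Python for illustration only. The geometry (4 banks x 4096 rows x 256 columns) and the choice to place the bank bits directly above the column bits are assumptions, not something stated in this thread; `split_address` is a hypothetical helper.

```python
# Assumed geometry for a small x16 SDRAM: 4 banks x 4096 rows x 256 columns.
COL_BITS = 8
BANK_BITS = 2
ROW_BITS = 12


def split_address(word_addr):
    """Split a linear word address into (bank, row, column).

    The bank bits sit just above the column bits, so consecutive
    256-word pages land in different banks.
    """
    col = word_addr & ((1 << COL_BITS) - 1)
    bank = (word_addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (word_addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return bank, row, col


if __name__ == "__main__":
    # Four consecutive pages map to four different banks, so a row in one
    # bank can be activated while another bank is still delivering data.
    for page in range(4):
        print(page, split_address(page * 256))
```

With this mapping, four work areas whose page indices differ modulo 4 never compete for the same bank.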

Those are my initial thoughts (whilst waiting for a very loooooong
simulation to run).

Cheers,
Martin

--
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html


 
 
Mike Treseler
 
      01-25-2007
wallge wrote:

> Does anyone know a good way to organize a frame buffer to be more
> friendly (and more optimal) to nonsequential access


Sounds like a RAM.
If it didn't fit in fpga block ram
I would use an external device.

-- Mike Treseler
 
 
wallge
 
      01-25-2007
I should have been more specific in my question.

I have to use a small (64 Mbit) mobile SDRAM. I can't choose
a different storage element for the system (other than *some*
FPGA buffering, though not a full frame).

I have heard some discussion of the way graphics accelerator
boards do memory transactions, storing pixels in blocks of neighboring
pixels instead of in row-major order. In other words, the spatial
locality in the SDRAM buffer might look like this:

Image pixels:
N2 N3 N4
N1 P  N5
N8 N7 N6

Memory organization:
ADDR    DATA
0x0000  P
0x0001  N1
0x0002  N2
0x0003  N3
0x0004  N4
0x0005  N5
0x0006  N6
0x0007  N7
0x0008  N8


Where P is the central pixel of interest and the N's are its
neighbors. We organize the pixels in the SDRAM buffer not by rows but
by regions of interest. That way, if we are doing some kind of image
warp and want more bang for the buck in terms of read latency, we are
more likely to reuse pixels in the neighborhood of the currently
accessed pixel than if the buffer were arranged in row- or column-major
order (consider the case where we wanted to rotate an image by
47.2 degrees from input to output).
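
One way to try this idea out is the sketch below, in Python for illustration. It swaps in non-overlapping square tiles (one tile per DRAM row) rather than the per-pixel 3 x 3 neighborhoods listed above, and it counts how often the open DRAM row changes during a rotated read-out. The frame size, row size, tile size, nearest-neighbor fetch, and all function names are assumptions.

```python
import math

W = H = 256          # assumed frame size in pixels
ROW_WORDS = 256      # assumed pixels per DRAM row (page)
TILE = 16            # assumed tile edge; TILE * TILE == ROW_WORDS


def row_major_address(x, y):
    """Conventional layout: one image line after another."""
    return y * W + x


def tiled_address(x, y):
    """Tiled layout: each TILE x TILE block occupies one DRAM row."""
    tiles_per_line = W // TILE
    tile_index = (y // TILE) * tiles_per_line + (x // TILE)
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE)


def count_row_activations(address_of, angle_deg):
    """Scan the output in raster order, fetch the nearest source pixel of a
    rotation about the frame centre, and count DRAM row changes."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    cx, cy = (W - 1) / 2, (H - 1) / 2
    open_row, activations = None, 0
    for v in range(H):
        for u in range(W):
            x = math.floor(c * (u - cx) - s * (v - cy) + cx + 0.5)
            y = math.floor(s * (u - cx) + c * (v - cy) + cy + 0.5)
            if not (0 <= x < W and 0 <= y < H):
                continue  # source pixel falls outside the frame
            row = address_of(x, y) // ROW_WORDS
            if row != open_row:
                activations += 1
                open_row = row
    return activations


if __name__ == "__main__":
    for angle in (0.0, 47.2):
        print(angle,
              count_row_activations(row_major_address, angle),
              count_row_activations(tiled_address, angle))
```

Under these assumptions the unrotated scan favors the row-major layout, while the 47.2 degree scan needs fewer row activations with tiles, so the tile shape is a trade-off rather than a free win.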

Has anyone seen something like this, or does anyone know of any online
resources on memory buffer organization schemes for graphics or image
processing?



 
Pete Fraser
 
      01-25-2007
"wallge" <(E-Mail Removed)> wrote in message

>
> Image pixels:
> N2 N3 N4
> N1 P N5
> N8 N7 N6


Have you thought about what order of filtering you'll
need to use?


 
 
wallge
 
      01-25-2007
I am not doing any image filtering.
This is not a filtering operation.
It is an interpolation operation,
typically bilinear or bicubic,
to do image transformations.

 
Pete Fraser
 
      01-25-2007
"wallge" <(E-Mail Removed)> wrote in message
>I am not doing any image filtering.


Yes you are.

> This is not a filtering operation.


Yes it is.

> It is an interpolation operation
> typically bilinear or bicubic
> to do image transformations.


And that's a filtering operation.
So the maximum kernel size is 4 x 4, though
you might use 2 x 2. The kernel size could have a substantial
bearing on the traffic to/from on-chip RAM.

I'm still not sure of your limitations on off-chip RAM.
Do you have a buffer on the input, the output, or both?
Do you have enough bandwidth to have an
intermediate buffer for a two-pass operation?


 
 
wallge
 
      01-25-2007
Can you write out the FIR filter coefficients for
a bilinear interpolation "filter kernel"?
How about a bicubic interpolator filter kernel?
What are its filter coefficients?

Arguing semantics was not the purpose of my post.

I will probably wind up doing bilinear interpolation, or
"filtering", which means I need 4 pixels of the input frame to
determine 1 pixel of the output warped frame.

By the way, what is the frequency response of the bilinear interpolation
"filter"?



 
Pete Fraser
 
      01-26-2007

"wallge" <(E-Mail Removed)> wrote in message
> Can you write out the FIR filter coeffs for
> a bilinear interpolation "filter kernel"?
> How about a bicubic interpolator filter kernel
> what are its filter coeffs?


I'm happy to, but we're getting away from FPGA stuff,
so let's do that off line. Let me know how many phases you
need, and the coefficient format you'd like. I usually
use a minor 4x4 variation on cubic, but it's all set up in
Mathematica, so I could do cubic also.

>
> arguing semantics was not the purpose of my post.
>
> I will probably wind up doing bilinear interpolation or
> "filtering". Which means I need 4 pixels of the input frame to
> determine
> 1 pixel of output warped frame.


So you don't really need coefficient tables for this.
You can just use the fractional phase directly.
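
To make "use the fractional phase directly" concrete, here is a minimal Python sketch. The function names, the argument order, and the pixel naming (`p00` top-left through `p11` bottom-right) are assumptions for illustration; `fx` and `fy` are the fractional parts of the source coordinate.

```python
def bilinear(p00, p10, p01, p11, fx, fy):
    """Bilinear interpolation from the four surrounding input pixels.

    p00 = top-left, p10 = top-right, p01 = bottom-left, p11 = bottom-right.
    fx, fy are the fractional phases in [0, 1); no coefficient table is used.
    """
    top = p00 + fx * (p10 - p00)        # interpolate along the upper line
    bottom = p01 + fx * (p11 - p01)     # interpolate along the lower line
    return top + fy * (bottom - top)    # interpolate between the two lines


def bilinear_weights(fx, fy):
    """The same operation written as a 2 x 2 kernel of weights."""
    return ((1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy)


if __name__ == "__main__":
    print(bilinear(10, 20, 30, 40, 0.5, 0.5))
    print(bilinear_weights(0.25, 0.5))
```

The first form needs three multiplies per output pixel, which is usually the cheaper one to build in hardware; the second form shows why the operation is still a 2 x 2 filter whose weights sum to 1.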

>
> By the way what is the Freq response of the bilinear interpolation
> "filter"?


It depends on the position of the output pixel relative to the input
pixels, but for a central output pixel the frequency response would be
cosinusoidal.
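
For the central case, 1-D linear interpolation reduces to the two-tap filter [0.5, 0.5], whose magnitude response is |cos(w/2)| with w in radians per sample. The Python sketch below checks that numerically; the function name is an assumption, and the 2-D bilinear case is the product of the horizontal and vertical responses.

```python
import cmath
import math


def midpoint_response(omega):
    """Magnitude response of the two-tap filter [0.5, 0.5] at frequency
    omega (radians per sample)."""
    return abs(0.5 + 0.5 * cmath.exp(-1j * omega))


if __name__ == "__main__":
    for k in range(5):
        omega = k * math.pi / 4
        # The two columns should agree to within rounding error.
        print(round(midpoint_response(omega), 6),
              round(abs(math.cos(omega / 2)), 6))
```

The response falls from 1 at DC to 0 at the Nyquist frequency, which is why midpoint interpolation visibly softens fine detail.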

Getting back to FPGA stuff though, what are your off-chip
RAM bandwidth limitations, and could you consider a two-pass approach?

>> I'm still not sure of your limitations on off-chip RAM.
>> You have a buffer on the input or output (or both?)
>> Do you have enough bandwidth to have an
>> intermediate buffer for a two-pass operation?

>



 