VHDL - help on 2-D array vs. register file

#1

Again, some questions here:

Inside my top-level design, I have a 32x32 8-bit data block flowing through several modules, some modules are in sequence, some in parallel. Inside each module, I need to process the data block as a 2-D array, like 4x4 block-based operations, etc.

How could I pass the 32x32 data block very efficiently among those modules in terms of system speed and logic element utilization?

Will it be possible and efficient for me to have a 2-D array defined in the top-level design, and pass the 2-D array among those modules? If it is possible, how do I do it? And will it consume too many resources?

Or do I need to have a small piece of memory or a register file using lpm_ram, then let each module access the memory through the bus? Then how will I process the data as a 2-D array inside each module? Do I need to buffer the data inside each module for array-wise operations? Then will it be slow and also consume extra resources?

Maybe I am on the wrong track. I am not quite familiar with VHDL. Still kind of a C programmer.

Please help me with it. Thanks a lot.

systolic

#2

systolic wrote:
> Inside my top-level design, I have a 32x32 8-bit data block flowing
> through several modules, some modules are in sequence, some in parallel.
> Inside each module, I need to process the data block as a 2-D array,
> like 4x4 block-based operations, etc.

Write your top level entity before you start slicing.
I expect that there are no 1024 bit interfaces at the top.
Maybe a dot clock and video data in and out?
Next work out the top architecture signals.
Do you need to count out rows and columns?
Are you processing everything live?
Line buffers? Frame buffers?

-- Mike Treseler

#3

Mike Treseler wrote:
> systolic wrote:
>
>> Inside my top-level design, I have a 32x32 8-bit data block flowing
>> through several modules, some modules are in sequence, some in
>> parallel. Inside each module, I need to process the data block as a
>> 2-D array, like 4x4 block-based operations, etc.
>
> Write your top level entity before you start slicing.
> I expect that there are no 1024 bit interfaces at the top.
> Maybe a dot clock and video data in and out?
> Next work out the top architecture signals.
> Do you need to count out rows and columns?
> Are you processing everything live?
> Line buffers? Frame buffers?
>
> -- Mike Treseler

Mike, thank you for the reply.

Yes, I assume there is a frame buffer, which feeds data into my top-level design through a 32-bit interface (4 pixels at a time).

Then I need to perform 32x32 block-based operations inside the top-level design among several modules. In total, I have 4 modules in three levels. The last one needs to perform the block-based operations from the 32x32 block all the way down to 4x4 blocks.

I think I could pass everything among those modules on a 32-bit bus, then re-format the data into a 32x32 block inside each module. But it would consume more memory and impact the system speed.

I am hoping it is possible to pass the 32x32 block through each module. I am really not sure whether I can do that, or how. I guess such a huge interface among those modules is also not worthwhile, even if it is possible.

I would like to have some suggestions or hints.

Maybe I still have to go back to a 32-bit bus and reformat the 32x32 block inside the modules. Is this the normal way to do it? Is there no way to work around this?

systolic

#4

systolic wrote:
> Mike, thank you for the reply.
>
> Yes, I assume there is a frame buffer, which feeds data into my
> top-level design through a 32-bit interface (4 pixels at a time).

Consider verifying this before you proceed.

> Then I need to perform 32x32 block-based operations inside the
> top-level design among several modules. In total, I have 4 modules in
> three levels. The last one needs to perform the block-based operations
> from the 32x32 block all the way down to 4x4 blocks.

Are those bit blocks or pixel blocks?

> I think I could pass everything among those modules on a 32-bit bus,
> then re-format the data into a 32x32 block inside each module. But it
> would consume more memory and impact the system speed.

What is the speed requirement? Do you have to keep up with each frame, or are you post-processing a single frame? If you are planning to put this in an FPGA, a 1024-bit input bus is unrealistic.

> I am hoping it is possible to pass the 32x32 block through each
> module. I am really not sure whether I can do that, or how. I guess
> such a huge interface among those modules is also not worthwhile, even
> if it is possible.

Once you have shifted in the data block, processing 1024 bits in parallel is possible.

> Maybe I still have to go back to a 32-bit bus and reformat the 32x32
> block inside the modules. Is this the normal way to do it? Is there no
> way to work around this?

The limit is FPGA pins. They are three for a dollar.

-- Mike Treseler
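
To make "shift in the data block, then process in parallel" concrete, here is a minimal sketch. It is not code from any poster. It assumes the frame buffer delivers one 32-bit word (4 pixels, lowest byte first) per enabled clock, row by row; the names `loader_pkg`, `block_loader`, `word_en` and `blk_valid` are made up for illustration. Note that a 32x32 block of 8-bit pixels is 8192 register bits, so this only makes sense if the device has that many flip-flops to spare.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: type names are illustrative, not from the thread.
package loader_pkg is
  subtype pixel_t is std_logic_vector(7 downto 0);
  type row_t   is array (0 to 31) of pixel_t;
  type block_t is array (0 to 31) of row_t;   -- indexed as blk(row)(col)
end package loader_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.loader_pkg.all;

entity block_loader is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                      -- synchronous reset
    word_en   : in  std_logic;                      -- word_in is valid this cycle
    word_in   : in  std_logic_vector(31 downto 0);  -- 4 pixels per word (assumed)
    blk       : out block_t;                        -- whole block, visible in parallel
    blk_valid : out std_logic                       -- one-clock pulse after word 256
  );
end entity block_loader;

architecture rtl of block_loader is
  signal blk_reg : block_t;
  signal row     : integer range 0 to 31 := 0;
  signal grp     : integer range 0 to 7  := 0;      -- which group of 4 columns
begin
  blk <= blk_reg;

  process (clk)
  begin
    if rising_edge(clk) then
      blk_valid <= '0';
      if rst = '1' then
        row <= 0;
        grp <= 0;
      elsif word_en = '1' then
        -- Unpack the word into 4 neighbouring pixels of the current row.
        for i in 0 to 3 loop
          blk_reg(row)(4*grp + i) <= word_in(8*i + 7 downto 8*i);
        end loop;
        -- Advance the write position: 8 words per row, 32 rows per block.
        if grp = 7 then
          grp <= 0;
          if row = 31 then
            row       <= 0;
            blk_valid <= '1';
          else
            row <= row + 1;
          end if;
        else
          grp <= grp + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Loading takes 256 enabled clocks per block. Whether that latency is acceptable depends on the answer to the speed-requirement question above.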

#5

systolic wrote:
> Again, some questions here:
>
> Inside my top-level design, I have a 32x32 8-bit data block flowing
> through several modules, some modules are in sequence, some in parallel.
> Inside each module, I need to process the data block as a 2-D array,
> like 4x4 block-based operations, etc.
>
> How could I pass the 32x32 data block very efficiently among those
> modules in terms of system speed and logic element utilization?
>
> Will it be possible and efficient for me to have a 2-D array defined in
> the top-level design, and pass the 2-D array among those modules? If it
> is possible, how do I do it? And will it consume too many resources?
>
> Or do I need to have a small piece of memory or a register file using
> lpm_ram, then let each module access the memory through the bus? Then
> how will I process the data as a 2-D array inside each module? Do I need
> to buffer the data inside each module for array-wise operations? Then
> will it be slow and also consume extra resources?
>
> Maybe I am on the wrong track. I am not quite familiar with VHDL. Still
> kind of a C programmer.

I have read the replies to this post and I can see that you are still thinking in terms of C rather than hardware. VHDL stands for VHSIC Hardware Description Language. The key part is HARDWARE. VHDL is used for describing hardware, not algorithms. So instead of thinking of this as a program that will be turned into hardware by some magical process, think of it as a way to describe the hardware you want built. If you don't know how to design the hardware, it is unlikely that you will get hardware that will be at all efficient.

VHDL uses modules, also known as components. How you transfer the data between them does not appreciably matter, since the signals are just wires and require very little time to transfer a signal. Wires also don't use much in the way of resources. The only exception is when you are receiving data serially and you want to process data serially. Then there is no need to transfer your data in parallel.

So draw some block diagrams showing your processing and break it down to the level of registers. Label all the interfaces with the number of wires in each path. Then decide where you want the blocks grouped into modules and start "describing" your hardware. It will go a lot easier this way.

--

Rick "rickman" Collins

Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
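
On the "how do I pass a 2-D array between modules" part of the question: VHDL allows a user-defined array type, declared in a package, to be used directly as a port type. The sketch below shows that mechanism only. The names `blk_types`, `stage` and `stage_chain` are hypothetical, and the `not` operation is a placeholder for whatever each module really computes. `stage_chain` is meant as an internal level of the hierarchy, not the chip's pin-level entity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical shared package so every module agrees on the block type.
package blk_types is
  subtype pixel_t is std_logic_vector(7 downto 0);
  type row_t   is array (0 to 31) of pixel_t;
  type block_t is array (0 to 31) of row_t;   -- indexed as blk(row)(col)
end package blk_types;

library ieee;
use ieee.std_logic_1164.all;
use work.blk_types.all;

-- One processing module: takes a whole block, registers a whole block.
entity stage is
  port (
    clk     : in  std_logic;
    blk_in  : in  block_t;
    blk_out : out block_t
  );
end entity stage;

architecture rtl of stage is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      for r in 0 to 31 loop
        for c in 0 to 31 loop
          blk_out(r)(c) <= not blk_in(r)(c);  -- placeholder per-pixel operation
        end loop;
      end loop;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use work.blk_types.all;

-- Two stages in sequence; the block travels between them on a signal.
entity stage_chain is
  port (
    clk     : in  std_logic;
    blk_in  : in  block_t;
    blk_out : out block_t
  );
end entity stage_chain;

architecture structural of stage_chain is
  signal blk_mid : block_t;   -- 32 x 32 x 8 = 8192 wires between the stages
begin
  u_a : entity work.stage
    port map (clk => clk, blk_in => blk_in, blk_out => blk_mid);
  u_b : entity work.stage
    port map (clk => clk, blk_in => blk_mid, blk_out => blk_out);
end architecture structural;
```

The signal `blk_mid` costs only routing, but each `stage` here registers its output, which is 8192 flip-flops per stage. That register cost, not the port declaration, is what the block diagram exercise above is meant to expose.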

#6

Rickman, thx for your reply.

This design has been frustrating me for a while. I broke the entire design down into several modules and thought about the interfaces among the modules and between the compression FPGA unit and the frame-buffer unit.

So as you said, it is OK to have 1024 wires among modules inside one FPGA.

The way I need to manipulate the 32x32 pixel block is to perform some arithmetic operations on the whole block, then some other operations from 4x4 pixel blocks all the way up to the 32x32 pixel block, or from the 32x32 pixel block all the way down to 4x4 pixel blocks, in different modules. It is a kind of quadtree operation: splitting the 32x32 pixel block into 4 16x16 pixel blocks, the 4 16x16 into 16 8x8, and so on.

This way, I hope to have the 32x32 pixel block ready for each module when it needs it, and to take advantage of array index operations.

So my concerns are:
1. Whether I can pass a 32x32 pixel-block result among those modules all at once. (It looks like the answer is NO.)
2. If I cannot pass a 32x32 pixel block all at once, which is better: buffering the 32x32 pixel block inside each module, or having a register file at the top level that is updated after the operations in each module?
3. Or are there better ways? Or am I still on the wrong track?

OK, thanks a lot for your time and replies.

systolic
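
The quadtree split described in this post can be written with array indexing in VHDL. The following is a rough sketch under assumptions: the block is square, its side is even, and the index ranges are ascending. The names `quad_pkg`, `quadrant` and `quad_split` are invented for illustration. It uses an unconstrained 2-D array type so that the same function works for 32x32, 16x16 and 8x8 blocks.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: names are illustrative, not from the thread.
package quad_pkg is
  subtype pixel_t is std_logic_vector(7 downto 0);
  type block_t is array (natural range <>, natural range <>) of pixel_t;

  -- Returns one quadrant of a square block.
  -- qr selects the upper (0) or lower (1) half, qc the left (0) or right (1).
  function quadrant (blk : block_t; qr, qc : natural range 0 to 1) return block_t;
end package quad_pkg;

package body quad_pkg is
  function quadrant (blk : block_t; qr, qc : natural range 0 to 1) return block_t is
    constant half : natural := blk'length(1) / 2;
    variable q    : block_t(0 to half - 1, 0 to half - 1);
  begin
    for r in 0 to half - 1 loop
      for c in 0 to half - 1 loop
        q(r, c) := blk(blk'low(1) + qr*half + r, blk'low(2) + qc*half + c);
      end loop;
    end loop;
    return q;
  end function quadrant;
end package body quad_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.quad_pkg.all;

-- Splits one 32x32 block into its four 16x16 quadrants.
entity quad_split is
  port (
    blk32 : in  block_t(0 to 31, 0 to 31);
    q00   : out block_t(0 to 15, 0 to 15);  -- upper left
    q01   : out block_t(0 to 15, 0 to 15);  -- upper right
    q10   : out block_t(0 to 15, 0 to 15);  -- lower left
    q11   : out block_t(0 to 15, 0 to 15)   -- lower right
  );
end entity quad_split;

architecture rtl of quad_split is
begin
  q00 <= quadrant(blk32, 0, 0);
  q01 <= quadrant(blk32, 0, 1);
  q10 <= quadrant(blk32, 1, 0);
  q11 <= quadrant(blk32, 1, 1);
end architecture rtl;
```

With constant `qr` and `qc` the function only selects which wires go where, so the split itself should add no logic. Applying `quadrant` again to a 16x16 result gives the 8x8 level, and so on down to 4x4.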

#7

systolic wrote:
> Rickman, thx for your reply.
>
> This design has been frustrating me for a while. I broke the entire
> design down into several modules and thought about the interfaces among
> the modules and between the compression FPGA unit and the frame-buffer
> unit.
>
> So as you said, it is OK to have 1024 wires among modules inside one FPGA.
>
> The way I need to manipulate the 32x32 pixel block is to perform some
> arithmetic operations on the whole block, then some other operations
> from 4x4 pixel blocks all the way up to the 32x32 pixel block, or from
> the 32x32 pixel block all the way down to 4x4 pixel blocks, in different
> modules. It is a kind of quadtree operation: splitting the 32x32 pixel
> block into 4 16x16 pixel blocks, the 4 16x16 into 16 8x8, and so on.
>
> This way, I hope to have the 32x32 pixel block ready for each module
> when it needs it, and to take advantage of array index operations.
>
> So my concerns are:
> 1. Whether I can pass a 32x32 pixel-block result among those modules
> all at once. (It looks like the answer is NO.)
> 2. If I cannot pass a 32x32 pixel block all at once, which is better:
> buffering the 32x32 pixel block inside each module, or having a
> register file at the top level that is updated after the operations in
> each module?
> 3. Or are there better ways? Or am I still on the wrong track?

I didn't say that using a lot of wires is OK. Each wire needs a driver, so there is a cost in the hardware. But if the data is being produced in parallel and you already have the drivers, there is no need to reduce the size of the interface.

You seem to be focusing on how you will pass the data between blocks rather than how the blocks will work. If you are going to do all your math in parallel and *need* to have the data all at once, then you will need a wide interface. But if your data is being processed in chunks that are smaller than the entire array, then the chunk size would be the best interface size.

Think of hardware like an assembly line. If 12 items get stuffed into a box, they don't move 12 items along the assembly line in parallel. They get delivered one at a time so each one can then be put into the box. Or maybe three at a time can be put in the box, so they travel three wide. If it takes the same time to deliver three items, one at a time, as it does to put all three in the box, then they can still be delivered on a one-wide belt.

So do your modules need the data all at once? Or a few items at a time? Maybe you should leave the definition of the size of your interfaces until you know more about the design of the blocks?

--

Rick "rickman" Collins
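
As an illustration of the "one-wide belt" idea, here is a sketch of a module that never holds the whole block. It assumes, purely for the example, that the whole-block arithmetic is a sum of all 1024 pixels and that data arrives as 32-bit words of 4 pixels each; the entity name `block_sum` and its port names are hypothetical. The largest possible sum is 1024 x 255 = 261120, which fits in 18 bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical streaming module: the interface width matches the chunk
-- size (4 pixels), so no 32x32 buffer is needed inside the module.
entity block_sum is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                      -- synchronous reset
    word_en   : in  std_logic;                      -- word_in is valid this cycle
    word_in   : in  std_logic_vector(31 downto 0);  -- 4 pixels per word (assumed)
    sum       : out unsigned(17 downto 0);          -- sum of the last full block
    sum_valid : out std_logic                       -- one-clock pulse when sum updates
  );
end entity block_sum;

architecture rtl of block_sum is
  signal acc   : unsigned(17 downto 0) := (others => '0');
  signal count : integer range 0 to 255 := 0;       -- words seen in this block
begin
  process (clk)
    variable word_sum : unsigned(9 downto 0);       -- 4 x 255 = 1020 fits in 10 bits
  begin
    if rising_edge(clk) then
      sum_valid <= '0';
      if rst = '1' then
        acc   <= (others => '0');
        count <= 0;
      elsif word_en = '1' then
        -- Add up the 4 pixels delivered this cycle.
        word_sum := (others => '0');
        for i in 0 to 3 loop
          word_sum := word_sum + unsigned(word_in(8*i + 7 downto 8*i));
        end loop;
        if count = 255 then
          -- Last word of the block: publish the total and start over.
          sum       <= acc + word_sum;
          sum_valid <= '1';
          acc       <= (others => '0');
          count     <= 0;
        else
          acc   <= acc + word_sum;
          count <= count + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

This uses an 18-bit accumulator and an 8-bit counter instead of 8192 bits of block storage. Operations that really need random access to the whole block would not fit this pattern, which is why the question "do your modules need the data all at once?" has to be answered first.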