Help on 2-D array vs. register file

 
 
systolic
      10-23-2004
Another question:

Inside my top-level design, I have a 32x32 block of 8-bit data flowing
through several modules; some modules are in sequence, some in parallel.
Inside each module, I need to process the data block as a 2-D array,
for example with 4x4 block-based operations.

How can I pass the 32x32 data block efficiently among those modules,
in terms of system speed and logic element utilization?

Is it possible and efficient to define a 2-D array in the top-level
design and pass it among those modules? If so, how do I do it, and
will it consume too many resources?

Or do I need a small memory or register file built with lpm_ram, and
let each module access the memory over a bus? Then how do I process
the data as a 2-D array inside each module? Do I need to buffer the
data inside each module for array-wise operations? Would that be slow
and consume extra resources?

Maybe I am on the wrong track. I am not very familiar with VHDL; I am
still more of a C programmer.

Please help me with this. Thanks a lot.

 
Mike Treseler
      10-23-2004
systolic wrote:

> Inside my top-level design, I have a 32x32 block of 8-bit data flowing
> through several modules; some modules are in sequence, some in parallel.
> Inside each module, I need to process the data block as a 2-D array,
> for example with 4x4 block-based operations.


Write your top level entity before you start slicing.
I expect that there are no 1024 bit interfaces at the top.
Maybe a dot clock and video data in and out?
Next, work out the top architecture signals.
Do you need to count out rows and columns?
Are you processing everything live?
Line buffers? Frame buffers?

-- Mike Treseler
 
systolic
      10-23-2004


Mike, thank you for the reply.

Yes, I assume there is a frame buffer, which feeds data into my
top-level design over a 32-bit interface (4 pixels at a time).
Then I need to perform 32x32 block-based operations inside the
top-level design across several modules. In total, I have 4 modules in
three levels. The last one needs to perform block-based operations from
the 32x32 block all the way down to 4x4 blocks.

I think I could pass everything among those modules on a 32-bit bus,
then reassemble the data into a 32x32 block inside each module. But that
would consume more memory and hurt the system speed.

I was hoping to be able to pass the whole 32x32 block through each
module. I am really not sure whether I can do that, or how. I guess
such a huge interface among those modules may not be worth it even if
it is possible.

I would appreciate some suggestions or hints.

Maybe I still have to go back to a 32-bit bus and reassemble the 32x32
block inside the modules. Is this the normal way to do it? Is there no
way around it?

 
Mike Treseler
      10-23-2004
systolic wrote:

> Mike, thank you for the reply.
>
> Yes, I assume there is a frame buffer, which feeds data into my
> top-level design over a 32-bit interface (4 pixels at a time).


Consider verifying this before you proceed.

> Then I need to perform 32x32 block-based operations inside the
> top-level design across several modules. In total, I have 4 modules in
> three levels. The last one needs to perform block-based operations from
> the 32x32 block all the way down to 4x4 blocks.


Are those bit blocks or pixel blocks?

> I think I could pass everything among those modules on a 32-bit bus,
> then reassemble the data into a 32x32 block inside each module. But that
> would consume more memory and hurt the system speed.


What is the speed requirement?
Do you have to keep up with each frame,
or are you post-processing a single frame?
If you are planning to put this in an FPGA,
a 1024-bit input bus is unrealistic.

> I was hoping to be able to pass the whole 32x32 block through each
> module. I am really not sure whether I can do that, or how. I guess
> such a huge interface among those modules may not be worth it even if
> it is possible.


Once you have shifted in the data block, processing 1024 bits in
parallel is possible.
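
A rough sketch of the "shift in, then process in parallel" idea, under assumptions that are not stated in the thread: 8-bit pixels (so the block is 32 x 32 x 8 = 8192 bits, loaded in 256 words of 32 bits), the first pixel in bits 7..0 of each word, and made-up names (block_pkg, block_loader, wr_en, full). Written this way the block lives in flip-flops, which costs a lot of logic elements; it is meant to show the structure, not to recommend it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; the names are illustrative only.
package block_pkg is
  constant N : positive := 32;                     -- block is N x N pixels
  subtype pixel_t is std_logic_vector(7 downto 0);
  type block_t is array (0 to N-1, 0 to N-1) of pixel_t;
end package block_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.block_pkg.all;

entity block_loader is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;                       -- synchronous reset
    wr_en   : in  std_logic;                       -- word_in is valid this clock
    word_in : in  std_logic_vector(31 downto 0);   -- 4 pixels per word
    blk     : out block_t;                         -- whole block, in parallel
    full    : out std_logic                        -- high once 256 words are in
  );
end entity block_loader;

architecture rtl of block_loader is
  signal store : block_t;
  signal row   : integer range 0 to N-1 := 0;
  signal col   : integer range 0 to N-4 := 0;      -- column of first pixel in word
  signal done  : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        row  <= 0;
        col  <= 0;
        done <= '0';
      elsif wr_en = '1' and done = '0' then
        -- Assumed pixel order: pixel i of the word sits in bits 8*i+7 .. 8*i.
        for i in 0 to 3 loop
          store(row, col + i) <= word_in(8*i + 7 downto 8*i);
        end loop;
        if col = N-4 then
          col <= 0;
          if row = N-1 then
            done <= '1';                           -- last word of the block
          else
            row <= row + 1;
          end if;
        else
          col <= col + 4;
        end if;
      end if;
    end if;
  end process;

  blk  <= store;
  full <= done;
end architecture rtl;
```

Once full goes high, a downstream module can read any element of blk in the same clock, at the price of roughly 8192 registers plus the write decoding.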

> Maybe I still have to go back to a 32-bit bus and reassemble the 32x32
> block inside the modules. Is this the normal way to do it? Is there no
> way around it?


The limit is FPGA pins. They are three for a dollar.

-- Mike Treseler
 
rickman
      10-24-2004
I have read the replies to this post and I can see that you are still
thinking in terms of C rather than hardware. VHDL stands for VHSIC
Hardware Description Language. The key part is HARDWARE. VHDL is used
for describing hardware, not algorithms. So instead of thinking of this
as a program that will be turned into hardware by some magical process,
think of it as a way to describe the hardware you want built. If you
don't know how to design the hardware, it is unlikely that you will get
hardware that will be at all efficient.

VHDL uses modules also known as components. How you transfer the data
between them does not appreciably matter since the signals are just
wires and require very little time to transfer a signal. Wires also
don't use much in the way of resources. The only exception is when you
are receiving data serially and you want to process data serially. Then
there is no need to transfer your data in parallel.
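
To make the "signals are just wires" point concrete, here is a minimal sketch of passing a 2-D array between two modules inside one design. Everything named here (pix_types, stage, two_stage_pipeline, mid_blk) is hypothetical, the inversion is only a placeholder operation, and 8-bit pixels are assumed. Multi-dimensional array ports are legal VHDL, but support in synthesis tools varies, so check your tool before relying on them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- The shared type must live in a package so that every entity can see it.
package pix_types is
  subtype pixel_t is std_logic_vector(7 downto 0);
  type block_t is array (0 to 31, 0 to 31) of pixel_t;
end package pix_types;

library ieee;
use ieee.std_logic_1164.all;
use work.pix_types.all;

entity stage is
  port (
    clk     : in  std_logic;
    blk_in  : in  block_t;
    blk_out : out block_t
  );
end entity stage;

architecture rtl of stage is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      for r in blk_in'range(1) loop
        for c in blk_in'range(2) loop
          blk_out(r, c) <= not blk_in(r, c);   -- placeholder operation
        end loop;
      end loop;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use work.pix_types.all;

-- An internal module, not the chip-level top: these ports are not pins.
entity two_stage_pipeline is
  port (
    clk     : in  std_logic;
    blk_in  : in  block_t;
    blk_out : out block_t
  );
end entity two_stage_pipeline;

architecture structural of two_stage_pipeline is
  signal mid_blk : block_t;                    -- the "wires" between the stages
begin
  u0 : entity work.stage
    port map (clk => clk, blk_in => blk_in, blk_out => mid_blk);
  u1 : entity work.stage
    port map (clk => clk, blk_in => mid_blk, blk_out => blk_out);
end architecture structural;
```

The signal mid_blk itself is only routing; the cost is in the registers each stage uses to drive its output (8192 flip-flops per stage in this sketch).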

So draw some block diagrams showing your processing and break it down to
the level of registers. Label all the interfaces with the number of
wires in each path. Then decide where you want the blocks grouped into
modules and start "describing" your hardware. It will go a lot easier
this way.


--

Rick "rickman" Collins

(E-Mail Removed)
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
systolic
      10-24-2004
Rickman, thanks for your reply.

This design has been frustrating me for a while. I broke the entire
design down into several modules and thought about the interfaces among
the modules and between the compression FPGA unit and the frame-buffer
unit.

So as you said, it is OK to have 1024 wires among modules inside one FPGA.

The way I need to manipulate the 32x32 pixel block is to perform some
arithmetic operations on the whole block, then some other operations
from 4x4 pixel blocks all the way up to the 32x32 pixel block, or from
the 32x32 pixel block all the way down to 4x4 pixel blocks, in different
modules. It is a kind of quadtree operation: splitting the 32x32 pixel
block into 4 16x16 pixel blocks, the 4 16x16 blocks into 16 8x8 blocks,
and so on.

In this way, I hope to have the 32x32 pixel-block ready for each module
when they need it and take advantage of the array index operations.
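
The quadtree split described above maps onto plain array indexing: at a level with sub-block size s, the sub-block in quadtree position (i, j) starts at pixel (i*s, j*s), which gives 4 blocks of 16x16, 16 of 8x8 and 64 of 4x4. A possible sketch follows, with a hypothetical package quadtree_pkg and function sub_block_sum, and with a sum standing in for whatever per-block arithmetic is really needed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package quadtree_pkg is
  subtype pixel_t is unsigned(7 downto 0);
  type block_t is array (0 to 31, 0 to 31) of pixel_t;

  -- Sum of the size x size sub-block whose top-left pixel is (row0, col0).
  -- 18 bits are enough for the result: 32*32*255 = 261120 < 2**18.
  function sub_block_sum (blk  : block_t;
                          row0 : natural;
                          col0 : natural;
                          size : positive) return unsigned;
end package quadtree_pkg;

package body quadtree_pkg is
  function sub_block_sum (blk  : block_t;
                          row0 : natural;
                          col0 : natural;
                          size : positive) return unsigned is
    variable acc : unsigned(17 downto 0) := (others => '0');
  begin
    for r in row0 to row0 + size - 1 loop
      for c in col0 to col0 + size - 1 loop
        acc := acc + blk(r, c);
      end loop;
    end loop;
    return acc;
  end function sub_block_sum;
end package body quadtree_pkg;
```

For example, sub_block_sum(blk, 16, 0, 16) covers the lower-left 16x16 quadrant. For synthesis the loop bounds have to be static, so row0, col0 and size should be constants or generate-loop indices at each call site, and every call becomes its own adder tree.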

So my questions are:
1. Can I pass a 32x32 pixel block result among those modules all at
once? (It looks like the answer is no.)
2. If I cannot pass the 32x32 pixel block all at once, which is better:
buffering the 32x32 pixel block inside each module, or having a
register file at the top level that is updated after the operations in
each module?
3. Or are there other, better ways? Or am I still on the wrong track?


OK, thanks a lot for your time and replies.



 
rickman
      10-24-2004
systolic wrote:

> So as you said, it is OK to have 1024 wires among modules inside one FPGA.

I didn't say that using a lot of wires is ok. Each wire needs a driver,
so there is cost in the hardware. But if the data is being produced in
parallel and you already have the drivers, there is no need to reduce
the size of the interface.

You seem to be focusing on how you will pass the data between blocks
rather than how the blocks will work. If you are going to do all your
math in parallel and *need* to have the data all at once, then you will
need a wide interface. But if your data is being processed in chunks
that are less than the size of the entire array, then the chunk size
would be the best interface size.

Think of hardware like an assembly line. If 12 items get stuffed into a
box, they don't move 12 items along the assembly line in parallel. They
get delivered one at a time so each one can then be put into the box.
Or maybe three at a time can be put in the box, so they travel three
wide, maybe. If it takes the same time to deliver three items, one at a
time, as it does to put all three in the box, then they can still be
delivered on a one wide belt.

So do your modules need the data all at once? Or a few items at a
time? Maybe you should leave the definition of the size of your
interfaces until you know more about the design of the blocks?
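
As an illustration of the "one wide belt" case, here is a sketch of a module that never sees the whole block at once. It assumes the 32-bit, 4-pixels-per-clock feed mentioned earlier in the thread, 8-bit pixels, and a block sum as a stand-in operation; the entity and port names (block_sum_stream, in_valid, sum_valid) are invented for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity block_sum_stream is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                     -- synchronous reset
    in_valid  : in  std_logic;                     -- word_in is valid this clock
    word_in   : in  std_logic_vector(31 downto 0); -- 4 pixels, 8 bits each
    sum_out   : out unsigned(17 downto 0);         -- sum over one 32x32 block
    sum_valid : out std_logic                      -- one-clock pulse per block
  );
end entity block_sum_stream;

architecture rtl of block_sum_stream is
  constant WORDS_PER_BLOCK : positive := 256;      -- 32*32 pixels / 4 per word
  signal acc   : unsigned(17 downto 0) := (others => '0');
  signal count : integer range 0 to WORDS_PER_BLOCK-1 := 0;
begin
  process (clk)
    variable word_sum : unsigned(17 downto 0);
  begin
    if rising_edge(clk) then
      sum_valid <= '0';
      if rst = '1' then
        acc   <= (others => '0');
        count <= 0;
      elsif in_valid = '1' then
        -- Add up the 4 pixels that arrived in this word.
        word_sum := (others => '0');
        for i in 0 to 3 loop
          word_sum := word_sum + unsigned(word_in(8*i + 7 downto 8*i));
        end loop;
        if count = WORDS_PER_BLOCK-1 then
          sum_out   <= acc + word_sum;             -- block complete
          sum_valid <= '1';
          acc       <= (others => '0');
          count     <= 0;
        else
          acc   <= acc + word_sum;
          count <= count + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The interface stays 32 bits wide and the module needs one 18-bit accumulator instead of a block buffer. Whether this works for a real design depends on whether each operation can consume its data in arrival order.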

--

Rick "rickman" Collins

 