Describing pipelined hardware

 
 
Jonathan Bromley
      06-06-2006
Not a specific question, not a request for help, just an
invitation to share ideas about something that I've always
found tricky - and I suspect I'm not alone.

Using HDLs you can elegantly describe quite complicated logic in
a clocked process - we've had several discussions about that
here, and we know there are many popular styles.

Mostly, though, we need to describe things that are pipelined.
Sometimes that pipelining is from choice, sometimes it's
forced upon us by the behaviour of things outside our
control (such as pipelined synchronous RAMs in an FPGA).

As soon as you have a pipelined design, it's rather easy to
describe the behaviour of each pipeline stage as an HDL
clocked process (or, indeed, as part of a process that
describes multiple stages) but as soon as that happens
you tend to lose sight of the overall algorithm that's
being implemented. Sometimes the design nicely
suits a description in which each pipeline stage stands
alone, but if there is any feedback from later pipeline
stages to earlier ones then it's usually much harder
to see what's going on.

So, here's my question: When writing pipelined designs,
what do all you experts out there do to make the overall
data and control flow as clear and obvious as possible?

Thanks in advance
 
Mike Treseler
      06-06-2006
Jonathan Bromley wrote:

> Mostly, though, we need to describe things that are pipelined.
> Sometimes that pipelining is from choice, sometimes it's
> forced upon us by the behaviour of things outside our
> control (such as pipelined synchronous RAMs in an FPGA).


It can also be forced by the design requirements.
I can't shift in a serial packet
in one rx_clk for example.

It can also be forced by timing requirements.
If the system clock is 100 MHz,
that's 10 ns a tick, without exception.

There is top level pipelining
from module instances
and internal pipelining using
cases of variable/register values
inside the process/block.

For example, a serial interface
stats counter might have single
process/block instances like this:

-[serial/sync/hdlc]-[octet2packetbus]-[statsCounters]-[cpu bus]-

A synchronous process/block always provides
at least one level of pipeline on the output.

Internal state or counter variables/registers
can add more latency as needed (A) by design
or (B) to meet timing. With recent devices
I have found few requirements for type B
pipelining, but this is very dependent on
the design requirements.

For example, if I have access to serial
data and clock, a crc check is straightforward.
However if I must process a word per tick,
I have no choice but to use a FOR loop
to process multiple bits per clock.
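
The word-per-tick case can be sketched as below. This is a minimal illustration, not code from the thread: the entity name, the port names, the octet width and the CRC-8 polynomial x"07" are all assumptions. The FOR loop is unrolled by synthesis, so eight serial CRC steps collapse into one clock of combinational logic in front of a single register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: CRC-8 updated one octet per strobe.
entity crc8_bytewise is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;                     -- synchronous reset
    data_en : in  std_logic;                     -- single-cycle strobe
    data    : in  std_logic_vector(7 downto 0);  -- one octet per strobe
    crc     : out std_logic_vector(7 downto 0)
  );
end entity crc8_bytewise;

architecture rtl of crc8_bytewise is
  constant POLY : std_logic_vector(7 downto 0) := x"07";  -- assumed polynomial
begin
  process (clk)
    variable c  : std_logic_vector(7 downto 0) := (others => '0');
    variable fb : std_logic;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        c := (others => '0');
      elsif data_en = '1' then
        -- Same step as the bit-serial version, repeated for all 8 bits,
        -- most significant bit first.
        for i in data'range loop
          fb := c(7) xor data(i);
          c  := c(6 downto 0) & '0';
          if fb = '1' then
            c := c xor POLY;
          end if;
        end loop;
      end if;
      crc <= c;  -- the only register level: the CRC state itself
    end if;
  end process;
end architecture rtl;
```

With serial data and clock, the body of the loop would run once per rx_clk instead and no loop would be needed.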

> As soon as you have a pipelined design, it's rather easy to
> describe the behaviour of each pipeline stage as an HDL
> clocked process (or, indeed, as part of a process that
> describes multiple stages) but as soon as that happens
> you tend to lose sight of the overall algorithm that's
> being implemented. Sometimes the design nicely
> suits a description in which each pipeline stage stands
> alone, but if there is any feedback from later pipeline
> stages to earlier ones then it's usually much harder
> to see what's going on.


I keep any such feedback inside the same process/block
even if this means a variable/register array declaration.

> So, here's my question: When writing pipelined designs,
> what do all you experts out there do to make the overall
> data and control flow as clear and obvious as possible?


Good question.

The short answer is,
by using synchronous blocks and single
cycle control strobes at the module interfaces.
It's much simpler to design modules
to respond to a strobe (and maybe handshake it)
than it is to make some poor module
responsible for all cases of the full system timing.

The text books all say that
separating the data path is essential,
but I have never found any evidence
to support this assertion.
I like to let it all flow through
the same stream.


-- Mike Treseler
 
Kai Harrekilde-Petersen
      06-06-2006
Mike Treseler writes:

> The text books all say that
> separating the data path is essential,
> but I have never found any evidence
> to support this assertion.
> I like to let it all flow through
> the same stream.


I have found that separating the datapath can tremendously help DC to
optimize the logic on the datapath - especially if you need to do
several almost identical operations on the datapath, depending on the
state.

In these and similar other cases I have found that creating a set of
flags in the control path, and then using the flags in the datapath to
determine how to manipulate the data, yields superior synthesis
results.
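
A minimal sketch of that flag style, under stated assumptions: the entity, the state names and the three flags (acc_en, use_b, sub) are invented for illustration and say nothing about what DC actually does with them. The control process only produces flags; the datapath process only consumes them, so the three "almost identical" operations share one operand select and one add/subtract.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity flag_datapath is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;                 -- synchronous reset
    start  : in  std_logic;
    a, b   : in  unsigned(15 downto 0);
    result : out unsigned(15 downto 0)
  );
end entity flag_datapath;

architecture rtl of flag_datapath is
  type state_t is (IDLE, ADD_A, ADD_B, SUB_B);
  signal state  : state_t;
  signal acc_en : std_logic;  -- flag: update the accumulator
  signal use_b  : std_logic;  -- flag: operand is b rather than a
  signal sub    : std_logic;  -- flag: subtract rather than add
  signal acc    : unsigned(15 downto 0);
begin
  -- Control path: state machine that only produces (registered) flags.
  control : process (clk)
  begin
    if rising_edge(clk) then
      acc_en <= '0';
      use_b  <= '0';
      sub    <= '0';
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE =>
            if start = '1' then
              state <= ADD_A;
            end if;
          when ADD_A =>
            acc_en <= '1';
            state  <= ADD_B;
          when ADD_B =>
            acc_en <= '1';
            use_b  <= '1';
            state  <= SUB_B;
          when SUB_B =>
            acc_en <= '1';
            use_b  <= '1';
            sub    <= '1';
            state  <= IDLE;
        end case;
      end if;
    end if;
  end process control;

  -- Datapath: knows nothing about states, only about flags.
  -- The flags are registered, so it acts one clock after the state.
  datapath : process (clk)
    variable operand : unsigned(15 downto 0);
  begin
    if rising_edge(clk) then
      if use_b = '1' then
        operand := b;
      else
        operand := a;
      end if;
      if rst = '1' then
        acc <= (others => '0');
      elsif acc_en = '1' then
        if sub = '1' then
          acc <= acc - operand;
        else
          acc <= acc + operand;
        end if;
      end if;
    end if;
  end process datapath;

  result <= acc;
end architecture rtl;
```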


Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>
 
Andy
      06-06-2006
I think that may be more of a limitation of DC than anything else. At
least for FPGA synthesis, Synplicity does not seem to mind combining
control and dataflow logic. I quit using DC (or FC2) a long time ago
because Synplicity was soooo much better, both in VHDL language
support, and in QoR. Judging from their simulator, which I still have
to use from time to time, Synopsys still crashes on '93 standard
features that others gobble up with no problem, or at least they give
you an error report you can chase.

Andy


Andy
      06-06-2006
In clocked VHDL processes, every assignment from one _signal_ to
another is a clock cycle (a register or pipeline stage). This is
completely different from how software behaves.

Using variables instead of signals, you write the process the way you
would in software, and order references relative to assignments to
create clock delays (register/pipeline stages).

Some people like the descriptions using signals better, some like the
variable descriptions better. I like the flexibility of
moving/adding/deleting registers by moving variable assignments
relative to references in the process.
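
A minimal sketch of the variable style, assuming a hypothetical multiply-then-add (the entity, port and variable names are illustrative only). A variable that is referenced before it is assigned within the clocked process holds the value from the previous clock, so it becomes a register; a variable assigned before it is referenced is just a wire.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity var_pipe is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(7 downto 0);
    c    : in  unsigned(15 downto 0);
    y    : out unsigned(15 downto 0)
  );
end entity var_pipe;

architecture rtl of var_pipe is
begin
  process (clk)
    variable prod : unsigned(15 downto 0) := (others => '0');
    variable sum  : unsigned(15 downto 0) := (others => '0');
  begin
    if rising_edge(clk) then
      -- Written in reverse pipeline order: every variable is read
      -- before it is written, so each one is a pipeline register.
      y    <= sum;       -- stage 3: output register
      sum  := prod + c;  -- stage 2: uses prod from the previous clock
      prod := a * b;     -- stage 1: multiplier register
      -- Note: c is sampled one stage later than a and b.
    end if;
  end process;
end architecture rtl;
```

Moving the line `prod := a * b;` above `sum := prod + c;` removes the multiplier register without touching any declaration, which is the kind of move/add/delete flexibility described above.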

Another approach is to use pipelining and retiming features of your
synthesis tool. You may be able to describe the process all in one
cycle, and then delay the outputs by several clocks (through
registers), then let the synthesis tool redistribute registers
according to timing constraints. Synthesis tools have their
limitations here though... And of course, this has problems when
handling feedback.

Andy

 
Mike Treseler
      06-06-2006
Andy wrote:
> I think that may be more of a limitation of DC than anything else. At
> least for FPGA synthesis, Synplicity does not seem to mind combining
> control and dataflow logic.


I agree, and would add Quartus, ISE, Leonardo, Modelsim, and NC-Sim
to the list of tools proven useful for VHDL'93 designs.

If I had to use DC, I would code in Verilog instead of VHDL.

-- Mike Treseler
 
Aditya Ramachandran
      06-07-2006
For pipelined logic where it's not clear what each stage should do
exactly, I find it easier to code the logic first, add multiple
pipelined registers at the end of the logic and then synthesize using
balance_registers in Design-Compiler.

Ex: AND AND AND FLOP FLOP FLOP
becomes
AND FLOP AND FLOP AND FLOP
after synthesis using balance_registers
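
The "before" form can be sketched in VHDL as below. This is only an illustration: the entity name, the 32-bit width and the DEPTH constant are assumptions, and whether a given tool actually moves the registers depends on its settings and timing constraints.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity retime_me is
  port (
    clk        : in  std_logic;
    a, b, c, d : in  std_logic_vector(31 downto 0);
    y          : out std_logic_vector(31 downto 0)
  );
end entity retime_me;

architecture rtl of retime_me is
  constant DEPTH : positive := 3;  -- assumed number of trailing registers
  type pipe_t is array (1 to DEPTH) of std_logic_vector(31 downto 0);
  signal pipe : pipe_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- All the logic in one cycle: AND AND AND, into the first FLOP.
      pipe(1) <= a and b and c and d;
      -- The remaining FLOPs, left for the tool to redistribute.
      for i in 2 to DEPTH loop
        pipe(i) <= pipe(i - 1);
      end loop;
    end if;
  end process;

  y <= pipe(DEPTH);
end architecture rtl;
```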

Aditya

Ben Jones
      06-07-2006

"Jonathan Bromley" <> wrote in message
news:...

> So, here's my question: When writing pipelined designs,
> what do all you experts out there do to make the overall
> data and control flow as clear and obvious as possible?


Comments.

Lots and lots and lots of comments. Oh, and a diagram.

-Ben-


 
Marcus Harnisch
      06-07-2006
Jonathan,

I've successfully used register balancing in Synopsys DC since about
eight years ago. In order to notify the other end about when there's
work to be done, it is often a good idea to pass a synchronization
signal (e.g. data valid, deasserted reset) through the pipeline as
well.
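
A minimal sketch of carrying a data-valid flag alongside the data, assuming a hypothetical two-stage datapath (square, then add one); all names are illustrative. Only the valid bits are reset; the data registers are left without reset because their contents are ignored whenever the matching valid bit is low.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity valid_pipe is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;  -- synchronous reset
    in_valid  : in  std_logic;
    in_data   : in  unsigned(7 downto 0);
    out_valid : out std_logic;
    out_data  : out unsigned(15 downto 0)
  );
end entity valid_pipe;

architecture rtl of valid_pipe is
  signal sq, res : unsigned(15 downto 0);
  signal v1, v2  : std_logic;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Data path: two register stages.
      sq  <= in_data * in_data;
      res <= sq + 1;
      -- Valid flag: one register per data stage, so it arrives
      -- at the output on the same clock as the matching result.
      if rst = '1' then
        v1 <= '0';
        v2 <= '0';
      else
        v1 <= in_valid;
        v2 <= v1;
      end if;
    end if;
  end process;

  out_data  <= res;
  out_valid <= v2;
end architecture rtl;
```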

Don't forget your post-synthesis verification though (gate-level or
formal). We never completely trust the tools, right?

Regards,
Marcus
 
KJ
      06-07-2006
>
> Mostly, though, we need to describe things that are pipelined.
> Sometimes that pipelining is from choice,

Not sure I can think of any "from choice" examples except for places where...
- It doesn't matter if the signal is combinatorial or delayed by a clock
cycle.
- and the cleanest form for writing the logic (in VHDL) is using a
statement only available inside a process (i.e. a case or if)
- There would be more than a couple signals in the sensitivity list
In that situation I would choose a clocked process over a process with the
laundry list of signals in the sensitivity list of which I'll invariably
forget at least one.

> sometimes it's
> forced upon us by the behaviour of things outside our
> control (such as pipelined synchronous RAMs in an FPGA).
>

Dang those pesky constraints anyway.

> As soon as you have a pipelined design, it's rather easy to
> describe the behaviour of each pipeline stage as an HDL
> clocked process (or, indeed, as part of a process that
> describes multiple stages) but as soon as that happens
> you tend to lose sight of the overall algorithm that's
> being implemented.

That's the point where I would go back and rethink how I've partitioned the
design and ponder a bit on...
- Is the algorithm itself really what needs to be implemented or is there a
different algorithm that accomplishes the same/similar goals that might be
more amenable to implementation since I've wrapped myself around the axle
on this one. If not, then move on to the following point.
- Rethink the partitioning of the design. Sometimes my first guess at how
things should be partitioned turns out to be rather clumsy and now after
having "lost sight of the overall algorithm that's being implemented" is a
good time to go back and redraw the boundary lines.

As for the boundary lines themselves, I'm generally talking about at the
VHDL entity level. Any decently complex algorithm that needs to be
pipelined probably is composed of some form of cascaded blocks. Each
cascaded block will have a clear definition of what it is trying to
accomplish. This pretty much then defines what the I/O (in terms of
algorithm information flow) is. Based on that choose an appropriate set of
control/status signals to move that information in and out of the blocks.
For that, of late I've been using Altera's Avalon bus specification as a
model. I looked at OpenCores' Wishbone spec as well and wasn't terribly
impressed but Avalon seems to have an interface definition that scales
really well (like not just for the top level, but can go all the way down to
'simple blocks' without any appreciable 'overhead' in terms of wasted
logic). By that I mean that not only can I use it for the top level of the
algorithm implementation's I/O but it can also be used for interconnecting
those cascaded blocks. Not sales pitching Altera, I'm sure Xilinx, Actel et
al all probably have some equivalent as well but over the last 5 years I've
pretty much been all Altera. The SOPC Builder tool sucks and I no longer
use it for real design, but the Avalon specification itself is good.

In any case, I've found that having 'some' block I/O interface signal
specification instead of your own "well thought out, but still kinda in your
head but it works for me and it's so clear that I'm sure you'll get it too"
version is a key to not getting lost in your pipelining (second only to
having the individual sub-blocks implementing the correct
functionality...i.e. drawing the right boundaries in the first place).
Since these are 'sub-blocks' I'll tend to generalize the data signals to fit
the true need. For example, Avalon data are all std_logic_vectors but I'll
change that to be a VHDL record so that the interface between blocks is of
the appropriate type for that interface. At the top level of the algorithm
implementation you're generally constrained in what you can use but the
internal block to block interfaces generally don't have that constraint.
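
A minimal sketch of a record-typed block-to-block interface. Everything here is hypothetical: the package, the pixel payload, the "brighten" block and the valid/ready signal names are invented for illustration and are not the Avalon signal set, only a handshake in the same spirit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package pixel_stream_pkg is
  -- Typed payload instead of one wide std_logic_vector.
  type pixel_t is record
    x, y  : unsigned(10 downto 0);
    level : unsigned(7 downto 0);
  end record;

  -- Source-to-sink signals for one link between cascaded blocks.
  type stream_t is record
    valid : std_logic;
    data  : pixel_t;
  end record;
end package pixel_stream_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.pixel_stream_pkg.all;

entity brighten is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;   -- synchronous reset
    src     : in  stream_t;    -- from the upstream block
    src_rdy : out std_logic;   -- flow control back to upstream
    dst     : out stream_t;    -- to the downstream block
    dst_rdy : in  std_logic    -- flow control from downstream
  );
end entity brighten;

architecture rtl of brighten is
  signal hold : stream_t;      -- the one pipeline register of this block
begin
  -- Accept new data when the register is empty or is being drained.
  src_rdy <= dst_rdy or not hold.valid;
  dst     <= hold;

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        hold.valid <= '0';
      elsif dst_rdy = '1' or hold.valid = '0' then
        hold            <= src;                  -- x, y and valid pass through
        hold.data.level <= src.data.level + 16;  -- wraps on overflow
      end if;
    end if;
  end process;
end architecture rtl;
```

Because every cascaded block uses the same stream_t and the same ready signal, the "slow down" style of feedback is handled by the interface rather than by block-specific logic.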

Once inside a particular block, if I'm finding myself "losing sight of the
overall algorithm within the local space" I'll generally follow the same
steps and re-factor. Maybe that means that this particular block should be
decomposed into a parent/child structure or maybe it needs to be split into
two cascaded 'siblings'.

> Sometimes the design nicely
> suits a description in which each pipeline stage stands
> alone, but if there is any feedback from later pipeline
> stages to earlier ones then it's usually much harder
> to see what's going on.
>

'Most' of the time in the past, I've found that this feedback is usually
something of the form 'slow down I can't take the data so quickly' or 'OK,
I'm ready to accept data'. That feedback needs to get from the data
consumer back to whatever it is that is ultimately sourcing the data. This
particular type of feedback though is exactly the data flow control that
specifications like Avalon are designed to handle so if you've designed each
sub block to that interface then the flow control type of feedback will take
care of itself. I'm pondering what other types of feedback there might be
to feed from a later to an earlier stage, but I guess it's too early in the
morning.

> So, here's my question: When writing pipelined designs,
> what do all you experts out there do to make the overall
> data and control flow as clear and obvious as possible?
>

1. Partition entities into clearly describable functions and don't be afraid
to go back and re-partition them into different clearly describable
functions if you get wrapped around the axle.
2. Choose an I/O interface model specification (Avalon, Wishbone, etc.) and
use it not just for the top block but for sub-blocks as well. Since you'd
like to use this I/O model all the way from the top to bottom in your design
don't pick something that carries a lot of baggage with it that causes you
to abandon it. An outlandish example would be choosing PCI as your model.
While great for connecting 'big' things, you probably wouldn't want to
outfit each entity with a PCI interface. Look for something that scales
well DOWNWARD (i.e. not logic wasteful), so you're not forced to abandon it
because of the overhead.
3. Re-factor an entity into a parent/child or sibling/sibling pair of
entities when you find yourself getting 'lost'.

> Thanks in advance

Thanks for the soapbox

Kevin Jennings


 