Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > VHDL > improving code

Reply
Thread Tools

improving code

 
 
mysticlol
Guest
Posts: n/a
 
      09-15-2006
Hi,

I am having trouble with this code. It is reducing the design max clock
frequency considerably. How can I improve this?

architecture rtl of clock_gen is
constant DEFAULT_TIME : std_logic_vector := "1111011010";
signal ontime : std_logic_vector( 10 downto 0);
begin

process(clock, reset)
begin
if reset='1' then
ontime <= DEFAULT_TIME;
elsif rising_edge(clock) then
if go='1' then
ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
end if;
end if;
end process;
..
..
..
end architecture rtl;

 
Reply With Quote
 
 
 
 
David Ashley
Guest
Posts: n/a
 
      09-15-2006
mysticlol wrote:
> Hi,
>
> I am having trouble with this code. It is reducing the design max clock
> frequency considerably. How can I improve this?
>
> architecture rtl of clock_gen is
> constant DEFAULT_TIME : std_logic_vector := "1111011010";
> signal ontime : std_logic_vector( 10 downto 0);
> begin
>
> process(clock, reset)
> begin
> if reset='1' then
> ontime <= DEFAULT_TIME;
> elsif rising_edge(clock) then
> if go='1' then
> ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
> end if;
> end if;
> end process;
> .
> .
> .
> end architecture rtl;
>


Probably everyone will say go with a synchronous clock approach. Also
you can compute the next data word in combinatorial logic:

architecture rtl of clock_gen is
constant DEFAULT_TIME : std_logic_vector := "1111011010";
signal ontime : std_logic_vector( 10 downto 0);
signal next_dataWord : std_logic_vector (10 downto 0);

begin
next_dataWord <= dataWord(11 downto 1) - '1';

process(clock)
begin
if rising_edge(clock) then
if reset='1' then
ontime <= DEFAULT_TIME;
elsif go='1' then
ontime <= next_dataWord;
end if;
end if;
end process;

end architecture rtl;


--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
 
Reply With Quote
 
 
 
 
mysticlol
Guest
Posts: n/a
 
      09-15-2006
Hi David,

I already tried that. It is not making any difference..

Actually I have a 100MHz clock. dataWord is user input.
100 MHz/dataWord(11 downto 0) = required Clock Period.

I am taking dataWord(11 downto 1) - so that I can get ontime which is
half of this dataWord. As I am counting from 0, I am substracting 1
from this ontime.

regards,
mysticlol


David Ashley wrote:
> mysticlol wrote:
> > Hi,
> >
> > I am having trouble with this code. It is reducing the design max clock
> > frequency considerably. How can I improve this?
> >
> > architecture rtl of clock_gen is
> > constant DEFAULT_TIME : std_logic_vector := "1111011010";
> > signal ontime : std_logic_vector( 10 downto 0);
> > begin
> >
> > process(clock, reset)
> > begin
> > if reset='1' then
> > ontime <= DEFAULT_TIME;
> > elsif rising_edge(clock) then
> > if go='1' then
> > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
> > end if;
> > end if;
> > end process;
> > .
> > .
> > .
> > end architecture rtl;
> >

>
> Probably everyone will say go with a synchronous clock approach. Also
> you can compute the next data word in combinatorial logic:
>
> architecture rtl of clock_gen is
> constant DEFAULT_TIME : std_logic_vector := "1111011010";
> signal ontime : std_logic_vector( 10 downto 0);
> signal next_dataWord : std_logic_vector (10 downto 0);
>
> begin
> next_dataWord <= dataWord(11 downto 1) - '1';
>
> process(clock)
> begin
> if rising_edge(clock) then
> if reset='1' then
> ontime <= DEFAULT_TIME;
> elsif go='1' then
> ontime <= next_dataWord;
> end if;
> end if;
> end process;
>
> end architecture rtl;
>
>
> --
> David Ashley http://www.xdr.com/dash
> Embedded linux, device drivers, system architecture


 
Reply With Quote
 
David Ashley
Guest
Posts: n/a
 
      09-15-2006
mysticlol wrote:
> Hi David,
>
> I already tried that. It is not making any difference..
>
> Actually I have a 100MHz clock. dataWord is user input.
> 100 MHz/dataWord(11 downto 0) = required Clock Period.
>
> I am taking dataWord(11 downto 1) - so that I can get ontime which is
> half of this dataWord. As I am counting from 0, I am substracting 1
> from this ontime.
>
> regards,
> mysticlol


If you take out the subtracting 1 and cause the user input to
just be one less than it otherwise would be, does it meet timing?

How about an intermediate register that already has the -1 applied --
so ontime gets reloaded from that?

Maybe add 1 to the DEFAULT_TIME value and change the down
counter to stop at one more than it currently stops at? Then you
don't need the minus 1.

-Dave




--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
 
Reply With Quote
 
KJ
Guest
Posts: n/a
 
      09-15-2006
mysticlol wrote:
> Hi,
>
> I am having trouble with this code. It is reducing the design max clock
> frequency considerably. How can I improve this?
>
> architecture rtl of clock_gen is
> constant DEFAULT_TIME : std_logic_vector := "1111011010";
> signal ontime : std_logic_vector( 10 downto 0);
> begin
>
> process(clock, reset)
> begin
> if reset='1' then
> ontime <= DEFAULT_TIME;
> elsif rising_edge(clock) then
> if go='1' then
> ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
> end if;
> end if;
> end process;
> .
> .
> .
> end architecture rtl;


First take a look at the critical timing path that is being reported by
the fitter and try to understand what is going on in that path.
Understanding what is going on in that path is the key to improving
performance.

The synthesis/fitter tools are all pretty smart and have probably
already performed the optomization that Dave suggested about moving the
calculation of 'ontime' outside the process. That's why when you tried
it, it didn't help...it probably synthesized to the exact same thing.
For that reason, you probably shouldn't write your code in that manner,
but keep it inside the logic block where it belongs. Sometimes there
are exceptions to this general guideline but in this case there isn't.

I'm guessing that the worst case path is the setup time from the input
'dataword' to the flip flops that compute 'ontime'. Since setup times
that start from external pins are almost always longer than ones that
start from an internal flip flop the way to improve clock cycle
performance is to simply register 'dataword' right at the input and
then feed the 'dataword_delayed' into your computation of 'ontime'.
The disadvantage is that you have added one clock cycle latency but in
this case it looks like you can probably change your code to compensate
for that.

KJ

 
Reply With Quote
 
Eric
Guest
Posts: n/a
 
      09-15-2006
Strange. Your critical path is only the carry propagation along the
substraction and your target clock frequency is easy to reach for the
current ASIC/FPGA technology (100MHz according to your reply?). Could
you give some more infos about

- Which technology do you target?
- Which frequency would you like to reach?
- What is the dataWord input delay wrt clock?

If your technology really prevent you to compute the 11-bit
substraction using a basic ripple carry substractor, then you'll have
to use some carry anticipation technics (such as carry-save adders and
so on). Try google and "arithmetic operator implementation".

Eric

 
Reply With Quote
 
David Ashley
Guest
Posts: n/a
 
      09-15-2006
KJ wrote:
> The synthesis/fitter tools are all pretty smart and have probably
> already performed the optomization that Dave suggested about moving the
> calculation of 'ontime' outside the process. That's why when you tried
> it, it didn't help...it probably synthesized to the exact same thing.
> For that reason, you probably shouldn't write your code in that manner,
> but keep it inside the logic block where it belongs. Sometimes there
> are exceptions to this general guideline but in this case there isn't.


KJ,

Can you be specific about:
> For that reason, you probably shouldn't write your code in that manner,
> but keep it inside the logic block where it belongs.


I want to make sure I'm following. Is it a question of

#1:

process(clk)
begin
if clk'event and clk = '1' then
state <= next_state;
end if;
end process;

process(state, ...)
begin
case state is
when state1 => next_state <= state2;blah;
when state2 => next_state <= state3;blah;
when others => next_state <= state1;blah;
end case;
end process;

VS
#2:

process(clk)
begin
if clk'event and clk = '1' then
case state is
when state1 => state <= state2;blah;
when state2 => state <= state3;blah;
when state3 => state <= state1;blah;
end case;
end if;
end process;

I just moved a block of code that had been doing #2 to
#1, but are you saying #2 is the way to go? I'm trying
to develop the right habits...

-Dave

-
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
 
Reply With Quote
 
KJ
Guest
Posts: n/a
 
      09-15-2006
David Ashley wrote:
> KJ wrote:
> > The synthesis/fitter tools are all pretty smart and have probably
> > already performed the optomization that Dave suggested about moving the
> > calculation of 'ontime' outside the process. That's why when you tried
> > it, it didn't help...it probably synthesized to the exact same thing.
> > For that reason, you probably shouldn't write your code in that manner,
> > but keep it inside the logic block where it belongs. Sometimes there
> > are exceptions to this general guideline but in this case there isn't.

>
> KJ,
>
> Can you be specific about:
> > For that reason, you probably shouldn't write your code in that manner,
> > but keep it inside the logic block where it belongs.

>
> I want to make sure I'm following. Is it a question of
>
> #1:
>
> process(clk)
> begin
> if clk'event and clk = '1' then
> state <= next_state;
> end if;
> end process;
>
> process(state, ...)
> begin
> case state is
> when state1 => next_state <= state2;blah;
> when state2 => next_state <= state3;blah;
> when others => next_state <= state1;blah;
> end case;
> end process;
>
> VS
> #2:
>
> process(clk)
> begin
> if clk'event and clk = '1' then
> case state is
> when state1 => state <= state2;blah;
> when state2 => state <= state3;blah;
> when state3 => state <= state1;blah;
> end case;
> end if;
> end process;
>
> I just moved a block of code that had been doing #2 to
> #1, but are you saying #2 is the way to go? I'm trying
> to develop the right habits...
>


This point almost always ignites a ton of responses from people with
some taking one side, others taking the other....so we best be careful
here lest wind get out of this :-0

The example you called #1 would be considered by many to be an example
of 'two process' coding style where you have one clocked process and
another combinatorial process and where some or all of the outputs of
the combinatorial process feeds into the clocked process. What you
called #2 is an example of what many would call 'single process' coding
style where there is simply one clocked process. The point that people
argue about is whether or not the 'one process' approach is better than
the 'two process'...and like I said, if word of this conversation gets
around there will no doubt be a number of people on both sides of the
fence stating their opinion....so I'll go first

- The 'one process' coding style will always take fewer lines of code.
Even your simple example takes 8 lines for the two process version
versus 7 for the one process (not counting the begin/end
process...which really should be counted also but just trying to strip
down to the absolute indisputable basics here. Typically what you'll
find is that the two process style bloats it up a bit more....the
number of lines of code difference between the two typically grows a
bit for each additional output signal of the process that is added.
- The 'two process' coding style is prone to two types of errors
* Incomplete sensitivity list (you had "process(state, ...)" ).
Maintenance of that "..." is not trivial in a real design, will
simulate just fine with no errors or warnings but synthesize to
something functionally different. There is no shortcut that I know of
to aid in that maintenance other than to run the code through a
synthesis engine and see where it warns you about an incomplete
sensitivity list and then fix it...and re-simulate....wasted time if
you ask me.
* Default values for signals inside the combinatorial process. Your
example covers every possible path assigning the one signal that is
inside the process but let's say that process assigned to 4 different
signals. You'd need to make sure that every possible path always got
assigned regardless of the logic. Typically what is done is right at
the begining of the combinatorial process you would have 'default'
assignments to those 4 signals and then head down into the meat of what
the process is attempting to do. If every possible path does not
assign a value then you've created a combinatorial latch...definite
'no,no' in FPGA/CPLDs. The method of always having a 'default'
assignment right at the begining of the process is at least a way to
insure that you cover this....and is also a contributor to why I
mentioned that the difference between the number of lines of code
written using the two process approach versus using the one process
approach grows with the number of signals being assigned within that
process. Using the one process coding style these additional
statements just aren't necessary.
- Whether one uses the 'one process' coding style or the 'two process'
coding style you will get the exact same synthesis results.
- More lines of code translates into more possible bugs.

Those are pretty much the indisputable facts about 'one process' versus
'two process' coding style that even the two process people will
begrudingly acknowledge. Pick up a textbook and you'll often see the
two process being waxed poetically and how it has some esoteric
advantage but when you cut through the cr-p and dig down to how it is
in any way better you'll come up empty. A lot of times the style that
people seem to use has more to do with
- How they were first taught it.
- How their first 'mentor' taught them.
- How the company they work for demands it.

>From that they now develop a personal preference and tend to stick with

it.

So take a look at the above mentioned disadvantages and I'll bet that
they are probably fairly closely aligned to how your personal
productivity will be measured in your job and decide for yourself.
Better yet, take some stripped down but somewhat realistic code and
implement it both ways and see firsthand for yourself.

As for the specific code of the original post, I simply think it
belongs inside the clocked process since ultimately this is a clocked
output and having two lines of code that are physically separated in
the file to implement what can be done with one is not a method I would
encourage.

The optomization that you mentioned as a suggestion would have been a
good one if the synthesis tool hadn't (most likely) already implemented
it. Based on the original poster's comment that he had tried your
suggestion and got the same performance would suggest that the
synthesis process had indeed sniffed this one out. It can be
enlightening sometimes to take a look at the synthesized output code
and see how it has implemented functionally the same thing but in a
much better (space, performance) way than an inspection of your source
code would seem to suggest that it was intended to be implemented.

KJ

 
Reply With Quote
 
David Ashley
Guest
Posts: n/a
 
      09-16-2006
KJ wrote:
>...
> - Whether one uses the 'one process' coding style or the 'two process'
> coding style you will get the exact same synthesis results.
> - More lines of code translates into more possible bugs.
>
> Those are pretty much the indisputable facts about 'one process' versus
> 'two process' coding style that even the two process people will
> begrudingly acknowledge. Pick up a textbook and you'll often see the
> two process being waxed poetically and how it has some esoteric
> advantage but when you cut through the cr-p and dig down to how it is
> in any way better you'll come up empty. A lot of times the style that
> people seem to use has more to do with
> - How they were first taught it.
> - How their first 'mentor' taught them.
> - How the company they work for demands it.
>...
>
> KJ


KJ,

I think I'm convinced. I had a module that was using the single clocked
process style, and I converted it to the 2 process style. So for every
clocked signal I had to declare a next_<whatever> signal. During
simulation I wasted time because my non-clocked process didn't have
all the sensitivities listed. Then during synthesis, I got bit because
I missed one or to next_<whatever> signals that didn't have a default
setting (as in starting out with next_cmd <= cmd So in this one
exercise I experienced pretty much everything you listed. Especially:
works fine in simulation, fails on actual hardware -- what a headache
to track down.

Now for the 2 coding style I had been attracted because it eliminates
one level of if -- the non-clocked region doesn't need to be inside
a "if clk'event and clk = '1' then". This makes for less indented code.

Another possible benefit to clocked/non-clocked processes: the non-clocked
can be divided up into different processes which can be kept small.
However that craps out when you want 2 processes to define the same
signal depending on state -- which means your state dependent code
has to be in one process anyway.

Trying both on a real world, but non-complex circuit, was a very good
exercise. Actually the design was for a simple memory test unit using
the open cores ddr controller -- it fills memory up with a non-repetitive
pattern, then reads it out and verifies it. If all is ok an LED stays lit.

All in all I think I lean towards the single clocked process. Thanks for
taking the time to elaborate all that!

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
 
Reply With Quote
 
mysticlol
Guest
Posts: n/a
 
      09-18-2006
Hi KJ,

The critical path is between 1. ontime FF to Count FF (6.36ns) and 2.
dataword_2 to ontime_8 (5.08 ns)

Actually it is meeting 100 MHz clock requirement. But in overall
project I need this module should reach 200 MHz. Now it is giving upto
150 MHz.

I know this might be annoying like asking questions without full code.
So let me give code....


library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.NUMERIC_STD.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity clock_gen is
generic (default_div : std_logic_vector(10 downto 0) :=
"00000110001");
Port (
clk100 : in std_logic;
reset : in std_logic;
enable : in std_logic;
go : in std_logic;
dataWord : in std_logic_vector(11 downto 0);
clock_out : out std_logic;
done : out std_logic
);
end entity clock_gen;

architecture Behavioral of clock_gen is
signal count : std_logic_vector(10 downto 0);
signal ontime : std_logic_vector(10 downto 0);
signal clkout : std_logic;
signal go_en : std_logic;
begin

go_en <= go and enable;

process(clk100, reset)
begin
if reset='1' then
ontime <= default_div;
elsif rising_edge(clk100) then
if go_en='1' then
ontime <= dataWord(11 downto 1)-'1';
end if;
end if;
end process;

process(clk100, reset)
begin
if reset='1' then
done <= '1';
elsif rising_edge(clk100) then
if go_en='1' then
done <= '0';
else
done <= '1';
end if;
end if;
end process;

process(clk100, reset)
begin
if reset='1' then
count <= (others => '0');
elsif rising_edge(clk100) then
if go_en='1' or count=ontime then
count <= (others => '0');
else
count <= count + '1';
end if;
end if;
end process;

process(clk100, reset)
begin
if reset='1' then
clkout <= '0';
elsif rising_edge(clk100) then
if count=ontime then
clkout <= not clkout;
end if;
end if;
end process;

clock_out <= clkout;

end architecture Behavioral;


KJ, I am speachless .. what a guess.
You are an inspiration to programmers like me..


Regards,
Krishna




KJ wrote:
> mysticlol wrote:
> > Hi,
> >
> > I am having trouble with this code. It is reducing the design max clock
> > frequency considerably. How can I improve this?
> >
> > architecture rtl of clock_gen is
> > constant DEFAULT_TIME : std_logic_vector := "1111011010";
> > signal ontime : std_logic_vector( 10 downto 0);
> > begin
> >
> > process(clock, reset)
> > begin
> > if reset='1' then
> > ontime <= DEFAULT_TIME;
> > elsif rising_edge(clock) then
> > if go='1' then
> > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
> > end if;
> > end if;
> > end process;
> > .
> > .
> > .
> > end architecture rtl;

>
> First take a look at the critical timing path that is being reported by
> the fitter and try to understand what is going on in that path.
> Understanding what is going on in that path is the key to improving
> performance.
>
> The synthesis/fitter tools are all pretty smart and have probably
> already performed the optomization that Dave suggested about moving the
> calculation of 'ontime' outside the process. That's why when you tried
> it, it didn't help...it probably synthesized to the exact same thing.
> For that reason, you probably shouldn't write your code in that manner,
> but keep it inside the logic block where it belongs. Sometimes there
> are exceptions to this general guideline but in this case there isn't.
>
> I'm guessing that the worst case path is the setup time from the input
> 'dataword' to the flip flops that compute 'ontime'. Since setup times
> that start from external pins are almost always longer than ones that
> start from an internal flip flop the way to improve clock cycle
> performance is to simply register 'dataword' right at the input and
> then feed the 'dataword_delayed' into your computation of 'ontime'.
> The disadvantage is that you have added one clock cycle latency but in
> this case it looks like you can probably change your code to compensate
> for that.
>
> KJ


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Personal archive tool, looking for suggestions on improving the code mo reina Python 5 07-27-2010 03:47 PM
Looking for suggestions on improving numpy code David Lees Python 2 02-25-2008 05:11 PM
Improving performance of code ruds Java 13 04-08-2007 06:05 PM
Suggestions for improving code Aditya Mahajan Ruby 2 03-14-2006 07:00 AM
Improving code... Joe Van Dyk Ruby 9 11-17-2005 11:51 PM



Advertisments