Chasing Bugs in the Fog

 
 
rickman
      06-18-2013
I have a bug in a test fixture that is FPGA based. I had thought it was
in the software which controls it, but after many hours of chasing it
around I've concluded it must be in the FPGA code.

I didn't think it was in the VHDL because it had been simulated well and
the nature of the bug is an occasional dropped character on the receive
side. Who can't design a UART? Well, it could be in the handshake with
the state machine, but still...

So I finally got around to adding some debug signals which I would
monitor on an analyzer and guess what, the bug is gone! I *hate* when
that happens. I can change the code so the debug signals only appear
when a control register is set to enable them, but still, I don't like
this. I want to know what is causing this DURN THING!

Anyone see this happen to them before?

Oh yeah, someone in another thread (that I can't find, likely because I
don't recall the group I posted it in) suggested I add synchronizing FFs
to the serial data in. Sure enough I had forgotten to do that. Maybe
that was the fix... of course! It wasn't metastability, I bet it was
feeding multiple bits of the state machine! Durn, I never make that
sort of error. Thanks to whoever it was that suggested the obvious that
I had forgotten.

--

Rick
 
Rob Gaddi
      06-18-2013
On Mon, 17 Jun 2013 20:00:01 -0400
rickman <(E-Mail Removed)> wrote:

> So I finally got around to adding some debug signals which I would
> monitor on an analyzer and guess what, the bug is gone! I *hate* when
> that happens. I can change the code so the debug signals only appear
> when a control register is set to enable them, but still, I don't like
> this. I want to know what is causing this DURN THING!
>
> Anyone see this happen to them before?
>
> Oh yeah, someone in another thread (that I can't find, likely because I
> don't recall the group I posted it in) suggested I add synchronizing FFs
> to the serial data in. Sure enough I had forgotten to do that. Maybe
> that was the fix... of course! It wasn't metastability, I bet it was
> feeding multiple bits of the state machine! Durn, I never make that
> sort of error. Thanks to whoever it was that suggested the obvious that
> I had forgotten.
>
> --
>
> Rick


Not metastability, a race condition. Asynchronous external input
headed to multiple clocked elements, each of which it reaches via a
different path with a different delay.

When you added debugging signals you changed the netlist, which changed
the place and route, making unpredictable changes to those delays. In
this case, it happened to push it into a place where _as far as you
tested_, it seems happy. But it's still unsafe, because as you change
other parts of the design, the P&R of that section will still change
anyhow, and you start getting my favorite situation, the problem that
comes and goes based on entirely unrelated factors.

The fix you made is the right one. When you resynchronized the input on
the same clock that runs the rest of the logic, only one flip-flop sees
the asynchronous signal, and every path from that flip-flop onward is
timing constrained. The P&R then has to make each of those routes meet
setup within the clock period, so their individual delays no longer
matter and your problem goes away for good.
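
A minimal sketch of the kind of synchronizer being discussed, not the code from the actual design. The entity name `rx_synchronizer` and the signals `clk`, `rx_async`, `rx_synced`, `meta_ff` and `sync_ff` are all hypothetical. It assumes a single receive clock and an idle-high serial line. The first flip-flop alone is enough to remove the race between multiple destinations; the second stage is the usual extra margin against metastability.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchronizer for an asynchronous serial input.
entity rx_synchronizer is
  port (
    clk       : in  std_logic;  -- clock used by the UART receive logic
    rx_async  : in  std_logic;  -- raw serial input from the pin
    rx_synced : out std_logic   -- registered copy, safe to fan out
  );
end entity rx_synchronizer;

architecture rtl of rx_synchronizer is
  -- Assumption: the line idles high, so both stages start at '1'.
  signal meta_ff : std_logic := '1';
  signal sync_ff : std_logic := '1';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta_ff <= rx_async;  -- the only flip-flop that sees the async input
      sync_ff <= meta_ff;   -- second stage gives the first time to settle
    end if;
  end process;

  -- Everything downstream (state machine, counters) reads this signal only.
  rx_synced <= sync_ff;
end architecture rtl;
```

The point of the structure is that `rx_async` has exactly one destination, so there is no second path whose routing delay could disagree with the first.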

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
 
rickman
      06-18-2013
On 6/17/2013 8:14 PM, Rob Gaddi wrote:
> On Mon, 17 Jun 2013 20:00:01 -0400
> rickman<(E-Mail Removed)> wrote:
>
>> So I finally got around to adding some debug signals which I would
>> monitor on an analyzer and guess what, the bug is gone! I *hate* when
>> that happens. I can change the code so the debug signals only appear
>> when a control register is set to enable them, but still, I don't like
>> this. I want to know what is causing this DURN THING!
>>
>> Anyone see this happen to them before?
>>
>> Oh yeah, someone in another thread (that I can't find, likely because I
>> don't recall the group I posted it in) suggested I add synchronizing FFs
>> to the serial data in. Sure enough I had forgotten to do that. Maybe
>> that was the fix... of course! It wasn't metastability, I bet it was
>> feeding multiple bits of the state machine! Durn, I never make that
>> sort of error. Thanks to whoever it was that suggested the obvious that
>> I had forgotten.
>>
>> --
>>
>> Rick

>
> Not metastability, a race condition. Asynchronous external input
> headed to multiple clocked elements, each of which it reaches via a
> different path with a different delay.
>
> When you added debugging signals you changed the netlist, which changed
> the place and route, making unpredictable changes to those delays.


No, when changing the debug output I added the synchronization FFs which
fixed the problem.

My point was that when the other poster suggested that I needed to sync
to the clock, I mistook that for a metastability concern, forgetting that
the input went to multiple sections of logic. So actually I made the same
mistake twice... lol


> In
> this case, it happened to push it into a place where _as far as you
> tested_, it seems happy. But it's still unsafe, because as you change
> other parts of the design, the P&R of that section will still change
> anyhow, and you start getting my favorite situation, the problem that
> comes and goes based on entirely unrelated factors.
>
> The fix you made is the right one. When you resynchronized the input on
> the same clock that runs the rest of the logic, only one flip-flop sees
> the asynchronous signal, and every path from that flip-flop onward is
> timing constrained. The P&R then has to make each of those routes meet
> setup within the clock period, so their individual delays no longer
> matter and your problem goes away for good.


Just to make sure of what was what (it has been two years since I last
worked with this design), I pulled the FFs out and added back just one.
Sure enough, the bug reappears with no FFs but goes away with just one.
The added debug info let me see the error exactly: when a start bit
comes in, there is a chance that the two counters are not set properly.
The error then shows up in the center of the bit, where the current
contents of the shift register are moved into the holding register as a
new char.
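
To illustrate the failure mode described above, here is a rough sketch of a start-bit detector in which two counters are loaded on the same condition. It is not the actual design: the entity name, the 16x oversampling, the 10-bit frame and all signal names are assumptions made for the example. With `rx_synced` coming from a flip-flop in the `clk` domain, both counters always see the same value on a given edge. Driving the same logic straight from the pin would let one counter see the start bit a clock before the other.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical receive front end: start detect plus two counters.
entity rx_start_detect is
  port (
    clk        : in  std_logic;
    rx_synced  : in  std_logic;             -- must already be registered on clk
    sample_cnt : out unsigned(3 downto 0);  -- position within a bit (assumed 16x)
    bit_cnt    : out unsigned(3 downto 0)   -- position within the frame
  );
end entity rx_start_detect;

architecture rtl of rx_start_detect is
  signal busy         : std_logic := '0';
  signal sample_cnt_r : unsigned(3 downto 0) := (others => '0');
  signal bit_cnt_r    : unsigned(3 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if busy = '0' then
        -- Both counters are cleared by the same registered condition,
        -- so they cannot disagree about when the start bit arrived.
        if rx_synced = '0' then
          busy         <= '1';
          sample_cnt_r <= (others => '0');
          bit_cnt_r    <= (others => '0');
        end if;
      else
        sample_cnt_r <= sample_cnt_r + 1;  -- wraps from 15 back to 0
        if sample_cnt_r = 15 then
          if bit_cnt_r = 9 then            -- assumed frame: start + 8 data + stop
            busy <= '0';
          else
            bit_cnt_r <= bit_cnt_r + 1;
          end if;
        end if;
      end if;
    end if;
  end process;

  sample_cnt <= sample_cnt_r;
  bit_cnt    <= bit_cnt_r;
end architecture rtl;
```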

I guess what most likely happened is that when I wrote the UART code I
assumed the sync FFs would be external and when I wrote the wrapper code
I assumed the FFs were inside the UART. In other words, I didn't have a
proper spec and never gave this problem proper consideration.

I will revisit this design and look at the other inputs. No reason to
assume I didn't make the same mistake elsewhere.

--

Rick
 
 
Nicolas Matringe
      06-18-2013
On 18/06/2013 23:45, rickman wrote:

> I guess what most likely happened is that when I wrote the UART code I
> assumed the sync FFs would be external and when I wrote the wrapper code
> I assumed the FFs were inside the UART. In other words, I didn't have a
> proper spec and never gave this problem proper consideration.


Several years ago a young engineer reused my long-proven UART code and
modified it, carelessly removing the synchronizing FF. He came to see me
and complained that my UART didn't work: it hung after some
unpredictable time.
I thought for a few minutes, guessed he had probably removed the FF, and
fixed his problem right away.

Nicolas
 