chibios:articles:debugging7steps [2011/12/22 14:46] (current)
giovanni created
====== The Seven Steps of Successful Debugging ======

Developing software is hard; developing embedded software with real-time requirements is much harder. When you find yourself facing "The Problem", it is better to proceed in a rational way.

===== The Seven Steps =====

The following steps should be followed when facing non-obvious anomalies:

=== 1) Defect Detection: understanding that a problem is present ===

No amount of testing can prevent this: problems happen. First, we need to realize that there is a problem at all. This is often not simple, because the nastiest problems are those that do not manifest themselves in an obvious way, for example increased response-time jitter, randomly missed deadlines or mysterious random lockups.
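
Symptoms like jitter and missed deadlines usually become visible only once the code is instrumented. A minimal sketch, assuming a free-running 32-bit tick counter whose values the caller captures at the start and at the end of each handled event; the `rt_monitor_t` type and the `rtm_*` functions are hypothetical, not a ChibiOS API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical response-time monitor for one periodic activity:
   tracks best/worst observed latency and counts deadline misses. */
typedef struct {
  uint32_t deadline;   /* allowed response time, in timer ticks */
  uint32_t min;        /* best observed response time */
  uint32_t max;        /* worst observed response time */
  uint32_t samples;    /* number of measurements taken */
  uint32_t misses;     /* measurements that exceeded the deadline */
} rt_monitor_t;

void rtm_init(rt_monitor_t *m, uint32_t deadline) {
  m->deadline = deadline;
  m->min = UINT32_MAX;
  m->max = 0;
  m->samples = 0;
  m->misses = 0;
}

/* 'start' is the tick count when the event occurred, 'end' when its
   handling completed. Unsigned subtraction stays correct across a
   single wrap-around of the counter. */
void rtm_record(rt_monitor_t *m, uint32_t start, uint32_t end) {
  uint32_t rt = end - start;
  if (rt < m->min) m->min = rt;
  if (rt > m->max) m->max = rt;
  if (rt > m->deadline) m->misses++;
  m->samples++;
}

/* Jitter expressed as the spread between worst and best response time. */
uint32_t rtm_jitter(const rt_monitor_t *m) {
  return m->samples ? m->max - m->min : 0;
}
```

On a target, `start` might be captured in the ISR and `end` in the thread that completes the work; periodically inspecting `max` and `misses` from a debugger turns a silent problem into a visible one.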

=== 2) Defect Reproduction: being able to reproduce the problem ===

If a problem cannot be reliably reproduced, it is hard to proceed. Luckily, if a problem does not depend on external events (communication, sampling, reading of I/O lines, external interrupts), it will happen reliably, because an MCU system is deterministic in nature. The worst case is a problem that is truly random in nature.
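
When the external events are part of the problem, one common way to regain reproducibility on a host or test bench is to generate them from a seeded pseudo-random generator, so that a failing run can be replayed just by reusing its seed. A minimal sketch; xorshift32 is only one convenient generator, and `make_stimulus` and the 10..999 us range are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* xorshift32 (Marsaglia): tiny and fully deterministic, the same seed
   always produces the same sequence. State must never be zero. */
static uint32_t prng_state;

void prng_seed(uint32_t seed) {
  prng_state = seed ? seed : 1u;   /* zero would lock xorshift at zero */
}

uint32_t prng_next(void) {
  uint32_t x = prng_state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  prng_state = x;
  return x;
}

/* Hypothetical stimulus generator: delays between simulated external
   interrupts, in microseconds. Logging the seed of a failing run is
   enough to replay exactly the same sequence later. */
void make_stimulus(uint32_t seed, uint32_t *delays, int n) {
  prng_seed(seed);
  for (int i = 0; i < n; i++) {
    delays[i] = 10u + prng_next() % 990u;   /* 10..999 us */
  }
}
```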

=== 3) Defect Reduction: creating a minimal test case that exhibits the problem ===

The first step toward a solution is to exclude all the code that is not necessary for the bug to happen: a small test case is easier to handle than a complex application. For random problems, the best approach is to design and implement a stress test for the involved software components, so that the problem is triggered more easily.
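
One possible stress-test pattern is to hammer a single component with a long pseudo-random sequence of operations and compare every result against a trivially correct reference model. A minimal host-side sketch; the ring buffer stands in for whatever component is suspected, and all names and sizes are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define RB_SIZE 8u   /* deliberately small so wrap-around happens often */

/* Component under test: a hypothetical byte ring buffer. */
typedef struct {
  uint8_t buf[RB_SIZE];
  unsigned head, tail, count;
} ringbuf_t;

static void rb_init(ringbuf_t *rb) { rb->head = rb->tail = rb->count = 0; }

static int rb_put(ringbuf_t *rb, uint8_t b) {
  if (rb->count == RB_SIZE) return -1;             /* full */
  rb->buf[rb->head] = b;
  rb->head = (rb->head + 1u) % RB_SIZE;
  rb->count++;
  return 0;
}

static int rb_get(ringbuf_t *rb, uint8_t *b) {
  if (rb->count == 0) return -1;                   /* empty */
  *b = rb->buf[rb->tail];
  rb->tail = (rb->tail + 1u) % RB_SIZE;
  rb->count--;
  return 0;
}

/* Drives the buffer with pseudo-random puts/gets and checks each result
   against a reference model (a large linear queue that never wraps).
   Returns the number of mismatches; 0 means nothing was observed. */
static unsigned stress_ringbuf(uint32_t seed, unsigned iterations) {
  static uint8_t model[1u << 16];
  unsigned m_head = 0, m_tail = 0, errors = 0;
  uint32_t x = seed ? seed : 1u;                   /* xorshift32 state */
  uint8_t next_val = 0;
  ringbuf_t rb;

  assert(iterations <= sizeof model);              /* model must not overflow */
  rb_init(&rb);
  for (unsigned i = 0; i < iterations; i++) {
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    if (x & 1u) {                                  /* put */
      int model_full = (m_head - m_tail) == RB_SIZE;
      int r = rb_put(&rb, next_val);
      if (model_full != (r != 0)) errors++;
      if (!model_full) model[m_head++] = next_val;
      next_val++;
    } else {                                       /* get */
      uint8_t b;
      int model_empty = m_head == m_tail;
      int r = rb_get(&rb, &b);
      if (model_empty != (r != 0)) errors++;
      else if (!model_empty && b != model[m_tail]) errors++;
      if (!model_empty) m_tail++;
    }
  }
  return errors;
}
```

On real hardware the same idea is usually combined with interrupts or other threads exercising the component concurrently; this host version only covers the sequential logic.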

=== 4) Defect Triggers: isolating the triggering condition ===

Find what triggers the bug, for example an IRQ preempting another, lower-priority IRQ. Understanding when the problem happens is a necessary step toward the solution.
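
A cheap way to isolate a trigger is to log interesting events with timestamps into a trace buffer and search for the suspect pattern afterwards, for instance one IRQ entering while a lower-priority one is still active. A minimal sketch; the event format, buffer size and function names are assumptions, not a ChibiOS facility:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lightweight trace: ISRs log entry/exit events into a
   circular buffer that is dumped and analyzed after the failure. */
#define TRACE_LEN 64u

typedef enum { EV_ENTER, EV_EXIT } ev_type_t;

typedef struct {
  uint8_t irq;    /* interrupt source identifier */
  uint8_t type;   /* EV_ENTER or EV_EXIT */
  uint32_t ts;    /* timestamp in ticks */
} trace_ev_t;

static trace_ev_t trace[TRACE_LEN];
static unsigned trace_n;   /* total events logged, may exceed TRACE_LEN */

/* On a target this must run with interrupts masked (or use an atomic
   index) so that a nested ISR cannot claim the same slot. */
void trace_log(uint8_t irq, ev_type_t type, uint32_t ts) {
  trace_ev_t *e = &trace[trace_n % TRACE_LEN];
  e->irq = irq;
  e->type = (uint8_t)type;
  e->ts = ts;
  trace_n++;
}

/* Post-mortem analysis: counts how many times 'hi' entered while 'lo'
   was still active, i.e. how often 'hi' preempted 'lo'. Only events
   still held in the circular buffer are examined. */
unsigned count_preemptions(uint8_t lo, uint8_t hi) {
  unsigned first = trace_n > TRACE_LEN ? trace_n - TRACE_LEN : 0;
  int lo_active = 0;
  unsigned hits = 0;
  for (unsigned i = first; i < trace_n; i++) {
    const trace_ev_t *e = &trace[i % TRACE_LEN];
    if (e->irq == lo) lo_active = (e->type == EV_ENTER);
    else if (e->irq == hi && e->type == EV_ENTER && lo_active) hits++;
  }
  return hits;
}
```

Correlating the timestamps of the preemptions with the moment the failure is observed is often what turns "it crashes sometimes" into "it crashes when IRQ 2 arrives during IRQ 1".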

=== 5) Defect Understanding: understanding the root cause(s) of the problem ===

Before trying to fix the problem, it is necessary to understand exactly what is happening. Without a complete understanding, there is the risk of masking the effects of the problem rather than properly fixing it. This is very dangerous, because a masked problem //will// return and bite you.
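
A small, hypothetical illustration of masking versus fixing: a variable stored next to a buffer is found corrupted, and enlarging the buffer makes the symptom disappear, yet the real cause is an off-by-one in the length check. The record layout, names and sizes below are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 8u   /* hypothetical field size */

/* Hypothetical record: 'flags' sits right after 'name', so an overrun
   of 'name' shows up as corrupted flags, far away from its cause. */
typedef struct {
  char name[NAME_LEN];
  unsigned char flags;
} record_t;

/* Original, buggy version (kept as a comment for reference):
 *
 *   if (len > NAME_LEN) return -1;
 *   memcpy(r->name, src, len);
 *   r->name[len] = '\0';   // writes past 'name' when len == NAME_LEN
 *
 * Growing 'name' would hide the symptom but keep the off-by-one:
 * a masked defect. The root-cause fix reserves room for the '\0'. */
int record_set_name(record_t *r, const char *src, size_t len) {
  if (len >= NAME_LEN) return -1;   /* need len + 1 bytes */
  memcpy(r->name, src, len);
  r->name[len] = '\0';
  return 0;
}
```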

=== 6) Defect Analysis: understanding the impact on the system ===

The fix for a defect can be a simple one-line change, or it can impact the whole system design. The latter case is, of course, more problematic, because the defect then has its root in the system design itself.

=== 7) Defect Elimination: fixing it ===

Once the defect has been found, reproduced, understood and analyzed, it is possible to properly implement what is, by now, the obvious fix.
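
As a concrete, hypothetical case: suppose analysis has shown that the root cause is a non-atomic read of a 64-bit tick count kept as two 32-bit halves and advanced by a timer ISR, so a thread preempted between the two reads can get a value off by 2^32. The fix is then short. A sketch assuming a 32-bit MCU on which each half is read atomically; the variable names and the host-side `preempt_hook` are inventions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit tick count split in two halves, advanced by an ISR. */
static volatile uint32_t ticks_hi, ticks_lo;

static void timer_isr(void) {
  if (++ticks_lo == 0u) ticks_hi++;   /* carry into the high half */
}

/* Host-test hook standing in for an interrupt arriving between the two
   reads; it is always null on the target. */
static void (*preempt_hook)(void);

/* Fix: re-read the high half and retry if the ISR changed it meanwhile,
   so the returned hi/lo pair always belongs to the same instant. */
uint64_t ticks_get(void) {
  uint32_t hi, lo;
  do {
    hi = ticks_hi;
    if (preempt_hook) {                 /* simulated preemption, fires once */
      void (*h)(void) = preempt_hook;
      preempt_hook = 0;
      h();
    }
    lo = ticks_lo;
  } while (hi != ticks_hi);
  return ((uint64_t)hi << 32) | lo;
}
```

A short critical section around the two reads (for example `chSysLock()`/`chSysUnlock()` in ChibiOS) would be an equally valid fix; the retry loop simply avoids adding interrupt latency.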

===== Fixing is not sufficient: there is more =====

Now that the problem has been fixed, there are several other things that should be considered before the incident can be closed:
  - Can similar problems occur in other parts of the system?
  - Has the problem been properly documented and tracked, so that it will not happen again in the future?
  - Are there techniques or procedures that could have prevented the defect from occurring in the first place?
  - What did I learn from the problem? Can I generalize what I learned?
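
One common answer to the tracking and prevention questions is to turn the reduced test case into a permanent regression test that runs with every build. A minimal sketch; the function, the 12-bit/3300 mV figures and the defect itself are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversion of a 12-bit ADC reading to millivolts with a
   3300 mV reference. The defect once found here: on a target with a
   16-bit int, raw * 3300u overflowed for raw > 19. Widening the
   multiplication to 32 bits is the fix. */
uint16_t adc_to_mv(uint16_t raw) {
  return (uint16_t)(((uint32_t)raw * 3300u) / 4095u);
}

/* Regression test pinning the inputs around the old failure point, so
   the same mistake cannot silently come back. Returns failed checks. */
unsigned test_adc_to_mv_regression(void) {
  static const struct { uint16_t raw, mv; } cases[] = {
    { 0u,    0u    },
    { 19u,   15u   },   /* last input that worked before the fix */
    { 20u,   16u   },   /* first input that overflowed */
    { 2048u, 1650u },
    { 4095u, 3300u },   /* full scale */
  };
  unsigned failures = 0;
  for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++) {
    if (adc_to_mv(cases[i].raw) != cases[i].mv) failures++;
  }
  return failures;
}
```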
  
 
chibios/articles/debugging7steps.txt · Last modified: 2011/12/22 14:46 by giovanni
 
Except where otherwise noted, content on this wiki is licensed under the following license:GNU Free Documentation License 1.3