The typical organizational approach to software bugs baffles me completely. When I first went to work at HP Labs I used the one system for categorizing and fixing bugs that seemed to work. For some reason, the Lab became afflicted with the desire to subscribe to popular industry fashions ("methodologies"), and abandoned it.
I took the old system to my new job, and then to another job after that, and during the last fifteen years I have revised it on my own while still keeping the essential framework intact. I have continued to use it.
The goals of any useful bug system are clear cut:
Most systems (including mine) incorporate some version of the terms priority and severity. Priority generally answers questions of urgency, whereas severity is often taken to mean how much work is required to fix the problem.
Most other systems provide a middle ground in the form of medium priority and medium severity. In most applications of the systems, a liberal dollop of subjectivity and political counsel propels a plurality (if not an outright majority) of the bugs into the pile that is medium on both axes: medium severity and medium priority.
The result of these failures of will is little more than admitting that the system is inadequate.
There are only two levels of urgency. No matter the problem, no matter the product, it either can or cannot be shipped with the problem unresolved, the bug un-repaired. In my system, the priority axis is just a checkbox, a bit, a yes-or-no:
Reductionism? Surely. But it is also the truth. There is no need for a priority assignment with any finer gradation, and there is no benefit in maintaining the fiction that one is in use, the way that polling uses "somewhat agree," "agree," and "strongly agree" to draw the person being interviewed into one camp or the other.
If you ask a programmer to estimate the severity of a bug, you are likely to get, instead, the subjective answer to another question: "How hard do you think this bug is to fix?" While that answer is important to know, it is not the answer to the objective question that needs to be asked: "What damage to the product is done by this bug?"
It is equally important to be able to make as much use as possible of the scoring system. The expression priority 1 may make conceptual sense because it aligns with the idiom top priority, but giving low numerical values to the most severe bugs hinders subsequent statistical analysis.
Instead, my system assigns low numbers to things that are not terribly severe, and higher numbers to more severe bugs.
How many levels do we need? How high can the numbers go? The choice is somewhat arbitrary, or perhaps it is subjective, to use the earlier vocabulary. However, objectivity primarily consists of taking a confine-and-define approach to the ever-present subjectivity.
Here are nine levels, adapted somewhat from the HP system. They have served me well.
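The two axes can be captured in a very small record. The following is a minimal sketch in Python, not the HP tooling: `Bug`, `blocks_release`, and `severity` are hypothetical names chosen for illustration, and the nine levels are represented only by their numbers.

```python
from dataclasses import dataclass

MIN_SEVERITY = 1  # least damage to the product
MAX_SEVERITY = 9  # most damage to the product


@dataclass(frozen=True)
class Bug:
    """One bug report with the two axes: a yes-or-no priority and a 1..9 severity."""

    title: str
    severity: int         # higher number means more damage
    blocks_release: bool  # the priority "bit": can the product ship with this bug?

    def __post_init__(self):
        # Priority is strictly a yes-or-no answer, never a scale.
        if not isinstance(self.blocks_release, bool):
            raise TypeError("blocks_release must be True or False")
        # Severity must fall on one of the nine levels.
        if not MIN_SEVERITY <= self.severity <= MAX_SEVERITY:
            raise ValueError("severity must be between 1 and 9")


# Invented example: a low-severity bug can still block a release.
typo = Bug(title="Misspelled word on splash screen", severity=1, blocks_release=True)
```

Keeping the priority as a boolean rather than a number leaves no room for a "medium" value to creep in.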
In practice, high-severity bugs are res ipsa loquitur show-stoppers. It is hard to imagine how one would stay in business selling a product filled with Severity 9 bugs. Just ask a Corvair owner. But it is also possible to have Severity 1 bugs kill a product, and in this case we turn to the Edsel. For example, an inadvertent misspelled word might be a profanity, in which case it would most definitely be a show-stopper.
A full discussion is beyond the scope of a web article, but there are a couple of general principles that may be applied. For example, corner cases really cannot exist until testing has advanced to the point where the product has been examined well enough to decide whether an event is difficult to reproduce, and documentation errors are uncommon until the documentation has been written.
There are three key features that can be discussed:
In a lot of systems the reward structure for programmers promotes claims of success such as "I have closed six bugs this week," or martyrdom, as in "I have been working on this bug for a week," neither of which contributes materially to a projection of the product's nearness to a release so that the company can begin making money.
In my system, we always work on the highest severity bugs first, in part because they do the most damage, and in part because their causes (and the associated repairs) tend to be related to less severe bugs. In my own experience it has been usual to fix a Severity 9 bug and see several Severity 6 and Severity 5 bugs closed out at the same time.
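Ordering the work this way amounts to a single sort. A minimal sketch, assuming each bug is a hypothetical `(title, severity)` pair; the function name `work_order` and the titles are invented for illustration.

```python
def work_order(bugs):
    """Return (title, severity) pairs with the most severe first.

    Python's sort is stable, so bugs of equal severity keep their
    original relative order.
    """
    return sorted(bugs, key=lambda bug: bug[1], reverse=True)


# Invented backlog for illustration.
backlog = [
    ("Tooltip overlaps button", 5),
    ("Data lost on save", 9),
    ("Slow search on large files", 5),
]

for title, severity in work_order(backlog):
    print(severity, title)
```

Note that effort-to-fix does not appear in the sort key at all; only the damage done by the bug decides the order.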
As a project manager, you should know that the correlation between effort-to-fix and severity is almost nil. Many Severity 9 bugs are extremely easy to fix because their consequences are so severe. But they might be difficult. On the other hand, the fix for many Severity 5 bugs may involve a long quest to provide functionality that is not absolutely necessary, and can be difficult to shoe-horn into the product.
Simply adding up the severities of the bugs will provide a meaningful result. If the sum of the severities of a hundred bugs is 450, and after fixing a few and finding a few new ones you are left with ninety bugs with a total severity of 460, you can be sure that your original testing was woefully inadequate or the plan was not executed.
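The arithmetic in that example can be checked with a few lines of Python. This is a sketch under assumptions: the function name `snapshot` is hypothetical, and the two severity distributions are invented solely so that the counts and totals match the figures above.

```python
def snapshot(severities):
    """Summarize a list of integer severities (1..9) at one point in time."""
    count = len(severities)
    total = sum(severities)
    mean = total / count if count else 0.0
    return {"count": count, "total": total, "mean": mean}


# Invented distributions that reproduce the totals in the text.
before = snapshot([4] * 50 + [5] * 50)  # 100 bugs, total severity 450
after = snapshot([5] * 80 + [6] * 10)   # 90 bugs, total severity 460

print(before["count"], before["total"], before["mean"])        # 100 450 4.5
print(after["count"], after["total"], round(after["mean"], 2))  # 90 460 5.11
```

The bug count fell while the total rose, so the average open bug went from 4.5 to roughly 5.1. The newly found bugs are worse than the ones that were fixed, which is what points back at the original testing.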
HP used a sum-of-the-squares figure for a statistic that was called "bug weight." In my experience there and afterwards, the squares of the larger severities do give a better picture of the stability of the product, and I recommend the use of squares. If you are familiar with the sabermetric concept known as Bill James's Pythagorean theorem, and you are comfortable with programming only slightly more math into your spreadsheet, there is a good argument to be made that the best exponent for bug weight is also 1.8. That is probably just a cosmic coincidence, rather than a profound insight into the workings of either software or baseball.
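Bug weight is easy to compute outside a spreadsheet as well. The following is a minimal sketch: the function name `bug_weight` and its `exponent` parameter are hypothetical, the default of 2 gives the sum of squares, and passing 1.8 gives the alternative exponent mentioned above. The severities are invented.

```python
def bug_weight(severities, exponent=2.0):
    """Sum of each severity raised to `exponent` (2.0 gives the sum of squares)."""
    return sum(severity ** exponent for severity in severities)


open_bugs = [9, 5, 5, 1]  # invented severities for illustration

print(sum(open_bugs))                      # plain sum: 20
print(bug_weight(open_bugs))               # sum of squares: 132.0
print(bug_weight(open_bugs, exponent=1.8)) # alternative exponent
```

With squares, a single Severity 9 bug (81) outweighs three Severity 5 bugs (75), whereas a plain sum would rank the three Severity 5 bugs higher (15 versus 9). That extra emphasis on the top of the scale is the point of using a power greater than one.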
No. My view about the appropriateness of this metric is not set in stone. For example, the first big software project on which I had an ongoing role was the structural damage simulator for commercial aircraft at Boeing. After what happened, it was declared that there could simply be no bugs in the new prediction system, so a statistical evaluation of probabilities of failure in the software system was not appropriate for a system used to predict the probability of failure in the aircraft.
The system described here is maximally useful when these properties are associated with what you are trying to deliver: It is a moderately large system similar to other projects produced by the group or the company over time.