
Mende, Thilo - On the Evaluation of Defect Prediction Models


[978-3-86991-500-5]

MV-Wissenschaft, Softcover, 212 pages

Defect prediction models aim to identify error-prone modules of a software system in order to guide quality assurance activities such as tests or code reviews. Due to their potential cost savings, such models have been actively researched for more than a decade, resulting in over 100 published research papers. Additionally, defect prediction models are often used for the empirical validation of software metrics and research hypotheses.

Despite the large body of existing research, the evaluation of defect prediction models has received little attention. This is underlined by the large number of evaluation approaches used by researchers, some of which differ only slightly. This thesis systematically investigates the advantages and drawbacks of experimental design decisions and proposes guidelines for adequate evaluation procedures for defect prediction models.

First, different evaluation approaches are identified and summarized in a literature survey. Afterwards, we investigate the most common methods in detail. Using publicly available data sets, we demonstrate that different evaluation procedures have advantages and drawbacks, and may lead to different results. Additionally, we show that very simple models, based only on the size of modules, are able to achieve surprisingly good performance. The reason for this is an implicit assumption underlying almost all evaluation approaches, namely that the treatment costs for additional quality assurance activities are distributed uniformly across modules. We introduce the notion of effort-awareness, which takes non-uniform treatment costs into account. Most models that perform well under a uniform cost assumption are not cost-effective according to effort-aware performance measures. The performance can be increased significantly by using effort-aware predictions, both from a practical and from a statistical perspective. Whether effort-aware models based only on static code metrics are cost-effective in practice remains questionable, as shown in a case study.
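
The difference between a uniform cost assumption and an effort-aware view can be illustrated with a small sketch. It is not taken from the thesis: the module data, the function name defects_found_within_budget, the use of lines of code as a proxy for treatment cost, and the 20% budget are all assumptions made for illustration. Modules are inspected in ranked order until the effort budget is used up, and the share of defects found is reported.

```python
from typing import List, Tuple

# Hypothetical module record: (name, lines of code, actual defects, predicted risk score).
# Lines of code stand in for treatment cost and are assumed to be greater than zero.
Module = Tuple[str, int, int, float]


def defects_found_within_budget(
    modules: List[Module], budget_fraction: float, effort_aware: bool
) -> float:
    """Return the share of all defects found when inspecting modules in
    ranked order until the given fraction of total lines of code is spent."""
    total_loc = sum(m[1] for m in modules)
    total_defects = sum(m[2] for m in modules)
    if total_defects == 0:
        return 0.0

    if effort_aware:
        # Rank by predicted risk per line of code (risk relative to effort).
        ranked = sorted(modules, key=lambda m: m[3] / m[1], reverse=True)
    else:
        # Rank by predicted risk alone, as if every module cost the same.
        ranked = sorted(modules, key=lambda m: m[3], reverse=True)

    budget = budget_fraction * total_loc
    spent = 0
    found = 0
    for _name, loc, defects, _score in ranked:
        if spent + loc > budget:
            break  # the next module in the ranking no longer fits the budget
        spent += loc
        found += defects
    return found / total_defects


# Invented example data: one large risky module and several small ones.
example_modules: List[Module] = [
    ("a", 1000, 5, 0.90),
    ("b", 100, 2, 0.30),
    ("c", 100, 2, 0.25),
    ("d", 800, 1, 0.50),
]

uniform_share = defects_found_within_budget(example_modules, 0.2, effort_aware=False)
effort_aware_share = defects_found_within_budget(example_modules, 0.2, effort_aware=True)
print(uniform_share, effort_aware_share)
```

In this invented data set the highest-ranked module under the uniform view is too large to fit into the budget, so no defects are found, while the effort-aware ranking reaches the small modules first. Real data sets and the measures used in the thesis may behave differently.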

In summary, our experiments show that the most commonly used evaluation procedures often lead to overly optimistic performance estimates and will not lead to cost-effective defect prediction models in many practical usage scenarios. The guidelines derived from these experiments, and in particular the notion of effort-aware prediction and evaluation, can help to build defect prediction models that are usable in practice.

This product was added to our catalog on Monday 23 January, 2012.