Mutation testing

Mutation testing (or mutation analysis or program mutation) is used to design new software tests and evaluate the quality of existing software tests. Mutation testing involves making small changes to the program being tested. Its purpose is to help the tester develop effective regression tests by locating weaknesses in the test data used to test the program and discovering sections of the tested program's code that are seldom or never accessed during execution.

Introduction

Most of this article is about "program mutation", in which the program is modified. A more general definition of mutation analysis is the use of well-defined rules on syntactic structures to make systematic changes to software artifacts. Mutation analysis has been applied to other problems, but is usually applied to testing. Mutation testing is therefore defined as using mutation analysis to design new software tests or to evaluate existing software tests.

Overview

Tests can be created to verify the correctness of the implementation of a given software system, but the creation of tests still poses the question of whether the tests are correct and sufficiently cover the requirements associated with the implementation. (This technological problem is itself an instance of a deeper philosophical problem named "Quis custodiet ipsos custodes?" ["Who will guard the guards?"].)

Mutation testing assumes that the program being tested works as intended, so if introducing a mutant changes its functionality, a bug has been introduced, which the tests should then find. In this way, the tests are tested. If a mutant is not detected by the test suite, this typically indicates that the test suite is unable to locate the faults represented by the mutant, but it can also indicate that the mutation introduces no faults. That is, the mutation is a valid change: one that either produces a desired result or does not affect functionality. One common way a mutant can be valid is that the changed code is "dead code" that is never executed.

For mutation testing to function at scale, a large number of mutants are usually introduced, which requires compiling and executing an extremely large number of copies of the program. This expense historically limited the practical use of mutation testing as a method of software testing. However, the increased use of object-oriented programming languages and unit testing frameworks has led to the creation of mutation testing tools that test individual portions of an application.

Goals

The goals of mutation testing are multiple:

identify weakly tested pieces of code (those for which mutants are not killed)

History

Mutation testing was first developed and published by DeMillo, Lipton and Sayward. The first implementation of a mutation testing tool was by Timothy Budd as part of his PhD work (titled Mutation Analysis) at Yale University in 1980.

With the availability of massive computing power, there has been a resurgence of mutation analysis within the computer science community, and work has been done to define methods of applying mutation testing to object-oriented programming languages and non-procedural languages such as XML, SMV, and finite-state machines.

In 2004, a company called Certess Inc. (now part of Synopsys) extended many of the principles into the hardware verification domain. Whereas mutation analysis only expects to detect a difference in the output produced, Certess extends this by verifying that a checker in the testbench will actually detect the difference. This extension means that all three stages of verification (activation, propagation, and detection) are evaluated. Certess called this functional qualification.

Fuzzing can be considered to be a special case of mutation testing. In fuzzing, the messages or data exchanged inside communication interfaces (both inside and between software instances) are mutated to catch failures or differences in processing the data. Codenomicon (2001) and Mu Dynamics (2005) evolved fuzzing concepts to a fully stateful mutation testing platform, complete with monitors for thoroughly exercising protocol implementations.

Mutation testing overview

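Mutation testing is based on two hypotheses. The first is the competent programmer hypothesis, which states that competent programmers write programs that are close to being correct. The second is the coupling effect, which asserts that simple faults can cascade or couple to form other emergent faults.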

Subtle and important faults are also revealed by higher-order mutants, which further supports the coupling effect. Higher-order mutants are created by applying more than one mutation to the program.
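As an illustration of a higher-order mutant, the following minimal sketch combines two first-order mutations of a small range check. The predicate, the mutant names and the test values are hypothetical and chosen only for this example. It shows one case in which tests that kill the first-order mutants also kill the combined mutant, which is consistent with the coupling effect but does not demonstrate it in general.

```javascript
// Original predicate: is x inside the closed interval [lo, hi]?
const original = (x, lo, hi) => x >= lo && x <= hi;

// Two first-order mutants, each containing exactly one change.
const lowerMutant = (x, lo, hi) => x > lo && x <= hi; // >= replaced by >
const upperMutant = (x, lo, hi) => x >= lo && x < hi; // <= replaced by <

// A second-order mutant combines both changes.
const secondOrderMutant = (x, lo, hi) => x > lo && x < hi;

// Boundary tests chosen to kill the first-order mutants.
const tests = [
  { args: [1, 1, 5], expected: true },  // lower boundary
  { args: [5, 1, 5], expected: true },  // upper boundary
  { args: [9, 1, 5], expected: false }, // outside the interval
];

// A mutant is killed when at least one test observes a wrong result.
const isKilled = (fn) => tests.some((t) => fn(...t.args) !== t.expected);

console.log("original fails a test:", isKilled(original));
console.log("lower mutant killed:", isKilled(lowerMutant));
console.log("upper mutant killed:", isKilled(upperMutant));
console.log("second-order mutant killed:", isKilled(secondOrderMutant));
```

In this sketch the original passes every test, while all three mutants are killed by the boundary tests.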

Mutation testing is done by selecting a set of mutation operators and then applying them to the source program one at a time for each applicable piece of the source code. The result of applying one mutation operator to the program is called a mutant. If the test suite is able to detect the change (i.e., one of the tests fails), then the mutant is said to be killed.
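The process can be sketched in a few lines. The example below is a simplified illustration rather than the behaviour of any particular tool: the function `isAdult`, the hand-written mutants and the test suite are all hypothetical, and the mutants are written by hand instead of being generated from source code.

```javascript
// Hypothetical unit under test.
function isAdult(age) {
  return age >= 18;
}

// Each mutant is the result of applying one mutation operator once.
const mutants = [
  { name: ">= replaced by >", fn: (age) => age > 18 },
  { name: ">= replaced by <", fn: (age) => age < 18 },
];

// A deliberately weak suite: it never exercises the boundary value 18.
const suite = [
  { input: 30, expected: true },
  { input: 5, expected: false },
];

// True when every test in the suite passes for the given implementation.
function suitePasses(fn, tests) {
  return tests.every((t) => fn(t.input) === t.expected);
}

// Run the suite against each mutant; a failing suite means the mutant is killed.
function analyse(candidates, tests) {
  return candidates.map((m) => ({
    name: m.name,
    killed: !suitePasses(m.fn, tests),
  }));
}

const report = analyse(mutants, suite);
const killedCount = report.filter((r) => r.killed).length;
console.log(report);
console.log(`${killedCount} of ${report.length} mutants killed`);
```

Here the second mutant is killed, but the first survives because no test uses the boundary value. Adding a test with input 18 and expected result `true` would kill it, which is the kind of test-suite weakness that mutation testing is meant to expose.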

For example, consider the following C++ code fragment:

<syntaxhighlight lang="cpp">
if (a && b) {
    c = 1;
} else {
    c = 0;
}
</syntaxhighlight>

The condition mutation operator would replace <code>&&</code> with <code>||</code> and produce the following mutant:

<syntaxhighlight lang="cpp">
if (a || b) {
    c = 1;
} else {
    c = 0;
}
</syntaxhighlight>

For a test to kill this mutant, the following three conditions must be met:

A test must reach the mutated statement.
Test input data should infect the program state by causing different program states for the mutant and the original program. For example, a test with <code>a = 1</code> and <code>b = 0</code> would do this.
The incorrect program state (the value of 'c') must propagate to the program's output and be checked by the test.

These conditions are collectively called the RIP model (reachability, infection, propagation).

Some mutants cannot be killed by any test because they behave identically to the original program; these are called equivalent mutants. A 2014 systematic literature review of a wide range of approaches to overcome the Equivalent Mutant Problem identified 17 relevant techniques (in 22 articles) and three categories of techniques: detecting (DEM); suggesting (SEM); and avoiding equivalent mutant generation (AEMG). The accompanying experiment indicated that Higher Order Mutation in general and the JudyDiffOp strategy in particular provide a promising approach to the Equivalent Mutant Problem.
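The three conditions, and the difficulty posed by equivalent mutants, can be illustrated with a short sketch. It transcribes the C++ fragment above into JavaScript functions that return `c`. The helper names (`checkingTest`, `blindTest`, `larger`, `largerMutant`) are hypothetical and introduced only for this example.

```javascript
// Original and mutant from the fragment above, wrapped in functions that return c.
function original(a, b) {
  let c;
  if (a && b) { c = 1; } else { c = 0; }
  return c;
}

function mutant(a, b) {
  let c;
  if (a || b) { c = 1; } else { c = 0; }
  return c;
}

// Reached but not infected: with a = 1, b = 1 both versions compute c = 1.
const sameState = original(1, 1) === mutant(1, 1);

// Reached and infected: with a = 1, b = 0 the original yields 0, the mutant 1.
const differentState = original(1, 0) !== mutant(1, 0);

// Propagation and checking: only a test that inspects c can fail.
const checkingTest = (fn) => fn(1, 0) === 0;          // fails for the mutant
const blindTest = (fn) => { fn(1, 0); return true; }; // runs the code, checks nothing

// A candidate equivalent mutant (hypothetical): > replaced by >=.
// When x and y compare equal, either branch returns an equal value,
// so a test comparing results with === cannot tell the two apart.
const larger = (x, y) => (x > y ? x : y);
const largerMutant = (x, y) => (x >= y ? x : y);

console.log({ sameState, differentState });
console.log("checking test passes on mutant:", checkingTest(mutant));
console.log("blind test passes on mutant:", blindTest(mutant));
console.log("larger(2, 2) vs mutant:", larger(2, 2), largerMutant(2, 2));
```

The blind test reaches the mutated statement with infecting input but never checks the result, so the mutant survives it; the checking test satisfies all three conditions and kills the mutant.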

In addition to equivalent mutants, there are subsumed mutants: mutants that exist in the same source code location as another mutant and are said to be "subsumed" by it. Subsumed mutants are not visible to a mutation testing tool and do not contribute to coverage metrics. For example, suppose two mutants, A and B, both change a line of code in the same way. Mutant A is tested first, and the result shows that the code is not working correctly. Mutant B is then tested and gives the same result. Mutant B is therefore considered to be subsumed by Mutant A and does not need to be tested, as its result will be the same as that of Mutant A.

Mutation operators

To make syntactic changes to a program, a mutation operator serves as a guideline that substitutes portions of the source code. Given that mutations depend on these operators, scholars have created a collection of mutation operators to accommodate different programming languages, like Java. The effectiveness of these mutation operators plays a pivotal role in mutation testing.

Many mutation operators have been explored by researchers. Here are some examples of mutation operators for imperative languages:

Statement deletion
Statement duplication or insertion, e.g. <code>goto fail;</code>
Replacement of Boolean subexpressions with true and false
Replacement of some arithmetic operations with others, e.g. <code>+</code> with <code>*</code>, <code>-</code> with <code>/</code>
Replacement of some Boolean relations with others, e.g. <code>></code> with <code>>=</code>, <code>==</code> and <code><=</code>
Replacement of variables with others from the same scope (variable types must be compatible)
Removal of method body

These mutation operators are also called traditional mutation operators.
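A minimal sketch of how such operators might be applied is given below. It assumes a deliberately naive representation in which the source is a string of space-separated tokens; real tools typically work on a parsed syntax tree or on bytecode instead. The operator table and the function `generateMutants` are hypothetical names used only for illustration.

```javascript
// Each operator replaces one token with another (illustrative subset only).
const operators = [
  { name: "arithmetic: + replaced by *", from: "+", to: "*" },
  { name: "relational: > replaced by >=", from: ">", to: ">=" },
  { name: "conditional: && replaced by ||", from: "&&", to: "||" },
];

// Produce one mutant per (operator, matching token) pair, so that every
// mutant differs from the original in exactly one place.
function generateMutants(source, ops) {
  const tokens = source.split(" ");
  const result = [];
  for (const op of ops) {
    tokens.forEach((token, index) => {
      if (token === op.from) {
        const copy = tokens.slice();
        copy[index] = op.to;
        result.push({ operator: op.name, source: copy.join(" ") });
      }
    });
  }
  return result;
}

const source = "return a + b > limit && enabled ;";
const generated = generateMutants(source, operators);
for (const m of generated) {
  console.log(`${m.operator}: ${m.source}`);
}
```

For this input the sketch yields three mutants, one per operator, each containing a single replaced token.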

There are also mutation operators for object-oriented languages, for concurrent constructions, complex objects like containers, etc.

Types of mutation operators

Operators for containers are called class-level mutation operators. Operators at the class level alter the program's structure by adding, removing, or changing the expressions being examined. Specific operators have been established for each category of changes.

Apart from the class-level operators, MuJava also includes method-level mutation operators, referred to as traditional operators. These traditional operators are designed based on features commonly found in procedural languages. They change statements by adding, substituting, or removing primitive operators. These operators fall into six categories: arithmetic operators, relational operators, conditional operators, shift operators, logical operators and assignment operators.

Statement mutation

Statement mutation helps identify potential weaknesses or errors in the code. By deliberately changing statements and observing how the program behaves, developers can uncover hidden bugs or flaws that might go unnoticed during regular testing. In this sense, statement mutation acts as a diagnostic tool that provides insight into the code's robustness and resilience, helping programmers improve the overall quality and reliability of their software.

For example, in the code snippet below, the entire 'else' section has been removed:

<syntaxhighlight lang="javascript">
function checkCredentials(username, password) {
    if (username === "admin" && password === "password") {
        return true;
    }
}
</syntaxhighlight>
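The effect of this mutant on a test suite can be sketched as follows. The text shows only the mutated function, so the unmutated version below is an assumption: it is taken to return `false` in its 'else' branch. The two test helpers are likewise hypothetical.

```javascript
// Assumed original: the else branch returning false is a guess,
// since only the mutated version appears in the text.
function checkCredentialsOriginal(username, password) {
  if (username === "admin" && password === "password") {
    return true;
  } else {
    return false;
  }
}

// Mutant from the text: the else branch has been deleted, so the
// function returns undefined for any other credentials.
function checkCredentialsMutant(username, password) {
  if (username === "admin" && password === "password") {
    return true;
  }
}

// Loose test: relies on truthiness, so undefined is accepted as a rejection.
const looseTest = (fn) =>
  Boolean(fn("admin", "password")) && !fn("guest", "wrong");

// Strict test: requires exactly true and exactly false.
const strictTest = (fn) =>
  fn("admin", "password") === true && fn("guest", "wrong") === false;

console.log("loose test passes on mutant:", looseTest(checkCredentialsMutant));
console.log("strict test passes on mutant:", strictTest(checkCredentialsMutant));
```

Under these assumptions the mutant survives the loose test, because `undefined` is falsy, and is killed by the strict test, which checks the exact return value.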

Value mutation

Value mutation occurs when modifications are made to parameter or constant values within the code. This typically involves adjusting the values by adding or subtracting 1, but it can also involve making more substantial changes to the values. The specific alterations made during value mutation include two main scenarios:

The first scenario is the change from a small value to a higher value. This entails replacing a small value in the code with a larger one. The purpose of this change is to assess how the code responds when it encounters larger inputs. It helps ensure that the code can accurately and efficiently process these larger values without encountering errors or unexpected issues.
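A small-to-larger value mutation can be sketched as below. The function `makeTakeFirst`, the limit of 10 and both test helpers are hypothetical and chosen only to show how a mutated constant may or may not be noticed by a test.

```javascript
// Hypothetical function: keep at most `limit` items from a list.
const makeTakeFirst = (limit) => (items) => items.slice(0, limit);

const original = makeTakeFirst(10);     // constant as written
const plusOne = makeTakeFirst(11);      // value mutation: 10 + 1
const muchLarger = makeTakeFirst(1000); // value mutation: a much larger value

const shortList = Array.from({ length: 3 }, (_, i) => i);
const longList = Array.from({ length: 15 }, (_, i) => i);

// Each test returns true when the implementation behaves as the original should.
const smallInputTest = (fn) => fn(shortList).length === 3;
const largeInputTest = (fn) => fn(longList).length === 10;

for (const [name, fn] of [["original", original], ["plusOne", plusOne], ["muchLarger", muchLarger]]) {
  console.log(name, "small input:", smallInputTest(fn), "large input:", largeInputTest(fn));
}
```

Both mutants survive the small-input test, since three items fit under every limit, and both are killed by the large-input test, which supplies more items than the original limit allows.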