thumb | right | [[American Fuzzy Lop (software)|American Fuzzy Lop's afl-fuzz running on a test program]]

In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, such as in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.

For the purpose of security, input that crosses a trust boundary is often the most useful. To fuzz test a UNIX utility meant to automatically generate random input and command-line parameters for the utility. The project was designed to test the reliability of UNIX command line programs by executing a large number of random inputs in quick succession until they crashed. Miller's team was able to crash 25 to 33 percent of the utilities that they tested. They then debugged each of the crashes to determine the cause and categorized each detected failure. To allow other researchers to conduct similar experiments with other software, the source code of the tools, the test procedures, and the raw result data were made publicly available. was disclosed as a family of security bugs in the widely used UNIX Bash shell; most vulnerabilities of Shellshock were found using the fuzzer AFL. (Many Internet-facing services, such as some web server deployments, use Bash to process certain requests, allowing an attacker to cause vulnerable versions of Bash to execute arbitrary commands. This can allow an attacker to gain unauthorized access to a computer system.)

In April 2015, Hanno Böck showed how the fuzzer AFL could have found the 2014 Heartbleed vulnerability. (The Heartbleed vulnerability was disclosed in April 2014. It is a serious vulnerability that allows adversaries to decipher otherwise encrypted communication. The vulnerability was accidentally introduced into OpenSSL which implements TLS and is used by the majority of the servers on the internet. Shodan reported 238,000 machines still vulnerable in April 2016; 200,000 in January 2017.)

In August 2016, the Defense Advanced Research Projects Agency (DARPA) held the finals of the first Cyber Grand Challenge, a fully automated capture-the-flag competition that lasted 11 hours. The objective was to develop automatic defense systems that can discover, exploit, and correct software flaws in real-time. Fuzzing was used as an effective offense strategy to discover flaws in the software of the opponents. It showed tremendous potential in the automation of vulnerability detection. The winner was a system called "Mayhem" developed by the team ForAllSecure led by David Brumley.

In September 2016, Microsoft announced Project Springfield, a cloud-based fuzz testing service for finding security critical bugs in software. It supports Windows and Linux. It was archived three years later on November 1, 2023.

Early random testing

Testing programs with random inputs dates back to the 1950s when data was still stored on punched cards. (in analogy to Monte Carlo methods).

In 1981, Duran and Ntafos formally investigated the effectiveness of testing a program with random inputs. For example, when fuzzing the image library libpng, the user would provide a set of valid PNG image files as seeds while a mutation-based fuzzer would modify these seeds to produce semi-valid variants of each seed. The corpus of seed files may contain thousands of potentially similar inputs. Automated seed selection (or test suite reduction) allows users to pick the best seeds in order to maximize the total number of bugs found during a fuzz campaign.

A generation-based fuzzer generates inputs from scratch. For instance, a smart generation-based fuzzer) fuzzer leverages the input model to generate a greater proportion of valid inputs. For instance, if the input can be modelled as an abstract syntax tree, then a smart mutation-based fuzzer

Aware of program structure

Typically, a fuzzer is considered more effective if it achieves a higher degree of code coverage. The rationale is, if a fuzzer does not exercise certain structural elements in the program, then it is also not able to reveal bugs that are hiding in these elements. Some program elements are considered more critical than others. For instance, a division operator might cause a division by zero error, or a system call may crash the program.

A black-box fuzzer After all, the program may still fail for an input that has not been executed, yet; executing a program for all inputs is prohibitively expensive. If the objective is to prove a program correct for all inputs, a formal specification must exist and techniques from formal methods must be used.

Exposing bugs

In order to expose bugs, a fuzzer must be able to distinguish expected (normal) from unexpected (buggy) program behavior. However, a machine cannot always distinguish a bug from a feature. In automated software testing, this is also called the test oracle problem.

Typically, a fuzzer distinguishes between crashing and non-crashing inputs in the absence of specifications and to use a simple and objective measure. Crashes can be easily identified and might indicate potential vulnerabilities (e.g., denial of service or arbitrary code execution). However, the absence of a crash does not indicate the absence of a vulnerability. For instance, a program written in C may or may not crash when an input causes a buffer overflow. Rather the program's behavior is undefined.

To make a fuzzer more sensitive to failures other than crashes, sanitizers can be used to inject assertions that crash the program when a failure is detected. There are different sanitizers for different kinds of bugs:

  • to detect memory related errors, such as buffer overflows and use-after-free (using memory debuggers such as AddressSanitizer),
  • to detect race conditions and deadlocks (ThreadSanitizer),
  • to detect undefined behavior (UndefinedBehaviorSanitizer),
  • to detect memory leaks (LeakSanitizer), or
  • to check control-flow integrity (CFISanitizer).

Fuzzing can also be used to detect "differential" bugs if a reference implementation is available. For automated regression testing, the generated inputs are executed on two versions of the same program. For automated differential testing, the generated inputs are executed on two implementations of the same program (e.g., lighttpd and httpd are both implementations of a web server). If the two variants produce different output for the same input, then one may be buggy and should be examined more closely.

Validating static analysis reports

Static program analysis analyzes a program without actually executing it. This might lead to false positives where the tool reports problems with the program that do not actually exist. Fuzzing in combination with dynamic program analysis can be used to try to generate an input that actually witnesses the reported problem.

Browser security

Modern web browsers undergo extensive fuzzing. The Chromium code of Google Chrome is continuously fuzzed by the Chrome Security Team with 15,000 cores. For Microsoft Edge [Legacy] and Internet Explorer, Microsoft performed fuzzed testing with 670 machine-years during product development, generating more than 400 billion DOM manipulations from 1 billion HTML files. The Microsoft Security Research Centre (MSEC) developed the "!exploitable" tool which first creates a hash for a crashing input to determine its uniqueness and then assigns an exploitability rating:

  • Exploitable
  • Probably Exploitable
  • Probably Not Exploitable, or
  • Unknown.

Previously unreported, triaged bugs might be automatically reported to a bug tracking system. For instance, OSS-Fuzz runs large-scale, long-running fuzzing campaigns for several security-critical software projects where each previously unreported, distinct bug is reported directly to a bug tracker.

The following is a list of fuzzers described as "popular", "widely used", or similar in the academic literature.

<!--Barton Miller (2008). "Preface". In Ari Takanen, Jared DeMott and Charlie Miller, Fuzzing for Software Security Testing and Quality Assurance, </ref>-->

Further reading

  • A comprehensive guide on automated vulnerability research with emulated IoT devices.
  • A free, online, introductory textbook on fuzzing.
  • Ari Takanen, Jared D. DeMott, Charles Miller, Fuzzing for Software Security Testing and Quality Assurance, 2008,
  • Michael Sutton, Adam Greene, and Pedram Amini. Fuzzing: Brute Force Vulnerability Discovery, 2007, .
  • H. Pohl, Cost-Effective Identification of Zero-Day Vulnerabilities with the Aid of Threat Modeling and Fuzzing, 2011
  • Fabien Duchene, Detection of Web Vulnerabilities via Model Inference assisted Evolutionary Fuzzing, 2014, PhD Thesis
  • —Basically highlights why fuzzing works so well: because the input is the controlling program of the interpreter.
  • Fuzzing Project, includes tutorials, a list of security-critical open-source projects, and other resources.
  • University of Wisconsin Fuzz Testing (the original fuzz project) Source of papers and fuzz software.
  • Designing Inputs That Make Software Fail, conference video including fuzzy testing
  • Building 'Protocol Aware' Fuzzing Frameworks