The Smith–Waterman algorithm performs local sequence alignment; that is, for determining similar regions between two strings of nucleic acid sequences or protein sequences. Instead of looking at the entire sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.

The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981. Like the Needleman–Wunsch algorithm, of which it is a variation, Smith–Waterman is a dynamic programming algorithm. As such, it has the desirable property that it is guaranteed to find the optimal local alignment with respect to the scoring system being used (which includes the substitution matrix and the gap-scoring scheme). The main difference to the Needleman–Wunsch algorithm is that negative scoring matrix cells are set to zero. Traceback procedure starts at the highest scoring matrix cell and proceeds until a cell with score zero is encountered, yielding the highest scoring local alignment. Because of its quadratic time complexity, it often cannot be practically applied to large-scale problems and is replaced in favor of computationally more efficient alternatives such as (Gotoh, 1982), (Altschul and Erickson, 1986), and (Myers and Miller, 1988).

History

In 1970, Saul B. Needleman and Christian D. Wunsch proposed a heuristic homology algorithm for sequence alignment, also referred to as the Needleman–Wunsch algorithm. It is a global alignment algorithm that requires <math>O(mn)</math> calculation steps (<math>m</math> and <math>n</math> are the lengths of the two sequences being aligned). It uses the iterative calculation of a matrix for the purpose of showing global alignment. In the following decade, Sankoff, Reichert, Beyer and others formulated alternative heuristic algorithms for analyzing gene sequences. Sellers introduced a system for measuring sequence distances. In 1976, Waterman et al. added the concept of gaps into the original measurement system. In 1981, Smith and Waterman published their Smith–Waterman algorithm for calculating local alignment.

The Smith–Waterman algorithm is fairly demanding of time: To align two sequences of lengths <math>m</math> and <math>n</math>, <math>O(m^2n+n^2m)</math> time is required. Gotoh and applied this method. later showed how to run Gotoh's algorithm cache-efficiently in linear space using a different recursive divide-and-conquer strategy than the one used by Hirschberg. The resulting algorithm runs faster than Myers and Miller's algorithm in practice due to its superior cache performance. and an SSE2 vectorization developed by Farrar making optimal protein sequence database searches quite practical. A library, SSW, extends Farrar's implementation to return alignment information in addition to the optimal Smith–Waterman score.

Accelerated versions

FPGA

Cray demonstrated acceleration of the Smith–Waterman algorithm using a reconfigurable computing platform based on FPGA chips, with results showing up to 28x speed-up over standard microprocessor-based solutions. Another FPGA-based version of the Smith–Waterman algorithm shows FPGA (Virtex-4) speedups up to 100x over a 2.2&nbsp;GHz Opteron processor. The TimeLogic DeCypher and CodeQuest systems also accelerate Smith–Waterman and Framesearch using PCIe FPGA cards.

A 2011 Master's thesis includes an analysis of FPGA-based Smith–Waterman acceleration.

In a 2016 publication OpenCL code compiled with Xilinx SDAccel accelerates genome sequencing, beats CPU/GPU performance/W by 12-21x, a very efficient implementation was presented. Using one PCIe FPGA card equipped with a Xilinx Virtex-7 2000T FPGA, the performance per Watt level was better than CPU/GPU by 12-21x.

GPU

Lawrence Livermore National Laboratory and the United States (US) Department of Energy's Joint Genome Institute implemented an accelerated version of Smith–Waterman local sequence alignment searches using graphics processing units (GPUs) with preliminary results showing a 2x speed-up over software implementations. A similar method has already been implemented in the Biofacet software since 1997, with the same speed-up factor.

Several GPU implementations of the algorithm in NVIDIA's CUDA C platform are also available. When compared to the best known CPU implementation (using SIMD instructions on the x86 architecture), by Farrar, the performance tests of this solution using a single NVidia GeForce 8800 GTX card show a slight increase in performance for smaller sequences, but a slight decrease in performance for larger ones. However, the same tests running on dual NVidia GeForce 8800 GTX cards are almost twice as fast as the Farrar implementation for all sequence sizes tested.

A newer GPU CUDA implementation of SW is now available that is faster than previous versions and also removes limitations on query lengths. See CUDASW++.

Eleven different SW implementations on CUDA have been reported, three of which report speedups of 30X.

Finally, other GPU-accelerated implementations of the Smith-Waterman can be found in NVIDIA Parabricks, NVIDIA's software suite for genome analysis.

SIMD

In 2000, a fast implementation of the Smith–Waterman algorithm using the single instruction, multiple data (SIMD) technology available in Intel Pentium MMX processors and similar technology was described in a publication by Rognes and Seeberg. In contrast to the Wozniak (1997) approach, the new implementation was based on vectors parallel with the query sequence, not diagonal vectors. The company Sencel Bioinformatics has applied for a patent covering this approach. Sencel is developing the software further and provides executables for academic use free of charge.

A SSE2 vectorization of the algorithm (Farrar, 2007) is now available providing an 8-16-fold speedup on Intel/AMD processors with SSE2 extensions. which is available under the GNU Affero General Public License. In parallel, this software compares residues from sixteen different database sequences to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. It is faster than BLAST when using the BLOSUM50 matrix.

An implementation of Smith–Waterman named diagonalsw, in C and C++, uses SIMD instruction sets (SSE4.1 for the x86 platform and AltiVec for the PowerPC platform). It is released under an open-source MIT License.

Cell Broadband Engine

In 2008, Farrar described a port of the Striped Smith–Waterman