Scale-invariant feature transform

The scale-invariant feature transform (SIFT) is a computer vision algorithm for detecting, describing, and matching local features in images, invented by David Lowe in 1999. The algorithm was formerly protected by a patent, which expired in 2020.

Keypoint localization

Figure: After scale-space extrema are detected (their locations are shown in the uppermost image), the SIFT algorithm discards low-contrast keypoints (the remaining points are shown in the middle image) and then filters out those located on edges. The resulting set of keypoints is shown in the last image.

Scale-space extrema detection produces too many keypoint candidates, some of which are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data for accurate location, scale, and ratio of principal curvatures. This information allows the rejection of points that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.
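The two rejection tests can be illustrated with a minimal sketch. It assumes a single difference-of-Gaussians image stored as a list of rows with values scaled to [0, 1], and it uses the contrast threshold 0.03 and curvature ratio r = 10 suggested in Lowe's 2004 paper. The function names `hessian_2x2` and `keep_keypoint` are hypothetical, and the contrast test is applied to the sampled value rather than to the interpolated extremum, which is a simplification.

```python
def hessian_2x2(dog, x, y):
    """Finite-difference 2x2 spatial Hessian of a DoG image at column x, row y."""
    dxx = dog[y][x + 1] - 2.0 * dog[y][x] + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2.0 * dog[y][x] + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    return dxx, dyy, dxy


def keep_keypoint(dog, x, y, contrast_thresh=0.03, edge_ratio=10.0):
    """Return True if the candidate passes both the contrast and the edge test."""
    # Low-contrast candidates are sensitive to noise and are discarded.
    # (Simplification: the sampled value is tested, not the interpolated one.)
    if abs(dog[y][x]) < contrast_thresh:
        return False

    dxx, dyy, dxy = hessian_2x2(dog, x, y)
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy

    # A non-positive determinant means the principal curvatures differ in
    # sign or one of them vanishes, so the point is not a well-defined peak.
    if det <= 0:
        return False

    # trace^2 / det depends only on the ratio of the principal curvatures;
    # it is smallest for an isotropic blob and grows along an edge.
    return trace * trace / det < (edge_ratio + 1.0) ** 2 / edge_ratio
```

With these defaults, an isolated peak such as `[[0, 0, 0], [0, 1, 0], [0, 0, 0]]` is kept at its center, whereas a vertical ridge such as `[[0, 1, 0], [0, 1, 0], [0, 1, 0]]` is rejected because its curvature along the ridge is zero.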

Interpolation of nearby data for accurate position

First, for each candidate keypoint, interpolation of nearby data is used to determine its position accurately. The initial approach was simply to place each keypoint at the location and scale of the candidate keypoint.

The threshold of 0.2 was chosen empirically; replacing the fixed threshold with a systematically calculated one can improve matching results.

... and absolute pose from only two keypoints, an often disregarded but useful measurement available in SIFT. These orientation measurements reduce the number of required correspondences, further increasing robustness exponentially.

Panorama stitching

SIFT feature matching can be used in image stitching for fully automated panorama reconstruction from non-panoramic images. The SIFT features extracted from the input images are matched against each other to find the k nearest neighbors of each feature. These correspondences are then used to find m candidate matching images for each image. Homographies between pairs of images are then computed using RANSAC, and a probabilistic model is used for verification. Because there is no restriction on the input images, graph search is applied to find connected components of image matches, such that each connected component corresponds to a panorama. Finally, for each connected component, bundle adjustment is performed to solve for joint camera parameters, and the panorama is rendered using multi-band blending. Because this approach to panorama stitching is inspired by SIFT-based object recognition, the resulting system is insensitive to the ordering, orientation, scale and illumination of the images. The input images can contain multiple panoramas as well as noise images that are not part of any composite image; panoramic sequences are recognized and rendered as output.
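The graph-search step can be sketched as follows. This is a minimal illustration, not the published implementation: it assumes that feature matching and RANSAC verification have already produced a list of verified image pairs, and the function name `panorama_groups` and the integer image indices are hypothetical.

```python
from collections import defaultdict, deque


def panorama_groups(num_images, verified_pairs):
    """Group images by connected components of the verified-match graph.

    num_images: number of input images, indexed 0 .. num_images - 1.
    verified_pairs: iterable of (a, b) index pairs whose homography
        passed verification (assumed to be computed elsewhere).
    """
    # Build an undirected adjacency structure from the verified matches.
    neighbors = defaultdict(set)
    for a, b in verified_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    groups = []
    for start in range(num_images):
        if start in seen:
            continue
        # Breadth-first search collects every image reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(component))
    return groups
```

For six images with verified pairs `[(0, 1), (1, 2), (3, 4)]`, the function returns `[[0, 1, 2], [3, 4], [5]]`: two candidate panoramas and one unmatched image, which a caller would treat as a noise image. Each multi-image group would then be passed on to bundle adjustment and blending.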

3D SIFT-like descriptors for human action recognition

Extensions of the SIFT descriptor to 2+1-dimensional spatio-temporal data have been studied in the context of human action recognition in video sequences.

... is a pure image descriptor defined by computing all the image measurements underlying the SIFT descriptor from Gaussian derivative responses, as opposed to the derivative approximations in an image pyramid used in regular SIFT. In this way, discretization effects over space and scale can be reduced to a minimum, potentially allowing for more accurate image descriptors. In Lindeberg (2015) ...

External links

Related studies:

Andrea Maricela Plaza Cordero, Jorge Luis Zambrano-Martinez, "Estudio y Selección de las Técnicas SIFT, SURF y ASIFT de Reconocimiento de Imágenes para el Diseño de un Prototipo en Dispositivos Móviles", 15º Concurso de Trabajos Estudiantiles, EST 2012.
Lazebnik, S., Schmid, C., and Ponce, J., Semi-Local Affine Parts for Object Recognition, BMVC, 2004.

Tutorials:

Scale-Invariant Feature Transform (SIFT) in Scholarpedia
A simple step by step guide to SIFT
"The Anatomy of the SIFT Method" in Image Processing On Line, a detailed study of every step of the algorithm with an open source implementation and a web demo to try different parameters

Implementations:

Rob Hess's implementation of SIFT, accessed 21 Nov 2012
ASIFT (Affine SIFT): large viewpoint matching with SIFT, with source code and online demonstration
VLFeat, an open source computer vision library in C (with a MEX interface to MATLAB), including an implementation of SIFT
LIP-VIREO, a toolkit for keypoint feature extraction (binaries for Windows, Linux and SunOS), including an implementation of SIFT
(Parallel) SIFT in C#, an implementation of the SIFT algorithm in C# using Emgu CV, together with a modified parallel version of the algorithm
DoH & LoG + affine, a blob detector adapted from a SIFT toolbox
ezSIFT, an easy-to-use, self-contained open-source SIFT implementation in C/C++ that does not require other libraries
A 3D SIFT implementation: detection and matching in volumetric images.