Inverse iteration

In numerical analysis, inverse iteration (also known as the inverse power method) is an iterative eigenvalue algorithm. It allows one to find an approximate

eigenvector when an approximation to a corresponding eigenvalue is already known.

The method is conceptually similar to the power method.

It appears to have originally been developed to compute resonance frequencies in the field of structural mechanics.

The inverse power iteration algorithm starts with an approximation <math>\mu</math> for the eigenvalue corresponding to the desired eigenvector and a vector <math>b_0</math>, either a randomly selected vector or an approximation to the eigenvector. The method is described by the iteration

where <math>C_k</math> are some constants usually chosen as <math>C_k= \|(A - \mu I)^{-1}b_k \|. </math> Since eigenvectors are defined up to multiplication by constant, the choice of <math>C_k</math> can be arbitrary in theory; practical aspects of the choice of <math>C_k</math> are discussed below.

At every iteration, the vector <math>b_k</math> is multiplied by the matrix <math>(A - \mu I)^{-1}</math> and normalized.

It is exactly the same formula as in the power method, except replacing the matrix <math>A</math> by <math>(A - \mu I)^{-1}. </math>

The closer the approximation <math>\mu</math> to the eigenvalue is chosen, the faster the algorithm converges; however, incorrect choice of <math>\mu</math> can lead to slow convergence or to the convergence to an eigenvector other than the one desired. In practice, the method is used when a good approximation for the eigenvalue is known, and hence one needs only few (quite often just one) iterations.

Theory and convergence

The basic idea of the power iteration is choosing an initial vector <math>b</math> (either an eigenvector approximation or a random vector) and iteratively calculating <math>Ab, A^{2}b, A^{3}b,...</math>. Except for a set of zero measure, for any initial vector, the result will converge to an eigenvector corresponding to the dominant eigenvalue.

The inverse iteration does the same for the matrix <math>(A - \mu I)^{-1}</math>, so it converges to the eigenvector corresponding to the dominant eigenvalue of the matrix <math>(A - \mu I)^{-1}</math>.

Eigenvalues of this matrix are <math>(\lambda_1 - \mu)^{-1},...,(\lambda_n - \mu)^{-1}, </math> where <math> \lambda_i </math> are eigenvalues of <math>A</math>.

The largest of these numbers corresponds to the smallest of <math>(\lambda_1 - \mu),...,(\lambda_n - \mu). </math> The eigenvectors of <math>A</math> and of <math>(A - \mu I)^{-1}</math> are the same, since

Av=\lambda v \Leftrightarrow

(A-\mu I)v = \lambda v - \mu v \Leftrightarrow

(\lambda - \mu)^{-1} v = (A-\mu I)^{-1} v

</math>

Conclusion: The method converges to the eigenvector of the matrix <math>A</math> corresponding to the closest eigenvalue to <math>\mu .</math>

In particular, taking <math>\mu=0</math> we see that <math>(A)^{-1}b_k </math>

converges to the eigenvector corresponding to the eigenvalue of <math>A^{-1}</math> with the largest magnitude <math>\frac{1}{\lambda _N}</math> and thus can be used to determine the smallest magnitude eigenvalue of <math>A</math> since they are inversely related.

Speed of convergence

Let us analyze the rate of convergence of the method.

The power method is known to converge linearly to the limit, more precisely:

<math display="block"> \mathrm{Distance}( b^\mathrm{ideal}, b^{k}_\mathrm{Power~Method})=O \left( \left| \frac{\lambda_\mathrm{subdominant} }{\lambda_\mathrm{dominant} } \right|^k \right), </math>

hence for the inverse iteration method similar result sounds as:

<math display="block"> \mathrm{Distance}( b^\mathrm{ideal}, b^{k}_\mathrm{Inverse~iteration})=O \left( \left| \frac{\mu -\lambda_{\mathrm{closest~ to~ }\mu} }{\mu - \lambda_{\mathrm{second~ closest~ to~} \mu} } \right|^k \right). </math>

This is a key formula for understanding the method's convergence. It shows that if <math>\mu</math> is chosen close enough to some eigenvalue <math>\lambda </math>, for example <math> \mu- \lambda = \epsilon </math> each iteration will improve the accuracy <math> |\epsilon| /|\lambda +\epsilon - \lambda_{\mathrm{closest~ to~} \lambda} | </math> times. (We use that for small enough <math>\epsilon</math> "closest to <math>\mu</math>" and "closest to <math>\lambda </math>" is the same.) For small enough <math> |\epsilon|</math> it is approximately the same as <math> |\epsilon| /|\lambda - \lambda_{\text{closest to } \lambda}| </math>. Hence if one is able to find <math>\mu </math>, such that the <math> \epsilon </math> will be small enough, then very few iterations may be satisfactory.

Complexity

The inverse iteration algorithm requires solving a linear system or calculation of the inverse matrix.

For non-structured matrices (not sparse, not Toeplitz,...) this requires <math>O(n^{3})</math> operations.

Implementation options

The method is defined by the formula:

There are, however, multiple options for its implementation.

Calculate inverse matrix or solve system of linear equations

We can rewrite the formula in the following way:

emphasizing that to find the next approximation <math> b_{k+1} </math> we may solve a system of linear equations.

There are two options: one may choose an algorithm that solves a linear system, or one may calculate the inverse <math>(A - \mu I)^{-1}</math> and then apply it to the vector.

Both options have complexity O(n3), the exact number depends on the chosen method.

The choice depends also on the number of iterations. Naively, if at each iteration one solves a linear system, the complexity will be k O(n3), where k is number of iterations; similarly, calculating the inverse matrix and applying it at each iteration is of complexity k O(n3).

Note, however, that if the eigenvalue estimate <math>\mu</math> remains constant, then we may reduce the complexity to O(n3) + k O(n2) with either method.

Calculating the inverse matrix once, and storing it to apply at each iteration is of complexity O(n3) + k O(n2).

Storing an LU decomposition of <math>(A - \mu I)</math> and using forward and back substitution to solve the system of equations at each iteration is also of complexity O(n3) + k O(n2).

Inverting the matrix will typically have a greater initial cost, but lower cost at each iteration. Conversely, solving systems of linear equations will typically have a lesser initial cost, but require more operations for each iteration.

Tridiagonalization, Hessenberg form

If it is necessary to perform many iterations (or few iterations, but for many eigenvectors), then it might be wise to bring the matrix to the

upper Hessenberg form first (for symmetric matrix this will be tridiagonal form). Which costs <math display="inline">\frac{10}{3} n^3 + O(n^2)</math> arithmetic operations using a technique based on Householder reduction), with a finite sequence of orthogonal similarity transforms, somewhat like a two-sided QR decomposition. (For QR decomposition, the Householder rotations are multiplied only on the left, but for the Hessenberg case they are multiplied on both left and right.) For symmetric matrices this procedure costs <math display="inline">\frac{4}{3} n^3 + O(n^2)</math> arithmetic operations using a technique based on Householder reduction.