In mathematical analysis, particularly numerical analysis, the rate of convergence and order of convergence of a sequence that converges to a limit are any of several characterizations of how quickly that sequence approaches its limit. These are broadly divided into rates and orders of convergence that describe how quickly a sequence further approaches its limit once it is already close to it, called asymptotic rates and orders of convergence, and those that describe how quickly sequences approach their limits from starting points that are not necessarily close to their limits, called non-asymptotic rates and orders of convergence.

Asymptotic behavior is particularly useful for deciding when to stop a sequence of numerical computations, for instance once a target precision has been reached with an iterative root-finding algorithm, but pre-asymptotic behavior is often crucial for determining whether to begin a sequence of computations at all, since it may be impossible or impractical to ever reach a target precision with a poorly chosen approach. Asymptotic rates and orders of convergence are the focus of this article.

In practical numerical computations, asymptotic rates and orders of convergence follow two common conventions for two types of sequences: the first for sequences of iterations of an iterative numerical method and the second for sequences of successively more accurate numerical discretizations of a target. In formal mathematics, rates of convergence and orders of convergence are often described comparatively using asymptotic notation commonly called "big O notation," which can be used to encompass both of the prior conventions; this is an application of asymptotic analysis.

For iterative methods, a sequence <math>(x_k)</math> that converges to <math>L</math> is said to have asymptotic order of convergence <math>q \geq 1</math> and asymptotic rate of convergence <math>\mu</math> if

:<math>\lim _{k \rightarrow \infty} \frac{\left|x_{k+1}-L\right|}{\left|x_{k}-L\right|^{q=\mu.</math>

Where methodological precision is required, these rates and orders of convergence are known specifically as the rates and orders of Q-convergence, short for quotient-convergence, since the limit in question is a quotient of error terms. Series acceleration methods are techniques for improving the rate of convergence of the sequence of partial sums of a series and possibly its order of convergence, also.

Similar concepts are used for sequences of discretizations. For instance, ideally the solution of a differential equation discretized via a regular grid will converge to the solution of the continuous equation as the grid spacing goes to zero, and if so the asymptotic rate and order of that convergence are important properties of the gridding method. A sequence of approximate grid solutions <math>(y_k)</math> of some problem that converges to a true solution <math>S</math> with a corresponding sequence of regular grid spacings <math>(h_k)</math> that converge to 0 is said to have asymptotic order of convergence <math>q</math> and asymptotic rate of convergence <math>\mu</math> if

<math display="block">\lim _{k \rightarrow \infty} \frac{\left|y_k - S\right|}{h_k^{q=\mu,</math>

where the absolute value symbols stand for a metric for the space of solutions such as the uniform norm. Similar definitions also apply for non-grid discretization schemes such as the polygon meshes of a finite element method or the basis sets in computational chemistry: in general, the appropriate definition of the asymptotic rate <math>\mu</math> will involve the asymptotic limit of the ratio of an approximation error term above to an asymptotic order <math>q</math> power of a discretization scale parameter below.

In general, comparatively, one sequence <math>(a_k)</math> that converges to a limit <math>L_a</math> is said to asymptotically converge more quickly than another sequence <math>(b_k)</math> that converges to a limit <math>L_b</math> if

<math display="block">\lim _{k \rightarrow \infty} \frac{\left|a_k - L_a\right|}{|b_k - L_b|}=0,</math>

and the two are said to asymptotically converge with the same order of convergence if the limit is any positive finite value. The two are said to be asymptotically equivalent if the limit is equal to one. These comparative definitions of rate and order of asymptotic convergence are fundamental in asymptotic analysis and find wide application in mathematical analysis as a whole, including numerical analysis, real analysis, complex analysis, and functional analysis.

Asymptotic rates of convergence for iterative methods

Definitions

Q-convergence

Suppose that the sequence <math>(x_k)</math> of iterates of an iterative method converges to the limit number <math>L</math> as <math>k \rightarrow \infty</math>. The sequence is said to converge with order <math>q</math> to <math>L</math> and with a rate of convergence <math>\mu</math> if the <math>k \rightarrow \infty</math> limit of quotients of absolute differences of sequential iterates <math>x_k, x_{k+1}</math> from their limit <math>L</math> satisfies

<math display="block">\lim_{k \to \infty} \frac{|x_{k+1} - L|}{|x_k - L|^q} = \mu</math>

for some positive constant <math>\mu \in (0, 1)</math> if <math>q = 1</math> and <math>\mu \in (0, \infty)</math> if <math>q > 1</math>. Other more technical rate definitions are needed if the sequence converges but <math display="inline">\lim_{k \to \infty} \frac{|x_{k+1} - L|}{|x_k - L|} = 1</math> or the limit does not exist.

The common names for integer orders of convergence connect to asymptotic big O notation, where the convergence of the quotient implies <math display="inline">|x_{k+1} - L| = O(|x_k - L|^q).</math> These are linear, quadratic, and cubic polynomial expressions when <math>q</math> is 1, 2, and 3, respectively. More precisely, the limits imply the leading order error is exactly <math display="inline">\mu |x_k - L|^q,</math> which can be expressed using asymptotic small o notation as<math display="inline">|x_{k+1} - L| = \mu |x_k - L|^q + o(|x_k - L|^q).</math>

In general, when <math> q > 1</math> for a sequence or for any sequence that satisfies <math display="inline">\lim_{k \to \infty} \frac{|x_{k+1} - L|}{|x_k - L|} = 0,</math> those sequences are said to converge superlinearly (i.e., faster than linearly). A sequence <math>(x_k)</math> that converges to <math>L</math> is said to converge at least R-linearly if there exists an error-bounding sequence <math>(\varepsilon_k)</math> such that <math display="inline">

|x_k - L|\le\varepsilon_k\quad\text{for all }k

</math> and <math> (\varepsilon_k) </math> converges Q-linearly to zero; analogous definitions hold for R-superlinear convergence, R-sublinear convergence, R-quadratic convergence, and so on.

<math display="block">q \approx \frac{\log \left|\displaystyle\frac{x_{k+1} - x_k}{x_k - x_{k-1\right|}{\log \left|\displaystyle\frac{x_k - x_{k-1{x_{k-1} - x_{k-2\right|}.</math>

For numerical approximation of an exact value through a numerical method of order <math>q</math> see.

Accelerating convergence rates

Many methods exist to accelerate the convergence of a given sequence, i.e., to transform one sequence into a second sequence that converges more quickly to the same limit. Such techniques are in general known as "series acceleration" methods. These may reduce the computational costs of approximating the limits of the original sequences. One example of series acceleration by sequence transformation is Aitken's delta-squared process. These methods in general, and in particular Aitken's method, do not typically increase the order of convergence and thus they are useful only if initially the convergence is not faster than linear: if <math>(x_k)</math> converges linearly, Aitken's method transforms it into a sequence <math>(a_k)</math> that still converges linearly (except for pathologically designed special cases), but faster in the sense that <math display="inline">\lim_{k \rightarrow \infty} (a_k-L)/(x_k-L)= 0</math>. On the other hand, if the convergence is already of order &ge; 2, Aitken's method will bring no improvement.

Asymptotic rates of convergence for discretization methods

Definitions

A sequence of discretized approximations <math>(y_k)</math> of some continuous-domain function <math>S</math> that converges to this target, together with a corresponding sequence of discretization scale parameters <math>(h_k)</math> that converge to 0, is said to have asymptotic order of convergence <math>q</math> and asymptotic rate of convergence <math>\mu</math> if

<math display="block">\lim _{k \rightarrow \infty} \frac{\left|y_k - S\right|}{h_k^{q=\mu,</math>

for some positive constants <math>\mu</math> and <math>q</math> and using <math>|x|</math> to stand for an appropriate distance metric on the space of solutions, most often either the uniform norm, the absolute difference, or the Euclidean distance. Discretization scale parameters may be spacings of a regular grid in space or in time, the inverse of the number of points of a grid in one dimension, an average or maximum distance between points in a polygon mesh, the single-dimension spacings of an irregular sparse grid, or a characteristic quantum of energy or momentum in a quantum mechanical basis set.

When all the discretizations are generated using a single common method, it is common to discuss the asymptotic rate and order of convergence for the method itself rather than any particular discrete sequences of discretized solutions. In these cases one considers a single abstract discretized solution <math>y_h</math> generated using the method with a scale parameter <math>h</math> and then the method is said to have asymptotic order of convergence <math>q</math> and asymptotic rate of convergence <math>\mu</math> if

<math display="block">\lim _{h \rightarrow 0} \frac{\left|y_h - S\right|}{h^{q=\mu,</math>

again for some positive constants <math>\mu</math> and <math>q</math> and an appropriate metric <math>|x|.</math> This implies that the error of a discretization asymptotically scales like the discretization's scale parameter to the <math>q</math> power, or <math display="inline">\left|y_h - S \right| = O(h^{q})</math> using asymptotic big O notation. More precisely, it implies the leading order error is <math>\mu h^{q},</math> which can be expressed using asymptotic small o notation as<math display="inline">\left|y_h - S\right| = \mu h^{q} + o(h^{q}).</math>

In some cases multiple rates and orders for the same method but with different choices of scale parameter may be important, for instance for finite difference methods based on multidimensional grids where the different dimensions have different grid spacings or for finite element methods based on polygon meshes where choosing either average distance between mesh points or maximum distance between mesh points as scale parameters may imply different orders of convergence. In some especially technical contexts, discretization methods' asymptotic rates and orders of convergence will be characterized by several scale parameters at once with the value of each scale parameter possibly affecting the asymptotic rate and order of convergence of the method with respect to the other scale parameters.

Example

Consider the ordinary differential equation

:<math> \frac{dy}{dx} = -\kappa y </math>

with initial condition <math>y(0) = y_0</math>. We can approximate a solution to this one-dimensional equation using a sequence <math>(y_n)</math> applying the forward Euler method for numerical discretization using any regular grid spacing <math>h</math> and grid points indexed by <math>n </math> as follows:

:<math> \frac{y_{n+1} - y_n}{h} = -\kappa y_{n}, </math>

which implies the first-order linear recurrence with constant coefficients

:<math> y_{n+1} = y_n(1 - h\kappa). </math>

Given <math>y(0) = y_0</math>, the sequence satisfying that recurrence is the geometric progression

<math display="block"> y_{n} = y_0(1 - h\kappa)^n = y_0\left(1 - nh\kappa + \frac{n(n-1)}{2}h^2\kappa^2 + ....\right). </math>

The exact analytical solution to the differential equation is <math>y = f(x) = y_0\exp(-\kappa x)</math>, corresponding to the following Taylor expansion in <math>nh\kappa </math>:

<math display="block">f(x_n) = f(nh) = y_0\exp(-\kappa nh)

= y_0\left(1 - nh\kappa + \frac{n^2 h^2\kappa^2}{2} + ...\right).</math>

Therefore the error of the discrete approximation at each discrete point is

:<math display="block">|y_n - f(x_n)| = \frac{nh^2\kappa^2}{2} + \ldots</math>

For any specific <math>x = p</math>, given a sequence of forward Euler approximations <math>((y_n)_k)</math>, each using grid spacings <math>h_k</math> that divide <math>p</math> so that <math>n_{p,k} = p/h_k</math>, one has

<math display="block">\lim_{h_k \rightarrow 0} \frac{|y_k(p) - f(p)|}{h_k} = \lim_{h_k \rightarrow 0} \frac{|y_{k, n_{p,k - f(h_k n_{p,k})|}{h_k} = \frac{h_k n_{p,k} \kappa^2}{2} = \frac{p \kappa^2}{2}</math>

for any sequence of grids with successively smaller grid spacings <math>h_k</math>. Thus <math>((y_n)_k)</math> converges to <math>f(x)</math> pointwise with a convergence order <math>q = 1</math> and asymptotic error constant <math>p \kappa^2 / 2</math> at each point <math>p > 0.</math> Similarly, the sequence converges uniformly with the same order and with rate <math>L \kappa^2 / 2</math> on any bounded interval of <math>p \leq L</math>, but it does not converge uniformly on the unbounded set of all positive real values, <math>[0, \infty).</math>

Comparing asymptotic rates of convergence

Definitions

In asymptotic analysis in general, one sequence <math>(a_k)_{k \in \mathbb{N</math> that converges to a limit <math>L</math> is said to asymptotically converge to <math>L</math> with a faster order of convergence than another sequence <math>(b_k)_{k \in \mathbb{N</math> that converges to <math>L</math> in a shared metric space with distance metric <math>|\cdot|,</math> such as the real numbers or complex numbers with the ordinary absolute difference metrics, if

<math display="block">\lim _{k \rightarrow \infty} \frac{\left|a_k - L\right|}{|b_k - L|} = 0,</math>

the two are said to asymptotically converge to <math>L</math> with the same order of convergence if

<math display="block">\lim_{k \rightarrow \infty} \frac{\left|a_k - L\right|}{|b_k - L|} = \mu</math>

for some positive finite constant <math>\mu,</math> and the two are said to asymptotically converge to <math>L</math> with the same rate and order of convergence if

<math display="block">\lim_{k \rightarrow \infty} \frac{\left|a_k - L\right|}{|b_k - L|} = 1.</math>

These comparative definitions of rate and order of asymptotic convergence are fundamental in asymptotic analysis. For the first two of these there are associated expressions in asymptotic O notation: the first is that <math>a_k - L = o(b_k - L)</math> in small o notation and the second is that <math>a_k - L = \Theta(b_k - L) </math> in Knuth notation. The third is also called asymptotic equivalence, expressed <math>a_k - L \sim b_k - L.</math>

Examples

For any two geometric progressions <math>(a r^k)_{k \in \mathbb{N</math> and <math>(b s^k)_{k \in \mathbb{N,</math> with shared limit zero, the two sequences are asymptotically equivalent if and only if both <math>a = b</math> and <math>r = s.</math> They converge with the same order if and only if <math>r = s.</math> <math>(a r^k)</math> converges with a faster order than <math>(b s^k)</math> if and only if <math>r < s.</math> The convergence of any geometric series to its limit has error terms that are equal to a geometric progression, so similar relationships hold among geometric series as well. Any sequence that is asymptotically equivalent to a convergent geometric sequence may be either be said to "converge geometrically" or "converge exponentially" with respect to the absolute difference from its limit, or it may be said to "converge linearly" relative to a logarithm of the absolute difference such as the "number of decimals of precision." The latter is standard in numerical analysis.

For any two sequences of elements proportional to an inverse power of <math>k,</math> <math>(a k^{-n})_{k \in \mathbb{N</math> and <math>(b k^{-m})_{k \in \mathbb{N,</math> with shared limit zero, the two sequences are asymptotically equivalent if and only if both <math>a = b</math> and <math>n = m.</math> They converge with the same order if and only if <math>n = m.</math> <math>(a k^{-n})</math> converges with a faster order than <math>(b k^{-m})</math> if and only if <math>n > m.</math>

For any sequence <math>(a_k)_{k \in \mathbb{N</math> with a limit of zero, its convergence can be compared to the convergence of the shifted sequence <math>(a_{k-1})_{k \in \mathbb{N,</math> rescalings of the shifted sequence by a constant <math>\mu,</math> <math>(\mu a_{k-1})_{k \in \mathbb{N,</math> and scaled <math>q</math>-powers of the shifted sequence, <math>(\mu a_{k-1}^q)_{k \in \mathbb{N.</math> These comparisons are the basis for the Q-convergence classifications for iterative numerical methods as described above: when a sequence of iterate errors from a numerical method <math>(|x_k - L|)_{k \in \mathbb{N</math> is asymptotically equivalent to the shifted, exponentiated, and rescaled sequence of iterate errors <math>(\mu |x_{k-1} - L|^q)_{k \in \mathbb{N,</math> it is said to converge with order <math>q</math> and rate <math>\mu.</math>

Non-asymptotic rates of convergence

Non-asymptotic rates of convergence do not have the common, standard definitions that asymptotic rates of convergence have. Among formal techniques, Lyapunov theory is one of the most powerful and widely applied frameworks for characterizing and analyzing non-asymptotic convergence behavior.

For iterative methods, one common practical approach is to discuss these rates in terms of the number of iterates or the computer time required to reach close neighborhoods of a limit from starting points far from the limit. The non-asymptotic rate is then an inverse of that number of iterates or computer time. In practical applications, an iterative method that required fewer steps or less computer time than another to reach target accuracy will be said to have converged faster than the other, even if its asymptotic convergence is slower. These rates will generally be different for different starting points and different error thresholds for defining the neighborhoods. It is most common to discuss summaries of statistical distributions of these single point rates corresponding to distributions of possible starting points, such as the "average non-asymptotic rate," the "median non-asymptotic rate," or the "worst-case non-asymptotic rate" for some method applied to some problem with some fixed error threshold. These ensembles of starting points can be chosen according to parameters like initial distance from the eventual limit in order to define quantities like "average non-asymptotic rate of convergence from a given distance."

For discretized approximation methods, similar approaches can be used with a discretization scale parameter such as an inverse of a number of grid or mesh points or a Fourier series cutoff frequency playing the role of inverse iterate number, though it is not especially common. For any problem, there is a greatest discretization scale parameter compatible with a desired accuracy of approximation, and it may not be as small as required for the asymptotic rate and order of convergence to provide accurate estimates of the error. In practical applications, when one discretization method gives a desired accuracy with a larger discretization scale parameter than another it will often be said to converge faster than the other, even if its eventual asymptotic convergence is slower.

References