Radon–Nikodym theorem

In mathematics, the Radon–Nikodym theorem, named after Johann Radon and Otto M. Nikodym, is a result in measure theory that expresses the relationship between two measures defined on the same measurable space. A measure is a set function that assigns a consistent magnitude to the measurable subsets of a measurable space. Examples of a measure include area and volume, where the subsets are sets of points; or the probability of an event, which is a subset of possible outcomes within a wider probability space.

One way to derive a new measure from one already given is to assign a density to each point of the space, then integrate over the measurable subset of interest. This can be expressed as

:<math>\nu(A) = \int_A f \, d\mu,</math>

where <math>\nu</math> is the new measure being defined for any measurable subset <math>A</math> and the function <math>f</math> is the density at a given point. The integral is with respect to an existing measure <math>\mu</math>, which may often be the canonical Lebesgue measure on the real line <math>\R</math> or the <math>n</math>-dimensional Euclidean space <math>\R^n</math> (corresponding to our standard notions of length, area and volume). For example, if <math>f</math> represented mass density and <math>\mu</math> was the Lebesgue measure in three-dimensional space <math>\R^3</math>, then <math>\nu(A)</math> would equal the total mass in a spatial region <math>A</math>.
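As a rough numerical illustration of the formula above, the following sketch approximates ν(A) for intervals A = [a, b] of the real line, taking μ to be length (Lebesgue measure) and using a midpoint rule. The density 3x² and the helper names `nu` and `mass_density` are assumptions chosen for the example, not part of the theorem.

```python
def nu(density, a, b, steps=10_000):
    """Approximate nu([a, b]), the integral of `density` over [a, b]
    with respect to length, using the midpoint rule."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


def mass_density(x):
    # Hypothetical mass density of a rod occupying [0, 1].
    return 3.0 * x * x


total = nu(mass_density, 0.0, 1.0)   # exact value is 1
left = nu(mass_density, 0.0, 0.5)    # exact value is 0.125
right = nu(mass_density, 0.5, 1.0)   # exact value is 0.875

print(total, left, right)
```

Up to discretization error, the values for [0, 0.5] and [0.5, 1] add up to the value for [0, 1], reflecting the additivity of ν.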

The Radon–Nikodym theorem essentially states that, under certain conditions, any measure <math>\nu</math> can be expressed in this way with respect to another measure <math>\mu</math> on the same space. The function <math>f</math> is then called the Radon–Nikodym derivative and is denoted by <math>d\nu/d\mu</math>. An important application is in probability theory, leading to the probability density function of a random variable.

The theorem is named after Johann Radon, who proved it in 1913 for the special case where the underlying space is <math>\R^n</math>, and after Otto Nikodym, who proved the general case in 1930. In 1936, Hans Freudenthal generalized the Radon–Nikodym theorem by proving the Freudenthal spectral theorem, a result in Riesz space theory; this contains the Radon–Nikodym theorem as a special case.

A Banach space <math>Y</math> is said to have the Radon–Nikodym property if the generalization of the Radon–Nikodym theorem also holds (with the necessary adjustments made) for functions with values in <math>Y</math>. All Hilbert spaces have the Radon–Nikodym property.

Formal description

Radon–Nikodym theorem

The Radon–Nikodym theorem involves a measurable space <math>(X, \Sigma)</math> on which two σ-finite measures are defined, <math>\mu</math> and <math>\nu.</math>

It states that, if <math>\nu \ll \mu</math> (that is, if <math>\nu</math> is absolutely continuous with respect to <math>\mu</math>), then there exists a <math>\Sigma</math>-measurable function <math>f : X \to [0, \infty),</math> such that for any measurable set <math>A \in \Sigma,</math>

:<math>\nu(A) = \int_A f \, d\mu.</math>

Radon–Nikodym derivative

The function <math>f</math> satisfying the above equality is uniquely defined up to a <math>\mu</math>-null set, that is, if <math>g</math> is another function which satisfies the same property, then <math>f = g</math> <math>\mu</math>-almost everywhere. The function <math>f</math> is commonly written <math>d\nu/d\mu</math> and is called the Radon–Nikodym derivative. The choice of notation and the name of the function reflect the fact that the function is analogous to a derivative in calculus in the sense that it describes the rate of change of density of one measure with respect to another (the way the Jacobian determinant is used in multivariable integration).
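On a finite set, where a measure is determined by the weights it gives to single points, the derivative can be computed directly as a ratio of weights. The sketch below assumes measures are stored as dictionaries of exact fractions; the function names `radon_nikodym`, `integrate` and `measure` are illustrative only. It checks the defining identity ν(A) = ∫_A f dμ on every subset A, and it sets f to 0 on μ-null points, which is one arbitrary choice among many, in line with uniqueness holding only μ-almost everywhere.

```python
from fractions import Fraction
from itertools import combinations


def radon_nikodym(nu, mu):
    """Return f = d(nu)/d(mu) on a finite space given point weights.

    Raises ValueError if nu is not absolutely continuous w.r.t. mu.
    """
    f = {}
    for x in set(mu) | set(nu):
        mu_x = mu.get(x, Fraction(0))
        nu_x = nu.get(x, Fraction(0))
        if mu_x == 0:
            if nu_x != 0:
                raise ValueError(f"nu is not absolutely continuous w.r.t. mu at {x!r}")
            f[x] = Fraction(0)  # arbitrary value on a mu-null point
        else:
            f[x] = nu_x / mu_x
    return f


def integrate(f, mu, A):
    """Integral of f over A with respect to mu (a finite sum)."""
    return sum((f[x] * mu.get(x, Fraction(0)) for x in A), Fraction(0))


def measure(nu, A):
    """nu(A) as the sum of point weights."""
    return sum((nu.get(x, Fraction(0)) for x in A), Fraction(0))


mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(0)}
nu = {"a": Fraction(1, 4), "b": Fraction(3, 4), "c": Fraction(0)}
f = radon_nikodym(nu, mu)

# Check nu(A) == integral of f over A for all 8 subsets of {a, b, c}.
subsets = [set(c) for r in range(4) for c in combinations(sorted(mu), r)]
identity_holds = all(integrate(f, mu, A) == measure(nu, A) for A in subsets)
print(f, identity_holds)
```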

Extension to signed or complex measures

A similar theorem can be proven for signed and complex measures: namely, that if <math>\mu</math> is a nonnegative σ-finite measure, and <math>\nu</math> is a finite-valued signed or complex measure such that <math>\nu \ll \mu,</math> that is, <math>\nu</math> is absolutely continuous with respect to <math>\mu,</math> then there is a <math>\mu</math>-integrable real- or complex-valued function <math>g</math> on <math>X</math> such that for every measurable set <math>A,</math>

:<math>\nu(A) = \int_A g \, d\mu.</math>

Examples

In the following examples, the set <math>X</math> is the real interval <math>[0,1]</math>, and <math>\Sigma</math> is the Borel sigma-algebra on <math>X</math>.

Let <math>\mu</math> be the length measure on <math>X</math>, and let <math>\nu</math> assign to each subset <math>Y</math> of <math>X</math> twice the length of <math>Y</math>. Then <math display="inline">\frac{d\nu}{d\mu} = 2</math>.
Let <math>\mu</math> be the length measure on <math>X</math>, and let <math>\nu</math> assign to each subset <math>Y</math> of <math>X</math> the number of points from the set <math>\{0.1,\dots,0.9\}</math> that are contained in <math>Y</math>. Then <math>\nu</math> is not absolutely continuous with respect to <math>\mu</math> since it assigns non-zero measure to zero-length points. Indeed, there is no derivative <math display="inline">\frac{d\nu}{d\mu}</math>: there is no finite function that, when integrated e.g. from <math>(0.1 - \varepsilon)</math> to <math>(0.1 + \varepsilon)</math>, gives <math>1</math> for all <math>\varepsilon > 0</math>.
Let <math>\mu = \nu + \delta_0</math>, where <math>\nu</math> is the length measure on <math>X</math> and <math>\delta_0</math> is the Dirac measure on 0 (it assigns a measure of 1 to any set containing 0 and a measure of 0 to any other set). Then, <math>\nu</math> is absolutely continuous with respect to <math>\mu</math>, and <math display="inline">\frac{d\nu}{d\mu} = 1_{X\setminus \{0\}}</math>: the derivative is 0 at <math>x = 0</math> and 1 at <math>x > 0</math>.

Properties

Let ν, μ, and λ be σ-finite measures on the same measurable space. If ν ≪ λ and μ ≪ λ (ν and μ are both absolutely continuous with respect to λ), then <math display="block"> \frac{d(\nu+\mu)}{d\lambda} = \frac{d\nu}{d\lambda}+\frac{d\mu}{d\lambda} \quad \lambda\text{-almost everywhere}.</math>
If ν ≪ μ ≪ λ, then <math display="block">\frac{d\nu}{d\lambda}=\frac{d\nu}{d\mu}\frac{d\mu}{d\lambda}\quad\lambda\text{-almost everywhere}.</math>
In particular, if μ ≪ ν and ν ≪ μ, then <math display="block"> \frac{d\mu}{d\nu}=\left(\frac{d\nu}{d\mu}\right)^{-1}\quad\nu\text{-almost everywhere}.</math>
If μ ≪ λ and ''g'' is a μ-integrable function, then <math display="block"> \int_X g\,d\mu = \int_X g\frac{d\mu}{d\lambda}\,d\lambda.</math>
If ν is a finite signed or complex measure, then <math display="block"> {d|\nu|\over d\mu} = \left|{d\nu\over d\mu}\right|. </math>
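Several of these identities can be checked exactly on a small finite space. The sketch below assumes three measures with strictly positive point weights on {1, 2, 3}, so that each is absolutely continuous with respect to the others and every derivative is a pointwise ratio. The weights, the test function `g` and the helper `derivative` are illustrative choices.

```python
from fractions import Fraction as F


def derivative(nu, mu):
    """Pointwise d(nu)/d(mu) on a finite space where every mu-weight is positive."""
    return {x: nu[x] / mu[x] for x in mu}


lam = {1: F(1, 2), 2: F(1, 3), 3: F(1, 6)}
mu = {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)}
nu = {1: F(1, 8), 2: F(5, 8), 3: F(1, 4)}

dnu_dlam = derivative(nu, lam)
dmu_dlam = derivative(mu, lam)
dnu_dmu = derivative(nu, mu)
dmu_dnu = derivative(mu, nu)

# Additivity: d(nu + mu)/d(lam) = d(nu)/d(lam) + d(mu)/d(lam)
total = {x: nu[x] + mu[x] for x in lam}
sum_rule = all(derivative(total, lam)[x] == dnu_dlam[x] + dmu_dlam[x] for x in lam)

# Chain rule: d(nu)/d(lam) = d(nu)/d(mu) * d(mu)/d(lam)
chain_rule = all(dnu_dlam[x] == dnu_dmu[x] * dmu_dlam[x] for x in lam)

# Reciprocal: d(mu)/d(nu) = (d(nu)/d(mu))^(-1)
reciprocal = all(dmu_dnu[x] == 1 / dnu_dmu[x] for x in lam)

# Change of measure in an integral: int g d(mu) = int g * d(mu)/d(lam) d(lam)
g = {1: F(2), 2: F(-1), 3: F(5)}
lhs = sum(g[x] * mu[x] for x in lam)
rhs = sum(g[x] * dmu_dlam[x] * lam[x] for x in lam)

print(sum_rule, chain_rule, reciprocal, lhs == rhs)
```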

Applications

Probability theory

The theorem is central to extending the ideas of probability theory from probability masses and probability densities defined over real numbers to probability measures defined over arbitrary sets. It tells whether and how it is possible to change from one probability measure to another. Specifically, the probability density function of a random variable is the Radon–Nikodym derivative of the induced measure with respect to some base measure (usually the Lebesgue measure for continuous random variables).

For example, it can be used to prove the existence of conditional expectation for probability measures. The latter itself is a key concept in probability theory, as conditional probability is just a special case of it.
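The link between conditional expectation and the Radon–Nikodym theorem can be made concrete on a finite probability space. For a nonnegative random variable X and a sub-σ-algebra generated by a partition, the set function ν(A) = ∫_A X dP is a measure on that sub-σ-algebra, absolutely continuous with respect to P, and its derivative is constant on each cell of the partition. The sketch below assumes a fair die with X the face value and the partition into odd and even faces; these choices, and the function name `conditional_expectation`, are illustrative, and every cell is assumed to have positive probability.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}   # fair die
X = {w: Fraction(w) for w in omega}      # face value
partition = [{1, 3, 5}, {2, 4, 6}]       # generates the sub-sigma-algebra G


def conditional_expectation(X, P, partition):
    """E[X | G] as the derivative of nu(A) = int_A X dP with respect to P on G."""
    cond = {}
    for cell in partition:
        p_cell = sum(P[w] for w in cell)            # P(cell), assumed > 0
        nu_cell = sum(X[w] * P[w] for w in cell)    # nu(cell)
        for w in cell:
            cond[w] = nu_cell / p_cell
    return cond


cond = conditional_expectation(X, P, partition)

# Defining property: int_A E[X|G] dP == int_A X dP for every A in G.
events = [set(), {1, 3, 5}, {2, 4, 6}, set(omega)]
defining_property = all(
    sum(cond[w] * P[w] for w in A) == sum(X[w] * P[w] for w in A)
    for A in events
)
print(cond[1], cond[2], defining_property)
```

Here the conditional expectation equals 3 on the odd faces and 4 on the even faces, the averages of X over each cell.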

Financial mathematics

Amongst other fields, financial mathematics uses the theorem extensively, in particular via the Girsanov theorem. Such changes of probability measure are the cornerstone of the rational pricing of derivatives and are used for converting actual probabilities into risk-neutral probabilities.
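A change of measure of this kind can be illustrated in a discrete setting far simpler than the one covered by the Girsanov theorem. The sketch below assumes a hypothetical one-period model with two states ("up" and "down"), made-up prices, interest rate and real-world probabilities. It computes the risk-neutral measure Q, the derivative Z = dQ/dP, and the price of a call option with strike 100 in two equivalent ways.

```python
from fractions import Fraction

# Hypothetical one-period market (all numbers are illustrative assumptions).
s0, s_up, s_down = Fraction(100), Fraction(120), Fraction(90)
r = Fraction(1, 20)                                  # one-period interest rate
P = {"up": Fraction(3, 5), "down": Fraction(2, 5)}   # real-world probabilities

# Risk-neutral probability: the discounted stock price has Q-expectation s0.
q_up = ((1 + r) * s0 - s_down) / (s_up - s_down)
Q = {"up": q_up, "down": 1 - q_up}

# Radon-Nikodym derivative dQ/dP, a ratio of point weights in each state.
Z = {state: Q[state] / P[state] for state in P}

# Call option with strike 100, priced under Q and, equivalently, under P with weight Z.
payoff = {"up": max(s_up - 100, 0), "down": max(s_down - 100, 0)}
price_under_Q = sum(Q[s] * payoff[s] for s in P) / (1 + r)
price_under_P = sum(P[s] * Z[s] * payoff[s] for s in P) / (1 + r)

print(Z, price_under_Q, price_under_P)
```

The two prices agree because an expectation under Q equals the expectation under P of the same quantity multiplied by dQ/dP.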

Information divergences

If μ and ν are measures over ''X'', and μ ≪ ν:

The Kullback–Leibler divergence from ν to μ is defined to be <math display="block"> D_\text{KL}(\mu \parallel \nu) = \int_X \log \left( \frac{d \mu}{d \nu} \right) \; d\mu.</math>
For α > 0, α ≠ 1 the Rényi divergence of order α from ν to μ is defined to be <math display="block">D_\alpha(\mu \parallel \nu) = \frac{1}{\alpha - 1} \log\left(\int_X\left(\frac{d\mu}{d\nu}\right)^{\alpha-1}\; d\mu\right).</math>
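For probability measures on a finite set, dμ/dν is the ratio of point probabilities and both integrals reduce to finite sums. The following is a minimal sketch under that assumption; the function names and the two example distributions are illustrative. Points with μ-weight zero contribute nothing to either sum, and a `ValueError` is raised when μ ≪ ν fails.

```python
import math


def kl_divergence(mu, nu):
    """D_KL(mu || nu) = sum over x of mu(x) * log(mu(x) / nu(x))."""
    total = 0.0
    for x, m in mu.items():
        if m == 0:
            continue  # mu-null points do not contribute
        n = nu.get(x, 0.0)
        if n == 0:
            raise ValueError("mu is not absolutely continuous with respect to nu")
        total += m * math.log(m / n)
    return total


def renyi_divergence(mu, nu, alpha):
    """D_alpha(mu || nu) = log(sum of mu(x) * (mu(x)/nu(x))**(alpha-1)) / (alpha-1)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = 0.0
    for x, m in mu.items():
        if m == 0:
            continue
        n = nu.get(x, 0.0)
        if n == 0:
            raise ValueError("mu is not absolutely continuous with respect to nu")
        total += m * (m / n) ** (alpha - 1)
    return math.log(total) / (alpha - 1)


mu = {"x": 0.5, "y": 0.5}
nu = {"x": 0.25, "y": 0.75}

kl = kl_divergence(mu, nu)
renyi_2 = renyi_divergence(mu, nu, 2.0)
renyi_near_1 = renyi_divergence(mu, nu, 1.0001)
print(kl, renyi_2, renyi_near_1)
```

For orders α close to 1 the Rényi divergence is numerically close to the Kullback–Leibler divergence, which the last value illustrates.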

The assumption of σ-finiteness

The Radon–Nikodym theorem above makes the assumption that the measure μ with respect to which one computes the rate of change of ν is σ-finite.

Negative example

Here is an example in which μ is not σ-finite and the Radon–Nikodym theorem fails to hold.

Consider the Borel σ-algebra on the real line. Let the counting measure, <math>\mu</math>, of a Borel set <math>A</math> be defined as the number of elements of <math>A</math> if <math>A</math> is finite, and <math>\infty</math> otherwise. One can check that <math>\mu</math> is indeed a measure. It is not <math>\sigma</math>-finite, as not every Borel set is at most a countable union of finite sets. Let <math>\nu</math> be the usual Lebesgue measure on this Borel algebra. Then, <math>\nu</math> is absolutely continuous with respect to <math>\mu</math>, since for a set <math>A</math> one has <math>\mu(A) = 0</math> only if <math>A</math> is the empty set, and then <math>\nu(A)</math> is also zero.

Assume that the Radon–Nikodym theorem holds, that is, for some measurable function <math>f</math> one has

:<math>\nu(A) = \int_A f \,d\mu</math>

for all Borel sets <math>A</math>. Taking <math>A</math> to be a singleton set, <math>A = \{a\}</math>, and using the above equality, one finds

:<math> 0 = f(a)</math>

for all real numbers <math>a</math>. This implies that the function <math>f</math>, and therefore the Lebesgue measure <math>\nu</math>, is zero, which is a contradiction.

Positive result

Assuming <math>\nu\ll\mu,</math> the Radon–Nikodym theorem also holds if <math>\mu</math> is localizable and <math>\nu</math> is accessible with respect to <math>\mu</math>, i.e., <math>\nu(A)=\sup\{\nu(B):B\in{\cal P}(A)\cap\mu^\operatorname{pre}(\R_{\ge0})\}</math> for all <math>A\in\Sigma.</math>