Multivariate random variable

In probability and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables whose values are unknown, either because the values have not yet occurred or because there is imperfect knowledge of them. The individual variables in a random vector are grouped together because they are all part of a single mathematical system; often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number.

Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc.

Formally, a multivariate random variable is a column vector <math> \mathbf{X} = (X_1,\dots,X_n)^\mathsf{T} </math> (or its transpose, which is a row vector) whose components are random variables on the probability space <math>(\Omega, \mathcal{F}, P)</math>, where <math>\Omega</math> is the sample space, <math>\mathcal{F}</math> is the sigma-algebra (the collection of all events), and <math>P</math> is the probability measure (a function returning each event's probability).

Probability distribution

Every random vector gives rise to a probability measure on <math>\mathbb{R}^n</math>, defined on the Borel sigma-algebra of <math>\mathbb{R}^n</math>. This measure is also known as the joint probability distribution, the joint distribution, or the multivariate distribution of the random vector.

The distributions of each of the component random variables <math>X_i</math> are called marginal distributions. The conditional probability distribution of <math>X_i</math> given <math>X_j</math> is the probability distribution of <math>X_i</math> when <math>X_j</math> is known to be a particular value.
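For a discrete random vector, marginal and conditional distributions can be read off the joint probability mass function directly: a marginal sums the joint pmf over the other components, and a conditional renormalises the slice where <math>X_j</math> takes the given value. The sketch below assumes a hypothetical two-component joint pmf; the helper names `marginal` and `conditional` are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint pmf of a discrete random vector (X1, X2):
# keys are outcomes (x1, x2), values are probabilities summing to 1.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def marginal(pmf, i):
    """Marginal pmf of component i: sum the joint pmf over all other components."""
    out = {}
    for outcome, p in pmf.items():
        out[outcome[i]] = out.get(outcome[i], 0) + p
    return out

def conditional(pmf, i, j, xj):
    """Conditional pmf of component i given that component j equals xj."""
    pj = marginal(pmf, j)[xj]  # P(X_j = xj), assumed to be > 0
    out = {}
    for outcome, p in pmf.items():
        if outcome[j] == xj:
            out[outcome[i]] = out.get(outcome[i], 0) + p / pj
    return out

print(marginal(joint, 0))           # {0: Fraction(1, 4), 1: Fraction(3, 4)}
print(conditional(joint, 0, 1, 1))  # {0: Fraction(1, 5), 1: Fraction(4, 5)}
```

Exact fractions are used so that the probabilities can be compared without rounding error.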

The cumulative distribution function <math>F_{\mathbf{X}} : \R^n \mapsto [0,1]</math> of a random vector <math>\mathbf{X}=(X_1,\dots,X_n)^\mathsf{T} </math> is defined as

:<math>F_{\mathbf{X}}(\mathbf{x}) = P(X_1 \leq x_1, \ldots, X_n \leq x_n)</math>

where <math>\mathbf{x} = (x_1, \dots, x_n)^\mathsf{T}</math>.
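The joint CDF <math>F_{\mathbf{X}}(\mathbf{x}) = P(X_1 \leq x_1, \ldots, X_n \leq x_n)</math> requires every component to lie at or below its threshold simultaneously. A minimal Monte Carlo sketch, assuming a hypothetical random vector with two independent Uniform(0, 1) components (whose exact joint CDF on the unit square is <math>x_1 x_2</math>), estimates it as the fraction of samples meeting all constraints; `empirical_cdf` is an illustrative name.

```python
import random

def empirical_cdf(samples, x):
    """Estimate F_X(x) = P(X_1 <= x_1, ..., X_n <= x_n) as the fraction
    of sample vectors whose every component is <= the matching x_i."""
    hits = sum(all(si <= xi for si, xi in zip(s, x)) for s in samples)
    return hits / len(samples)

# Hypothetical example: X = (U1, U2) with independent Uniform(0, 1) components,
# for which the exact joint CDF on [0, 1]^2 is x1 * x2.
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(100_000)]
print(empirical_cdf(samples, (0.5, 0.5)))  # close to 0.25
```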

Operations on random vectors

Random vectors can be subjected to the same kinds of algebraic operations as can non-random vectors: addition, subtraction, multiplication by a scalar, and the taking of inner products.

Affine transformations

Similarly, a new random vector <math>\mathbf{Y}</math> can be defined by applying an affine transformation <math>g\colon \mathbb{R}^n \to \mathbb{R}^n</math> to a random vector <math>\mathbf{X}</math>:

:<math>\mathbf{Y}=\mathbf{A}\mathbf{X}+b</math>, where <math>\mathbf{A}</math> is an <math>n \times n</math> matrix and <math>b</math> is an <math>n \times 1</math> column vector.

If <math>\mathbf{A}</math> is an invertible matrix and <math>\mathbf{X}</math> has a probability density function <math>f_{\mathbf{X}}</math>, then the probability density of <math>\mathbf{Y}</math> is

:<math>f_{\mathbf{Y}}(\mathbf{y})=\frac{f_{\mathbf{X}}(\mathbf{A}^{-1}(\mathbf{y}-b))}{|\det\mathbf{A}|}</math>.
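The affine change-of-density formula can be checked numerically. The sketch below assumes, for illustration only, that <math>\mathbf{X}</math> is a standard bivariate normal vector; then <math>\mathbf{Y}=\mathbf{A}\mathbf{X}+b</math> is Gaussian with mean <math>b</math> and covariance <math>\mathbf{A}\mathbf{A}^T</math>, so the formula's output can be compared with that known density. The matrix, offset and evaluation point are hypothetical values.

```python
import math

def f_X(x):
    """Density of a standard bivariate normal vector X (illustrative choice)."""
    return math.exp(-(x[0] ** 2 + x[1] ** 2) / 2) / (2 * math.pi)

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def f_Y(y, A, b):
    """Density of Y = A X + b via f_X(A^{-1}(y - b)) / |det A|."""
    x = matvec(inv2(A), [y[0] - b[0], y[1] - b[1]])
    return f_X(x) / abs(det2(A))

def gaussian_density(y, mean, S):
    """Bivariate normal density with the given mean and covariance S (cross-check)."""
    d = [y[0] - mean[0], y[1] - mean[1]]
    Si = inv2(S)
    q = sum(d[i] * Si[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det2(S)))

A = [[2.0, 1.0], [0.5, 3.0]]
b = [1.0, -1.0]
S = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]  # A A^T
y = [0.3, 0.7]
print(f_Y(y, A, b), gaussian_density(y, b, S))  # the two values agree
```

The agreement reflects that <math>(\mathbf{y}-b)^T(\mathbf{A}\mathbf{A}^T)^{-1}(\mathbf{y}-b) = |\mathbf{A}^{-1}(\mathbf{y}-b)|^2</math> and <math>\sqrt{\det(\mathbf{A}\mathbf{A}^T)} = |\det\mathbf{A}|</math>.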

Invertible mappings

More generally, the same approach extends to invertible mappings of random vectors that need not be affine.

Let <math>g</math> be a one-to-one mapping from an open subset <math>\mathcal{D}</math> of <math>\mathbb{R}^n</math> onto a subset <math>\mathcal{R}</math> of <math>\mathbb{R}^n</math>, let <math>g</math> have continuous partial derivatives in <math>\mathcal{D}</math>, and let the Jacobian determinant <math>\det\left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)</math> of <math>g</math> be nonzero at every point of <math>\mathcal{D}</math>. Assume that the real random vector <math>\mathbf{X}</math> has a probability density function <math>f_{\mathbf{X}}(\mathbf{x})</math> and satisfies <math>P(\mathbf{X} \in \mathcal{D}) = 1</math>. Then the random vector <math>\mathbf{Y}=g(\mathbf{X})</math> has probability density

:<math>\left. f_{\mathbf{Y}}(\mathbf{y})=\frac{f_{\mathbf{X}}(\mathbf{x})}{\left|\det\left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)\right|} \right|_{\mathbf{x}=g^{-1}(\mathbf{y})} \mathbf{1}(\mathbf{y} \in R_\mathbf{Y})</math>

where <math>\mathbf{1}</math> denotes the indicator function and the set <math>R_\mathbf{Y} = \{ \mathbf{y} = g(\mathbf{x}): f_{\mathbf{X}}(\mathbf{x}) > 0 \} \subseteq \mathcal{R} </math> denotes the support of <math>\mathbf{Y}</math>.
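As an illustration of the general formula, take the hypothetical mapping <math>g(\mathbf{x}) = (e^{x_1}, e^{x_2})</math> applied to a standard bivariate normal <math>\mathbf{X}</math> (both choices are assumptions for this sketch). Here <math>g</math> maps <math>\mathbb{R}^2</math> one-to-one onto <math>(0,\infty)^2</math>, <math>g^{-1}(\mathbf{y}) = (\ln y_1, \ln y_2)</math>, and the Jacobian determinant is <math>e^{x_1}e^{x_2} = y_1 y_2</math>, which never vanishes. The resulting density should equal the product of two standard lognormal densities, which the code uses as a cross-check.

```python
import math

def f_X(x):
    """Standard bivariate normal density (illustrative choice of X)."""
    return math.exp(-(x[0] ** 2 + x[1] ** 2) / 2) / (2 * math.pi)

def f_Y(y):
    """Density of Y = g(X), g(x) = (exp(x1), exp(x2)), by the change-of-variables formula.

    g^{-1}(y) = (log y1, log y2) and det(dy/dx) = exp(x1) * exp(x2) = y1 * y2.
    """
    if y[0] <= 0 or y[1] <= 0:  # indicator 1(y in R_Y): zero outside the support
        return 0.0
    x = (math.log(y[0]), math.log(y[1]))  # x = g^{-1}(y)
    jac = y[0] * y[1]                      # |det(dy/dx)| evaluated at x = g^{-1}(y)
    return f_X(x) / jac

def lognormal_pdf(t):
    """Standard lognormal density, used only to cross-check f_Y."""
    return math.exp(-math.log(t) ** 2 / 2) / (t * math.sqrt(2 * math.pi))

y = (0.8, 2.5)
print(f_Y(y), lognormal_pdf(y[0]) * lognormal_pdf(y[1]))  # the two values agree
print(f_Y((-1.0, 2.0)))  # 0.0, outside the support
```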

Expected value

The expected value or mean of a random vector <math>\mathbf{X}</math> is a fixed vector <math>\operatorname{E}[\mathbf{X}]</math> whose elements are the expected values of the respective random variables.
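Because the mean is taken component by component, it can be computed for a discrete random vector by weighting each outcome by its probability and summing each coordinate separately. The distribution below is hypothetical, and `mean_vector` is an illustrative helper rather than a library function.

```python
from fractions import Fraction

def mean_vector(pmf):
    """E[X] for a discrete random vector given as {outcome tuple: probability}.
    Element i is the expected value of component X_i."""
    n = len(next(iter(pmf)))
    return tuple(sum(p * outcome[i] for outcome, p in pmf.items()) for i in range(n))

# Hypothetical distribution of X = (X1, X2, X3) with three equally likely outcomes.
pmf = {(1, 0, 2): Fraction(1, 3), (3, 1, 2): Fraction(1, 3), (2, 5, -1): Fraction(1, 3)}
print(mean_vector(pmf))  # (Fraction(2, 1), Fraction(2, 1), Fraction(1, 1))
```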

Covariance and cross-covariance

Definitions

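The covariance matrix (also called second central moment or variance-covariance matrix) of an <math>n \times 1</math> random vector is an <math>n \times n</math> matrix whose (i,j)-th element is the covariance between the i-th and the j-th random variables. The covariance matrix is the expected value, element by element, of the <math>n \times n</math> matrix computed as <math>[\mathbf{X}-\operatorname{E}[\mathbf{X}]] [\mathbf{X}-\operatorname{E}[\mathbf{X}]]^T</math>, where the superscript T refers to the transpose of the indicated vector:

:<math>K_{\mathbf{X}\mathbf{X}} = \operatorname{Var}[\mathbf{X}] = \operatorname{E}[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{X}-\operatorname{E}[\mathbf{X}])^{T}] = \operatorname{E}[\mathbf{X}\mathbf{X}^{T}] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{T}</math>

Expectation of a quadratic form

For a non-random <math>n \times n</math> matrix <math>A</math>, the expectation of the quadratic form <math>\mathbf{X}^{T}A\mathbf{X}</math> can be expressed through the mean and the covariance matrix of <math>\mathbf{X}</math>:

:<math>\operatorname{E}[\mathbf{X}^{T}A\mathbf{X}] = \operatorname{E}[\mathbf{X}]^{T}A\operatorname{E}[\mathbf{X}] + \operatorname{tr}(A K_{\mathbf{X}\mathbf{X}}),</math>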

where <math>K_{\mathbf{X}\mathbf{X}}</math> is the covariance matrix of <math>\mathbf{X}</math> and <math>\operatorname{tr}</math> refers to the trace of a matrix, that is, to the sum of the elements on its main diagonal (from upper left to lower right). Since the quadratic form is a scalar, so is its expectation.
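The identity <math>\operatorname{E}[\mathbf{X}^{T}A\mathbf{X}] = \operatorname{E}[\mathbf{X}]^{T}A\operatorname{E}[\mathbf{X}] + \operatorname{tr}(A K_{\mathbf{X}\mathbf{X}})</math> can be verified exactly on a small discrete example. The sketch assumes a hypothetical three-point distribution in <math>\mathbb{R}^2</math> and a non-symmetric matrix <math>A</math>; it computes the covariance matrix element by element from its definition, then compares both sides using exact fractions.

```python
from fractions import Fraction

# Hypothetical discrete random vector X in R^2: {outcome: probability}.
pmf = {(1, 2): Fraction(1, 2), (-1, 0): Fraction(1, 4), (3, -2): Fraction(1, 4)}
A = [[2, 1], [0, 3]]  # any fixed (non-random) 2x2 matrix; need not be symmetric
n = 2

def E(f):
    """Expectation of f(X) under the discrete distribution above."""
    return sum(p * f(x) for x, p in pmf.items())

def quad(x, M):
    """Quadratic form x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

mu = [E(lambda x, i=i: x[i]) for i in range(n)]
# Covariance matrix K_XX = E[(X - mu)(X - mu)^T], element by element.
K = [[E(lambda x, i=i, j=j: (x[i] - mu[i]) * (x[j] - mu[j])) for j in range(n)]
     for i in range(n)]
trace_AK = sum(A[i][k] * K[k][i] for i in range(n) for k in range(n))

lhs = E(lambda x: quad(x, A))
rhs = quad(mu, A) + trace_AK
print(lhs, rhs)  # 29/2 29/2
```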

Proof: Let <math>\mathbf{z}</math> be an <math>m \times 1</math> random vector with <math>\operatorname{E}[\mathbf{z}] = \mu</math> and <math>\operatorname{Cov}[\mathbf{z}]= V</math> and let <math>A</math> be an <math>m \times m</math> non-stochastic matrix.

Then, applying the formula for the cross-covariance with <math>\mathbf{X} = \mathbf{z}^T</math> and <math>\mathbf{Y} = \mathbf{z}^T A^T</math>, we see that:

:<math>\operatorname{Cov}[\mathbf{X},\mathbf{Y}] = \operatorname{E}[\mathbf{X}\mathbf{Y}^T]-\operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^T </math>

Hence

:<math>\begin{align}

\operatorname{E}[XY^T] &= \operatorname{Cov}[X,Y]+\operatorname{E}[X]\operatorname{E}[Y]^T \\

\operatorname{E}[z^T Az] &= \operatorname{Cov}[z^T,z^T A^T] + \operatorname{E}[z^T]\operatorname{E}[z^T A^T ]^T \\

&=\operatorname{Cov}[z^T , z^T A^T] + \mu^T (\mu^T A^T)^T \\

&=\operatorname{Cov}[z^T , z^T A^T] + \mu^T A \mu ,

\end{align}</math>

which leaves us to show that

:<math>\operatorname{Cov}[z^T , z^T A^T ]=\operatorname{tr}(AV).</math>

This follows from the fact that the trace of a product of matrices is unchanged by a cyclic permutation of the factors (e.g. <math>\operatorname{tr}(AB) = \operatorname{tr}(BA)</math>).

We see that

:<math>\begin{align}

\operatorname{Cov}[z^T,z^T A^T] &= \operatorname{E} \left[\left(z^T - E(z^T) \right)\left(z^T A^T - E\left(z^T A^T \right) \right)^T \right] \\

&= \operatorname{E} \left[ (z^T - \mu^T) (z^T A^T - \mu^T A^T )^T \right]\\

&= \operatorname{E} \left[ (z - \mu)^T (Az - A\mu) \right].

\end{align}</math>

And since

:<math>\left( {z - \mu } \right)^T \left( {Az - A\mu } \right)</math>

is a scalar, then

:<math>(z - \mu)^T ( Az - A\mu)= \operatorname{tr}\left( {(z - \mu )^T (Az - A\mu )} \right) = \operatorname{tr} \left((z - \mu )^T A(z - \mu ) \right)</math>

trivially. Using the cyclic permutation property of the trace, we get:

:<math>\operatorname{tr}\left( {(z - \mu )^T A(z - \mu )} \right) = \operatorname{tr}\left( {A(z - \mu )(z - \mu )^T} \right),</math>

and substituting this into the expression for <math>\operatorname{Cov}[z^T, z^T A^T]</math> derived above gives:

:<math>\begin{align}

\operatorname{Cov} \left[ {z^T,z^T A^T} \right] &= E\left[ {\left( {z - \mu } \right)^T (Az - A\mu)} \right] \\

&= E \left[ \operatorname{tr}\left( A(z - \mu )(z - \mu )^T \right) \right] \\

&= \operatorname{tr} \left( {A \cdot \operatorname{E} \left((z - \mu )(z - \mu )^T \right) } \right) \\

&= \operatorname{tr} (A V).

\end{align}</math>
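The proof rests on two deterministic facts: the trace is invariant under cyclic permutation, even when the factors are not square, and a scalar <math>(z-\mu)^T A (z-\mu)</math> equals <math>\operatorname{tr}(A(z-\mu)(z-\mu)^T)</math>. A small plain-Python sketch checks both on hypothetical matrices; `matmul`, `trace` and `transpose` are illustrative helpers, and `d` stands for one assumed realization of <math>z-\mu</math>.

```python
from fractions import Fraction

def matmul(B, C):
    """Product of matrices given as lists of rows."""
    return [[sum(B[i][k] * C[k][j] for k in range(len(C))) for j in range(len(C[0]))]
            for i in range(len(B))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    return [list(row) for row in zip(*M)]

# tr(BC) = tr(CB) even though BC is 2x2 and CB is 3x3.
B = [[1, 2, 3], [4, 5, 6]]
C = [[7, 8], [9, 10], [11, 12]]
print(trace(matmul(B, C)), trace(matmul(C, B)))  # 212 212

# The step used in the proof, with d playing the role of z - mu (a 3x1 column).
A = [[2, 1, 0], [0, 3, -1], [4, 0, 1]]
d = [[1], [-2], [Fraction(1, 2)]]
scalar = matmul(matmul(transpose(d), A), d)[0][0]       # (z - mu)^T A (z - mu)
print(scalar, trace(matmul(A, matmul(d, transpose(d)))))  # 61/4 61/4
```

Taking the expectation of the right-hand side and moving it inside the trace, as in the last derivation, then yields <math>\operatorname{tr}(AV)</math>.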

Expectation of the product of two different quadratic forms

One can take the expectation of the product of two different quadratic forms in a zero-mean Gaussian random vector <math>\mathbf{X}</math> as follows: