Trace (linear algebra)

In linear algebra, the trace of a square matrix <math>\mathbf{A}</math>, denoted <math>\operatorname{tr}(\mathbf{A})</math>, is defined as the sum of the elements on its main diagonal, <math>a_{11} + a_{22} + \dots + a_{nn}</math>. It is only defined for a square matrix (<math>n \times n</math>).

It can be shown that the trace of a matrix is equal to the sum of its eigenvalues (counted with algebraic multiplicities); see below. Also, <math>\operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A})</math> for any matrices <math>\mathbf{A}</math> and <math>\mathbf{B}</math> of the same size. Thus, similar matrices have the same trace. As a consequence, one can define the trace of a linear operator mapping a finite-dimensional vector space into itself, since all matrices describing such an operator with respect to a basis are similar.

The trace is related to the derivative of the determinant (see Jacobi's formula).

Definition

The trace of an <math>n \times n</math> square matrix <math>\mathbf{A}</math> is defined as

<math display="block">\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii} = a_{11} + a_{22} + \dots + a_{nn}</math>

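where <math>a_{ii}</math> denotes the entry on the <math>i</math>th row and <math>i</math>th column of <math>\mathbf{A}</math>. The entries of <math>\mathbf{A}</math> can be real numbers, complex numbers, or more generally elements of a field <math>F</math>. The trace is not defined for non-square matrices.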

Example

Let <math>\mathbf{A}</math> be a <math>3 \times 3</math> matrix, with

<math display="block">\mathbf{A} =
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 & 3 \\
11 & 5 & 2 \\
6 & 12 & -5
\end{pmatrix}
</math>

Then

<math display="block">\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{3} a_{ii} = a_{11} + a_{22} + a_{33} = 1 + 5 + (-5) = 1.</math>
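The definition translates directly into a few lines of code. The following is a minimal Python sketch using plain nested lists; the function name `trace` is an illustrative choice rather than a reference to any particular library, and the matrix is the one from the example above.

```python
def trace(matrix):
    """Return the sum of the main-diagonal entries of a square matrix."""
    n = len(matrix)
    # The trace is only defined for square matrices.
    if any(len(row) != n for row in matrix):
        raise ValueError("trace is only defined for square matrices")
    return sum(matrix[i][i] for i in range(n))


A = [[1, 0, 3],
     [11, 5, 2],
     [6, 12, -5]]
print(trace(A))  # 1 + 5 + (-5) = 1
```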

Properties

Basic properties

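The trace is a linear mapping. That is, <math>\operatorname{tr}(\mathbf{A} + \mathbf{B}) = \operatorname{tr}(\mathbf{A}) + \operatorname{tr}(\mathbf{B})</math> and <math>\operatorname{tr}(c\mathbf{A}) = c \operatorname{tr}(\mathbf{A})</math> for all square matrices <math>\mathbf{A}</math> and <math>\mathbf{B}</math> of the same size and all scalars <math>c</math>.

If <math>\mathbf{A}</math> and <math>\mathbf{B}</math> are two real <math>m \times n</math> matrices, then

<math display="block">\operatorname{tr}\left(\mathbf{A}^\mathsf{T} \mathbf{B}\right) = \operatorname{tr}\left(\mathbf{B}^\mathsf{T} \mathbf{A}\right) = \sum_{i=1}^m \sum_{j=1}^n a_{ij} b_{ij}.</math>

In particular, <math>\operatorname{tr}\left(\mathbf{A}^\mathsf{T} \mathbf{A}\right)</math> is a sum of squares, so it is nonnegative and equals zero only if <math>\mathbf{A}</math> is zero. Furthermore, as noted in the above formula, <math>\operatorname{tr}\left(\mathbf{A}^\mathsf{T} \mathbf{B}\right) = \operatorname{tr}\left(\mathbf{B}^\mathsf{T} \mathbf{A}\right)</math>. These demonstrate the positive-definiteness and symmetry required of an inner product; it is common to call <math>\operatorname{tr}\left(\mathbf{A}^\mathsf{T} \mathbf{B}\right)</math> the Frobenius inner product of <math>\mathbf{A}</math> and <math>\mathbf{B}</math>. This is a natural inner product on the vector space of all real matrices of fixed dimensions. The norm derived from this inner product is called the Frobenius norm, and it satisfies a submultiplicative property, as can be proven with the Cauchy–Schwarz inequality: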

<math display="block">0 \leq \left[\operatorname{tr}(\mathbf{A} \mathbf{B})\right]^2 \leq \operatorname{tr}\left(\mathbf{A}^\mathsf{T} \mathbf{A}\right) \operatorname{tr}\left(\mathbf{B}^\mathsf{T} \mathbf{B}\right) ,</math>

if <math>\mathbf{A}</math> and <math>\mathbf{B}</math> are real matrices such that <math>\mathbf{A}\mathbf{B}</math> is a square matrix. The Frobenius inner product and norm arise frequently in matrix calculus and statistics.

The Frobenius inner product may be extended to a Hermitian inner product on the complex vector space of all complex matrices of a fixed size, by replacing <math>\mathbf{B}</math> by its complex conjugate.
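The Frobenius inner product and the Cauchy–Schwarz bound can be checked numerically. The sketch below is a minimal illustration in plain Python, assuming real matrices stored as nested lists; the helper names (`transpose`, `matmul`, `trace`, `frobenius_inner`) and the sample matrices are introduced here for illustration only. It computes <math>\operatorname{tr}(\mathbf{X}^\mathsf{T}\mathbf{Y})</math>, compares it with the sum of entrywise products, and then evaluates both sides of the inequality.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as nested lists."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product; assumes the inner dimensions agree."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def frobenius_inner(x, y):
    """tr(X^T Y) for two real matrices of the same shape."""
    return trace(matmul(transpose(x), y))


X = [[1, 2, 0], [3, -1, 4]]
Y = [[2, 0, -3], [1, 5, 1]]

inner = frobenius_inner(X, Y)
entrywise = sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Cauchy-Schwarz with A = X (2 x 3) and B = Y^T (3 x 2), so AB is square.
A, B = X, transpose(Y)
lhs = trace(matmul(A, B)) ** 2
rhs = trace(matmul(transpose(A), A)) * trace(matmul(transpose(B), B))
print(inner, entrywise, lhs, rhs)
```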

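The symmetry of the Frobenius inner product may be phrased more directly as follows: the matrices in the trace of a product can be switched without changing the result. If <math>\mathbf{A}</math> and <math>\mathbf{B}</math> are <math>m \times n</math> and <math>n \times m</math> real or complex matrices, respectively, then

<math display="block">\operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A}).</math>

This is notable both for the fact that <math>\mathbf{A}\mathbf{B}</math> does not usually equal <math>\mathbf{B}\mathbf{A}</math>, and also since the trace of either does not usually equal <math>\operatorname{tr}(\mathbf{A})\operatorname{tr}(\mathbf{B})</math>. The similarity-invariance of the trace, meaning that <math>\operatorname{tr}(\mathbf{A}) = \operatorname{tr}\left(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\right)</math> for any square matrix <math>\mathbf{A}</math> and any invertible matrix <math>\mathbf{P}</math> of the same dimensions, is a fundamental consequence. This is proved by

<math display="block">\operatorname{tr}\left(\mathbf{P}^{-1}(\mathbf{A}\mathbf{P})\right) = \operatorname{tr}\left((\mathbf{A} \mathbf{P})\mathbf{P}^{-1}\right) = \operatorname{tr}(\mathbf{A}).</math>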

Similarity invariance is the crucial property of the trace in order to discuss traces of linear transformations as below.
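Both facts are easy to check on small integer matrices. The following Python sketch (helper names and sample matrices are illustrative assumptions) multiplies a 2 × 3 matrix by a 3 × 2 matrix in both orders, so the two products do not even have the same shape, and then conjugates a matrix by an invertible matrix whose inverse is written out by hand.

```python
def matmul(a, b):
    """Plain matrix product; assumes the inner dimensions agree."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [0, 1], [2, -1]]     # 3 x 2
AB = matmul(A, B)                 # 2 x 2
BA = matmul(B, A)                 # 3 x 3
print(trace(AB), trace(BA))       # equal, although AB and BA differ

M = [[2, 1], [0, 3]]
P = [[1, 1], [0, 1]]
P_inv = [[1, -1], [0, 1]]         # inverse of P, entered by hand
similar = matmul(P_inv, matmul(M, P))
print(trace(similar), trace(M))   # equal by similarity invariance
```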

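Additionally, for real column vectors <math>\mathbf{a}\in\mathbb{R}^n</math> and <math>\mathbf{b}\in\mathbb{R}^n</math>, the trace of the outer product is equivalent to the inner product:

<math display="block">\operatorname{tr}\left(\mathbf{b}\mathbf{a}^\mathsf{T}\right) = \mathbf{a}^\mathsf{T}\mathbf{b}</math>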

Cyclic property

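More generally, the trace is invariant under circular shifts, that is,

<math display="block">\operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}\mathbf{D}) = \operatorname{tr}(\mathbf{B}\mathbf{C}\mathbf{D}\mathbf{A}) = \operatorname{tr}(\mathbf{C}\mathbf{D}\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{D}\mathbf{A}\mathbf{B}\mathbf{C}).</math>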

This is known as the cyclic property.

Arbitrary permutations are not allowed: in general,

<math display="block">\operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) \ne \operatorname{tr}(\mathbf{A}\mathbf{C}\mathbf{B}) .</math>

However, if products of three symmetric matrices are considered, any permutation is allowed, since:

<math display="block">\operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) = \operatorname{tr}\left(\left(\mathbf{A}\mathbf{B}\mathbf{C}\right)^{\mathsf T}\right) = \operatorname{tr}(\mathbf{C}\mathbf{B}\mathbf{A}) = \operatorname{tr}(\mathbf{A}\mathbf{C}\mathbf{B}),</math>

where the first equality is because the traces of a matrix and its transpose are equal. Note that this is not true in general for more than three factors.
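A small concrete case shows both halves of the statement: cyclic shifts preserve the trace, while a non-cyclic permutation can change it. The Python sketch below uses three 2 × 2 matrices chosen here for illustration (they are not taken from the text), with hypothetical helper names.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def trace_of_product(*factors):
    """Trace of the product of the given square matrices, left to right."""
    product = factors[0]
    for factor in factors[1:]:
        product = matmul(product, factor)
    return trace(product)


A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
C = [[1, 0], [0, 0]]

cyclic = [trace_of_product(A, B, C),
          trace_of_product(B, C, A),
          trace_of_product(C, A, B)]
swapped = trace_of_product(A, C, B)   # not a cyclic shift of (A, B, C)
print(cyclic, swapped)
```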

Trace of a Kronecker product

The trace of the Kronecker product of two matrices is the product of their traces:

<math display="block">\operatorname{tr}(\mathbf{A} \otimes \mathbf{B}) = \operatorname{tr}(\mathbf{A})\operatorname{tr}(\mathbf{B}).</math>
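The identity holds because the diagonal of the Kronecker product consists of all products <math>a_{ii} b_{jj}</math>. A minimal Python sketch, with a hand-written `kron` helper (an illustrative name, not a library call) and arbitrary sample matrices:

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


A = [[1, 2], [3, 4]]                     # trace 5
B = [[0, 1, 2], [1, 1, 0], [5, 0, 2]]    # trace 3
K = kron(A, B)                           # 6 x 6
print(trace(K), trace(A) * trace(B))
```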

Characterization of the trace

The following three properties:

<math display="block">\begin{align}

\operatorname{tr}(\mathbf{A} + \mathbf{B}) &= \operatorname{tr}(\mathbf{A}) + \operatorname{tr}(\mathbf{B}), \\

\operatorname{tr}(c\mathbf{A}) &= c \operatorname{tr}(\mathbf{A}), \\

\operatorname{tr}(\mathbf{A}\mathbf{B}) &= \operatorname{tr}(\mathbf{B}\mathbf{A}),

\end{align}</math>

characterize the trace up to a scalar multiple; in other words: If <math>f</math> is a linear functional on the space of square matrices that satisfies <math>f(xy) = f(yx),</math> then <math>f</math> and <math>\operatorname{tr}</math> are proportional.

For <math>n\times n</math> matrices, imposing the normalization <math>f(\mathbf{I}) = n</math> makes <math>f</math> equal to the trace.

Trace as the sum of eigenvalues

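Given any <math>n \times n</math> matrix <math>\mathbf{A}</math>, there is

<math display="block">\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^n \lambda_i</math>

where <math>\lambda_1, \ldots, \lambda_n</math> are the eigenvalues of <math>\mathbf{A}</math> counted with algebraic multiplicity. This holds true even if <math>\mathbf{A}</math> is a real matrix and some (or all) of the eigenvalues are complex numbers, or more generally over any field with eigenvalues taken in an algebraic closure. The identity follows from the fact that <math>\mathbf{A}</math> is always similar (over an algebraic closure) to its Jordan form, an upper triangular matrix having <math>\lambda_1, \ldots, \lambda_n</math> on the main diagonal, together with the similarity-invariance of the trace discussed above. In contrast, the determinant of <math>\mathbf{A}</math> is the product of its eigenvalues; that is,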

<math display="block">\det(\mathbf{A}) = \prod_i \lambda_i.</math>
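For a 2 × 2 matrix the eigenvalues are the roots of <math>\lambda^2 - \operatorname{tr}(\mathbf{A})\lambda + \det(\mathbf{A}) = 0</math>, so both identities can be checked with the quadratic formula. The sketch below is an illustration in Python under that 2 × 2 assumption; `eigenvalues_2x2` is a hypothetical helper, and the rotation-like sample matrix is chosen because it is real with non-real eigenvalues.

```python
import cmath


def eigenvalues_2x2(m):
    """Roots of the characteristic polynomial of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)   # complex square root
    return (tr + disc) / 2, (tr - disc) / 2


R = [[0, -1], [1, 0]]            # real matrix, eigenvalues i and -i
S = [[2, 1], [1, 2]]             # real symmetric, eigenvalues 3 and 1

for m in (R, S):
    l1, l2 = eigenvalues_2x2(m)
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    print(l1 + l2, tr, l1 * l2, det)
```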

Trace of commutator

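When both <math>\mathbf{A}</math> and <math>\mathbf{B}</math> are <math>n \times n</math> matrices, the trace of the (ring-theoretic) commutator of <math>\mathbf{A}</math> and <math>\mathbf{B}</math> vanishes: <math>\operatorname{tr}([\mathbf{A}, \mathbf{B}]) = 0</math>, because <math>\operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A})</math> and <math>\operatorname{tr}</math> is linear. One can state this as "the trace is a map of Lie algebras from operators to scalars", as the commutator of scalars is trivial (it is an Abelian Lie algebra). In particular, using similarity invariance, it follows that the identity matrix is never similar to the commutator of any pair of matrices.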

Conversely, any square matrix with zero trace is a linear combination of the commutators of pairs of matrices. Moreover, any square matrix with zero trace is unitarily equivalent to a square matrix with diagonal consisting of all zeros.

Traces of special kinds of matrices

Relationship to the characteristic polynomial

The trace of an <math>n \times n</math> matrix <math>A</math> is the coefficient of <math>t^{n-1}</math> in the characteristic polynomial, up to a sign that depends on the convention used to define the characteristic polynomial.
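As a worked illustration, assume the convention <math>p(t) = \det(t\mathbf{I} - A)</math> (the text above does not fix one) and write the entries of a <math>2 \times 2</math> matrix as <math>a_{ij}</math>. Then

<math display="block">\det\begin{pmatrix} t - a_{11} & -a_{12} \\ -a_{21} & t - a_{22} \end{pmatrix} = t^2 - (a_{11} + a_{22})\,t + (a_{11}a_{22} - a_{12}a_{21}) = t^2 - \operatorname{tr}(A)\,t + \det(A),</math>

so the coefficient of <math>t^{n-1}</math> is <math>-\operatorname{tr}(A)</math>. Under the other common convention, <math>\det(A - t\mathbf{I}) = (-1)^n \det(t\mathbf{I} - A)</math>, the same coefficient becomes <math>(-1)^{n-1}\operatorname{tr}(A)</math>.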

Derivative relationships

If <math>\mathbf{a}</math> is a square matrix with small entries and <math>\mathbf{I}</math> denotes the identity matrix, then we have approximately

<math display="block">\det(\mathbf{I}+\mathbf{a})\approx 1 + \operatorname{tr}(\mathbf{a}).</math>

Precisely this means that the trace is the derivative of the determinant function at the identity matrix. Jacobi's formula

<math display="block">d\det(\mathbf{A}) = \operatorname{tr} \big(\operatorname{adj}(\mathbf{A})\cdot d\mathbf{A}\big)</math>

is more general and describes the differential of the determinant at an arbitrary square matrix, in terms of the trace and the adjugate of the matrix.

From this (or from the connection between the trace and the eigenvalues), one can derive a relation between the trace function, the matrix exponential function, and the determinant:<math display="block">\det(\exp(\mathbf{A})) = \exp(\operatorname{tr}(\mathbf{A})).</math>
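Both relations can be checked numerically on a small example. The following Python sketch is an illustration only: it approximates the matrix exponential by a truncated Taylor series (adequate for the small sample matrix used here, not a general-purpose method), and the helper names `mat_exp` and `det2` are assumptions of this sketch.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def mat_exp(a, terms=30):
    """Truncated Taylor series I + A + A^2/2! + ... (fine for small A)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[r + t for r, t in zip(row_r, row_t)]
                  for row_r, row_t in zip(result, term)]
    return result


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


A = [[0.5, 1.0], [-0.25, 0.2]]
trace_A = A[0][0] + A[1][1]

# det(exp(A)) versus exp(tr(A))
exp_gap = abs(det2(mat_exp(A)) - math.exp(trace_A))

# det(I + eps*A) versus 1 + eps*tr(A); the gap is of order eps^2
eps = 1e-3
I_plus = [[1 + eps * A[0][0], eps * A[0][1]],
          [eps * A[1][0], 1 + eps * A[1][1]]]
first_order_gap = abs(det2(I_plus) - (1 + eps * trace_A))
print(exp_gap, first_order_gap)
```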

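A related characterization of the trace applies to linear vector fields. Given an <math>n \times n</math> matrix <math>\mathbf{A}</math>, define a vector field <math>\mathbf{F}</math> on <math>\mathbb{R}^n</math> by <math>\mathbf{F}(\mathbf{x}) = \mathbf{A}\mathbf{x}</math>. The components of this vector field are linear functions (given by the rows of <math>\mathbf{A}</math>). Its divergence <math>\operatorname{div} \mathbf{F}</math> is a constant function, whose value is equal to <math>\operatorname{tr}(\mathbf{A})</math>.

By the divergence theorem, one can interpret this in terms of flows: if <math>\mathbf{F}(\mathbf{x})</math> represents the velocity of a fluid at location <math>\mathbf{x}</math> and <math>U</math> is a region in <math>\mathbb{R}^n</math>, the net flow of the fluid out of <math>U</math> is given by <math>\operatorname{tr}(\mathbf{A}) \cdot \operatorname{vol}(U)</math>, where <math>\operatorname{vol}(U)</math> is the volume of <math>U</math>.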
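The divergence computation can be written out explicitly. Writing <math>\mathbf{F}(\mathbf{x}) = \mathbf{A}\mathbf{x}</math> for the vector field and <math>U</math> for the region (notation assumed here for the illustration), the <math>i</math>th component is <math>F_i(\mathbf{x}) = \sum_{j} a_{ij} x_j</math>, so

<math display="block">\operatorname{div} \mathbf{F} = \sum_{i=1}^n \frac{\partial F_i}{\partial x_i} = \sum_{i=1}^n a_{ii} = \operatorname{tr}(\mathbf{A}), \qquad \int_U \operatorname{div} \mathbf{F} \, dV = \operatorname{tr}(\mathbf{A}) \operatorname{vol}(U).</math>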

The trace is a linear operator, hence it commutes with the derivative:

<math display="block">d \operatorname{tr} (\mathbf{X}) = \operatorname{tr}(d\mathbf{X}) .</math>

Trace of a linear operator

In general, given some linear map <math>f : V \to V</math> of finite rank (where <math>V</math> is a vector space), we can define the trace of this map by considering the trace of a matrix representation of <math>f</math>, that is, choosing a basis for <math>V</math> and describing <math>f</math> as a matrix relative to this basis, and taking the trace of this square matrix. The result will not depend on the basis chosen, since different bases will give rise to similar matrices, allowing for the possibility of a basis-independent definition for the trace of a linear map.

Such a definition can be given using the canonical isomorphism between the space of linear endomorphisms of <math>V</math> of finite rank and <math>V \otimes V^*</math>, where <math>V^*</math> is the dual space of <math>V</math>. Let <math>v</math> be in <math>V</math> and let <math>g</math> be in <math>V^*</math>. Then the trace of the decomposable element <math>v \otimes g</math> is defined to be <math>g(v)</math>; the trace of a general element is defined by linearity. The trace of a linear map <math>f : V \to V</math> of finite rank can then be defined as the trace, in the above sense, of the element of <math>V \otimes V^*</math> corresponding to <math>f</math> under the above-mentioned canonical isomorphism. Using an explicit basis for <math>V</math> and the corresponding dual basis for <math>V^*</math>, one can show that this gives the same definition of the trace as given above.

In the language of tensor products

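Given a vector space <math>V</math> over the field <math>F</math>, there is a natural bilinear map <math>V \times V^* \to F</math> given by sending <math>(v, \varphi)</math> to the scalar <math>\varphi(v)</math>. The universal property of the tensor product automatically implies that this bilinear map is induced by a linear functional on <math>V \otimes V^*</math>.

Similarly, there is a natural bilinear map <math>V \times V^* \to \operatorname{Hom}(V, V)</math> given by sending <math>(v, \varphi)</math> to the linear map <math>w \mapsto \varphi(w)v</math>. The universal property of the tensor product, just as used previously, says that this bilinear map is induced by a linear map <math>V \otimes V^* \to \operatorname{Hom}(V, V)</math>. If <math>V</math> is finite-dimensional, then this linear map is a linear isomorphism.

Stochastic estimator

The trace can be estimated without bias by a stochastic estimator, often called Hutchinson's trick:

<blockquote>Given any matrix <math>\boldsymbol W\in \R^{n\times n}</math>, and any random <math>\boldsymbol u\in \R^n</math> with <math>\mathbb E[\boldsymbol u\boldsymbol u^\intercal] = \mathbf I</math>, we have <math>\mathbb E[\boldsymbol u^\intercal\boldsymbol W\boldsymbol u ] = \operatorname{tr}\boldsymbol W</math>. </blockquote>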

For a proof expand the expectation directly.

Usually, the random vector is sampled from <math>\operatorname N(\mathbf 0,\mathbf I)</math> (normal distribution) or uniformly from <math>\{\pm 1\}^n</math> (Rademacher distribution); both satisfy <math>\mathbb E[\boldsymbol u\boldsymbol u^\intercal] = \mathbf I</math>.

More sophisticated stochastic estimators of trace have been developed.
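The basic estimator can be sketched in a few lines. The Python code below is an illustrative sketch, not a production method: it assumes random vectors with independent entries equal to +1 or −1 with equal probability (so that the expected outer product is the identity), and the function names and sample matrix are hypothetical. It also averages over all sign vectors, which reproduces the trace exactly because the off-diagonal contributions cancel.

```python
import itertools
import random


def quadratic_form(w, u):
    """Compute u^T W u for a square matrix w and a vector u."""
    n = len(u)
    return sum(u[i] * w[i][j] * u[j] for i in range(n) for j in range(n))


def hutchinson_trace(w, num_samples, rng):
    """Average u^T W u over random sign vectors u (entries +1 or -1)."""
    n = len(w)
    total = 0
    for _ in range(num_samples):
        u = [rng.choice((-1, 1)) for _ in range(n)]
        total += quadratic_form(w, u)
    return total / num_samples


def exact_rademacher_mean(w):
    """Average u^T W u over every sign vector; equals tr(W) exactly."""
    vectors = list(itertools.product((-1, 1), repeat=len(w)))
    return sum(quadratic_form(w, u) for u in vectors) / len(vectors)


W = [[4, 1, 0],
     [2, -1, 3],
     [0, 1, 3]]                      # trace 6

estimate = hutchinson_trace(W, 1000, random.Random(0))
print(estimate, exact_rademacher_mean(W))
```

Each sample differs from the trace only by the off-diagonal terms, which is why the estimate for a diagonal matrix is exact with this choice of random vectors.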

Applications

If a 2 × 2 real matrix has zero trace, its square is a diagonal matrix.
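To see why, note that a 2 × 2 matrix with zero trace has diagonal entries of the form <math>a</math> and <math>-a</math> (the entry names <math>a, b, c</math> are introduced here for illustration), and a direct computation gives

<math display="block">\begin{pmatrix} a & b \\ c & -a \end{pmatrix}^2 = \begin{pmatrix} a^2 + bc & ab - ba \\ ca - ac & cb + a^2 \end{pmatrix} = (a^2 + bc)\,\mathbf{I},</math>

which is in fact a scalar multiple of the identity matrix.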

The trace of a 2 × 2 complex matrix is used to classify Möbius transformations. First, the matrix is normalized to make its determinant equal to one. Then, if the square of the trace is 4, the corresponding transformation is parabolic. If the square is in the interval <math>[0, 4)</math>, it is elliptic. Finally, if the square is greater than 4, the transformation is loxodromic. See classification of Möbius transformations.
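The classification rule can be sketched as a small function. The Python code below is an illustration under stated assumptions: the function name `classify_mobius` and the tolerance parameter are hypothetical, the normalization is done implicitly by dividing the squared trace by the determinant, and every case that is neither parabolic nor elliptic is reported as loxodromic.

```python
def classify_mobius(a, b, c, d, tol=1e-9):
    """Classify z -> (a z + b) / (c z + d) by the squared trace.

    Dividing the matrix by a square root of its determinant makes the
    determinant 1 and turns the squared trace into (a + d)^2 / det.
    """
    det = a * d - b * c
    if abs(det) < tol:
        raise ValueError("matrix must be invertible")
    t2 = complex((a + d) ** 2 / det)
    if abs(t2 - 4) < tol:
        # The identity map also lands here; it is usually excluded.
        return "parabolic"
    if abs(t2.imag) < tol and -tol <= t2.real < 4:
        return "elliptic"
    # Assumption of this sketch: all remaining cases are reported as
    # loxodromic, including squared traces that are real and above 4.
    return "loxodromic"


print(classify_mobius(1, 1, 0, 1))    # translation z -> z + 1
print(classify_mobius(0, -1, 1, 0))   # inversion z -> -1/z
print(classify_mobius(2, 0, 0, 1))    # scaling z -> 2z
```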

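The trace is used to define characters of group representations. Two representations <math>\mathbf{A}, \mathbf{B} : G \to GL(V)</math> of a group <math>G</math> are equivalent (up to change of basis on <math>V</math>) if <math>\operatorname{tr}(\mathbf{A}(g)) = \operatorname{tr}(\mathbf{B}(g))</math> for all <math>g \in G</math>.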

The trace also plays a central role in the distribution of quadratic forms.

The trace can be used to classify von Neumann algebra factors. Generalizations of the trace can be used to define noncommutative integration theory.

Lie algebra

The trace is a map of Lie algebras <math>\operatorname{tr}:\mathfrak{gl}_n\to K</math> from the Lie algebra <math>\mathfrak{gl}_n</math> of linear operators on an <math>n</math>-dimensional space (<math>n \times n</math> matrices with entries in <math>K</math>) to the Lie algebra <math>K</math> of scalars; as <math>K</math> is Abelian (the Lie bracket vanishes), the fact that this is a map of Lie algebras is exactly the statement that the trace of a bracket vanishes:

<math display="block">\operatorname{tr}([\mathbf{A}, \mathbf{B}]) = 0 \text{ for each }\mathbf A,\mathbf B\in\mathfrak{gl}_n.</math>

The kernel of this map consists of matrices whose trace is zero, often called traceless or trace-free, and these matrices form the simple Lie algebra <math>\mathfrak{sl}_n</math>, which is the Lie algebra of the special linear group of matrices with determinant 1. The special linear group consists of the matrices which do not change volume, while the special linear Lie algebra consists of the matrices which do not alter the volume of infinitesimal sets.

In fact, there is an internal direct sum decomposition <math>\mathfrak{gl}_n = \mathfrak{sl}_n \oplus K</math> of operators/matrices into traceless operators/matrices and scalar operators/matrices. The projection map onto scalar operators can be expressed in terms of the trace, concretely as:

<math display="block">\mathbf{A} \mapsto \frac{1}{n}\operatorname{tr}(\mathbf{A})\mathbf{I}.</math>

Formally, one can compose the trace (the counit map) with the unit map <math>K\to\mathfrak{gl}_n</math> of "inclusion of scalars" to obtain a map <math>\mathfrak{gl}_n\to\mathfrak{gl}_n</math> mapping onto scalars, and multiplying by <math>n</math>. Dividing by <math>n</math> makes this a projection, yielding the formula above.
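The decomposition into a traceless part and a scalar part can be sketched concretely. The Python code below is a minimal illustration that assumes integer or rational entries (so that exact arithmetic with `fractions.Fraction` applies); the function name `split_scalar_traceless` is hypothetical.

```python
from fractions import Fraction


def split_scalar_traceless(a):
    """Split A into (traceless part, scalar part (tr(A)/n) * I).

    Assumes integer or Fraction entries so the arithmetic stays exact.
    """
    n = len(a)
    mean = Fraction(sum(a[i][i] for i in range(n)), n)
    scalar = [[mean if i == j else Fraction(0) for j in range(n)]
              for i in range(n)]
    traceless = [[a[i][j] - scalar[i][j] for j in range(n)]
                 for i in range(n)]
    return traceless, scalar


A = [[1, 2], [3, 4]]
traceless, scalar = split_scalar_traceless(A)
print(traceless)   # diagonal entries sum to zero
print(scalar)      # (5/2) times the identity
```

Applying the function again to the scalar part returns it unchanged, which is the sense in which the map is a projection.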

In terms of short exact sequences, one has

<math display="block">0 \to \mathfrak{sl}_n \to \mathfrak{gl}_n \overset{\operatorname{tr}}{\to} K \to 0</math>

which is analogous to

<math display="block">1 \to \operatorname{SL}_n \to \operatorname{GL}_n \overset{\det}{\to} K^* \to 1</math>

(where <math>K^*=K\setminus\{0\}</math>) for Lie groups. However, the trace splits naturally (via <math>1/n</math> times scalars) so <math>\mathfrak{gl}_n=\mathfrak{sl}_n\oplus K</math>, but the splitting of the determinant would be as the <math>n</math>th root times scalars, and this does not in general define a function, so the determinant does not split and the general linear group does not decompose:

<math display="block">\operatorname{GL}_n \neq \operatorname{SL}_n \times K^*.</math>

Bilinear forms

The bilinear form (where <math>\mathbf{X}</math>, <math>\mathbf{Y}</math> are square matrices)

<math display="block">B(\mathbf{X}, \mathbf{Y}) = \operatorname{tr}(\operatorname{ad}(\mathbf{X})\operatorname{ad}(\mathbf{Y}))</math>

: where <math>\operatorname{ad}(\mathbf{X})\mathbf{Y} = [\mathbf{X}, \mathbf{Y}] = \mathbf{X}\mathbf{Y} - \mathbf{Y}\mathbf{X}</math>

: and for orientation, if <math>\operatorname{det} \mathbf{Y} \ne 0 </math>

:: then <math>\left(\operatorname{ad}(\mathbf{X})\mathbf{Y}\right)\mathbf{Y}^{-1} = \mathbf{X} - \mathbf{Y}\mathbf{X}\mathbf{Y}^{-1} ~.</math>

<math> B(\mathbf{X}, \mathbf{Y})</math> is called the Killing form; it is used to classify Lie algebras.

The trace defines a bilinear form:

<math display="block">(\mathbf{X}, \mathbf{Y}) \mapsto \operatorname{tr}(\mathbf{X}\mathbf{Y}) ~.</math>

The form is symmetric, non-degenerate and associative in the sense that:

<math display="block">\operatorname{tr}(\mathbf{X}[\mathbf{Y}, \mathbf{Z}]) = \operatorname{tr}([\mathbf{X}, \mathbf{Y}]\mathbf{Z}).</math>
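The associativity identity follows from the cyclic property. Expanding both commutators gives

<math display="block">\operatorname{tr}(\mathbf{X}[\mathbf{Y}, \mathbf{Z}]) = \operatorname{tr}(\mathbf{X}\mathbf{Y}\mathbf{Z}) - \operatorname{tr}(\mathbf{X}\mathbf{Z}\mathbf{Y}), \qquad \operatorname{tr}([\mathbf{X}, \mathbf{Y}]\mathbf{Z}) = \operatorname{tr}(\mathbf{X}\mathbf{Y}\mathbf{Z}) - \operatorname{tr}(\mathbf{Y}\mathbf{X}\mathbf{Z}),</math>

and the two right-hand sides agree because <math>\operatorname{tr}(\mathbf{X}\mathbf{Z}\mathbf{Y}) = \operatorname{tr}(\mathbf{Y}\mathbf{X}\mathbf{Z})</math> is a cyclic shift.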

For a complex simple Lie algebra (such as <math>\mathfrak{sl}_n</math>), all such bilinear forms are proportional to one another; in particular, each is proportional to the Killing form.

Two matrices <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> are said to be trace orthogonal if

<math display="block">\operatorname{tr}(\mathbf{X}\mathbf{Y}) = 0.</math>

There is a generalization to a general representation <math>(\rho,\mathfrak{g},V)</math> of a Lie algebra <math>\mathfrak{g}</math>, such that <math>\rho</math> is a homomorphism of Lie algebras <math>\rho: \mathfrak{g} \rightarrow \text{End}(V).</math> The trace form <math>\text{tr}_V</math> on <math>\text{End}(V)</math> is defined as above. The bilinear form

<math display="block">\phi(\mathbf{X},\mathbf{Y}) = \text{tr}_V(\rho(\mathbf{X})\rho(\mathbf{Y}))</math>

is symmetric and invariant due to cyclicity.

Generalizations

The concept of trace of a matrix is generalized to the trace class of compact operators on Hilbert spaces, and the analog of the Frobenius norm is called the Hilbert–Schmidt norm.

If <math>K</math> is a trace-class operator, then for any orthonormal basis <math>\{e_n\}_{n \geq 1}</math>, the trace is given by

<math display="block">\operatorname{tr}(K) = \sum_n \left\langle e_n, Ke_n \right\rangle,</math>

and is finite and independent of the orthonormal basis. This trace can be generalized to von Neumann algebras.

The Dixmier trace generalizes the usual trace beyond trace-class operators.

The partial trace is another generalization of the trace that is operator-valued. The trace of a linear operator <math>Z</math> which lives on a product space <math>A\otimes B</math> is equal to the partial traces over <math>A</math> and <math>B</math>:

<math display="block">\operatorname{tr}(Z) = \operatorname{tr}_A \left(\operatorname{tr}_B(Z)\right) = \operatorname{tr}_B \left(\operatorname{tr}_A(Z)\right).</math>
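For matrices, the partial traces can be computed by summing over one of the two tensor factors. The Python sketch below is an illustration under an assumed indexing convention: the basis vector <math>e_i \otimes f_j</math> of <math>A \otimes B</math> is stored at position `i * dim_b + j` (the usual Kronecker ordering). The function names and the sample matrix are hypothetical.

```python
def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def partial_trace_b(z, dim_a, dim_b):
    """Trace out the second factor; returns a dim_a x dim_a matrix."""
    return [[sum(z[i * dim_b + j][k * dim_b + j] for j in range(dim_b))
             for k in range(dim_a)]
            for i in range(dim_a)]


def partial_trace_a(z, dim_a, dim_b):
    """Trace out the first factor; returns a dim_b x dim_b matrix."""
    return [[sum(z[i * dim_b + j][i * dim_b + l] for i in range(dim_a))
             for l in range(dim_b)]
            for j in range(dim_b)]


# A 6 x 6 sample operator on a space with dim_a = 2 and dim_b = 3.
Z = [[r * 6 + c for c in range(6)] for r in range(6)]

over_b = partial_trace_b(Z, 2, 3)   # operator on the first factor
over_a = partial_trace_a(Z, 2, 3)   # operator on the second factor
print(trace(Z), trace(over_b), trace(over_a))
```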

For more properties and a generalization of the partial trace, see traced monoidal categories.

If <math>A</math> is a general associative algebra over a field <math>k</math>, then a trace on <math>A</math> is often defined to be any functional <math>\operatorname{tr}:A\to k</math> which vanishes on commutators; <math>\operatorname{tr}([a,b])=0</math> for all <math>a,b\in A</math>. Such a trace is not uniquely defined; it can always at least be modified by multiplication by a nonzero scalar.

A supertrace is the generalization of a trace to the setting of superalgebras.

The operation of tensor contraction generalizes the trace to arbitrary tensors.

Gomme and Klein (2011) define a matrix trace operator <math>\operatorname{trm}</math> that operates on block matrices and use it to compute second-order perturbation solutions to dynamic economic models without the need for tensor notation.