Symmetry of second derivatives

In mathematics, the symmetry of second derivatives (also called the equality of mixed partials) is the fact that exchanging the order of partial derivatives of a multivariate function

<math display="block">f\left(x_1,\, x_2,\, \ldots,\, x_n\right)</math>

does not change the result if some continuity conditions are satisfied (see below); that is, the second-order partial derivatives satisfy the identities

<math display="block">\frac {\partial}{\partial x_i} \left( \frac{\partial f}{\partial x_j} \right) \ = \

\frac {\partial}{\partial x_j} \left( \frac{\partial f}{\partial x_i} \right).

</math>

In other words, the matrix of the second-order partial derivatives, known as the Hessian matrix, is a symmetric matrix.
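The symmetry of the Hessian can be checked numerically. The following is a minimal sketch, not a proof: the sample function <math>f(x, y, z) = x^2 y + \sin(yz) + e^{xz}</math>, the evaluation point and the step size are arbitrary choices made for illustration. The gradient is written out by hand, and each Hessian entry is then approximated by a central difference of a different gradient component, so the agreement of entry <math>(i, j)</math> with entry <math>(j, i)</math> is not built into the formula.

```python
import math

def grad(x, y, z):
    """Hand-computed gradient of the illustrative function
    f(x, y, z) = x**2 * y + sin(y * z) + exp(x * z)."""
    return [
        2 * x * y + z * math.exp(x * z),
        x ** 2 + z * math.cos(y * z),
        y * math.cos(y * z) + x * math.exp(x * z),
    ]

def hessian(point, h=1e-5):
    """Approximate H[i][j] = d/dx_j (df/dx_i) by central differences of the gradient."""
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(point)
        plus[j] += h
        minus = list(point)
        minus[j] -= h
        g_plus, g_minus = grad(*plus), grad(*minus)
        for i in range(n):
            H[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return H

H = hessian([0.3, -1.2, 0.7])  # arbitrary sample point
# Largest difference between an entry and its transpose partner.
asymmetry = max(abs(H[i][j] - H[j][i]) for i in range(3) for j in range(3))
print(asymmetry)
```

The printed value should be at the level of rounding and truncation error, consistent with a symmetric Hessian for this smooth function.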

Sufficient conditions for the symmetry to hold are given by Schwarz's theorem, also called Clairaut's theorem or Young's theorem.

In the context of partial differential equations, it is called the Schwarz integrability condition.


Formal expressions of symmetry

In symbols, the symmetry may be expressed as:

<math display="block">\frac {\partial}{\partial x} \left( \frac{\partial f}{\partial y} \right) \ = \ \frac {\partial}{\partial y} \left( \frac{\partial f}{\partial x} \right)

\qquad\text{or}\qquad

\frac {\partial^2\! f} {\partial x\,\partial y} \ =\ \frac{\partial^2\! f} {\partial y\,\partial x}.

</math>

Another notation is:

<math display="block">\partial_x\partial_y f = \partial_y\partial_x f \qquad\text{or}\qquad f_{yx} = f_{xy}.</math>

In terms of composition of the differential operator <math>D_i</math> which takes the partial derivative with respect to <math>x_i</math>:

<math display="block">D_i \circ D_j = D_j \circ D_i.</math>

From this relation it follows that the ring of differential operators with constant coefficients, generated by the <math>D_i</math>, is commutative; but this is only true as operators over a domain of sufficiently differentiable functions. It is easy to check the symmetry as applied to monomials, so that one can take polynomials in the <math>x_i</math> as a domain. In fact smooth functions are another valid domain.
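The check on monomials can be carried out exactly. The sketch below assumes a hypothetical representation of a polynomial in two variables as a dictionary mapping exponent pairs <code>(a, b)</code> to integer coefficients; the helper <code>d</code> and the sample polynomial are illustrative only. Differentiating a monomial <math>x^a y^b</math> in either order gives <math>ab\,x^{a-1}y^{b-1}</math>, so the two operators commute term by term.

```python
def d(poly, var):
    """Differentiate a polynomial {(a, b): coeff} with respect to
    x (var=0) or y (var=1), using exact integer arithmetic."""
    out = {}
    for exps, c in poly.items():
        e = exps[var]
        if e == 0:
            continue  # the term does not depend on this variable
        new = list(exps)
        new[var] = e - 1
        key = tuple(new)
        out[key] = out.get(key, 0) + c * e
    return out

# Illustrative polynomial p(x, y) = 3 x^2 y^3 - 5 x y + 7 y^4 + 2
p = {(2, 3): 3, (1, 1): -5, (0, 4): 7, (0, 0): 2}

dxdy = d(d(p, 1), 0)  # differentiate in y first, then in x
dydx = d(d(p, 0), 1)  # differentiate in x first, then in y
print(dxdy == dydx, dxdy)
```

Both orders yield <math>18xy^2 - 5</math> for this polynomial.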

In terms of the total derivative, the second derivative of a twice-differentiable function <math>f\colon X\to Y</math> at a point <math>p\in X</math> is a linear map

<math display="block">D^2 f_p \colon X \to \mathcal L(X, Y)</math>

where <math>\mathcal L(X, Y)</math> is the normed vector space of linear maps from <math>X</math> to <math>Y</math>. By uncurrying, it may be identified with a bilinear form from <math>X \times X</math> to <math>Y</math>. The symmetry of second derivatives then says that this bilinear form is symmetric, i.e. <math>D^2f_p (u)(v) = D^2f_p (v)(u)</math> for all <math>u, v \in X</math>. Setting <math>u</math> and <math>v</math> to be the standard basis vectors <math>\mathbf e_i</math> and <math>\mathbf e_j</math> recovers the partial derivatives version. More generally, whenever <math>f</math> is <math>r</math>-times differentiable its derivative at a point <math>p</math> is a symmetric <math>r</math>-linear form <math>D^r f_p \colon X^{\otimes r} \to Y</math>.

History

The result on the equality of mixed partial derivatives under certain conditions has a long history. The list of unsuccessful proposed proofs started with Euler's, published in 1740, although already in 1721 Bernoulli had implicitly assumed the result with no formal justification. Clairaut also published a proposed proof in 1740, with no other attempts until the end of the 18th century. Starting then, for a period of 70 years, a number of incomplete proofs were proposed. The proof of Lagrange (1797) was improved by Cauchy (1823), but assumed the existence and continuity of the partial derivatives <math>\tfrac{\partial^2 f}{\partial x^2}</math> and <math>\tfrac{\partial^2 f}{\partial y^2}</math>. Other attempts were made by P. Blanchet (1841), Duhamel (1856), Sturm (1857), Schlömilch (1862), and Bertrand (1864). Finally in 1867 Lindelöf systematically analyzed all the earlier flawed proofs and was able to exhibit a specific counterexample where mixed derivatives failed to be equal.

Six years after that, Schwarz succeeded in giving the first rigorous proof. Dini later contributed by finding more general conditions than those of Schwarz. Eventually a clean and more general version was found by Jordan in 1883 that is still the proof found in most textbooks. Minor variants of earlier proofs were published by Laurent (1885), Peano (1889 and 1893), J. Edwards (1892), P. Haag (1893), J. K. Whittemore (1898), Vivanti (1899) and Pierpont (1905). Further progress was made in 1907-1909 when E. W. Hobson and W. H. Young found proofs with weaker conditions than those of Schwarz and Dini. In 1918, Carathéodory gave a different proof based on the Lebesgue integral.

Schwarz's theorem

In mathematical analysis, Schwarz's theorem (or Clairaut's theorem on equality of mixed partials), named after Alexis Clairaut and Hermann Schwarz, states that for a function <math>f \colon \Omega \to \mathbb{R}</math> defined on a set <math>\Omega \subset \mathbb{R}^n</math>, if <math>\mathbf{p}\in \mathbb{R}^n</math> is a point such that some neighborhood of <math>\mathbf{p}</math> is contained in <math>\Omega</math> and <math>f</math> has continuous second partial derivatives on that neighborhood of <math>\mathbf{p}</math>, then for all <math>i</math> and <math>j</math> in <math>\{1, 2, \ldots,\, n\},</math>

<math display="block">\frac{\partial^2}{\partial x_i\, \partial x_j} f(\mathbf{p}) =
\frac{\partial^2}{\partial x_j\, \partial x_i} f(\mathbf{p}).
</math>

The partial derivatives of this function commute at that point.

There exists a version of this theorem where <math>f</math> is only required to be twice differentiable at the point <math>\mathbf{p}</math>.

One easy way to establish this theorem (in the case where <math>n = 2</math>, <math>i = 1</math>, and <math>j = 2</math>, which readily entails the result in general) is by applying Green's theorem to the gradient of <math>f.</math>

An elementary proof for functions on open subsets of the plane is as follows (the general case of Schwarz's theorem reduces easily to the planar case). Let <math>f(x,y)</math> be a differentiable function on an open rectangle <math>\Omega</math> containing a point <math>(a,b)</math> and suppose that <math>df</math> is continuous and that the mixed partial derivatives <math>\partial_x \partial _y f</math> and <math>\partial_y\partial_x f</math> exist and are continuous over <math>\Omega.</math> Define

<math display="block">\begin{align}

u{\left(h,\, k\right)} &= f\left(a{+}h,\, b{+}k\right) - f\left(a{+}h,\, b\right), \\

v{\left(h,\, k\right)} &= f\left(a{+}h,\, b{+}k\right) - f\left(a,\, b{+}k\right), \\

w{\left(h,\, k\right)} &= f\left(a{+}h,\, b{+}k\right) - f\left(a{+}h,\, b\right) - f\left(a,\, b{+}k\right) + f\left(a,\, b\right).

\end{align}</math>

These functions are defined for <math>\left|h\right|, \left|k\right| < \varepsilon</math>, where <math>\varepsilon > 0 </math> and <math>\left[a{-}\varepsilon,\, a{+}\varepsilon\right] \times \left[b{-}\varepsilon,\, b{+}\varepsilon\right]</math> is contained in <math>\Omega.</math>

By the mean value theorem, for fixed <math>h</math> and <math>k</math> non-zero, <math>\theta, \theta', \phi, \phi'</math> can be found in the open interval <math> (0,1)</math> with

<math display="block">\begin{align}

w{\left(h,\, k\right)}

&= u{\left(h,\, k\right)} - u{\left(0,\, k\right)} = h\, \partial_x u{\left(\theta h,\, k\right)} \\[0.5ex]

&= h \left[\partial_x f{\left(a{+}\theta h,\, b{+}k\right)} - \partial_x f{\left(a{+}\theta h,\, b\right)}\right] \\[0.5ex]

&= hk \, \partial_y \partial_x f{\left(a{+}\theta h,\, b{+}\theta^\prime k\right)}

\\[1ex]

w{\left(h,\, k\right)}

&= v{\left(h,\, k\right)} - v{\left(h,\, 0\right)} = k\,\partial_y v\left(h,\, \phi k\right) \\[0.5ex]

&= k \left[\partial_y f{\left(a{+}h,\, b{+}\phi k\right)} - \partial_y f{\left(a,\, b{+}\phi k\right)}\right] \\[0.5ex]

&= hk\, \partial_x\partial_y f{\left(a{+}\phi^\prime h,\, b{+}\phi k\right)}.

\end{align}</math>

Since <math>h,\,k \neq 0</math>, the first equality below can be divided by <math>hk</math>:

<math display="block">\begin{align}

hk\,\partial_y\partial_x f{\left(a{+}\theta h,\, b{+}\theta^\prime k\right)} &=

hk \, \partial_x\partial_y f{\left(a{+}\phi^\prime h,\, b{+}\phi k\right)}, \\

\partial_y\partial_x f{\left(a{+}\theta h,\, b{+}\theta^\prime k\right)} &=

\partial_x\partial_y f{\left(a{+}\phi^\prime h,\, b{+}\phi k\right)}.

\end{align}</math>

Letting <math>h,\,k</math> tend to zero in the last equality, the continuity assumptions on <math>\partial_y \partial_x f</math> and <math>\partial_x \partial_y f</math> now imply that

<math display="block">\frac{\partial^2}{\partial x\partial y}f\left(a,\, b\right) =
\frac{\partial^2}{\partial y\partial x}f\left(a,\, b\right).
</math>

This account is a straightforward classical method found in many textbooks, for example in Burkill, Apostol and Rudin.
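The central object of the proof, the second difference <math>w(h, k)</math> divided by <math>hk</math>, can be evaluated numerically to see it approach the mixed partial derivative. The following is a minimal sketch under stated assumptions: the smooth test function <math>f(x, y) = e^x \sin y + x^2 y^3</math>, the point <math>(a, b) = (0.5, 1)</math> and the step sizes are arbitrary illustrative choices, and the exact mixed partial <math>e^x \cos y + 6xy^2</math> is computed by hand.

```python
import math

def f(x, y):
    # Illustrative smooth test function (an assumption, not from the proof).
    return math.exp(x) * math.sin(y) + x ** 2 * y ** 3

def mixed_exact(x, y):
    # Hand-computed mixed partial of f: e^x cos(y) + 6 x y^2.
    return math.exp(x) * math.cos(y) + 6 * x * y ** 2

def w(a, b, h, k):
    """Second difference w(h, k) as defined in the proof."""
    return f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b)

a, b = 0.5, 1.0
errors = []
for step in (1e-1, 1e-2, 1e-3):
    quotient = w(a, b, step, step) / (step * step)
    errors.append(abs(quotient - mixed_exact(a, b)))
    print(step, quotient, errors[-1])
```

The error shrinks as the step decreases, in line with the limit taken at the end of the proof.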

Sufficiency of twice-differentiability

A condition weaker than continuity of the second partial derivatives (and implied by it) that still suffices to ensure symmetry is that all first partial derivatives are themselves differentiable. Another strengthening of the theorem, in which existence of the permuted mixed partial is asserted, was provided by Peano in a short 1890 note in Mathesis.

Distribution theory formulation

The theory of distributions (generalized functions) eliminates analytic problems with the symmetry. The derivative of an integrable function can always be defined as a distribution, and symmetry of mixed partial derivatives always holds as an equality of distributions. The use of formal integration by parts to define differentiation of distributions puts the symmetry question back onto the test functions, which are smooth and certainly satisfy this symmetry. In more detail (where f is a distribution, written as an operator on test functions, and φ is a test function),

<math display="block">\begin{align}

\left(D_1 D_2 f\right)[\phi] &= -\left(D_2f\right)\left[D_1\phi\right] \\

&= f\left[D_2 D_1\phi\right] = f\left[D_1 D_2\phi\right] \\

&= -\left(D_1 f\right)\left[D_2\phi\right] = \left(D_2 D_1 f\right)[\phi].

\end{align}</math>

Another approach, which defines the Fourier transform of a function, is to note that on such transforms partial derivatives become multiplication operators that commute much more obviously.

Requirement of continuity

The requirement of continuity is not essential. For commutativity of the second partials at a point, it is only necessary that the function be twice-differentiable at the point, in the sense of multivariable calculus. That is: the first partials of the function must be differentiable at the point. This is different, however, from just existence of the second partials. Continuity of the second partials is a common sufficient condition (but not a necessary one) to ensure twice-differentiability in this sense.

Figure: the function f(x, y) in the example that follows does not have symmetric second derivatives at the origin.

An example of non-symmetry is the function (due to Peano)

<math display="block">f(x,\, y) = \begin{cases}
\dfrac{xy\left(x^2 - y^2\right)}{x^2 + y^2} & \text{for } (x,\, y) \ne (0,\, 0), \\
0 & \text{for } (x,\, y) = (0,\, 0).
\end{cases}</math>

This can be visualized by the polar form <math>f(r \cos(\theta), r\sin(\theta)) = \frac{r^2 \sin(4\theta)}{4}</math>; it is everywhere continuous, but its derivatives at <math>(0, 0)</math> cannot be computed algebraically. Rather, the limit of difference quotients shows that <math>f_x(0,0) = f_y(0,0) = 0</math>, so the graph <math>z = f(x, y)</math> has a horizontal tangent plane at <math>(0, 0, 0)</math>, and the partial derivatives <math>f_x, f_y</math> exist and are everywhere continuous. However, the second partial derivatives are not continuous at <math>(0, 0)</math>, and the symmetry fails. In fact, along the x-axis the y-derivative is <math>f_y(x,0) = x</math>, and so:

<math display="block">f_{yx}(0,0) =
\lim_{\varepsilon \to 0} \frac{f_y(\varepsilon,0) - f_y(0,0)}{\varepsilon} = 1.
</math>

In contrast, along the y-axis the x-derivative is <math>f_x(0,y) = -y</math>, and so <math>f_{xy}(0,0) = -1</math>. That is, <math>f_{yx} \ne f_{xy}</math> at <math>(0, 0)</math>, although the mixed partial derivatives do exist, and at every other point the symmetry does hold.

The above function, written in polar coordinates, can be expressed as

<math display="block">f(r,\, \theta) = \frac{r^2 \sin(4\theta)}{4},</math>

showing that the function oscillates four times when traveling once around an arbitrarily small loop containing the origin. Intuitively, therefore, the local behavior of the function at (0, 0) cannot be described as a quadratic form, and the Hessian matrix thus fails to be symmetric.
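The failure of symmetry at the origin can be reproduced numerically. The sketch below takes the example function to be <math>f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)</math> with <math>f(0, 0) = 0</math>, which is consistent with the polar form above. The helper names and the step sizes are illustrative assumptions; the inner step <code>h</code> is chosen much smaller than the outer step <code>eps</code> so that the first partials are resolved before the second difference quotient is formed.

```python
def f(x, y):
    """Example function x*y*(x^2 - y^2)/(x^2 + y^2), extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f_x(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def f_y(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

eps = 1e-3  # outer step, deliberately much larger than the inner step h
f_yx = (f_y(eps, 0.0) - f_y(-eps, 0.0)) / (2 * eps)  # x-derivative of f_y at the origin
f_xy = (f_x(0.0, eps) - f_x(0.0, -eps)) / (2 * eps)  # y-derivative of f_x at the origin
print(f_yx, f_xy)
```

The two printed values are close to <math>1</math> and <math>-1</math> respectively, matching the values of <math>f_{yx}(0,0)</math> and <math>f_{xy}(0,0)</math> stated in the text.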

In general, limiting operations need not commute. Consider two variables <math>h</math> and <math>k</math> near 0 and two limiting processes on

<math display="block">f(h,\, k) - f(h,\, 0) - f(0,\, k) + f(0,\, 0),</math>

corresponding to making h → 0 first, and to making k → 0 first. It can matter, looking at the first-order terms, which is applied first. This leads to the construction of pathological examples in which second derivatives are non-symmetric. This kind of example belongs to the theory of real analysis where the pointwise value of functions matters. When viewed as a distribution the second partial derivative's values can be changed at an arbitrary set of points as long as this has Lebesgue measure 0. Since in the example the Hessian is symmetric everywhere except <math>(0, 0)</math>, there is no contradiction with the fact that the Hessian, viewed as a Schwartz distribution, is symmetric.
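The dependence on the order of the limits can be illustrated with the second difference <math>f(h, k) - f(h, 0) - f(0, k) + f(0, 0)</math> divided by <math>hk</math>, evaluated for the example function <math>f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)</math> with <math>f(0, 0) = 0</math>. The following sketch only mimics the iterated limits by making one increment far smaller than the other; the names <code>q</code>, <code>small</code> and <code>tiny</code> and their values are illustrative assumptions.

```python
def f(x, y):
    """Example function x*y*(x^2 - y^2)/(x^2 + y^2), extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def q(h, k):
    """Second difference of f at the origin, divided by h*k."""
    return (f(h, k) - f(h, 0.0) - f(0.0, k) + f(0.0, 0.0)) / (h * k)

small, tiny = 1e-3, 1e-9
h_first = q(tiny, small)  # h is negligible compared with k, as if h -> 0 first
k_first = q(small, tiny)  # k is negligible compared with h, as if k -> 0 first
print(h_first, k_first)
```

For this function the quotient equals <math>(h^2 - k^2)/(h^2 + k^2)</math>, so the first value is close to <math>-1</math> and the second close to <math>1</math>.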

In Lie theory

Consider the first-order differential operators <math>D_i</math> to be infinitesimal operators on Euclidean space. That is, <math>D_i</math> in a sense generates the one-parameter group of translations parallel to the <math>x_i</math>-axis. These groups commute with each other, and therefore the infinitesimal generators do also; the Lie bracket

<math display="block">[D_i,\, D_j] = 0</math>

is this property's reflection. In other words, the Lie derivative of one coordinate with respect to another is zero.

Application to differential forms

The Clairaut-Schwarz theorem is the key fact needed to prove that for every <math>C^\infty</math> (or at least twice differentiable) differential form <math>\omega\in\Omega^k(M)</math>, the second exterior derivative vanishes: <math>d^2\omega := d(d\omega) = 0</math>. This implies that every differentiable exact form (i.e., a form <math>\alpha</math> such that <math>\alpha = d\omega</math> for some form <math>\omega</math>) is closed (i.e., <math>d\alpha = 0</math>), since <math>d\alpha = d(d\omega) = 0</math>.

In the middle of the 18th century, the theory of differential forms was first studied in the simplest case of 1-forms in the plane, i.e. <math>A\,dx + B\,dy</math>, where <math>A</math> and <math>B</math> are functions in the plane. The study of 1-forms and the differentials of functions began with Clairaut's papers in 1739 and 1740. At that stage his investigations were interpreted as ways of solving ordinary differential equations. Formally Clairaut showed that a 1-form <math>\omega = A \, dx + B \, dy</math> on an open rectangle is closed, i.e. <math>d\omega = 0</math>, if and only if <math>\omega</math> has the form <math>df</math> for some function <math>f</math> on the rectangle. The solution for <math>f</math> can be written by Cauchy's integral formula

<math display="block">f(x,\, y) = \int_{x_0}^x A(t,\, y_0)\, dt + \int_{y_0}^y B(x,\, s)\, ds,</math>

where <math>(x_0,\, y_0)</math> is a fixed point of the rectangle, while if <math>\omega = df</math>, the closed property <math> d\omega=0</math> is the identity <math>\partial_x \partial_y f = \partial_y \partial_x f</math>. (In modern language this is one version of the Poincaré lemma.)
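The construction of a function <math>f</math> with <math>df = \omega</math> from a closed 1-form can be sketched numerically. The example below is an illustration under stated assumptions: the coefficients <math>A = 2xy + \cos x</math> and <math>B = x^2</math> are chosen so that <math>\partial_y A = \partial_x B</math>, the integration path runs from <math>(x_0, y_0)</math> horizontally to <math>(x, y_0)</math> and then vertically to <math>(x, y)</math>, and the integrals are approximated with a hand-written composite Simpson rule. The helper names are hypothetical.

```python
import math

def A(x, y):
    return 2 * x * y + math.cos(x)  # coefficient of dx (illustrative choice)

def B(x, y):
    return x * x                    # coefficient of dy; note dA/dy = dB/dx = 2x

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals; works for hi < lo too."""
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3

def potential(x, y, x0=0.0, y0=0.0):
    """Integrate the 1-form along (x0, y0) -> (x, y0) -> (x, y)."""
    along_x = simpson(lambda t: A(t, y0), x0, x)
    along_y = simpson(lambda s: B(x, s), y0, y)
    return along_x + along_y

x, y = 0.8, -0.4
exact = x * x * y + math.sin(x)  # hand-computed f with f(0, 0) = 0 and df = A dx + B dy
print(potential(x, y), exact)
```

The numerical value agrees with <math>x^2 y + \sin x</math> up to quadrature error. If the form were not closed, the result would depend on the chosen path.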

