Legendre transformation

[[Image:Legendre transformation.png|thumb|256px|right|The function <math>f(x)</math> is defined on the interval <math display="inline">[a,b]</math>. For a given <math>p</math>, the difference <math>px - f(x)</math> takes the maximum at <math>x'</math>. Thus, the Legendre transformation of <math>f(x)</math> is <math>f^*(p) =p x'-f(x')</math>.]]

In mathematics, the Legendre transformation (or Legendre transform), first introduced by Adrien-Marie Legendre in 1787 when studying the minimal surface problem, is an involutive transformation on real-valued functions that are convex in a real variable. Specifically, if a real-valued multivariable function is convex in one of its independent real variables, then the Legendre transform with respect to this variable is applicable to the function.

In physical problems, the Legendre transform is used to convert functions of one quantity (such as position, pressure, or temperature) into functions of the conjugate quantity (momentum, volume, and entropy, respectively). In this way, it is commonly used in classical mechanics to derive the Hamiltonian formalism out of the Lagrangian formalism (or vice versa) and in thermodynamics to derive the thermodynamic potentials, as well as in the solution of differential equations of several variables.

For sufficiently smooth functions on the real line, the Legendre transform <math>f^*</math> of a function <math>f</math> can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other. This can be expressed in Euler's derivative notation as

<math display="block">Df(\cdot) = \left( D f^* \right)^{-1}(\cdot)~,</math> where <math>D</math> is the differentiation operator, <math>\cdot</math> represents an argument or input to the associated function, and <math>(\phi)^{-1}(\cdot)</math> denotes the inverse function, so that <math>(\phi) ^{-1}(\phi(x))=x</math>. Equivalently, in Lagrange's notation, <math>f'(f^{*\prime}(x^*)) = x^*</math> and <math>f^{*\prime}(f'(x)) = x</math>.

The generalization of the Legendre transformation to affine spaces and non-convex functions is known as the convex conjugate (also called the Legendre–Fenchel transformation), which can be used to construct a function's convex hull.

Definition

Definition in one-dimensional real space

Let <math>I \sub \R</math> be an interval, and <math>f:I \to \R</math> a convex function; then the Legendre transform of <math>f</math> is the function <math>f^*:I^* \to \R</math> defined by

<math display="block">f^*(x^*) = \sup_{x\in I}(x^*x-f(x)),\quad I^*= \left \{x^* \in \R:\sup_{x\in I}(x^*x-f(x))<\infty \right \}~,</math>

where <math display="inline">\sup</math> denotes the supremum over <math>I</math>. The domain <math>I^*</math> consists of exactly those <math display="inline">x^*</math> for which <math>x^*x-f(x)</math> is bounded above on <math display="inline">I</math>; when the supremum is attained, <math>f^*(x^*)</math> is the maximum of <math>x^*x-f(x)</math> over <math>x \in I</math>. For example, if <math>f(x) = ax</math> is linear on <math>I = \R</math>, then <math>x^*x - f(x) = (x^*-a)x</math> is bounded above only for <math>x^* = a</math>, so <math>I^* = \{a\}</math> and <math>f^*(a) = 0</math>.
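As a rough numerical illustration, the supremum in this definition can be approximated by maximizing <math>x^*x - f(x)</math> over a finite grid of points in <math>I</math>. The Python sketch below assumes <math>I = [-5, 5]</math>, a grid step of 0.001, and the test function <math>f(x) = x^2/2</math>, whose conjugate is <math>f^*(p) = p^2/2</math>; the helper name `legendre_on_grid` is hypothetical. The grid maximum is only a lower bound on the true supremum, and it is accurate only when the maximizer lies well inside the sampled range.

```python
def legendre_on_grid(f, xs, p):
    # Approximate f*(p) = sup_{x in I} (p*x - f(x)) by the maximum over the sample points xs.
    # Since xs is a subset of I, this is a lower bound on the true supremum.
    return max(p * x - f(x) for x in xs)


# Hypothetical setup: I = [-5, 5] sampled with step 0.001.
xs = [-5.0 + 0.001 * k for k in range(10001)]


def f(x):
    return 0.5 * x * x  # f(x) = x^2/2, whose exact conjugate is p^2/2


for p in (-2.0, 0.0, 1.5, 3.0):
    approx = legendre_on_grid(f, xs, p)
    print(f"p = {p:+.1f}: grid value {approx:.6f}, exact p^2/2 = {0.5 * p * p:.6f}")
```

Refining the grid or widening the interval tightens the approximation. Note that a finite grid always yields a finite maximum, so this approach cannot detect points outside <math>I^*</math>: for <math>f(x) = x</math> on <math>\R</math>, the true supremum is infinite for every <math>p \neq 1</math>.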

The function <math>f^*</math> is called the convex conjugate function of <math>f</math>. For historical reasons (rooted in analytic mechanics), the conjugate variable is often denoted <math>p</math> instead of <math>x^*</math>. If the convex function <math>f</math> is defined on the whole line and is everywhere differentiable with an invertible derivative <math>f'</math> (as is the case when <math>f</math> is strictly convex), then

<math display="block">f^*(p)=\sup_{x\in I}(px - f(x)) = \left( p x - f(x) \right)|_{x = (f')^{-1}(p)} </math>

can be interpreted as the negative of the <math>y</math>-intercept of the tangent line to the graph of <math>f</math> that has slope <math>p</math>.
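For a concrete case of this formula, take the illustrative function <math>f(x) = e^x</math>: then <math>f'(x) = e^x</math>, <math>(f')^{-1}(p) = \ln p</math> for <math>p > 0</math>, and the formula gives <math>f^*(p) = p\ln p - p</math>. The Python sketch below evaluates <math>f^*</math> through <math>(f')^{-1}</math> and checks it against minus the <math>y</math>-intercept of the tangent line of slope <math>p</math>; the function names are hypothetical.

```python
import math


def f(x):
    return math.exp(x)


def f_prime_inverse(p):
    # f'(x) = exp(x), so (f')^{-1}(p) = log(p); defined only for p > 0.
    return math.log(p)


def f_star(p):
    # f*(p) = p*x - f(x) evaluated at the maximizer x = (f')^{-1}(p).
    x = f_prime_inverse(p)
    return p * x - f(x)


def tangent_intercept(p):
    # The tangent with slope p touches the graph at x0 = (f')^{-1}(p):
    # y = f(x0) + p*(x - x0), so its y-intercept is f(x0) - p*x0.
    x0 = f_prime_inverse(p)
    return f(x0) - p * x0


for p in (0.5, 1.0, 2.0, 5.0):
    closed_form = p * math.log(p) - p  # known conjugate of exp
    print(p, f_star(p), closed_form, -tangent_intercept(p))
```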

Definition in n-dimensional real space

The generalization to convex functions <math>f:X \to \R</math> on a convex set <math>X \sub \R^n</math> is straightforward: <math>f^*:X^* \to \R</math> has domain

<math display="block">X^*= \left \{x^* \in \R^n:\sup_{x\in X}(\langle x^*,x\rangle-f(x))<\infty \right \}</math>

and is defined by

<math display="block">f^*(x^*) = \sup_{x\in X}(\langle x^*,x\rangle-f(x)),\quad x^*\in X^* ~,</math>

where <math>\langle x^*,x \rangle</math> denotes the dot product of <math>x^*</math> and <math>x</math>.
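In several variables the supremum can again be approximated by brute force. The Python sketch below assumes <math>X = [-4,4]^2</math>, sampled on a 401 by 401 grid, and <math>f(\mathbf x) = \tfrac12 \langle \mathbf x, A\mathbf x\rangle</math> with the hypothetical symmetric positive-definite matrix <math>A = \begin{pmatrix}2 & 0.5\\ 0.5 & 1\end{pmatrix}</math>. For such quadratic forms the conjugate is known in closed form, <math>f^*(\mathbf y) = \tfrac12\langle \mathbf y, A^{-1}\mathbf y\rangle</math>, attained at <math>\mathbf x = A^{-1}\mathbf y</math>, so the two values can be compared.

```python
A = ((2.0, 0.5), (0.5, 1.0))  # assumed symmetric positive-definite matrix


def f(x):
    # f(x) = (1/2) <x, A x>
    ax0 = A[0][0] * x[0] + A[0][1] * x[1]
    ax1 = A[1][0] * x[0] + A[1][1] * x[1]
    return 0.5 * (x[0] * ax0 + x[1] * ax1)


def f_star_exact(y):
    # f*(y) = (1/2) <y, A^{-1} y>; z = A^{-1} y is the maximizer of <y, x> - f(x).
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    z0 = (A[1][1] * y[0] - A[0][1] * y[1]) / det
    z1 = (-A[1][0] * y[0] + A[0][0] * y[1]) / det
    return 0.5 * (y[0] * z0 + y[1] * z1)


def f_star_grid(y, lo=-4.0, hi=4.0, n=401):
    # Brute-force sup of <y, x> - f(x) over an n-by-n grid on [lo, hi]^2.
    step = (hi - lo) / (n - 1)
    pts = [lo + i * step for i in range(n)]
    return max(y[0] * a + y[1] * b - f((a, b)) for a in pts for b in pts)


for y in ((1.0, 1.0), (-0.5, 2.0)):
    print(y, f_star_grid(y), f_star_exact(y))
```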

The Legendre transformation is an application of the duality relationship between points and lines. The functional relationship specified by <math>f</math> can be represented equally well as a set of <math>(x,y)</math> points, or as a set of tangent lines specified by their slope and intercept values.
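This point-line duality can be seen numerically: given only the tangent lines of a convex function, each recorded as a (slope, intercept) pair, the function is recovered as their upper envelope, <math>f(x) = \sup_p \left(px - f^*(p)\right)</math>, which amounts to applying the Legendre transform a second time. The Python sketch below assumes <math>f(x) = x^2</math>, for which the tangent of slope <math>p</math> has intercept <math>-p^2/4</math>, and samples slopes in <math>[-10, 10]</math>; the helper names are hypothetical.

```python
def tangent_lines(slopes):
    # For f(x) = x^2 the tangent with slope p touches at x0 = p/2,
    # so its intercept is f(x0) - p*x0 = -p^2/4 (that is, -f*(p)).
    return [(p, -p * p / 4.0) for p in slopes]


def envelope(lines, x):
    # Upper envelope of the lines y = p*x + b: this reconstructs the convex f.
    return max(p * x + b for p, b in lines)


slopes = [-10.0 + 0.01 * k for k in range(2001)]  # sampled slopes p in [-10, 10]
lines = tangent_lines(slopes)
for x in (-3.0, -0.5, 0.0, 1.25, 4.0):
    print(x, envelope(lines, x), x * x)
```

The reconstruction is accurate only where the optimal slope <math>p = 2x</math> lies inside the sampled range, here for <math>|x| \le 5</math>.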

Understanding the Legendre transform in terms of derivatives

For a differentiable convex function <math>f</math> on the real line with the first derivative <math>f'</math> and its inverse <math>(f')^{-1}</math>, the Legendre transform of <math>f</math>, <math> f^*</math>, can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other, i.e., <math>f' = ((f^*)')^{-1}</math> and <math>(f^*)' = (f')^{-1}</math>.

To see this, first note that if <math>f</math> is a differentiable convex function on the real line and <math>\overline{x}</math> is a critical point of the function <math>x \mapsto p \cdot x - f(x)</math>, then the supremum is achieved at <math display="inline">\overline{x}</math>, because this function is concave (see the figure at the top of this article). Therefore, the Legendre transform of <math>f</math> is <math>f^*(p)= p \cdot \overline{x} - f(\overline{x})</math>.

Next, suppose that the first derivative <math>f'</math> is invertible, with inverse <math>g = (f')^{-1}</math>. For each <math display="inline">p</math>, the point <math>g(p)</math> is the unique critical point <math display="inline">\overline{x}</math> of the function <math>x \mapsto px - f(x)</math> (i.e., <math>\overline{x} = g(p)</math>), because <math>f'(g(p))=p</math>, so the derivative of this function with respect to <math>x</math> at <math>g(p)</math> is <math>p-f'(g(p))=0</math>. Hence <math>f^*(p) = p \cdot g(p) - f(g(p))</math> for each <math display="inline">p</math>. Differentiating with respect to <math display="inline">p</math> gives

<math display="block">(f^*)'(p) = g(p) + p \cdot g'(p) - f'(g(p)) \cdot g'(p).</math>

Since <math>f'(g(p))=p</math>, this simplifies to <math>(f^*)'(p) = g(p) = (f')^{-1}(p)</math>. In other words, <math>(f^*)'</math> and <math>f'</math> are inverses of each other.

More generally, if <math>h</math> is any function whose derivative is the inverse of <math>f'</math>, i.e., <math>h' = (f')^{-1}</math>, then <math>h' = (f^*)'</math>, so integration gives <math>f^* = h + c</math>, where <math>c</math> is a constant.
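The inverse relationship between <math>(f^*)'</math> and <math>f'</math> can be checked with finite differences. The Python sketch below uses the illustrative choice <math>f(x) = x^4/4</math>, so <math>f'(x) = x^3</math> and <math>g = (f')^{-1}</math> is the real cube root; it computes <math>f^*(p) = p\,g(p) - f(g(p))</math> and compares a central-difference estimate of <math>(f^*)'(p)</math> with <math>g(p)</math>. The step size `h` and the helper names are assumptions of the sketch.

```python
import math


def real_cbrt(p):
    # Real cube root (also for negative p): the inverse g = (f')^{-1} of f'(x) = x^3.
    return math.copysign(abs(p) ** (1.0 / 3.0), p)


def f(x):
    return x ** 4 / 4.0


def f_star(p):
    # f*(p) = p * g(p) - f(g(p)) with g = (f')^{-1}.
    g = real_cbrt(p)
    return p * g - f(g)


def derivative(func, p, h=1e-5):
    # Central finite-difference approximation of func'(p).
    return (func(p + h) - func(p - h)) / (2.0 * h)


for p in (-8.0, -1.0, 0.5, 2.0, 27.0):
    print(f"p = {p:+.1f}: (f*)'(p) ~ {derivative(f_star, p):.8f}, (f')^-1(p) = {real_cbrt(p):.8f}")
```

Here <math>f^*(p) = \tfrac34 |p|^{4/3}</math>, so the agreement can also be confirmed by hand.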

In practical terms, given <math>f(x),</math> the parametric plot of <math>x f'(x) - f(x)</math> versus <math>f'(x)</math> amounts to the graph of <math>f^*(p)</math> versus <math>p.</math>
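This parametric recipe takes only a few lines. The Python sketch below assumes <math>f(x) = \cosh x</math>, whose derivative <math>\sinh x</math> is invertible; since <math>x = \operatorname{arsinh} p</math>, the exact conjugate is <math>f^*(p) = p \operatorname{arsinh} p - \sqrt{1+p^2}</math>, which the parametric points should reproduce. The function name `parametric_legendre` is hypothetical.

```python
import math


def parametric_legendre(f, f_prime, xs):
    # Each x gives the point (p, f*(p)) = (f'(x), x f'(x) - f(x)) on the graph of f*.
    return [(f_prime(x), x * f_prime(x) - f(x)) for x in xs]


xs = [-2.0 + 0.5 * k for k in range(9)]  # x = -2.0, -1.5, ..., 2.0
curve = parametric_legendre(math.cosh, math.sinh, xs)

for p, value in curve:
    exact = p * math.asinh(p) - math.sqrt(1.0 + p * p)  # conjugate of cosh
    print(f"p = {p:+.4f}: parametric {value:+.6f}, closed form {exact:+.6f}")
```

No inversion of <math>f'</math> is needed to draw the curve, which is the practical appeal of the parametric form.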

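In some cases (e.g., thermodynamic potentials, below), a non-standard requirement is used, amounting to an alternative definition of <math>f^*</math> with a minus sign,

<math display="block">f(x) - f^*(p) = xp,</math>

that is, <math>f^*(p) = f(x) - xp</math>.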

Definition in physical contexts

In analytical mechanics and thermodynamics, the Legendre transformation is usually defined as follows: suppose <math>f</math> is a function of <math>x</math>; then we have

<math display="block">df = \frac{df}{dx} \, dx.</math>

Performing the Legendre transformation on this function means that we take <math>p = \frac{df}{dx}</math> as the independent variable, so that the above expression can be written as

<math display="block">df = p \, dx,</math>

and according to the product rule <math>d(uv) = u \, dv + v \, du,</math> we then have

<math display="block">d \left(x p - f \right) = x \, dp + p \, dx - df = x \, dp,</math>

and taking <math>f^* = xp - f,</math> we have <math>df^* = x \, dp,</math> which means

<math display="block">\frac{df^*}{dp} = x.</math>

When <math>f</math> is a function of <math>n</math> variables <math>x_1, x_2, \cdots, x_n</math>, we can perform the Legendre transformation on any one or several of these variables: we have

<math display="block">d f = p_1 \, dx_1 + p_2 \, dx_2 + \cdots + p_n \, dx_n,</math>

where <math>p_i = \frac{\partial f}{\partial x_i}.</math> To perform the Legendre transformation on, e.g., <math>x_1</math>, we take <math>p_1</math> together with <math>x_2, \cdots, x_n</math> as independent variables, and by the product (Leibniz) rule we have

<math display="block">d (f - x_1 p_1) = -x_1 \, dp_1 + p_2 \, dx_2 + \cdots + p_n \, dx_n.</math>

So for the function <math>\varphi(p_1, x_2, \cdots, x_n) = f(x_1, x_2, \cdots, x_n) - x_1 p_1,</math> we have

<math display="block">\frac{\partial \varphi}{\partial p_1} = -x_1,\quad \frac{\partial \varphi}{\partial x_2} = p_2,\quad \cdots,\quad \frac{\partial \varphi}{\partial x_n} = p_n.</math>

We can also do this transformation for variables <math>x_2, \cdots, x_n</math>. If we do it to all the variables, then we have

<math display="block">d\varphi = -x_1 \, dp_1 - x_2 \, dp_2 - \cdots - x_n \, dp_n </math> where <math>\varphi = f - x_1 p_1 - x_2 p_2 - \cdots - x_n p_n. </math>
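The partial transform in <math>x_1</math> can be checked numerically. The Python sketch below assumes the illustrative function <math>f(x_1,x_2) = \tfrac12 x_1^2 + x_1x_2 + x_2^2</math>, for which <math>p_1 = x_1 + x_2</math> can be solved as <math>x_1 = p_1 - x_2</math>; it then estimates the partial derivatives of <math>\varphi(p_1, x_2) = f - x_1 p_1</math> by central differences and compares them with <math>-x_1</math> and <math>p_2 = x_1 + 2x_2</math>. The helper names are hypothetical.

```python
def f(x1, x2):
    # Illustrative function, convex in x1.
    return 0.5 * x1 ** 2 + x1 * x2 + x2 ** 2


def x1_from(p1, x2):
    # Solve p1 = df/dx1 = x1 + x2 for x1.
    return p1 - x2


def phi(p1, x2):
    # phi(p1, x2) = f(x1, x2) - x1 * p1, expressed in the new variables (p1, x2).
    x1 = x1_from(p1, x2)
    return f(x1, x2) - x1 * p1


def partial(func, args, i, h=1e-6):
    # Central difference of func with respect to its i-th argument.
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2.0 * h)


p1, x2 = 1.3, -0.4
x1 = x1_from(p1, x2)
print(partial(phi, (p1, x2), 0), -x1)          # d(phi)/d(p1) should equal -x1
print(partial(phi, (p1, x2), 1), x1 + 2 * x2)  # d(phi)/d(x2) should equal p2 = df/dx2
```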

In analytical mechanics, this transformation is performed on the variables <math>\dot q_1, \dot q_2, \cdots, \dot q_n </math> of the Lagrangian <math>L(q_1, \cdots, q_n, \dot{q}_1, \cdots, \dot{q}_n) </math>, with conjugate momenta <math>p_i = \frac{\partial L}{\partial \dot q_i}</math> and the sign convention <math>f^* = xp - f</math> used above for one variable, to get the Hamiltonian:

<math display="block">H(q_1, \cdots, q_n, p_1, \cdots, p_n) = \sum_{i=1}^n p_i \dot{q}_i - L(q_1, \cdots, q_n, \dot{q}_1, \cdots, \dot{q}_n). </math>
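As a minimal one-degree-of-freedom sketch, assume <math>L(q, \dot q) = \tfrac12 m\dot q^2 - V(q)</math> with the hypothetical values <math>m = 2</math> and <math>V(q) = q^2/2</math>. Then <math>p = \partial L/\partial \dot q = m\dot q</math> can be inverted as <math>\dot q = p/m</math>, and the formula gives <math>H = p^2/(2m) + V(q)</math>. The Python code below computes <math>H</math> from the definition and compares it with this closed form.

```python
M = 2.0  # assumed mass


def V(q):
    # Assumed potential (harmonic oscillator); any function of q would do here.
    return 0.5 * q * q


def lagrangian(q, qdot):
    return 0.5 * M * qdot ** 2 - V(q)


def hamiltonian(q, p):
    # Invert p = dL/d(qdot) = M * qdot, then apply H = p * qdot - L.
    qdot = p / M
    return p * qdot - lagrangian(q, qdot)


for q, p in ((0.0, 1.0), (1.5, -2.0), (-0.3, 0.7)):
    print(hamiltonian(q, p), p ** 2 / (2 * M) + V(q))  # both columns should agree
```

For Lagrangians that are not quadratic in <math>\dot q</math>, the inversion of <math>p = \partial L/\partial \dot q</math> generally has no closed form and must be done numerically.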

In thermodynamics, this transformation is applied to variables according to the type of thermodynamic system desired; for example, starting from the internal energy <math>U(S,V)</math>, the cardinal function of state in the energy representation, we have

<math display="block">dU = T \, dS - p \, dV,</math>

so we can perform the Legendre transformation on either or both of <math>S, V </math> to yield

<math display="block">\begin{align}
dH &= d(U + pV) = T\,dS + V\,dp \\
dF &= d(U - TS) = -S\,dT - p\,dV \\
dG &= d(U - TS + pV) = -S\,dT + V\,dp,
\end{align}</math>

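and each of these three expressions has a physical meaning: <math>H</math> is the enthalpy, <math>F</math> the Helmholtz free energy, and <math>G</math> the Gibbs free energy.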
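The Helmholtz case can be checked on a toy model. The Python sketch below assumes a fundamental relation of the monatomic-ideal-gas form <math>U(S,V) = V^{-2/3} e^{2S/3}</math>, in hypothetical reduced units where <math>N k_B = 1</math> and the prefactor is 1. Then <math>T = \partial U/\partial S = \tfrac23 U</math> can be solved for <math>S</math> at fixed <math>V</math>, so <math>F(T,V) = U - TS</math> can be evaluated directly; finite differences should give <math>\partial F/\partial T = -S</math> and <math>\partial F/\partial V = -p</math>, matching <math>dF = -S\,dT - p\,dV</math>.

```python
import math


def U(S, V):
    # Assumed toy fundamental relation of monatomic-ideal-gas form, in units with N*k_B = 1.
    return V ** (-2.0 / 3.0) * math.exp(2.0 * S / 3.0)


def S_from(T, V):
    # T = dU/dS = (2/3) U gives U = 3T/2; solving U(S, V) = 3T/2 for S at fixed V.
    return 1.5 * math.log(1.5 * T * V ** (2.0 / 3.0))


def F(T, V):
    # Helmholtz free energy F = U - T S, written as a function of (T, V).
    S = S_from(T, V)
    return U(S, V) - T * S


def partial_derivative(func, args, i, h=1e-6):
    # Central difference of func with respect to its i-th argument.
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2.0 * h)


T, V = 2.0, 3.0
S = S_from(T, V)
pressure = -partial_derivative(U, (S, V), 1)       # p = -dU/dV at fixed S
print(partial_derivative(F, (T, V), 0), -S)         # dF/dT should equal -S
print(partial_derivative(F, (T, V), 1), -pressure)  # dF/dV should equal -p
```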

This definition of the Legendre transformation is the one originally introduced by Legendre in his work in 1787. Geometrically, the Legendre transform of a convex function <math>f</math> encodes the supporting hyperplanes of its epigraph, as a consequence of the following two observations. On the one hand, the hyperplane tangent to the epigraph of <math>f</math> at some point <math>(\mathbf x, f(\mathbf x))\in U\times \mathbb{R}</math>, where <math>U\subseteq\mathbb{R}^n</math> is the domain of <math>f</math>, has normal vector <math>(\nabla f(\mathbf x),-1)\in\mathbb{R}^{n+1}</math>. On the other hand, any closed convex set <math>C\subseteq\mathbb{R}^m</math> can be characterized via the set of its supporting hyperplanes by the equations <math>\mathbf x\cdot\mathbf n = h_C(\mathbf n)</math>, where <math>h_C(\mathbf n)</math> is the support function of <math>C</math>. But the definition of the Legendre transform via maximization matches precisely that of the support function, that is, <math>f^*(\mathbf x)=h_{\operatorname{epi}(f)}(\mathbf x,-1) </math>. We thus conclude that the Legendre transform characterizes the epigraph in the sense that the tangent plane to the epigraph at any point <math>(\mathbf x,f(\mathbf x))</math> is given explicitly by<math display="block">\{\mathbf z\in\mathbb{R}^{n+1}: \,\, \mathbf z\cdot (\nabla f(\mathbf x),-1)= f^*(\nabla f(\mathbf x))\}. </math>
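Both observations can be tested in one dimension (<math>n = 1</math>). The Python sketch below assumes <math>f(x) = x^2</math>, so <math>f^*(s) = s^2/4</math>; it approximates the support function of the epigraph in the direction <math>(s, -1)</math> by sampling points <math>(y, t)</math> with <math>t \ge f(y)</math>, and checks that every point <math>\mathbf z</math> on the tangent line at <math>(x_0, f(x_0))</math> satisfies <math>\mathbf z \cdot (f'(x_0), -1) = f^*(f'(x_0))</math>. The sampling grid, the lift values, and <math>x_0 = 0.75</math> are arbitrary choices.

```python
def f(x):
    return x * x  # example convex function, f'(x) = 2x


def f_star(s):
    return s * s / 4.0  # conjugate of x^2


def support_of_epigraph(s, ys, lifts=(0.0, 0.5, 2.0)):
    # h_epi(f)(s, -1) = sup over epigraph points (y, t), t >= f(y), of s*y - t.
    # Sampled on a grid; lifting t above f(y) can only lower the value.
    return max(s * y - (f(y) + u) for y in ys for u in lifts)


ys = [-5.0 + 0.001 * k for k in range(10001)]
x0 = 0.75
s = 2 * x0  # normal vector of the tangent line is (f'(x0), -1)
print(support_of_epigraph(s, ys), f_star(s))

# Every point z on the tangent line through (x0, f(x0)) satisfies z . (s, -1) = f*(s).
for t in (-1.0, 0.0, 2.5):
    z = (x0 + t, f(x0) + s * t)
    print(z[0] * s - z[1], f_star(s))
```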

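Alternatively, if <math>X</math> is a vector space and <math>Y</math> is its dual vector space, then for each point <math>x</math> of <math>X</math> and <math>y</math> of <math>Y</math>, there is a natural identification of the cotangent spaces <math>T^*_x X</math> with <math>Y</math> and <math>T^*_y Y</math> with <math>X</math>. If <math>f</math> is a real differentiable function over <math>X</math>, then its exterior derivative, <math>df</math>, is a section of the cotangent bundle <math>T^*X</math> and as such, we can construct a map from <math>X</math> to <math>Y</math>. Similarly, if <math>g</math> is a real differentiable function over <math>Y</math>, then <math>dg</math> defines a map from <math>Y</math> to <math>X</math>. If both maps happen to be inverses of each other, we say we have a Legendre transform. The notion of the tautological one-form is commonly used in this setting.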
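In coordinates, with <math>X = \R^n</math> and its dual identified with <math>\R^n</math> through the dot product (an assumption of this sketch), the two maps are just gradients. The Python example below takes the illustrative pair <math>f(\mathbf x) = \sum_i e^{x_i}</math> and <math>g(\mathbf y) = \sum_i (y_i \ln y_i - y_i)</math> on the positive orthant, computes both gradients by central differences, and checks that applying <math>dg</math> after <math>df</math> returns the starting point and that <math>g(df(\mathbf x)) = \langle df(\mathbf x), \mathbf x\rangle - f(\mathbf x)</math>. The helper name `grad` is hypothetical.

```python
import math


def f(x):
    return sum(math.exp(xi) for xi in x)


def g(y):
    # Legendre transform of f, defined for y with all components > 0.
    return sum(yi * math.log(yi) - yi for yi in y)


def grad(func, point, h=1e-6):
    # Numerical gradient: the components of the differential d(func) at `point`,
    # read as a vector via the dot-product pairing on R^n.
    out = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        out.append((func(up) - func(down)) / (2.0 * h))
    return out


x = [0.3, -1.2, 2.0]
y = grad(f, x)        # df maps X -> Y
x_back = grad(g, y)   # dg maps Y -> X
print(y)
print(x_back)         # recovers x up to finite-difference error
print(g(y), sum(yi * xi for yi, xi in zip(y, x)) - f(x))  # g(df(x)) = <df(x), x> - f(x)
```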

When the function is not differentiable, the Legendre transform can still be extended; the extension is known as the Legendre–Fenchel transformation. In this more general setting, a few properties are lost: for example, the Legendre transform is no longer its own inverse unless extra assumptions, such as convexity, are made.

Legendre transformation on manifolds

Let <math display="inline">M</math> be a smooth manifold, let <math>E</math> and <math display="inline">\pi : E\to M</math> be a vector bundle on <math>M</math> and its associated bundle projection, respectively. Let <math display="inline">L : E\to \R</math> be a smooth function. We think of <math display="inline">L</math> as a Lagrangian by analogy with the classical case where <math display="inline">M = \R</math>, <math display="inline">E = TM = \Reals \times \Reals </math> and <math display="inline">L(x,v) = \frac 1 2 m v^2 - V(x)</math> for some positive number <math display="inline">m\in \Reals</math> and function <math display="inline">V : M \to \Reals</math>.

As usual, the dual of <math display="inline">E</math> is denoted by <math display="inline">E^*</math>. The fiber of <math display="inline">\pi</math> over <math display="inline">x\in M</math> is denoted <math display="inline">E_x</math>, and the restriction of <math display="inline">L</math> to <math display="inline">E_x</math> is denoted by <math display="inline">L|_{E_x} : E_x\to \R</math>. The Legendre transformation of <math display="inline">L</math> is the smooth morphism<math display="block">\mathbf F L : E \to E^*</math> defined by <math display="inline">\mathbf FL(v) = d(L|_{E_x})_v \in E_x^*</math>, where <math display="inline">x = \pi(v)</math>. Here we use the fact that since <math display="inline">E_x</math> is a vector space, <math display="inline">T_v(E_x)</math> can be identified with <math display="inline">E_x</math>.

In other words, <math display="inline">\mathbf FL(v)\in E_x^*</math> is the covector that sends <math display="inline">w\in E_x</math> to the directional derivative <math display="inline">\left.\frac d {dt}\right|_{t=0} L(v + tw)\in \R</math>.
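This covector description can be checked in a local trivialization. The Python sketch below assumes a hypothetical Lagrangian on a rank-2 bundle over <math>M = \R</math>, <math>L(x; v_1, v_2) = \tfrac12(1 + x^2)(v_1^2 + v_2^2) + \tfrac14 v_1^4 - \cos x</math>, computes the fiber derivative <math>p_i = \partial L/\partial v_i</math> by hand, and compares <math>\textstyle\sum_i p_i w_i</math> with a central-difference estimate of <math>\left.\frac d{dt}\right|_{t=0} L(v + tw)</math> for a few directions <math>w</math>, keeping the base point <math>x</math> fixed.

```python
import math


def L(x, v):
    # Hypothetical Lagrangian in a local trivialization:
    # x is the base coordinate, v = (v1, v2) the fiber coordinates.
    v1, v2 = v
    return 0.5 * (1.0 + x * x) * (v1 * v1 + v2 * v2) + 0.25 * v1 ** 4 - math.cos(x)


def fiber_derivative(x, v):
    # p_i = dL/dv_i at fixed x, computed by hand for this particular L.
    v1, v2 = v
    return ((1.0 + x * x) * v1 + v1 ** 3, (1.0 + x * x) * v2)


def directional(x, v, w, h=1e-6):
    # d/dt L(x; v + t w) at t = 0, by a central difference in t (x stays fixed).
    plus = (v[0] + h * w[0], v[1] + h * w[1])
    minus = (v[0] - h * w[0], v[1] - h * w[1])
    return (L(x, plus) - L(x, minus)) / (2.0 * h)


x, v = 0.8, (0.5, -1.5)
p = fiber_derivative(x, v)
for w in ((1.0, 0.0), (0.0, 1.0), (2.0, -0.7)):
    # The covector FL(v) = p sends w to p . w, which should match the directional derivative.
    print(directional(x, v, w), p[0] * w[0] + p[1] * w[1])
```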

To describe the Legendre transformation locally, let <math display="inline">U\subseteq M</math> be a coordinate chart over which <math display="inline">E</math> is trivial. Picking a trivialization of <math display="inline">E</math> over <math display="inline">U</math>, we obtain charts <math display="inline">E_U \cong U \times \R^r</math> and <math display="inline">E_U^* \cong U \times \R^r</math>. In terms of these charts, we have <math display="inline">\mathbf FL(x; v_1, \dotsc, v_r) = (x; p_1,\dotsc, p_r)</math>, where <math display="block">p_i = \frac {\partial L}{\partial v_i}(x; v_1, \dotsc, v_r)</math> for all <math display="inline">i = 1, \dots, r</math>. If, as in the classical case, the restriction of <math display="inline">L : E\to \mathbb R</math> to each fiber <math display="inline">E_x</math> is strictly convex and bounded below by a positive definite quadratic form minus a constant, then the Legendre transform <math display="inline">\mathbf FL : E\to E^*</math> is a diffeomorphism. Suppose that <math display="inline">\mathbf FL</math> is a diffeomorphism and let <math display="inline">H : E^* \to \R</math> be the "Hamiltonian" function defined by <math display="block">H(p) = p \cdot v - L(v),</math> where <math display="inline">v = (\mathbf FL)^{-1}(p)</math>. Using the natural isomorphism <math display="inline">E\cong E^{**}</math>, we may view the Legendre transformation of <math display="inline">H</math> as a map <math display="inline">\mathbf FH : E^* \to E</math>. Then we have