Implicit function theorem

In multivariable calculus, the implicit function theorem is a theorem that provides sufficient conditions under which a planar curve specified by <math>F(x,y)=0</math> can also be specified as the graph of a function <math>f</math>, so that for each point <math>(x,y)</math> on part of the curve, we have <math>y = f(x)</math>. An example is the unit circle, whose points <math>(x,y)</math> satisfy <math>x^2+y^2-1=0</math>, which can locally be solved (if <math>y > 0</math>) by <math>y=\sqrt{1-x^2}</math>, expressing the top semicircle as a graph. It is not always possible to solve the equation <math>F(x,y)=0</math> for <math>y</math> algebraically, and the implicit function theorem gives analytic conditions under which there exists a function <math>f</math> whose graph belongs to the given curve, and, in some formulations, also gives a way of constructing approximations to <math>f</math>.

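More generally, given a system of <math>m</math> equations <math>f_i(x_1, \ldots, x_n, y_1, \ldots, y_m) = 0,</math> <math>i = 1, \ldots, m</math> (often abbreviated into <math>F(\mathbf{x}, \mathbf{y}) = \mathbf{0}</math>), the theorem states that, under a mild condition on the partial derivatives (with respect to each <math>y_i</math>) at a point, the <math>m</math> variables <math>y_i</math> are differentiable functions of the <math>x_j</math> in some neighbourhood of the point. As these functions generally cannot be expressed in closed form, they are implicitly defined by the equations, and this motivated the name of the theorem.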

In other words, under a mild condition on the partial derivatives, the set of zeros of a system of equations is locally the graph of a function.

History

Augustin-Louis Cauchy (1789–1857) is credited with the first rigorous form of the implicit function theorem. Ulisse Dini (1845–1918) generalized the real-variable version of the implicit function theorem to the context of functions of any number of real variables.

Two variables case

Let <math>f:\R^2 \to \R</math> be a continuously differentiable function defining the implicit equation of a curve <math> f(x,y) = 0 </math>. Let <math>(x_0, y_0)</math> be a point on the curve, that is, a point such that <math>f(x_0, y_0)=0</math>. In this simple case, the implicit function theorem can be stated as follows:

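Theorem. If <math>\frac{\partial f}{\partial y}(x_0, y_0) \neq 0</math>, then there exists a unique differentiable function <math>\varphi</math>, defined in a neighbourhood of <math>x_0</math>, such that <math>\varphi(x_0) = y_0</math> and <math>f(x, \varphi(x)) = 0</math> in that neighbourhood.

Proof. By differentiating the equation <math>f(x, \varphi(x)) = 0</math>, one gets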

<math display=block>\frac{\partial f}{ \partial x}(x, \varphi(x))+\varphi'(x)\, \frac{\partial f}{ \partial y}(x, \varphi(x))=0. </math>

and thus

<math display=block>\varphi'(x)=-\frac{\frac{\partial f}{ \partial x}(x, \varphi(x))}{\frac{\partial f}{ \partial y}(x, \varphi(x))}.</math>

This gives an ordinary differential equation for <math>\varphi</math>, with the initial condition <math>\varphi(x_0) = y_0</math>.

Since <math display=inline>\frac{\partial f}{ \partial y} (x_0, y_0) \neq 0</math> and the partial derivatives of <math display=inline>f</math> are continuous, <math display=inline>\frac{\partial f}{ \partial y}</math> is non-zero in a neighbourhood of <math display=inline>(x_0, y_0)</math>, so the right-hand side of the differential equation is continuous there. Hence, the Peano existence theorem applies, so there is a (possibly non-unique) solution. To see why <math display=inline> \varphi </math> is unique, note that for each fixed <math display=inline>x</math> near <math display=inline>x_0</math> the function <math display=inline> g_x(y)=f(x,y) </math> is strictly monotone in <math display=inline>y</math> near <math display=inline>y_0</math> (as <math display=inline>\frac{\partial f}{ \partial y} \neq 0</math> there), thus it is injective. If <math display=inline> \varphi,\phi </math> are solutions to the differential equation, then <math display=inline> g_x(\varphi(x))=g_x(\phi(x))=0 </math> and by injectivity we get <math display=inline> \varphi(x)=\phi(x) </math>.
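The differential equation in the proof can also be explored numerically. The following is a minimal sketch, not part of the proof: it assumes the unit-circle function <math>f(x,y) = x^2 + y^2 - 1</math> with the starting point <math>(x_0, y_0) = (0, 1)</math>, and integrates <math>\varphi'(x) = -f_x/f_y</math> with a classical fourth-order Runge–Kutta scheme. The names <code>slope</code> and <code>integrate_phi</code> are illustrative choices.

```python
import math

def f_x(x, y):
    # Partial derivative of f(x, y) = x^2 + y^2 - 1 with respect to x
    return 2.0 * x

def f_y(x, y):
    # Partial derivative of f with respect to y (non-zero near (0, 1))
    return 2.0 * y

def slope(x, y):
    # Right-hand side of the ODE from the proof: phi'(x) = -f_x / f_y
    return -f_x(x, y) / f_y(x, y)

def integrate_phi(x0, y0, x_end, steps=1000):
    """Integrate phi' = slope(x, phi) from x0 to x_end with classical RK4."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = slope(x, y)
        k2 = slope(x + h / 2, y + h * k1 / 2)
        k3 = slope(x + h / 2, y + h * k2 / 2)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

# Follow the implicit function from (0, 1) to x = 0.5 and compare
# with the explicit upper-semicircle formula sqrt(1 - x^2).
phi_half = integrate_phi(0.0, 1.0, 0.5)
exact = math.sqrt(1 - 0.5 ** 2)
print(phi_half, exact)
```

The integration is only meaningful while <math>f_y</math> stays away from zero; near <math>x = \pm 1</math> the slope blows up, which matches the failure of the hypothesis there.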

First example

Figure: The unit circle, with implicit equation <math>x^2 + y^2 - 1 = 0</math>, cannot be represented as the graph of a function. Around a point where the tangent is not vertical, the bolded circular arc is the graph of some function of <math>x</math>, while around a point where the tangent is vertical, there is no function of <math>x</math> with the circle as its graph. This is exactly what the implicit function theorem asserts in this case.

If we define the function <math>f(x,y) = x^2 + y^2</math>, then the equation <math>f(x,y) = 1</math> cuts out the unit circle as the level set <math>\{(x,y) \mid f(x,y) = 1\}</math>. There is no way to represent the unit circle as the graph of a function of one variable <math>y = g(x)</math> because for each choice of <math>x \in (-1, 1)</math>, there are two choices of <math>y</math>, namely <math>\pm\sqrt{1-x^2}</math>.

However, it is possible to represent part of the circle as the graph of a function of one variable. If we let <math>g_1(x) = \sqrt{1-x^2}</math> for <math>-1 \leq x \leq 1</math>, then the graph of <math>y = g_1(x)</math> provides the upper half of the circle. Similarly, if <math>g_2(x) = -\sqrt{1-x^2}</math>, then the graph of <math>y = g_2(x)</math> gives the lower half of the circle.

The implicit function theorem says that under some mild assumptions, functions like <math>g_1(x)</math> and <math>g_2(x)</math> always exist locally, even in situations where they cannot be written down by explicit formulas. It guarantees that <math>g_1(x)</math> and <math>g_2(x)</math> are differentiable for <math>-1 < x < 1</math>, and it even works in situations where we do not have a formula for <math>f(x,y)</math>.
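To illustrate a situation without a convenient explicit formula, here is a small sketch that evaluates an implicitly defined function numerically. The equation <math>y^5 + y - x = 0</math> is a hypothetical example chosen for this illustration (it does not appear in the text above); since <math>\partial f/\partial y = 5y^4 + 1</math> is never zero, the theorem's hypothesis holds at every point of the curve. The helper name <code>implicit_g</code> and the use of Newton's method are assumptions of the sketch.

```python
def f(x, y):
    # Hypothetical example: the curve y^5 + y - x = 0
    return y**5 + y - x

def df_dy(x, y):
    # Partial derivative with respect to y; always >= 1, so never zero
    return 5 * y**4 + 1

def implicit_g(x, y_guess=0.0, tol=1e-12, max_iter=100):
    """Approximate y = g(x) with f(x, g(x)) = 0 by Newton's method in y."""
    y = y_guess
    for _ in range(max_iter):
        step = f(x, y) / df_dy(x, y)
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")

# At x = 2 the solution can be checked by hand: 1^5 + 1 - 2 = 0.
y_at_2 = implicit_g(2.0)
print(y_at_2)
```

The function <code>implicit_g</code> plays the role of <math>g_1</math> or <math>g_2</math> above: it exists and is differentiable by the theorem, even though it is only available through a numerical procedure.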

General case

Let <math>f: \R^{n+m} \to \R^m</math> be a continuously differentiable function. We think of <math>\R^{n+m}</math> as the Cartesian product <math>\R^n\times\R^m,</math> and we write a point of this product as <math>(\mathbf{x}, \mathbf{y}) = (x_1,\ldots, x_n, y_1, \ldots, y_m).</math> Starting from the given function <math>f</math>, our goal is to construct a function <math>g: \R^n \to \R^m</math> whose graph <math>(\textbf{x}, g(\textbf{x}))</math> is precisely the set of all <math>(\textbf{x}, \textbf{y})</math> such that <math>f(\textbf{x}, \textbf{y}) = \textbf{0}</math>.

As noted above, this may not always be possible. We will therefore fix a point <math>(\textbf{a}, \textbf{b}) = (a_1, \dots, a_n, b_1, \dots, b_m)</math> which satisfies <math>f(\textbf{a}, \textbf{b}) = \textbf{0}</math>, and we will ask for a <math>g</math> that works near the point <math>(\textbf{a}, \textbf{b})</math>. In other words, we want an open set <math>U \subset \R^n</math> containing <math>\textbf{a}</math>, an open set <math>V \subset \R^m</math> containing <math>\textbf{b}</math>, and a function <math>g : U \to V</math> such that the graph of <math>g</math> satisfies the relation <math>f = \textbf{0}</math> on <math>U\times V</math>, and that no other points within <math>U \times V</math> do so. In symbols,

<math display="block">\{ (\mathbf{x}, g(\mathbf{x})) \mid \mathbf x \in U \} = \{ (\mathbf{x}, \mathbf{y})\in U \times V \mid f(\mathbf{x}, \mathbf{y}) = \mathbf{0} \}.</math>

To state the implicit function theorem, we need the Jacobian matrix of <math>f</math>, which is the matrix of the partial derivatives of <math>f</math>. Abbreviating <math>(a_1, \dots, a_n, b_1, \dots, b_m)</math> to <math>(\textbf{a}, \textbf{b})</math>, the Jacobian matrix is

<math display="block">(Df)(\mathbf{a},\mathbf{b}) = \left[\begin{array}{ccc|ccc}
\frac{\partial f_1}{\partial x_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_1}{\partial x_n}(\mathbf{a},\mathbf{b}) & \frac{\partial f_1}{\partial y_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_1}{\partial y_m}(\mathbf{a},\mathbf{b}) \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_m}{\partial x_n}(\mathbf{a},\mathbf{b}) & \frac{\partial f_m}{\partial y_1}(\mathbf{a},\mathbf{b}) & \cdots & \frac{\partial f_m}{\partial y_m}(\mathbf{a},\mathbf{b})
\end{array}\right] = \left[\begin{array}{c|c} X & Y \end{array}\right]</math>

where <math>X</math> is the matrix of partial derivatives in the variables <math>x_i</math> and <math>Y</math> is the matrix of partial derivatives in the variables <math>y_j</math>. The implicit function theorem says that if <math>Y</math> is an invertible matrix, then there are <math>U</math>, <math>V</math>, and <math>g</math> as desired. Writing all the hypotheses together gives the following statement.

Statement of the theorem

Let <math>f: \R^{n+m} \to \R^m</math> be a continuously differentiable function, and let <math>\R^{n+m}</math> have coordinates <math>(\textbf{x}, \textbf{y})</math>. Fix a point <math>(\textbf{a}, \textbf{b}) = (a_1,\dots,a_n, b_1,\dots, b_m)</math> with <math>f(\textbf{a}, \textbf{b}) = \mathbf{0}</math>, where <math>\mathbf{0} \in \R^m</math> is the zero vector. If the Jacobian matrix (this is the right-hand panel of the Jacobian matrix shown in the previous section):

<math display="block">J_{f, \mathbf{y}} (\mathbf{a}, \mathbf{b}) = \left [ \frac{\partial f_i}{\partial y_j} (\mathbf{a}, \mathbf{b}) \right ]</math>

is invertible, then there exists an open set <math>U \subset \R^n</math> containing <math>\textbf{a}</math> such that there exists a unique function <math>g: U \to \R^m</math> such that <math>g(\mathbf{a}) = \mathbf{b}</math>, and <math>f(\mathbf{x}, g(\mathbf{x})) = \mathbf{0}</math> for all <math>\mathbf{x} \in U</math>. Moreover, <math>g</math> is continuously differentiable and, denoting the left-hand panel of the Jacobian matrix shown in the previous section as:

<math display="block">J_{f, \mathbf{x}} (\mathbf{a}, \mathbf{b}) = \left [ \frac{\partial f_i}{\partial x_j} (\mathbf{a}, \mathbf{b}) \right ],</math>

the Jacobian matrix of partial derivatives of <math>g</math> in <math>U</math> is given by the matrix product:

<math display="block">\left[\frac{\partial g_i}{\partial x_j} (\mathbf{x})\right]_{m\times n} =- \left [ J_{f, \mathbf{y}}(\mathbf{x}, g(\mathbf{x})) \right ]_{m \times m} ^{-1} \, \left [ J_{f, \mathbf{x}}(\mathbf{x}, g(\mathbf{x})) \right ]_{m \times n}</math>
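The derivative formula can be checked on a small system. The sketch below is an illustration under stated assumptions rather than part of the theorem: it takes <math>n = 1</math>, <math>m = 2</math> and the hypothetical system <math>f(x, y_1, y_2) = (x^2 + y_1^2 + y_2^2 - 3,\; x + y_1 - y_2 - 1)</math>, which vanishes at <math>(a, \mathbf{b}) = (1, (1, 1))</math>. It evaluates <math>-J_{f,\mathbf{y}}^{-1} J_{f,\mathbf{x}}</math> at that point and compares the result with a finite-difference derivative of <math>g</math>, where <math>g</math> itself is computed by Newton's method. All function names are illustrative.

```python
def f(x, y1, y2):
    # Hypothetical system f : R^1 x R^2 -> R^2 with f(1, 1, 1) = (0, 0)
    return (x**2 + y1**2 + y2**2 - 3.0, x + y1 - y2 - 1.0)

def J_x(x, y1, y2):
    # Left-hand panel: partial derivatives with respect to x (2 x 1)
    return ((2.0 * x,), (1.0,))

def J_y(x, y1, y2):
    # Right-hand panel: partial derivatives with respect to y1, y2 (2 x 2)
    return ((2.0 * y1, 2.0 * y2), (1.0, -1.0))

def solve_2x2(A, r):
    """Solve A z = r for a 2 x 2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    z1 = (r[0] * A[1][1] - A[0][1] * r[1]) / det
    z2 = (A[0][0] * r[1] - r[0] * A[1][0]) / det
    return (z1, z2)

def g(x, y_start=(1.0, 1.0), iterations=30):
    """Approximate the implicit function g(x) by Newton's method in y."""
    y1, y2 = y_start
    for _ in range(iterations):
        d1, d2 = solve_2x2(J_y(x, y1, y2), f(x, y1, y2))
        y1, y2 = y1 - d1, y2 - d2
    return (y1, y2)

a, b = 1.0, (1.0, 1.0)

# Formula from the theorem: Dg(a) = -J_y(a, b)^{-1} J_x(a, b)
X_col = tuple(row[0] for row in J_x(a, *b))
Dg_formula = tuple(-v for v in solve_2x2(J_y(a, *b), X_col))

# Independent check: central finite difference of the Newton-computed g
h = 1e-6
g_plus, g_minus = g(a + h), g(a - h)
Dg_numeric = tuple((p - q) / (2 * h) for p, q in zip(g_plus, g_minus))

print(Dg_formula, Dg_numeric)
```

At the chosen point the formula gives <math>g'(1) = (-1, 0)</math>, which can also be verified by differentiating the two equations by hand.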

A proof may be found in the inverse function theorem article. Here, the two-dimensional case is detailed.

Higher derivatives

If, moreover, <math>f</math> is analytic or continuously differentiable <math>k</math> times in a neighbourhood of <math>(\textbf{a}, \textbf{b})</math>, then one may choose <math>U</math> in order that the same holds true for <math>g</math> inside <math>U</math>. In the analytic case, this is called the analytic implicit function theorem.

The circle example

Let us go back to the example of the unit circle. In this case n = m = 1 and <math>f(x,y) = x^2 + y^2 - 1</math>. The matrix of partial derivatives is just a 1 × 2 matrix, given by

<math display="block">(Df)(a,b) = \begin{bmatrix} \dfrac{\partial f}{\partial x}(a,b) & \dfrac{\partial f}{\partial y}(a,b) \end{bmatrix} = \begin{bmatrix} 2a & 2b \end{bmatrix}</math>

Thus, here, the <math>Y</math> in the statement of the theorem is just the number <math>2b</math>; the linear map defined by it is invertible if and only if <math>b \neq 0</math>. By the implicit function theorem we see that we can locally write the circle in the form <math>y = g(x)</math> for all points where <math>y \neq 0</math>. For <math>(\pm 1, 0)</math> we run into trouble, as noted before. The implicit function theorem may still be applied to these two points, by writing <math>x</math> as a function of <math>y</math>, that is, <math>x = h(y)</math>; now the graph of the function will be <math>\left(h(y), y\right)</math>, since where <math>b = 0</math> we have <math>a = \pm 1</math>, so that <math>\tfrac{\partial f}{\partial x}(a,b) = 2a \neq 0</math> and the conditions to locally express the function in this form are satisfied.

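The implicit derivative of <math>y</math> with respect to <math>x</math>, and that of <math>x</math> with respect to <math>y</math>, can be found by totally differentiating the implicit function <math>x^2+y^2-1</math> and equating to 0:

<math display="block">2x\, dx + 2y\, dy = 0,</math>

giving

<math display="block">\frac{dy}{dx} = -\frac{x}{y}</math>

and

<math display="block">\frac{dx}{dy} = -\frac{y}{x}.</math>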

Application: change of coordinates

Suppose we have an <math>m</math>-dimensional space, parametrised by a set of coordinates <math> (x_1,\ldots,x_m) </math>. We can introduce a new coordinate system <math> (x'_1,\ldots,x'_m) </math> by supplying <math>m</math> functions <math> h_1, \ldots, h_m </math>, each being continuously differentiable. These functions allow us to calculate the new coordinates <math> (x'_1,\ldots,x'_m) </math> of a point, given the point's old coordinates <math> (x_1,\ldots,x_m) </math>, using <math> x'_1=h_1(x_1,\ldots,x_m), \ldots, x'_m=h_m(x_1,\ldots,x_m) </math>. One might want to verify whether the opposite is possible: given coordinates <math> (x'_1,\ldots,x'_m) </math>, can we 'go back' and calculate the same point's original coordinates <math> (x_1,\ldots,x_m) </math>? The implicit function theorem will provide an answer to this question. The (new and old) coordinates <math>(x'_1,\ldots,x'_m, x_1,\ldots,x_m)</math> are related by <math>f = 0</math>, with

<math display="block">f(x'_1,\ldots,x'_m,x_1,\ldots, x_m)=(h_1(x_1,\ldots, x_m)-x'_1,\ldots , h_m(x_1,\ldots, x_m)-x'_m).</math>

Now the Jacobian matrix of f at a certain point (a, b) [ where <math>a=(x'_1,\ldots,x'_m), b=(x_1,\ldots,x_m)</math> ] is given by

<math display="block">(Df)(a,b) = \left [\begin{matrix}
-1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & -1
\end{matrix}\left|
\begin{matrix}
\frac{\partial h_1}{\partial x_1}(b) & \cdots & \frac{\partial h_1}{\partial x_m}(b)\\
\vdots & \ddots & \vdots\\
\frac{\partial h_m}{\partial x_1}(b) & \cdots & \frac{\partial h_m}{\partial x_m}(b)
\end{matrix} \right.\right] = [-I_m | J ],</math>

where <math>I_m</math> denotes the m × m identity matrix, and <math>J</math> is the m × m matrix of partial derivatives, evaluated at (a, b). (In the above, these blocks were denoted by X and Y. As it happens, in this particular application of the theorem, neither matrix depends on a.) The implicit function theorem now states that we can locally express <math> (x_1,\ldots,x_m) </math> as a function of <math> (x'_1,\ldots,x'_m) </math> if J is invertible. Demanding that J be invertible is equivalent to det J ≠ 0, thus we see that we can go back from the primed to the unprimed coordinates if the determinant of the Jacobian J is non-zero. This statement is also known as the inverse function theorem.

Example: polar coordinates

As a simple application of the above, consider the plane, parametrised by polar coordinates <math>(R, \theta)</math>. We can go to a new coordinate system (Cartesian coordinates) by defining functions <math>x(R,\theta) = R\cos\theta</math> and <math>y(R,\theta) = R\sin\theta</math>. This makes it possible, given any point <math>(R, \theta)</math>, to find the corresponding Cartesian coordinates <math>(x, y)</math>. When can we go back and convert Cartesian into polar coordinates? By the previous example, it is sufficient to have <math>\det J \neq 0</math>, with

<math display="block">J =\begin{bmatrix}
\frac{\partial x(R,\theta)}{\partial R} & \frac{\partial x(R,\theta)}{\partial \theta} \\
\frac{\partial y(R,\theta)}{\partial R} & \frac{\partial y(R,\theta)}{\partial \theta}
\end{bmatrix}=
\begin{bmatrix}
\cos \theta & -R \sin \theta \\
\sin \theta & R \cos \theta
\end{bmatrix}.</math>

Since <math>\det J = R</math>, conversion back to polar coordinates is possible if <math>R \neq 0</math>. So it remains to check the case <math>R = 0</math>. It is easy to see that in case <math>R = 0</math>, our coordinate transformation is not invertible: at the origin, the value of θ is not well-defined.
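The polar-coordinate discussion can be sketched in a few lines of code. This is a minimal illustration: the function names are hypothetical, and the local inverse uses the standard-library <code>math.atan2</code>, which fixes the branch of θ in <math>(-\pi, \pi]</math> (one possible choice of local inverse among many).

```python
import math

def to_cartesian(R, theta):
    # x(R, theta) = R cos(theta), y(R, theta) = R sin(theta)
    return (R * math.cos(theta), R * math.sin(theta))

def jacobian_det(R, theta):
    # det [[cos t, -R sin t], [sin t, R cos t]], which simplifies to R
    return (math.cos(theta) * (R * math.cos(theta))
            - (-R * math.sin(theta)) * math.sin(theta))

def to_polar(x, y):
    """Local inverse of to_cartesian, valid away from the origin."""
    R = math.hypot(x, y)
    if R == 0.0:
        # det J = R = 0 here: theta is not well-defined at the origin
        raise ValueError("theta is undefined at the origin")
    return (R, math.atan2(y, x))

# Round trip for a sample point with R != 0
R0, theta0 = 2.0, 0.75
x0, y0 = to_cartesian(R0, theta0)
R1, theta1 = to_polar(x0, y0)
print(jacobian_det(R0, theta0), (R1, theta1))
```

The explicit error at the origin mirrors the failure of the invertibility condition there.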

Generalizations

Banach space version

Based on the inverse function theorem in Banach spaces, it is possible to extend the implicit function theorem to Banach space valued mappings.

Let X, Y, Z be Banach spaces. Let the mapping <math>f : X \times Y \to Z</math> be continuously Fréchet differentiable. If <math>(x_0,y_0)\in X\times Y</math>, <math>f(x_0,y_0)=0</math>, and <math>y\mapsto Df(x_0,y_0)(0,y)</math> is a Banach space isomorphism from Y onto Z, then there exist neighbourhoods U of <math>x_0</math> and V of <math>y_0</math> and a Fréchet differentiable function g : U → V such that f(x, g(x)) = 0 and f(x, y) = 0 if and only if y = g(x), for all <math>(x,y)\in U\times V</math>.

Implicit functions from non-differentiable functions

Various forms of the implicit function theorem exist for the case when the function f is not differentiable. It is standard that local strict monotonicity suffices in one dimension. The following more general form was proven by Kumagai based on an observation by Jittorntrum.

Consider a continuous function <math>f : \R^n \times \R^m \to \R^n</math> such that <math>f(x_0, y_0) = 0</math>. If there exist open neighbourhoods <math>A \subset \R^n</math> and <math>B \subset \R^m</math> of <math>x_0</math> and <math>y_0</math>, respectively, such that, for all <math>y</math> in <math>B</math>, <math>f(\cdot, y) : A \to \R^n</math> is locally one-to-one, then there exist open neighbourhoods <math>A_0 \subset \R^n</math> and <math>B_0 \subset \R^m</math> of <math>x_0</math> and <math>y_0</math>, such that, for all <math>y \in B_0</math>, the equation <math>f(x, y) = 0</math> has a unique solution

<math display="block">x = g(y) \in A_0,</math>

where <math>g</math> is a continuous function from <math>B_0</math> into <math>A_0</math>.
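A one-dimensional sketch may help make this concrete. It assumes the hypothetical function <math>f(x, y) = 2x + |x| - y</math> (with <math>n = m = 1</math>), which is continuous and strictly increasing in <math>x</math> but not differentiable at <math>x = 0</math>, so the classical theorem does not apply at the origin. The bisection routine <code>g</code> and its bracketing interval are illustrative choices, not part of Kumagai's result.

```python
def f(x, y):
    # Continuous, strictly increasing in x, but with a kink at x = 0
    return 2.0 * x + abs(x) - y

def g(y, lo=-10.0, hi=10.0, tol=1e-12):
    """Find the unique x in [lo, hi] with f(x, y) = 0 by bisection."""
    if f(lo, y) > 0 or f(hi, y) < 0:
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, y) < 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at mid or to its left
    return 0.5 * (lo + hi)

# By hand: x = y / 3 for y >= 0 and x = y for y < 0, so g is
# continuous everywhere but has a corner at y = 0.
print(g(3.0), g(-2.0), g(0.0))
```

The resulting <math>g</math> is continuous, as the statement guarantees, although it fails to be differentiable at <math>y = 0</math>.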

Collapsing manifolds

Perelman’s collapsing theorem for 3-manifolds, the capstone of his proof of Thurston's geometrization conjecture, can be understood as an extension of the implicit function theorem.
