Derivative (multivariable calculus)

In mathematics, the derivative of a function at a point is the linear part of the best affine approximation to the function near the point. In one-variable calculus, this is the tangent line approximation. In multivariable calculus, the same property is generalized to define the derivative of a vector-valued function or function of a vector argument. Sometimes called the total derivative, in contrast with partial derivatives, the derivative approximates the function with respect to all of its arguments, not just a single one. In many situations, this is the same as considering all partial derivatives simultaneously.

In functional analysis, particularly in infinite dimensions, the derivative in this sense is called the Fréchet derivative.

Derivative as a linear map

Let <math>U \subseteq \R^n</math> be an open subset. Then a function <math>f \colon U \to \R^m</math> is said to be differentiable at a point <math>a\in U</math> if there exists a linear transformation <math>Df_a \colon \R^n \to \R^m</math> such that

:<math>\lim_{x \to a} \frac{f(x)-f(a)-Df_a(x-a)}{\|x-a\|}=0</math>

where <math>\| \ldots \|</math> denotes the norm of <math>\ldots</math>. The linear map <math>Df_a</math> is called the derivative or differential of <math>f</math> at <math>a</math>. Here <math>Df_a(x-a)</math> refers to applying the linear transformation <math>Df_a</math> to the vector <math>(x-a)</math>; in coordinates, this is a matrix-vector product. Other notations for the derivative include <math>D_a f</math> and <math>Df(a)</math>. A function is differentiable if its derivative exists at every point in its domain.

Conceptually, the definition of the derivative expresses the idea that <math>Df_a</math> is the best linear approximation to <math>f(a+h)-f(a)</math> for small <math>h</math>. This can be made precise by quantifying the error in the linear approximation <math>Df_a</math>. To do so, write

:<math>f(a + h) = f(a) + Df_a(h) + \varepsilon(h),</math>

where <math>\varepsilon(h)</math> equals the error in the approximation. To say that the derivative of <math>f</math> at <math>a</math> is <math>Df_a</math> is equivalent to the statement

:<math>\varepsilon(h) = o(\|h\|),</math>

where <math>o</math> is little-o notation and means that <math>\varepsilon(h)/\|h\|</math> tends to zero as <math>h \to 0</math>. The derivative <math>Df_a</math> is the unique linear transformation for which the error term is this small, and this is the sense in which it is the best linear approximation to <math>f(a+h)-f(a)</math>.
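The little-o condition can be checked numerically. The following Python sketch is only an illustration: the map <math>f(x,y) = (x^2 y,\ x + \sin y)</math>, the point <math>a = (1, 2)</math>, and the helper names are assumptions chosen for the example, and the Jacobian is computed by hand. It prints the ratio <math>\|\varepsilon(h)\|/\|h\|</math> for shrinking increments, which should decrease roughly in proportion to <math>\|h\|</math>.

```python
import math

def f(x, y):
    """Illustrative map R^2 -> R^2 (an assumption for this sketch)."""
    return (x * x * y, x + math.sin(y))

def Df(a):
    """Hand-computed Jacobian matrix of f at a = (x, y)."""
    x, y = a
    return [[2 * x * y, x * x],
            [1.0, math.cos(y)]]

def error_ratio(a, h):
    """Return ||f(a+h) - f(a) - Df_a(h)|| / ||h||."""
    J = Df(a)
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    # Apply the linear map Df_a to h as a matrix-vector product.
    lin = [J[i][0] * h[0] + J[i][1] * h[1] for i in range(2)]
    eps = [fah[i] - fa[i] - lin[i] for i in range(2)]
    return math.hypot(eps[0], eps[1]) / math.hypot(h[0], h[1])

a = (1.0, 2.0)
for k in range(1, 6):
    t = 10.0 ** (-k)
    # Shrink the increment along one fixed direction.
    print(t, error_ratio(a, (t, -t)))
```

A check along one direction does not prove differentiability, since the definition requires the ratio to vanish for every way of letting <math>h \to 0</math>; it only illustrates the behaviour of the error term.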

Differentiability

Figure: Plot of <math>\frac{x^2 y}{x^4+y^2}\sqrt{x^2+y^2}</math>, a function whose directional derivative at the origin is <math>\nabla_uf(0,0) = 0</math>, a linear functional of <math>u</math>, but which is not differentiable there.

The function <math>f</math> is differentiable if and only if each of its components <math>f_i \colon U \to \R</math> is differentiable, so when studying derivatives, it is often possible to work one coordinate at a time in the codomain. However, the same is not true of the coordinates in the domain. It is true that if <math>f</math> is differentiable at <math>a</math>, then each partial derivative <math>\partial f/\partial x_i</math> exists at <math>a</math>.

The converse does not hold: it can happen that all of the partial derivatives of <math>f</math> at <math>a</math> exist, but <math>f</math> is not differentiable at <math>a</math>. An example is the following function, which is continuous and has both partial derivatives zero at the origin, but is not differentiable there:

<math display="block">f(x,y) = \begin{cases}\frac{xy}{\sqrt{x^2+y^2}} & (x,y)\ne(0,0)\\ 0 & (x,y)=(0,0)\end{cases}.</math>

(In polar coordinates, this function is <math>f = r\cos\theta\sin\theta</math>.)

Even the existence and linearity of all directional derivatives at a point is not sufficient for differentiability; the essential additional requirement is that the linear approximation hold uniformly as the increment tends to zero from all directions. An example is

<math display="block">f(x,y) = \begin{cases} \frac{x^2y}{x^4+y^2}\sqrt{x^2+y^2} & (x,y)\ne (0,0)\\ 0 & (x,y)=(0,0)\end{cases}</math>

whose directional derivatives are all 0 at (0,0), but which fails to be differentiable there.
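The failure of differentiability can be seen numerically. Since every directional derivative at the origin is zero, the only candidate for the derivative is the zero map, so differentiability would require <math>|f(h)|/\|h\| \to 0</math> along every approach to the origin. The sketch below (function names are illustrative assumptions) compares the approach along a straight line with the approach along the parabola <math>y = x^2</math>.

```python
import math

def f(x, y):
    """The function from the text, extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y) * math.hypot(x, y)

def ratio_along_line(theta, t):
    """f(t*u)/t for the unit vector u = (cos theta, sin theta).

    The limit as t -> 0 is the directional derivative at the origin along u.
    """
    return f(t * math.cos(theta), t * math.sin(theta)) / t

def ratio_along_parabola(t):
    """|f(h)| / ||h|| for h = (t, t^2), which tends to the origin along a parabola."""
    return abs(f(t, t * t)) / math.hypot(t, t * t)

for k in range(1, 6):
    t = 10.0 ** (-k)
    print(t, ratio_along_line(1.0, t), ratio_along_parabola(t))
```

Along the line the ratio shrinks with <math>t</math>, while along the parabola it stays at <math>1/2</math>, because there <math>x^2 y = t^4</math> and <math>x^4 + y^2 = 2t^4</math>. The error of the zero linear approximation is therefore not <math>o(\|h\|)</math>.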

However, if all the partial derivatives of <math>f</math> exist in a neighborhood of <math>a</math> and are continuous at <math>a</math>, then <math>f</math> is differentiable at <math>a</math>. If <math>f</math> is differentiable at a point, then its derivative there is the linear transformation represented by the Jacobian matrix of partial derivatives at that point.
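As a rough illustration of the Jacobian matrix, the following Python sketch approximates each partial derivative by a central difference. The helper <code>jacobian</code>, the step size, and the sample map <code>g</code> are assumptions made for this example. The resulting matrix represents the derivative only when the function is in fact differentiable at the point, which a finite-difference computation cannot verify.

```python
import math

def jacobian(f, a, step=1e-6):
    """Approximate the m-by-n Jacobian of f at the point a by central differences."""
    n = len(a)
    columns = []
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += step
        minus[j] -= step
        f_plus, f_minus = f(plus), f(minus)
        # Column j holds the partial derivatives of every component in x_j.
        columns.append([(u - v) / (2 * step) for u, v in zip(f_plus, f_minus)])
    rows = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(rows)]

def g(v):
    """Illustrative map R^2 -> R^2 with exact Jacobian [[2xy, x^2], [1, cos y]]."""
    x, y = v
    return [x * x * y, x + math.sin(y)]

J = jacobian(g, [1.0, 2.0])
print(J)  # close to [[4, 1], [1, cos(2)]]
```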

Differentials

In some advanced calculus texts, the derivative is also called the differential. However, this term has several different but closely connected meanings in mathematics and the sciences.

When a differentiable function <math>f \colon \mathbb R^n\to\mathbb R</math> is scalar valued, the derivative of <math>f</math> at <math>a</math> may be written as the Jacobian matrix, which in this instance is a row matrix (a matrix consisting of elements in a single row, i.e., a row vector):

:<math>D f_a = \begin{bmatrix} \frac{\partial f}{\partial x_1}(a) & \cdots & \frac{\partial f}{\partial x_n}(a) \end{bmatrix}.</math>

The linear approximation property of the derivative implies that if

:<math>\Delta x = \begin{bmatrix} \Delta x_1 & \cdots & \Delta x_n \end{bmatrix}^\mathsf{T}</math>

is a small vector (where the <math>\mathsf{T}</math> denotes transpose, so that this vector is a column vector), then

:<math>f(a + \Delta x) - f(a) \approx D f_a \, \Delta x = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a) \, \Delta x_i.</math>

Heuristically, this suggests that if <math>dx_1, \ldots, dx_n</math> are infinitesimal increments in the coordinate directions, then

:<math>df_a = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a) \, dx_i</math>

and this is the differential of <math display="inline">f</math> at <math display="inline">a</math>. In fact, the notion of the infinitesimal, which is merely symbolic here, can be equipped with extensive mathematical structure. Techniques such as the theory of differential forms give analytical and algebraic descriptions of objects like the infinitesimal increments <math>dx_i</math>. For instance, <math>dx_i</math> may be regarded as a linear functional on the vector space <math>\R^n</math>. Evaluating <math>dx_i</math> at a vector <math>h</math> in <math>\R^n</math> gives the <math>i</math>-th coordinate of <math>h</math>, which measures how much <math>h</math> points in the <math>i</math>-th coordinate direction. The differential <math>df_a</math> is a linear combination of linear functionals and hence is itself a linear functional. The evaluation <math>df_a(h)</math> is the directional derivative of <math>f</math> along <math>h</math>. This point of view makes the derivative an instance of the exterior derivative.

Being a linear form, the differential is naturally a covector. In a Euclidean space, a covector is naturally dual to a vector (its transpose). This vector is called the gradient of <math>f</math>, and points in the direction in which <math>f</math> increases most rapidly.
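The relationship between the differential, directional derivatives, and the gradient can be sketched in Python. The scalar function <math>f(x,y) = x^2 + 3xy</math>, its hand-computed partial derivatives, and all helper names below are assumptions chosen for illustration.

```python
import math

def f(x, y):
    """Illustrative scalar function (an assumption for this sketch)."""
    return x * x + 3.0 * x * y

def grad_f(x, y):
    """Hand-computed partial derivatives of f."""
    return (2.0 * x + 3.0 * y, 3.0 * x)

def df(a, h):
    """Differential df_a(h): the sum of partial derivatives at a times components of h."""
    g = grad_f(*a)
    return g[0] * h[0] + g[1] * h[1]

def directional_difference(a, h, t=1e-6):
    """Forward-difference estimate of the directional derivative of f at a along h."""
    return (f(a[0] + t * h[0], a[1] + t * h[1]) - f(*a)) / t

def steepest_angle(a, samples=3600):
    """Angle of the unit vector u that maximizes df_a(u), found by a brute-force scan."""
    def value(k):
        theta = 2 * math.pi * k / samples
        return df(a, (math.cos(theta), math.sin(theta)))
    best = max(range(samples), key=value)
    return 2 * math.pi * best / samples

a = (1.0, 2.0)
h = (0.6, 0.8)
print(df(a, h), directional_difference(a, h))
print(steepest_angle(a), math.atan2(grad_f(*a)[1], grad_f(*a)[0]))
```

The first line of output compares the linear functional <math>df_a</math> evaluated at <math>h</math> with a difference quotient. The second compares the direction of steepest increase found by scanning unit vectors with the direction of the gradient.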

Suppose now that <math>f</math> is a vector-valued function, that is, <math>f \colon \R^n \to \R^m</math>. In this case, the components <math>f_i</math> of <math>f</math> are real-valued functions, so they have associated differential forms <math>df_i</math>. The differential <math>df</math> amalgamates these forms into a single object and is therefore an instance of a vector-valued differential form.

If <math>f \colon M\to N</math> is a mapping between differentiable manifolds, differentiability can be formulated by differentiability in any coordinate chart. Invariantly, the differential of <math>f</math> at a point <math>p\in M</math> is a linear map <math>df_p \colon T_p M\to T_{f(p)}N</math> from the tangent space of <math>M</math> at <math>p</math> to that of <math>N</math> at <math>f(p)</math>. This is also known as the pushforward. This is closely related to the derivative as a linear approximation, since evaluating <math>df_p</math> on a tangent vector gives the directional derivative of <math>f</math> in that direction. Nevertheless, the differential is not quite the same thing as the derivative of a function between vector spaces: <math>f</math> is a mapping of manifolds and not of the vector spaces <math>T_pM</math> and <math>T_{f(p)}N</math> where the linear approximation lives.

In applications such as thermodynamics, the language of differentials often has an additional conceptual role. Expressions such as <math>dU=T\,dS-P\,dV</math> describe differentials of state functions and lead to questions about exactness, natural variables, and relations among partial derivatives. This use is related to the derivative of a multivariable function, but it is not just another name for the linear map <math>Df_a</math> associated with an ordinary function.

Total derivative

The term total derivative is also used in more than one way. In some mathematical texts it denotes the full derivative <math>Df_a</math>, as opposed to any one partial derivative. In that sense, the total derivative is the linear map that accounts for variation in all coordinate directions simultaneously.

In many applied contexts, however, "total derivative" refers instead to the derivative of a composite dependence. For example, if

:<math>z=f(x,y),\qquad x=x(t),\quad y=y(t),</math>

then the total derivative of <math>z</math> with respect to <math>t</math> is

:<math>\frac{dz}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.</math>

This is the ordinary derivative of the composite function <math>f(x(t),y(t))</math>, computed by the chain rule. Equivalently, it is obtained by applying the derivative of <math>f</math> to the velocity vector of the path <math>t\mapsto (x(t),y(t))</math>:

:<math>\frac{d}{dt}f(x(t),y(t)) = Df_{(x(t),y(t))}(x'(t),y'(t)).</math>
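As a minimal sketch of this computation, the following Python code applies the derivative of <math>f</math> to the velocity vector of a path and compares the result with a direct numerical derivative of the composite. The function <math>f(x,y) = x^2 y</math>, the circular path, and the helper names are assumptions for illustration.

```python
import math

def f(x, y):
    return x * x * y

def f_x(x, y):
    return 2 * x * y      # partial derivative of f in x

def f_y(x, y):
    return x * x          # partial derivative of f in y

def path(t):
    return (math.cos(t), math.sin(t))

def velocity(t):
    return (-math.sin(t), math.cos(t))

def total_derivative(t):
    """Apply Df at the point path(t) to the velocity vector (x'(t), y'(t))."""
    x, y = path(t)
    dx, dy = velocity(t)
    return f_x(x, y) * dx + f_y(x, y) * dy

def numerical_derivative(t, step=1e-6):
    """Central-difference derivative of the composite t -> f(x(t), y(t))."""
    return (f(*path(t + step)) - f(*path(t - step))) / (2 * step)

print(total_derivative(0.7), numerical_derivative(0.7))
```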

This chain-rule sense of "total derivative" is common in physics, engineering, economics, and other applied fields. In mechanics, for instance, the total time derivative of a function <math>F(q,p,t)</math> along a trajectory includes both explicit dependence on <math>t</math> and implicit dependence through the time-dependent variables <math>q(t)</math> and <math>p(t)</math>. In fluid mechanics, related terminology appears in the material derivative, which differentiates a quantity along the motion of a fluid parcel.

Another example is in classical mechanics, where the total derivative of a function that depends on phase space parameters and time is its partial derivative in time plus its Poisson bracket with the Hamiltonian <math>H</math>:

<math display="block"> \frac{df}{dt} = \frac{\partial f}{\partial t} + \{f,H\}</math>

Like the material derivative, the total derivative in mechanics agrees with the derivative of the composite when pulled back to any Hamiltonian trajectory, but it is still treated as a function of all phase space coordinates and time.
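The identity can be checked numerically in a simple case. The sketch below assumes a harmonic oscillator with unit mass and frequency, <math>H = (p^2 + q^2)/2</math>, the observable <math>f(q,p,t) = qp + t</math>, and the sign convention <math>\{f,H\} = \frac{\partial f}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial f}{\partial p}\frac{\partial H}{\partial q}</math>; all of these are illustrative choices rather than part of the general statement.

```python
import math

def H(q, p):
    """Harmonic oscillator Hamiltonian, unit mass and frequency (an assumption)."""
    return 0.5 * (p * p + q * q)

def f(q, p, t):
    """Illustrative observable with explicit time dependence."""
    return q * p + t

def partial(func, args, i, step=1e-6):
    """Central-difference partial derivative of func in its i-th argument."""
    up, down = list(args), list(args)
    up[i] += step
    down[i] -= step
    return (func(*up) - func(*down)) / (2 * step)

def total_time_derivative(q, p, t):
    """Partial derivative in t plus the Poisson bracket {f, H}, as a function on phase space."""
    bracket = (partial(f, (q, p, t), 0) * partial(H, (q, p), 1)
               - partial(f, (q, p, t), 1) * partial(H, (q, p), 0))
    return partial(f, (q, p, t), 2) + bracket

def trajectory(t):
    """Exact solution of Hamilton's equations with q(0) = 1, p(0) = 0."""
    return (math.cos(t), -math.sin(t))

def derivative_along_trajectory(t, step=1e-6):
    """Central-difference derivative of t -> f(q(t), p(t), t)."""
    qa, pa = trajectory(t + step)
    qb, pb = trajectory(t - step)
    return (f(qa, pa, t + step) - f(qb, pb, t - step)) / (2 * step)

q0, p0 = trajectory(0.3)
print(total_time_derivative(q0, p0, 0.3), derivative_along_trajectory(0.3))
```

Here <code>total_time_derivative</code> is defined at every phase space point, and it agrees with the derivative of the composite only when evaluated along a solution of Hamilton's equations.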

In comparative statics, total derivatives often describe how endogenous variables change with respect to exogenous variables in an implicitly defined system of equations. The endogenous variables are generally not explicit functions of the exogenous variables, other than through the implicit function theorem, and the total derivative is handled implicitly.

Thus, although "total derivative" can mean the derivative <math>Df_a</math> in the sense above, the term also commonly refers to a derivative along a specified dependence or process. This ambiguity is one reason to distinguish between the derivative as a linear map and the various differential or total-derivative notations used in applications.

Chain rule

A form of the chain rule generalizes from one-variable calculus. It says that, for two functions <math>f</math> and <math>g</math>, the derivative of the composite function <math>f \circ g</math> at <math>a</math> satisfies

:<math>D(f \circ g)_a = Df_{g(a)} \circ Dg_a</math>

where the composite on the right-hand side is the composition of linear maps. If the derivatives of <math>f</math> and <math>g</math> are identified with their Jacobian matrices, then the composite on the right-hand side is simply matrix multiplication.
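The matrix form of the chain rule can be illustrated with a short Python sketch. The maps <math>g(x,y) = (xy,\ x+y)</math> and <math>f(s,t) = (\sin s,\ st)</math>, their hand-computed Jacobians, and the helper names are assumptions made for this example.

```python
import math

def g(v):
    x, y = v
    return [x * y, x + y]

def Dg(v):
    """Hand-computed Jacobian of g."""
    x, y = v
    return [[y, x], [1.0, 1.0]]

def f(u):
    s, t = u
    return [math.sin(s), s * t]

def Df(u):
    """Hand-computed Jacobian of f."""
    s, t = u
    return [[math.cos(s), 0.0], [t, s]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def chain_rule_jacobian(a):
    """Jacobian of f o g at a, as Df at g(a) times Dg at a."""
    return matmul(Df(g(a)), Dg(a))

def numerical_jacobian(a, step=1e-6):
    """Central-difference Jacobian of the composite f o g, for comparison."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(a), list(a)
        plus[j] += step
        minus[j] -= step
        up, down = f(g(plus)), f(g(minus))
        for i in range(2):
            rows[i][j] = (up[i] - down[i]) / (2 * step)
    return rows

print(chain_rule_jacobian([1.0, 2.0]))
print(numerical_jacobian([1.0, 2.0]))
```

Note that <code>Df</code> is evaluated at <math>g(a)</math>, not at <math>a</math>, and that the order of the factors matters because matrix multiplication is not commutative.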

Example: Differentiation with direct dependencies

Suppose that f is a function of two variables, x and y. If these two variables are independent, so that the domain of f is <math>\R^2</math>, then the behavior of f may be understood in terms of its partial derivatives in the x and y directions. However, in some situations, x and y may be dependent. For example, it might happen that f is constrained to a curve <math>y = y(x)</math>. In this case, we are actually interested in the behavior of the composite function <math>f(x, y(x))</math>. The partial derivative of f with respect to x does not give the true rate of change of f with respect to changing x because changing x necessarily changes y, while the partial derivative assumes y is fixed. However, the chain rule takes such dependencies into account. Write <math>\gamma(x) = (x, y(x))</math>. Then, the chain rule says

:<math>D(f \circ \gamma)_{x_0} = Df_{(x_0, y(x_0))} \circ D\gamma_{x_0}.</math>

By expressing the derivatives as the Jacobian matrices<math display="block">\begin{array}{lcl} Df_{(x_0, y(x_0))} & = & \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix}_{(x_0, y(x_0))} \\ D\gamma_{x_0} & = & \begin{bmatrix} \frac{\partial x}{\partial x} \\ \frac{\partial y}{\partial x} \end{bmatrix}_{x_0}, \end{array}</math>this becomes:

:<math>\frac{df(x, y(x))}{dx}(x_0) = \frac{\partial f}{\partial x}(x_0, y(x_0)) \, \frac{dx}{dx}(x_0) + \frac{\partial f}{\partial y}(x_0, y(x_0)) \, \frac{dy}{dx}(x_0).</math>

Suppressing the evaluation at <math>x_0</math> for legibility, we may also write this as

:<math>\frac{df(x, y(x))}{dx} = \frac{\partial f}{\partial x} \frac{dx}{dx} + \frac{\partial f}{\partial y} \frac{dy}{dx}.</math>

This gives a straightforward formula for the derivative of <math>f(x, y(x))</math> in terms of the partial derivatives of <math>f</math> and the derivative of <math>y(x)</math>.

For example, suppose

:<math>f(x,y)=xy.</math>

The rate of change of f with respect to x is usually the partial derivative of f with respect to x; in this case,

:<math>\frac{\partial f}{\partial x} = y.</math>

However, if y depends on x, the partial derivative does not give the true rate of change of f as x changes because the partial derivative assumes that y is fixed. Suppose we are constrained to the line

:<math>y=x.</math>

Then

:<math>f(x,y) = f(x,x) = x^2,</math>

and the total derivative of f with respect to x is

:<math>\frac{df}{dx} = 2 x,</math>

which we see is not equal to the partial derivative <math>\partial f/\partial x</math>. Instead of immediately substituting for y in terms of x, however, we can also use the chain rule as above:

:<math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = y+x \cdot 1 = x+y = 2x.</math>
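The same calculation can be written as a few lines of Python. This is only a sketch of the example above; the helper names are illustrative.

```python
def f(x, y):
    return x * y

def partial_x(x, y):
    return y              # partial derivative of f in x, with y held fixed

def partial_y(x, y):
    return x              # partial derivative of f in y, with x held fixed

def y_of_x(x):
    return x              # the constraint y = x

def dy_dx(x):
    return 1.0

def total_derivative(x):
    """Chain rule along the constraint: f_x + f_y * dy/dx."""
    y = y_of_x(x)
    return partial_x(x, y) + partial_y(x, y) * dy_dx(x)

x0 = 3.0
print(partial_x(x0, y_of_x(x0)), total_derivative(x0))
```

At <math>x = 3</math> the partial derivative evaluates to <math>y = 3</math>, while the total derivative evaluates to <math>2x = 6</math>.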

Example: Differentiation with indirect dependencies

While one can often perform substitutions to eliminate indirect dependencies, the chain rule provides for a more efficient and general technique. Suppose <math>L(t,x_1,\dots,x_n)</math> is a function of time <math>t</math> and <math>n</math> variables <math>x_i</math> which themselves depend on time. Then, the time derivative of <math>L</math> is

:<math>\frac{dL}{dt} = \frac{d}{dt} L \bigl(t, x_1(t), \ldots, x_n(t)\bigr).</math>

The chain rule expresses this derivative in terms of the partial derivatives of <math>L</math> and the time derivatives of the functions <math>x_i</math>:

:<math>\frac{dL}{dt} = \frac{\partial L}{\partial t} + \sum_{i=1}^n \frac{\partial L}{\partial x_i}\frac{dx_i}{dt} = \biggl(\frac{\partial}{\partial t} + \sum_{i=1}^n \frac{dx_i}{dt}\frac{\partial}{\partial x_i}\biggr)(L).</math>

This expression is often used in physics for a gauge transformation of the Lagrangian, as two Lagrangians that differ only by the total time derivative of a function of time and the <math>n</math> generalized coordinates lead to the same equations of motion. An interesting example concerns the resolution of causality in the Wheeler–Feynman time-symmetric theory. The operator in brackets (in the final expression above) is also called the total derivative operator (with respect to <math>t</math>).

For example, the total derivative of <math>f(x(t),y(t))</math> is

:<math>\frac{df}{dt} = { \partial f \over \partial x}{dx \over dt} + {\partial f \over \partial y}{dy \over dt }.</math>

Here there is no <math>\partial f / \partial t</math> term since <math>f</math> itself does not depend on the independent variable <math>t</math> directly.
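When the function does depend on <math>t</math> directly, the explicit term must be kept. The following Python sketch assumes the illustrative function <math>L(t, x_1, x_2) = t x_1 + x_2^2</math> with <math>x_1(t) = \sin t</math> and <math>x_2(t) = t^2</math>; these choices and the helper names are not from any particular physical system.

```python
import math

def L(t, x1, x2):
    return t * x1 + x2 * x2

# Hand-computed partial derivatives of L.
def L_t(t, x1, x2):
    return x1

def L_x1(t, x1, x2):
    return t

def L_x2(t, x1, x2):
    return 2.0 * x2

# Time-dependent variables and their derivatives.
def x1(t):
    return math.sin(t)

def x2(t):
    return t * t

def dx1_dt(t):
    return math.cos(t)

def dx2_dt(t):
    return 2.0 * t

def total_time_derivative(t):
    """Explicit partial in t plus the chain-rule terms through x1(t) and x2(t)."""
    a, b = x1(t), x2(t)
    return L_t(t, a, b) + L_x1(t, a, b) * dx1_dt(t) + L_x2(t, a, b) * dx2_dt(t)

def numerical_time_derivative(t, step=1e-6):
    """Central-difference derivative of t -> L(t, x1(t), x2(t))."""
    up = L(t + step, x1(t + step), x2(t + step))
    down = L(t - step, x1(t - step), x2(t - step))
    return (up - down) / (2 * step)

t0 = 0.5
print(L_t(t0, x1(t0), x2(t0)), total_time_derivative(t0), numerical_time_derivative(t0))
```

The first printed value is the partial derivative in <math>t</math> alone, which differs from the total time derivative because it ignores the dependence of <math>x_1</math> and <math>x_2</math> on <math>t</math>.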

Total differential equation

A total differential equation is a differential equation expressed in terms of total derivatives. Since the exterior derivative is coordinate-free, in a sense that can be given a technical meaning, such equations are intrinsic and geometric.

Application to equation systems

In economics, it is common for the total derivative to arise in the context of a system of equations. For example, a simple supply-demand system might specify the quantity q of a product demanded as a function D of its price p and consumers' income I, the latter being an exogenous variable, and might specify the quantity supplied by producers as a function S of its price and two exogenous resource cost variables r and w. The resulting system of equations

:<math>q=D(p, I),</math>

:<math>q=S(p, r, w),</math>

determines the market equilibrium values of the variables p and q. The total derivative <math>dp/dr</math> of p with respect to r, for example, gives the sign and magnitude of the reaction of the market price to the exogenous variable r. In the indicated system, there are a total of six possible total derivatives, also known in this context as comparative static derivatives: <math>dp/dr</math>, <math>dp/dw</math>, <math>dp/dI</math>, <math>dq/dr</math>, <math>dq/dw</math>, and <math>dq/dI</math>. The total derivatives are found by totally differentiating the system of equations, dividing through by, say, <math>dr</math>, treating <math>dq/dr</math> and <math>dp/dr</math> as the unknowns, setting <math>dI = dw = 0</math>, and solving the two totally differentiated equations simultaneously, typically by using Cramer's rule.
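The procedure can be sketched in Python for the derivatives with respect to r. Totally differentiating the two equations, holding I and w fixed, and dividing by <math>dr</math> gives the linear system <math>dq/dr - D_p\,dp/dr = 0</math> and <math>dq/dr - S_p\,dp/dr = S_r</math>, where subscripts denote partial derivatives. The function names, the linear demand and supply curves, and all numerical coefficients below are hypothetical values chosen only to test the formula.

```python
def comparative_statics_dr(D_p, S_p, S_r):
    """Solve for (dp/dr, dq/dr) by Cramer's rule.

    Unknowns are ordered (dq/dr, dp/dr); the coefficient matrix is
    [[1, -D_p], [1, -S_p]] and the right-hand side is (0, S_r).
    """
    det = 1.0 * (-S_p) - (-D_p) * 1.0
    if det == 0.0:
        raise ValueError("singular system: D_p equals S_p")
    dq_dr = (0.0 * (-S_p) - (-D_p) * S_r) / det   # first column replaced by (0, S_r)
    dp_dr = (1.0 * S_r - 0.0 * 1.0) / det         # second column replaced by (0, S_r)
    return dp_dr, dq_dr

def equilibrium(I, r, w):
    """Equilibrium of a hypothetical linear market:
    demand q = 10 - 2p + 0.5I, supply q = 1 + 3p - r - 2w.
    """
    p = (10.0 - 1.0 + 0.5 * I + r + 2.0 * w) / 5.0
    q = 10.0 - 2.0 * p + 0.5 * I
    return p, q

# Partial derivatives of the hypothetical curves: D_p = -2, S_p = 3, S_r = -1.
dp_dr, dq_dr = comparative_statics_dr(-2.0, 3.0, -1.0)
print(dp_dr, dq_dr)

# Cross-check against a finite difference of the explicit equilibrium.
step = 1e-6
p_up, q_up = equilibrium(4.0, 1.0 + step, 2.0)
p_dn, q_dn = equilibrium(4.0, 1.0 - step, 2.0)
print((p_up - p_dn) / (2 * step), (q_up - q_dn) / (2 * step))
```

In this hypothetical market a higher resource cost r raises the equilibrium price and lowers the equilibrium quantity. The explicit equilibrium is available only because the example is linear; in general the derivatives are obtained implicitly, as described above.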

External links

Ronald D. Kriz (2007) Envisioning total derivatives of scalar functions of two dimensions using raised surfaces and tangent planes from Virginia Tech