right|thumb|A comparison of [[gradient descent (green) and Newton's method (red) for minimizing a function (with small step sizes). Newton's method uses curvature information (i.e. the second derivative) to take a more direct route.]]
<!--__NOTOC__-->
In calculus, Newton's method (also called Newton–Raphson) is an iterative method for finding the roots of a differentiable function <math>f</math>, which are solutions to the equation <math>f(x)=0</math>. However, to optimize a twice-differentiable <math>f</math>, our goal is to find the roots of <math>f'</math>. We can therefore use Newton's method on its derivative <math>f'</math> to find solutions to <math>f'(x)=0</math>, also known as the critical points of <math>f</math>. These solutions may be minima, maxima, or saddle points; see section "Several variables" in Critical point (mathematics) and also section "Geometric interpretation" in this article. This is relevant in optimization, which aims to find (global) minima of the function <math>f</math>.
Newton's method
The central problem of optimization is minimization of functions. Let us first consider the case of univariate functions, i.e., functions of a single real variable. We will later consider the more general and more practically useful multivariate case.
Given a twice differentiable function <math> f:\mathbb{R}\to \mathbb{R}</math>, we seek to solve the optimization problem
:<math> \min_{x\in \mathbb{R f(x) .</math>
Newton's method attempts to solve this problem by constructing a sequence <math>\{x_k\}</math> from an initial guess (starting point) <math>x_0\in \mathbb{R}</math> that converges towards a minimizer <math>x_*</math> of <math>f</math> by using a sequence of second-order Taylor approximations of <math>f</math> around the iterates. The second-order Taylor expansion of around <math>x_k</math> is
:<math>f(x_k + t) \approx f(x_k) + f'(x_k) t + \frac{1}{2} f(x_k) t^2.</math>
The next iterate <math>x_{k+1}</math> is defined so as to minimize this quadratic approximation in <math>t</math>, and setting <math>x_{k+1}=x_k + t</math>. If the second derivative is positive, the quadratic approximation is a convex function of <math>t</math>, and its minimum can be found by setting the derivative to zero. Since
:<math>\displaystyle 0 = \frac{\rm d}
