Regula falsi

In mathematics, the regula falsi, method of false position, or false position method is a family of algorithms used to solve linear equations and smooth nonlinear equations for a single unknown value.

In its oldest known examples found in cuneiform and hieroglyphic writings, the method replaces simple trial and error with proportional correction of an initial guess. In modern usage, the method relies on linear interpolation based on two different guesses.

Two historical types

Two basic types of false position method can be distinguished historically, simple false position and double false position.

Simple false position is aimed at solving problems involving direct proportion and can be thought

of as an early algorithm for division. Such problems can be written algebraically in the form: determine such that

:<math>ax = b ,</math>

if and are known. The method begins by using a test input value , and finding the corresponding output value by multiplication: . The correct answer is then found by proportional adjustment, .

As an example, consider problem 26 in the Rhind papyrus, which asks for a solution of (written in modern notation) the equation . This is solved by false position. First, guess that to obtain, on the left, . This guess is a good choice since it produces an integer value. However, 4 is not the solution of the original equation, as it gives a value which is three times too small. To compensate, multiply (currently set to 4) by 3 and substitute again to get , verifying that the solution is .

Double false position is aimed at solving more difficult problems that can be written algebraically in the form: determine such that

:<math>f(x) = ax + c = 0 ,</math>

if it is known that

:<math>\begin{aligned}

f(x_1) &= b_1; \\

f(x_2) &= b_2.

\end{aligned}</math>

Double false position is mathematically equivalent to linear interpolation. By using a pair of test inputs and the corresponding pair of outputs, the result of this algorithm given by,

:<math> x = \frac{b_1 x_2 - b_2 x_1}{b_1 - b_2},</math>

would be memorized and carried out by rote. Indeed, the rule as given by Robert Recorde in his Ground of Artes (c. 1542) is: dated from 200 BC to AD 100, most of Chapter 7 was devoted to the algorithm. There, the procedure was justified by concrete arithmetical arguments, then applied creatively to a wide variety of story problems, including one involving what we would call secant lines on a conic section. A more typical example is this "joint purchase" problem involving an "excess and deficit" condition:

Now an item is purchased jointly; everyone contributes 8 [coins], the excess is 3; everyone contributes 7, the deficit is 4. Tell: The number of people, the item price, what is each? Answer: 7 people, item price 53.

</blockquote>

Between the 9th and 10th centuries, the Egyptian mathematician Abu Kamil wrote a now-lost treatise on the use of double false position, known as the Book of the Two Errors (Kitāb al-khaṭāʾayn). The oldest surviving writing on double false position from the Middle East is that of Qusta ibn Luqa (10th century), an Arab mathematician from Baalbek, Lebanon. He justified the technique by a formal, Euclidean-style geometric proof. Within the tradition of medieval Muslim mathematics, double false position was known as hisāb al-khaṭāʾayn ("reckoning by two errors"). It was used for centuries to solve practical problems such as commercial and juridical questions (estate partitions according to rules of Quranic inheritance), as well as purely recreational problems. The algorithm was often memorized with the aid of mnemonics, such as a verse attributed to Ibn al-Yasamin and balance-scale diagrams explained by al-Hassar and Ibn al-Banna, all three being mathematicians of Moroccan origin.

Leonardo of Pisa (Fibonacci) devoted Chapter 13 of his book Liber Abaci (AD 1202) to explaining and demonstrating the uses of double false position, terming the method regulis elchatayn after the al-khaṭāʾayn method that he had learned from Arab sources. Pacioli's term nearly disappeared in the 16th century European works and the technique went by various names such as "Rule of False", "Rule of Position" and "Rule of False Position". Regula Falsi appears as the Latinized version of Rule of False as early as 1690.

More precisely, suppose that in the -th iteration the bracketing interval is . Construct the line through the points and , as illustrated. This line is a secant or chord of the graph of the function . In point-slope form, its equation is given by

:<math> y - f(b_k) = \frac{f(b_k)-f(a_k)}{b_k-a_k} (x-b_k). </math>

Now choose to be the -intercept of this line, that is, the value of for which , and substitute these values to obtain

:<math> f(b_k) + \frac{f(b_k)-f(a_k)}{b_k-a_k} (c_k-b_k) = 0. </math>

Solving this equation for c<sub>k</sub> gives:

:<math>

c_k = b_k - f(b_k) \frac{b_k-a_k}{f(b_k)-f(a_k)} = \frac{a_k f(b_k) - b_k f(a_k)}{f(b_k) - f(a_k)}.</math>

This last symmetrical form has a computational advantage when using floating-point arithmetic: As a solution is approached, and will be very close together, and nearly always of the same sign. Such a subtraction can lose precision through cancellation. Because and are always of opposite sign the “subtraction” in the numerator of the improved formula is effectively an addition (as is the subtraction in the denominator too).

At iteration number , the number is calculated as above and then, if and have the same sign, set and , otherwise set and . This process is repeated until the root is approximated sufficiently well.

The above formula is also used in the secant method.

For nonlinear functions, once the interval of search

shrinks far enough that the second derivative has constant sign throughout the interval, one endpoint

of the search becomes fixed, while the other converges to the root. Thus, the best estimate of the solution

is the last calculated value of <math>c_k</math>. However, because the interval stops shrinking, regula falsi can not match the bisection method's guarantee of precision. In some cases, rate of convergence can drop below that of the bisection method. Modified versions of regula falsi are generally to be preferred because they can fix these shortcomings at minimal cost.

Analysis

Since the initial end-points

and are chosen such that and are of opposite signs, at each step, one of the end-points will get closer to a root of .

If the second derivative of is of constant sign (so there is no inflection point) in the interval,

then one endpoint (the one where also has the same sign) will remain fixed for all subsequent

iterations while the converging endpoint becomes updated. As a result,

unlike the bisection method, the width of the bracket does not tend to

zero (unless the zero is at an inflection point around which ). As a consequence, the linear approximation to , which is used to pick the false position,

does not improve as rapidly as possible.

One example of this phenomenon is the function

:<math> f(x) = 2x^3-4x^2+3x </math>

on the initial bracket

[−1,1]. The left end, −1, is never replaced (it does not change at first and after the first three iterations, is negative on the interval) and thus the width

of the bracket never falls below 1. Hence, the right endpoint approaches 0 at

a linear rate (the number of accurate digits grows linearly, with a rate of convergence of 2/3).

For discontinuous functions, this method can only be expected to find a point where the function changes sign (for example at for or the sign function). In addition to sign changes, it is also possible for the method to converge to a point where the limit of the function is zero, even if the function is undefined (or has another value) at that point (for example at for the function given by when and by , starting with the interval [-0.5, 3.0]).

It is mathematically possible with discontinuous functions for the method to fail to converge to a zero limit or sign change, but this is not a problem in practice since it would require an infinite sequence of coincidences for both endpoints to get stuck converging to discontinuities where the sign does not change, for example at in

:<math>f(x) = \frac{1}{(x-1)^2} + \frac{1}{(x+1)^2}.</math>

The method of bisection avoids this hypothetical convergence problem.

Improvements in regula falsi

Though regula falsi always converges, usually considerably faster than bisection, there are situations that can slow its convergence – sometimes to a prohibitive degree. That problem isn't unique to regula falsi: Other than bisection, all of the numerical equation-solving methods can have a slow-convergence or no-convergence problem under some conditions. Sometimes, Newton's method and the secant method diverge instead of converging – and often do so under the same conditions that slow regula falsi's convergence.

But, though regula falsi is one of the best methods, and even in its original un-improved version would often be the best choice; for example, when Newton's isn't used because the derivative is prohibitively time-consuming to evaluate, or when Newton's and Successive-Substitutions have failed to converge.

Regula falsi's failure mode is easy to detect: The same end-point is retained twice in a row. The problem is easily remedied by picking instead a modified false position, chosen to avoid slowdowns due to those relatively unusual unfavorable situations. A number of such improvements to regula falsi have been proposed; two of them, the Illinois algorithm and the Anderson–Björk algorithm, are described below.

The Illinois algorithm

The Illinois algorithm halves the -value of the retained end point in the next estimate computation when the new -value (that is, )) has the same sign as the previous one ()), meaning that the end point of the previous step will be retained. Hence:

:<math> c_k = \frac{\frac{1}{2}f(b_k) a_k - f(a_k) b_k}{\frac{1}{2}f(b_k) - f(a_k)}</math>

:<math> c_k = \frac{f(b_k) a_k - \frac{1}{2}f(a_k) b_k}{f(b_k) - \frac{1}{2}f(a_k)},</math>

down-weighting one of the endpoint values to force the next to occur on that side of the function.

The above adjustment to regula falsi is called the Illinois algorithm by some scholars. Ford (1995) summarizes and analyzes this and other similar superlinear variants of the method of false position.

\begin{align}

m' &= 1 - \frac{f(c_k)}{f(b_k)},\\

m &=

\begin{cases}

m' & \text{if } m' > 0, \\

\frac{1}{2} & \text{otherwise.}

\end{cases}

\end{align}

</math>

For simple roots, Anderson–Björck performs very well in practice.

ITP method

Given <math>\kappa_1\in (0,\infty), \kappa_2 \in \left[1,1+\phi\right) </math>, <math>n_{1/2} \equiv \lceil(b_0-a_0)/2\epsilon\rceil </math> and <math>n_0\in[0,\infty) </math> where <math>\phi </math> is the golden ration <math>\tfrac{1}{2}(1+\sqrt{5}) </math>, in each iteration <math>j = 0,1,2... </math> the ITP method calculates the point <math>x_{\text{ITP </math> following three steps:

[Interpolation Step] Calculate the bisection and the regula falsi points: <math>x_{1/2} \equiv \frac{a+b}{2} </math> and <math>x_f \equiv \frac{bf(a)-af(b)}{f(a)-f(b)} </math> ;
[Truncation Step] Perturb the estimator towards the center: <math>x_t \equiv x_f+\sigma \delta </math> where <math>\sigma \equiv \text{sign}(x_{1/2}-x_f) </math> and <math>\delta \equiv \min\{\kappa_1|b-a|^{\kappa_2},|x_{1/2}-x_f|\} </math> ;
[Projection Step] Project the estimator to minmax interval: <math>x_{\text{ITP \equiv x_{1/2} -\sigma \rho_k </math> where <math>\rho_k \equiv \min\left\{\epsilon 2^{n_{1/2}+n_0-j} - \frac{b-a}{2},|x_t-x_{1/2}|\right\} </math>.

The value of the function <math>f(x_{\text{ITP) </math> on this point is queried, and the interval is then reduced to bracket the root by keeping the sub-interval with function values of opposite sign on each end. This three step procedure guarantees that the minmax properties of the bisection method are enjoyed by the estimate as well as the superlinear convergence of the secant method. And, is observed to outperform both bisection and interpolation based methods under smooth and non-smooth functions.

Practical considerations

When solving one equation, or just a few, using a computer, the bisection method is an adequate choice. Although bisection isn't as fast as the other methods—when they're at their best and don't have a problem—bisection nevertheless is guaranteed to converge at a useful rate, roughly halving the error with each iteration – gaining roughly a decimal place of accuracy with every 3 iterations.

For manual calculation, by calculator, one tends to want to use faster methods, and they usually, but not always, converge faster than bisection. But a computer, even using bisection, will solve an equation, to the desired accuracy, so rapidly that there's no need to try to save time by using a less reliable method—and every method is less reliable than bisection.

An exception would be if the computer program had to solve equations very many times during its run. Then the time saved by the faster methods could be significant.

Then, a program could start with Newton's method, and, if Newton's isn't converging, switch to regula falsi, maybe in one of its improved versions, such as the Illinois or Anderson–Björck versions. Or, if even that isn't converging as well as bisection would, switch to bisection, which always converges at a useful, if not spectacular, rate.

When the change in has become very small, and is also changing very little, then Newton's method most likely will not run into trouble, and will converge. So, under those favorable conditions, one could switch to Newton's method if one wanted the error to be very small and wanted very fast convergence.

Example: Growth of a bulrush

In chapter 7 of The Nine Chapters, a root finding problem can be translated to modern language as follows:

Excess And Deficit Problem #11:

A bulrush grew 3 units on its first day. At the end of each day, the plant is observed to have grown by of the previous day's growth.
A club-rush grew 1 unit on its first day. At the end of each day, the plant has grown by 2 times as much as the previous day's growth.
Find the time [in fractional days] that the club-rush becomes as tall as the bulrush.

Answer: the height is

Explanation:

Suppose it is day 2. The club-rush is shorter than the bulrush by 1.5 units.
Suppose it is day 3. The club-rush is taller than the bulrush by 1.75 units. ∎

</blockquote>

thumb|Plot of function , its exact root (point ), and the approximated root

To understand this, we shall model the heights of the plants on day ( = 1, 2, 3...) after a geometric series.

:<math>B(n) = \sum_{i=1}^n 3 \cdot \frac{1}{2^{i-1 \quad</math>Bulrush

:<math>C(n) = \sum_{i=1}^n 1 \cdot 2^{i-1} \quad</math>Club-rush

For the sake of better notations, let <math>\ k = i-1 ~.</math> Rewrite the plant height series <math>\ B(n),\ C(n)\ </math> in terms of and invoke the formula for a geometric series.

:<math>\ B(n) = \sum_{k=0}^{n-1} 3 \cdot \frac{1}{2^k} = 3 \left( \frac{1 - (\tfrac{1}{2})^{n-1+1{1 - \tfrac{1}{2 \right) = 6 \left( 1 - \frac{1}{2^n} \right)</math>

:<math>\ C(n) = \sum_{k=0}^{n-1} 2^k = \frac{~~ 1 - 2^n}{\ 1 - 2\ } = 2^n - 1\ </math>

Now, use regula falsi to find the root of <math>\ (C(n) - B(n))\ </math>

:<math>\ F(n) := C(n) - B(n) = \frac{6}{2^n} + 2^n - 7\ </math>

Estimated root (1st iteration):

:<math>\ \hat{x} ~=~ \frac{~ x_1 F(x_2) - x_2 F(x_1) ~}{F(x_2) - F(x_1)} ~=~ \frac{~ 2 \times 1.75 + 3 \times 1.5 ~}{1.75 + 1.5} ~\approx~ 2.4615\ </math>

Example code

This example program, written in the C programming language, is an example of the Illinois algorithm.

To find the positive number where , the equation is transformed into a root-finding form .

include <stdio.h>
include <math.h>

double f(double x) {

return cos(x) - x * x * x;

}

/* a,b: endpoints of an interval where we search

e: half of upper bound for relative error

m: maximal number of iteration

double falsi_method(double (*f)(double), double a, double b, double e, int m) {

double c, fc;

int n, side = 0;

/* starting values at endpoints of interval */

double fa = f(a);

double fb = f(b);

for (n = 0; n < m; n++) {

c = (fa * b - fb * a) / (fa - fb);

if (fabs(b - a) < e * fabs(b + a))

break;

fc = f(c);

if (fc * fb > 0) {

/* fc and fb have same sign, copy c to b */

b = c; fb = fc;

if (side == -1)

fa /= 2;

side = -1;

} else if (fa * fc > 0) {

/* fc and fa have same sign, copy c to a */

a = c; fa = fc;

if (side == +1)

fb /= 2;

side = +1;

} else {

/* fc * f_ very small (looks like zero) */

break;

}

return c;

}

int main(void) {

printf("%0.15f\n", falsi_method(&f, 0, 1, 5E-15, 100));

return 0;

}

</syntaxhighlight>

After running this code, the final answer is approximately

0.865474033101614.