Kahan summation algorithm

In numerical analysis, the Kahan summation algorithm, also known as compensated summation, significantly reduces the numerical error in the total obtained by adding a sequence of finite-precision floating-point numbers, compared to the naive approach. This is done by keeping a separate running compensation (a variable to accumulate small errors), in effect extending the precision of the sum by the precision of the compensation variable.

In particular, simply summing <math>n</math> numbers in sequence has a worst-case error that grows proportional to <math>n</math>, and a root mean square error that grows as <math>\sqrt{n}</math> for random inputs (the roundoff errors form a random walk). With compensated summation, using a compensation variable with sufficiently high precision the worst-case error bound is effectively independent of <math>n</math>, so a large number of values can be summed with an error that only depends on the floating-point precision of the result. Ivo Babuška seems to have come up with a similar algorithm independently (hence Kahan–Babuška summation). Similar, earlier techniques are, for example, Bresenham's line algorithm, keeping track of the accumulated error in integer operations (although first documented around the same time) and the delta-sigma modulation.

The algorithm

In pseudocode, the algorithm will be:

function KahanSum(input)

// Prepare the accumulator.

var sum = 0.0

// A running compensation for lost low-order bits.

var c = 0.0

// The array input has elements indexed input[1] to input[input.length].

for i = 1 to input.length do

// c is zero the first time around.

var y = input[i] - c

// Alas, sum is big, y small, so low-order digits of y are lost.

var t = sum + y

// (t - sum) cancels the high-order part of y;

// subtracting y recovers negative (low part of y)

c = (t - sum) - y

// Algebraically, c should always be zero. Beware

// overly-aggressive optimizing compilers!

sum = t

// Next time around, the lost low part will be added to y in a fresh attempt.

next i

return sum

This algorithm can also be rewritten to use the Fast2Sum algorithm:

function KahanSum2(input)

// Prepare the accumulator.

var sum = 0.0

// A running compensation for lost low-order bits.

var c = 0.0

// The array input has elements indexed input[1] to input[input.length].

for i = 1 to input.length do

// c is zero the first time around.

var y = input[i] + c

// sum + c is an approximation to the exact sum.

(sum,c) = Fast2Sum(sum,y)

// Next time around, the lost low part will be added to y in a fresh attempt.

next i

return sum

Worked example

The algorithm does not mandate any specific choice of radix, only for the arithmetic to "normalize floating-point sums before rounding or truncating". The relative error bound of every (backwards stable) summation method by a fixed algorithm in fixed precision (i.e. not those that use arbitrary-precision arithmetic, nor algorithms whose memory and time requirements change based on the data), is proportional to this condition number. This is still much worse than compensated summation, however. However, if the sum can be performed in twice the precision, then <math>\varepsilon</math> is replaced by <math>\varepsilon^2</math>, and naive summation has a worst-case error comparable to the <math>O(n \varepsilon^2)</math> term in compensated summation at the original precision.

By the same token, the <math>\Sigma |x_i|</math> that appears in <math>E_n</math> above is a worst-case bound that occurs only if all the rounding errors have the same sign (and are of maximal possible magnitude). introduced an improved version of Kahan algorithm, which he calls an "improved Kahan–Babuška algorithm", which also covers the case when the next term to be added is larger in absolute value than the running sum, effectively swapping the role of what is large and what is small. In pseudocode, the algorithm is:

function KahanBabushkaNeumaierSum(input)

var sum = 0.0

var c = 0.0 // A running compensation for lost low-order bits.

for i = 1 to input.length do

var t = sum + input[i]

if |sum| >= |input[i]| then

c += (sum - t) + input[i] // If sum is bigger, low-order digits of input[i] are lost.

else

c += (input[i] - t) + sum // Else low-order digits of sum are lost.

endif

sum = t

next i

return sum + c // Correction only applied once in the very end.

This enhancement is similar to the Fast2Sum version of Kahan's algorithm with Fast2Sum replaced by 2Sum.

For many sequences of numbers, both algorithms agree, but a simple example due to Peters which he called a second-order "iterative Kahan–Babuška algorithm". In pseudocode, the algorithm is:

function KahanBabushkaKleinSum(input)

var sum = 0.0

var cs = 0.0

var ccs = 0.0

for i = 1 to input.length do

var c, cc

var t = sum + input[i]

if |sum| >= |input[i]| then

c = (sum - t) + input[i]

else

c = (input[i] - t) + sum

endif

sum = t

t = cs + c

if |cs| >= |c| then

cc = (cs - t) + c

else

cc = (c - t) + cs

endif

cs = t

ccs = ccs + cc

end loop

return sum + (cs + ccs)

Speed

In the Kahan summation algorithm, each iteration of the loop depends on the result of a previous iteration (loop-carried data dependency). This reduces the performance gain from modern superscalar processor designs. To reduce this penalty, one can split the accumulator variables s and c into several copies each working on a portion of the input. These copies can be accumulated in parallel using a superscalar CPU, SIMD, or even multiprocessing. The separate copies are finally accumulated together using the scalar (non-parallel) Kahan algorithm.

Alternatives

Although Kahan's algorithm achieves <math>O(1)</math> error growth for summing n numbers, only slightly worse <math>O(\log n)</math> growth can be achieved by pairwise summation: one recursively divides the set of numbers into two halves, sums each half, and then adds the two sums. In practice, with roundoff errors of random signs, the root mean square errors of pairwise summation actually grow as <math>O\left(\sqrt{\log n}\right)</math>. This style of summation is used by Python's <code>math.fsum</code> function.

Another method that uses only integer arithmetic, but a large accumulator, was described by Kirchner and Kulisch; a hardware implementation was described by Müller, Rüb and Rülling.

Possible invalidation by compiler optimization

In principle, a sufficiently aggressive optimizing compiler could destroy the effectiveness of Kahan summation: for example, if the compiler simplified expressions according to the associativity rules of real arithmetic, it might "simplify" the second step in the sequence

: <code>t = sum + y;</code>

: <code>c = (t - sum) - y;</code>

: <code>c = ((sum + y) - sum) - y;</code>

and then to

: <code>c = 0;</code>

thus eliminating the error compensation. In practice, many compilers do not use associativity rules (which are only approximate in floating-point arithmetic) in simplifications, unless explicitly directed to do so by compiler options enabling "unsafe" optimizations, although the Intel C++ Compiler is one example that allows associativity-based transformations by default. The original K&R C version of the C programming language allowed the compiler to re-order floating-point expressions according to real-arithmetic associativity rules, but the subsequent ANSI C standard prohibited re-ordering in order to make C better suited for numerical applications (and more similar to Fortran, which also prohibits re-ordering), although in practice compiler options can re-enable re-ordering, as mentioned above.

A portable way to inhibit such optimizations locally is to break one of the lines in the original formulation into two statements, and make two of the intermediate products volatile:

function KahanSum(input)

var sum = 0.0

var c = 0.0

for i = 1 to input.length do

var y = input[i] - c

volatile var t = sum + y

volatile var z = t - sum

c = z - y

sum = t

next i

return sum

Support by libraries

In general, built-in "sum" functions in computer languages typically provide no guarantees that a particular summation algorithm will be employed, much less Kahan summation. The BLAS standard for linear algebra subroutines explicitly avoids mandating any particular computational order of operations for performance reasons, and BLAS implementations typically do not use Kahan summation.

The standard library of the Python computer language uses the Neumaier summation in the builtin "sum()" function starting with CPython 3.12. NumPy makes no guarantee for the order of summation, though partial pairwise summation is "often" used.

In the Julia language, the default implementation of the <code>sum</code> function does pairwise summation for high accuracy with good performance, but an external library provides an implementation of Neumaier's variant named <code>sum_kbn</code> for the cases when higher accuracy is needed.

In the C# language, HPCsharp nuget package implements the Neumaier variant and pairwise summation: both as scalar, data-parallel using SIMD processor instructions, and parallel multi-core.

References

External links

Floating-point Summation, Dr. Dobb's Journal September, 1996

The algorithm

Worked example

Speed

Alternatives

Possible invalidation by compiler optimization

Support by libraries

See also

References

External links