Fixed-point arithmetic

In computing, fixed-point is a method of representing fractional (non-integer) numbers by storing a fixed number of digits of their fractional part. Dollar amounts, for example, are often stored with exactly two fractional digits, representing the cents (1/100 of a dollar). More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g., a fractional amount of hours as an integer multiple of ten-minute intervals. Fixed-point number representation is often contrasted to the more complicated and computationally demanding floating-point representation.

In the fixed-point representation, the fraction is often expressed in the same number base as the integer part, but using negative powers of the base b. The most common variants are decimal (base 10) and binary (base 2). The latter is commonly known also as binary scaling. Thus, if n fraction digits are stored, the value will always be an integer multiple of b−n. Fixed-point representation can also be used to omit the low-order digits of integer values, for instance, when representing large dollar values as multiples of $1000 ($1K).

When decimal fixed-point numbers are displayed for human reading, the fraction digits are usually separated from those of the integer part by a radix character (usually "." in English, but "," or some other symbol in many other languages). Internally, however, there is no separation, and the distinction between the two groups of digits is defined only by the programs that handle such numbers.

Fixed-point representation was the norm in mechanical calculators. Since most modern processors have a fast floating-point unit (FPU), fixed-point representations in processor-based implementations are now used only in special situations, such as in low-cost embedded microprocessors and microcontrollers; in applications that demand high speed or low power consumption or small chip area, like image, video, and digital signal processing; or when their use is more natural for the problem. Examples of the latter are accounting of dollar amounts, when fractions of cents must be rounded to whole cents in strictly prescribed ways; and the evaluation of functions by table lookup, or any application where rational numbers need to be represented without rounding errors (which fixed-point does but floating-point cannot). Fixed-point representation is still the norm for field-programmable gate array (FPGA) implementations, as floating-point support in an FPGA requires significantly more resources than fixed-point support.

Binary fixed-point (binary scaling) was widely used from the late 1960s to the 1980s for real-time computing that was mathematically intensive, such as flight simulation and in nuclear power plant control algorithms. It is still used in many DSP applications and custom-made microprocessors. Computations involving angles would use binary angular measurement.

Binary fixed point is used in the STM32G4 series CORDIC co-processors and in the discrete cosine transform algorithms used to compress JPEG images.

Electronic instruments such as electricity meters and digital clocks often use polynomials to compensate for introduced errors, e.g. from temperature or power supply voltage. The coefficients are produced by polynomial regression. Binary fixed-point polynomials can utilize more bits of precision than floating-point and do so in fast code using inexpensive CPUs. Accuracy, crucial for instruments, compares well to equivalent-bit floating-point calculations, if the fixed-point polynomials are evaluated using Horner's method (e.g. ) to reduce the number of times that rounding occurs, and the fixed-point multiplications utilize rounding addends.

Operations

Addition and subtraction

To add or subtract two values with the same implicit scaling factor, it is sufficient to add or subtract the underlying integers; the result will have their common implicit scaling factor and can thus be stored in the same program variables as the operands. These operations yield the exact mathematical result, as long as no overflow occurs—that is, as long as the resulting integer can be stored in the receiving program variable. If overflow happens, it occurs like with ordinary integers of the same signedness. In the unsigned and signed-via-two's-complement cases, the overflow behaviour is well-known as a finite group.

If the operands have different scaling factors, then they must be converted to a common scaling factor before the operation.

Multiplication

To multiply two fixed-point numbers, it suffices to multiply the two underlying integers and assume that the scaling factor of the result is the product of their scaling factors.

: (p/q) * (r/s) = pr/qs

The result will be exact, with no rounding, provided that it does not overflow the receiving variable. (Specifically, with integer multiplication, the product is up to twice the width of the two factors.)

For example, multiplying the numbers 123 scaled by 1/1000 (0.123) and 25 scaled by 1/10 (2.5) yields the integer 123×25 = 3075 scaled by (1/1000)×(1/10) = 1/10000, that is 3075/10000 = 0.3075. As another example, multiplying the first number by 155 implicitly scaled by 1/32 (155/32 = 4.84375) yields the integer 123×155 = 19065 with implicit scaling factor (1/1000)×(1/32) = 1/32000, that is 19065/32000 = 0.59578125.

In binary, it is common to use a scaling factor that is a power of two. After the multiplication, the scaling factor can be divided away by shifting right. Shifting is simple and fast in most computers.

When right-shifting or a typical integer-division instruction (such as C integer division and x86 idiv) is used, the result is equivalent to a flooring division (floor(x/y)). A method with rounding can be used to reduce the error introduced. Three variations are possible based on choice of tie-breaking:

Round-half-up is possible by adding a 'rounding addend' of half of the scaling factor before shifting. The proof: roundup(x/y) = floor(x/y + 0.5) = floor((x + y/2)/y). If y = 2^n, this is equivalent to (x + 2^(n−1)) >> n (where >> represents right shift).
Round-half-down is, by analogy, floor((x - y/2)/y) or (x - 2^(n-1)) >> n.
Round-half-to-even basically entails making an extra decision on top of round-half-up. It is slightly more complicated but still requires no branching on a CPU.

These rounding methods are usable in any scaling through integer division. For example, they are also applicable to the discussion on rescaling.

Division

The division of fixed-point numbers can be understood as the division of two fractions of potentially different denominators (scaling factors). With and (where p q r s are all integers), the naive approach is to rearrange the fraction to form a new scaling factor (s/q):

: (p/q) / (r/s) = (p÷r) / (s÷q)

For example, division of 3456 scaled by 1/100 (34.56) and 1234 scaled by 1/1000 (1.234) yields the integer 3456÷1234 = 3 (rounded) with scale factor (1/100)/(1/1000) = 10, that is, 30. As another example, the division of the first number by 155 implicitly scaled by 1/32 (155/32 = 4.84375) yields the integer 3456÷155 = 22 (rounded) with implicit scaling factor (1/100)/(1/32) = 32/100 = 8/25, that is 22×32/100 = 7.04.

With very similar s and q, the above algorithm results in an overly coarse scaling factor. This can be improved by first converting the dividend to a smaller scaling factor. Say we reduce the scaling factor by n times, then we instead calculate:

: (p/q) / (r/s) = (np/nq) / (r/s) = (np÷r) / (s÷nq)

For example, if a = 1.23 is represented as 123 with scaling 1/100, and b = 6.25 is represented as 6250 with scaling 1/1000, then simple division of the integers yields 123÷6250 = 0 (rounded) with scaling factor (1/100)/(1/1000) = 10. If a is first converted to 1,230,000 with scaling factor 1/1000000, the result will be 1,230,000÷6250 = 197 (rounded) with scale factor 1/1000 (0.197). The exact value 1.23/6.25 is 0.1968.

A different way to think about the scaling is to consider division, the inverse operation of multiplication. If multiplication leads to a finer scaling factor, it is reasonable that the dividend needs to have a finer scaling factor as well to recover the original value given.

Scaling conversion

In fixed-point computing, it is often necessary to convert a value to a different scaling factor. This operation is necessary, for example:

To store a value into a program variable that has a different implicit scaling factor;
To convert two values to the same scaling factor, so that they can be added or subtracted;
To restore the original scaling factor of a value after multiplying or dividing it by another;
To improve the accuracy of the result of a division;
To ensure that the scaling factor of a product or quotient is a simple power like 10n or 2n;
To ensure that the result of an operation can be stored in a program variable without overflow;
To reduce the cost of hardware that processes fixed-point data.

To convert a number from a fixed point type with scaling factor R to another type with scaling factor S, the underlying integer must be multiplied by the ratio R/S. Thus, for example, to convert the value 1.23 = 123/100 from scaling factor R=1/100 to one with scaling factor S=1/1000, the integer 123 must be multiplied by (1/100)/(1/1000) = 10, yielding the representation 1230/1000.

If the scaling factor is a power of the base used internally to represent the integer, changing the scaling factor requires only dropping low-order digits of the integer, or appending zero digits. However, this operation must preserve the sign of the number. In two's complement representation, that means extending the sign bit as in arithmetic shift operations.

If S does not divide R (in particular, if the new scaling factor S is greater than the original R), the new integer may have to be rounded.

In particular, if r and s are fixed-point variables with implicit scaling factors R and S, the operation r ← r×s requires multiplying the respective integers and explicitly dividing the result by S. The result may have to be rounded, and overflow may occur.

For example, if the common scaling factor is 1/100, multiplying 1.23 by 0.25 entails multiplying 123 by 25 to yield 3075 with an intermediate scaling factor of 1/10000. In order to return to the original scaling factor 1/100, the integer 3075 then must be multiplied by 1/100, that is, divided by 100, to yield either 31 (0.31) or 30 (0.30), depending on the rounding policy used.

Similarly, the operation r ← r/s will require dividing the integers and explicitly multiplying the quotient by S. Rounding and/or overflow may occur here too.

Conversion to and from floating-point

To convert a number from floating point to fixed point, one may multiply it by the scaling factor S, then round the result to the nearest integer. Care must be taken to ensure that the result fits in the destination variable or register. Depending on the scaling factor and storage size, and on the range input numbers, the conversion may not entail any rounding.

To convert a fixed-point number to floating-point, one may convert the integer to floating-point and then divide it by the scaling factor S. This conversion may entail rounding if the integer's absolute value is greater than 224 (for binary single-precision IEEE floating point) or of 253 (for double-precision). Overflow or underflow may occur if |S| is very large or very small, respectively.

Hardware support

Scaling and renormalization

Typical processors do not have specific support for fixed-point arithmetic. However, most computers with binary arithmetic have fast bit shift instructions that can multiply or divide an integer by any power of 2; in particular, an arithmetic shift instruction. These instructions can be used to quickly change scaling factors that are powers of 2, while preserving the sign of the number.

Early computers like the IBM 1620 and the Burroughs B3500 used a binary-coded decimal (BCD) representation for integers, namely base 10 where each decimal digit was independently encoded with 4 bits. Some processors, such as microcontrollers, may still use it. In such machines, conversion of decimal scaling factors can be performed by bit shifts and/or by memory address manipulation.

Some DSP architectures offer native support for specific fixed-point formats, for example, signed n-bit numbers with n−1 fraction bits (whose values may range between −1 and almost +1). The support may include a multiply instruction that includes renormalization—the scaling conversion of the product from 2n−2 to n−1 fraction bits. If the CPU does not provide that feature, the programmer must save the product in a large enough register or temporary variable, and code the renormalization explicitly.

Overflow

Overflow happens when the result of an arithmetic operation is too large to be stored in the designated destination area. In addition and subtraction, the result may require one bit more than the operands. In multiplication of two unsigned integers with m and n bits, the result may have m+n bits.

In case of overflow, the high-order bits are usually lost, as the un-scaled integer gets reduced modulo 2n where n is the size of the storage area. The sign bit, in particular, is lost, which may radically change the sign and the magnitude of the value.

Some processors can set a hardware overflow flag and/or generate an exception on the occurrence of an overflow. Some processors may instead provide saturation arithmetic: if the result of an addition or subtraction were to overflow, they store instead the value with the largest magnitude that can fit in the receiving area and has the correct sign.

However, these features are not very useful in practice; it is generally easier and safer to select scaling factors and word sizes so as to exclude the possibility of overflow, or to check the operands for excessive values before executing the operation.

Computer language support

Explicit support for fixed-point numbers is provided by a few programming languages, notably PL/I, COBOL, Ada, JOVIAL, and Coral 66. They provide fixed-point data types, with a binary or decimal scaling factor. The compiler automatically generates code to do the appropriate scaling conversions when doing operations on these data types, when reading or writing variables, or when converting the values to other data types, such as floating-point.

Most of those languages were designed between 1955 and 1990. More modern languages usually do not offer any fixed-point data types or support for scaling factor conversion. That is also the case for several older languages that are still very popular, like FORTRAN, C and C++. The wide availability of fast floating-point processors, with strictly standardized behavior, has greatly reduced the demand for binary fixed-point support. Similarly, the support for decimal floating point in some programming languages, like C# and Python, has removed most of the need for decimal fixed-point support. In the few situations that call for fixed-point operations, they can be implemented by the programmer, with explicit scaling conversion, in any programming language that supports explicit integer types.

On the other hand, all relational databases and the SQL notation support fixed-point decimal arithmetic and storage of numbers. PostgreSQL has a special <samp>numeric</samp> type for exact storage of numbers with up to 1000 digits. Newer versions of Ada allow specifying an exact (including non-power-of-two) scaling factor using <code> 'Small => 0.005</code> (aspect specification), or, if the factor is a power of 10, through a decimal fixed point.

The notation <code>Bm</code> has been used to mean a fixed binary format with m bits in the integer part; the rest of the word (typically 32 bits) being fraction bits. For example, the maximum and minimum values that can be stored in a signed number are ≈32767.9999847 and −32768.0, respectively.
The VisSim company used <code>fxm.b</code> to denote a binary fixed-point value with b total bits and m bits in the integer part; that is, a b-bit integer with scaling factor 1/2b−m. Thus would mean a 16-bit number with 1 bit in the integer part and 15 in the fraction.

External links

Simple Fixed-Point Math
Fixed-Point Arithmetic - An Introduction
Fixed Point Representation and Fractional Math
A Calculated Look at Fixed-Point Arithmetic, (PDF)