Truncated binary encoding is an entropy encoding typically used for uniform probability distributions with a finite alphabet. It is parameterized by an alphabet with total size of number n. It is a slightly more general form of binary encoding when n is not a power of two.
If n is a power of two, then the coded value for 0 ≤ x < n is the simple binary code for x of length log<sub>2</sub>(n). Otherwise let k = floor(log<sub>2</sub>(n)), such that 2<sup>k</sup> < n < 2<sup>k+1</sup>and let u = 2<sup>k+1</sup> − n.
Truncated binary encoding assigns the first u symbols codewords of length k and then assigns the remaining n − u symbols the last n − u codewords of length k + 1. Because all the codewords of length k + 1 consist of an unassigned codeword of length k with a "0" or "1" appended, the resulting code is a prefix code.
History
Used since at least 1984, phase-in codes, also known as economy codes, are also known as truncated binary encoding.
Example with n = 5
For example, for the alphabet {0, 1, 2, 3, 4}, n = 5 and 2<sup>2</sup> ≤ n < 2<sup>3</sup>, hence k = 2 and u = 2<sup>3</sup> − 5 = 3. Truncated binary encoding assigns the first u symbols the codewords 00, 01, and 10, all of length 2, then assigns the last n − u symbols the codewords 110 and 111, the last two codewords of length 3.
For example, if n is 5, plain binary encoding and truncated binary encoding allocates the following codewords. Digits shown <del>struck</del> are not transmitted in truncated binary.
{| class="wikitable" style="text-align:center"
|-
! Truncated<br/>binary !!colspan=3| Encoding !! Standard<br/>binary
|-
|align=right| 0 ||bgcolor=silver| <del>0</del> || 0 || 0 ||align=left| 0
|-
|align=right| 1 ||bgcolor=silver| <del>0</del> || 0 || 1 ||align=left| 1
|-
|align=right| 2 ||bgcolor=silver| <del>0</del> || 1 || 0 ||align=left| 2
|-
|align=right| UNUSED ||bgcolor=silver| <del>0</del> ||bgcolor=silver| <del>1</del> ||bgcolor=silver| <del>1</del> ||align=left| 3
|-
|align=right| UNUSED ||bgcolor=silver| <del>1</del> ||bgcolor=silver| <del>0</del> ||bgcolor=silver| <del>0</del> ||align=left| 4
|-
|align=right| UNUSED ||bgcolor=silver| <del>1</del> ||bgcolor=silver| <del>0</del> ||bgcolor=silver| <del>1</del> ||align=left| 5/UNUSED
|-
|align=right| 3 || 1 || 1 || 0 ||align=left| 6/UNUSED
|-
|align=right| 4 || 1 || 1 || 1 ||align=left| 7/UNUSED
|}
It takes 3 bits to encode n using straightforward binary encoding, hence 2<sup>3</sup> − n = 8 − 5 = 3 are unused.
In numerical terms, to send a value x, where 0 ≤ x < n, and where there are 2<sup>k</sup> ≤ n < 2<sup>k+1</sup> symbols, there are u = 2<sup>k+1</sup> − n unused entries when the alphabet size is rounded up to the nearest power of two. The process to encode the number x in truncated binary is: if x is less than u, encode it in k binary bits; if x is greater than or equal to u, encode the value x + u in k + 1 binary bits.
Example with n = 10
Another example, encoding an alphabet of size 10 (between 0 and 9) requires 4 bits, but there are 2<sup>4</sup> − 10 = 6 unused codes, so input values less than 6 have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.)
{| class="wikitable" style="text-align:center"
|-
! Input<br/>value !! Offset !! Offset<br/>value !! Standard<br/>binary || Truncated<br />binary
|-
| 0 || 0 || 0 || <del>0</del>000 || 000
|-
| 1 || 0 || 1 || <del>0</del>001 || 001
|-
| 2 || 0 || 2 || <del>0</del>010 || 010
|-
| 3 || 0 || 3 || <del>0</del>011 || 011
|-
| 4 || 0 || 4 || <del>0</del>100 || 100
|-
| 5 || 0 || 5 || <del>0</del>101 || 101
|-
|colspan=5|
|-
| 6 || 6 || 12 || 0110 || 1100
|-
| 7 || 6 || 13 || 0111 || 1101
|-
| 8 || 6 || 14 || 1000 || 1110
|-
| 9 || 6 || 15 || 1001 || 1111
|}
To decode, read the first k bits. If they encode a value less than u, decoding is complete. Otherwise, read an additional bit and subtract u from the result.
Example with n = 7
Here is a more extreme case: with n = 7 the next power of 2 is 8, so k = 2 and u = 2<sup>3</sup> − 7 = 1:
{| class="wikitable" style="text-align:center"
|-
! Input<br/>value !! Offset !! Offset<br />value !! Standard<br/>binary || Truncated<br />binary
|-
| 0 || 0 || 0 || <del>0</del>00 || 00
|-
|colspan=5|
|-
| 1 || 1 || 2 || 001 || 010
|-
| 2 || 1 || 3 || 010 || 011
|-
| 3 || 1 || 4 || 011 || 100
|-
| 4 || 1 || 5 || 100 || 101
|-
| 5 || 1 || 6 || 101 || 110
|-
| 6 || 1 || 7 || 110 || 111
|}
This last example demonstrates that a leading zero bit does not always indicate a short code; if u < 2<sup>k</sup>, some long codes will begin with a zero bit.
Simple algorithm
Generate the truncated binary encoding for a value x, 0 ≤ x < n, where n > 0 is the size of the alphabet containing x. n need not be a power of two.
<syntaxhighlight lang="C">
string TruncatedBinary (int x, int n)
{
// Set k = floor(log2(n)), i.e., k such that 2^k <= n < 2^(k+1).
int k = 0, t = n;
while (t > 1) { k++; t >>= 1; }
// Set u to the number of unused codewords = 2^(k+1) - n.
int u = (1 << k + 1) - n;
if (x < u)
return Binary(x, k);
else
return Binary(x + u, k + 1));
}
</syntaxhighlight>
The routine <code>Binary</code> is expository; usually just the rightmost <code>len</code> bits of the variable x are desired.
Here we simply output the binary code for x using <code>len</code> bits, padding with high-order 0s if necessary.
<syntaxhighlight lang="C">
string Binary (int x, int len)
{
string s = "";
while (x != 0) {
if (even(x))
s = '0' + s;
else s = '1' + s;
x >>= 1;
}
while (s.Length < len)
s = '0' + s;
return s;
}
</syntaxhighlight>
On efficiency
If n is not a power of two, and k-bit symbols are observed with probability p, then (k + 1)-bit symbols are observed with probability 1 − p. We can calculate the expected number of bits per symbol <math>b_e</math> as
: <math>b_e = p k + (1 - p) (k + 1).</math>
Raw encoding of the symbol has <math>b_u = k + 1</math> bits. Then relative space saving s (see Data compression ratio) of the encoding can be defined as
: <math>s = 1 - \frac{b_e}{b_u} = 1 - \frac{p k + (1 - p) (k + 1)}{k + 1}.</math>
When simplified, this expression leads to
: <math>s = \frac{p}{k + 1} = \frac{p}{b_u}.</math>
This indicates that relative efficiency of truncated binary encoding increases as probability p of k-bit symbols increases, and the raw-encoding symbol bit-length <math>b_u</math> decreases.
See also
- Benford's law
- Golomb coding
