In information theory, an entropy coding (or entropy encoding) is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method must have an expected code length greater than or equal to the entropy of the source.
More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies <math>\operatorname E_{x\sim P}[\ell(d(x))] \geq \operatorname E_{x\sim P}[-\log_b(P(x))]</math>, where <math>\ell</math> is the function specifying the number of symbols in a code word, <math>d</math> is the coding function, <math>b</math> is the number of symbols used to make output codes and <math>P</math> is the probability of the source symbol. An entropy coding attempts to approach this lower bound.
Two of the most common entropy coding techniques are Huffman coding and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code may be useful. These static codes include universal codes (such as Elias gamma coding or Fibonacci coding) and Golomb codes (such as unary coding or Rice coding). This approach is grounded in the concept of normalized compression distance, a parameter-free, universal similarity metric based on compression that approximates the uncomputable normalized information distance.
See also
- Arithmetic coding
- Asymmetric numeral systems (ANS)
- Context-adaptive binary arithmetic coding (CABAC)
- Huffman coding
- Normalized compression distance
- Range coding
- Source coding theorem
References
External links
- Information Theory, Inference, and Learning Algorithms, by David MacKay (2003), gives an introduction to Shannon theory and data compression, including the Huffman coding and arithmetic coding.
- Source Coding, by T. Wiegand and H. Schwarz (2011).
