Rényi entropy

In information theory, the Rényi entropy is a quantity that generalizes various notions of entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alfréd Rényi, who looked for the most general way to quantify information while preserving additivity for independent events. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions.

The Rényi entropy is important in ecology and statistics as index of diversity. The Rényi entropy is also important in quantum information, where it can be used as a measure of entanglement. In the Heisenberg XY spin chain model, the Rényi entropy as a function of can be calculated explicitly because it is an automorphic function with respect to a particular subgroup of the modular group. In theoretical computer science, the min-entropy is used in the context of randomness extractors.

Definition

Let <math>X</math> be a discrete random variable with possible outcomes in the set <math>\mathcal{A} = \{x_1,x_2,...,x_n\}</math> and corresponding probabilities <math>p_i \doteq \Pr(X=x_i)</math> for . Then, the Rényi entropy of order , where <math>0 < \alpha < \infin</math> and , is defined as If the probabilities are all nonzero, it is simply the logarithm of the cardinality of the alphabet () of , sometimes called the Hartley entropy of ,

<math display="block">\Eta_0 (X) = \log n = \log |\mathcal{A}|\,</math>

Shannon entropy

The limiting value of <math>\Eta_\alpha</math> as <math>\alpha \to 1</math> is the Shannon entropy:

<math display="block">\Eta_1 (X) \equiv \lim_{\alpha \to 1} \Eta_{\alpha} (X) = - \sum_{i=1}^n p_i \log p_i </math>

Collision entropy

Collision entropy, sometimes just called "Rényi entropy", refers to the case ,

where <math> X </math> and <math> Y </math> are independent and identically distributed. The collision entropy is related to the index of coincidence. It is the negative logarithm of the Simpson diversity index. The quantity <math>2^{H_2} = 1/\sum p_i^2</math> is also known in random matrix theory as the participation ratio, where it measures the effective number of eigenstates contributing to a quantum state or, more generally, the effective number of eigenvalues contributing to a matrix spectrum.

Min-entropy

In the limit as , the Rényi entropy <math>\Eta_\alpha</math> converges to the min-entropy :

<math display="block">\Eta_\infty(X) \doteq \min_i (-\log p_i) = -(\max_i \log p_i) = -\log \max_i p_i\,.</math>

Equivalently, the min-entropy <math>\Eta_\infty(X)</math> is the largest real number such that all events occur with probability at most .

The name min-entropy stems from the fact that it is the smallest entropy measure in the family of Rényi entropies.

In this sense, it is the strongest way to measure the information content of a discrete random variable.

In particular, the min-entropy is never larger than the Shannon entropy.

The min-entropy has important applications for randomness extractors in theoretical computer science:

Extractors are able to extract randomness from random sources that have a large min-entropy; merely having a large Shannon entropy does not suffice for this task.

Inequalities for different orders α

That <math>\Eta_\alpha</math> is non-increasing in <math>\alpha</math> for any given distribution of probabilities ,

which can be proven by differentiation, as

<math display="block">-\frac{d \Eta_\alpha}{d\alpha}

= \frac{1}{(1-\alpha)^2} \sum_{i=1}^n z_i \log(z_i / p_i) = \frac{1}{(1-\alpha)^2} D_\text{KL}(z\| p)</math>

which is proportional to Kullback–Leibler divergence (which is always non-negative), where

<math display="inline">z_i = p_i^\alpha / \sum_{j=1}^n p_j^\alpha</math>. In particular, it is strictly positive except when the distribution is uniform.

At the <math>\alpha \to 1</math> limit, we have <math display="inline">-\frac{d \Eta_\alpha}{d\alpha} \to \frac{1}{2} \sum_i p_i {\left(\ln p_i + H(p)\right)}^2</math>.

In particular cases inequalities can be proven also by Jensen's inequality:

<math display="block">\log n=\Eta_0\geq \Eta_1 \geq \Eta_2 \geq \Eta_\infty .</math>

For values of , inequalities in the other direction also hold. In particular, we have

<math display="block"> \Eta_2 \le 2 \Eta_\infty .</math>

On the other hand, the Shannon entropy <math>\Eta_1</math> can be arbitrarily high for a random variable <math>X</math> that has a given min-entropy. An example of this is given by the sequence of random variables <math>X_n \sim \{0, \ldots, n\}</math> for <math>n \geq 1</math> such that <math>P(X_n = 0) = 1/2</math> and <math>P(X_n = x) = 1/(2n)</math> since <math>\Eta_\infty(X_n) = \log 2</math> but .

Rényi divergence

As well as the absolute Rényi entropies, Rényi also defined a spectrum of divergence measures generalising the Kullback–Leibler divergence.

The Rényi divergence of order or alpha-divergence of a distribution from a distribution is defined to be

<math display="block">\begin{align}

D_\alpha (P \Vert Q) &= \frac{1}{\alpha-1} \log\left(\sum_{i=1}^n \frac{p_i^\alpha}{q_i^{\alpha-1\right) \\[1ex]

&= \frac{1}{\alpha-1}\log \mathbb E_{i \sim p}\left[{\left(p_i/q_i\right)}^{\alpha - 1}\right] \,

\end{align} </math>

when and . We can define the Rényi divergence for the special values by taking a limit, and in particular the limit gives the Kullback–Leibler divergence.

Some special cases:

: minus the log probability under that ;
: minus twice the logarithm of the Bhattacharyya coefficient; ()
: the Kullback–Leibler divergence;
: the log of the expected ratio of the probabilities;
: the log of the maximum ratio of the probabilities.

The Rényi divergence is indeed a divergence, meaning simply that <math>D_\alpha (P \| Q)</math> is greater than or equal to zero, and zero only when . For any fixed distributions and , the Rényi divergence is nondecreasing as a function of its order , and it is continuous on the set of for which it is finite,

<math display="block">{\rm ExpectedRate} = \frac{1}{R}\, D_1(b\|m) + \frac{R-1}{R}\, D_{1/R}(b\|m) \,,</math>

where <math>m</math> is the distribution defining the official odds (i.e. the "market") for the game, <math>b</math> is the investor-believed distribution and <math>R</math> is the investor's risk aversion (the Arrow–Pratt relative risk aversion).

If the true distribution is <math>p</math> (not necessarily coinciding with the investor's belief ), the long-term realized rate converges to the true expectation which has a similar mathematical structure

\Eta_\alpha(p_F(x;\theta)) = \frac{1}{1-\alpha} \left(F(\alpha\theta)-\alpha F(\theta) + \log E_p\left[e^{(\alpha-1)k(x)}\right]\right)

</math>

and

D_\alpha(p:q) = \frac{J_{F,\alpha}(\theta:\theta')}{1-\alpha}

</math>

where

J_{F,\alpha}(\theta:\theta')= \alpha F(\theta)+(1-\alpha) F(\theta')- F(\alpha\theta+(1-\alpha)\theta')

</math>

is a Jensen difference divergence.

Quantum information

The Rényi entropy can be generalized to the quantum case as

:<math> H_\alpha(\rho) := \frac{1}{1-\alpha}\log\operatorname{tr}(\rho^\alpha)</math>

where <math>\rho</math> is normalized density matrix. Its limit as <math>\alpha\to 1</math> is the von Neumann entropy.

The Rényi relative entropy (or divergence) can be generalized in two possible ways: the straightforward quantum Rényi relative entropy

:<math>D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log\operatorname{tr}(\rho^\alpha\sigma^{1-\alpha})</math>

and the sandwiched Rényi relative entropy

:<math>\tilde{D}_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log\operatorname{tr}\left[\left(\sigma^\frac{1-\alpha}{2\alpha}\rho\sigma^\frac{1-\alpha}{2\alpha}\right)^\alpha\right]</math>

Both converge to the quantum relative entropy in the limit <math>\alpha \to 1</math>. However, the sandwiched Rényi relative entropy has more convenient properties when used to defined conditional entropies,

Stabilizer Rényi entropy

One measure of quantum magic is the stabilizer Rényi entropy of order α:

:<math>M_\alpha = \frac{1}{1 - \alpha} \log \left[\frac{1}{2^N} \sum_{\hat{P} \in \mathbb{P}_N} \langle \hat{P}\rangle^{2\alpha}\right]</math>

with <math>\hat{P}</math> being the multi-qubit Pauli group and <math>\langle \hat{P} \rangle = \langle \psi | \hat{P} | \psi \rangle </math> is an expectation value (quantum mechanics) for an <math>N</math>-qubit system.

Ecology

Hill (1973) showed that the exponential of the Rényi entropy of order α defines a one-parameter family of diversity indices, now known as Hill numbers: <sup>α</sup>D = exp(H<sub>α</sub>). For α = 0, this gives the species richness (total count of species present); for α = 1, the exponential of the Shannon entropy; and for α = 2, the inverse Simpson index, which counts the effective number of equally common species. Hill's framework unifies several previously disparate diversity measures as members of a single parametric family, differing only in how much weight is given to rare versus common species.

Rényi entropy

Definition

Shannon entropy

Collision entropy

Min-entropy

Inequalities for different orders α

Rényi divergence

Quantum information

Stabilizer Rényi entropy

Ecology

See also

Notes

References