</math>
| kurtosis = <math>6+\frac{p^2}{1-p}</math>
| entropy = <math>\tfrac{-(1-p)\log (1-p) - p \log p}{p}</math>
| fisher = <math>\tfrac{1}{p^2 (1-p)}</math>
| mgf = <math>\frac{pe^t}{1-(1-p) e^t},</math><br />for <math>t<-\ln(1-p)</math>
| char = <math>\frac{pe^{it}}{1-(1-p)e^{it}}</math>
| pgf = <math>\frac{pz}{1-(1-p)z}</math>
| parameters2 = <math>0 < p \leq 1</math> success probability (real)
| support2 = k failures where <math>k \in \mathbb{N}_0 = \{0, 1, 2, \dotsc\}</math>
| pdf2 = <math>(1 - p)^k p</math>
| cdf2 = <math>1-(1 - p)^{\lfloor x\rfloor+1}</math> for <math>x\geq 0</math>,<br /><math>0</math> for <math>x<0</math>
| mean2 = <math>\frac{1-p}{p}</math>
| median2 = <math>\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1</math> <br />
(not unique if <math>-1/\log_2(1-p)</math> is an integer)
| mode2 = <math>0</math>
| variance2 = <math>\frac{1-p}{p^2}</math>
| skewness2 = <math>\frac{2-p}{\sqrt{1-p}}</math>
| kurtosis2 = <math>6+\frac{p^2}{1-p}</math>
| entropy2 = <math>\tfrac{-(1-p)\log (1-p) - p \log p}{p}</math>
| fisher2 = <math>\tfrac{1}{p^2 (1-p)}</math>
| mgf2 = <math>\frac{p}{1-(1-p)e^t},</math><br />for <math>t<-\ln(1-p)</math>
| char2 = <math>\frac{p}{1-(1-p)e^{it}}</math>
| pgf2 = <math>\frac{p}{1-(1-p)z}</math>
In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:
- The probability distribution of the number <math>X</math> of Bernoulli trials needed to get one success, supported on <math>\mathbb{N} = \{1,2,3,\ldots\}</math>;
- The probability distribution of the number <math>Y=X-1</math> of failures before the first success, supported on <math>\mathbb{N}_0 = \{0, 1, 2, \ldots \} </math>.
These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former one (distribution of <math>X</math>); however, to avoid ambiguity, it is considered wise to indicate which is intended, by mentioning the support explicitly.
The geometric distribution gives the probability that the first occurrence of success requires <math>k</math> independent trials, each with success probability <math>p</math>. If the probability of success on each trial is <math>p</math>, then the probability that the <math>k</math>-th trial is the first success is
<math display="block">\Pr(X = k) = (1-p)^{k-1}p</math>
for <math>k=1,2,3,4,\dots</math>
The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:
<math display="block">\Pr(Y=k) = \Pr(X=k+1) = (1 - p)^k p</math>
for <math>k=0,1,2,3,\dots</math>
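The two probability mass functions above can be compared side by side. The following is a minimal Python sketch; the function names `pmf_trials` and `pmf_failures` are illustrative choices, not part of any library.

```python
def pmf_trials(k: int, p: float) -> float:
    """Pr(X = k): the first success occurs on trial k, for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1.0 - p) ** (k - 1) * p


def pmf_failures(k: int, p: float) -> float:
    """Pr(Y = k): k failures occur before the first success, for k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return (1.0 - p) ** k * p


p = 0.25
for k in range(4):
    # The two columns agree because Pr(Y = k) = Pr(X = k + 1).
    print(k, pmf_failures(k, p), pmf_trials(k + 1, p))
```

Successive probabilities differ by the constant factor <math>1-p</math>, which is the geometric sequence the distribution is named after.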
The geometric distribution gets its name because its probabilities follow a geometric sequence. It is sometimes called the Furry distribution after Wendell H. Furry.
The support may also be <math>\mathbb{N}_0</math>, defining <math>Y=X-1</math>. This alters the probability mass function into <math display="block">P(Y = k) = (1 - p)^k p</math> where <math>k = 0, 1, 2, \dotsc</math> is the number of failures before the first success.
An alternative parameterization of the distribution gives the probability mass function <math display="block">P(Y = k) = \left(\frac{P}{Q}\right)^k \left(1-\frac{P}{Q}\right)</math> where <math>P = \frac{1-p}{p}</math> and <math>Q = \frac{1}{p}</math>.
The geometric distribution is the only memoryless discrete probability distribution. Memorylessness is the discrete version of the same property found in the exponential distribution. The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success.
Because there are two definitions of the geometric distribution, there are also two definitions of memorylessness for discrete random variables. Expressed in terms of conditional probability, the two definitions are
<math display="block">\Pr(X>m+n\mid X>n)=\Pr(X>m),</math>
and
<math display="block">\Pr(Y>m+n\mid Y\geq n)=\Pr(Y>m),</math>
where <math>m</math> and <math>n</math> are natural numbers, <math>X</math> is a geometrically distributed random variable defined over <math>\mathbb{N}</math>, and <math>Y</math> is a geometrically distributed random variable defined over <math>\mathbb{N}_0</math>. Note that these definitions are not equivalent for discrete random variables; <math>Y</math> does not satisfy the first equation and <math>X</math> does not satisfy the second.
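Both memorylessness identities can be checked numerically from the tail probabilities <math>\Pr(X>n)=(1-p)^n</math> and <math>\Pr(Y>n)=(1-p)^{n+1}</math>. The sketch below is illustrative only; the helper names are assumptions made for this example.

```python
import math


def tail_trials(n: int, p: float) -> float:
    """Pr(X > n) for X supported on {1, 2, 3, ...}."""
    return (1.0 - p) ** n


def tail_failures(n: int, p: float) -> float:
    """Pr(Y > n) for Y supported on {0, 1, 2, ...}."""
    return (1.0 - p) ** (n + 1)


def at_least_failures(n: int, p: float) -> float:
    """Pr(Y >= n) for Y supported on {0, 1, 2, ...}."""
    return (1.0 - p) ** n


p, m, n = 0.3, 4, 7

# First definition: Pr(X > m + n | X > n) = Pr(X > m).
lhs_x = tail_trials(m + n, p) / tail_trials(n, p)
print(math.isclose(lhs_x, tail_trials(m, p)))

# Second definition: Pr(Y > m + n | Y >= n) = Pr(Y > m).
lhs_y = tail_failures(m + n, p) / at_least_failures(n, p)
print(math.isclose(lhs_y, tail_failures(m, p)))

# Y does not satisfy the first definition: conditioning on Y > n
# leaves a factor (1 - p)^m rather than (1 - p)^(m + 1).
wrong_y = tail_failures(m + n, p) / tail_failures(n, p)
print(math.isclose(wrong_y, tail_failures(m, p)))
```

The last comparison fails for any <math>0<p<1</math>, which illustrates why the two definitions are not interchangeable.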
Moments and cumulants
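The expected value and variance of a geometrically distributed random variable <math>X</math> defined over <math>\mathbb{N}</math> are
<math display="block">\operatorname{E}(X) = \frac{1}{p}, \qquad \operatorname{var}(X) = \frac{1-p}{p^2}.</math>
For a geometrically distributed random variable <math>Y</math> defined over <math>\mathbb{N}_0</math>, the expected value becomes <math>\operatorname{E}(Y) = \frac{1-p}{p}</math> while the variance stays the same.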
For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is <math>\frac{1}{1/6} = 6</math> and the average number of failures is <math>\frac{1 - 1/6}{1/6} = 5</math>.
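The moment generating functions of the geometric distribution when defined over <math> \mathbb{N} </math> and <math>\mathbb{N}_0</math> are, respectively,
<math display="block">\begin{align}
M_X(t) &= \frac{pe^t}{1-(1-p)e^t}, \\
M_Y(t) &= \frac{p}{1-(1-p)e^t}, \quad t < -\ln(1-p).
\end{align}</math>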
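The cumulant generating function of the geometric distribution defined over <math>\mathbb{N}_0</math> is the logarithm of its moment generating function,
<math display="block">K(t) = \ln p - \ln\left(1-(1-p)e^t\right).</math>
The median of the geometric distribution is <math>\left\lceil-\frac{\log 2}{\log(1-p)}\right\rceil</math> when defined over <math>\mathbb{N}</math> and <math>\left\lfloor-\frac{\log 2}{\log(1-p)}\right\rfloor</math> when defined over <math>\mathbb{N}_0</math>. The kurtosis of the geometric distribution is <math>9 + \frac{p^2}{1-p}</math>, and the excess kurtosis of a distribution is its kurtosis minus 3, the kurtosis of a normal distribution. Therefore, the excess kurtosis of the geometric distribution is <math>6 + \frac{p^2}{1-p}</math>. Since <math>\frac{p^2}{1-p} \geq 0</math>, the excess kurtosis is always positive so the distribution is leptokurtic.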
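The die-rolling example can be reproduced by summing the probability mass function directly. The sketch below takes the standard closed forms <math>1/p</math> and <math>(1-p)/p^2</math> for the mean and variance over <math>\mathbb{N}</math> as the reference values; the function names and the truncation length are assumptions chosen for illustration.

```python
import math


def moments_trials(p: float, terms: int = 2000) -> tuple[float, float]:
    """Approximate mean and variance of X on {1, 2, ...} by a truncated sum.

    The truncation at `terms` is an illustrative choice; it is adequate
    only when (1 - p) ** terms is negligible.
    """
    mean = sum(k * (1.0 - p) ** (k - 1) * p for k in range(1, terms + 1))
    second = sum(k * k * (1.0 - p) ** (k - 1) * p for k in range(1, terms + 1))
    return mean, second - mean * mean


def median_trials(p: float) -> int:
    """Smallest k with Pr(X <= k) >= 1/2, using the CDF 1 - (1 - p)^k."""
    k = 1
    while 1.0 - (1.0 - p) ** k < 0.5:
        k += 1
    return k


p = 1.0 / 6.0  # probability of rolling a "1" with a fair six-sided die
mean, var = moments_trials(p)
print(mean, 1.0 / p)            # numerical sum versus 1/p
print(var, (1.0 - p) / p ** 2)  # numerical sum versus (1-p)/p^2
print(median_trials(p), math.ceil(-math.log(2.0) / math.log(1.0 - p)))
```

Subtracting 1 from the mean gives the expected number of failures, while the variance is unchanged by the shift.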
Entropy and Fisher's information
Entropy (geometric distribution, failures before success)
Entropy is a measure of uncertainty in a probability distribution. For the geometric distribution that models the number of failures before the first success, the probability mass function is:
<math display="block">P(X = k) = (1 - p)^k p, \quad k = 0, 1, 2, \dots</math>
The entropy <math>H(X)</math> for this distribution is defined as:
<math display="block">\begin{align}
H(X) &= - \sum_{k=0}^\infty P(X = k) \ln P(X = k) \\
&= - \sum_{k=0}^\infty (1 - p)^k p \ln \left( (1 - p)^k p \right) \\
&= - \sum_{k=0}^\infty (1 - p)^k p \left[ k \ln(1 - p) + \ln p \right] \\
&= -\ln(1 - p) \sum_{k=0}^\infty k (1 - p)^k p - \ln p \sum_{k=0}^\infty (1 - p)^k p \\
&= -\ln p - \frac{1 - p}{p} \ln(1 - p)
\end{align}</math>
The entropy increases as the probability <math>p</math> decreases, reflecting greater uncertainty as success becomes rarer.
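The closed form can be checked against the defining sum. The following sketch works in natural-log units (nats); the function names and the truncation length are illustrative assumptions.

```python
import math


def entropy_closed_form(p: float) -> float:
    """Entropy in nats: -ln p - ((1 - p) / p) ln(1 - p), for 0 < p < 1."""
    return -math.log(p) - (1.0 - p) / p * math.log(1.0 - p)


def entropy_by_summation(p: float, terms: int = 1000) -> float:
    """Truncated version of -sum_k P(X = k) ln P(X = k) for k = 0, 1, 2, ..."""
    total = 0.0
    for k in range(terms):
        prob = (1.0 - p) ** k * p
        if prob > 0.0:  # skip terms that have underflowed to zero
            total -= prob * math.log(prob)
    return total


for p in (0.1, 0.3, 0.7):
    print(p, entropy_closed_form(p), entropy_by_summation(p))
```

For the values tried here the entropy is larger when <math>p</math> is smaller, in line with the statement above.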
Fisher's information (geometric distribution, failures before success)
Fisher information measures the amount of information that an observable random variable <math>X</math> carries about an unknown parameter <math>p</math>. For the geometric distribution (failures before the first success), the Fisher information with respect to <math>p</math> is given by:
<math display="block">I(p) = \frac{1}{p^2(1 - p)}</math>
Proof:
- The likelihood function for a geometric random variable <math>X</math> is: <math display="block">L(p; X) = (1 - p)^X p</math>
- The log-likelihood function is: <math display="block">\ln L(p; X) = X \ln(1 - p) + \ln p</math>
- The score function (first derivative of the log-likelihood w.r.t. <math>p</math>) is: <math display="block">\frac{\partial}{\partial p} \ln L(p; X) = \frac{1}{p} - \frac{X}{1 - p}</math>
- The second derivative of the log-likelihood function is: <math display="block">\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{1}{p^2} - \frac{X}{(1 - p)^2}</math>
- Fisher information is calculated as the negative expected value of the second derivative: <math display="block">\begin{align}
I(p) &= -E\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] \\
&= - \left(-\frac{1}{p^2} - \frac{1 - p}{p (1 - p)^2} \right) \\
&= \frac{1}{p^2(1 - p)}
\end{align}</math>
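Fisher information grows without bound as <math>p</math> decreases toward 0, indicating that rarer successes provide more information about the parameter <math>p</math>. It attains its minimum at <math>p = 2/3</math> and also grows as <math>p</math> approaches 1.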
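As a cross-check of the proof, Fisher information can also be computed as the expected square of the score function, which equals the negative expected second derivative under the usual regularity conditions. The sketch below sums over the support numerically; the function names and truncation length are assumptions for illustration.

```python
import math


def fisher_closed_form(p: float) -> float:
    """I(p) = 1 / (p^2 (1 - p)) for 0 < p < 1."""
    return 1.0 / (p * p * (1.0 - p))


def fisher_by_score(p: float, terms: int = 1000) -> float:
    """E[score^2] with score = 1/p - k/(1 - p), summed over k = 0, 1, 2, ..."""
    total = 0.0
    for k in range(terms):
        prob = (1.0 - p) ** k * p          # P(X = k), failures before success
        score = 1.0 / p - k / (1.0 - p)    # derivative of the log-likelihood
        total += score * score * prob
    return total


for p in (0.2, 0.3, 0.8):
    print(p, fisher_closed_form(p), fisher_by_score(p))
```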
Entropy (geometric distribution, trials until success)
For the geometric distribution modeling the number of trials until the first success, the probability mass function is:
<math display="block">P(X = k) = (1 - p)^{k - 1} p, \quad k = 1, 2, 3, \dots</math>
The entropy <math>H(X)</math> for this distribution is the same as that of the version modeling the number of failures before the first success, because shifting the support by one does not change the probabilities:
<math display="block">\begin{align}
H(X) &= - \ln p - \frac{1 - p}{p} \ln(1 - p)
\end{align}</math>
Fisher's information (geometric distribution, trials until success)
Fisher information for the geometric distribution modeling the number of trials until the first success is given by:
<math display="block">I(p) = \frac{1}{p^2(1 - p)}</math>
Proof:
- The likelihood function for a geometric random variable <math>X</math> is:
:: <math>L(p; X) = (1 - p)^{X - 1} p</math>
- The log-likelihood function is:
:: <math>\ln L(p; X) = (X - 1) \ln(1 - p) + \ln p</math>
- The score function (first derivative of the log-likelihood w.r.t. <math>p</math>) is:
:: <math>\frac{\partial}{\partial p} \ln L(p; X) = \frac{1}{p} - \frac{X - 1}{1 - p}</math>
- The second derivative of the log-likelihood function is:
:: <math>\frac{\partial^2}{\partial p^2} \ln L(p; X) = -\frac{1}{p^2} - \frac{X - 1}{(1 - p)^2}</math>
- Fisher information is calculated as the negative expected value of the second derivative:
<math display="block">\begin{align}
I(p) &= -E\left[\frac{\partial^2}{\partial p^2} \ln L(p; X)\right] \\
&= - \left(-\frac{1}{p^2} - \frac{1 - p}{p (1 - p)^2} \right) \\
&= \frac{1}{p^2(1 - p)}
\end{align}</math>
General properties
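- The probability generating functions of geometric random variables <math> X </math> and <math> Y </math> defined over <math> \mathbb{N} </math> and <math> \mathbb{N}_0 </math> are, respectively,<math display="block">\begin{align}
G_X(z) &= \frac{pz}{1-(1-p)z},\\[10pt]
G_Y(z) &= \frac{p}{1-(1-p)z}, \quad |z| < (1-p)^{-1}.
\end{align}</math>
- The characteristic function <math>\varphi(t)</math> is equal to <math>G(e^{it})</math>, so the characteristic functions of <math> X </math> and <math> Y </math> are, respectively,<math display="block">\begin{align}
\varphi_X(t) &= \frac{pe^{it}}{1-(1-p)e^{it}},\\[10pt]
\varphi_Y(t) &= \frac{p}{1-(1-p)e^{it}}.
\end{align}</math>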
- The entropy of a geometric distribution with parameter <math>p</math> is <math display="block">\frac{-(1-p)\log (1-p) - p \log p}{p}.</math>
- The geometric distribution defined on <math> \mathbb{N}_0 </math> is infinitely divisible, that is, for any positive integer <math>n</math>, there exist <math>n</math> independent identically distributed random variables whose sum is also geometrically distributed. This is because the negative binomial distribution can be derived from a Poisson-stopped sum of logarithmic random variables.
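The characteristic functions of <math>X</math> and <math>Y</math> can be checked against their defining series <math>\operatorname{E}(e^{itX})</math>, using the general identity that a characteristic function is the probability generating function evaluated at <math>e^{it}</math>. The sketch below is illustrative; the function names and the truncation length are assumptions.

```python
import cmath


def cf_trials_closed(t: float, p: float) -> complex:
    """p e^{it} / (1 - (1 - p) e^{it}) for X supported on {1, 2, ...}."""
    z = cmath.exp(1j * t)
    return p * z / (1.0 - (1.0 - p) * z)


def cf_failures_closed(t: float, p: float) -> complex:
    """p / (1 - (1 - p) e^{it}) for Y supported on {0, 1, 2, ...}."""
    z = cmath.exp(1j * t)
    return p / (1.0 - (1.0 - p) * z)


def cf_trials_series(t: float, p: float, terms: int = 500) -> complex:
    """Truncated E[e^{itX}] = sum_k e^{itk} (1 - p)^(k - 1) p, k = 1, 2, ..."""
    return sum(
        cmath.exp(1j * t * k) * (1.0 - p) ** (k - 1) * p
        for k in range(1, terms + 1)
    )


t, p = 0.7, 0.4
print(abs(cf_trials_closed(t, p) - cf_trials_series(t, p)))
# Shifting the support by one multiplies the characteristic function by e^{it}.
print(abs(cf_trials_closed(t, p) - cmath.exp(1j * t) * cf_failures_closed(t, p)))
```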
Related distributions
- The sum of <math>r</math> independent geometric random variables with parameter <math>p</math> is a negative binomial random variable with parameters <math>r</math> and <math>p</math>. The geometric distribution is a special case of the negative binomial distribution, with <math>r=1</math>.
- The geometric distribution is a special case of discrete compound Poisson distribution.
- Suppose 0 < r < 1, and for k = 1, 2, 3, ... the random variable X<sub>k</sub> has a Poisson distribution with expected value r<sup>k</sup>/k. Then <math display="block">\sum_{k=1}^\infty k\,X_k</math> has a geometric distribution taking values in <math>\mathbb{N}_0</math>, with expected value r/(1 − r).
- The exponential distribution is the continuous analogue of the geometric distribution. Applying the floor function to the exponential distribution with parameter <math>\lambda</math> creates a geometric distribution with parameter <math>p=1-e^{-\lambda}</math> defined over <math>\mathbb{N}_0</math>.
Method of moments
The first moment <math>\mathrm{E}(X)</math> can be estimated from a sample <math>x_1, \dotsc, x_n</math> by the first sample moment <math>m_1 = \frac{1}{n}\sum_{i=1}^n x_i</math>. Estimating <math>\mathrm{E}(X)</math> with <math>m_1</math> gives the sample mean, denoted <math> \bar{x} </math>. Substituting this estimate in the formula for the expected value of a geometric distribution and solving for <math> p </math> gives the estimators <math> \hat{p} = \frac{1}{\bar{x}} </math> and <math> \hat{p} = \frac{1}{\bar{x}+1} </math> when supported on <math>\mathbb{N}</math> and <math>\mathbb{N}_0</math> respectively. These estimators are biased since <math> \mathrm{E}\left(\frac{1}{\bar{x}}\right) > \frac{1}{\mathrm{E}(\bar{x})} = p</math> as a result of Jensen's inequality.
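The bias can be made concrete in the simplest case of a single observation (<math>n=1</math>) on <math>\mathbb{N}</math>, where the estimator is <math>1/X</math>. Summing the series gives <math>\operatorname{E}(1/X) = -\frac{p \ln p}{1-p}</math>, which exceeds <math>p</math> for <math>0<p<1</math>. The sketch below checks this numerically; the closed form is derived for this example, and the function names are assumptions.

```python
import math


def expected_inverse_series(p: float, terms: int = 2000) -> float:
    """Truncated E[1/X] = sum_k (1/k) (1 - p)^(k - 1) p for X on {1, 2, ...}."""
    return sum((1.0 - p) ** (k - 1) * p / k for k in range(1, terms + 1))


def expected_inverse_closed(p: float) -> float:
    """-p ln(p) / (1 - p), from the series sum_k q^k / k = -ln(1 - q)."""
    return -p * math.log(p) / (1.0 - p)


for p in (0.2, 0.5, 0.8):
    # The estimator 1/X overshoots p on average, as Jensen's inequality predicts.
    print(p, expected_inverse_series(p), expected_inverse_closed(p))
```

With larger samples the overshoot shrinks, but the estimator remains biased for every finite <math>n</math>.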
Maximum likelihood estimation
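The maximum likelihood estimator of <math>p</math> is the value that maximizes the likelihood function given a sample. When the distribution is defined over <math>\mathbb{N}</math>, setting the derivative of the log-likelihood function to zero gives <math>\hat{p} = \frac{1}{\bar{x}}</math>. If the domain is <math>\mathbb{N}_0</math>, then the estimator shifts to <math>\hat{p} = \frac{1}{\bar{x}+1}</math>. As previously discussed in § Method of moments, these estimators are biased.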
Regardless of the domain, the bias is, to first order in <math>1/n</math>,
<math display="block">
b \equiv \operatorname{E}\bigg[\;(\hat p_\mathrm{mle} - p)\;\bigg]
\approx \frac{p\,(1-p)}{n}
</math>
which yields the bias-corrected maximum likelihood estimator,
<math display="block">
\hat{p\,}^*_\text{mle} = \hat{p\,}_\text{mle} - \hat{b\,}
</math>
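A small worked example of the correction follows. It assumes that the estimated bias <math>\hat{b}</math> is obtained by plugging <math>\hat{p}_\text{mle}</math> into the bias formula above, which the text does not state explicitly; the function names and the sample values are likewise illustrative.

```python
def p_mle(samples: list[int], support_starts_at_one: bool = True) -> float:
    """Maximum likelihood estimate: 1 / mean on N, 1 / (mean + 1) on N_0."""
    mean = sum(samples) / len(samples)
    return 1.0 / mean if support_starts_at_one else 1.0 / (mean + 1.0)


def p_mle_bias_corrected(samples: list[int], support_starts_at_one: bool = True) -> float:
    """Subtract the plug-in bias estimate p_hat (1 - p_hat) / n (an assumption)."""
    p_hat = p_mle(samples, support_starts_at_one)
    b_hat = p_hat * (1.0 - p_hat) / len(samples)
    return p_hat - b_hat


trials = [1, 3, 2, 6]                  # hypothetical sample of trial counts on N
failures = [k - 1 for k in trials]     # the same data expressed on N_0

print(p_mle(trials), p_mle_bias_corrected(trials))
print(p_mle(failures, support_starts_at_one=False))
```

Both supports give the same estimate for the same underlying data, since the shift in the sample mean is absorbed by the shift in the estimator.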
Bayesian inference
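In Bayesian inference, the parameter <math>p</math> is a random variable from a prior distribution with a posterior distribution calculated using Bayes' theorem after observing samples. If a <math>\mathrm{Beta}(\alpha,\beta)</math> prior is chosen, the posterior is also a beta distribution, so the beta distribution is a conjugate prior. After observing samples <math>k_1, \dotsc, k_n \in \mathbb{N}</math>, the posterior distribution is<math display="block">p \sim \mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^n (k_i-1)\right). \!</math>Alternatively, if the samples are in <math>\mathbb{N}_0</math>, the posterior distribution is<math display="block">p \sim \mathrm{Beta}\left(\alpha+n,\beta+\sum_{i=1}^n k_i\right).</math>Since the expected value of a <math>\mathrm{Beta}(\alpha,\beta)</math> distribution is <math>\frac{\alpha}{\alpha+\beta}</math>, the posterior mean in both cases approaches the maximum likelihood estimate of <math>p</math> as <math>\alpha</math> and <math>\beta</math> approach zero.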
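The beta posterior update can be sketched in a few lines. The example assumes a <math>\mathrm{Beta}(\alpha,\beta)</math> prior on <math>p</math>; the function names, the prior values and the sample are hypothetical choices for illustration.

```python
def posterior_params(
    alpha: float,
    beta: float,
    samples: list[int],
    support_starts_at_one: bool = True,
) -> tuple[float, float]:
    """Parameters of the Beta posterior after observing geometric samples.

    Each observation adds one success to alpha and its number of failures
    to beta: k - 1 failures for samples on N, k failures for samples on N_0.
    """
    n = len(samples)
    failures = sum(k - 1 for k in samples) if support_starts_at_one else sum(samples)
    return alpha + n, beta + failures


def posterior_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: Beta(1, 1) (uniform) prior and four observed trial counts.
a, b = posterior_params(1.0, 1.0, [1, 3, 2, 6])
print(a, b, posterior_mean(a, b))

# With a nearly flat-to-zero prior the posterior mean is close to 1 / sample mean.
a0, b0 = posterior_params(1e-9, 1e-9, [1, 3, 2, 6])
print(posterior_mean(a0, b0))
```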
Random generation can be done in constant time by truncating exponential random numbers. An exponential random variable <math>E</math> with rate 1 becomes geometrically distributed over <math>\mathbb{N}</math> with parameter <math>p</math> through <math>\lceil -E/\log(1-p) \rceil</math>. In turn, <math>E</math> can be generated from a standard uniform random variable <math>U</math> as <math>E = -\log U</math>, altering the formula into <math>\lceil \log(U) / \log(1-p)\rceil</math>.
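The uniform-based formula translates directly into code. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the guards for <math>U=1</math> and <math>p=1</math> are additions made here to keep the result inside <math>\mathbb{N}</math>.

```python
import math
import random


def geometric_from_uniform(u: float, p: float) -> int:
    """Map a uniform value u in (0, 1] to ceil(log(u) / log(1 - p)) on {1, 2, ...}."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    if not 0.0 < u <= 1.0:
        raise ValueError("u must satisfy 0 < u <= 1")
    if p == 1.0:
        return 1  # success is certain on the first trial
    # log1p(-p) equals log(1 - p) and is more accurate for small p.
    # max(1, ...) covers the edge case u == 1, where the ratio is zero.
    return max(1, math.ceil(math.log(u) / math.log1p(-p)))


def sample_geometric(p: float, rng: random.Random) -> int:
    """Draw one geometric variate on {1, 2, ...} in constant time."""
    u = 1.0 - rng.random()  # random() lies in [0, 1), so u lies in (0, 1]
    return geometric_from_uniform(u, p)


rng = random.Random(0)
draws = [sample_geometric(0.25, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # should be close to 1 / p = 4
```

Subtracting 1 from each draw gives a variate on <math>\mathbb{N}_0</math>, which corresponds to using the floor function instead of the ceiling.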
Applications
The geometric distribution is used in many disciplines. In queueing theory, the M/M/1 queue has a steady state following a geometric distribution. In stochastic processes, the Yule–Furry process is geometrically distributed. The distribution also arises when modeling the lifetime of a device in discrete contexts. It has also been used to fit data, including in models of patients spreading COVID-19.
See also
- Hypergeometric distribution
- Coupon collector's problem
- Compound Poisson distribution
- Negative binomial distribution
