In statistics, completeness is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. It is opposed to the concept of an ancillary statistic: while an ancillary statistic contains no information about the model parameters, a complete statistic contains only information about the parameters, and no ancillary information. It is closely related to the concept of a sufficient statistic, which contains all of the information that the dataset provides about the parameters.

Definition

Consider a random variable X whose probability distribution belongs to a parametric model P<sub>θ</sub> parametrized by θ.

Say T is a statistic; that is, the composition of a measurable function with a random sample X<sub>1</sub>,...,X<sub>n</sub>.

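The statistic T is said to be complete for the distribution of X if, for every measurable function g,

:<math>\operatorname{E}_\theta(g(T)) = 0 \text{ for all } \theta \quad \text{implies} \quad \mathbf{P}_\theta(g(T) = 0) = 1 \text{ for all } \theta.</math>

For example, suppose X<sub>1</sub> and X<sub>2</sub> are independent and normally distributed with expectation θ and variance 1. Then the sum X<sub>1</sub> + X<sub>2</sub> is complete. It is normally distributed with expectation 2θ and variance 2, so the expectation of g(X<sub>1</sub> + X<sub>2</sub>) is a nowhere-zero function of θ times the two-sided Laplace transform of <math display=inline> g(x) e^{-x^2/4} </math>. That transform vanishes for all θ only if <math display=inline> g(x) e^{-x^2/4} </math> is zero almost everywhere. The exponential is not zero, so this can only happen if g is zero almost everywhere.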

By contrast, the statistic <math display=inline> (X_1,X_2) </math> is sufficient but not complete. It admits a non-zero unbiased estimator of zero, namely <math display=inline> X_1-X_2</math>.
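The claim that <math display=inline> X_1-X_2 </math> is a non-zero unbiased estimator of zero can be checked numerically. The sketch below is only an illustration under an assumed model: X<sub>1</sub> and X<sub>2</sub> are taken to be independent normal variables with mean θ and variance 1, and the helper name `mean_of_difference`, the seeds and the sample sizes are hypothetical choices. The estimated mean of the difference should stay near zero for every θ, while its standard deviation stays near √2, so the difference is not itself zero.

```python
import random
import statistics

def mean_of_difference(theta, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of X1 - X2.

    Assumption (for illustration only): X1, X2 are i.i.d. N(theta, 1).
    """
    rng = random.Random(seed)
    diffs = [rng.gauss(theta, 1.0) - rng.gauss(theta, 1.0) for _ in range(n_draws)]
    return statistics.fmean(diffs), statistics.pstdev(diffs)

# The mean stays near 0 for every theta, but the spread does not collapse to 0.
for seed, theta in enumerate((-3.0, 0.0, 5.0)):
    mean, spread = mean_of_difference(theta, seed=seed)
    print(f"theta={theta:+.1f}  mean(X1-X2)={mean:+.4f}  sd(X1-X2)={spread:.4f}")
```

A simulation cannot prove the expectation is exactly zero; here that follows directly from X<sub>1</sub> and X<sub>2</sub> having the same mean.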

Sufficiency does not imply completeness

Most parametric models have a sufficient statistic that is not complete. This is important because the Lehmann–Scheffé theorem requires a statistic that is both sufficient and complete, so it cannot be applied on the basis of such a statistic. Galili and Meilijson (2016) propose the following didactic example.

Consider <math>n</math> independent samples from the uniform distribution:

:<math>X_i \sim U \big( (1-k) \theta , (1+k)\theta \big) \qquad\qquad 0 < k < 1</math>

<math>k</math> is a known design parameter. This model is a scale family (a special case of a location-scale family): scaling the samples by a positive multiplier <math>c</math> multiplies the parameter <math>\theta</math> by <math>c</math>.

Galili and Meilijson show that the minimum and maximum of the samples are together a sufficient statistic: <math>X_{(1)}, X_{(n)}</math> (using the usual notation for order statistics). Indeed, conditional on these two values, the distribution of the rest of the sample is simply uniform on the range they define: <math>\left[X_{(1)}, X_{(n)}\right]</math>.

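However, their ratio <math>X_{(n)} / X_{(1)}</math> has a distribution that does not depend on <math>\theta</math>. This follows from the fact that the model is a scale family: any change of scale multiplies both order statistics by the same factor, which cancels in the ratio. Writing <math>m</math> for the mean of that distribution, which is therefore the same for every <math>\theta</math>, we obtain: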

:<math>\mathbb E \left[ \frac{X_{(n)}}{X_{(1)}} \right] - m = 0</math>

We have thus shown that there exists a function <math>g\left(X_{(1)}, X_{(n)}\right) = X_{(n)} / X_{(1)} - m</math> which is not <math>0</math> almost surely but which has expectation <math>0</math> for every <math>\theta</math>. The pair is thus not complete.
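The scale-invariance of the ratio can be illustrated by simulation. The following is a minimal sketch, not the construction used by Galili and Meilijson: the function names `ratio_max_min` and `mean_ratio`, the choice <math>k = 0.5</math>, the sample size <math>n = 5</math> and the seeds are all assumptions made for the example. It estimates the mean of <math>X_{(n)} / X_{(1)}</math> for several values of <math>\theta</math>; the estimates should agree up to Monte Carlo noise.

```python
import random
import statistics

def ratio_max_min(theta, k, n, rng):
    """Draw n points from U((1-k)*theta, (1+k)*theta) and return X_(n) / X_(1)."""
    sample = [rng.uniform((1 - k) * theta, (1 + k) * theta) for _ in range(n)]
    return max(sample) / min(sample)

def mean_ratio(theta, k=0.5, n=5, n_draws=50_000, seed=0):
    """Monte Carlo estimate of m = E[X_(n) / X_(1)] for a given theta > 0."""
    rng = random.Random(seed)
    return statistics.fmean(ratio_max_min(theta, k, n, rng) for _ in range(n_draws))

# Different theta values (and different seeds) give nearly the same mean ratio.
for seed, theta in enumerate((0.1, 1.0, 250.0)):
    print(f"theta={theta:>6}  estimated m={mean_ratio(theta, seed=seed):.4f}")
```

Since the estimated mean does not move with <math>\theta</math>, subtracting it from the ratio gives a statistic whose expectation is zero for every <math>\theta</math> without being identically zero, which is exactly what rules out completeness.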

Importance of completeness

The notion of completeness has many applications in statistics, particularly in the following theorems of mathematical statistics.

Lehmann–Scheffé theorem

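Completeness occurs in the Lehmann–Scheffé theorem, which states that an unbiased estimator that is a function of a complete sufficient statistic is the unique uniformly minimum-variance unbiased estimator of its expected value.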
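As an illustration of what the Lehmann–Scheffé theorem delivers, the sketch below compares two unbiased estimators in a Bernoulli model. The Bernoulli model is not part of the discussion above and is assumed here only because the sum of the observations is a standard example of a complete sufficient statistic; the function name `compare_estimators` and all parameter values are hypothetical. The first observation alone is unbiased for p, but the sample mean, which depends on the data only through the sum, should show a much smaller variance.

```python
import random
import statistics

def compare_estimators(p=0.3, n=10, n_draws=20_000, seed=0):
    """Simulate two unbiased estimators of p under an assumed Bernoulli(p) model.

    Returns (mean, variance) for X_1 alone and for the sample mean.
    """
    rng = random.Random(seed)
    first_obs, sample_means = [], []
    for _ in range(n_draws):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        first_obs.append(sample[0])            # unbiased, but ignores X_2..X_n
        sample_means.append(sum(sample) / n)   # depends on the data only through the sum
    return {
        "first_obs": (statistics.fmean(first_obs), statistics.pvariance(first_obs)),
        "sample_mean": (statistics.fmean(sample_means), statistics.pvariance(sample_means)),
    }

result = compare_estimators()
for name, (mean, var) in result.items():
    print(f"{name:12s} mean={mean:.4f} variance={var:.4f}")
```

Both estimators should average close to p, while the variances differ by roughly a factor of n. The simulation only compares these two candidates; the theorem is what guarantees that no other unbiased estimator does better than the sample mean.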
