In statistics, point estimation involves the use of sample data to calculate a single value (known as a point estimate since it identifies a point in some parameter space) which is to serve as a "best guess" or "best estimate" of an unknown population parameter (for example, the population mean). More formally, it is the application of a point estimator to the data to obtain a point estimate.

Point estimation can be contrasted with interval estimation: such interval estimates are typically either confidence intervals, in the case of frequentist inference, or credible intervals, in the case of Bayesian inference. More generally, a point estimator can be contrasted with a set estimator. Examples are given by confidence sets or credible sets. A point estimator can also be contrasted with a distribution estimator. Examples are given by confidence distributions, randomized estimators, and Bayesian posteriors.

Properties of point estimates

Biasedness

Bias is defined as the difference between the expected value of the estimator and the true value of the population parameter being estimated. The closer the expected value of the estimator is to the true value of the parameter, the smaller the bias. When the expected value of the estimator equals the true value of the parameter, the estimator is called unbiased. An unbiased estimator is called a best unbiased estimator if it has minimum variance among all unbiased estimators. However, a biased estimator with a small variance may be more useful than an unbiased estimator with a large variance.

If we let T = h(X<sub>1</sub>, X<sub>2</sub>, . . . , X<sub>n</sub>) be an estimator based on a random sample X<sub>1</sub>, X<sub>2</sub>, . . . , X<sub>n</sub>, the estimator T is called an unbiased estimator for the parameter θ if E[T] = θ, irrespective of the value of θ. Because the expected value of a random variable is the constant that minimises its mean squared error, this condition can be written, with T(X) denoting the estimator as a function of the sample, as<math display="block">\mathbb{E}_\theta[T(X)]=\mathrm{arg}\min_{\theta'}\mathbb{E}_\theta[(T(X)-\theta')^2]=\theta.</math>Thus, a more general condition for unbiasedness is given by<math display="block">\mathrm{arg}\min_{\theta'}\mathbb{E}_\theta\big[W\big(T(X),\theta'\big)\big]=\theta,</math>for some function <math>W</math>. For example, if <math>W(T(X),\theta')=|T(X)-\theta'|</math>, then the estimator is called median-unbiased, since the median is a minimiser of the mean absolute error.
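The definition E[T] = θ can be checked approximately by simulation. The sketch below is an illustration only: it assumes normally distributed data with a known variance of 4, and the function names `variance_estimates` and `average_estimates` are hypothetical. It averages two variance estimators over many samples, one dividing by n and one dividing by n − 1.

```python
import random


def variance_estimates(sample):
    """Return the (divide-by-n, divide-by-(n-1)) variance estimates of a sample."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)


def average_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Approximate E[T] for both estimators by averaging over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_n = 0.0
    total_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        v_n, v_n1 = variance_estimates(sample)
        total_n += v_n
        total_n1 += v_n1
    return total_n / trials, total_n1 / trials


if __name__ == "__main__":
    avg_n, avg_n1 = average_estimates()
    print("true variance:", 4.0)
    print("average of divide-by-n estimator:", avg_n)
    print("average of divide-by-(n-1) estimator:", avg_n1)
```

With n = 5, the divide-by-n estimator has expected value (n − 1)/n · σ² = 3.2 rather than 4, so its simulated average should sit visibly below the true variance, while the divide-by-(n − 1) average should sit close to it.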

Consistency

Consistency concerns whether the point estimate stays close to the true value of the parameter as the sample size increases. For a consistent estimator, the larger the sample size, the more accurate the estimate tends to be: the estimator converges in probability to the true value of the parameter. An unbiased estimator T is consistent if its variance tends to zero as the sample size tends to infinity.
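As a rough numerical illustration of consistency, the sketch below repeatedly draws samples of increasing size and records the variance of the sample mean across repetitions. The normal model, its parameters, and the function name `spread_of_sample_mean` are assumptions made for this example.

```python
import random
import statistics


def spread_of_sample_mean(n, trials=1000, mu=10.0, sigma=3.0, seed=1):
    """Return (average, variance) of the sample mean over repeated samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.pvariance(means)


if __name__ == "__main__":
    for n in (10, 100, 400):
        centre, variance = spread_of_sample_mean(n)
        print(f"n={n}: average estimate={centre:.3f}, variance={variance:.4f}")
```

The sample mean is unbiased with variance σ²/n, so the printed variance should shrink roughly in proportion to 1/n while the average stays near the true mean.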

Efficiency

Let T<sub>1</sub> and T<sub>2</sub> be two unbiased estimators for the same parameter θ. The estimator T<sub>2</sub> would be called more efficient than estimator T<sub>1</sub> if Var(T<sub>2</sub>) < Var(T<sub>1</sub>), irrespective of the value of θ.
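The comparison of variances can be illustrated with two estimators of the centre of a normal distribution, the sample mean and the sample median, both of which are unbiased for symmetric distributions. The following is a sketch under that normality assumption; the function name `compare_mean_and_median` is hypothetical.

```python
import random
import statistics


def compare_mean_and_median(n=25, trials=5000, mu=0.0, sigma=1.0, seed=2):
    """Return (Var(sample mean), Var(sample median)) estimated by simulation."""
    rng = random.Random(seed)
    means = []
    medians = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


if __name__ == "__main__":
    var_mean, var_median = compare_mean_and_median()
    print("variance of sample mean:  ", var_mean)
    print("variance of sample median:", var_median)
```

For normal data the sample mean has the smaller variance, so in the sense defined above it is the more efficient of the two. For heavy-tailed distributions the ordering can reverse, which is why efficiency is always relative to an assumed model.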

Estimation methods

Below are some commonly used methods for estimating unknown parameters. The methods vary in their domain of applicability and in the underlying statistical paradigm (frequentist or Bayesian).

Frequentist estimation

Maximum likelihood estimation (MLE)

The method of maximum likelihood, due to R.A. Fisher, is arguably the most important general method of estimation. It seeks the values of the unknown parameters that maximize the likelihood function. Given an assumed model (e.g. the normal distribution), the parameter values that maximize the likelihood function are taken as the best match for the data.

Let X = (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) denote a random sample with joint probability density function or probability mass function f(x, θ) (θ may be a vector). The function f(x, θ), considered as a function of θ for the observed x, is called the likelihood function and is denoted by L(θ). The principle of maximum likelihood consists of choosing, within the admissible range of θ, the value that maximizes the likelihood. This value is called the maximum likelihood estimate (MLE) of θ. To obtain the MLE of θ = (θ<sub>1</sub>, ..., θ<sub>k</sub>), we solve the likelihood equations<math display="block">\frac{\partial\log L(\theta)}{\partial\theta_i}=0, \quad i=1,\dots,k.</math>If θ is a vector, partial derivatives with respect to each component are taken to get the likelihood equations; if θ is a scalar (k = 1), there is a single equation in an ordinary derivative. Despite its simplicity, the method is not always accurate in small samples, and the MLE can be biased.
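As a worked illustration of the likelihood equation, consider an exponential model with unknown rate. This model and the function names below are choices made for this example, not something stated above. Here log L(rate) = n·log(rate) − rate·Σx<sub>i</sub>, and setting the derivative to zero gives rate = n / Σx<sub>i</sub>. The sketch compares that closed form with a brute-force search over a grid of candidate rates.

```python
import math


def exponential_log_likelihood(rate, data):
    """log L(rate) = n*log(rate) - rate*sum(data) for i.i.d. exponential data."""
    return len(data) * math.log(rate) - rate * sum(data)


def exponential_mle(data):
    """Closed-form root of the likelihood equation n/rate - sum(data) = 0."""
    return len(data) / sum(data)


def grid_argmax(data, lo=0.01, hi=5.0, step=0.01):
    """Brute-force check: the grid value with the largest log-likelihood."""
    count = int(round((hi - lo) / step)) + 1
    candidates = [lo + i * step for i in range(count)]
    return max(candidates, key=lambda rate: exponential_log_likelihood(rate, data))


if __name__ == "__main__":
    observations = [0.5, 1.0, 1.5, 2.0]
    print("closed-form MLE:", exponential_mle(observations))
    print("grid maximizer: ", grid_argmax(observations))
```

For this particular model the expected value of the MLE is n·rate/(n − 1), slightly above the true rate for any finite n > 1, which gives one concrete case of a biased maximum likelihood estimate.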

Method of moments (MOM)

Let (X<sub>1</sub>, X<sub>2</sub>, …, X<sub>n</sub>) be a random sample from a population having p.d.f. (or p.m.f.) f(x, θ), θ = (θ<sub>1</sub>, θ<sub>2</sub>, …, θ<sub>k</sub>). The objective is to estimate the parameters θ<sub>1</sub>, θ<sub>2</sub>, ..., θ<sub>k</sub>. Further, let the first k population moments about zero exist as explicit functions of θ, i.e. μ<sub>r</sub> = μ<sub>r</sub>(θ<sub>1</sub>, θ<sub>2</sub>, …, θ<sub>k</sub>), r = 1, 2, …, k. In the method of moments, we equate k sample moments with the corresponding population moments. Generally, the first k moments are taken because the errors due to sampling increase with the order of the moment. Thus, we get k equations μ<sub>r</sub>(θ<sub>1</sub>, θ<sub>2</sub>, …, θ<sub>k</sub>) = m<sub>r</sub>, r = 1, 2, …, k, where m<sub>r</sub> = (1/n) Σ<sub>i</sub> X<sub>i</sub><sup>r</sup> is the r-th sample moment about zero. Solving these equations for θ<sub>1</sub>, θ<sub>2</sub>, …, θ<sub>k</sub> gives the method of moments estimators (or estimates).
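To make the procedure concrete, the following sketch applies the method of moments to a gamma distribution with k = 2 parameters. The gamma model, the shape/scale parameterisation, and the function names are assumptions made for this example. The first two population moments about zero are μ<sub>1</sub> = shape·scale and μ<sub>2</sub> = shape·scale²·(1 + shape), and equating them to the sample moments m<sub>1</sub> and m<sub>2</sub> gives the two estimators used in the code.

```python
import random


def sample_moment(data, r):
    """r-th sample moment about zero: (1/n) * sum of x**r."""
    return sum(x ** r for x in data) / len(data)


def gamma_method_of_moments(data):
    """Solve mu_1 = m_1 and mu_2 = m_2 for the gamma shape and scale."""
    m1 = sample_moment(data, 1)
    m2 = sample_moment(data, 2)
    spread = m2 - m1 ** 2      # equals shape * scale**2
    shape = m1 ** 2 / spread
    scale = spread / m1
    return shape, scale


if __name__ == "__main__":
    rng = random.Random(4)
    # random.gammavariate(alpha, beta) takes the shape alpha and the scale beta.
    data = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]
    print("estimated (shape, scale):", gamma_method_of_moments(data))
```

With a large sample generated from shape 3 and scale 2, the estimates should land close to those values, though method of moments estimators are generally not the most efficient choice.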

Bayesian estimation

Bayesian inference is typically based on the posterior distribution. Many Bayesian point estimators are the posterior distribution's statistics of central tendency, e.g., its mean, median, or mode:

  • Posterior mean, which minimizes the (posterior) risk (expected loss) for a squared-error loss function, as observed by Gauss; in Bayesian estimation, the risk is defined in terms of the posterior distribution.
  • Posterior median, which minimizes the posterior risk for the absolute-value loss function, as observed by Laplace.
  • Maximum a posteriori (MAP), which finds a maximum of the posterior distribution; for a uniform prior probability, the MAP estimator coincides with the maximum-likelihood estimator.

The MAP estimator has good asymptotic properties, even for many difficult problems in which the maximum-likelihood estimator runs into difficulties.

For regular problems, where the maximum-likelihood estimator is consistent, the maximum-likelihood estimator ultimately agrees with the MAP estimator.
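The three Bayesian point estimators can be compared on a small example. The sketch below assumes a binomial model with 7 successes in 10 trials and a uniform prior on the success probability, and it approximates the posterior on a grid. The model, the grid approximation, and all function names are illustrative assumptions.

```python
def grid_posterior(successes, trials, prior=lambda p: 1.0, points=2001):
    """Normalized posterior weights for a binomial success probability on a grid."""
    grid = [i / (points - 1) for i in range(points)]
    weights = [
        prior(p) * p ** successes * (1 - p) ** (trials - successes) for p in grid
    ]
    total = sum(weights)
    return grid, [w / total for w in weights]


def point_estimates(grid, probs):
    """Return (posterior mean, posterior median, MAP) from grid weights."""
    mean = sum(p * w for p, w in zip(grid, probs))
    cumulative = 0.0
    median = grid[-1]
    for p, w in zip(grid, probs):
        cumulative += w
        if cumulative >= 0.5:   # first grid point holding half the mass
            median = p
            break
    mode = max(zip(probs, grid))[1]  # grid point with the largest weight
    return mean, median, mode


def posterior_risk(grid, probs, estimate, loss):
    """Expected loss of a candidate estimate under the grid posterior."""
    return sum(w * loss(estimate, p) for p, w in zip(grid, probs))


if __name__ == "__main__":
    grid, probs = grid_posterior(7, 10)
    mean, median, mode = point_estimates(grid, probs)
    print("posterior mean:  ", mean)
    print("posterior median:", median)
    print("MAP:             ", mode)
```

With the uniform prior, the MAP estimate equals the sample proportion 7/10, which is also the maximum-likelihood estimate. Passing a squared-error or absolute-value loss to `posterior_risk` gives a way to check numerically that the mean and median, respectively, give the smallest risk among the three.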

Under mild conditions (for example, when the Bayes estimator is unique), Bayesian estimators are admissible, by Wald's theorem.

The Minimum Message Length (MML) point estimator is based in Bayesian information theory and is not so directly related to the posterior distribution.

Special cases of Bayesian filters are important:

  • Kalman filter
  • Wiener filter

Several methods of computational statistics have close connections with Bayesian analysis:

  • particle filter
  • Markov chain Monte Carlo (MCMC)

Point estimate vs. confidence interval estimate

Figure: Point estimation and confidence interval estimation.

There are two major types of estimates: point estimates and confidence interval estimates. In point estimation, we try to choose a unique point in the parameter space which can reasonably be considered as the true value of the parameter. In confidence interval estimation, on the other hand, instead of a unique estimate of the parameter, we are interested in constructing a family of sets that contain the true (unknown) parameter value with a specified probability. In many problems of statistical inference we are interested not only in estimating the parameter or testing some hypothesis concerning it, but also in obtaining a lower bound, an upper bound, or both, for a real-valued parameter. To do this, we need to construct a confidence interval.

A confidence interval describes how reliable an estimate is. We can calculate the upper and lower confidence limits of the interval from the observed data. Suppose a dataset x<sub>1</sub>, . . . , x<sub>n</sub> is given, modeled as a realization of random variables X<sub>1</sub>, . . . , X<sub>n</sub>. Let θ be the parameter of interest, and γ a number between 0 and 1. If there exist sample statistics L<sub>n</sub> = g(X<sub>1</sub>, . . . , X<sub>n</sub>) and U<sub>n</sub> = h(X<sub>1</sub>, . . . , X<sub>n</sub>) such that P(L<sub>n</sub> < θ < U<sub>n</sub>) = γ for every value of θ, then (l<sub>n</sub>, u<sub>n</sub>), where l<sub>n</sub> = g(x<sub>1</sub>, . . . , x<sub>n</sub>) and u<sub>n</sub> = h(x<sub>1</sub>, . . . , x<sub>n</sub>), is called a 100γ% confidence interval for θ. The number γ is called the confidence level.
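The defining property P(L<sub>n</sub> < θ < U<sub>n</sub>) = γ can be illustrated with the mean of a normal distribution whose standard deviation is known. This is a minimal sketch under that assumption. The function names `z_interval` and `coverage` are hypothetical, and the interval used is the standard one centred on the sample mean.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, gamma=0.95):
    """Confidence interval (l_n, u_n) for a normal mean with known sigma."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + gamma / 2)  # e.g. about 1.96 for gamma = 0.95
    half_width = z * sigma / n ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def coverage(theta=5.0, sigma=2.0, n=20, gamma=0.95, trials=4000, seed=3):
    """Fraction of simulated intervals that contain the true mean theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, gamma)
        hits += lower < theta < upper
    return hits / trials


if __name__ == "__main__":
    print("interval for one sample:", z_interval([4.0, 5.0, 6.0], sigma=1.0))
    print("simulated coverage:", coverage())
```

The simulated coverage should be close to the chosen confidence level γ. Any single computed interval either contains θ or does not; the probability statement refers to the random limits L<sub>n</sub> and U<sub>n</sub> before the data are observed.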

Here two limits, l<sub>n</sub> and u<sub>n</sub>, are computed from the set of observations, and it is claimed with a certain degree of confidence (measured in probabilistic terms) that the true value of θ lies between l<sub>n</sub> and u<sub>n</sub>. Thus we get an interval (l<sub>n</sub>, u<sub>n</sub>) which we expect to include the true value of θ. This type of estimation is therefore called confidence interval estimation.