Generative models are a class of computational models frequently used for classification. In machine learning, a generative model typically models the joint distribution of inputs and outputs, P(X,Y), or it models how inputs are distributed within each class, P(X∣Y), together with a class prior P(Y). Because it describes a full data-generating process, a generative model can be used to draw new samples that resemble the observed data, a process often referred to as synthetic data generation. Generative models are used for density estimation, simulation, and learning with missing or partially labeled data. In classification, they predict labels by combining P(X∣Y) and P(Y) through Bayes' rule to obtain P(Y∣X). Generative models are often contrasted with discriminative models, which focus on predicting outputs from inputs directly.

Generative approaches, which model a joint probability distribution rather than predicting outputs directly, include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, and generative adversarial networks, among others.

Definition

In statistical classification, the two main approaches are called the generative approach and the discriminative approach. They arrive at a classifier by different routes, which differ in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished:

  1. A generative model is a statistical model of the joint probability distribution <math>P(X, Y)</math> on a given observable variable X and target variable Y; a generative model can be used to "generate" random instances (outcomes) of an observation x.
  2. A discriminative model is a model of the conditional probability <math>P(Y\mid X = x)</math> of the target Y, given an observation x.
  3. Classifiers computed without using a probability model are also loosely referred to as "discriminative".

The distinction between the last two classes is not drawn consistently: some authors refer to the three classes as generative learning, conditional learning, and discriminative learning, while others distinguish only two classes, calling them generative classifiers (joint distribution) and discriminative classifiers (conditional distribution or no distribution). Analogously, a classifier based on a generative model is a generative classifier, while a classifier based on a discriminative model is a discriminative classifier, though this term also refers to classifiers that are not based on a model.

In application to classification, one wishes to go from an observation x to a label y (or probability distribution on labels). One can compute this directly, without using a probability distribution (distribution-free classifier); one can estimate the probability of a label given an observation, <math>P(Y|X=x)</math> (discriminative model), and base classification on that; or one can estimate the joint distribution <math>P(X, Y)</math> (generative model), from that compute the conditional probability <math>P(Y|X=x)</math>, and then base classification on that. These are increasingly indirect, but increasingly probabilistic, allowing more domain knowledge and probability theory to be applied. In practice different approaches are used, depending on the particular problem, and hybrids can combine strengths of multiple approaches.
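The generative route described above (estimate the joint distribution, derive the conditional, then classify) can be sketched in a few lines of Python. This is only an illustration: the joint table and the helper names `posterior` and `classify` are assumptions made for the example, not part of any standard library.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y) over x in {1, 2} and y in {0, 1}.
# The probabilities are illustrative assumptions and sum to 1.
joint = {
    (1, 0): Fraction(3, 8),
    (1, 1): Fraction(1, 8),
    (2, 0): Fraction(1, 8),
    (2, 1): Fraction(3, 8),
}

def posterior(joint, x):
    """Return P(Y | X = x) derived from the joint table P(X, Y)."""
    # The marginal P(X = x) is the joint summed over all labels.
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    # Conditional probability: P(Y = y | X = x) = P(x, y) / P(x).
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

def classify(joint, x):
    """Pick the label with the highest posterior probability."""
    post = posterior(joint, x)
    return max(post, key=post.get)
```

A discriminative model would instead estimate the dictionary returned by `posterior` directly, and a distribution-free classifier would implement only `classify`.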

An alternative division defines these symmetrically as:

  • a generative model is a model of the conditional probability of the observable X, given a target y, symbolically, <math>P(X\mid Y = y)</math>
  • a discriminative model is a model of the conditional probability of the target Y, given an observation x, symbolically, <math>P(Y\mid X = x)</math>

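Regardless of the precise definition, the terminology is apt because a generative model can be used to "generate" random instances (outcomes), either of an observation and target <math>(x, y)</math>, or of an observation x given a target value y, while a discriminative model or discriminative classifier can be used to "discriminate" the value of the target variable Y given an observation x.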
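Both kinds of generation can be illustrated with a small sampling sketch. The prior, the class-conditional table, and the function names below are hypothetical choices made for the example; a pair (x, y) is drawn by first sampling y from P(Y) and then sampling x from P(X∣Y = y).

```python
import random

# Hypothetical model: class prior P(Y) and class-conditional P(X | Y).
prior = {0: 0.5, 1: 0.5}
likelihood = {
    0: {1: 0.75, 2: 0.25},  # P(X | Y = 0)
    1: {1: 0.25, 2: 0.75},  # P(X | Y = 1)
}

def sample_x_given_y(y, rng):
    """Draw an observation x from P(X | Y = y)."""
    values = list(likelihood[y])
    weights = [likelihood[y][v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def sample_pair(rng):
    """Draw (x, y) from the joint by sampling y first, then x given y."""
    labels = list(prior)
    y = rng.choices(labels, weights=[prior[l] for l in labels], k=1)[0]
    return sample_x_given_y(y, rng), y

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_pair(rng) for _ in range(1000)]
```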

Although discriminative models do not need to model the distribution of the observed variables, they cannot generally express complex relationships between the observed and target variables, and they do not necessarily perform better than generative models at classification and regression tasks. The two classes are seen as complementary or as different views of the same procedure.

Applications

  • Sampling / simulation
  • Classification
  • Density estimation and likelihood
  • Missing data and imputation
  • Anomaly detection
  • Semi-supervised learning

Examples

Simple example

Suppose the input data is <math>x \in \{1, 2\}</math>, the set of labels for <math>x</math> is <math>y \in \{0, 1\}</math>, and there are the following 4 data points:

<math>(x,y) = \{(1,0), (1,1), (2,0), (2,1)\}</math>

For the above data, estimating the joint probability distribution <math>p(x,y)</math> from the empirical measure gives the following:

{| class="wikitable"
|-
! !! <math>y=0</math> !! <math>y=1</math>
|-
| <math>x=1</math> || <math>1/4</math> || <math>1/4</math>
|-
| <math>x=2</math> || <math>1/4</math> || <math>1/4</math>
|}

while the conditional distribution <math>p(y|x)</math> is the following:

{| class="wikitable"
|-
! !! <math>y=0</math> !! <math>y=1</math>
|-
| <math>x=1</math> || <math>1/2</math> || <math>1/2</math>
|-
| <math>x=2</math> || <math>1/2</math> || <math>1/2</math>
|}
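The two tables can be reproduced by counting. The following sketch computes the empirical joint and conditional distributions for the four data points given above; the variable names are chosen for illustration.

```python
from collections import Counter
from fractions import Fraction

# The four data points (x, y) from the example.
data = [(1, 0), (1, 1), (2, 0), (2, 1)]

# Empirical joint p(x, y): relative frequency of each pair.
counts = Counter(data)
n = len(data)
joint = {pair: Fraction(c, n) for pair, c in counts.items()}

# Empirical conditional p(y | x): pair count divided by the count of x.
x_counts = Counter(x for x, _ in data)
conditional = {(x, y): Fraction(c, x_counts[x]) for (x, y), c in counts.items()}
```

Here every entry of `joint` equals 1/4 and every entry of `conditional` equals 1/2, matching the tables.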

Text generation

A classic example uses a table of frequencies of English word pairs to generate a sentence beginning with "representing and speedily is an good". This is not proper English, but the output approximates English more and more closely as the table is extended from word pairs to word triplets and longer sequences.
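The word-pair idea can be sketched as a small Markov chain over words. The toy corpus and the function names here are assumptions made for illustration only; a realistic table would be estimated from a large body of English text.

```python
import random
from collections import defaultdict

def build_bigram_table(words):
    """Map each word to the list of words that follow it in the corpus."""
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Repeated followers are kept, so sampling respects pair frequencies.
        table[current].append(following)
    return table

def generate(table, start, length, rng):
    """Generate up to `length` words, each drawn given the previous word."""
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # the last word never had a successor
            break
        out.append(rng.choice(followers))
    return out

# Toy corpus (hypothetical), standing in for a large English text.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
table = build_bigram_table(corpus)
sentence = generate(table, "the", 8, random.Random(0))
```

Conditioning on the two previous words instead of one would correspond to moving from word pairs to word triplets.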

Families and types

Generative models

Types of generative models are:

  • Gaussian mixture model (and other types of mixture model)
  • Hidden Markov model
  • Probabilistic context-free grammar
  • Bayesian network (e.g. naive Bayes, autoregressive model)
  • Generative adversarial network
  • Generative artificial intelligence
  • Averaged one-dependence estimators
  • Latent Dirichlet allocation
  • Boltzmann machine (e.g. Restricted Boltzmann machine, Deep belief network)
  • Variational autoencoder
  • Flow-based generative model
  • Energy-based model
  • Diffusion model
  • Linear discriminant analysis

If the observed data are truly sampled from the generative model, then fitting the parameters of the generative model to maximize the data likelihood is a common method. However, since most statistical models are only approximations to the true distribution, if the model's application is to infer a subset of variables conditional on known values of the others, then it can be argued that the approximation makes more assumptions than are necessary to solve the problem at hand. In such cases, it can be more accurate to model the conditional density functions directly using a discriminative model, although application-specific details will ultimately dictate which approach is most suitable in any particular case.
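As a minimal sketch of maximum-likelihood fitting, assume the generative model is a one-dimensional Gaussian. Its likelihood is then maximized in closed form by the sample mean and the variance computed with divisor n. The observations and function names below are hypothetical.

```python
import math
import statistics

def fit_gaussian_mle(values):
    """Maximum-likelihood estimates of a Gaussian's mean and variance."""
    mu = statistics.fmean(values)
    # The MLE of the variance divides by n (population variance), not n - 1.
    var = statistics.pvariance(values, mu)
    return mu, var

def log_likelihood(values, mu, var):
    """Log-likelihood of the data under a Gaussian with the given parameters."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        for v in values
    )

# Hypothetical observations, assumed to be drawn from a single Gaussian.
observations = [2.1, 1.9, 2.4, 2.0, 1.6]
mu_hat, var_hat = fit_gaussian_mle(observations)
```

Evaluating `log_likelihood` at other parameter values gives a result no larger than at `(mu_hat, var_hat)`, which is what it means for the fit to maximize the data likelihood.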

Deep generative models

With the rise of deep learning, a new family of methods, called deep generative models (DGMs), is formed through the combination of generative models and deep neural networks. An increase in the scale of the neural networks is typically accompanied by an increase in the scale of the training data, both of which are required for good performance.

Popular DGMs include variational autoencoders (VAEs), generative adversarial networks (GANs), and auto-regressive models. Recently, there has been a trend to build very large deep generative models. Examples include auto-regressive neural language models that contain billions of parameters; BigGAN and VQ-VAE, which are used for image generation and can have hundreds of millions of parameters; and Jukebox, a very large generative model for musical audio that contains billions of parameters.

See also

  • Discriminative model
  • Graphical model
