Selection (evolutionary algorithm)

Selection is a genetic operator in an evolutionary algorithm (EA). An EA is a metaheuristic inspired by biological evolution that aims to solve challenging problems at least approximately. Selection has a dual purpose: it chooses individual genomes from a population for subsequent breeding (e.g., using the crossover operator), and it chooses the candidate solutions (individuals) that make up the next generation. The biological model is natural selection.

Retaining the best individual(s) of one generation unchanged in the next generation is called elitism or elitist selection. It is a successful (slight) variant of the general process of constructing a new population.

The basis for selection is the quality of an individual, which is determined by the fitness function. In memetic algorithms, an extension of EAs, selection is also used to choose those offspring that are to be improved with the help of a meme (e.g. a heuristic).

A selection procedure for breeding used early on may be implemented as follows:

1. The fitness values that have been computed (fitness function) are normalized, such that the sum of all resulting fitness values equals 1.
2. Accumulated normalized fitness values are computed: the accumulated fitness value of an individual is the sum of its own fitness value plus the fitness values of all the previous individuals; the accumulated fitness of the last individual should be 1, otherwise something went wrong in the normalization step.
3. A random number R between 0 and 1 is chosen.
4. The selected individual is the first one whose accumulated normalized value is greater than or equal to R.
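
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `roulette_select` and the optional `rng` parameter are hypothetical, and the sketch assumes non-negative fitness values where higher is better.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def roulette_select(fitnesses, rng=random):
    """Return the index of one individual, chosen via accumulated fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    # Normalize so that the fitness values sum to 1.
    normalized = [f / total for f in fitnesses]
    # Accumulated normalized fitness values.
    cumulative = list(accumulate(normalized))
    cumulative[-1] = 1.0  # guard against floating-point drift in the last entry
    # Random number R in [0, 1).
    r = rng.random()
    # First individual whose accumulated value is greater than or equal to R.
    return bisect_left(cumulative, r)
```

Because the accumulated values are sorted, the binary search in `bisect_left` finds the selected individual in logarithmic time once the accumulated values are available; computing them still takes linear time.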

For many problems the above algorithm might be computationally demanding. A simpler and faster alternative uses the so-called stochastic acceptance.
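
The text does not spell out how stochastic acceptance works. One common formulation, sketched here as an assumption, picks an individual uniformly at random and accepts it with probability equal to its fitness divided by the maximum fitness, repeating until an individual is accepted. The name `stochastic_acceptance_select` and the `rng` parameter are hypothetical.

```python
import random


def stochastic_acceptance_select(fitnesses, rng=random):
    """Return one index with probability proportional to its fitness."""
    f_max = max(fitnesses)
    if f_max <= 0:
        raise ValueError("at least one fitness value must be positive")
    n = len(fitnesses)
    while True:
        i = rng.randrange(n)  # uniform candidate
        # Accept the candidate with probability f_i / f_max, else try again.
        if rng.random() < fitnesses[i] / f_max:
            return i
```

This variant needs no normalization or accumulated sums, only the maximum fitness, which is why it can be cheaper when fitness values do not differ too widely.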

If the accumulated-fitness procedure above is repeated until enough individuals have been selected, the method is called fitness proportionate selection or roulette-wheel selection. If, instead of a single pointer spun multiple times, there are multiple equally spaced pointers on a wheel that is spun once, it is called stochastic universal sampling.

Repeatedly selecting the best individual of a randomly chosen subset is tournament selection. Taking the best half, third or another proportion of the individuals is truncation selection.
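
Both methods are short enough to sketch directly. The function names and parameters below (`tournament_select`, `truncation_select`, `k`, `proportion`) are hypothetical, higher fitness is assumed to be better, and the tournament subset is assumed to be drawn without replacement.

```python
import random


def tournament_select(fitnesses, k, rng=random):
    """Return the index of the fittest individual in a random subset of size k."""
    contenders = rng.sample(range(len(fitnesses)), k)
    return max(contenders, key=lambda i: fitnesses[i])


def truncation_select(fitnesses, proportion):
    """Return the indices of the best `proportion` of the population, best first."""
    n_keep = max(1, int(len(fitnesses) * proportion))
    ranked = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    return ranked[:n_keep]
```

In this sketch, a larger `k` or a smaller `proportion` favors the fittest individuals more strongly.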

There are other selection algorithms that do not consider all individuals for selection, but only those with a fitness value that is higher than a given (arbitrary) constant. Other algorithms select from a restricted pool where only a certain percentage of the individuals are allowed, based on fitness value.

Methods of selection

The listed methods differ mainly in the selection pressure, which can be set by a strategy parameter in the rank selection described below. The higher the selection pressure, the faster a population converges toward a certain solution, with the risk that the search space is not explored sufficiently. This premature convergence can be counteracted by structuring the population appropriately. There is a close correlation between the population model used and a suitable selection pressure.

Roulette wheel selection

In roulette wheel selection, the probability of choosing an individual for breeding the next generation is proportional to its fitness: the better the fitness, the higher the chance that the individual is chosen.

Choosing individuals can be depicted as spinning a roulette wheel that has as many pockets as there are individuals in the current generation, with pocket sizes proportional to the selection probabilities.

The probability of choosing individual <math>i</math> is <math>p_i = \frac{f_i}{\sum_{j=1}^{N} f_j}</math>, where <math>f_i</math> is the fitness of <math>i</math> and <math>N</math> is the size of the current generation (note that in this method one individual can be drawn multiple times).
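
As a small worked example of this formula, the sketch below computes the selection probabilities for four made-up fitness values and then draws several parents with replacement using the standard library. The helper name `selection_probabilities` and the fitness values are illustrative assumptions.

```python
import random


def selection_probabilities(fitnesses):
    """Return p_i = f_i / sum_j f_j for every individual."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


fitnesses = [1.0, 2.0, 3.0, 4.0]  # made-up fitness values
probs = selection_probabilities(fitnesses)

# random.choices samples with replacement, so an index may appear several times.
parents = random.choices(range(len(fitnesses)), weights=fitnesses, k=6)
```

Here the fittest individual has a selection probability of 0.4, four times that of the least fit one.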

Stochastic universal sampling

Stochastic universal sampling is a development of roulette wheel selection with minimal spread and no bias.
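
A minimal sketch of the idea, assuming non-negative fitness values: the wheel is spun once to place a first pointer, and the remaining pointers follow at equal distances. The function name `stochastic_universal_sampling` and its parameters are hypothetical.

```python
import random


def stochastic_universal_sampling(fitnesses, n_select, rng=random):
    """Return n_select indices using equally spaced pointers and a single spin."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    step = total / n_select       # distance between neighboring pointers
    start = rng.random() * step   # single random spin in [0, step)
    selected = []
    index = 0
    cumulative = fitnesses[0]
    for k in range(n_select):
        pointer = start + k * step
        # Advance to the individual whose wheel segment contains the pointer.
        while cumulative < pointer and index < len(fitnesses) - 1:
            index += 1
            cumulative += fitnesses[index]
        selected.append(index)
    return selected
```

Since the pointers are equally spaced, an individual with expected count `c` is selected either `floor(c)` or `ceil(c)` times, which is what minimal spread refers to.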

Rank selection

In rank selection, the probability of selection does not depend directly on the fitness, but on the fitness rank of an individual within the population. This can be particularly helpful in applications with constraints, since it makes it easier to overcome a constraint in several intermediate steps, i.e. via a sequence of several individuals that are rated poorly because of constraint violations.

Linear rank selection

Linear ranking, which goes back to Baker, is often used. It allows the selection pressure to be set by the parameter <math>sp </math>, which can take values between 1.0 (no selection pressure) and 2.0 (high selection pressure). The probability <math>P </math> for <math>n</math> rank positions <math>R_i </math> is obtained as follows:

:<math>P(R_i) =\frac{1}{n}\Bigl(sp-(2sp-2)\frac{i-1}{n-1}\Bigr) \quad \quad 1\leq i \leq n ,\quad 1 \leq sp \leq 2 \quad \mathsf{with} \quad P(R_i) \ge 0, \quad \sum_{i=1}^nP(R_i)=1 </math>

Another definition for the probability <math>P</math> for rank positions <math>i</math> is:

:<math>P(i) =\frac{2(n-i+1)}{n(n+1)}</math>
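
The two formulas above can be evaluated side by side with a short sketch. In both, rank position 1 receives the highest probability. The function names are hypothetical, and the handling of a population of size 1 is an added assumption, since the first formula divides by <math>n-1</math>.

```python
def linear_rank_probabilities(n, sp=1.5):
    """P(R_i) = (1/n) * (sp - (2*sp - 2) * (i - 1) / (n - 1)) for i = 1..n."""
    if not 1.0 <= sp <= 2.0:
        raise ValueError("sp must lie between 1.0 and 2.0")
    if n == 1:
        return [1.0]  # assumption: a single individual is always selected
    return [(sp - (2 * sp - 2) * (i - 1) / (n - 1)) / n for i in range(1, n + 1)]


def simple_rank_probabilities(n):
    """P(i) = 2 * (n - i + 1) / (n * (n + 1)) for i = 1..n."""
    return [2 * (n - i + 1) / (n * (n + 1)) for i in range(1, n + 1)]
```

With `sp = 1.0` every rank gets probability `1/n`, and with `sp = 2.0` the last rank gets probability 0. The second formula has no tunable parameter.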

Exponential rank selection

In exponential rank selection, the selection probability falls off exponentially, rather than linearly, with the rank of an individual.

Lexicase selection

Most selection algorithms select individual genomes on the basis of scalar fitness values, which in many cases will have been derived from multiple training cases. By contrast, lexicase selection considers performance on individual training cases separately, rather than aggregating performance measures across multiple cases. It considers cases in different random orders, and therefore with different priorities, for each selection event.
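
A single selection event can be sketched as follows. The filtering rule used here (keep only the candidates that are best on the current case, then move to the next case) is the usual formulation but is not stated in the text above, so it should be read as an assumption. The name `lexicase_select` and the `errors` layout are hypothetical; `errors[i][c]` is the error of individual `i` on training case `c`, with lower being better.

```python
import random


def lexicase_select(errors, rng=random):
    """Return the index of one individual chosen by a single lexicase event."""
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # a fresh random case order for every selection event
    for case in cases:
        # Keep only the candidates with the best (lowest) error on this case.
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    # Break any remaining tie at random.
    return rng.choice(candidates)
```

With `errors = [[0, 5], [5, 0], [3, 3]]`, the two specialists (indices 0 and 1) can be selected, depending on the case order, while the individual with mediocre errors on both cases never is, even though its total error is lower.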
