No free lunch in search and optimization

right|thumb|305px|The problem is to rapidly find a solution among candidates a, b, and c that is as good as any other, where goodness is either 0 or 1. There are eight instances ("lunch plates") fxyz of the problem, where x, y, and z indicate the goodness of a, b, and c, respectively. Procedure ("restaurant") A evaluates candidates in the order a, b, c, and B evaluates candidates in reverse that order, but each "charges" 1 evaluation in 5 cases, 2 evaluations in 2 cases, and 3 evaluations in 1 case.

<!--

This figure contradicts the text of the article, and uses terms that are not defined in the article.

right|thumb|450px|Graphical explanation of the consequences of the NFL theorems for [[heuristic algorithms: all heuristics have the same average performance. A given heuristic may outperform another one for a given problem or problem domain, but it will be worse in others problem domains. Furthermore, the better it is in that domain, the worse it will be in other domains]]

-->

In computational complexity and optimization the no free lunch theorem is a result that states that for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "no such thing as a free lunch", that is, no method offers a "short cut". This is under the assumption that the search space is a probability density function. It does not apply to the case where the search space has underlying structure (e.g., is a differentiable function) that can be exploited more efficiently (e.g., Newton's method in optimization) than random search or even has closed-form solutions (e.g., the extrema of a quadratic polynomial) that can be determined without search at all. For such probabilistic assumptions, the outputs of all procedures solving a particular type of problem are statistically identical. A colourful way of describing such a circumstance, introduced by David Wolpert and William G. Macready in connection with the problems of search and optimization,

is to say that there is no free lunch. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference).

Before Wolpert's article was published, Cullen Schaffer independently proved a restricted version of one of Wolpert's theorems and used it to critique the current state of machine learning research on the problem of induction.

In the "no free lunch" metaphor, each "restaurant" (problem-solving procedure) has a "menu" associating each "lunch plate" (problem) with a "price" (the performance of the procedure in solving the problem). The menus of restaurants are identical except in one regard – the prices are shuffled from one restaurant to the next. For an omnivore who is as likely to order each plate as any other, the average cost of lunch does not depend on the choice of restaurant. But a vegan who goes to lunch regularly with a carnivore who seeks economy might pay a high average cost for lunch. To methodically reduce the average cost, one must use advance knowledge of a) what one will order and b) what the order will cost at various restaurants. That is, improvement of performance in problem-solving hinges on using prior information to match procedures to problems.<!--

--><!--

--> This condition does not hold precisely in practice,

Overview

Some computational problems are solved by searching for good solutions in a space of candidate solutions. A description of how to repeatedly select candidate solutions for evaluation is called a search algorithm. On a particular problem, different search algorithms may obtain different results, but over all problems, they are indistinguishable. It follows that if an algorithm achieves superior results on some problems, it must pay with inferiority on other problems. In this sense there is no free lunch in search. The "no free lunch" results indicate that matching algorithms to problems gives higher average performance than does applying a fixed algorithm to all. Igel and Toussaint

Wolpert and Macready stipulate that an algorithm never reevaluates a candidate solution, and that algorithm performance is measured on outputs. For instance, if each candidate solution is encoded as a sequence of 300 0's and 1's, and the goodness values are 0 and 1, then most objective functions have Kolmogorov complexity of at least 2300 bits, and this is greater than Lloyd's bound of 1090 ≈ 2299 bits. It follows that the original "no free lunch" theorem does not apply to what can be stored in a physical computer; instead the so-called "tightened" no free lunch theorems need to be applied. It has also been shown that NFL results apply to incomputable functions.

Formal synopsis

<math>Y^X</math> is the set of all objective functions f:X→Y, where <math>X</math> is a finite solution space and <math>Y</math> is a finite poset. The set of all permutations of X is J. A random variable F is distributed on <math>Y^X</math>. For all j in J, F o j is a random variable distributed on <math>Y^X</math>, with P(F o j = f) = P(F = f o j−1) for all f in <math>Y^X</math>.

Let a(f) denote the output of search algorithm a on input f. If a(F) and b(F) are identically distributed for all search algorithms a and b, then F has an NFL distribution. This condition holds if and only if F and F o j are identically distributed for all j in J. Set-theoretic NFL theorems have recently been generalized to arbitrary cardinality <math>X</math> and <math>Y</math>.

Origin

Wolpert and Macready give two principal NFL theorems, the first regarding objective functions that do not change while search is in progress, and the second regarding objective functions that may change. Several comments are in order:

A general-purpose almost-universal optimizer exists theoretically. Each search algorithm performs well on almost all objective functions.<!--

-->.

Coevolution

Wolpert and Macready have proved that there are free lunches in coevolutionary optimization.