thumb|Occam's razor is attributed to [[William of Ockham, a Catholic priest and philosopher.]]
In philosophy, Occam's razor (also spelled Ockham's razor or Ocham's razor; ) is the problem-solving principle that recommends searching for explanations constructed with the smallest possible set of elements. It is also known as the principle of parsimony or the law of parsimony (). Attributed to William of Ockham, a 14th-century English philosopher and theologian, it is frequently cited as , which translates as "Entities must not be multiplied beyond necessity", although Occam never used those exact words. Popularly, the principle is sometimes paraphrased as "of two competing theories, the simpler explanation of an entity is to be preferred".
The philosophical razor advocates that when presented with competing hypotheses about the same prediction and both hypotheses have equal explanatory power, one should prefer the hypothesis that requires the fewest assumptions, and that this is not meant to be a way of choosing between hypotheses that make different predictions. Similarly, in science, Occam's razor is used as an abductive heuristic in the development of theoretical models rather than as a rigorous arbiter between candidate models.
History
The phrase Occam's razor did not appear until a few centuries after William of Ockham's death in 1347. Libert Froidmont, in his 1649 Philosophia Christiana de Anima (On Christian Philosophy of the Soul) claimed to have coined the phrase novacula occami when he described Gregory of Rimini, one of Ockham's critics. Ockham did not invent this principle, but its fame – and its association with him – may be due to the frequency and effectiveness with which he used it. Ockham stated the principle in various ways, but the most popular version – "Entities are not to be multiplied without necessity" () – was formulated by the Irish Franciscan philosopher John Punch in his 1639 commentary on the works of Duns Scotus.
Formulations before William of Ockham
thumb|Part of a page from [[John Duns Scotus's book Commentaria oxoniensia ad IV libros magistri Sententiarus, showing the words: "", i.e., "Plurality is not to be posited without necessity"]]
The origins of what has come to be known as Occam's razor are traceable to the works of earlier philosophers such as John Duns Scotus (1265–1308), Robert Grosseteste (1175–1253), Maimonides (Moses ben-Maimon, 1138–1204), and even Aristotle (384–322 BC). Aristotle writes in his Posterior Analytics, "We may assume the superiority [all other things being equal] of the demonstration which derives from fewer postulates or hypotheses." Ptolemy () stated, "We consider it a good principle to explain the phenomena by the simplest hypothesis possible."
Phrases such as "It is vain to do with more what can be done with fewer" and "A plurality is not to be posited without necessity" were commonplace in 13th-century scholastic writing.</blockquote>The Summa Theologica of Thomas Aquinas (1225–1274) states that "it is superfluous to suppose that what can be accounted for by a few principles has been produced by many". Aquinas uses this principle to construct an objection to God's existence, an objection that he in turn answers and refutes generally (cf. quinque viae), and specifically, through an argument based on causality. Hence, Aquinas acknowledges the principle that today is known as Occam's razor, but prefers causal explanations to other simple explanations (cf. also Correlation does not imply causation).
William of Ockham
thumb|[[Manuscript illustration of William of Ockham]]
William of Ockham () was an English Franciscan friar and theologian, an influential medieval philosopher and a nominalist. His popular fame as a great logician rests chiefly on the maxim attributed to him and known as Occam's razor. The term razor refers to distinguishing between two hypotheses either by "shaving away" unnecessary assumptions or cutting apart two similar conclusions.
While it has been claimed that Occam's razor is not found in any of William's writings, one can cite statements such as ("Plurality must never be posited without necessity"), which occurs in his theological work on the Sentences of Peter Lombard (Quaestiones et decisiones in quattuor libros Sententiarum Petri Lombardi; ed. Lugd., 1495, i, dist. 27, qu. 2, K).
Nevertheless, the precise words sometimes attributed to William of Ockham, (Entities must not be multiplied beyond necessity), are absent in his extant works; this particular phrasing comes from John Punch, who described the principle as a "common axiom" (axioma vulgare) of the Scholastics. In his Summa Totius Logicae, i. 12, William of Ockham cites the principle of economy, ("It is futile to do with more things that which can be done with fewer"; Thorburn, 1918, pp. 352–53; Kneale and Kneale, 1962, p. 243.)
Later formulations
To quote Isaac Newton, "We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances. Therefore, to the same natural effects we must, as far as possible, assign the same causes." In the sentence hypotheses non fingo ("I contrive no hypotheses"), Newton affirms the success of this approach.
Bertrand Russell offers a particular version of Occam's razor: "Whenever possible, substitute constructions out of known entities for inferences to unknown entities."
Around 1960, Ray Solomonoff founded the theory of universal inductive inference, the theory of prediction based on observations for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution. This theory is a mathematical formalization of Occam's razor.
Another technical approach to Occam's razor is ontological parsimony. Parsimony means spareness and is also referred to as the Rule of Simplicity. This is considered a strong version of Occam's razor. A variation used in medicine is called the "Zebra": a physician should reject an exotic medical diagnosis when a more commonplace explanation is more likely, derived from Theodore Woodward's dictum "When you hear hoofbeats, think of horses not zebras".
Ernst Mach formulated the stronger version of Occam's razor into physics, which he called the Principle of Economy stating: "Scientists must use the simplest means of arriving at their results and exclude everything not perceived by the senses."
Justifications
Aesthetic
Prior to the 20th century, it was a commonly held belief that nature itself was simple and that simpler hypotheses about nature were thus more likely to be true. Thomas Aquinas made this argument in the 13th century, writing, "If a thing can be done adequately by means of one, it is superfluous to do it by means of several; for we observe that nature does not employ two instruments [if] one suffices."
Beginning in the 20th century, epistemological justifications based on induction, logic, pragmatism, and especially probability theory have become more popular among philosophers.
Any more complex theory might still possibly be true. A study of the predictive validity of Occam's razor found 32 published papers that included 97 comparisons of economic forecasts from simple and complex forecasting methods. None of the papers provided a balance of evidence that complexity of method improved forecast accuracy. In the 25 papers with quantitative comparisons, complexity increased forecast errors by an average of 27 percent.
Practical considerations and pragmatism
Mathematical
One justification of Occam's razor is a direct result of basic probability theory. By definition, all assumptions introduce possibilities for error; if an assumption does not improve the accuracy of a theory, its only effect is to increase the probability that the overall theory is wrong.
There have also been other attempts to derive Occam's razor from probability theory, including notable attempts made by Harold Jeffreys and E. T. Jaynes. The probabilistic (Bayesian) basis for Occam's razor is elaborated by David J. C. MacKay in chapter 28 of his book Information Theory, Inference, and Learning Algorithms, where he emphasizes that a prior bias in favor of simpler models is not required.
William H. Jefferys and James O. Berger (1991) generalize and quantify the original formulation's "assumptions" concept as the degree to which a proposition is unnecessarily accommodating to possible observable data. They state, "A hypothesis with fewer adjustable parameters will automatically have an enhanced posterior probability, due to the fact that the predictions it makes are sharp."
Other philosophers
Karl Popper
Karl Popper argues that a preference for simple theories need not appeal to practical or aesthetic considerations. Our preference for simplicity may be justified by the falsifiability criterion: we prefer simpler theories to more complex ones "because their empirical content is greater; and because they are better testable". The idea here is that a simple theory applies to more cases than a more complex one, and is thus more easily falsifiable. This is again comparing a simple theory to a more complex theory where both explain the data equally well.
Elliott Sober
The philosopher of science Elliott Sober once argued along the same lines as Popper, tying simplicity with "informativeness": The simplest theory is the more informative, in the sense that it requires less information to a question. He has since rejected this account of simplicity, purportedly because it fails to provide an epistemic justification for simplicity. He now believes that simplicity considerations (and considerations of parsimony in particular) do not count unless they reflect something more fundamental. Philosophers, he suggests, may have made the error of hypostatizing simplicity (i.e., endowed it with a sui generis existence), when it has meaning only when embedded in a specific context (Sober 1992). If we fail to justify simplicity considerations on the basis of the context in which we use them, we may have no non-circular justification: "Just as the question 'why be rational?' may have no non-circular answer, the same may be true of the question 'why should simplicity be considered in evaluating the plausibility of hypotheses?
Richard Swinburne
Richard Swinburne argues for simplicity on logical grounds:
According to Swinburne, since our choice of theory cannot be determined by data (see Underdetermination and Duhem–Quine thesis), we must rely on some criterion to determine which theory to use. Since it is absurd to have no logical method for settling on one hypothesis amongst an infinite number of equally data-compliant hypotheses, we should choose the simplest theory: "Either science is irrational [in the way it judges theories and predictions probable] or the principle of simplicity is a fundamental synthetic a priori truth."
Ludwig Wittgenstein
From the Tractatus Logico-Philosophicus:
- 3.328 "If a sign is not necessary then it is meaningless. That is the meaning of Occam's Razor."
: (If everything in the symbolism works as though a sign had meaning, then it has meaning.)
- 4.04 "In the proposition, there must be exactly as many things distinguishable as there are in the state of affairs, which it represents. They must both possess the same logical (mathematical) multiplicity (cf. Hertz's Mechanics, on Dynamic Models)."
- 5.47321 "Occam's Razor is, of course, not an arbitrary rule nor one justified by its practical success. It simply says that unnecessary elements in a symbolism mean nothing. Signs which serve one purpose are logically equivalent; signs which serve no purpose are logically meaningless."
and on the related concept of "simplicity":
- 6.363 "The procedure of induction consists in accepting as true the simplest law that can be reconciled with our experiences."
Uses
Science and the scientific method
thumb|250px|right|[[Andreas Cellarius's illustration of the Copernican system, from the Harmonia Macrocosmica (1660). Future positions of the sun, moon and other solar system bodies can be calculated using a geocentric model (the earth is at the centre) or using a heliocentric model (the sun is at the centre). Both work, but the geocentric model requires a much more complex system of calculations than the heliocentric model. This was pointed out in a preface to Copernicus's first edition of De revolutionibus orbium coelestium.]]
In science, Occam's razor is used as a heuristic to guide scientists in developing theoretical models rather than as an arbiter between published models. in Albert Einstein's formulation of special relativity, and in the development of quantum mechanics by Max Planck, Werner Heisenberg and Louis de Broglie.
In chemistry, Occam's razor is often an important heuristic when developing a model of a reaction mechanism. Although it is useful as a heuristic in developing models of reaction mechanisms, it has been shown to fail as a criterion for selecting among some selected published models. An often-quoted version of this constraint (which cannot be verified as posited by Einstein himself) reduces this to "Everything should be kept as simple as possible, but not simpler."
In the scientific method, Occam's razor is not considered an irrefutable principle of logic or a scientific result; the preference for simplicity in the scientific method is based on the falsifiability criterion. For each accepted explanation of a phenomenon, there may be an extremely large, perhaps even incomprehensible, number of possible and more complex alternatives. Since failing explanations can always be burdened with ad hoc hypotheses to prevent them from being falsified, simpler theories are preferable to more complex ones because they tend to be more testable. As a logical principle, Occam's razor would demand that scientists accept the simplest possible theoretical explanation for existing data. However, science has shown repeatedly that future data often support more complex theories than do existing data. Science prefers the simplest explanation that is consistent with the data available at a given time, but the simplest explanation may be ruled out as new data become available. One can argue for atomic building blocks for matter, because it provides a simpler explanation for the observed reversibility of both and chemical reactions as simple separation and rearrangements of atomic building blocks. At the time, however, the atomic theory was considered more complex because it implied the existence of invisible particles that had not been directly detected. Ernst Mach and the logical positivists rejected John Dalton's atomic theory until the reality of atoms was more evident in Brownian motion, as shown by Albert Einstein.
In the same way, postulating the aether is more complex than transmission of light through a vacuum. At the time, however, all known waves propagated through a physical medium, and it seemed simpler to postulate the existence of a medium than to theorize about wave propagation without a medium. Likewise, Isaac Newton's idea of light particles seemed simpler than Christiaan Huygens's idea of waves, so many favored it. In this case, as it turned out, neither the wave – nor the particle – explanation alone suffices, as light behaves like waves and like particles.
Three axioms presupposed by the scientific method are realism (the existence of objective reality), the existence of natural laws, and the constancy of natural law. Rather than depend on provability of these axioms, science depends on the fact that they have not been objectively falsified. Occam's razor and parsimony support, but do not prove, these axioms of science. The general principle of science is that theories (or models) of natural law must be consistent with repeatable experimental observations. This ultimate arbiter (selection criterion) rests upon the axioms mentioned above.
It is among the cladists that Occam's razor is applied, through the method of cladistic parsimony. Cladistic parsimony (or maximum parsimony) is a method of phylogenetic inference that yields phylogenetic trees (more specifically, cladograms). Cladograms are branching, diagrams used to represent hypotheses of relative degree of relationship, based on synapomorphies. Cladistic parsimony is used to select as the preferred hypothesis of relationships the cladogram that requires the fewest implied character state transformations (or smallest weight, if characters are differentially weighted). Critics of the cladistic approach often observe that for some types of data, parsimony could produce the wrong results, regardless of how much data is collected (this is called statistical inconsistency, or long branch attraction). However, this criticism is also potentially true for any type of phylogenetic inference, unless the model used to estimate the tree reflects the way that evolution actually happened. Because this information is not empirically accessible, the criticism of statistical inconsistency against parsimony holds no force. For a book-length treatment of cladistic parsimony, see Elliott Sober's Reconstructing the Past: Parsimony, Evolution, and Inference (1988). For a discussion of both uses of Occam's razor in biology, see Sober's article "Let's Razor Ockham's Razor" (1990).
Other methods for inferring evolutionary relationships use parsimony in a more general way. Likelihood methods for phylogeny use parsimony as they do for all likelihood tests, with hypotheses requiring fewer differing parameters (i.e., numbers or different rates of character change or different frequencies of character state transitions) being treated as null hypotheses relative to hypotheses requiring more differing parameters. Thus, complex hypotheses must predict data much better than do simple hypotheses before researchers reject the simple hypotheses. Recent advances employ information theory, a close cousin of likelihood, which uses Occam's razor in the same way. The choice of the "shortest tree" relative to a not-so-short tree under any optimality criterion (smallest distance, fewest steps, or maximum likelihood) is always based on parsimony.
Francis Crick has commented on potential limitations of Occam's razor in biology. He advances the argument that because biological systems are the products of (an ongoing) natural selection, the mechanisms are not necessarily optimal in an obvious sense. He cautions: "While Ockham's razor is a useful tool in the physical sciences, it can be a very dangerous implement in biology. It is thus very rash to use simplicity and elegance as a guide in biological research." This is an ontological critique of parsimony.
In biogeography, parsimony is used to infer ancient vicariant events or migrations of species or populations by observing the geographic distribution and relationships of existing organisms. Given the phylogenetic tree, ancestral population subdivisions are inferred to be those that require the minimum amount of change.
Religion
In the philosophy of religion, Occam's razor is sometimes applied to the existence of God. William of Ockham himself was a Christian. He believed in God, and in the authority of Christian scripture; he writes that "nothing ought to be posited without a reason given, unless it is self-evident (literally, known through itself) or known by experience or proved by the authority of Sacred Scripture." Ockham believed that an explanation has no sufficient basis in reality when it does not harmonize with reason, experience, or the Bible. Unlike many theologians of his time, though, Ockham did not believe God could be logically proven with arguments. To Ockham, science was a matter of discovery; theology was a matter of revelation and faith. He states: "Only faith gives us access to theological truths. The ways of God are not open to reason, for God has freely chosen to create a world and establish a way of salvation within it apart from any necessary laws that human logic or rationality can uncover."
Thomas Aquinas, in the Summa Theologica, uses a formulation of Occam's razor to construct an objection to the idea that God exists, which he refutes directly with a counterargument:
<blockquote>Further, it is superfluous to suppose that what can be accounted for by a few principles has been produced by many. But it seems that everything we see in the world can be accounted for by other principles, supposing God did not exist. For all natural things can be reduced to one principle which is nature; and all voluntary things can be reduced to one principle which is human reason, or will. Therefore there is no need to suppose God's existence.</blockquote>
In turn, Aquinas answers this with the quinque viae, and addresses the particular objection above with the following answer:
<blockquote>Since nature works for a determinate end under the direction of a higher agent, whatever is done by nature must needs be traced back to God, as to its first cause. So also whatever is done voluntarily must also be traced back to some higher cause other than human reason or will, since these can change or fail; for all things that are changeable and capable of defect must be traced back to an immovable and self-necessary first principle, as was shown in the body of the Article.</blockquote>
Rather than argue for the necessity of a god, some theists base their belief upon grounds independent of, or prior to, reason, making Occam's razor irrelevant. This was the stance of Søren Kierkegaard, who viewed belief in God as a leap of faith that sometimes directly opposed reason. This is also the doctrine of Gordon Clark's presuppositional apologetics, with the exception that Clark never thought the leap of faith was contrary to reason (see also Fideism).
Various arguments in favor of God establish God as a useful or even necessary assumption. Contrastingly some anti-theists hold firmly to the belief that assuming the existence of God introduces unnecessary complexity (e.g., the Ultimate Boeing 747 gambit from Dawkins's The God Delusion).
Another application of the principle is to be found in the work of George Berkeley (1685–1753). Berkeley was an idealist who believed that all of reality could be explained in terms of the mind alone. He invoked Occam's razor against materialism, stating that matter was not required by his metaphysics and was thus eliminable. One potential problem with this belief is that it's possible, given Berkeley's position, to find solipsism itself more in line with the razor than a God-mediated world beyond a single thinker.
Occam's razor may also be recognized in the apocryphal story about an exchange between Pierre-Simon Laplace and Napoleon. It is said that in praising Laplace for one of his recent publications, the emperor asked how it was that the name of God, which featured so frequently in the writings of Lagrange, appeared nowhere in Laplace's. At that, he is said to have replied, "It's because I had no need of that hypothesis." Though some points of this story illustrate Laplace's atheism, more careful consideration suggests that he may instead have intended merely to illustrate the power of methodological naturalism, or even simply that the fewer logical premises one assumes, the stronger is one's conclusion.
Philosophy of mind
In his article "Sensations and Brain Processes" (1959), J. J. C. Smart invoked Occam's razor with the aim to justify his preference of the mind–brain identity theory over spirit–body dualism. Dualists state that there are two kinds of substances in the universe: physical (including the body) and spiritual, which is non-physical. In contrast, identity theorists state that everything is physical, including consciousness, and that there is nothing nonphysical. Though it is impossible to appreciate the spiritual when limiting oneself to the physical, Smart maintained that identity theory explains all phenomena by assuming only a physical reality. Subsequently, Smart has been severely criticized for his use (or misuse) of Occam's razor and ultimately retracted his advocacy of it in this context. Paul Churchland (1984) states that by itself Occam's razor is inconclusive regarding duality. In a similar way, Dale Jacquette (1994) stated that Occam's razor has been used in attempts to justify eliminativism and reductionism in the philosophy of mind. Eliminativism is the thesis that the ontology of folk psychology including such entities as "pain", "joy", "desire", "fear", etc., are eliminable in favor of an ontology of a completed neuroscience.
Penal ethics
In penal theory and the philosophy of punishment, parsimony refers specifically to taking care in the distribution of punishment in order to avoid excessive punishment. In the utilitarian approach to the philosophy of punishment, Jeremy Bentham's "parsimony principle" states that any punishment greater than is required to achieve its end is unjust. The concept is related but not identical to the legal concept of proportionality. Parsimony is a key consideration of the modern restorative justice, and is a component of utilitarian approaches to punishment, as well as the prison abolition movement. Bentham believed that true parsimony would require punishment to be individualised to take account of the sensibility of the individual – an individual more sensitive to punishment should be given a proportionately lesser one, since otherwise needless pain would be inflicted. Later utilitarian writers have tended to abandon this idea, in large part due to the impracticality of determining each alleged criminal's relative sensitivity to specific punishments.
Probability theory and statistics
Marcus Hutter's universal artificial intelligence builds upon Solomonoff's mathematical formalization of the razor to calculate the expected value of an action.
There are various papers in scholarly journals deriving formal versions of Occam's razor from probability theory, applying it in statistical inference, and using it to come up with criteria for penalizing complexity in statistical inference. Papers have suggested a connection between Occam's razor and Kolmogorov complexity.
One of the problems with the original formulation of the razor is that it only applies to models with the same explanatory power (i.e., it only tells us to prefer the simplest of equally good models). A more general form of the razor can be derived from Bayesian model comparison, which is based on Bayes factors and can be used to compare models that do not fit the observations equally well. These methods can sometimes optimally balance the complexity and power of a model. Generally, the exact Occam factor is intractable, but approximations such as Akaike information criterion, Bayesian information criterion, Variational Bayesian methods, false discovery rate, and Laplace's method are used. Many artificial intelligence researchers are now employing such techniques, for instance through work on Occam Learning or more generally on the Free energy principle.
Statistical versions of Occam's razor have a more rigorous formulation than what philosophical discussions produce. In particular, they must have a specific definition of the term simplicity, and that definition can vary. For example, in the Kolmogorov–Chaitin minimum description length approach, the subject must pick a Turing machine whose operations describe the basic operations believed to represent "simplicity" by the subject. However, one could always choose a Turing machine with a simple operation that happened to construct one's entire theory and would hence score highly under the razor. This has led to two opposing camps: one that believes Occam's razor is objective, and one that believes it is subjective.
Objective razor
The minimum instruction set of a universal Turing machine requires approximately the same length description across different formulations, and is small compared to the Kolmogorov complexity of most practical theories. Marcus Hutter has used this consistency to define a "natural" Turing machine of small size as the proper basis for excluding arbitrarily complex instruction sets in the formulation of razors. Describing the program for the universal program as the "hypothesis", and the representation of the evidence as program data, it has been formally proven under Zermelo–Fraenkel set theory that "the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized." Interpreting this as minimising the total length of a two-part message encoding model followed by data given model gives us the minimum message length (MML) principle.
According to Jürgen Schmidhuber, the appropriate mathematical theory of Occam's razor already exists, namely, Solomonoff's theory of optimal inductive inference and its extensions. See discussions in David L. Dowe's "Foreword re C. S. Wallace" for the subtle distinctions between the algorithmic probability work of Solomonoff and the MML work of Chris Wallace, and see Dowe's "MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness" both for such discussions and for (in section 4) discussions of MML and Occam's razor. For a specific example of MML as Occam's razor in the problem of decision tree induction, see Dowe and Needham's "Message Length as an Effective Ockham's Razor in Decision Tree Induction".
Mathematical arguments against Occam's razor
The no free lunch (NFL) theorems for inductive inference prove that Occam's razor must rely on ultimately arbitrary assumptions concerning the prior probability distribution found in our world. Specifically, suppose one is given two inductive inference algorithms, A and B, where A is a Bayesian procedure based on the choice of some prior distribution motivated by Occam's razor (e.g., the prior might favor hypotheses with smaller Kolmogorov complexity). Suppose that B is the anti-Bayes procedure, which calculates what the Bayesian algorithm A based on Occam's razor will predict – and then predicts the exact opposite. Then there are just as many actual priors (including those different from the Occam's razor prior assumed by A) in which algorithm B outperforms A as priors in which the procedure A based on Occam's razor comes out on top. In particular, the NFL theorems show that the "Occam factors" Bayesian argument for Occam's razor must make ultimately arbitrary modeling assumptions.
Software development
In software development, the rule of least power argues the correct programming language to use is the one that is simplest while also solving the targeted software problem. In that form the rule is often credited to Tim Berners-Lee since it appeared in his design guidelines for the original Hypertext Transfer Protocol. Complexity in this context is measured either by placing a language into the Chomsky hierarchy or by listing idiomatic features of the language and comparing according to some agreed to scale of difficulties between idioms. Many languages once thought to be of lower complexity have evolved or later been discovered to be more complex than originally intended; so, in practice this rule is applied to the relative ease of a programmer to obtain the power of the language, rather than the precise theoretical limits of the language.
Artificial intelligence
Scientists have discovered that deep neural networks (DNN) prefer simpler mathematical functions while learning. This simplicity bias enables DNNs to overcome overfitting – a scenario where the model gets overwhelmed with noise due to the presence of too many parameters.
Controversial aspects
Occam's razor is not an embargo against the positing of any kind of entity, or a recommendation of the simplest theory come what may. Occam's razor is used to adjudicate between theories that have already passed "theoretical scrutiny" tests and are equally well-supported by evidence.
