Corecursion

In computer science, corecursion is a type of operation that is dual to (structural) recursion. Whereas recursion consumes a data structure by first handling the topmost layer before descending into its inner parts, corecursion produces a data structure by first defining the topmost layer before defining its inner parts. Corecursion is a particularly important in total languages, as it allows encoding potentially non-terminating computations in a context where every function must terminate. It is supported by theorem provers Agda and Rocq.

Both corecursion and recursion can be thought of as operating on trees, which include data structures like lists and streams as special cases. Since recursion must terminate, it only works on trees that are well-founded, i.e. not infinitely deep, which are called data or initial data types; on the other hand, corecursion produces codata or final data types, which includes infinitely deep trees. Codata cannot be represented in memory directly, so is often implemented using self-referential data structures or lazy evaluation.

Data and codata

In total programming languages, the natural numbers may be defined as follows (using Haskell syntax):

This states that every natural number is either zero or the successor of an existing natural number. For example, the number one is represented as <code>Succ Zero</code>, two as <code>Succ (Succ Zero)</code>, three as <code>Succ (Succ (Succ Zero))</code> and so on.

If we interpret the above declaration in an inductive way, then all natural numbers are generated like this, and we get our familiar set of natural numbers. Importantly, we can do recursion on this set: for example, we can use it to make a list that repeats a value <code>n</code> times:

repeat :: a -> Nat -> [a]

repeat x Zero = [] -- Base case

repeat x (Succ n) = x : repeat x n -- Inductive step

</syntaxhighlight>

Note that the use of <code>repeat x n</code> is crucial—we could not have written <code>repeat x (Succ n)</code>, because that is the function we are trying to define, and therefore it would loop forever. More specifically, the input must get smaller—or rather deeper, if seen as a data structure—each time we recurse.

The declaration may also be interpreted in a coinductive way, which may be denoted as:

codata CoNat = Zero | Succ CoNat

</syntaxhighlight>

<code>CoNat</code> has all the natural numbers that <code>Nat</code> has automatically. But because codata can be infinitely deep, it has an extra term <code>Succ (Succ (Succ (Succ …)))</code> that continues on indefinitely—often denoted ∞. This is unique to the conatural numbers, so it can only be constructed using corecursion:

infinity :: CoNat

infinity = Succ infinity

</syntaxhighlight>

While recursion requires that the input of the recursive call must get smaller each time, corecursion requires that the output of the recursive call must get larger each time. This is why we must write <code>Succ infinity</code>; just <code>infinity = infinity</code> would be disallowed as it loops forever.

To illustrate the duality between recursion and corecursion, we can encapsulate them in two functions, <code>rec</code> and <code>corec</code>. These functions' existence, together with the regular two constructors of <code>Nat</code> and <code>CoNat</code>, uniquely identify their respective types, and therefore allow rewriting all forms of recursion and corecursion in terms of them.

Mathematical description

The above section showed that there is an intimate link between natural and conatural numbers and <code>Maybe c</code>, and similarly between binary trees and <code>Either a (c, c)</code>. This link is made precise by showing that <code>Nat</code> is in bijection with <code>Maybe Nat</code>, and <code>CoNat</code> with <code>Maybe CoNat</code>; explicitly, this bjiection associates <code>Zero</code> with <code>Nothing</code> and <code>Succ n</code> with <code>Just n</code>. In other words, <code>Nat</code> and <code>CoNat</code> are fixed points (up to isomorphism) of the map <math>X \mapsto X + 1</math>.

The difference between data and codata is that data is the least such fixed point, while codata is the greatest. In this way, recursion can be summarized as the statement that "every element of the type was generated only by the given constructors (<code>Zero</code> and <code>Succ</code> in the case of <code>Nat</code>/<code>CoNat</code>)", while corecursion states that "every value that can be analysed using the constructors as cases exists".

From a category-theoretic perspective, <code>CoNat</code> can be precisely defined as the final coalgebra of the endofunctor<math>X \mapsto X + 1</math>. Unfolding this definition, we have that:

There is a function <code>pred :: CoNat -> Maybe CoNat</code>. "Pred" stands for "predecessor", and the intuition is that it subtracts one from the conatural number or gives <code>Nothing</code> otherwise. In other words, <code>pred Zero = Nothing</code> and <code>pred (Succ n) = Just n</code>. This shows that <code>CoNat</code> is a coalgebra of the functor, but not necessarily the final one.
For every type <code>c</code> and function <code>f :: c -> Maybe c</code>, there is a function <code>corec f :: c -> CoNat</code> such that <code>pred (corec f x) = fmap (corec f) (f x)</code>.<br/><br/>Breaking this down further, if <code>f x = Just y</code> then <code>corec f x = Succ (corec f y)</code>, while if <code>f x = Nothing</code> then <code>corec f x = Zero</code>—this matches the definition of <code>corec</code> we are familiar with.
<code>corec</code> is unique in the following sense: for any other function <code>g :: c -> CoNat</code> that also satisfies <code>pred (g x) = fmap g (f x)</code>, <code>g x = corec f x</code>. This ensures that <code>CoNat</code> is a final coalgebra.

<code>Maybe c</code> can be replaced with any other functor in the above description to obtain the definition of any other coinductive type.

Not all functors have final coalgebras. For example, Cantor's theorem tells us that no set can be in bijection with its powerset, and therefore the powerset functor <code>a -> Bool</code> has no final coalgebra. However, in the case of polynomial functors or quotient polynomial functors, final coalgebras always exist; polynomial functors are functors that can be expressed in the form <math>\textstyle X \mapsto \sum_{a : \alpha} X^{\beta(a)}</math> for a type α and α-indexed type family β(a). The M-type is exactly the construction of this final coalgebra.

In programming languages

If the domain of discourse is the category of sets and total functions (such as in theorem provers like Agda and Rocq), then final types (codata) may contain infinite, non-wellfounded values, whereas initial types (data) do not. On the other hand, if the domain of discourse is the category of complete partial orders and continuous functions, which corresponds roughly to the Haskell programming language, then final types coincide with initial types, and the corresponding final coalgebra and initial algebra form an isomorphism.

History

Corecursion, referred to as circular programming, dates at least to , who credits John Hughes and Philip Wadler; more general forms were developed in . The original motivations included producing more efficient algorithms (allowing a single pass over data in some cases, instead of requiring multiple passes) and implementing classical data structures, such as doubly linked lists and queues, in functional languages.

Data and codata

Mathematical description

In programming languages

History

See also

Notes

References