Literate Programming by Donald Knuth is the seminal book on literate programming.

Literate programming (LP) is a programming paradigm introduced in 1984 by Donald Knuth in which a computer program is given as an explanation of how it works in a natural language, such as English, interspersed (embedded) with snippets of macros and traditional source code, from which compilable source code can be generated. The approach is used in scientific computing and in data science routinely for reproducible research and open access purposes. Literate programming tools are used by millions of programmers today.

The literate programming paradigm, as conceived by Donald Knuth, represents a move away from writing computer programs in the manner and order imposed by the compiler, and instead gives programmers macros to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an exposition of logic in more natural language in which macros are used to hide abstractions and traditional source code, more like the text of an essay.

Literate programming tools are used to obtain two representations from a source file: one understandable by a compiler or interpreter, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source. While the first generation of literate programming tools were computer language-specific, the later ones are language-agnostic and exist beyond the individual programming languages.

History and philosophy

Literate programming was first introduced in 1984 by Donald Knuth, who intended it to create programs that were suitable literature for human beings. He implemented it at Stanford University as a part of his research on algorithms and digital typography. The implementation was called "WEB" since he believed that it was one of the few three-letter words of English that had not yet been applied to computing. The name also fits the complicated nature of software delicately pieced together from simple materials. A preprocessor is used to produce the compilable source code with one command ("tangle"), and documentation with another ("weave"). The preprocessor also provides an ability to write out the content of the macros and to add to already created macros in any place in the text of the literate program source file, thereby disposing of the need to keep in mind the restrictions imposed by traditional programming languages or to interrupt the flow of thought.
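The macro mechanism described above can be illustrated with a minimal sketch in Python. It is not Knuth's WEB: it assumes a simplified noweb-like notation in which a line `<<name>>=` opens a code chunk, a line starting with `@` returns to prose, and a line consisting of `<<name>>` refers to another chunk. The function names `parse_chunks` and `expand` are hypothetical and chosen only for this illustration.

```python
import re

# Simplified, noweb-like markers (an assumption of this sketch).
CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # "<<name>>=" opens a chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # "<<name>>" refers to a chunk


def parse_chunks(source):
    """Collect code chunks by name; repeated definitions are appended."""
    chunks = {}
    current = None  # name of the chunk being read, or None while in prose
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])
        elif line.startswith("@"):
            current = None  # back to prose
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(name, chunks, indent="", stack=()):
    """Recursively replace chunk references with the chunk contents."""
    if name in stack:
        raise ValueError(f"recursive chunk reference: {name}")
    lines = []
    for line in chunks[name]:
        reference = CHUNK_REF.match(line)
        if reference:
            lines.extend(expand(reference.group(2), chunks,
                                indent + reference.group(1), stack + (name,)))
        else:
            lines.append(indent + line)
    return lines


SOURCE = """\
The program reports a total, so the top level is stated first.
<<main>>=
<<set up the data>>
<<print the total>>
@
Printing is described before the data exists.
<<print the total>>=
print(sum(values))
@
Only now do we decide what the data is.
<<set up the data>>=
values = [1, 2, 3]
@
Later we think of one more value and add it to the same chunk.
<<set up the data>>=
values.append(4)
"""

tangled = "\n".join(expand("main", parse_chunks(SOURCE)))
print(tangled)
```

The chunks appear in the order of the explanation, and the second definition of `set up the data` extends the first, yet the tangled output is in the order the interpreter requires.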

Advantages

According to Knuth, literate programming provides higher-quality programs, since it forces programmers to explicitly state the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on, but is grown naturally in the process of exposition of one's thoughts during a program's creation. The resulting documentation allows the author to restart their own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order, and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. The applicability of the concept to programming on a large scale, that of commercial-grade programs, is demonstrated by the publication of the TeX code as a literate program.

Contrast with documentation generation

Literate programming is very often misunderstood to refer only to formatted documentation produced from a common file with both source code and comments – which is properly called documentation generation – or to voluminous commentaries included with code. This is the converse of literate programming: well-documented code or documentation extracted from code follows the structure of the code, with documentation embedded in the code; while in literate programming, code is embedded in documentation, with the code following the structure of the documentation.

This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation or Java Javadoc systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.
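For contrast, the following is a small sketch of documentation generation in Python, using the standard `inspect.getdoc` function. The two functions and the helper `extract_docs` are hypothetical examples. The point is that the extracted documentation can only follow the structure of the code: one entry per function, in the order the functions are listed.

```python
import inspect


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


def count_lines(text):
    """Return the number of lines in text."""
    return len(text.splitlines())


def extract_docs(functions):
    """Build documentation in code order: one entry per function."""
    return "\n".join(f"{f.__name__}: {inspect.getdoc(f)}" for f in functions)


docs = extract_docs([count_words, count_lines])
print(docs)
```

Nothing in this process lets the author present the ideas in a different order from the code or name an abstraction that is not already a function.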

Workflow

Implementing literate programming consists of two steps:

  1. Weaving: Generating a comprehensive document about the program and its maintenance.
  2. Tangling: Generating machine-executable code.

Weaving and tangling are done on the same source so that they are consistent with each other.
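The two steps can be sketched in Python as two functions that read the same source. This is only an illustration under stated assumptions: the `<<name>>=` and `@` markers are a simplified noweb-like notation, the functions `split_blocks`, `weave` and `tangle` are hypothetical names, the woven output is plain text rather than TeX or HTML, and macro expansion and reordering are left out so that the focus stays on the shared source.

```python
def split_blocks(source):
    """Split a literate source into (kind, name, lines) blocks.

    kind is "doc" for prose and "code" for a named code chunk.
    """
    blocks = []
    kind, name, lines = "doc", None, []
    for line in source.splitlines():
        if line.startswith("<<") and line.endswith(">>="):
            blocks.append((kind, name, lines))
            kind, name, lines = "code", line[2:-3], []
        elif line.startswith("@"):
            blocks.append((kind, name, lines))
            kind, name, lines = "doc", None, []
        else:
            lines.append(line)
    blocks.append((kind, name, lines))
    return [block for block in blocks if block[2]]  # drop empty blocks


def weave(source):
    """Produce the human-readable document: prose plus labelled code."""
    out = []
    for kind, name, lines in split_blocks(source):
        if kind == "doc":
            out.extend(lines)
        else:
            out.append(f"[{name}]")
            out.extend("    " + line for line in lines)
    return "\n".join(out)


def tangle(source):
    """Produce the machine-readable code: code chunks only."""
    out = []
    for kind, name, lines in split_blocks(source):
        if kind == "code":
            out.extend(lines)
    return "\n".join(out)


SOURCE = """\
We greet the reader first.
<<greeting>>=
message = "hello"
@
Then we print the greeting.
<<output>>=
print(message)
"""

print(weave(SOURCE))
print(tangle(SOURCE))
```

Because both functions start from `SOURCE`, a change to a code chunk shows up in the woven document and in the tangled program at the same time.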

Example

A classic example of literate programming is the literate implementation of the standard Unix <code>wc</code> word counting program. Knuth presented a CWEB version of this example in Chapter 12 of his Literate Programming book. The same example was later rewritten for the noweb literate programming tool. This example provides a good illustration of the basic elements of literate programming.

Creation of macros

The following snippet of the <code>wc</code> literate program

Remarkable examples

  • TeX and METAFONT, Knuth's respective typesetting and font description languages. Written in WEB, these were literate programming's proofs of concept.
  • Physically Based Rendering "describes both the mathematical theory behind a modern photorealistic rendering system and its practical implementation." This book won an Academy Award.
  • Understanding MP3, a complete implementation of MPEG bit streams that is also an excellent tutorial. (See also Ruckert's literate programs related to HINT.)
  • Principia Softwarica, a literate version of Plan 9 from Bell Labs.
  • The Stanford GraphBase, 30 short CWEB essays from Knuth that define a combinatorial computing platform, describe state-of-the-art algorithms and data structures, and provide examples of their use.
  • MMIXware, Knuth's software to support MMIX programming and run simulations on different architectures.
  • C Interfaces and Implementations, a demonstration of the titular design methodology in 24 examples.
  • A Retargetable C Compiler, describes the little C compiler LCC.
  • Data Structures and Algorithms in C++: Pocket Primer, a concise pedagogic tour.
  • Inform, a language and system for writing interactive fiction. One of the largest literate programs to date, Inform is written in inweb (itself a remarkable example).
  • Axiom, which evolved from Scratchpad, a computer algebra system developed by IBM. It is now being developed by Tim Daly, one of the developers of Scratchpad. Axiom is written entirely as a literate program.

Literate programming practices

The first published literate programming environment was WEB, introduced by Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation. The complete commented TeX source code was published in Knuth's TeX: The program, volume B of his 5-volume Computers and Typesetting. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe. The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++, runs on most operating systems, and can produce TeX and PDF documentation.

There are various other implementations of the literate programming concept, as listed below. Many of the newer ones do not have macros and hence do not let the author present code in the order of human logic, which makes them perhaps "semi-literate" tools. They do, however, allow cellular execution of code, which places them closer to exploratory programming tools.

{| class="wikitable sortable" style="text-align: center;"
! Name !! Supported languages !! Written in !! Markup language !! Macros & custom order !! Cellular execution !! Comments
|-
| WEB || Pascal || Pascal || TeX || || || The first published literate programming environment.
|-
| CWEB || C++ and C || C || TeX || || || Is WEB adapted for C and C++.
|-
| NoWEB || || C, AWK, and Icon || LaTeX, TeX, HTML and troff || || || It is well known for its simplicity and for allowing text formatting in HTML rather than going through the TeX system.
|-
| Emacs org-mode || || Emacs Lisp || Plain text || || || Requires Babel, which allows embedding blocks of source code from multiple programming languages within one text document. Blocks of code can share data with each other, display images inline, or be parsed into pure source code using the noweb reference syntax.
|-
| CoffeeScript || CoffeeScript || CoffeeScript, JavaScript || Markdown || || || CoffeeScript supports a "literate" mode, which enables programs to be compiled from a source document written in Markdown with indented blocks of code.
|-
| Maple worksheets || Maple (software) || || XML || || || Maple worksheets are a platform-agnostic literate programming environment that combines text and graphics with live code for symbolic computation.
|-
| Wolfram Notebooks || Wolfram Language || || Wolfram Language || || || Wolfram notebooks are a platform-agnostic literate programming method that combines text and graphics with live code.
|-
| Jupyter Notebook, formerly IPython Notebook || Python and any with a Jupyter Kernel || || JSON format Specification for ipynb || || || Works in the format of notebooks, which combine headings, text (including LaTeX), plots, etc. with the written code.
|-
| nbdev || Python and Jupyter Notebook || || || || || <code>nbdev</code> is a library that allows developing a python library in Jupyter Notebooks, putting all code, tests and documentation in one place.
|-
| Julia || || || || Pluto.jl is a reactive notebook environment allowing custom order. But web-like macros aren't supported. || || Supports the iJulia mode of development which was inspired by iPython.
|-
| Agda || || || || || || Supports a limited form of literate programming out of the box.
|-
| Sweave || R || || PDF || || ||
|-
| Knitr || R || || LaTeX, PDF, LyX, HTML, Markdown, AsciiDoc, and reStructuredText || || ||
|-
| Literate || || D || Markdown || || || Supports TeX equations. Compatible with Vim.
|}


See also

  • Documentation generator – the inverse of literate programming, where documentation is embedded in and generated from source code
  • Notebook interface – virtual notebook environment used for literate programming
  • Sweave and Knitr – examples of use of the "noweb"-like Literate Programming tool inside the R language for creation of dynamic statistical reports
  • Self-documenting code – source code that can be easily understood without documentation
  • Vibe coding

Further reading

  • LiterateProgramming at WikiWikiWeb
  • Literate Programming FAQ at CTAN
