Scheme (programming language)

Scheme is a dialect of the Lisp family of programming languages. Scheme was created during the 1970s at the MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) and released by its developers, Guy L. Steele and Gerald Jay Sussman, via a series of memos now known as the Lambda Papers. It was the first dialect of Lisp to choose lexical scope and the first to require implementations to perform tail-call optimization, giving stronger support for functional programming and associated techniques such as recursive algorithms. It was also one of the first programming languages to support first-class continuations. It had a significant influence on the effort that led to the development of Common Lisp.

The Scheme language is standardized in the official Institute of Electrical and Electronics Engineers (IEEE) standard <!--

-->and a de facto standard called the Revised Report on the Algorithmic Language Scheme (RnRS). A widely implemented standard is R5RS (1998). <!--

-->The most recently ratified standard of Scheme<!--

--> is "R7RS-small" (2013). The more expansive and modular R6RS was ratified in 2007. Both trace their descent from R5RS; the timeline below reflects the chronological order of ratification.<!--

-->

History

Origins

Scheme started in the 1970s as an attempt to understand Carl Hewitt's Actor model, for which purpose Steele and Sussman wrote a "tiny Lisp interpreter" using Maclisp and then "added mechanisms for creating actors and sending messages". Scheme was originally called "Schemer", in the tradition of other Lisp-derived languages such as Planner or Conniver. The current name resulted from the authors' use of the ITS operating system, which limited filenames to two components of at most six characters each. Currently, "Schemer" is commonly used to refer to a Scheme programmer.

R6RS

A new language standardization process began at the 2003 Scheme workshop, with the goal of producing an R6RS standard in 2006. This process broke with the earlier RnRS approach of unanimity.

R6RS features a standard module system, allowing a split between the core language and libraries. Several drafts of the R6RS specification were released, the final version being R5.97RS. A successful vote resulted in ratifying the new standard, announced on August 28, 2007. support the R6RS standard. There is a portable reference implementation of the proposed implicitly phased libraries for R6RS, called psyntax, which loads and bootstraps itself properly on various older Scheme implementations.

A feature of R6RS is the record-type descriptor (RTD). When an RTD is created and used, the record type representation can show the memory layout. It also calculated object field bit mask and mutable Scheme object field bit masks, and helped the garbage collector know what to do with the fields without traversing the whole fields list that are saved in the RTD. RTD allows users to expand the basic RTD to create a new record system.

R6RS introduces numerous significant changes to the language. The source code is now specified in Unicode, and a large subset of Unicode characters may now appear in Scheme symbols and identifiers, and there are other minor changes to the lexical rules. Character data is also now specified in Unicode. Many standard procedures have been moved to the new standard libraries, which themselves form a large expansion of the standard, containing procedures and syntactic forms that were formerly not part of the standard. A new module system has been introduced, and systems for exception handling are now standardized. Syntax-rules has been replaced with a more expressive syntactic abstraction facility (syntax-case) which allows the use of all of Scheme at macro expansion time. Compliant implementations are now required to support Scheme's full numeric tower, and the semantics of numbers have been expanded, mainly in the direction of support for the IEEE 754 standard for floating point numerical representation.

R7RS

The R6RS standard has caused controversy because some see it as a departure from the minimalist philosophy. In August 2009, the Scheme Steering Committee, which oversees the standardization process, announced its intention to recommend splitting Scheme into two languages: a large modern programming language for programmers; and a small version, a subset of the large version retaining the minimalism praised by educators and casual implementors. Two working groups were created to work on these two new versions of Scheme. The Scheme Reports Process site has links to the working groups' charters, public discussions and issue tracking system.

The ninth draft of R7RS (small language) was made available on April 15, 2013. A vote ratifying this draft closed on May 20, 2013, and the final report has been available since August 6, 2013, describing "the 'small' language of that effort: therefore it cannot be considered in isolation as the successor to R6RS". This ease is attributable to the use of lambda calculus to derive much of the syntax of the language from more primitive forms. For instance of the 23 s-expression-based syntactic constructs defined in the R5RS Scheme standard, 14 are classed as derived or library forms, which can be written as macros involving more fundamental forms, principally lambda. As R5RS (§3.1) says: "The most fundamental of the variable binding constructs is the lambda expression, because all other variable binding constructs can be explained in terms of lambda expressions." where they adopted the concept of the lexical closure (on page 21), which had been described in an AI Memo in 1970 by Joel Moses, who attributed the idea to Peter J. Landin.<!--

-->

Lambda calculus

Alonzo Church's mathematical notation, the lambda calculus, has inspired Lisp's use of "lambda" as a keyword for introducing a procedure, as well as influencing the development of functional programming techniques involving the use of higher-order functions in Lisp. But early Lisps were not suitable expressions of the lambda calculus because of their treatment of free variables. The function of lambda calculation includes: First, serve as a starting point of powerful mathematical logic. Second, it can reduce the requirement of programmers to consider the implementation details, because it can be used to imitate machine evaluation. Finally, the lambda calculation created a substantial meta-theory.

The introduction of lexical scope resolved the problem by making an equivalence between some forms of lambda notation and their practical expression in a working programming language. Sussman and Steele showed that the new language could be used to elegantly derive all the imperative and declarative semantics of other programming languages including ALGOL and Fortran, and the dynamic scope of other Lisps, by using lambda expressions not as simple procedure instantiations but as "control structures and environment modifiers". They introduced continuation-passing style along with their first description of Scheme in the first of the Lambda Papers, and in subsequent papers, they proceeded to demonstrate the raw power of this practical use of lambda calculus.

Block structure

Scheme inherits its block structure from earlier block structured languages, particularly ALGOL. In Scheme, blocks are implemented by three binding constructs: <code>let</code>, <code>let*</code> and <code>letrec</code>. For instance, the following construct creates a block in which a symbol called <code>var</code> is bound to the number 10:

(define var "goose")

;; Any reference to var here will be bound to "goose"

(let ((var 10))

;; statements go here. Any reference to var here will be bound to 10.

)

;; Any reference to var here will be bound to "goose"

</syntaxhighlight>

Blocks can be nested to create arbitrarily complex block structures according to the need of the programmer. The use of block structuring to create local bindings alleviates the risk of namespace collision that can otherwise occur.

One variant of <code>let</code>, <code>let*</code>, permits bindings to refer to variables defined earlier in the same construct, thus:

(let* ((var1 10)

(var2 (+ var1 12)))

;; But the definition of var1 could not refer to var2

)

</syntaxhighlight>

The other variant, <code>letrec</code>, is designed to enable mutually recursive procedures to be bound to one another.

;; Calculation of Hofstadter's male and female sequences as a list of pairs

(define (hofstadter-male-female n)

(letrec ((female (lambda (n)

(if (= n 0)

(- n (male (female (- n 1)))))))

(male (lambda (n)

(if (= n 0)

(- n (female (male (- n 1))))))))

(let loop ((i 0))

(if (> i n)

'()

(cons (cons (female i)

(male i))

(loop (+ i 1)))))))

(hofstadter-male-female 8)

===> ((1 . 0) (1 . 0) (2 . 1) (2 . 2) (3 . 2) (3 . 3) (4 . 4) (5 . 4) (5 . 5))

</syntaxhighlight>

(See Hofstadter's male and female sequences for the definitions used in this example.)

All procedures bound in a single <code>letrec</code> may refer to one another by name, as well as to values of variables defined earlier in the same <code>letrec</code>, but they may not refer to values defined later in the same <code>letrec</code>.

A variant of <code>let</code>, the "named let" form, has an identifier after the <code>let</code> keyword. This binds the let variables to the argument of a procedure whose name is the given identifier and whose body is the body of the let form. The body may be repeated as desired by calling the procedure. The named let is widely used to implement iteration.

Example: a simple counter

(let loop ((n 1))

(if (> n 10)

'()

(cons n

(loop (+ n 1)))))

===> (1 2 3 4 5 6 7 8 9 10)

</syntaxhighlight>

Like any procedure in Scheme, the procedure created in the named let is a first-class object.

Proper tail recursion

Scheme has an iteration construct, <code>do</code>, but it is more idiomatic in Scheme to use tail recursion to express iteration. Standard-conforming Scheme implementations are required to optimize tail calls so as to support an unbounded number of active tail calls (R5RS sec. 3.5)

In Scheme, the same primitives that are used to manipulate and bind data can be used to bind procedures. There is no equivalent of Common Lisp's <code>defun</code> and <code>#'</code> primitives.

;; Variable bound to a number:

(define f 10)

===> 10

;; Mutation (altering the bound value)

(set! f (+ f f 6))

===> 26

;; Assigning a procedure to the same variable:

(set! f (lambda (n) (+ n 12)))

(f 6)

===> 18

;; Assigning the result of an expression to the same variable:

(set! f (f 1))

===> 13

;; functional programming:

(apply + '(1 2 3 4 5 6))

===> 21

(set! f (lambda (n) (+ n 100)))

(map f '(1 2 3))

===> (101 102 103)

</syntaxhighlight>

Implementation standards

This subsection documents design decisions that have been taken over the years which have given Scheme a particular character, but are not the direct outcomes of the original design.

Numerical tower

Scheme specifies a comparatively full set of numerical datatypes including complex and rational types, which is known in Scheme as the numerical tower (R5RS sec. 6.2

In the R6RS standard, these are no longer primitives, but instead, are provided as part of the R5RS compatibility library (rnrs r5rs (6)).

In R5RS, a suggested implementation of <code>delay</code> and <code>force</code> is given, implementing the promise as a procedure with no arguments (a thunk) and using memoization to ensure that it is only ever evaluated once, irrespective of the number of times <code>force</code> is called (R5RS sec. 6.4).

Implementations of the hygienic macro system, also called <code>syntax-rules</code>, are required to respect the lexical scoping of the rest of the language. This is assured by special naming and scoping rules for macro expansion and avoids common programming errors that can occur in the macro systems of other programming languages. R6RS specifies a more sophisticated transformation system, <code>syntax-case</code>, which has been available as a language extension to R5RS Scheme for some time.

;; Define a macro to implement a variant of "if" with a multi-expression

;; true branch and no false branch.

(define-syntax when

(syntax-rules ()

((when pred exp exps ...)

(if pred (begin exp exps ...)))))

</syntaxhighlight>

Invocations of macros and procedures bear a close resemblance—both are s-expressions—but they are treated differently. When the compiler encounters an s-expression in the program, it first checks to see if the symbol is defined as a syntactic keyword within the current lexical scope. If so, it then attempts to expand the macro, treating the items in the tail of the s-expression as arguments without compiling code to evaluate them, and this process is repeated recursively until no macro invocations remain. If it is not a syntactic keyword, the compiler compiles code to evaluate the arguments in the tail of the s-expression and then to evaluate the variable represented by the symbol at the head of the s-expression and call it as a procedure with the evaluated tail expressions passed as arguments to it.

Most Scheme implementations also provide additional macro systems. Among popular ones are syntactic closures, explicit renaming macros and <code>define-macro</code>, a non-hygienic macro system similar to <code>defmacro</code> system provided in Common Lisp.

The inability to specify whether or not a macro is hygienic is one of the shortcomings of the macro system. Alternative models for expansion such as scope sets provide a potential solution.

Environments and eval

Prior to R5RS, Scheme had no standard equivalent of the <code>eval</code> procedure which is ubiquitous in other Lisps, although the first Lambda Paper had described <code>evaluate</code> as "similar to the LISP function EVAL"

(let ((name '+))

(let ((+ *))

(evaluate (list name 2 3))))

</syntaxhighlight>

If it is evaluated in the outer environment, where <code>name</code> is defined, the result is the sum of the operands. If it is evaluated in the inner environment, where the symbol "+" has been bound to the value of the procedure "*", the result is the product of the two operands.

R5RS resolves this confusion by specifying three procedures that return environments and providing a procedure <code>eval</code> that takes an s-expression and an environment and evaluates the expression in the environment provided. (R5RS sec. 6.5)) and a multiline comment or "block comment" may be produced by surrounding text with <code><nowiki>#</nowiki>|</code> and <code>|#</code>.

Input/output

Scheme's input and output is based on the port datatype. (R5RS sec 6.6) The R6RS standard specifies much more sophisticated and capable port procedures and many new types of port.

The following examples are written in strict R5RS Scheme.

Example 1: With output defaulting to (current-output-port):

(let ((hello0 (lambda() (display "Hello world") (newline))))

(hello0))

</syntaxhighlight>

Example 2: As 1, but using optional port argument to output procedures

(let ((hello1 (lambda (p) (display "Hello world" p) (newline p))))

(hello1 (current-output-port)))

</syntaxhighlight>

Example 3: As 1, but output is redirected to a newly created file

;; NB: with-output-to-file is an optional procedure in R5RS

(let ((hello0 (lambda () (display "Hello world") (newline))))

(with-output-to-file "helloworldoutputfile" hello0))

</syntaxhighlight>

Example 4: As 2, but with explicit file open and port close to send output to file

(let ((hello1 (lambda (p) (display "Hello world" p) (newline p)))

(output-port (open-output-file "helloworldoutputfile")))

(hello1 output-port)

(close-output-port output-port))

</syntaxhighlight>

Example 5: As 2, but with using call-with-output-file to send output to a file.

(let ((hello1 (lambda (p) (display "Hello world" p) (newline p))))

(call-with-output-file "helloworldoutputfile" hello1))

</syntaxhighlight>

Similar procedures are provided for input. R5RS Scheme provides the predicates <code>input-port?</code> and <code>output-port?</code>. For character input and output, <code>write-char</code>, <code>read-char</code>, <code>peek-char</code> and <code>char-ready?</code> are provided. For writing and reading Scheme expressions, Scheme provides <code>read</code> and <code>write</code>. On a read operation, the result returned is the end-of-file object if the input port has reached the end of the file, and this can be tested using the predicate <code>eof-object?</code>.

With the standard, SRFI 28 also defines a basic formatting procedure resembling Common Lisp's <code>format</code> function, after which it is named.

Redefinition of standard procedures

In Scheme, procedures are bound to variables. At R5RS the language standard formally mandated that programs may change the variable bindings of built-in procedures, effectively redefining them. (R5RS "Language changes")

Review of standard forms and procedures

The language is formally defined in the standards R5RS (1998)

0: feature-based conditional expansion construct
1: list library
4: homogeneous numeric vector datatypes
6: basic string ports
8: receive, binding to multiple values
9: defining record types
13: string library
14: character-set library
16: syntax for procedures of variable arity
17: generalized set!
18: Multithreading support
19: time data types and procedures
25: multi-dimensional array primitives
26: notation for specializing parameters without currying
27: sources of random bits
28: basic format strings
29: localization
30: nested multi-line comments
31: a special form for recursive evaluation
37: args-fold: a program argument processor
39: parameter objects
41: streams
42: eager comprehensions
43: vector library
45: primitives for expressing iterative lazy algorithms
60: integers as bits
61: a more general cond clause
66: octet vectors
67: compare procedures

Implementations

The elegant, minimalist design has made Scheme a popular target for language designers, hobbyists, and educators, and because of its small size, that of a typical interpreter, it is also a popular choice for embedded systems and scripting. This has resulted in scores of implementations, most of which differ from each other so much that porting programs from one implementation to another is quite difficult, and the small size of the standard language means that writing a useful program of any great complexity in standard, portable Scheme is almost impossible. schools; in particular, several introductory computer science courses use Scheme in conjunction with the textbook Structure and Interpretation of Computer Programs (SICP). For the past 12 years, PLT has run the ProgramByDesign (formerly TeachScheme!) project, which has exposed close to 600 high school teachers and thousands of high school students to rudimentary Scheme programming. MIT's old introductory programming class 6.001 was taught in Scheme, Although 6.001 has been replaced by more modern courses, SICP continues to be taught at MIT. Likewise, the introductory class at UC Berkeley, CS 61A, was until 2011 taught entirely in Scheme, save minor diversions into Logo to demonstrate dynamic scope. Today, like MIT, Berkeley has replaced the syllabus with a more modern version that is primarily taught in Python 3, but the current syllabus is still based on the old curriculum, and parts of the class are still taught in Scheme.

The textbook How to Design Programs is used by some institutes of higher education for their introductory computer science courses. Worcester Polytechnic Institute uses Scheme exclusively for its introductory course Introduction to Program Design (CS1101). Rose-Hulman Institute of Technology uses Scheme in its more advanced Programming Language Concepts course. Brandeis University's core course, Structure and Interpretations of Computer Programs (COSI121b), is also taught exclusively in Scheme by theoretical computer scientist Harry Mairson. Indiana University's introductory class, C211, is taught entirely in Scheme. A self-paced version of the course, CS 61AS, continues to use Scheme. The introductory computer science courses at Yale and Grinnell College are also taught in Scheme.

The former introductory computer science course at the University of Minnesota - Twin Cities, CSCI 1901, also used Scheme as its primary language, followed by a course that introduced students to the Java language; however, following the example of MIT, the department replaced 1901 with the Python-based CSCI 1133, while functional programming is covered in detail in the third-semester course CSCI 2041.

Scheme is/was also used for the following:

The Document Style Semantics and Specification Language (DSSSL), which provides a method of specifying SGML stylesheets, uses a Scheme subset.
The well-known open source raster graphics editor GIMP uses TinyScheme as a scripting language.
Guile has been adopted by the GNU project as its official scripting language, and that implementation of Scheme is embedded in such applications as GNU LilyPond and GnuCash as a scripting language for extensions. Likewise, Guile used to be the scripting language for the desktop environment GNOME, and GNOME still has a project that provides Guile bindings to its library stack. There is a project to incorporate Guile into GNU Emacs, GNU's flagship program, replacing the current Emacs Lisp interpreter.
Elk Scheme is used by Synopsys as a scripting language for its technology CAD (TCAD) tools.
Shiro Kawai, senior programmer on the movie Final Fantasy: The Spirits Within, used Scheme as a scripting language for managing the real-time rendering engine.
Google App Inventor for Android uses Scheme, where Kawa is used to compile the Scheme code down to bytecodes for the Java virtual machine running on Android devices.

References

External links

scheme.org provides links to many Scheme resources, including the specifications
Introduction to Scheme
Scheme Weekly
Bookmarklet that add Interactive Scheme REPL to any website

Scheme (programming language)

History

Origins

R6RS

R7RS

Lambda calculus

Block structure

Proper tail recursion

Implementation standards

Numerical tower

Environments and eval

Input/output

Redefinition of standard procedures

Review of standard forms and procedures

Implementations

See also

References

Further reading

External links