Foundations of Functional Programming/Untyped λ-calculus

The untyped λ-calculus is the λ-calculus without any notion of types. It is a bare-bones model of computation, based entirely on the notion of functions, and it is Turing complete.

In the untyped λ-calculus (henceforth just λ-calculus), functions are the only form of data. Functions take functions as arguments, and return functions. The functions in λ-calculus are pure functions, which compute an output from an input without producing side effects.

Syntax
In the λ-calculus there are three kinds of expressions: variables, function applications, and λ-abstractions. We use upper case letters $$A,B,C,...$$ to stand for expressions. We write variables as lower case letters $$a,b,c,...$$. The application of the function $$A$$ to the argument $$B$$ is written $$AB$$. A λ-abstraction is an expression of the form $$\lambda x. A$$, which represents the function which takes the argument $$x$$ and returns the result $$A$$ (where $$A$$ is an expression that might involve $$x$$).

Here is a BNF grammar for the λ-calculus:

$$ \begin{array}{rcl} \text{Var} &::=& a\ |\ b\ |\ c\ |\ \cdots \\ \text{Expr} &::=& \text{Var} \\ &|&  \text{Expr}\ \text{Expr} \\ &|&  \lambda\ \text{Var}\ .\ \text{Expr} \end{array} $$

One may add parentheses to expressions to disambiguate scope. The body of a λ extends as far to the right as possible, and application is left-associative. We use $$\lambda x_1,...,x_n. A$$ to abbreviate $$\lambda x_1 .\lambda x_2 .\cdots \lambda x_n. A$$. Multi-argument functions are curried.
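Currying can be seen directly in any language with first-class functions. Here is a minimal sketch in Python (ordinary Python values and `+` are used for illustration; they are not part of the calculus itself):

```python
# Currying: a "two-argument" function as nested one-argument functions.
add = lambda x: lambda y: x + y

add3 = add(3)     # partial application: supply only the first argument
print(add3(4))    # 7
print(add(1)(2))  # 3 -- application associates left: (add 1) 2
```

The left-associativity of application is what makes `add(1)(2)` the natural reading of the λ-calculus expression $$add\ 1\ 2$$.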

Variables and substitution
We say that a given occurrence of a variable $$x$$ in an expression $$A$$ is a "free occurrence" if $$x$$ is not in the scope of a λ-abstraction whose argument is named $$x$$. Otherwise, it is a "bound occurrence."

If multiple λ-abstractions over $$x$$ surround a bound occurrence of $$x$$, that occurrence binds to the innermost one. For example, in $$\lambda x \lambda y \lambda x. xy$$, the $$x$$ in $$xy$$ binds to the innermost λ-abstraction, rather than the outermost one.

One can rename bound variables in an expression without changing the meaning of the expression. The process of renaming bound variables is sometimes called α-conversion. We will usually regard expressions which can be produced from each other by α-conversion as being the same expression; this is convenient for mathematical exposition, making certain theorems simpler to state, for example.

Renaming free variables in an expression does, in general, change its meaning. To see why this makes sense, consider that in a programming language, while you can rename an argument to a function without changing what the function does, you cannot replace the name of a global variable or function it references with some other name without changing what the function does.

We write $$A[x/B]$$ to denote the result of replacing all free occurrences of $$x$$ in expression $$A$$ with expression $$B$$. When we do this replacement, if any free occurrences of variables in $$B$$ would become bound by λ-abstractions in $$A$$, we rename the bound variables in $$A$$ to avoid the conflict.
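Capture-avoiding substitution can be sketched concretely. The following Python code operates on a tiny term representation (tuples `('var', x)`, `('app', A, B)`, `('lam', x, A)`); all names here are illustrative choices for this sketch, not a standard API:

```python
# A minimal sketch of capture-avoiding substitution A[x/B].

def free_vars(t):
    tag = t[0]
    if tag == 'var':
        return {t[1]}
    if tag == 'app':
        return free_vars(t[1]) | free_vars(t[2])
    # λ-abstraction: the bound variable is not free in the body
    return free_vars(t[2]) - {t[1]}

def fresh(avoid):
    # produce a variable name not in `avoid`
    i = 0
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"

def subst(t, x, b):
    """Replace free occurrences of x in t with b, renaming bound
    variables in t when they would capture free variables of b."""
    tag = t[0]
    if tag == 'var':
        return b if t[1] == x else t
    if tag == 'app':
        return ('app', subst(t[1], x, b), subst(t[2], x, b))
    y, body = t[1], t[2]
    if y == x:
        return t                              # x is shadowed; nothing to do
    if y in free_vars(b) and x in free_vars(body):
        z = fresh(free_vars(b) | free_vars(body))
        body = subst(body, y, ('var', z))     # α-convert to avoid capture
        y = z
    return ('lam', y, subst(body, x, b))

# (λy. x)[x/y] must not capture: the bound y is renamed first.
print(subst(('lam', 'y', ('var', 'x')), 'x', ('var', 'y')))
# → ('lam', 'v0', ('var', 'y'))
```

Without the renaming step, the result would be $$\lambda y. y$$, wrongly turning a constant function into the identity.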

β-reduction
The notion of program execution corresponds, in the λ-calculus, to the notion of β-reduction. The untyped λ-calculus is a language with non-strict evaluation, which admits of multiple possible evaluation strategies.

The basic relationship of β-reduction is:

$$(\lambda x. A)B \to_\beta A[x/B]$$

One can β-reduce an expression by performing the above transformation on it. One can also β-reduce an expression by performing the above transformation on any of its subexpressions. This rule is stated more formally as follows: suppose $$A \to_\beta B$$; then, for all expressions $$C$$ and variables $$x$$:

 * $$AC \to_\beta BC$$
 * $$CA \to_\beta CB$$
 * $$\lambda x. A \to_\beta \lambda x. B$$

Note that this rule is to be applied recursively, so that one can β-reduce expressions nested arbitrarily deep in an expression in order to β-reduce the whole expression.

It is clear that the β-reduction relationship allows for multiple evaluation strategies, since for example one can β-reduce an expression $$AB$$ by first β-reducing $$A$$ and then β-reducing $$B$$, or the other way around.

Normalization
The notion of halting computation corresponds in the λ-calculus to the notion of "normalization." We say that an expression is a "β-normal form" if no further β-reduction steps on it are possible. This is equivalent to saying that it contains no subexpression of the form $$(\lambda x. A)B$$. A β-normal form is analogous to the output of a computation that has halted.

We say that an expression has a β-normal form if there is a series of β-reduction steps which can be performed on it to turn it into a β-normal form. To say that an expression has a β-normal form means that it expresses a halting computation.

Not all expressions have β-normal forms; not all computations halt. An example of an expression with no β-normal form is the expression:

$$\Omega = (\lambda x. xx)(\lambda x. xx)$$

You can easily see that the only β-reduction step on $$\Omega$$ which is possible reduces it to itself. Since it is not itself a β-normal form, and β-reduction can only turn it into itself, $$\Omega$$ has no β-normal form.
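As a hedged illustration, one can translate $$\Omega$$ directly into Python. Because Python evaluates strictly, building the value recurses forever; Python surfaces the divergence as a `RecursionError` rather than literally looping:

```python
import sys

# Keep the recursion limit modest so the divergence is detected quickly.
sys.setrecursionlimit(1000)

def omega_diverges():
    """Evaluate the Python rendering of Ω = (λx. xx)(λx. xx)."""
    try:
        (lambda x: x(x))(lambda x: x(x))
        return False          # unreachable: Ω never finishes evaluating
    except RecursionError:
        return True

print(omega_diverges())       # True
```

This is only an analogy: the λ-calculus expression reduces to itself forever, while Python's evaluator exhausts its stack, but both reflect the same absence of a normal form.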

Evaluation strategy and uniqueness of β-normal forms
One question one might ask is this: does every expression which has a β-normal form, have a unique β-normal form? There are in general many different strategies for β-reducing an expression, and a priori it is not obvious that they should all produce the same result in the end. But, we would like it to be true that they all produce the same result in the end, because we don't want the output of a computation to depend on the evaluation strategy used to reduce it.

The Church-Rosser theorem implies that there is a unique β-normal form for any expression which has one. The Church-Rosser theorem states that if $$A$$ is an expression, and $$B_1$$ and $$B_2$$ can both be produced from $$A$$ by a series of β-reduction steps, then there is another expression $$C$$ such that both $$B_1$$ and $$B_2$$ can be β-reduced to $$C$$.

It follows from the Church-Rosser theorem that every expression has at most one β-normal form: if $$B_1$$ and $$B_2$$ are both β-normal forms of $$A$$, the theorem gives a $$C$$ to which both β-reduce; but β-normal forms admit no further reduction steps, so $$B_1 = C = B_2$$. This means that the result of a computation is independent of evaluation strategy.

One qualifier should be made to this statement. Some evaluation strategies might never reduce an expression to a β-normal form, even though the expression has a β-normal form which can be reached through a different evaluation strategy. For instance, eager evaluation, which always tries to β-reduce the arguments to a function before applying the function, will fail to reach a β-normal form on this expression:


 * $$(\lambda x\lambda y. x)(\lambda z. z)\Omega$$

This expression consists of the function $$\lambda x\lambda y. x$$ which just returns its first argument, applied to the identity function $$\lambda z. z$$ and the non-terminating computation $$\Omega$$. Its β-normal form is $$\lambda z. z$$, but this normal form will never be reached by eager evaluation, because eager evaluation will try to finish β-reducing $$\Omega$$ before calling $$\lambda x\lambda y. x$$.
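The contrast between eager and non-strict evaluation can be mimicked in Python, which is strict. Passing the Python rendering of $$\Omega$$ directly as an argument would diverge before the function is ever called; wrapping it in a thunk (a zero-argument lambda, an assumption of this sketch) delays evaluation, and the unused argument is then never forced:

```python
# K = λx λy. x returns its first argument; identity = λz. z.
K = lambda x: lambda y: x
identity = lambda z: z

# Ω as a thunk: evaluation is delayed until the thunk is called.
make_omega = lambda: (lambda x: x(x))(lambda x: x(x))

# The second argument (the thunk) is discarded without being called,
# so no divergence occurs -- mimicking non-strict evaluation.
result = K(identity)(make_omega)
print(result is identity)   # True
```

Replacing `make_omega` with `make_omega()` in the application would force the argument first, and the program would fail with a `RecursionError`, just as eager evaluation fails to normalize the expression above.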

The following evaluation strategy, known as normal-order reduction, will always reach the β-normal form of an expression if it has one: look for the leftmost subexpression of the form $$(\lambda x. A)B$$ (i.e., the one whose λ is leftmost in the whole expression), β-reduce that subexpression, and repeat.

This evaluation strategy bears some resemblance to lazy evaluation, but it is not the same thing. Lazy evaluation, or at least one version of it which makes sense for the λ-calculus, is the following strategy: if the whole expression is of the form $$(\lambda x. A)B$$, then β-reduce it to $$A[x/B]$$, and otherwise do nothing.

Lazy evaluation, at least on this presentation, will stop evaluating an expression if the whole expression is a λ-abstraction, even though the expression might not be in normal form, because it might have subexpressions which are not in normal form. One thing this illustrates is that lazy evaluation isn't even necessarily a normalizing evaluation strategy; it may stop and consider the expression evaluated before the expression reaches a normal form. The same is true of eager evaluation: eager evaluation will generally halt when the whole expression is a λ-abstraction, even if the λ-abstraction contains subexpressions which are not normalized.

Programming in the λ-calculus
The λ-calculus is Turing complete; all Turing computable functions can be expressed in it. This might be a surprise, since it does not even include basic notions like numbers. In fact, all of the ordinary data types can be coded as functions. This is not an efficient way to represent ordinary data types in a real programming language, but conceptually it is perfectly possible. In this section we explain how to do this for booleans, lists, and natural numbers.

Booleans
We can code the boolean values as follows:


 * $$\mathbf{true} = \lambda x \lambda y. x$$
 * $$\mathbf{false} = \lambda x \lambda y. y$$

A boolean is a function which takes two arguments, representing what to do if the value is true, and what to do if the value is false. Given this definition, if-then-else can be defined as follows:


 * $$\mathbf{if} = \lambda b \lambda t \lambda e. bte$$

That is, "if $$b$$ then $$t$$, else $$e$$" can be coded as $$bte$$. Boolean negation can be defined as follows:


 * $$\mathbf{not} = \lambda b \lambda t \lambda e. bet$$

Logical "and" and "or" can be defined as follows:


 * $$\mathbf{and} = \lambda a \lambda b. a\ b\ \mathbf{false}$$
 * $$\mathbf{or} = \lambda a \lambda b. a\ \mathbf{true}\ b$$
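These boolean encodings can be transcribed directly into Python lambdas. One caveat of this sketch: Python is strict, so `if_` evaluates both branches before selecting one; it only behaves like a true conditional when both branches are safe to compute. The `decode` helper is an extra convenience for inspection, not part of the encoding:

```python
# Church booleans, following the definitions above.
true  = lambda x: lambda y: x
false = lambda x: lambda y: y

if_  = lambda b: lambda t: lambda e: b(t)(e)
not_ = lambda b: lambda t: lambda e: b(e)(t)
and_ = lambda a: lambda b: a(b)(false)
or_  = lambda a: lambda b: a(true)(b)

# decode a Church boolean to a Python bool for inspection
decode = lambda b: b(True)(False)

print(decode(and_(true)(false)))   # False
print(decode(or_(false)(true)))    # True
print(decode(not_(true)))          # False
print(if_(true)("then")("else"))   # then
```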

Lists
We code a Lisp-style linked list as a function $$l$$ which takes two arguments $$f$$ and $$g$$. If $$l$$ is empty, it returns $$g$$. If $$l$$ is nonempty, it calls $$f$$ with two arguments: the first element of the list, and the rest of the list. We can define the basic list operations as follows:


 * $$\mathbf{nil} = \lambda f \lambda g. g$$
 * $$\mathbf{cons} = \lambda a \lambda b \lambda f \lambda g. fab$$
 * $$\mathbf{head} = \lambda l. l(\lambda h\lambda t. h)\Omega$$
 * $$\mathbf{tail} = \lambda l. l(\lambda h\lambda t. t)\Omega$$
 * $$\mathbf{isEmpty} = \lambda l. l(\lambda h\lambda t. \mathbf{false})\mathbf{true}$$

Note: we have defined taking the head or tail of an empty list to result in non-termination.
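The list encoding also transcribes directly into Python. Since Python is strict, we cannot pass $$\Omega$$ as the empty-list case of `head` and `tail`; this sketch substitutes a sentinel value instead, which is an assumption made purely so the code runs:

```python
# Church-encoded lists, following the definitions above.
nil  = lambda f: lambda g: g
cons = lambda a: lambda b: lambda f: lambda g: f(a)(b)

true  = lambda x: lambda y: x
false = lambda x: lambda y: y

DIVERGE = object()   # stands in for Ω, whose evaluation never terminates
head = lambda l: l(lambda h: lambda t: h)(DIVERGE)
tail = lambda l: l(lambda h: lambda t: t)(DIVERGE)
is_empty = lambda l: l(lambda h: lambda t: false)(true)

xs = cons(1)(cons(2)(nil))           # the list [1, 2]
print(head(xs))                      # 1
print(head(tail(xs)))                # 2
print(is_empty(nil)(True)(False))    # True
print(is_empty(xs)(True)(False))     # False
```

Taking `head(nil)` here returns the sentinel rather than diverging; in the calculus proper, the result is the non-terminating $$\Omega$$.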

Natural numbers
Once we have lists, defining natural numbers is easy. One way to define them is to code 0 as $$\mathbf{nil}$$, 1 as $$\mathbf{cons}\ \mathbf{nil}\ \mathbf{nil}$$, and in general we code $$n+1$$ as $$\mathbf{cons}\ \mathbf{nil}\ n$$:


 * $$\mathbf{0} = \mathbf{nil}$$
 * $$\mathbf{isZero} = \mathbf{isEmpty}$$
 * $$\mathbf{succ} = \lambda n. \mathbf{cons}\ \mathbf{nil}\ n$$

We can then define addition and multiplication by recursion.
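The numeral encoding above can be sketched in Python, reusing the list encoding (with the same caveat that a placeholder stands in for $$\Omega$$; the `to_int` decoder is an extra convenience for inspection, not part of the encoding):

```python
# Naturals as lists: 0 = nil, succ n = cons nil n.
nil  = lambda f: lambda g: g
cons = lambda a: lambda b: lambda f: lambda g: f(a)(b)

zero = nil
succ = lambda n: cons(nil)(n)
is_zero = lambda n: n(lambda h: lambda t: False)(True)  # decoded to a Python bool
pred = lambda n: n(lambda h: lambda t: t)(None)         # tail; None stands in for Ω

def to_int(n):
    # decode a coded natural to a Python int by counting cons cells
    return 0 if is_zero(n) else 1 + to_int(pred(n))

three = succ(succ(succ(zero)))
print(to_int(three))        # 3
print(is_zero(zero))        # True
print(to_int(pred(three)))  # 2
```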

Recursion
One sticking point remains: how do we do recursion in the λ-calculus? Since functions don't have names, the definition of a function can't refer to the function itself. We can get around this using a function called the Y combinator, which supplies a function with itself as an argument:

$$\mathbf{Y} = \lambda f. (\lambda x. f(xx)) (\lambda x. f(xx))$$

The key property of $$\mathbf{Y}$$ is that for all functions $$f$$, we have $$\mathbf{Y}f = f(\mathbf{Y}f)$$. Let us prove this:

$$\mathbf{Y}f = (\lambda x. f(xx)) (\lambda x. f(xx)) = f((\lambda x. f(xx)) (\lambda x. f(xx))) = f(\mathbf{Y}f)$$

Now let us see how to use the $$\mathbf{Y}$$ combinator to implement a recursive function. We will take the example of addition. Addition is defined recursively as follows:


 * $$n+0 = n$$
 * $$n+(\mathbf{succ}\ m) = \mathbf{succ}\ (n+m)$$

We will write the addition function as a function which takes itself as an argument. (More precisely, it takes the recursive version of itself, which just takes two numbers, as an argument.) Here is how it is defined:


 * $$\mathbf{plus}_0 = \lambda p \lambda n \lambda m.\ \mathbf{if}\ (\mathbf{isZero}\ m)\ n\ (\mathbf{succ}\ (p\ n\ (\mathbf{tail}\ m)))$$
 * $$\mathbf{plus} = \mathbf{Y}\ \mathbf{plus}_0$$

Let us take apart how this works. The definition of $$\mathbf{plus}_0$$ is just a straightforward transcription of the recursive definition of addition, except that it uses its argument $$p$$ in place of the addition function. When we take $$\mathbf{plus} = \mathbf{Y}\ \mathbf{plus}_0$$, we get a function which is equal to $$\mathbf{plus}_0(\mathbf{Y}\ \mathbf{plus}_0) = \mathbf{plus}_0(\mathbf{plus})$$; so we have succeeded in defining $$\mathbf{plus}$$ in terms of itself.
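This construction can be run in Python, with one adjustment: under strict evaluation, $$\mathbf{Y}$$ as written diverges (computing $$f(xx)$$ forces $$xx$$ immediately), so this sketch uses the η-expanded variant commonly called the Z combinator. It also uses Python's native conditional in place of the coded $$\mathbf{if}$$, since the coded version would evaluate the recursive branch eagerly; both are assumptions made so the sketch terminates:

```python
# Z combinator: the η-expanded Y, suitable for strict evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Numerals as in the earlier sections.
nil  = lambda f: lambda g: g
cons = lambda a: lambda b: lambda f: lambda g: f(a)(b)
zero = nil
succ = lambda n: cons(nil)(n)
is_zero = lambda n: n(lambda h: lambda t: False)(True)
tail = lambda l: l(lambda h: lambda t: t)(None)

# plus0 takes "itself" as the argument p, exactly as in the definition above.
plus0 = lambda p: lambda n: lambda m: n if is_zero(m) else succ(p(n)(tail(m)))
plus = Z(plus0)

def to_int(n):
    # decode a coded natural to a Python int
    return 0 if is_zero(n) else 1 + to_int(tail(n))

two = succ(succ(zero))
three = succ(two)
print(to_int(plus(two)(three)))   # 5
```

Unfolding `Z(plus0)` one step gives `plus0(p)` where `p(n)` re-enters the recursion, which is exactly the equation $$\mathbf{plus} = \mathbf{plus}_0(\mathbf{plus})$$ derived above.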