Vicare Scheme: stdlib syntax-case hygiene

5.12.2 Hygiene

Barendregt’s hygiene condition for the lambda calculus is an informal notion that requires the free variables of an expression N that is to be substituted into another expression M not to be captured by bindings in M when such capture is not intended.

Kohlbecker, et al. propose a corresponding hygiene condition for macro expansion that applies in all situations where capturing is not explicit: “Generated identifiers that become binding instances in the completely expanded program must only bind variables that are generated at the same transcription step”. In the terminology of this document, the “generated identifiers” are those introduced by a transformer rather than those present in the form passed to the transformer, and a “macro transcription step” corresponds to a single call by the expander to a transformer. Also, the hygiene condition applies to all introduced bindings rather than to introduced variable bindings alone.

This leaves open what happens to an introduced identifier that appears outside the scope of a binding introduced by the same call. Such an identifier refers to the lexical binding in effect where it appears (within a syntax ?template) inside the transformer body or one of the helpers it calls. This is essentially the referential transparency property described by Clinger and Rees. Thus, the hygiene condition can be restated as follows:

A binding for an identifier introduced into the output of a transformer call from the expander must capture only references to the identifier introduced into the output of the same transformer call.

A reference to an identifier introduced into the output of a transformer refers to the closest enclosing binding for the introduced identifier or, if it appears outside of any enclosing binding for the introduced identifier, the closest enclosing lexical binding where the identifier appears (within a syntax ?template) inside the transformer body or one of the helpers it calls.

Explicit captures are handled via datum->syntax.

Operationally, the expander can maintain hygiene with the help of marks. Marks are applied selectively by the expander to the output of each transformer it invokes, and substitutions are applied to the portions of each binding form that are supposed to be within the scope of the bound identifiers. Marks are used to distinguish like–named identifiers that are introduced at different times (either present in the source or introduced into the output of a particular transformer call), and substitutions are used to map identifiers to their expand-time values.

Each time the expander encounters a macro use, it applies an antimark to the input form, invokes the associated transformer, then applies a fresh mark to the output. Marks and antimarks cancel, so the portions of the input that appear in the output are effectively left unmarked, while the portions of the output that are introduced are marked with the fresh mark.

Each time the expander encounters a binding form it creates a set of substitutions, each mapping one of the (possibly marked) bound identifiers to information about the binding. (For a lambda expression, the expander might map each bound identifier to a representation of the formal parameter in the output of the expander. For a let-syntax form, the expander might map each bound identifier to the associated transformer.) These substitutions are applied to the portions of the input form in which the binding is supposed to be visible.

Marks and substitutions together form a wrap that is layered on the form being processed by the expander and pushed down toward the leaves as necessary. A wrapped form is referred to as a wrapped syntax object. Ultimately, the wrap may rest on a leaf that represents an identifier, in which case the wrapped syntax object is also referred to as an identifier. An identifier contains a name along with the wrap. (Names are typically represented by symbols.)

When a substitution is created to map an identifier to an expand–time value, the substitution records the name of the identifier and the set of marks that have been applied to that identifier, along with the associated expand–time value. The expander resolves identifier references by looking for the latest matching substitution to be applied to the identifier, i.e., the outermost substitution in the wrap whose name and marks match the name and marks recorded in the substitution. The name matches if it is the same name (if using symbols, then by eq?), and the marks match if the marks recorded with the substitution are the same as those that appear below the substitution in the wrap, i.e., those that were applied before the substitution. Marks applied after a substitution, i.e., appear over the substitution in the wrap, are not relevant and are ignored.

An algebra that defines how marks and substitutions work more precisely is given in section 2.4 of Oscar Waddell’s PhD thesis. Note, however, that Waddell’s thesis describes slightly different semantics for bound-identifier=?, it specifies that for two identifiers to be equal in the sense of bound-identifier=?, they must have the same marks and be equal in the sense of free-identifier=?, whereas this report requires instead that they must have the same marks and have the same name.