Vicare Scheme: expander intro lex

15.1.4 Lexical variables, labels, location gensyms

Let’s consider the library:

(library (demo)
  (export this)
  (import (rnrs (6)))
  (define this 8)
  (define that 9)
  (let ((a 1))
    (let ((a 2))
      (list a this that))))

and concentrate on the body:

(define this 8)
(define that 9)
(let ((a 1))
  (let ((a 2))
    (list a this that)))

This code defines 4 syntactic bindings: ‘this’ and ‘that’ as top level variable bindings, of which ‘this’ is also exported; outer ‘a’ as local variable binding; inner ‘a’ as local variable binding.

The purpose of the expansion process is to transform the input code into output code expressed in the core language. After the expansion process, every syntactic binding is renamed so that its name is unique in the whole library body. For example, we can imagine this pseudo–code:

(define lex.this 8)
(define lex.that 9)
(let ((lex.a.1 1))
  (let ((lex.a.2 2))
    ((primitive list) lex.a.2 lex.this lex.that)))

notice that the original identifier ‘list’, in reference position, has been replaced by the symbolic expression ‘(primitive list)’ because it is captured by the core primitive binding of the initial lexical environment. The code undergoes the following lexical variable name substitutions:

original name	lexical variable name
this	lex.this
that	lex.that
outer a	lex.a.1
inner a	lex.a.2

where the lex.* symbols are gensyms; such gensyms are named lexical gensyms or lex gensyms. They uniquely identify a syntactic binding established in the library.

Renaming bindings is one of the core purposes of the expansion process; it is performed while visiting the source code as a tree in breadth–first order.
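
The renaming step can be sketched with a toy walker. This is only an illustration, not Vicare's expander: the names `fresh-lex` and `rename` are hypothetical, a counter stands in for real gensyms, only the shape `(let ((name rhs)) body)` is handled, the tree is walked depth-first for brevity, and `list` is left untouched instead of becoming `(primitive list)`.

```scheme
(import (rnrs))

;; Hypothetical stand-in for gensym creation: a counter keeps names unique.
(define counter 0)
(define (fresh-lex name)
  (set! counter (+ counter 1))
  (string->symbol
   (string-append "lex." (symbol->string name) "."
                  (number->string counter))))

;; ENV is an alist mapping source names to lex names; entries added
;; at the front shadow the ones of the enclosing contours.
(define (rename expr env)
  (cond ((symbol? expr)
         (cond ((assq expr env) => cdr)
               (else expr)))            ;not bound in ENV: left as is
        ((and (pair? expr) (eq? (car expr) 'let))
         (let* ((binding (car (cadr expr)))
                (name    (car binding))
                ;; the right-hand side sees the enclosing contour only
                (rhs     (rename (cadr binding) env))
                (lex     (fresh-lex name))
                (body    (rename (caddr expr)
                                 (cons (cons name lex) env))))
           `(let ((,lex ,rhs)) ,body)))
        ((pair? expr)
         (map (lambda (e) (rename e env)) expr))
        (else expr)))

;; The top level bindings are given their lex names up front.
(define renamed
  (rename '(let ((a 1)) (let ((a 2)) (list a this that)))
          '((this . lex.this) (that . lex.that))))
```

The reference to ‘a’ in the innermost expression resolves to the lex name of the inner binding because its entry was pushed last on the environment.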

Lexical contours and `rib` objects

To distinguish among different bindings with the same name (like the two local bindings both named ‘a’ in the example) we must distinguish among different lexical contours: different regions of visibility for a set of syntactic bindings.

Every let syntax defines a new lexical contour; lexical contours can be nested by nesting let syntaxes; the library body is a lexical contour itself.

NOTE For simplicity, here we ignore the fact that let, in truth, defines 2 lexical contours: one for the bindings established by its first argument and one for the internal definitions. In the example there are no internal definitions, so the internal contour is not used.

 -------------------------------------------------
| (define this 8)              ;top-level contour |
| (define that 9)                                 |
| (let ((a 1))                                    |
|  -----------------------------------------      |
| |                            ;contour 1   |     |
| | (let ((a 2))                            |     |
| |  -------------------------------------  |     |
| | |                          ;contour 2 | |     |
| | | (list a this that)                  | |     |
| |  -------------------------------------  |     |
| |   )                                     |     |
|  -----------------------------------------      |
|   )                                             |
 -------------------------------------------------

Figure 15.1: Picture of lexical contours.

An eq?-unique object is assigned to each lexical contour; such objects are called marks. In practice each syntactic binding is associated with the mark representing its visibility region. So the original code is accompanied by the associations:

original name	lexical contour mark
this	src-mark
that	src-mark
outer a	1-mark
inner a	2-mark

which are registered in a component of the lexical environment: a record of type rib. Every lexical contour is described by a rib; the rib for the top level contour holds the associations:

original name	lexical contour mark
this	src-mark
that	src-mark

the rib of the outer let holds the associations:

original name	lexical contour mark
outer a	1-mark

the rib of the inner let holds the associations:

original name	lexical contour mark
inner a	2-mark
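
As a rough model of the three ribs above, each contour can be represented by a record holding its mark and the names bound in it. The record type `toy-rib` and the procedure `name->mark` are hypothetical; the actual rib type in Vicare is richer, and searching from the innermost contour outwards is an assumption made for this sketch.

```scheme
(import (rnrs))

;; Hypothetical simplified rib: one mark and the names bound in the contour.
(define-record-type toy-rib
  (fields mark names))

;; A mark only needs to be unique under eq?; a freshly allocated list is.
(define src-mark (list 'src-mark))
(define mark-1   (list 'mark-1))
(define mark-2   (list 'mark-2))

(define top-rib   (make-toy-rib src-mark '(this that)))
(define outer-rib (make-toy-rib mark-1   '(a)))
(define inner-rib (make-toy-rib mark-2   '(a)))

;; RIBS is ordered from the innermost contour to the outermost one.
;; Return the mark of the first contour binding NAME, or #f.
(define (name->mark name ribs)
  (cond ((null? ribs) #f)
        ((memq name (toy-rib-names (car ribs)))
         (toy-rib-mark (car ribs)))
        (else (name->mark name (cdr ribs)))))
```

Looking up ‘a’ from inside the inner let yields the mark of contour 2, while the same lookup started from the outer let yields the mark of contour 1.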

Syntax objects and syntax identifiers

While the code is being visited by the expander, data structures called syntax objects are created to keep track of the lexical contours.

At first, the whole code is in a syntax object referencing the top rib structure:
```
#<syntax-object
   expr=(begin
          (define this 8)
          (define that 9)
          (let ((a 1))
            (let ((a 2))
              (list a this that))))
   rib=#<rib mark=src-mark>>
```
syntactic bindings established in this contour will get the src-mark; expressions in the right–hand sides of binding definitions are expanded in the context of the src-mark.
After the top-level contour has been processed, the outer let is in a syntax object:
```
#<syntax-object
   expr=(let ((a 1))
          (let ((a 2))
            (list a this that)))
   rib=#<rib mark=1-mark>>
```
syntactic bindings established in this contour will get the 1-mark; expressions in the right–hand sides of binding definitions are expanded in the context of the src-mark.
After the outer let has been processed, the inner let is in a syntax object:
```
#<syntax-object
   expr=(let ((a 2))
          (list a this that))
   rib=#<rib mark=2-mark>>
```
syntactic bindings established in this contour will get the 2-mark; expressions in the right–hand sides of binding definitions are expanded in the context of the 1-mark.
After the inner let has been processed, the expression is in a syntax object:
```
#<syntax-object
   expr=(list a this that)
   rib=#<rib mark=2-mark>>
```
the expression is expanded in the context of the 2-mark.

A syntax object having a syntactic binding name as source code expression is called a syntactic identifier; an identifier is a data structure holding, among its fields, the mark of its visibility region (lexical contour).

Label gensyms and `rib` objects

An eq?-unique object is assigned to each syntactic binding: a gensym indicated as label gensym or just label; such associations are also stored in the rib representing a lexical contour:

original name	lexical contour mark	label
this	src-mark	lab.this
that	src-mark	lab.that
outer a	1-mark	lab.a.1
inner a	2-mark	lab.a.2

where the symbols lab.* are gensyms.
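
The table can be read as a lookup keyed by name and mark together. The following sketch assumes a flat list of `(name mark label)` entries and a hypothetical procedure `lookup-label`; plain symbols stand in for the label gensyms, and the real ribs keep this information per contour rather than in one list.

```scheme
(import (rnrs))

;; Marks are only required to be unique under eq?.
(define src-mark (list 'src-mark))
(define mark-1   (list 'mark-1))
(define mark-2   (list 'mark-2))

;; Each entry is (name mark label); the entries of all the ribs
;; are collected in one list for brevity.
(define entries
  (list (list 'this src-mark 'lab.this)
        (list 'that src-mark 'lab.that)
        (list 'a    mark-1   'lab.a.1)
        (list 'a    mark-2   'lab.a.2)))

;; Return the label of the binding having the given NAME and MARK, or #f.
(define (lookup-label name mark)
  (let ((entry (find (lambda (e)
                       (and (eq? (car e)  name)
                            (eq? (cadr e) mark)))
                     entries)))
    (and entry (caddr entry))))
```

The two bindings named ‘a’ share the name but not the mark, so they map to different labels.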

Lexical variable gensyms and the LEXENV

The fact that the lex gensyms in the expanded code are syntactic bindings representing variables is registered in a portion of the lexical environment indicated as LEXENV. So the expanded code is accompanied by the association:

label	lexical variables
lab.this	lex.this
lab.that	lex.that
lab.a.1	lex.a.1
lab.a.2	lex.a.2

Notice that, after the expansion, the original names of the internal bindings (those defined by let) do not matter anymore, and neither do the original names of the non–exported top level bindings; only the original names of the exported top level bindings are still important.
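
A minimal way to picture the LEXENV association is an alist keyed by label. The layout below is an assumption made for illustration (the entries of the real LEXENV carry more information than the lex gensym alone), and `label->lex` is a hypothetical helper.

```scheme
(import (rnrs))

;; Toy LEXENV: label -> lex gensym; plain symbols stand in for gensyms.
(define lexenv
  '((lab.this . lex.this)
    (lab.that . lex.that)
    (lab.a.1  . lex.a.1)
    (lab.a.2  . lex.a.2)))

;; Return the lex name registered for LABEL, or #f if there is none.
(define (label->lex label)
  (cond ((assq label lexenv) => cdr)
        (else #f)))
```

Only labels appear as keys: once a reference has been resolved to a label, the source name plays no role in finding the lexical variable.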

Storage location gensyms and GLOBAL-ENV

About the value of syntactic bindings:

The value of local variables goes on the Scheme stack and it exists only while the code is being evaluated.
The value of local keywords goes on the Scheme stack and it exists only while the code is being expanded.
The value of top level bindings must be stored in some persistent location, because it must exist for the whole time the library is loaded in a running Vicare process.

But where is a top level binding value stored? The answer is: gensyms are created for the sole purpose of acting as storage locations for top level bindings; such gensyms are indicated as location gensyms or loc gensyms. Under Vicare, symbols are data structures having a value slot; this slot has symbol-value as accessor and set-symbol-value! as mutator, and it is used as the storage location.

So the expanded code is accompanied by the following association:

label	location gensym
lab.this	loc.this
lab.that	loc.that

where the loc.* symbols are gensyms. To represent the association between the top level binding labels (both the exported ones and the non–exported ones) and their storage location gensyms, the expander builds a data structure indicated as GLOBAL-ENV.
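
The use of a symbol's value slot as storage can be tried directly under Vicare. The sketch below assumes that `gensym`, `symbol-value` and `set-symbol-value!` are all available from the `(vicare)` library; the alist layout of the toy GLOBAL-ENV and the helper `label->loc` are hypothetical.

```scheme
(import (vicare))

;; Hypothetical stand-in for the location gensym of the binding "this".
(define loc.this (gensym "loc.this"))

;; Toy GLOBAL-ENV: label -> location gensym.
(define global-env
  (list (cons 'lab.this loc.this)))

(define (label->loc label)
  (cond ((assq label global-env) => cdr)
        (else #f)))

;; Conceptually, what (define this 8) does to the storage location.
(set-symbol-value! (label->loc 'lab.this) 8)

;; Conceptually, what a reference to "this" reads back.
(define stored
  (symbol-value (label->loc 'lab.this)))
```

Because the value lives in the gensym itself, it persists for as long as the gensym is reachable, independently of any stack frame.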

Exported bindings and EXPORT-SUBST

Not all the top level syntactic bindings are exported by a library. To list those that are, a data structure indicated as EXPORT-SUBST is built; it associates the external name of each exported binding with its label gensym. For the example library, the EXPORT-SUBST represents the association:

label	external name
lab.this	this

If the export clause of the library form renames a binding as in:

(export (rename (this external-this)))

then the EXPORT-SUBST represents the association:

label	external name
lab.this	external-this
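
Building such an association from the export specifications can be sketched as follows. The procedures `spec->entries` and `make-export-subst` are hypothetical, plain symbols stand in for label gensyms, and the rename spec is written in the R6RS form `(rename (internal external) ...)`.

```scheme
(import (rnrs))

;; Internal name -> label for the top level bindings of the example.
(define labels
  '((this . lab.this)
    (that . lab.that)))

(define (internal->label name)
  (cdr (assq name labels)))

;; Turn one export spec into a list of (external-name . label) entries.
(define (spec->entries spec)
  (if (symbol? spec)
      ;; plain export: the external name equals the internal name
      (list (cons spec (internal->label spec)))
      ;; spec is (rename (internal external) ...)
      (map (lambda (pair)
             (cons (cadr pair) (internal->label (car pair))))
           (cdr spec))))

(define (make-export-subst specs)
  (apply append (map spec->entries specs)))
```

With the spec list `(this)` the result holds the single entry `(this . lab.this)`; with `((rename (this external-this)))` the label stays the same and only the external name changes.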