Libraries for Vicare Scheme: srfi eager-comp design

A lazy comprehension, for example stream-of in the sense of SRFI-41, constructs an object representing a sequence of values. They are actually produced only at the time they are needed. An eager comprehension, on the other hand, is an instruction to run through a certain sequence of values and do something with it, for example as in do-ec. In other words, it is nothing more sophisticated than a loop, potentially with a more convenient notation. This also explains why stream-of is the most fundamental lazy comprehension, and all others can be formulated in terms of it, whereas the most fundamental eager comprehension is do-ec.

Why the [outer .. inner | expr] order of qualifiers?

In principle, there are six possible orders in which the qualifiers and the expression of a comprehension can be written. We denote the different conventions with a pattern in which expr denotes the expression over which the comprehension ranges, inner denotes the generator spinning fastest, and outer denotes the generator spinning slowest. For example, Haskell and Python use:

[expr | outer .. inner]

Probably with sufficient persistence, instances for any of the conventions can be found on the Internet. In addition, there is the common mathematical notation {f(x) | x in X}.

It is important to understand that the notational convention does not only determine the order of enumeration but also the scope of the variables introduced by the generators. The scope of inner includes expr, and the scope of outer should include inner to allow inner generators to depend on outer generators. Eventually, the choice for a particular syntactic convention is largely a matter of personal preferences. However, there are a few considerations that went into the choice made for this SRFI:

Regarding these contradicting preferences, I regard linearity in scoping (point 2) most important, followed by readability for more complicated comprehensions (point 3). This leads to:

[outer .. inner | expr]

(list-ec (: x 10) (: y x) (f x y))

which looks acceptable to me even without similarity to the mathematical notation. As a downside, the convention clashes with other the convention used in other languages (e.g. Haskell and Python).

You forgot choose your favorite here–ec!

I tried to construct a reasonably useful set of tools according to what R5RS specifies. Nevertheless, the choice about what to include and what to leave out is a matter of personal preference.

When “packing the toolbox” I went for travelling light; this SRFI does not include everything imaginable or even everything useful. I oriented myself at the standard procedures of R5RS, with a few omissions and additions. A notable omission are gcd-ec and lcm-ec because they are one–liners, and more severely, of questionable value in practice. Notable additions are fold-ec and fold3-ec, providing a mechanism to define lots of useful one–liners. The other notable addition is first-ec, which is the fundamental “early stopping” comprehension. It is used to define any?-ec and every?-ec which are among the most frequent comprehensions.

Concerning the generators, the same principle has been used. Additions include :range and friends because they are universally needed, and :dispatched which is primarily intended for implementing :.

Why is the order of enumeration specified?

For the purpose of this SRFI, every generator runs through its sequence of bindings in a well specified order, and nested generators run through the Cartesian product in the order of nested loops. The purpose of this definition is making the sequence as easily predictable as possible. On the other hand, many mechanisms for lazy comprehensions do not specify the order in which the elements are enumerated. When it comes to infinite streams, this has the great advantage that a comprehension may interleave an inner and an outer enumeration, a method also known as “dove–tailing” or “diagonalizing”. Interleaving ensures that any value of the resulting stream is produced after a finite amount of time, even if one or more inner streams are infinite.

Why both typed and dispatching generators?

The reason for typed generators is runtime efficiency. In fact, the code produced by :range and others will run as fast as a hand–coded do loop. The primary purpose of the dispatching generator is convenience. It comes at the price of reduced runtime performance, both for loop iteration and startup.

Why the something-ec and :type naming?

The purpose of the :type convention is to keep many common comprehensions down to one–liners. In my opinion, the fundamental nature of eager comprehensions justifies a single character naming convention. The something-ec convention is primarily intended to stay away from the widely used something-of. It reduces confusion and conflict with related mechanisms.

Why combine variable binding and sequence definition?

The generators of this SRFI do two different things with a single syntactic construct: They define a sequence of values to enumerate and they specify a variable (within a certain scope) to run through that sequence. An alternative is to separate the two, for example as it has been done in SRFI-41.

The reason for combining sequence definition and enumeration for the purpose of this SRFI is threefold. Firstly, sequences of values are not explicitly represented as objects in the typed generators; the generators merely manipulate an internal state. Secondly, this SRFI aims at a most concise notation for common comprehensions and reduces syntax to the naked minimum. Thirdly, this SRFI aims at the highest possible performance for typed generators, which is achieved if the state being manipulated is represented by the loop variable itself.

Why is (: vars) illegal?

(: vars)

(:integers vars)

enumerating the non–negative integers. However, it turned out that a frequent mistake in using the eager comprehensions is to forget either the variable or an argument for the enumeration. As this would lead to an infinite loop (not always equally pleasant in interactive sessions), it is not allowed.

Why is there no :sequential?

Just like :parallel enumerates generators in parallel, a :sequential generator could enumerate a concatenation of several generator, starting the next one when the previous has finished. The reason for not having such a qualifier is that the generators should use all the same variable name and there is no hygienic and referentially transparent way of enforcing this (or even knowing the variable).

Why is there no general let qualifier?

It is easy to add let, let*, and letrec as cases to qualifier. This would allow more sophisticated local variables and expressions than possible with:

(:let vars expression)

(begin sequence*)

In particular, a local definition in the sense of R5RS Section 7.1.5 would be possible.

There are two reasons for not including let and friends as qualifiers. The first reason concerns readability. A qualifier of the form:

(let (binding-spec*) body)

only makes sense if the scope of the new variables ends at the end of the comprehension, and not already after body. The similarity with ordinary let expressions would be very confusing. The second reason concerns the design rationale. If sophisticated let qualifiers involving recursion or local definitions are needed, it is likely that eager comprehensions are being overused. In that case it might be better to define a procedure for the task. So including an invitation to overuse the mechanism would be a serious violation of the Keep It Simple and Stupid principle.

Why is there no :nested generator?

The specification above defines nested as a qualifier but :parallel as a generator. In particular, this makes it impossible to make parallel generators from nested ones.

This design simply reflects an implementability limitation. All component generators of :parallel are transformed into :do-generators and these can be merged into a parallel generator. However, nested generators cannot be merged easily without losing the type of the generator, which would seriously hurt modularity and performance.

Is any?-ec eager?

Yes, it is still eager because it immediately starts to run through the sequence.

In fact, the reference implementation makes sure first-ec, any?-ec, and every?-ec execute efficiently so they can be used conveniently as in:

(every?-ec (:list x my-list) (pred? x))

Why this whole :dispatched business?

It is specified above that the dispatching generator, called :, is just a special case of :dispatched using a global dispatching procedure. Alternatively, a simple fixed global mechanism to extend : could have been used. This is much simpler but does not support the definition of new dispatched generators.

The purpose of :dispatched and its utilities (:generator-proc and dispatch-union) is the following. Assume : is to be used inside a module but it is essential that no other module can spoil it, e.g. by installing a very slow dispatcher. The recommended way to proceed in this case is to define a local copy of the original dispatching generator :, for example with the following code:

(define :my-dispatch
  (make-initial-:-dispatch))

(define-syntax :my
  (syntax-rules (index)
    [(:my cc var (index i) arg1 arg ...)
     (:dispatched cc var (index i) :my-dispatch arg1 arg ...)]
    [(:my cc var arg1 arg ...)
     (:dispatched cc var :my-dispatch arg1 arg ...)]))

An alternative for the dispatching mechanism as defined in this SRFI is the use of parameter objects in the sense of SRFI-39. The dispatching generator would then access a dynamically scoped variable to find the dispatcher, allowing full control over dispatching. However, this approach does not solve the dilemma that it is sometimes useful that : is global and sometimes undesired. The approach specified for this SRFI addresses this dilemma by offering options.

Another alternative for dealing with the dispatching problem is adding an optional argument to the syntax of : through which the dispatcher can be passed explicitly. However, as : has variable arity and the identifier for the variable cannot be distinguished from any value for a dispatcher, this is syntactically problematic.

Why is there no local mechanism for adding to :?

According to R5RS Section 7.1.6 macros can only be defined at the level of the <program> syntax. This implies that the scope of typed generators cannot easily be limited to local scopes. As typed and dispatched generators go together, there is also no strong need for a limited scope of dispatched generators either. Furthermore, locally extendable dispatchers are another major headache to those trying to understand other people’s code.

Why are dispatchers unary?

As defined in :dispatched, a dispatching procedure is called with a single argument being the list of values to dispatch on. An alternative is to apply the dispatcher to the list of values to dispatch on, which would be more natural in Scheme.

The reason for not using apply is a minor improvement in efficiency. Every time apply is used on a procedure of variable arity, an object containing the argument list is allocated on the heap. As a dispatcher may call many other dispatchers, this will add to the overhead of dispatching, which is relevant in inner loops.

Why are there two fold comprehensions?

The reason for having two fold comprehensions (fold-ec and fold3-ec) is efficiency.

Clearly, the more general construction is fold3-ec as it allows individual treatment of the empty sequence case and the singleton sequence case. However, this comes at the price of more book–keeping as can be seen from the implementation example. As the overhead is located within inner loops, it makes sense to define another fold comprehension for the case where the added flexibility is not needed. This is fold-ec.

The names fold-ec and fold3-ec have been chosen for the comprehensions in order to stay clear any other ’fold’ that may be around.

Why is :char-range not defined by integer->char?

The definition of :char-range specifies a sequence of adjacent characters ordered by char<=?. The reason for not using char->integer and integer->char is the fact that R5RS Section 6.3.4 leaves it to the implementation whether the integers representing characters are consecutive or not. In effect, this underspecification is inherited by :char-range.

2.22.5 Design rationale

What is the difference between eager and lazy comprehensions?

Why the `[outer .. inner | expr]` order of qualifiers?

You forgot choose your favorite here–`ec!`

Why is the order of enumeration specified?

Why both typed and dispatching generators?

Why the `something-ec` and `:type` naming?

Why combine variable binding and sequence definition?

Why is `(: vars)` illegal?

Why is there no `:sequential`?

Why is there no general `let` qualifier?

Why is there no `:nested` generator?

Is `any?-ec` eager?

Why this whole `:dispatched` business?

Why is there no local mechanism for adding to `:`?

Why are dispatchers unary?

Why are there two fold comprehensions?

Why is `:char-range` not defined by `integer->char`?

2.22.5 Design rationale

What is the difference between eager and lazy comprehensions?

Why the [outer .. inner | expr] order of qualifiers?

You forgot choose your favorite here–ec!

Why is the order of enumeration specified?

Why both typed and dispatching generators?

Why the something-ec and :type naming?

Why combine variable binding and sequence definition?

Why is (: vars) illegal?

Why is there no :sequential?

Why is there no general let qualifier?

Why is there no :nested generator?

Is any?-ec eager?

Why this whole :dispatched business?

Why is there no local mechanism for adding to :?

Why are dispatchers unary?

Why are there two fold comprehensions?

Why is :char-range not defined by integer->char?

Why the `[outer .. inner | expr]` order of qualifiers?

You forgot choose your favorite here–`ec!`

Why the `something-ec` and `:type` naming?

Why is `(: vars)` illegal?

Why is there no `:sequential`?

Why is there no general `let` qualifier?

Why is there no `:nested` generator?

Is `any?-ec` eager?

Why this whole `:dispatched` business?

Why is there no local mechanism for adding to `:`?

Why is `:char-range` not defined by `integer->char`?