Next: srfi eager-comp ack, Previous: srfi eager-comp spec, Up: srfi eager-comp [Index]
A lazy comprehension, for example stream-of
in the sense of
SRFI-41, constructs an object representing a sequence of values. They
are actually produced only at the time they are needed. An eager
comprehension, on the other hand, is an instruction to run through a
certain sequence of values and do something with it, for example as in
do-ec
. In other words, it is nothing more sophisticated than a
loop, potentially with a more convenient notation. This also explains
why stream-of
is the most fundamental lazy comprehension, and all
others can be formulated in terms of it, whereas the most fundamental
eager comprehension is do-ec
.
[outer .. inner | expr]
order of qualifiers?In principle, there are six possible orders in which the qualifiers and the expression of a comprehension can be written. We denote the different conventions with a pattern in which expr denotes the expression over which the comprehension ranges, inner denotes the generator spinning fastest, and outer denotes the generator spinning slowest. For example, Haskell and Python use:
[expr | outer .. inner]
Probably with sufficient persistence, instances for any of the
conventions can be found on the Internet. In addition, there is the
common mathematical notation {f(x) | x in X}
.
It is important to understand that the notational convention does not only determine the order of enumeration but also the scope of the variables introduced by the generators. The scope of inner includes expr, and the scope of outer should include inner to allow inner generators to depend on outer generators. Eventually, the choice for a particular syntactic convention is largely a matter of personal preferences. However, there are a few considerations that went into the choice made for this SRFI:
[expr | inner .. outer]
and:
[expr | outer .. inner]
[expr | inner .. outer]
and:
[outer .. inner | expr]
and not change direction, as in:
[expr | outer .. inner]
where expr is in the scope of inner but outer is not.
This is even more important if the syntax in Scheme does not explicitly
contain the |
separator.
do
and named let
.
This is in favor of:
[expr | outer .. inner]
and:
[outer .. inner | expr]
Shorter comprehensions may look more naturally the other way around.
Regarding these contradicting preferences, I regard linearity in scoping (point 2) most important, followed by readability for more complicated comprehensions (point 3). This leads to:
[outer .. inner | expr]
An example in Scheme syntax is:
(list-ec (: x 10) (: y x) (f x y))
which looks acceptable to me even without similarity to the mathematical notation. As a downside, the convention clashes with other the convention used in other languages (e.g. Haskell and Python).
ec!
I tried to construct a reasonably useful set of tools according to what R5RS specifies. Nevertheless, the choice about what to include and what to leave out is a matter of personal preference.
When “packing the toolbox” I went for travelling light; this SRFI
does not include everything imaginable or even everything useful. I
oriented myself at the standard procedures of R5RS, with a few
omissions and additions. A notable omission are gcd-ec
and
lcm-ec
because they are one–liners, and more severely, of
questionable value in practice. Notable additions are fold-ec
and fold3-ec
, providing a mechanism to define lots of useful
one–liners. The other notable addition is first-ec
, which is
the fundamental “early stopping” comprehension. It is used to define
any?-ec
and every?-ec
which are among the most frequent
comprehensions.
Concerning the generators, the same principle has been used. Additions
include :range
and friends because they are universally needed,
and :dispatched
which is primarily intended for implementing
:
.
For the purpose of this SRFI, every generator runs through its sequence of bindings in a well specified order, and nested generators run through the Cartesian product in the order of nested loops. The purpose of this definition is making the sequence as easily predictable as possible. On the other hand, many mechanisms for lazy comprehensions do not specify the order in which the elements are enumerated. When it comes to infinite streams, this has the great advantage that a comprehension may interleave an inner and an outer enumeration, a method also known as “dove–tailing” or “diagonalizing”. Interleaving ensures that any value of the resulting stream is produced after a finite amount of time, even if one or more inner streams are infinite.
The reason for typed generators is runtime efficiency. In fact, the
code produced by :range
and others will run as fast as a
hand–coded do
loop. The primary purpose of the dispatching
generator is convenience. It comes at the price of reduced runtime
performance, both for loop iteration and startup.
something-ec
and :type
naming?The purpose of the :type
convention is to keep many common
comprehensions down to one–liners. In my opinion, the fundamental
nature of eager comprehensions justifies a single character naming
convention. The something-ec
convention is primarily intended to
stay away from the widely used something-of
. It reduces
confusion and conflict with related mechanisms.
The generators of this SRFI do two different things with a single syntactic construct: They define a sequence of values to enumerate and they specify a variable (within a certain scope) to run through that sequence. An alternative is to separate the two, for example as it has been done in SRFI-41.
The reason for combining sequence definition and enumeration for the purpose of this SRFI is threefold. Firstly, sequences of values are not explicitly represented as objects in the typed generators; the generators merely manipulate an internal state. Secondly, this SRFI aims at a most concise notation for common comprehensions and reduces syntax to the naked minimum. Thirdly, this SRFI aims at the highest possible performance for typed generators, which is achieved if the state being manipulated is represented by the loop variable itself.
(: vars)
illegal?It is reasonable and easy to define:
(: vars)
as:
(:integers vars)
enumerating the non–negative integers. However, it turned out that a frequent mistake in using the eager comprehensions is to forget either the variable or an argument for the enumeration. As this would lead to an infinite loop (not always equally pleasant in interactive sessions), it is not allowed.
:sequential
?Just like :parallel
enumerates generators in parallel, a
:sequential
generator could enumerate a concatenation of several
generator, starting the next one when the previous has finished. The
reason for not having such a qualifier is that the generators should use
all the same variable name and there is no hygienic and referentially
transparent way of enforcing this (or even knowing the variable).
let
qualifier?It is easy to add let
, let*
, and letrec
as cases to
qualifier. This would allow more sophisticated local variables
and expressions than possible with:
(:let vars expression)
and:
(begin sequence*)
In particular, a local definition in the sense of R5RS Section 7.1.5 would be possible.
There are two reasons for not including let
and friends as
qualifiers. The first reason concerns readability. A qualifier of the
form:
(let (binding-spec*) body)
only makes sense if the scope of the new variables ends at the end of
the comprehension, and not already after body. The similarity
with ordinary let
expressions would be very confusing. The
second reason concerns the design rationale. If sophisticated
let
qualifiers involving recursion or local definitions are
needed, it is likely that eager comprehensions are being overused. In
that case it might be better to define a procedure for the task. So
including an invitation to overuse the mechanism would be a serious
violation of the Keep It Simple and Stupid principle.
:nested
generator?The specification above defines nested as a qualifier but
:parallel
as a generator. In particular, this makes it
impossible to make parallel generators from nested ones.
This design simply reflects an implementability limitation. All
component generators of :parallel
are transformed into
:do-generators
and these can be merged into a parallel generator.
However, nested generators cannot be merged easily without losing the
type of the generator, which would seriously hurt modularity and
performance.
any?-ec
eager?Yes, it is still eager because it immediately starts to run through the sequence.
In fact, the reference implementation makes sure first-ec
,
any?-ec
, and every?-ec
execute efficiently so they can be
used conveniently as in:
(every?-ec (:list x my-list) (pred? x))
:dispatched
business?It is specified above that the dispatching generator, called :
,
is just a special case of :dispatched
using a global dispatching
procedure. Alternatively, a simple fixed global mechanism to extend
:
could have been used. This is much simpler but does not
support the definition of new dispatched generators.
The purpose of :dispatched
and its utilities
(:generator-proc
and dispatch-union
) is the following.
Assume :
is to be used inside a module but it is essential that
no other module can spoil it, e.g. by installing a very slow dispatcher.
The recommended way to proceed in this case is to define a local copy of
the original dispatching generator :
, for example with the
following code:
(define :my-dispatch (make-initial-:-dispatch)) (define-syntax :my (syntax-rules (index) [(:my cc var (index i) arg1 arg ...) (:dispatched cc var (index i) :my-dispatch arg1 arg ...)] [(:my cc var arg1 arg ...) (:dispatched cc var :my-dispatch arg1 arg ...)]))
and to use the new generator :my
instead of :
.
An alternative for the dispatching mechanism as defined in this SRFI
is the use of parameter objects in the sense of SRFI-39. The
dispatching generator would then access a dynamically scoped variable to
find the dispatcher, allowing full control over dispatching. However,
this approach does not solve the dilemma that it is sometimes useful
that :
is global and sometimes undesired. The approach specified
for this SRFI addresses this dilemma by offering options.
Another alternative for dealing with the dispatching problem is adding
an optional argument to the syntax of :
through which the
dispatcher can be passed explicitly. However, as :
has variable
arity and the identifier for the variable cannot be distinguished from
any value for a dispatcher, this is syntactically problematic.
:
?According to R5RS Section 7.1.6 macros can only be defined at the
level of the <program>
syntax. This implies that the scope of
typed generators cannot easily be limited to local scopes. As typed and
dispatched generators go together, there is also no strong need for a
limited scope of dispatched generators either. Furthermore, locally
extendable dispatchers are another major headache to those trying to
understand other people’s code.
As defined in :dispatched
, a dispatching procedure is called with
a single argument being the list of values to dispatch on. An
alternative is to apply the dispatcher to the list of values to dispatch
on, which would be more natural in Scheme.
The reason for not using apply is a minor improvement in efficiency. Every time apply is used on a procedure of variable arity, an object containing the argument list is allocated on the heap. As a dispatcher may call many other dispatchers, this will add to the overhead of dispatching, which is relevant in inner loops.
The reason for having two fold comprehensions (fold-ec
and
fold3-ec
) is efficiency.
Clearly, the more general construction is fold3-ec
as it allows
individual treatment of the empty sequence case and the singleton
sequence case. However, this comes at the price of more book–keeping
as can be seen from the implementation example. As the overhead is
located within inner loops, it makes sense to define another fold
comprehension for the case where the added flexibility is not needed.
This is fold-ec
.
The names fold-ec
and fold3-ec
have been chosen for the
comprehensions in order to stay clear any other ’fold’ that may be
around.
:char-range
not defined by integer->char
?The definition of :char-range
specifies a sequence of adjacent
characters ordered by char<=?
. The reason for not using
char->integer
and integer->char
is the fact that R5RS
Section 6.3.4 leaves it to the implementation whether the integers
representing characters are consecutive or not. In effect, this
underspecification is inherited by :char-range
.
Next: srfi eager-comp ack, Previous: srfi eager-comp spec, Up: srfi eager-comp [Index]