Libraries for Vicare Scheme: streams examples pitfalls

Programming with streams, or any lazy evaluator, can be tricky, even for programmers experienced in the genre. Programming with streams is even worse in Scheme than in a purely functional language, because, though the streams are lazy, the surrounding Scheme expressions in which they are embedded are eager. The impedance between lazy and eager can occasionally lead to astonishing results. Thirty–two years ago, William Burge warned:

For example, a simple version of stream-map that returns a stream built by applying a unary procedure to the elements of an input stream could be defined like this:

(define-stream (stream-map proc strm) ;wrong!
  (let loop ([strm strm])
    (if (stream-null? strm)
        stream-null
      (stream-cons
        (proc (stream-car strm))
        (loop (stream-cdr strm))))))

That looks right. It properly wraps the procedure in stream-lambda, and the two legs of the if both return streams, so it type–checks. But it fails because the named let binds loop to a procedure using normal lambda rather than stream-lambda, so even though the first element of the result stream is lazy, subsequent elements are eager. stream-map can be written using stream-let:

(define-stream (stream-map proc strm)
  (stream-let loop ([strm strm])
    (if (stream-null? strm)
        stream-null
      (stream-cons
        (proc (stream-car strm))
        (loop (stream-cdr strm))))))

Here, stream-let assures that each element of the result stream is properly delayed, because each is subject to the stream-lambda that is implicit in stream-let, so the result is truly a stream, not a “list in disguise”. Another version of this procedure was given previously at the description of define-stream.

Another common problem occurs when a stream–valued procedure requires the next stream element in its definition. Consider this definition of stream-unique:

(define-stream (stream-unique eql? strm) ;wrong!
  (stream-match strm
    [() strm]
    [(_) strm]
    [(a b . _)
     (if (eql? a b)
         (stream-unique eql?
           (stream-cdr strm))
       (stream-cons a
         (stream-unique eql?
           (stream-cdr strm))))]))

the (a b . _) pattern requires the value of the next stream element after the one being considered. Thus, to compute the nth element of the stream, one must know the n+1st element, and to compute the n+1st element, one must know the n+2nd element, and to compute... The correct version, given above in the description of stream-drop-while, only needs the current stream element.

A similar problem occurs when the stream expression uses the previous element to compute the current element:

(define (nat n)
  (stream-ref
    (stream-let loop ([s (stream 0)])
      (stream-cons (stream-car s)
        (loop (stream (add1 (stream-car s))))))
    n))

this program traverses the stream of natural numbers, building the stream as it goes. The definition is correct: (nat 15) evaluates to 15; but it needlessly uses unbounded space because each stream element holds the value of the prior stream element in the binding to s.

When traversing a stream, it is easy to write the expression in such a way that evaluation requires unbounded space, even when that is not strictly necessary. During the discussion of SRFI-40, Joe Marshall created this infamous procedure:

(define (times3 n)
  (stream-ref
    (stream-filter
      (lambda (x)
        (zero? (modulo x n)))
      (stream-from 0))
    3))

(times3 5) evaluates to 15 and (times3 #e1e9) evaluates to three billion, though it takes a while. In either case, times3 should operate in bounded space, since each iteration mutates the promise that holds the next value. But it is easy to write times3 so that it does not operate in bounded space, as the follies of SRFI-40 showed.

The common problem is that some element of the stream (often the first element) is bound outside the expression that is computing the stream, so it holds the head of the stream, which holds the second element, and so on. In addition to testing the programmer, this procedure tests the stream primitives (it caught several errors during development) and also tests the underlying Scheme system (it found a bug in one implementation).

Laziness is no defense against an infinite loop; for instance, the expression below never returns, because the odd? predicate never finds an odd stream element.

(stream-null?
  (stream-filter odd?
    (stream-from 0 2)))

Ultimately, streams are defined as promises, which are implemented as thunks (lambda with no arguments). Since a stream is a procedure, comparisons such as eq?, eqv? and equal? are not meaningful when applied to streams. For instance, the expression

(define s ((stream-lambda () stream-null)))

defines s as the null stream, and (stream-null? s) is #t, but (eq? s stream-null) is #f.

To determine if two streams are equal, it is necessary to evaluate the elements in their common prefixes, reporting #f if two elements ever differ and #t if both streams are exhausted at the same time.

(define (stream-equal? eql? xs ys)
  (cond [(and (stream-null? xs)
              (stream-null? ys)) #t]
        [(or (stream-null? xs)
             (stream-null? ys)) #f]
        [(not (eql? (stream-car xs)
                    (stream-car ys))) #f]
        [else (stream-equal? eql?
                (stream-cdr xs)
                (stream-cdr ys))]))

It is generally not a good idea to mix lazy streams with eager side–effects, because the order in which stream elements are evaluated determines the order in which the side–effects occur. For a simple example, consider this side–effecting version of strm123:

(define strm123-with-side-effects
  (stream-cons (begin (display "one") 1)
    (stream-cons (begin (display "two") 2)
      (stream-cons (begin (display "three") 3)
        stream-null))))

The stream has elements 1 2 3. But depending on the order in which stream elements are accessed, "one", "two" and "three" could be printed in any order.

Since the performance of streams can be very poor, normal (eager) lists should be preferred to streams unless there is some compelling reason to the contrary. For instance, computing pythagorean triples with streams:

(stream-ref
  (stream-of (list a b c)
    (n in (stream-from 1))
    (a in (stream-range 1 n))
    (b in (stream-range a n))
    (c is (- n a b))
    (= (+ (* a a) (* b b)) (* c c)))
  50)

is about two orders of magnitude slower than the equivalent expression using loops:

(do ([n 1 (+ n 1)]) ([> n 228])
  (do ([a 1 (+ a 1)]) ([> a n])
    (do ([b a (+ b 1)]) ([> b n])
      (let ([c (- n a b)])
        (if (= (+ (* a a) (* b b)) (* c c))
            (display (list a b c)))))))

1.15.14.6 Pitfalls