Libraries for Vicare Scheme: parser logic operators

After all the macros have been expanded, the parser is a set of operator functions extracting characters from an input device with the purpose of producing a token. Some operators are “entry points” to the parser: public functions we can call to start parsing; other operators are for internal use only. Each operator is meant to do exactly one of the following: tail–call another operator, terminate parsing by raising an exception, terminate parsing by returning an error value, or terminate parsing successfully by returning a token value.
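As a rough illustration of these outcomes, here is a hand-written sketch of two operators that parse an unsigned decimal integer. It does not use the parser logic macros: the device is assumed to be a string plus the index of the next character, and the names parse-first-digit, parse-more-digits and digit-value are hypothetical.

```scheme
(import (rnrs))

;; Hypothetical helper: return the decimal value of CH, or #f if CH
;; is not a decimal digit.
(define (digit-value ch)
  (and (char<=? #\0 ch #\9)
       (- (char->integer ch) (char->integer #\0))))

;; Entry point operator.  Device arguments: STR and I.  It rejects
;; the end of input because at least one digit is required.
(define (parse-first-digit str i)
  (if (= i (string-length str))
      #f                                ;error value: unexpected end of input
      (let ((d (digit-value (string-ref str i))))
        (if d
            (parse-more-digits str (+ 1 i) d) ;tail call to another operator
            #f))))                      ;error value: invalid input char

;; Internal operator.  ACCUM is the parser state: the integer
;; accumulated so far.  It accepts the end of input.
(define (parse-more-digits str i accum)
  (if (= i (string-length str))
      accum                             ;success: return the token
      (let ((d (digit-value (string-ref str i))))
        (if d
            (parse-more-digits str (+ 1 i) (+ d (* 10 accum)))
            #f))))                      ;error value: invalid input char
```

Here the error value is simply #f; an operator could equally terminate by raising an exception instead.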

Operators are generated by macros from a symbolic expression specifying an abstract parser:

(define-parser-logic define-parser ch next fail . ?operators)

and containing a subexpression for each operator. Access to the input device is specified by another macro which must implement a set of syntax-rules:

(define-syntax device-logic
  (syntax-rules (:introduce-device-arguments
                 :generate-end-of-input-or-char-tests
                 :unexpected-end-of-input
                 :generate-delimiter-test
                 :invalid-input-char)
    ((_ :introduce-device-arguments          ---) ---)
    ((_ :generate-end-of-input-or-char-tests ---) ---)
    ((_ :unexpected-end-of-input             ---) ---)
    ((_ :generate-delimiter-test             ---) ---)
    ((_ :invalid-input-char                  ---) ---)))
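The general technique of selecting a syntax-rules clause through a literal keyword can be sketched with a toy macro. This is not the real device logic: the keywords :at-end? and :next-char, the argument shapes and the name toy-device are all assumptions made only for illustration.

```scheme
(import (rnrs))

;; Toy macro, NOT the real device logic: the first subform is a
;; literal keyword that selects which clause is expanded.
(define-syntax toy-device
  (syntax-rules (:at-end? :next-char)
    ((_ :at-end?   str i) (= i (string-length str)))
    ((_ :next-char str i) (string-ref str i))))

;; Hypothetical user of the toy macro: return the first character of
;; STR, or #f if STR is empty.
(define (first-char-or-false str)
  (if (toy-device :at-end? str 0)
      #f
      (toy-device :next-char str 0)))
```

Because the selection happens at expansion time, the chosen forms end up written directly in the body of the function using the macro.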

Concrete parsers are defined by combining the parser logic with the device logic:

(define-parser device-logic (?operator-name ...))

we can define any number of concrete parsers using the same parser logic and different device logics; at the end of the expansion, the input device forms are hard–coded into each operator. The ?operator-name list contains the identifiers that are bound to the operators acting as entry points to the parser.

To understand the semantics of operators, let’s consider one accepting only the characters ‘#\X’ or ‘#\Y’ and rejecting the end–of–input:

(define (operator-1 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (error-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((char=? #\Y ch)
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))

such an operator would be specified by the following ?operator symbolic subexpression:

(operator-1 (parser-state)
  ((#\X)
   (a-clause-form))
  ((#\Y)
   (another-clause-form)))

notice how the end–of–input test is automatically generated. The operator has some arguments representing the input device state and other arguments representing the parser state; the list of input device arguments comes first and is specified by the device logic, discussed later; the list of parser state arguments comes last and is specified in the ?operator symbolic expression.
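To make the argument order concrete, here is a hand-written, runnable counterpart of operator-1. It assumes a hypothetical string device whose arguments are the string and the index of the next character, and a single parser state argument named accum; the symbols returned stand in for the clause and error forms.

```scheme
(import (rnrs))

;; Device arguments come first (STR and I), the parser state
;; argument comes last (ACCUM, a list of symbols collected so far).
(define (operator-1 str i accum)
  (if (= i (string-length str))         ;the generated end-of-input test
      'unexpected-end-of-input          ;stands in for (error-form)
      (let ((ch (string-ref str i)))
        (cond ((char=? #\X ch)
               (cons 'got-x accum))     ;stands in for (a-clause-form)
              ((char=? #\Y ch)
               (cons 'got-y accum))     ;stands in for (another-clause-form)
              (else
               'invalid-input-char))))) ;stands in for (error-form)
```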

An operator function accepting characters ‘#\X’, ‘#\Y’ or ‘#\Z’, with ‘#\Y’ and ‘#\Z’ to be processed in the same way, and rejecting the end–of–input looks like this:

(define (operator-2 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (error-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((or (char=? #\Y ch)
               (char=? #\Z ch))
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))

such an operator would be specified by the following ?operator symbolic subexpression:

(operator-2 (parser-state)
  ((#\X)
   (a-clause-form))
  ((#\Y #\Z)
   (another-clause-form)))

An operator function accepting characters ‘#\X’ or ‘#\Y’, but also the end–of–input from the device, looks like this:

(define (operator-3 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (end-of-input-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((char=? #\Y ch)
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))

and is specified in the parser logic as the following ?operator symbolic subexpression:

(operator-3 (parser-state)
  ((:end-of-input)
   (end-of-input-form))
  ((#\X)
   (a-clause-form))
  ((#\Y)
   (another-clause-form)))

An operator function accepting characters ‘#\X’ or ‘#\Y’, the end–of–input from the device, and also a set of end–of–lexeme delimiter characters, looks like this:

(define (operator-4 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (end-of-input-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((char=? #\Y ch)
           (another-clause-form))
          ((end-of-lexeme-delimiter? ch)
           (end-of-input-form))
          (else ;invalid input char
           (error-form)))))

notice how the end-of-input-form is used for both the proper end–of–input state and the end–of–lexeme state; such an operator is specified in the parser logic as the following ?operator symbolic subexpression:

(operator-4 (parser-state)
  ((:end-of-input)
   (end-of-input-form))
  ((#\X)
   (a-clause-form))
  ((#\Y)
   (another-clause-form)))

notice that processing of the end–of–lexeme state is not specified in the parser logic: its generation is completely delegated to the device logic.
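The following hand-written sketch shows the behaviour of such an operator on a hypothetical string device. The delimiter predicate and its set of characters are assumptions standing in for whatever test the device logic would generate; the parser state is a count of the characters accepted so far.

```scheme
(import (rnrs))

;; Hypothetical delimiter test; a real device logic chooses its own set.
(define (end-of-lexeme-delimiter? ch)
  (or (char-whitespace? ch)
      (memv ch '(#\( #\) #\" #\;))))

;; COUNT is the parser state: how many X or Y characters were read.
(define (operator-4 str i count)
  (if (= i (string-length str))
      count                             ;end-of-input form
      (let ((ch (string-ref str i)))
        (cond ((char=? #\X ch)
               (operator-4 str (+ 1 i) (+ 1 count)))
              ((char=? #\Y ch)
               (operator-4 str (+ 1 i) (+ 1 count)))
              ((end-of-lexeme-delimiter? ch)
               count)                   ;same form as for the end of input
              (else
               #f)))))                  ;invalid input char
```

With this sketch a lexeme ends in the same way whether the input is exhausted or a delimiter follows, so the clauses for ‘#\X’ and ‘#\Y’ need no knowledge of delimiters.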

Sometimes it is useful to apply a test function or macro to an input character and collect the result for further processing; this can be done as follows:

(define (the-test ch arg1 arg2 arg3)
  ---)

(define (operator-5 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (error-form))
          ((the-test ch 1 2 3)
           => (lambda (result)
                (a-clause-form)))
          ((char=? #\Y ch)
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))

such an operator would be specified by the following ?operator symbolic subexpression:

(operator-5 (parser-state)
  ((the-test 1 2 3) => result
   (a-clause-form))
  ((#\Y)
   (another-clause-form)))
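As a runnable illustration of the same pattern, the sketch below uses a test function that returns the value of a digit in a given radix and collects it through the => clause of cond. The string device, the function digit-in-radix and the accumulator argument are hypothetical choices made for this example.

```scheme
(import (rnrs))

;; Hypothetical test function: return the value of CH as a digit in
;; RADIX (up to 16), or #f if CH is not such a digit.
(define (digit-in-radix ch radix)
  (let ((v (cond ((char<=? #\0 ch #\9)
                  (- (char->integer ch) (char->integer #\0)))
                 ((char<=? #\a ch #\f)
                  (+ 10 (- (char->integer ch) (char->integer #\a))))
                 ((char<=? #\A ch #\F)
                  (+ 10 (- (char->integer ch) (char->integer #\A))))
                 (else #f))))
    (and v (< v radix) v)))

;; ACCUM is the parser state: the number accumulated so far.
(define (operator-5 str i accum)
  (if (= i (string-length str))
      #f                                ;end of input rejected
      (let ((ch (string-ref str i)))
        (cond ((digit-in-radix ch 16)
               => (lambda (result)      ;RESULT is the digit value
                    (+ result (* 16 accum))))
              ((char=? #\Y ch)
               accum)
              (else
               #f)))))                  ;invalid input char
```

Since only #f counts as false in Scheme, a digit value of zero still selects the first clause.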
