
11.2 The logic of parser operators

After all the macros have been expanded, the parser is a set of operator functions that extract characters from an input device in order to produce a token. Some operators are “entry points” to the parser: public functions we can call to start parsing; other operators are for internal use only. Each operator is meant to do exactly one of the following: tail–call another operator; terminate parsing by raising an exception; terminate parsing by returning an error value; terminate parsing successfully by returning a token value.
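The four possible outcomes can be illustrated with hand-written functions. The following is only a minimal sketch, not the output of the macros: the input device is assumed to be a string plus an index, the parser state is assumed to be a list of accumulated characters, and all the names (digit-char?, operator-first-digit, operator-more-digits, parse-unsigned) are hypothetical.

```scheme
(import (rnrs))

;; Hypothetical helper: true if CH is a decimal digit.
(define (digit-char? ch)
  (and (char<=? #\0 ch) (char<=? ch #\9)))

;; Internal operator (hypothetical).  Device arguments STR and IDX
;; come first; the parser state ACC comes last.
(define (operator-more-digits str idx acc)
  (if (= idx (string-length str))
      ;; end-of-input: terminate successfully by returning a token value
      (string->number (list->string (reverse acc)))
      (let ((ch (string-ref str idx)))
        (if (digit-char? ch)
            ;; tail-call an operator (here, the same one)
            (operator-more-digits str (+ idx 1) (cons ch acc))
            ;; invalid character: terminate by returning an error value
            #f))))

;; Entry point (hypothetical): requires at least one digit.
(define (operator-first-digit str idx)
  (if (= idx (string-length str))
      ;; end-of-input rejected: terminate by raising an exception
      (error 'operator-first-digit "unexpected end of input")
      (let ((ch (string-ref str idx)))
        (if (digit-char? ch)
            (operator-more-digits str (+ idx 1) (list ch))
            #f))))

;; Public function starting the parse at the entry point.
(define (parse-unsigned str)
  (operator-first-digit str 0))
```

Here the choice of #f as error value and of a number as token value is arbitrary; a real parser would pick whatever suits its callers.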

NOTE Operator functions are just ordinary Scheme functions playing a special role in a parser; the name “operator” has no deeper meaning and serves only to let us talk about them.

Operators are generated by macros from a symbolic expression specifying an abstract parser:

(define-parser-logic define-parser ch next fail . ?operators)

and containing a subexpression for each operator. Access to the input device is specified by another macro which must implement a set of syntax-rules:

(define-syntax device-logic
  (syntax-rules (:introduce-device-arguments
                 :generate-end-of-input-or-char-tests
                 :unexpected-end-of-input
                 :generate-delimiter-test
                 :invalid-input-char)
    ((_ :introduce-device-arguments          ---) ---)
    ((_ :generate-end-of-input-or-char-tests ---) ---)
    ((_ :unexpected-end-of-input             ---) ---)
    ((_ :generate-delimiter-test             ---) ---)
    ((_ :invalid-input-char                  ---) ---)))

Concrete parsers are defined by combining the parser logic with the device logic:

(define-parser device-logic (?operator-name ...))

we can define any number of concrete parsers using the same parser logic and different device logics; at the end of the expansion, the input device forms are hard coded into each operator. The ?operator-name forms are identifiers bound to the operators that are entry points to the parser.
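To get a feeling for how device forms end up hard coded into an operator, here is a toy sketch. It does not use the real define-parser-logic interface: the macros toy-device and define-x-operator, and the keywords :end-of-input? and :next-char, are hypothetical names invented for this illustration, and the device is assumed to be a string plus an index.

```scheme
(import (rnrs))

;; Hypothetical device logic: a macro dispatching on a keyword,
;; in the same spirit as the device-logic skeleton above.
(define-syntax toy-device
  (syntax-rules (:end-of-input? :next-char)
    ((_ :end-of-input? str idx)
     (= idx (string-length str)))
    ((_ :next-char str idx)
     (string-ref str idx))))

;; Hypothetical parser logic: defines one operator accepting only
;; #\X, asking the DEVICE macro for every access to the input.
(define-syntax define-x-operator
  (syntax-rules ()
    ((_ name device)
     (define (name str idx)
       (cond ((device :end-of-input? str idx) 'eof)
             ((char=? #\X (device :next-char str idx)) 'x)
             (else 'invalid))))))

;; Concrete parser: after expansion the body of X-FROM-STRING
;; contains STRING-LENGTH and STRING-REF calls directly.
(define-x-operator x-from-string toy-device)
```

A second device macro answering the same keywords, for example one reading from a list of characters, could be combined with define-x-operator in the same way without touching the operator description.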

To understand the semantics of operators, let’s consider one accepting only the characters ‘#\X’ or ‘#\Y’ and rejecting the end–of–input:

(define (operator-1 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch) ;unexpected end-of-input
           ---)
          ((char=? #\X ch)
           ---)
          ((char=? #\Y ch)
           ---)
          (else ;invalid input char
           ---))))

such an operator would be specified by the following ?operator symbolic subexpression:

(operator-1 (parser-state)
  ((#\X) ---)
  ((#\Y) ---))

notice how the end–of–input test is automatically generated. The operator has some arguments representing the input device state and other arguments representing the parser state; the list of input device arguments comes first and is specified by the device logic, discussed later; the list of parser state arguments comes last and is specified in the ?operator symbolic expression.

An operator function accepting characters ‘#\X’, ‘#\Y’ or ‘#\Z’, with ‘#\Y’ and ‘#\Z’ to be processed in the same way, and rejecting the end–of–input looks like this:

(define (operator-2 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch) ;unexpected end-of-input
           ---)
          ((char=? #\X ch)
           ---)
          ((or (char=? #\Y ch)
               (char=? #\Z ch))
           ---)
          (else ;invalid input char
           ---))))

such an operator would be specified by the following ?operator symbolic subexpression:

(operator-2 (parser-state)
  ((#\X) ---)
  ((#\Y #\Z)
   ---))

An operator function accepting characters ‘#\X’ or ‘#\Y’, but also the end–of–input from the device, looks like this:

(define (operator-3 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch) ;accepted end-of-input
           ---)
          ((char=? #\X ch)
           ---)
          ((char=? #\Y ch)
           ---)
          (else ;invalid input char
           ---))))

and is specified in the parser logic as the following ?operator symbolic subexpression:

(operator-3 (parser-state) ---)

An operator function accepting characters ‘#\X’ or ‘#\Y’, the end–of–input from the device, and also a set of end–of–lexeme delimiter characters, looks like this:

(define (operator-4 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (end-of-input-form))
          ((char=? #\X ch) ---)
          ((char=? #\Y ch) ---)
          ((end-of-lexeme-delimiter? ch)
           (end-of-input-form))
          (else ---)))) ;invalid input char

notice how the same form, end-of-input-form, handles both the proper end–of–input state and the end–of–lexeme state; such an operator is specified in the parser logic as the following ?operator symbolic subexpression:

(operator-4 (parser-state) ---)

notice that processing of the end–of–lexeme state is not specified in the parser logic: its generation is completely delegated to the device logic.
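A runnable approximation of such an operator can be written by hand. This is a sketch under stated assumptions, not the code generated by the macros: the device is assumed to be a string plus an index, the parser state is a list of accumulated characters, the delimiter set is an arbitrary choice, and the name operator-xy is hypothetical.

```scheme
(import (rnrs))

;; Hypothetical delimiter test; in the real parser this test is
;; generated by the device logic, not written in the parser logic.
(define (end-of-lexeme-delimiter? ch)
  (memv ch '(#\space #\newline #\( #\))))

;; Hypothetical operator accepting a run of #\X and #\Y characters.
;; STR and IDX are the device arguments, ACC is the parser state.
(define (operator-xy str idx acc)
  ;; The same form serves end-of-input and end-of-lexeme.
  (define (end-of-input-form)
    (list->string (reverse acc)))
  (if (= idx (string-length str))
      (end-of-input-form)
      (let ((ch (string-ref str idx)))
        (cond ((char=? #\X ch)
               (operator-xy str (+ idx 1) (cons ch acc)))
              ((char=? #\Y ch)
               (operator-xy str (+ idx 1) (cons ch acc)))
              ((end-of-lexeme-delimiter? ch)
               (end-of-input-form))
              (else ;invalid input char
               #f)))))
```

With these definitions, (operator-xy "XYX YY" 0 '()) stops at the space and returns the lexeme collected so far, while an input such as "XZ" yields the error value #f.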

Sometimes it is useful to apply a test function or macro to an input character and collect the result for further processing; this can be done as follows:

(define (the-test ch arg1 arg2 arg3) ---)

(define (operator-5 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           ---)
          ((the-test ch 1 2 3)
           => (lambda (result)
                ---))
          ((char=? #\Y ch)
           ---)
          (else ;invalid input char
           ---))))

and is specified in the parser logic as the symbolic subexpression:

(operator-5 (parser-state)
  ((the-test 1 2 3) => result
   ---)
  ((#\Y) ---))

where => is the auxiliary syntax exported by (rnrs base (6)).
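The behaviour of => in a plain cond can be tried in isolation. In this small sketch the functions digit-value and classify are hypothetical names chosen for the example; only cond and => come from the standard library.

```scheme
(import (rnrs))

;; Hypothetical test function: returns the numeric value of CH when
;; it is a digit in the given BASE (up to 10), otherwise #f.
(define (digit-value ch base)
  (let ((n (- (char->integer ch) (char->integer #\0))))
    (and (<= 0 n) (< n base) n)))

;; When the test returns a true value, COND applies the function
;; after => to that value.  Note that 0 is a true value in Scheme.
(define (classify ch)
  (cond ((digit-value ch 10)
         => (lambda (result)
              (list 'digit result)))
        ((char=? #\Y ch)
         'letter-y)
        (else ;invalid input char
         'invalid)))
```

The test function is called only once per character: its result is both the condition and the value handed over for further processing.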
