After all the macros have been expanded, the parser is a set of operator functions extracting characters from an input device with the purpose of producing a token. Some operators are “entry points” to the parser: public functions we can call to start parsing; other operators are for internal use only. Each operator is meant to do exactly one of the following: tail–call another operator; terminate parsing by raising an exception; terminate parsing by returning an error value; or terminate parsing successfully by returning a token value.
NOTE Operator functions are just ordinary Scheme functions playing a special role in a parser; the name “operator” has the only purpose of letting us talk about them.
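The four possible outcomes of an operator can be illustrated with a pair of hand–written functions. This is only a sketch, not the output of the macros: the names `parse-digits`, `op-first-digit`, `op-more-digits` and `digit?` are hypothetical, and the input device is assumed to be a string plus the index of the next character.

```scheme
(import (rnrs))

;; Hypothetical hand-written operators, NOT macro output.
;; Assumed device state: the string STR and the index I of the next char.
;; Assumed parser state: ACC, the reversed list of accepted characters.

(define (digit? ch)
  (char<=? #\0 ch #\9))

(define (parse-digits str)
  ;; Entry point: public function starting the parser at index 0.
  (op-first-digit str 0 '()))

(define (op-first-digit str i acc)
  (if (= i (string-length str))
      #f                                ;terminate: return an error value
      (let ((ch (string-ref str i)))
        (cond ((digit? ch)              ;tail-call another operator
               (op-more-digits str (+ i 1) (cons ch acc)))
              (else                     ;terminate: raise an exception
               (error 'op-first-digit "invalid input character" ch))))))

(define (op-more-digits str i acc)
  (if (= i (string-length str))
      ;; terminate successfully: return the token value
      (string->number (list->string (reverse acc)))
      (let ((ch (string-ref str i)))
        (cond ((digit? ch)
               (op-more-digits str (+ i 1) (cons ch acc)))
              (else
               (error 'op-more-digits "invalid input character" ch))))))
```

Here `(parse-digits "123")` follows the tail calls down to the successful return, `(parse-digits "")` returns the error value, and `(parse-digits "12x")` raises an exception.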
Operators are generated by macros from a symbolic expression specifying an abstract parser:
(define-parser-logic define-parser ch next fail . ?operators)
and containing a subexpression for each operator. Access to the input device is specified by another macro which must implement a set of syntax-rules:
    (define-syntax device-logic
      (syntax-rules (:introduce-device-arguments
                     :generate-end-of-input-or-char-tests
                     :unexpected-end-of-input
                     :generate-delimiter-test
                     :invalid-input-char)
        ((_ :introduce-device-arguments ---) ---)
        ((_ :generate-end-of-input-or-char-tests ---) ---)
        ((_ :unexpected-end-of-input ---) ---)
        ((_ :generate-delimiter-test ---) ---)
        ((_ :invalid-input-char ---) ---)))
Concrete parsers are defined by combining the parser logic with the device logic:
(define-parser device-logic (?operator-name ...))
we can define any number of concrete parsers using the same parser logic and different device logics; at the end of the expansion, the input device forms are hard coded into the operators. The ?operator-name list holds the identifiers bound to the operators that are entry points to the parser.
To understand the semantics of operators, let’s consider one accepting only the characters ‘#\X’ or ‘#\Y’ and rejecting the end–of–input:

    (define (operator-1 input-device parser-state)
      (let ((ch (get-next-char)))
        (cond ((end-of-input? ch)
               (error-form))
              ((char=? #\X ch)
               (a-clause-form))
              ((char=? #\Y ch)
               (another-clause-form))
              (else ;invalid input char
               (error-form)))))
such an operator would be specified by the following ?operator symbolic subexpression:

    (operator-1 (parser-state)
      ((#\X) (a-clause-form))
      ((#\Y) (another-clause-form)))
notice how the end–of–input test is automatically generated. The operator has some arguments representing the input device state and other arguments representing the parser state; the list of input device arguments comes first and is specified by the device logic, discussed later; the list of parser state arguments comes last and is specified in the ?operator symbolic expression.
An operator function accepting characters ‘#\X’, ‘#\Y’ or ‘#\Z’, with ‘#\Y’ and ‘#\Z’ to be processed in the same way, and rejecting the end–of–input looks like this:

    (define (operator-2 input-device parser-state)
      (let ((ch (get-next-char)))
        (cond ((end-of-input? ch)
               (error-form))
              ((char=? #\X ch)
               (a-clause-form))
              ((or (char=? #\Y ch) (char=? #\Z ch))
               (another-clause-form))
              (else ;invalid input char
               (error-form)))))
such an operator would be specified by the following ?operator symbolic subexpression:

    (operator-2 (parser-state)
      ((#\X) (a-clause-form))
      ((#\Y #\Z) (another-clause-form)))
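How a clause listing several characters turns into an `or` of `char=?` tests can be sketched with a toy macro. The `char-case` macro below is hypothetical and far simpler than the real parser logic: it knows nothing about input devices or the end–of–input, and it requires an explicit `else` clause; it only illustrates the clause expansion.

```scheme
(import (rnrs))

;; Hypothetical helper, NOT part of the library: each clause
;; ((char ...) body) becomes a COND clause testing the character
;; against every listed char with CHAR=?, joined by OR.
(define-syntax char-case
  (syntax-rules (else)
    ((_ ?expr ((?char ...) ?body) ... (else ?else-body))
     (let ((ch ?expr))
       (cond ((or (char=? ?char ch) ...) ?body)
             ...
             (else ?else-body))))))

;; Usage mirroring the clauses of operator-2.
(define (classify ch)
  (char-case ch
    ((#\X)     'a-clause)
    ((#\Y #\Z) 'another-clause)
    (else      'invalid-input-char)))
```

With these definitions both `(classify #\Y)` and `(classify #\Z)` select the second clause, just as the two characters share `another-clause-form` in the operator above.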
An operator function accepting characters ‘#\X’ or ‘#\Y’, but also the end–of–input from the device, looks like this:
    (define (operator-3 input-device parser-state)
      (let ((ch (get-next-char)))
        (cond ((end-of-input? ch)
               (end-of-input-form))
              ((char=? #\X ch)
               (a-clause-form))
              ((char=? #\Y ch)
               (another-clause-form))
              (else ;invalid input char
               (error-form)))))
and is specified in the parser logic as the following ?operator symbolic subexpression:
    (operator-3 (parser-state)
      ((:end-of-input) (end-of-input-form))
      ((#\X) (a-clause-form))
      ((#\Y) (another-clause-form)))
An operator function accepting characters ‘#\X’ or ‘#\Y’, the end–of–input from the device, and also a set of end–of–lexeme delimiter characters, looks like this:
    (define (operator-4 input-device parser-state)
      (let ((ch (get-next-char)))
        (cond ((end-of-input? ch)
               (end-of-input-form))
              ((char=? #\X ch)
               (a-clause-form))
              ((char=? #\Y ch)
               (another-clause-form))
              ((end-of-lexeme-delimiter? ch)
               (end-of-input-form))
              (else ;invalid input char
               (error-form)))))
notice how the end-of-input-form is used for both the proper end–of–input state and the end–of–lexeme state; such an operator is specified in the parser logic as the following ?operator symbolic subexpression:

    (operator-4 (parser-state)
      ((:end-of-input) (end-of-input-form))
      ((#\X) (a-clause-form))
      ((#\Y) (another-clause-form)))
notice that processing of the end–of–lexeme state is not specified in the parser logic: its generation is completely delegated to the device logic.
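To see the shared handling of end–of–input and end–of–lexeme in action, here is a hand–written, runnable variant of the operator. It is a sketch under assumptions: the device is a string plus an index, the parser state `acc` is a reversed list of accepted characters, and the delimiter set in `end-of-lexeme-delimiter?` is chosen only for illustration (in the real library the test comes from the device logic).

```scheme
(import (rnrs))

;; Assumed delimiter set, for illustration only.
(define (end-of-lexeme-delimiter? ch)
  (or (char-whitespace? ch)
      (and (memv ch '(#\( #\) #\;)) #t)))

;; Hand-written stand-in for operator-4: collects a run of X and Y
;; characters.  STR and I are the assumed device state, ACC the
;; assumed parser state.
(define (operator-4 str i acc)
  (define (end-of-input-form)
    ;; Shared by the end-of-input and the end-of-lexeme states.
    (list->string (reverse acc)))
  (if (= i (string-length str))
      (end-of-input-form)
      (let ((ch (string-ref str i)))
        (cond ((char=? #\X ch)
               (operator-4 str (+ i 1) (cons ch acc)))
              ((char=? #\Y ch)
               (operator-4 str (+ i 1) (cons ch acc)))
              ((end-of-lexeme-delimiter? ch)
               (end-of-input-form))
              (else ;invalid input char
               (error 'operator-4 "invalid input character" ch))))))
```

Calling `(operator-4 "XY" 0 '())` and `(operator-4 "XY)" 0 '())` yields the same token, because the closing parenthesis ends the lexeme exactly as the end of the string does.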
Sometimes it is useful to apply a test function or macro to an input character and collect the result for further processing; this can be done as follows:
    (define (the-test ch arg1 arg2 arg3)
      ---)

    (define (operator-5 input-device parser-state)
      (let ((ch (get-next-char)))
        (cond ((end-of-input? ch)
               (error-form))
              ((the-test ch 1 2 3)
               => (lambda (result)
                    (a-clause-form)))
              ((char=? #\Y ch)
               (another-clause-form))
              (else ;invalid input char
               (error-form)))))
and is specified in the parser logic as the symbolic subexpression:
    (operator-5 (parser-state)
      ((the-test 1 2 3) => result (a-clause-form))
      ((#\Y) (another-clause-form)))
where => is the auxiliary syntax exported by (rnrs base (6)).
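The way `=>` hands the result of a test to the clause body can be tried out with plain `cond`. In this sketch `char->digit` and `digit-operator` are hypothetical names, and the input device is again assumed to be a string plus an index.

```scheme
(import (rnrs))

;; Hypothetical test function: returns the value of CH as a digit in
;; RADIX, or #f when CH is not a digit in that radix.  Note that the
;; value 0 counts as true in Scheme, so the digit zero is accepted.
(define (char->digit ch radix)
  (let* ((code (char->integer ch))
         (n    (cond ((char<=? #\0 ch #\9) (- code (char->integer #\0)))
                     ((char<=? #\a ch #\z) (+ 10 (- code (char->integer #\a))))
                     ((char<=? #\A ch #\Z) (+ 10 (- code (char->integer #\A))))
                     (else #f))))
    (and n (< n radix) n)))

;; Hand-written operator in the style of operator-5.
(define (digit-operator str i radix)
  (if (= i (string-length str))
      (error 'digit-operator "unexpected end of input")
      (let ((ch (string-ref str i)))
        (cond ((char->digit ch radix)
               ;; RESULT is bound to the true value returned by the test.
               => (lambda (result)
                    (list 'digit result)))
              ((char=? #\Y ch)
               'another-clause)
              (else ;invalid input char
               (error 'digit-operator "invalid input character" ch))))))
```

The test runs once per character: when it returns a true value that value reaches the clause body as `result`, otherwise `cond` falls through to the next clause.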