After all the macros have been expanded, the parser is a set of operator functions that extract characters from an input device in order to produce a token. Some operators are “entry points” to the parser: public functions we call to start parsing; the other operators are for internal use only. Each operator does one of the following: tail-call another operator; terminate parsing by raising an exception; terminate parsing by returning an error value; terminate parsing successfully by returning a token value.
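As an illustration of these four outcomes, here is a hand-written sketch in plain R6RS Scheme; it is not code generated by the library. It assumes a hypothetical string device whose state is a string and an index, and it uses #f as the error value; the names next-char, digit?, parse-integer and parse-digits are invented for the example.

```scheme
(import (rnrs))

;; Hypothetical string device: the char at POS, or the EOF object.
(define (next-char str pos)
  (if (< pos (string-length str))
      (string-ref str pos)
      (eof-object)))

(define (digit? ch) (char<=? #\0 ch #\9))

;; Entry point operator: the first char must be a decimal digit.
(define (parse-integer str pos)
  (let ((ch (next-char str pos)))
    (cond ((eof-object? ch)
           ;; terminate parsing by raising an exception
           (error 'parse-integer "unexpected end of input"))
          ((digit? ch)
           ;; tail-call another operator, passing the parser state along
           (parse-digits str (+ pos 1) (list ch)))
          (else
           ;; terminate parsing by returning an error value
           #f))))

;; Internal operator: accumulates the remaining digits.
(define (parse-digits str pos digits)
  (let ((ch (next-char str pos)))
    (cond ((eof-object? ch)
           ;; terminate parsing successfully by returning a token value
           (string->number (list->string (reverse digits))))
          ((digit? ch)
           (parse-digits str (+ pos 1) (cons ch digits)))
          (else #f))))
```

Only parse-integer would be exported as an entry point; parse-digits is reached exclusively through tail calls.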
NOTE Operator functions are ordinary Scheme functions that play a special role in the parser; we call them “operators” only so that we have a name to talk about them.
Operators are generated by macros from a symbolic expression specifying an abstract parser:
(define-parser-logic define-parser ch next fail . ?operators)
which contains one subexpression for each operator. Access to the input device is specified by another macro, which must implement the following syntax-rules clauses:
(define-syntax device-logic
  (syntax-rules (:introduce-device-arguments
                 :generate-end-of-input-or-char-tests
                 :unexpected-end-of-input
                 :generate-delimiter-test
                 :invalid-input-char)
    ((_ :introduce-device-arguments ---) ---)
    ((_ :generate-end-of-input-or-char-tests ---) ---)
    ((_ :unexpected-end-of-input ---) ---)
    ((_ :generate-delimiter-test ---) ---)
    ((_ :invalid-input-char ---) ---)))
Concrete parsers are defined by combining the parser logic with the device logic:
(define-parser device-logic (?operator-name ...))
we can define any number of concrete parsers by combining the same parser logic with different device logics; at the end of the expansion, the input device forms are hard-coded into each operator. The ?operator-name arguments are identifiers bound to the operators that act as entry points to the parser.
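The idea of reusing one parser logic with several devices can be sketched without macros. This is a different technique from the library's: define-parser hard-codes the device forms into the operators at expansion time, while the sketch below passes a hypothetical device record (peek and advance procedures) at run time. The record type device, the two devices and the operator count-xs are all invented for the illustration.

```scheme
(import (rnrs))

;; Hypothetical device interface over an opaque STATE:
;;   (peek state)    => the next char, or the EOF object
;;   (advance state) => the state after consuming one char
(define-record-type device (fields peek advance))

;; Device over a string; the state is a pair (string . index).
(define string-device
  (make-device
   (lambda (s)
     (if (< (cdr s) (string-length (car s)))
         (string-ref (car s) (cdr s))
         (eof-object)))
   (lambda (s) (cons (car s) (+ (cdr s) 1)))))

;; Device over a list of chars; the state is the remaining list.
(define list-device
  (make-device
   (lambda (l) (if (null? l) (eof-object) (car l)))
   cdr))

;; One parser logic, shared by both devices: count #\X chars up to
;; the end of input, rejecting any other char with #f.
(define (count-xs dev state count)
  (let ((ch ((device-peek dev) state)))
    (cond ((eof-object? ch) count)
          ((char=? ch #\X)
           (count-xs dev ((device-advance dev) state) (+ count 1)))
          (else #f))))
```

Run-time dispatch costs an indirect call per character; generating the operators with macros, as the library does, avoids it.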
To understand the semantics of operators, let’s consider one accepting only the characters ‘#\X’ or ‘#\Y’ and rejecting the end–of–input:
(define (operator-1 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (error-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((char=? #\Y ch)
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))
such an operator would be specified by the following ?operator symbolic subexpression:
(operator-1 (parser-state) ((#\X) (a-clause-form)) ((#\Y) (another-clause-form)))
notice how the end–of–input test is generated automatically even though the symbolic subexpression does not mention it. The operator takes some arguments representing the input device state and other arguments representing the parser state: the input device arguments come first and are specified by the device logic, discussed later; the parser state arguments come last and are specified in the ?operator symbolic subexpression.
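To make the argument order concrete, here is a hand-written, runnable analogue of operator-1; it is a sketch, not the library's expansion. It assumes a hypothetical string device whose arguments are str and pos (coming first), followed by one parser state argument acc. The clause forms are filled in for illustration: ‘#\X’ consumes a char and tail-calls the operator again, ‘#\Y’ returns the token.

```scheme
(import (rnrs))

;; Device arguments (STR POS) first, parser state (ACC) last.
(define (operator-1 str pos acc)
  (let ((ch (if (< pos (string-length str))
                (string-ref str pos)
                (eof-object))))
    (cond ((eof-object? ch)            ; generated test: EOF is rejected
           (error 'operator-1 "unexpected end of input"))
          ((char=? #\X ch)             ; a-clause-form: consume, continue
           (operator-1 str (+ pos 1) (cons 'x acc)))
          ((char=? #\Y ch)             ; another-clause-form: return token
           (reverse (cons 'y acc)))
          (else                        ; invalid input char
           (error 'operator-1 "invalid input char" ch)))))
```

For example, (operator-1 "XXY" 0 '()) consumes both ‘#\X’ chars and then returns the token (x x y).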
An operator function accepting the characters ‘#\X’, ‘#\Y’ or ‘#\Z’, processing ‘#\Y’ and ‘#\Z’ in the same way, and rejecting the end–of–input looks like this:
(define (operator-2 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (error-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((or (char=? #\Y ch)
               (char=? #\Z ch))
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))
such an operator would be specified by the following ?operator symbolic subexpression:
(operator-2 (parser-state) ((#\X) (a-clause-form)) ((#\Y #\Z) (another-clause-form)))
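How a symbolic subexpression like this can become a cond form is easiest to see with a small macro. The following is a hypothetical toy, define-toy-operator, and not the library's define-parser-logic: it hard-codes a string device with arguments str and pos, turns each list of characters into a memv test, and generates the end–of–input test itself, returning the symbol error on rejection.

```scheme
(import (rnrs))

;; Hypothetical toy macro (NOT the library's API).  Device arguments
;; (str pos) come first, then the parser state arguments of the spec.
(define-syntax define-toy-operator
  (syntax-rules ()
    ((_ ?name (?state ...) ((?char ...) ?form) ...)
     (define (?name str pos ?state ...)
       (let ((ch (if (< pos (string-length str))
                     (string-ref str pos)
                     (eof-object))))
         (cond ((eof-object? ch) 'error)        ; generated EOF test
               ((memv ch '(?char ...)) ?form)   ; one clause per char list
               ...
               (else 'error)))))))              ; invalid input char

;; Same shape as the operator-2 specification above.
(define-toy-operator operator-2 (state)
  ((#\X) (list 'x state))
  ((#\Y #\Z) (list 'y-or-z state)))
```

Because syntax-rules is hygienic, the clause forms can only see identifiers the user supplies, such as state; this is presumably why define-parser-logic receives identifiers like ch, next and fail as arguments.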
An operator function accepting characters ‘#\X’ or ‘#\Y’, but also the end–of–input from the device, looks like this:
(define (operator-3 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (end-of-input-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((char=? #\Y ch)
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))
and is specified in the parser logic as the following ?operator symbolic subexpression:
(operator-3 (parser-state) ((:end-of-input) (end-of-input-form)) ((#\X) (a-clause-form)) ((#\Y) (another-clause-form)))
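A runnable analogue of operator-3, again hand-written rather than generated, shows the difference: here the end–of–input terminates parsing successfully. The string device arguments str and pos and the parser state arguments xs and ys (counters of ‘#\X’ and ‘#\Y’) are hypothetical choices for the sketch.

```scheme
(import (rnrs))

;; X and Y are accepted, and so is the end of input, which returns
;; the token (list xs ys).  Any other char yields the error value #f.
(define (operator-3 str pos xs ys)
  (let ((ch (if (< pos (string-length str))
                (string-ref str pos)
                (eof-object))))
    (cond ((eof-object? ch)            ; end-of-input-form
           (list xs ys))
          ((char=? #\X ch)             ; a-clause-form
           (operator-3 str (+ pos 1) (+ xs 1) ys))
          ((char=? #\Y ch)             ; another-clause-form
           (operator-3 str (+ pos 1) xs (+ ys 1)))
          (else #f))))                 ; invalid input char
```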
An operator function accepting characters ‘#\X’ or ‘#\Y’, the end–of–input from the device, and also a set of end–of–lexeme delimiter characters, looks like this:
(define (operator-4 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (end-of-input-form))
          ((char=? #\X ch)
           (a-clause-form))
          ((char=? #\Y ch)
           (another-clause-form))
          ((end-of-lexeme-delimiter? ch)
           (end-of-input-form))
          (else ;invalid input char
           (error-form)))))
notice how the end-of-input-form is used for both the proper end–of–input state and the end–of–lexeme state; such an operator is specified in the parser logic as the following ?operator symbolic subexpression:
(operator-4 (parser-state) ((:end-of-input) (end-of-input-form)) ((#\X) (a-clause-form)) ((#\Y) (another-clause-form)))
notice that processing of the end–of–lexeme state is not specified in the parser logic: its generation is completely delegated to the device logic.
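The effect of the delimiter test can be sketched by hand. In the sketch below the delimiter set (whitespace and parentheses) is a hypothetical choice; in the library that test comes from the device logic. The operator returns the lexeme together with the position where it ended, and the delimiter itself is not consumed, so a following operator could start reading from it.

```scheme
(import (rnrs))

;; Hypothetical end-of-lexeme delimiters.
(define (delimiter? ch)
  (or (char-whitespace? ch) (char=? ch #\() (char=? ch #\))))

;; Hand-written analogue of operator-4 over a string device.
(define (operator-4 str pos acc)
  (let ((ch (if (< pos (string-length str))
                (string-ref str pos)
                (eof-object))))
    (cond ((eof-object? ch)                          ; end-of-input-form
           (list (list->string (reverse acc)) pos))
          ((char=? #\X ch)
           (operator-4 str (+ pos 1) (cons ch acc)))
          ((char=? #\Y ch)
           (operator-4 str (+ pos 1) (cons ch acc)))
          ((delimiter? ch)                           ; same form as EOF
           (list (list->string (reverse acc)) pos))
          (else #f))))                               ; invalid input char
```

For example, on "XY)Z" the operator stops at the ‘)’ and returns ("XY" 2).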
Sometimes it is useful to apply a test function or macro to an input character and collect the result for further processing; this can be done as follows:
(define (the-test ch arg1 arg2 arg3)
  ---)

(define (operator-5 input-device parser-state)
  (let ((ch (get-next-char)))
    (cond ((end-of-input? ch)
           (error-form))
          ((the-test ch 1 2 3)
           => (lambda (result)
                (a-clause-form)))
          ((char=? #\Y ch)
           (another-clause-form))
          (else ;invalid input char
           (error-form)))))
and is specified in the parser logic as the symbolic subexpression:
(operator-5 (parser-state) ((the-test 1 2 3) => result (a-clause-form)) ((#\Y) (another-clause-form)))
where => is the auxiliary syntax exported by (rnrs base (6)).
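Here is a runnable sketch of such a test function used through =>. The function digit-value is hypothetical: it returns the value of a character as a digit in a given radix (up to 16), or #f, and its radix argument plays the role of the extra arguments 1 2 3 above. The string device arguments str and pos are also assumptions of the sketch.

```scheme
(import (rnrs))

;; Hypothetical test function: the digit value of CH in RADIX, or #f.
(define (digit-value ch radix)
  (let ((v (cond ((char<=? #\0 ch #\9)
                  (- (char->integer ch) (char->integer #\0)))
                 ((char<=? #\a ch #\f)
                  (+ 10 (- (char->integer ch) (char->integer #\a))))
                 (else #f))))
    (and v (< v radix) v)))

(define (operator-5 str pos radix)
  (let ((ch (if (< pos (string-length str))
                (string-ref str pos)
                (eof-object))))
    (cond ((eof-object? ch) 'error)
          ((digit-value ch radix)
           => (lambda (result)         ; RESULT is the test's value
                (list 'digit result)))
          ((char=? #\Y ch) 'y)
          (else 'error))))
```

Any value other than #f selects the clause, including 0, because 0 is a true value in Scheme.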