Libraries for Vicare Scheme: parser logic api

11.3 Programming interface to parser definition

The following bindings are exported by the library (vicare parser-logic).

Syntax: define-parser-logic ?definer ?ch ?next ?fail ?operator-spec ...

Auxiliary Syntax: :end-of-input

Define an abstract parser specifying the rules for parsing the input characters through calls to a set of operator functions; the result of the expansion is a syntax definition which can be used to instantiate a concrete parser by combining the parser logic with the input device logic.

The input arguments are:

?definer

It must be an identifier. It is bound to the generated syntax definition; such syntax is used as follows:

(?definer ?device-logic (?operator-name …))

where: ?device-logic is the identifier bound to the device logic syntax; the ?operator-name are identifiers among the public operator function names.

?ch

It must be an identifier. When a character is successfully extracted from the input device, it is bound to this identifier and made available to the operator clauses.

?next

It must be an identifier. The device logic rule :generate-end-of-input-or-char-tests must bind it to a syntax; such syntax must expand to a tail–call to an operator processing the next input character. ?next is used as follows in the operator clauses:

(?next ?operator-name ?operator-arg …)

and it should expand to something like:

(?operator-name ?device-arg … ?operator-arg …)

where: ?device-arg are the arguments representing the input device state; ?operator-arg are the arguments representing the parser state as specified in the ?operator-spec.

?fail

It must be an identifier. The device logic rule :generate-end-of-input-or-char-tests must bind it to a syntax; such syntax is used to handle parsing errors detected by the operator clauses. ?fail is simply used as (?fail).

Each ?operator-spec must have the form:

(?operator-name (?operator-arg …) ?operator-clause …)

where:

?operator-name

Must be an identifier. It is bound to a generated operator function.

There is no difference in the way public and private operators are specified; the names of the public operators are listed in the concrete parser definition. An operator can be public in one concrete parser and private in another.

?operator-arg

Must be identifiers; they are the formal arguments representing the parser state.

?operator-clause

Are symbolic expressions specifying the input accepted by the operator.

Each ?operator-clause must have one of the formats:

((?char0 ?char …) ?body0 ?body …)

Each ?char must be an expression evaluating to a Scheme character object. The ?body forms are evaluated if the input character bound to ?ch is equal, according to char=?, to one among the ?char characters.

((?func ?expr …) => ?ret ?body0 ?body …)

?func must be an expression evaluating to a function; the ?expr must be expressions; ?ret must be an identifier. The ?body forms are evaluated if the form:

(?func ?ch ?expr …)

evaluates to a true value; such true value is bound to ?ret prior to evaluating the ?body.

((:end-of-input) ?body0 ?body …)

The ?body forms are evaluated if no more characters are available from the input device. This clause is to be used by operators accepting the end–of–input state as valid; if no such clause is present, the end–of–input is an error and the device logic rule :unexpected-end-of-input is used to handle it.
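As an illustration of the three clause formats, the following is a minimal sketch of a signed decimal integer parser, assuming the string device logic string->token-or-false exported by this library. The names digit-value, parse-integer, define-integer-parser and the three operator names are hypothetical, introduced only for this example; they are not part of the library.

```scheme
#!r6rs
(import (vicare)
  (vicare parser-logic))

;; Hypothetical helper: return the numeric value of CH if it is a
;; decimal digit, otherwise #f.  Note that 0 counts as true in Scheme.
(define (digit-value ch)
  (and (char<=? #\0 ch #\9)
       (- (char->integer ch) (char->integer #\0))))

(define (parse-integer input-string)
  (define-parser-logic define-integer-parser ch next fail
    ;; Entry point: an optional sign or the first digit.
    (%parse-sign ()
      ((#\+ #\-)                        ;character list clause
       (next %parse-first-digit (if (char=? ch #\-) -1 1)))
      ((digit-value) => D               ;function clause
       (next %parse-digits 1 D)))
    ;; After a sign at least one digit is required, so there is
    ;; no :end-of-input clause here.
    (%parse-first-digit (sign)
      ((digit-value) => D
       (next %parse-digits sign D)))
    ;; Zero or more further digits; end-of-input is accepted.
    (%parse-digits (sign accum)
      ((:end-of-input)
       (* sign accum))
      ((digit-value) => D
       (next %parse-digits sign (+ D (* 10 accum))))))

  ;; Instantiate the concrete parser; only %PARSE-SIGN is public.
  (define-integer-parser string->token-or-false
    (%parse-sign))

  ;; Device arguments first (string, length, index), then parser state.
  (%parse-sign input-string (string-length input-string) 0))

(parse-integer "123")   ; expected: 123
(parse-integer "-42")   ; expected: -42
(parse-integer "12x")   ; expected: #f, invalid input character
(parse-integer "-")     ; expected: #f, unexpected end-of-input
```

The ?fail syntax is not needed in this sketch, because every rejection is already covered by the device logic rules for invalid characters and unexpected end–of–input.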

Auxiliary Syntax: :introduce-device-arguments

Auxiliary Syntax: :generate-end-of-input-or-char-tests

Auxiliary Syntax: :unexpected-end-of-input

Auxiliary Syntax: :generate-delimiter-test

Auxiliary Syntax: :invalid-input-char

Identifiers used to specify device logic syntax rules; they must be used in a syntax definition like:

(define-syntax device-logic
  (syntax-rules (:introduce-device-arguments
                 :generate-end-of-input-or-char-tests
                 :unexpected-end-of-input
                 :generate-delimiter-test
                 :invalid-input-char)
    ((_ :introduce-device-arguments          ---) ---)
    ((_ :generate-end-of-input-or-char-tests ---) ---)
    ((_ :unexpected-end-of-input             ---) ---)
    ((_ :generate-delimiter-test             ---) ---)
    ((_ :invalid-input-char                  ---) ---)))

the rules have the following syntax:

:introduce-device-arguments

The input form is:

(_ :introduce-device-arguments ?kont . ?rest)

this rule introduces a list of identifiers used as device–specific arguments; they will be the first arguments for each parser operator function. The output form must be:

(?kont (?device-arg …) . ?rest)

where the ?device-arg are identifiers.

:generate-end-of-input-or-char-tests

The input form is:

(_ :generate-end-of-input-or-char-tests
   ?ch ?next ?fail
   (?device-arg …)
   ?end-of-input-kont ?parse-input-char-kont)

this rule is used to generate the input device tests for an operator function. The expanded code must first test for the end–of–input state and then proceed to evaluate code for the input character; in pseudocode the output form should be:

(if (end-of-input? ?device-arg ...)
    ?end-of-input-kont
  (let ((?ch (get-next-char ?device-arg ...)))
    ?parse-input-char-kont))

?ch is an identifier. The input character must be bound to it before evaluating ?parse-input-char-kont.

?next is an identifier. This rule must bind it to a syntax used to tail–call another operator using ?device-arg as first arguments; for example:

(define-syntax ?next
  (syntax-rules ()
    ((_ ?operator-name ?operator-arg ...)
     (?operator-name ?device-arg ... ?operator-arg ...))))

?fail is an identifier. This rule must bind it to a syntax used to signal an error detected by an operator clause; for example:

(define-syntax ?fail
  (syntax-rules ()
    ((_)
     (error #f "invalid input character"
       ?device-arg ...))))

The ?device-arg are the identifiers introduced by :introduce-device-arguments.

?end-of-input-kont is a form to be evaluated whenever the end–of–input is detected.

?parse-input-char-kont is a form to be evaluated whenever a character is extracted from the input device.

:unexpected-end-of-input

The input form is:

(_ :unexpected-end-of-input (?device-arg …))

whenever the end–of–input is found by an operator that does not accept it as valid, this rule is used to decide what to do.

The ?device-arg are the identifiers introduced by :introduce-device-arguments.

The output form can return a value or raise an exception; the returned value becomes the return value of the call to the parser.

:generate-delimiter-test

The input form is:

(_ :generate-delimiter-test
   ?ch
   ?ch-is-delimiter-kont
   ?ch-is-not-delimiter-kont)

this rule is used for input devices for which the lexeme string is embedded into a sequence of other characters, so there exists a set of characters that delimit the end–of–lexeme. The parser delegates to the device the responsibility of knowing which characters are delimiters, if any.

?ch is an identifier bound to the input character. ?ch-is-delimiter-kont is a form to be evaluated whenever ?ch is a delimiter character. ?ch-is-not-delimiter-kont is a form to be evaluated whenever ?ch is not a delimiter character.

For parsers accepting a full Scheme string as lexeme: there are no delimiters, the end–of–lexeme is the end–of–input; such parsers should just use ?ch-is-not-delimiter-kont as output form.

For parsers having delimiter characters, recognised for example by a function like:

(define (delimiter? ch)
  (or (char=? ch #\space)
      (char=? ch #\linefeed)))

the output form should be something like:

(if (delimiter? ?ch)
    ?ch-is-delimiter-kont
  ?ch-is-not-delimiter-kont)
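To show the delimiter rule in context, here is a sketch of a complete device logic that reads from a Scheme string and stops at the first delimiter. It is modelled on string->token-or-false and differs only in the :generate-delimiter-test rule. The names string->delimited-token-or-false, parse-word, define-word-parser and %parse-word are hypothetical, and the sketch assumes that a delimiter found by an operator with an :end-of-input clause causes that clause's body to be evaluated.

```scheme
#!r6rs
(import (vicare)
  (vicare parser-logic))

;; Delimiter predicate, as in the text above.
(define (delimiter? ch)
  (or (char=? ch #\space)
      (char=? ch #\linefeed)))

;; Hypothetical device logic: string input, delimiters end the lexeme.
(define-syntax string->delimited-token-or-false
  (syntax-rules (:introduce-device-arguments
                 :generate-end-of-input-or-char-tests
                 :unexpected-end-of-input
                 :generate-delimiter-test
                 :invalid-input-char)
    ((_ :introduce-device-arguments ?kont . ?rest)
     (?kont (input.string input.length input.index) . ?rest))
    ((_ :invalid-input-char (?input.string ?input.length ?input.index) ?ch)
     #f)
    ((_ :unexpected-end-of-input (?input.string ?input.length ?input.index))
     #f)
    ;; The only rule that differs from STRING->TOKEN-OR-FALSE.
    ((_ :generate-delimiter-test ?ch ?ch-is-delimiter-kont ?ch-is-not-delimiter-kont)
     (if (delimiter? ?ch)
         ?ch-is-delimiter-kont
       ?ch-is-not-delimiter-kont))
    ((_ :generate-end-of-input-or-char-tests
        ?ch ?next ?fail
        (?input.string ?input.length ?input.index)
        ?end-of-input-kont ?parse-input-char-kont)
     (let-syntax
         ((?fail (syntax-rules ()
                   ((_) #f)))
          (?next (syntax-rules ()
                   ((_ ?operator-name ?operator-arg (... ...))
                    (?operator-name ?input.string ?input.length
                                    (fx+ 1 ?input.index)
                                    ?operator-arg (... ...))))))
       (if (fx=? ?input.index ?input.length)
           ?end-of-input-kont
         (let ((?ch (string-ref ?input.string ?input.index)))
           ?parse-input-char-kont))))))

;; Hypothetical parser: collect alphabetic characters up to the
;; first delimiter or the end of the string.
(define (parse-word input-string)
  (define-parser-logic define-word-parser ch next fail
    (%parse-word (accum)
      ((:end-of-input)
       (list->string (reverse accum)))
      ((char-alphabetic?) => ok
       (next %parse-word (cons ch accum)))))
  (define-word-parser string->delimited-token-or-false
    (%parse-word))
  (%parse-word input-string (string-length input-string) 0 '()))

(parse-word "hello world")   ; expected: "hello", stops at the space
(parse-word "he7lo")         ; expected: #f, #\7 is neither letter nor delimiter
```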

:invalid-input-char

The input form is:

(_ :invalid-input-char (?device-arg …) ?ch)

whenever an input character is not accepted by an operator function this rule is used to decide what to do.

The ?device-arg are the identifiers introduced by :introduce-device-arguments; ?ch is an identifier bound to the invalid input character.

The output form can return a value or raise an exception; the returned value becomes the return value of the call to the parser.

Syntax: string->token-or-false ?keyword ?arg ...

Define the device logic to parse a lexeme from a full Scheme string object as in string->number. It is implemented as follows:

(define-syntax string->token-or-false
  (syntax-rules (:introduce-device-arguments
                 :generate-end-of-input-or-char-tests
                 :unexpected-end-of-input
                 :generate-delimiter-test
                 :invalid-input-char)
    ((_ :introduce-device-arguments
        ?kont . ?rest)
     (?kont (input.string input.length input.index) . ?rest))

    ((_ :invalid-input-char
        (?input.string ?input.length ?input.index)
        ?ch)
     #f)

    ((_ :unexpected-end-of-input
        (?input.string ?input.length ?input.index))
     #f)

    ((_ :generate-delimiter-test
        ?ch ?ch-is-delimiter-kont ?ch-is-not-delimiter-kont)
     ?ch-is-not-delimiter-kont)

    ((_ :generate-end-of-input-or-char-tests
        ?ch ?next ?fail
        (?input.string ?input.length ?input.index)
        ?end-of-input-kont ?parse-input-char-kont)
     (let-syntax
         ((?fail (syntax-rules ()
                   ((_) #f)))
          (?next (syntax-rules ()
                   ((_ ?operator-name ?operator-arg (... ...))
                    (?operator-name ?input.string ?input.length
                                    (fx+ 1 ?input.index)
                                    ?operator-arg (... ...))))))
       (if (fx=? ?input.index ?input.length)
           ?end-of-input-kont
         (let ((?ch (string-ref ?input.string ?input.index)))
           ?parse-input-char-kont))))
    ))
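The calling convention that follows from this device logic can be sketched as below: each operator takes the string, its length and the current index as leading arguments, followed by the parser state, and a rejected input yields #f. The names count-as, define-as-parser, %parse-first and %parse-rest are hypothetical, chosen only for this example.

```scheme
#!r6rs
(import (vicare)
  (vicare parser-logic))

;; Hypothetical parser: accept one or more #\a characters and
;; return how many were read.
(define (count-as input-string)
  (define-parser-logic define-as-parser ch next fail
    ;; At least one #\a is required: no :end-of-input clause.
    (%parse-first ()
      ((#\a)
       (next %parse-rest 1)))
    (%parse-rest (count)
      ((:end-of-input)
       count)
      ((#\a)
       (next %parse-rest (+ 1 count)))))

  (define-as-parser string->token-or-false
    (%parse-first))

  ;; input.string, input.length, input.index; no parser state here.
  (%parse-first input-string (string-length input-string) 0))

(count-as "aaa")   ; expected: 3
(count-as "")      ; expected: #f, from the :unexpected-end-of-input rule
(count-as "aab")   ; expected: #f, from the :invalid-input-char rule
```

Since both error rules expand to #f, a caller cannot tell the two failures apart; a device logic that needs the distinction could raise different exceptions from those rules instead.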