

53.1 A lexer example for a calculator

The following is a lexer specification file that can be used to tokenise a mathematical expression.

blanks          [ \9\10\13]+

decint          [0-9]+
binint          #[bB][01]+
octint          #[oO][0-7]+
hexint          #[xX][0-9A-Fa-f]+
integer         {decint}|{binint}|{octint}|{hexint}

exponent        ([eE][+\-]?[0-9]+)
truereal        [0-9]+\.|[0-9]*\.[0-9]+{exponent}?|[0-9]+{exponent}
real            {truereal}|{integer}

imag            ({decint}|{real})i

nan             \-nan\.0|\+nan\.0|nan\.0
pinf            \+inf\.0|inf\.0
minf            \-inf\.0

initial         [a-zA-Z_]
subsequent      {initial}|[0-9\.!$&:<=>?~\-]
symbol          {initial}{subsequent}*

operator        <=|>=|//|[\+\-*/%\^<>=]

comma           ,

oparen          \(
cparen          \)

%%
{blanks}        ;; skip blanks, tabs and newlines
{imag}          (string->number (string-append "+" yytext))
{real}          (string->number yytext)
{nan}           +nan.0
{pinf}          +inf.0
{minf}          -inf.0
{operator}      (case (string->symbol yytext)
                    ((+) '+)
                    ((-) '-)
                    ((*) '*)
                    ((/) '/)
                    ((%) 'mod)
                    ((^) 'expt)
                    ((//) 'div)
                    ((=) '=)
                    ((<) '<)
                    ((>) '>)
                    ((<=) '<=)
                    ((>=) '>=))
{symbol}        (string->symbol yytext)
{comma}         'cons

{oparen}        #\(
{cparen}        #\)

<<EOF>>         (eof-object)
<<ERROR>>       (assertion-violation #f "invalid lexer token")
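
The actions in the specification are ordinary Scheme expressions evaluated with the matched text bound to yytext. As a minimal sketch, two of them can be tried outside the lexer by wrapping them in procedures; the names imag-action and operator-action are hypothetical, introduced here only for illustration, and the code assumes nothing beyond the (rnrs) library. It also shows why the {imag} action prepends "+": standard number syntax requires an explicit sign on a pure imaginary literal.

```scheme
(import (rnrs))

;; Hypothetical wrapper around the {imag} action: YYTEXT is the
;; matched text, for example "2i".
(define (imag-action yytext)
  (string->number (string-append "+" yytext)))

;; Hypothetical wrapper around the {operator} action: map the matched
;; text to the symbol naming the corresponding Scheme procedure.
(define (operator-action yytext)
  (case (string->symbol yytext)
    ((+) '+)   ((-) '-)    ((*) '*)    ((/) '/)
    ((%) 'mod) ((^) 'expt) ((//) 'div)
    ((=) '=)   ((<) '<)    ((>) '>)
    ((<=) '<=) ((>=) '>=)))

(string->number "2i")    ; ⇒ #f, an unsigned imaginary is not a number
(imag-action "2i")       ; ⇒ +2i
(operator-action "%")    ; ⇒ mod
(operator-action "//")   ; ⇒ div
```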

Assuming the file is called calc.l, the table for this lexer can be created with one of the following forms (other forms exist but are not described here):

(import (vicare)
  (prefix (vicare parser-tools silex)       lex.)
  (prefix (vicare parser-tools silex lexer) lex.))

;;Generate a proper Scheme library called "(calc)",
;;containing the table definition, and save it in the
;;file "calc-lib.sls".  Use the default table format.
;;The library exports the table bound to "calc-table".
;;
(lex.lex (lex.input-file:   "calc.l")
         (lex.output-file:  "calc-lib.sls")
         (lex.library-spec: "(calc)")
         (lex.table-name:   'calc-table))

;;Generate a standalone DEFINE form that binds the
;;lexer table to the symbol "calc-table" and save it
;;in the file "calc-def.sls".  Use the Scheme code
;;table format.
;;
(lex.lex (lex.input-file:   "calc.l")
         (lex.output-file:  "calc-def.sls")
         (lex.lexer-format: 'code)
         (lex.table-name:   'calc-table))

;;Generate the lexer table, evaluate it and return it
;;as a value that is immediately usable.  Use the Scheme
;;code table format.
;;
(define calc-table
  (lex.lex (lex.input-file:   "calc.l")
           (lex.output-value: #t)
           (lex.lexer-format: 'code)))

Once the lexer table has been created and bound, say, to ‘calc-table’, we can use it as follows. The loop relies on the fact that the lexer closure returns the ‘(eof-object)’ value when the input reaches its end.

(define (tokenize table string)
  (let* ((IS    (lex.make-IS (lex.string: string)))
         (lexer (lex.make-lexer table IS)))
    ;; Call the lexer until it returns the EOF object,
    ;; accumulating the tokens in reverse order.
    (do ((token (lexer) (lexer))
         (out   '()))
        ((eof-object? token)
         (reverse out))
      (set-cons! out token))))

(tokenize calc-table "1*(2/3)")
⇒ (1 * #\( 2 / 3 #\))

(tokenize calc-table "fun(1+a, sin(2), 3, 4)")
⇒ (fun #\( 1 + a cons sin #\( 2 #\) cons 3 cons 4 #\))
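
The accumulation loop does not depend on the lexer itself: any thunk that eventually returns ‘(eof-object)’ can be drained the same way. The following is a minimal sketch in plain R6RS that avoids set-cons! by using a named let. Both collect-tokens and make-stub-lexer are hypothetical names, and the stub only imitates a lexer closure by handing out a fixed list of tokens.

```scheme
(import (rnrs))

;; Hypothetical stand-in for a lexer closure: a thunk that returns the
;; elements of TOKENS one at a time, then the EOF object.
(define (make-stub-lexer tokens)
  (lambda ()
    (if (null? tokens)
        (eof-object)
        (let ((token (car tokens)))
          (set! tokens (cdr tokens))
          token))))

;; Call LEXER until it returns the EOF object; return the collected
;; tokens in the order in which they were produced.
(define (collect-tokens lexer)
  (let loop ((token (lexer))
             (out   '()))
    (if (eof-object? token)
        (reverse out)
        (loop (lexer) (cons token out)))))

(collect-tokens (make-stub-lexer '(1 * #\( 2 / 3 #\))))
; ⇒ (1 * #\( 2 / 3 #\))
```

With a real table, the thunk returned by lex.make-lexer would be passed to collect-tokens in place of the stub.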
