The action of a rule is evaluated when the corresponding pattern is matched. The result of its evaluation is the result that the lexical analyser returns to its caller.
We can think of an action like this: it is a form which is placed in the body of a lambda function, which in turn is invoked when a token matching the regular expression is found. So the following specification:
    decint [0-9]+
    %%
    {decint} (string->number yytext)
will cause the following code to be put in the generated lexer tables:
(lambda (yytext) (string->number yytext))
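To make the analogy concrete, here is a minimal sketch in plain Scheme. The name `action` is hypothetical and only stands for the procedure stored in the tables; the real generated procedure may take more formals than `yytext` alone.

```scheme
;; Sketch only: `action' is a hypothetical name for the generated procedure.
(define action
  (lambda (yytext)
    (string->number yytext)))

;; The lexer would apply it to the matched lexeme, for example:
(action "123")
```

Applied to the lexeme "123", the procedure evaluates to the number 123, which is what the lexer would return to its caller.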
The arguments in the formals of the lambda are local bindings we can use in our actions. There are a few local bindings that are accessible by the action when it is evaluated: yycontinue, yygetc, yyungetc, yytext, yyline, yycolumn and yyoffset.
yycontinue
Contains the lexical analysis function itself. Use (yycontinue) to ask for the next token. Typically, the action associated with a pattern that matches white space is a call to yycontinue; it has the effect of skipping the white space.
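As an illustration of skipping white space, the fragment below is a sketch of a specification that extends the decint example. The `blank` macro and the exact character class syntax are assumptions made for this example.

```scheme
; Sketch of a specification; the `blank' macro is an illustrative assumption.
blank   [ \n]
decint  [0-9]+
%%
; Calling yycontinue in tail position returns the next token instead of the blanks.
{blank}+   (yycontinue)
{decint}   (string->number yytext)
```

Because the call to yycontinue is the whole action, its result becomes the result of the rule, so the caller never sees a token for the white space.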
yygetc, yyungetc
Contain functions to get and unget characters from the input of the analyser. They take no arguments. yygetc returns a character, or the ‘(eof-object)’ value if the end-of-input is reached.
They should be used to read characters instead of accessing the input port directly, because the analyser may have read more characters in order to have a lookahead.
If we get more characters than we unget, the extra characters are skipped by the lexer function at its next invocation. If we want to perform a lookahead without losing characters, we must unget all the characters we have got.
It is incorrect to try to unget more characters than have been gotten since the parsing of the last token. If such an attempt is made, yyungetc silently refuses.
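The two behaviours described above, consuming extra input and looking ahead without losing it, can be sketched with two rules. The patterns, the shape of the returned values, and the choice to call yyungetc only when a real character was read are assumptions made for this example.

```scheme
; Sketch of two rules; patterns and return values are illustrative assumptions.
decint  [0-9]+
%%
; Consume the rest of the line. Characters that are gotten and never
; ungotten are skipped, so the next token starts after the comment.
";"        (let loop ((c (yygetc)))
             (if (or (eof-object? c) (char=? c #\newline))
                 (yycontinue)
                 (loop (yygetc))))
; Peek at one character, then give it back so that it is not lost.
{decint}   (let ((c (yygetc)))
             (if (char? c) (yyungetc))
             (cons (string->number yytext)
                   (and (char? c) (char=? c #\.))))
```

In the second rule the pair returned to the caller holds the number and a boolean telling whether a dot follows it; the dot itself stays in the input for the next invocation.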
yytext
Bound to a string containing the lexeme. This string is guaranteed not to be mutated. The string is created only if the action seems to need it. The action is considered to need the lexeme when ‘yytext’ appears somewhere in the text of the action.
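The following sketch contrasts an action that does not mention yytext with one that does. The "+" rule and the symbol it returns are assumptions made for this example.

```scheme
; Sketch only; the "+" rule and the symbol `plus' are illustrative assumptions.
decint  [0-9]+
%%
; yytext does not appear here, so the lexeme string need not be created.
"+"        'plus
; yytext appears here, so the lexeme string is created for the action.
{decint}   (string->number yytext)
```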
yyline, yycolumn, yyoffset
Indicate the position in the input at the beginning of the lexeme. yyline is the number of the line; the first line is numbered 1. yycolumn is the number of the column; the first column is numbered 1.
It is important to mention that characters such as the tab generate a variable-length output when they are printed. So it would be more accurate to say that yycolumn is the index of the first character of the lexeme, starting at the beginning of the line.
yyoffset indicates the distance from the beginning of the input; the first lexeme has offset 0.
The three bindings may not all exist, depending on the options given to the function lex when generating the tables.
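As a sketch of how the position bindings might be used, the rule below returns the value of the token together with its location. It assumes the tables were generated with options that make all three bindings available; the list layout is an arbitrary choice for this example.

```scheme
; Sketch only; assumes yyline, yycolumn and yyoffset are all available.
decint  [0-9]+
%%
; Return the number along with the position where its lexeme begins.
{decint}   (list 'number (string->number yytext) yyline yycolumn yyoffset)
```

A parser receiving such a list could use the line and column to report errors, and the offset to index into the original input.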
A default action is provided for a rule when its action is omitted.
It is clearer (and normally more useful) to specify the action associated with each rule explicitly.