SILex provides three different table encodings: the decision tree encoding, the portable encoding and the “compilation” to Scheme code; the decision tree is the default.
With the decision tree encoding, the finite automaton of the analyser is represented with data structures holding the integer representations of the characters (in the sense of ‘char->integer’). This representation is the most compact, but it relies on the character integer representations in R6RS Schemes.
With the portable encoding, the data structures describing the automaton contain characters directly. If the automaton, as generated, contains a transition from state s to state t on character c, then somewhere in the table there is the Scheme character ‘#\c’. When the file containing the analyser is loaded in any implementation, the character is read as is, and not as the number ‘(char->integer #\c)’.
This encoding should be portable to non–R6RS Schemes. However, it is less compact: a range such as ‘(65 90)’ takes less space than an explicit list such as ‘(#\A #\B … #\Y #\Z)’ to represent ‘[A-Z]’. Constructing an analyser from a portable table also takes more time than constructing one from a default table but, once built, the analyser performs the same in both cases.
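As a rough illustration of the size difference, the following sketch tests membership in ‘[A-Z]’ in both styles. The procedures ‘in-range?’ and ‘in-char-list?’ and the variable ‘upper-case-chars’ are hypothetical names chosen for this example; they are not part of SILex, and the actual table layout may differ.

```scheme
;; Hypothetical helpers, not part of SILex.

;; Integer-range form, in the spirit of the default encoding:
;; the class [A-Z] is the pair of bounds (65 90).
(define (in-range? c range)
  (let ((n (char->integer c)))
    (and (<= (car range) n)
         (<= n (cadr range)))))

;; Explicit-character form, in the spirit of the portable encoding:
;; the class [A-Z] is spelled out as 26 character objects.
(define upper-case-chars
  (string->list "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))

(define (in-char-list? c chars)
  (and (memv c chars) #t))

;; The first call assumes an ASCII-compatible char->integer,
;; as in R6RS; the second makes no such assumption.
(in-range? #\Q '(65 90))
(in-char-list? #\Q upper-case-chars)
```

The range form stores two integers whatever the width of the class, whereas the explicit form grows with the number of characters; that is the trade-off described above.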
It is important to note that in some character sets, the letters or the digits are not contiguous. So, in those cases, the regular expression ‘[A-Z]’ does not necessarily accept only the uppercase letters.
The last encoding is the compilation to Scheme code; it produces a fast lexical analyser. Instead of containing data structures representing the behavior of the automaton, the table contains Scheme code that “hard–codes” the automaton. This encoding often generates big tables. Such an analyser is not portable to non–R6RS Schemes.
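The difference between a data table and “hard–coded” Scheme code can be sketched with a tiny automaton for ‘[0-9]+’. This is only an illustration under assumed names (‘digit-table’, ‘run-table’, ‘run-coded’); the code SILex actually generates is laid out differently.

```scheme
;; Hypothetical illustration only; not SILex's generated output.
;; Both recognisers accept a list of one or more decimal digits.

;; Table-driven: the automaton is data interpreted by a generic loop.
;; Each state is (accepting? . transitions); a transition is
;; (low high target) over character integers.
(define digit-table
  (vector
   (cons #f '((48 57 1)))     ; state 0: start, not accepting
   (cons #t '((48 57 1)))))   ; state 1: accepting

(define (next-state transitions n)
  (cond ((null? transitions) #f)
        ((and (<= (car (car transitions)) n)
              (<= n (cadr (car transitions))))
         (caddr (car transitions)))
        (else (next-state (cdr transitions) n))))

(define (run-table table chars)
  (let loop ((state 0) (chars chars))
    (if (null? chars)
        (car (vector-ref table state))
        (let ((target (next-state (cdr (vector-ref table state))
                                  (char->integer (car chars)))))
          (and target (loop target (cdr chars)))))))

;; Hard-coded: each state is a procedure and each transition
;; is an ordinary test, so no table is consulted at run time.
(define (run-coded chars)
  (define (digit? c) (char<=? #\0 c #\9))
  (define (state-0 chars)
    (and (pair? chars)
         (digit? (car chars))
         (state-1 (cdr chars))))
  (define (state-1 chars)
    (cond ((null? chars) #t)
          ((digit? (car chars)) (state-1 (cdr chars)))
          (else #f)))
  (state-0 chars))
```

The hard-coded form needs one procedure per state, which suggests why tables of this kind tend to be large for realistic automata.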