Libraries for Vicare Scheme: silex format

53.7 Tables output format

SILex provides three different table encodings: the decision tree encoding, the portable encoding and the “compilation” to Scheme code; the decision tree is the default.

With the decision tree encoding, the finite automaton of the analyser is represented with data structures holding the integer representations of characters (in the sense of char->integer). This representation is the most compact, but it relies on the integer values that R6RS Schemes assign to characters.

With the portable encoding, the data structures describing the automaton contain characters directly. If the automaton, as generated, contains a transition from state s to state t on character c, then somewhere in the table there is the Scheme character ‘#\c’. When the file containing the analyser is loaded in any implementation, the character is read as is, and not as the number ‘(char->integer #\c)’.

This encoding should be portable to non–R6RS Schemes. However, it is less compact, because something like ‘(65 90)’ takes less space than something like ‘(#\A #\B … #\Y #\Z)’ to represent ‘[A-Z]’. Constructing an analyser from a portable table also takes more time than constructing it from a default (decision tree) table. Once built, however, the analyser performs the same in both cases.
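The trade-off between the two forms can be illustrated with a minimal sketch. The procedures ‘range->chars’, ‘in-range?’ and ‘in-chars?’ are hypothetical names introduced for illustration only; they are not part of SILex, and the actual layout of SILex tables may differ.

```scheme
(import (rnrs))

;; Hypothetical helper: expand an inclusive range of integer codes,
;; such as (65 90), into the explicit list of characters it denotes.
(define (range->chars range)
  (let ((lo (car range))
        (hi (cadr range)))
    (let loop ((i hi) (acc '()))
      (if (< i lo)
          acc
          (loop (- i 1) (cons (integer->char i) acc))))))

(define upper-range '(65 90))                    ; compact, integer form
(define upper-chars (range->chars upper-range))  ; explicit, character form

;; Membership test against the integer form: needs char->integer.
(define (in-range? ch range)
  (<= (car range) (char->integer ch) (cadr range)))

;; Membership test against the character form: compares characters only.
(define (in-chars? ch chars)
  (and (memv ch chars) #t))
```

The integer form stores two numbers where the character form stores twenty-six characters, but only the character form avoids any assumption about what ‘char->integer’ returns.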

Note that in some character sets the letters or the digits are not contiguous. In those cases, the regular expression ‘[A-Z]’ does not necessarily accept only the uppercase letters.
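EBCDIC is a well-known example of such a character set; it is not mentioned by SILex itself and is used here only as an illustration. The sketch below hard-codes the EBCDIC codes of the uppercase letters and counts how many codes a single range from ‘A’ to ‘Z’ would cover. All the names are hypothetical.

```scheme
(import (rnrs))

;; EBCDIC codes of the uppercase letters, as three separate runs.
;; The values are hard-coded: Scheme's char->integer is not involved.
(define ebcdic-upper-runs
  '((#xC1 #xC9)     ; A-I
    (#xD1 #xD9)     ; J-R
    (#xE2 #xE9)))   ; S-Z

;; Number of codes in an inclusive run (lo hi).
(define (run-size run)
  (+ 1 (- (cadr run) (car run))))

;; Number of actual uppercase letters.
(define letter-count
  (fold-left + 0 (map run-size ebcdic-upper-runs)))

;; Number of codes covered by one range from the code of A to that of Z.
(define span-size
  (run-size (list #xC1 #xE9)))

;; True only when CODE falls inside one of the three letter runs.
(define (ebcdic-upper? code)
  (exists (lambda (run) (<= (car run) code (cadr run)))
          ebcdic-upper-runs))
```

Here ‘letter-count’ is 26 while ‘span-size’ is 41, so a range built from the end points alone would also accept 15 codes that are not uppercase letters, for instance ‘#xCA’.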

The last encoding is the compilation to Scheme code; it produces a fast lexical analyser. Instead of containing data structures representing the behavior of the automaton, the table contains Scheme code that “hard–codes” the automaton. This encoding often generates big tables. Such an analyser is not portable to non–R6RS Schemes.
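The difference between a table of data and “hard–coded” transitions can be sketched with a tiny automaton that accepts one or more decimal digits. This is an illustration under assumed names (‘arcs’, ‘table-accepts?’, ‘code-accepts?’); it is not the code that SILex generates.

```scheme
(import (rnrs))

;; Table-driven form: each state is a list of (lo hi next-state) arcs
;; over integer character codes.  State 1 is the only accepting state.
(define arcs
  (vector '((48 57 1))     ; state 0: a digit moves to state 1
          '((48 57 1))))   ; state 1: further digits stay in state 1

;; Look up the arc matching CODE; return the next state or #f.
(define (next-state state code)
  (let loop ((as (vector-ref arcs state)))
    (cond ((null? as) #f)
          ((<= (car (car as)) code (cadr (car as))) (caddr (car as)))
          (else (loop (cdr as))))))

(define (table-accepts? str)
  (let loop ((state 0) (i 0))
    (cond ((not state) #f)
          ((= i (string-length str)) (= state 1))
          (else (loop (next-state state (char->integer (string-ref str i)))
                      (+ i 1))))))

;; Hard-coded form: the same automaton written as one procedure per
;; state, with the transitions expressed as inline tests.
(define (code-accepts? str)
  (define len (string-length str))
  (define (digit? i) (char<=? #\0 (string-ref str i) #\9))
  (define (state0 i)
    (and (< i len) (digit? i) (state1 (+ i 1))))
  (define (state1 i)
    (or (= i len)
        (and (digit? i) (state1 (+ i 1)))))
  (state0 0))
```

Both procedures accept the same strings. The second one avoids the table lookup on every character, at the cost of one procedure per state, which suggests why compiled tables tend to be large.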