Libraries for Vicare Scheme: srfi regexps compatibility

2.39.5 Compatibility levels and features

We specify a thorough, though not exhaustive, syntax with many extensions popular in modern regular expression libraries such as PCRE. We do so because we expect such libraries to serve as the underlying implementation in many cases and their features to be desirable. If the features were left unspecified, implementers would provide their own, often incompatible, extensions.

On the other hand, not all implementations will be able to support every extension. Some extensions are difficult to provide in DFA-based matchers, and some, such as ‘backref’, are prohibitively expensive in any implementation. Furthermore, an implementation that supports Unicode may still have a regexp library that does not.

To resolve these differences, we divide the syntax into a minimal core, which all implementations are required to support, and a set of optional extensions. In R7RS, or in other implementations that support SRFI-0 ‘cond-expand’ with library-level features, the availability of each extension can be tested with the following ‘cond-expand’ features:

regexp-non-greedy: The non-greedy repetition patterns ‘??’, ‘*?’, and ‘**?’ are supported.
regexp-look-around: The ‘[neg-]look-ahead’ and ‘[neg-]look-behind’ zero-width assertions are supported.
regexp-backrefs: The ‘backref’ pattern is supported.
regexp-unicode: Regexp character sets support Unicode.
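As a minimal sketch of how a program might test one of these features, the code below assumes an R7RS system that provides SRFI 115 under the library name ‘(srfi 115)’. That library name, and whether a given implementation (Vicare included) actually registers these feature identifiers, are assumptions. The name ‘tag-re’ is hypothetical. The pattern uses non-greedy repetition when ‘regexp-non-greedy’ is available and otherwise falls back to a core-only pattern that excludes ‘>’ from the repeated set.

```scheme
(import (scheme base) (scheme write) (srfi 115))  ; (srfi 115) is an assumed library name

;; Hypothetical pattern: match the first "<...>" tag in a string.
(define tag-re
  (cond-expand
    (regexp-non-greedy
     ;; Non-greedy repetition stops at the first ">".
     (regexp '(: "<" (*? any) ">")))
    (else
     ;; Core-only fallback: repeat any character except ">".
     (regexp '(: "<" (* (~ (">"))) ">")))))

(let ((m (regexp-search tag-re "<b>bold</b>")))
  (display (regexp-match-submatch m 0))  ; prints <b>
  (newline))
```

Both branches find the same leftmost tag; the fallback is only a faithful substitute here because the closing delimiter is a single character.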

The first three simply refer to support for certain SRE patterns.
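For illustration, here is what one of these patterns looks like in use: a search for a word that is immediately repeated, which requires ‘backref’. This is a sketch assuming an R7RS system that provides SRFI 115 as ‘(srfi 115)’ (an assumed library name). The name ‘doubled-word-re’ is hypothetical. The pattern is only compiled when the ‘regexp-backrefs’ feature is present.

```scheme
(import (scheme base) (scheme write) (srfi 115))  ; (srfi 115) is an assumed library name

;; Hypothetical pattern: a word followed by a space and the same word again.
;; ($ ...) is numbered submatch 1; (backref 1) must match that same text.
(define doubled-word-re
  (cond-expand
    (regexp-backrefs
     (regexp '(: ($ (+ alpha)) " " (backref 1))))
    (else #f)))  ; no backref support: leave the pattern undefined

(if doubled-word-re
    (let ((m (regexp-search doubled-word-re "it is the the end")))
      (display (regexp-match-submatch m 0))  ; prints the the
      (newline))
    (begin
      (display "backref is not supported")
      (newline)))
```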

‘regexp-unicode’ indicates support for Unicode contexts. Matching can be switched between Unicode and ASCII contexts with the ‘w/unicode’ and ‘w/ascii’ patterns. In a Unicode context, the named character sets have their full Unicode definitions, as described below, and grapheme boundaries are those of the “extended grapheme clusters” defined in UAX #29 (Unicode Text Segmentation). Implementations that provide this feature may still support non-Unicode characters.
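The effect of switching contexts can be seen with the named set ‘alpha’ and a word containing a non-ASCII letter. This is a minimal sketch assuming an R7RS system that provides SRFI 115 as ‘(srfi 115)’ (an assumed library name). The variable ‘word’ is hypothetical. The string escape ‘\xE9;’ spells a precomposed e-acute (U+00E9), so that the source stays ASCII.

```scheme
(import (scheme base) (scheme write) (srfi 115))  ; (srfi 115) is an assumed library name

;; Hypothetical input: "café" written with a precomposed e-acute.
(define word "caf\xE9;")

(cond-expand
  (regexp-unicode
   ;; ASCII context: alpha covers only a-z and A-Z, so the final
   ;; character cannot match and the whole-string match fails.
   (display (regexp-matches? (regexp '(w/ascii (+ alpha))) word))    ; prints #f
   (newline)
   ;; Unicode context: alpha uses the full Unicode definition,
   ;; so every character of the word matches.
   (display (regexp-matches? (regexp '(w/unicode (+ alpha))) word))  ; prints #t
   (newline))
  (else
   (display "regexp-unicode is not available")
   (newline)))
```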