

2.39.5 Compatibility levels and features

We specify a thorough, though not exhaustive, syntax with many extensions popular in modern regular expression libraries such as PCRE. The reasoning is that such libraries will often be used as the underlying implementation, that the features are desirable, and that if they are left unspecified people will provide their own, often incompatible, extensions.

On the other hand, it is acknowledged that not all implementations will be able to support all extensions. Some are difficult to support in DFA-based implementations, and some, like ‘backref’, are prohibitively expensive for any implementation. Furthermore, even if an implementation has Unicode support, its regexp library may not.

To resolve these differences, we divide the syntax into a minimal core, which all implementations are required to support, and additional extensions. In R7RS, or in other implementations which support SRFI-0 ‘cond-expand’ with library-level features, the availability of each extension can be tested with the following ‘cond-expand’ features:

regexp-non-greedy

The non-greedy repetition patterns ‘??’, ‘*?’, and ‘**?’ are supported.

regexp-look-around

The ‘[neg-]look-ahead’ and ‘[neg-]look-behind’ zero-width assertions are supported.

regexp-backrefs

The ‘backref’ pattern is supported.

regexp-unicode

Regexp character sets support Unicode.

The first three simply refer to support for certain SRE patterns.
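
As a minimal sketch of how a program might adapt to these features, the following chooses between two patterns with ‘cond-expand’. It assumes an R7RS system in which the regexp library can be imported as ‘(srfi 115)’ (the library name may differ in a given implementation); ‘key-pattern’ and ‘key-of’ are hypothetical names used only for illustration.

```scheme
(import (scheme base) (srfi 115))

;; Pick a pattern at expansion time, depending on whether the
;; implementation advertises look-around support.
(cond-expand
  (regexp-look-around
   ;; Match a run of letters only when "=" follows, without consuming it.
   (define key-pattern (rx ($ (+ alpha)) (look-ahead "="))))
  (else
   ;; Core-only fallback: consume the "=" and rely on the submatch.
   (define key-pattern (rx ($ (+ alpha)) "="))))

;; Return the letters before the first "=", or #f when nothing matches.
(define (key-of str)
  (let ((m (regexp-search key-pattern str)))
    (and m (regexp-match-submatch m 1))))
```

In either branch, a call such as ‘(key-of "name=value")’ would be expected to return the letters preceding the ‘=’. Only the text consumed by the overall match differs.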

‘regexp-unicode’ indicates support for Unicode contexts. Toggling between Unicode and ASCII can be done with the ‘w/unicode’ and ‘w/ascii’ patterns. In a Unicode context, the named character sets have their full Unicode definition as described below and grapheme boundaries are “extended grapheme clusters” as defined in UAX #29 (Unicode Text Segmentation). Implementations which provide this feature may still support non-Unicode characters.
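
The toggling described above might be used as in the following sketch. It again assumes the library is importable as ‘(srfi 115)’; ‘word’ and ‘ascii-word’ are hypothetical names, and the fallback branch deliberately makes no assumption about which definition of ‘alpha’ is in effect.

```scheme
(import (scheme base) (srfi 115))

(cond-expand
  (regexp-unicode
   ;; Letters in the full Unicode sense.
   (define word (rx (w/unicode (+ alpha))))
   ;; The same named set, restricted to its ASCII definition.
   (define ascii-word (rx (w/ascii (+ alpha)))))
  (else
   ;; Without the feature, do not rely on Unicode definitions.
   (define word (rx (+ alpha)))
   (define ascii-word word)))
```

When the feature is present, one would expect ‘(regexp-matches? word "λόγος")’ to succeed and the same call with ‘ascii-word’ to fail, since the Greek letters fall outside the ASCII definition of ‘alpha’.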

