We specify a thorough, though not exhaustive, syntax with many extensions popular in modern regular expression libraries such as PCRE. We do so for three reasons: such libraries are expected to serve as the underlying implementation in many cases, the features are desirable, and features left unspecified would be filled in by individual, often incompatible, extensions.
On the other hand, it is acknowledged that not all implementations will be able to support all extensions. Some are difficult to support in DFA-based implementations, and some, like ‘backref’, are prohibitively expensive for any implementation. Furthermore, even if an implementation has Unicode support, its regexp library may not.
To resolve these differences we divide the syntax into a minimal core which all implementations are required to support, and additional extensions. In R7RS or other implementations which support SRFI-0 cond-expand with library level features, the availability can be tested with the following cond-expand features:
regexp-non-greedy
The non-greedy repetition patterns ‘??’, ‘*?’, and ‘**?’ are supported.
regexp-look-around
The ‘[neg-]look-ahead’ and ‘[neg-]look-behind’ zero-width assertions are supported.
regexp-backrefs
The ‘backref’ pattern is supported.
regexp-unicode
Regexp character sets support Unicode.
The first three simply refer to support for certain SRE patterns.
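As a minimal sketch of how such a feature test might look, the following guards two SREs with cond-expand. It assumes an R7RS system, where cond-expand may appear in expression position. The names ‘doubled-word-sre’ and ‘quoted-sre’ are hypothetical and chosen only for illustration.

```scheme
(import (scheme base))

;; Hypothetical example: an SRE matching a repeated word such as "the the".
;; It relies on `backref', so it is only an SRE when the implementation
;; advertises regexp-backrefs; otherwise it is #f and the caller has to
;; fall back to some other strategy.
(define doubled-word-sre
  (cond-expand
    (regexp-backrefs '(: ($ (+ alpha)) (+ space) (backref 1)))
    (else #f)))

;; Hypothetical example: a double-quoted string.  The first branch uses the
;; non-greedy `*?' pattern; the fallback uses a greedy repetition over a
;; complemented character set, which is intended to find the same matches
;; when searching.
(define quoted-sre
  (cond-expand
    (regexp-non-greedy '(: #\" (*? any) #\"))
    (else '(: #\" (* (~ #\")) #\"))))
```

Providing a fallback in the ‘else’ clause, rather than omitting the definition, keeps the rest of the program independent of which extensions happen to be available.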
‘regexp-unicode’ indicates support for Unicode contexts. Toggling between Unicode and ASCII can be done with the ‘w/unicode’ and ‘w/ascii’ patterns. In a Unicode context, the named character sets have their full Unicode definition as described below and grapheme boundaries are “extended grapheme clusters” as defined in UAX #29 (Unicode Text Segmentation). Implementations which provide this feature may still support non-Unicode characters.
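The context toggling described above could be sketched as follows, again assuming R7RS cond-expand in expression position. The names ‘ascii-word-sre’ and ‘word-sre’ are hypothetical. The comment on ASCII behaviour reflects the reading that named character sets are limited to ASCII inside ‘w/ascii’.

```scheme
(import (scheme base))

;; Hypothetical names.  Inside w/ascii, `alpha' is taken to be limited to
;; ASCII letters; inside w/unicode it has its full Unicode definition.
(define ascii-word-sre '(w/ascii (+ alpha)))

;; Request a Unicode context only when the implementation advertises
;; regexp-unicode, and fall back to the ASCII-only pattern otherwise.
(define word-sre
  (cond-expand
    (regexp-unicode '(w/unicode (+ alpha)))
    (else ascii-word-sre)))
```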