The grammar for SREs is summarized below. Note that an SRE is a
first-class object consisting of nested lists of strings, chars,
char-sets, symbols and numbers. Where the syntax is described
as ‘(foo bar)’, this can be constructed equivalently as
(quote (foo bar)) or (list 'foo 'bar), etc. The following
sections explain the semantics in greater detail.
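Because an SRE is ordinary list data, it can be assembled with any list-building tool. The following is a minimal sketch in portable R7RS Scheme that needs no regexp library; the names sre-quoted, sre-built and at-least-n are hypothetical, chosen only for illustration.

```scheme
(import (scheme base) (scheme write))

;; The same SRE written as a quoted literal ...
(define sre-quoted '(: "ab" (* #\c) (? numeric)))

;; ... and built step by step with LIST.
(define sre-built
  (list ': "ab" (list '* #\c) (list '? 'numeric)))

;; Hypothetical helper: wrap an SRE in a (>= <n> <sre>) repetition
;; using quasiquote, so that N can be computed at run time.
(define (at-least-n n sre)
  `(>= ,n ,sre))

(write (equal? sre-quoted sre-built))   ; both forms are EQUAL? data
(newline)
(write (at-least-n 2 "ab"))
(newline)
```

Building SREs with quasiquote is convenient when part of the pattern (a count, a literal string) is only known at run time.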
    <sre> ::=
     | <string>                    ; A literal string match.
     | <cset-sre>                  ; A character set match.
     | (* <sre> ...)               ; 0 or more matches.
     | (zero-or-more <sre> ...)
     | (+ <sre> ...)               ; 1 or more matches.
     | (one-or-more <sre> ...)
     | (? <sre> ...)               ; 0 or 1 matches.
     | (optional <sre> ...)
     | (= <n> <sre> ...)           ; <n> matches.
     | (exactly <n> <sre> ...)
     | (>= <n> <sre> ...)          ; <n> or more matches.
     | (at-least <n> <sre> ...)
     | (** <n> <m> <sre> ...)      ; <n> to <m> matches.
     | (repeated <n> <m> <sre> ...)

     | (| <sre> ...)               ; Alternation.
     | (or <sre> ...)

     | (: <sre> ...)               ; Sequence.
     | (seq <sre> ...)
     | ($ <sre> ...)               ; Numbered submatch.
     | (submatch <sre> ...)
     | (-> <name> <sre> ...)       ; Named submatch.  <name> is
     | (submatch-named <name> <sre> ...)   ; a symbol.

     | (w/case <sre> ...)          ; Introduce a case-sensitive context.
     | (w/nocase <sre> ...)        ; Introduce a case-insensitive context.

     | (w/unicode <sre> ...)       ; Introduce a unicode context.
     | (w/ascii <sre> ...)         ; Introduce an ascii context.

     | (w/nocapture <sre> ...)     ; Ignore all enclosed submatches.

     | bos                         ; Beginning of string.
     | eos                         ; End of string.

     | bol                         ; Beginning of line.
     | eol                         ; End of line.

     | bog                         ; Beginning of grapheme cluster.
     | eog                         ; End of grapheme cluster.
     | grapheme                    ; A single grapheme cluster.

     | bow                         ; Beginning of word.
     | eow                         ; End of word.
     | nwb                         ; A non-word boundary.
     | (word <sre> ...)            ; An SRE wrapped in word boundaries.
     | (word+ <cset-sre> ...)      ; A single word restricted to a cset.
     | word                        ; A single word.

     | (?? <sre> ...)              ; A non-greedy pattern, 0 or 1 match.
     | (non-greedy-optional <sre> ...)
     | (*? <sre> ...)              ; Non-greedy 0 or more matches.
     | (non-greedy-zero-or-more <sre> ...)
     | (**? <m> <n> <sre> ...)     ; Non-greedy <m> to <n> matches.
     | (non-greedy-repeated <m> <n> <sre> ...)

     | (look-ahead <sre> ...)      ; Zero-width look-ahead assertion.
     | (look-behind <sre> ...)     ; Zero-width look-behind assertion.
     | (neg-look-ahead <sre> ...)  ; Zero-width negative look-ahead assertion.
     | (neg-look-behind <sre> ...) ; Zero-width negative look-behind assertion.

     | (backref <n-or-name>)       ; Match a previous submatch.
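To make a few of these productions concrete, here is a small sketch that combines a sequence, exact repetition and named submatches. It assumes an R7RS implementation that exports the SRFI 115 procedures regexp, regexp-matches and regexp-match-submatch from a library named (srfi 115); the library name differs between implementations. The pattern date-sre is a hypothetical example, not part of the grammar.

```scheme
;; Assumption: SRFI 115 is available under the library name (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; Hypothetical pattern: YYYY-MM-DD with three named submatches.
(define date-sre
  '(: (-> year  (= 4 numeric)) "-"
      (-> month (= 2 numeric)) "-"
      (-> day   (= 2 numeric))))

;; Compile the SRE once, then match it against a whole string.
(define date-rx (regexp date-sre))
(define m (regexp-matches date-rx "2024-01-31"))

;; REGEXP-MATCHES returns #f on failure, so guard before extracting.
(when m
  (write (regexp-match-submatch m 'year))
  (newline))
```

The long names could be used instead of the short ones, for example (seq ...) for (: ...) and (submatch-named ...) for (-> ...); the two spellings of each form are interchangeable in the grammar.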
The grammar for ‘cset-sre’ is as follows.
    <cset-sre> ::=
     | <char>                      ; literal char
     | "<char>"                    ; string of one char
     | <char-set>                  ; embedded SRFI 14 char set
     | (<string>)                  ; literal char set
     | (char-set <string>)
     | (/ <range-spec> ...)        ; ranges
     | (char-range <range-spec> ...)
     | (or <cset-sre> ...)         ; union
     | (and <cset-sre> ...)        ; intersection
     | (- <cset-sre> ...)          ; difference
     | (difference <cset-sre> ...)
     | (~ <cset-sre> ...)          ; complement of union
     | (complement <cset-sre> ...)
     | (w/case <cset-sre> ...)     ; case and unicode toggling
     | (w/nocase <cset-sre> ...)
     | (w/ascii <cset-sre> ...)
     | (w/unicode <cset-sre> ...)
     | any | nonl | ascii | lower-case | lower
     | upper-case | upper | title-case | title
     | alphabetic | alpha | alphanumeric | alphanum | alnum
     | numeric | num | punctuation | punct | symbol
     | graphic | graph | whitespace | white | space
     | printing | print | control | cntrl | hex-digit | xdigit

    <range-spec> ::= <string> | <char>
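The character set forms compose like ordinary set operations. The sketch below illustrates a literal set, a range, a union and a difference. It assumes the SRFI 115 predicate regexp-matches? is exported from a library named (srfi 115); ident-sre, hex-sre and consonants-sre are hypothetical patterns introduced only for this example.

```scheme
;; Assumption: SRFI 115 is available under the library name (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; Union of a named set and the literal set ("_"):
;; a letter or underscore, then letters, digits or underscores.
(define ident-sre
  '(: (or alpha ("_"))
      (* (or alnum ("_")))))

;; Ranges: the characters of the range spec are paired up,
;; giving 0-9, a-f and A-F.
(define hex-sre '(+ (/ "09afAF")))

;; Difference: alphabetic characters minus the ASCII vowels.
(define consonants-sre '(+ (- alpha ("aeiouAEIOU"))))

(write (regexp-matches? ident-sre "foo_bar1"))
(newline)
(write (regexp-matches? hex-sre "c0ffee"))
(newline)
(write (regexp-matches? consonants-sre "rhythm"))
(newline)
```

Note that a one-element list containing a string, such as ("_"), denotes a set of characters, whereas the bare string "_" is a literal string match.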