

2.39.7.1 Summary of SRE syntax

The grammar for SREs is summarized below. Note that an SRE is a first-class object consisting of nested lists of strings, chars, char-sets, symbols and numbers. Where the syntax is described as ‘(foo bar)’, this can be constructed equivalently as (quote (foo bar)) or (list 'foo 'bar), etc. The following sections explain the semantics in greater detail.
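Because an SRE is ordinary list data, the equivalence between quoted and constructed forms can be checked directly. The following is a minimal sketch, assuming an R7RS implementation that provides the (srfi 115) library; the variable names sre-quoted and sre-listed are illustrative only.

```scheme
(import (scheme base) (scheme write) (srfi 115))

;; Two ways to build the same SRE: one or more digits.
(define sre-quoted '(+ numeric))
(define sre-listed (list '+ 'numeric))

;; Both are plain, structurally equal lists.
(display (equal? sre-quoted sre-listed))                 ; => #t
(newline)

;; `regexp` compiles an SRE value; `regexp-matches?` tests a
;; whole-string match and accepts either an SRE or a compiled regexp.
(display (regexp-matches? (regexp sre-listed) "2024"))   ; => #t
(newline)
(display (regexp-matches? sre-quoted "20x4"))            ; => #f
(newline)
```

Building the list at run time (with list or quasiquote) is what makes it possible to assemble patterns from smaller, named pieces.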

<sre> ::=
 | <string>                    ; A literal string match.
 | <cset-sre>                  ; A character set match.
 | (* <sre> ...)               ; 0 or more matches.
 | (zero-or-more <sre> ...)
 | (+ <sre> ...)               ; 1 or more matches.
 | (one-or-more <sre> ...)
 | (? <sre> ...)               ; 0 or 1 matches.
 | (optional <sre> ...)
 | (= <n> <sre> ...)           ; <n> matches.
 | (exactly <n> <sre> ...)
 | (>= <n> <sre> ...)          ; <n> or more matches.
 | (at-least <n> <sre> ...)
 | (** <n> <m> <sre> ...)      ; <n> to <m> matches.
 | (repeated <n> <m> <sre> ...)

 | (|  <sre> ...)              ; Alternation.
 | (or <sre> ...)

 | (:   <sre> ...)             ; Sequence.
 | (seq <sre> ...)
 | ($ <sre> ...)               ; Numbered submatch.
 | (submatch <sre> ...)
 | (-> <name> <sre> ...)               ;  Named submatch.  <name> is
 | (submatch-named <name> <sre> ...)   ;  a symbol.

 | (w/case   <sre> ...)        ; Introduce a case-sensitive context.
 | (w/nocase <sre> ...)        ; Introduce a case-insensitive context.

 | (w/unicode   <sre> ...)     ; Introduce a unicode context.
 | (w/ascii <sre> ...)         ; Introduce an ascii context.

 | (w/nocapture <sre> ...)     ; Ignore all enclosed submatches.

 | bos                         ; Beginning of string.
 | eos                         ; End of string.

 | bol                         ; Beginning of line.
 | eol                         ; End of line.

 | bog                         ; Beginning of grapheme cluster.
 | eog                         ; End of grapheme cluster.
 | grapheme                    ; A single grapheme cluster.

 | bow                         ; Beginning of word.
 | eow                         ; End of word.
 | nwb                         ; A non-word boundary.
 | (word <sre> ...)            ; An SRE wrapped in word boundaries.
 | (word+ <cset-sre> ...)      ; A single word restricted to a cset.
 | word                        ; A single word.

 | (?? <sre> ...)              ; A non-greedy pattern, 0 or 1 match.
 | (non-greedy-optional <sre> ...)
 | (*? <sre> ...)              ; Non-greedy 0 or more matches.
 | (non-greedy-zero-or-more <sre> ...)
 | (**? <m> <n> <sre> ...)     ; Non-greedy <m> to <n> matches.
 | (non-greedy-repeated <m> <n> <sre> ...)
 | (look-ahead <sre> ...)      ; Zero-width look-ahead assertion.
 | (look-behind <sre> ...)     ; Zero-width look-behind assertion.
 | (neg-look-ahead <sre> ...)  ; Zero-width negative look-ahead assertion.
 | (neg-look-behind <sre> ...) ; Zero-width negative look-behind assertion.

 | (backref <n-or-name>)       ; Match a previous submatch.
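
Several of the forms above can be combined in one pattern. The sketch below, which assumes an R7RS implementation providing (srfi 115), uses sequence, named submatches, bounded repetition, alternation and the string anchors. The pattern key-value-sre and the sample inputs are hypothetical.

```scheme
(import (scheme base) (scheme write) (srfi 115))

;; A key made of letters, "=", then a value that is either
;; 1 to 3 digits or the literal string "none".
(define key-value-sre
  '(: bos
      (-> key (+ alphabetic))
      "="
      (-> val (or (** 1 3 numeric) "none"))
      eos))

(define m (regexp-search key-value-sre "port=80"))

;; Named submatches are retrieved by symbol; index 0 is the whole match.
(display (regexp-match-submatch m 'key))               ; => port
(newline)
(display (regexp-match-submatch m 'val))               ; => 80
(newline)
(display (regexp-match-submatch m 0))                  ; => port=80
(newline)

;; Four digits exceed the (** 1 3 ...) bound, so there is no match.
(display (regexp-matches? key-value-sre "port=8080"))  ; => #f
(newline)
```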

The grammar for ‘cset-sre’ is as follows.

<cset-sre> ::=
 | <char>                      ; literal char
 | "<char>"                    ; string of one char
 | <char-set>                  ; embedded SRFI 14 char set
 | (<string>)                  ; literal char set
 | (char-set <string>)
 | (/ <range-spec> ...)        ; ranges
 | (char-range <range-spec> ...)
 | (or <cset-sre> ...)         ; union
 | (and <cset-sre> ...)        ; intersection
 | (- <cset-sre> ...)          ; difference
 | (difference <cset-sre> ...)
 | (~ <cset-sre> ...)          ; complement of union
 | (complement <cset-sre> ...)
 | (w/case <cset-sre> ...)     ; case and unicode toggling
 | (w/nocase <cset-sre> ...)
 | (w/ascii <cset-sre> ...)
 | (w/unicode <cset-sre> ...)
 | any | nonl | ascii | lower-case | lower
 | upper-case | upper | title-case | title
 | alphabetic | alpha | alphanumeric | alphanum | alnum
 | numeric | num | punctuation | punct | symbol
 | graphic | graph | whitespace | white | space
 | printing | print | control | cntrl | hex-digit | xdigit

<range-spec> ::= <string> | <char>
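
As an illustration of the char-set operators, the sketch below builds the same set of lowercase consonants in two ways, by difference and by intersection with a complement, and also shows a range given as a string of character pairs. It assumes an R7RS implementation providing (srfi 115); the names consonant and consonant-alt are hypothetical.

```scheme
(import (scheme base) (scheme write) (srfi 115))

;; Lowercase letters minus the vowels.
(define consonant '(- (/ "az") ("aeiou")))

;; The same set as an intersection with a complement.
(define consonant-alt '(and (/ "az") (~ ("aeiou"))))

(display (regexp-matches? `(+ ,consonant) "xyzzy"))      ; => #t
(newline)
(display (regexp-matches? `(+ ,consonant) "vowel"))      ; => #f
(newline)
(display (regexp-matches? `(+ ,consonant-alt) "xyzzy"))  ; => #t
(newline)

;; A range string is read as pairs: 0-9 and a-f.
(display (regexp-matches? '(+ (/ "09af")) "c0ffee"))     ; => #t
(newline)
```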
