bos
eos
Matches at the beginning/end of the string without consuming any characters (a zero-width assertion). If the search was initiated with start/end parameters, those positions are treated as the beginning and end of the string rather than the ends of the full string.
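The following sketch shows how start/end parameters move the positions where bos and eos succeed. It assumes an R7RS system that exposes SRFI 115 as the library (srfi 115). The reference implementation is Chibi Scheme's (chibi regexp), and other systems may use a different library name.

```scheme
;; Assumption: SRFI 115 is importable as (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; Without start/end, bos anchors to index 0 of the whole string,
;; and "foo" does not start there.
(define whole (regexp-search '(: bos "foo") "xfoo"))

;; With start = 1, index 1 is treated as the beginning of the string,
;; so bos succeeds immediately before "foo".
(define from-1 (regexp-search '(: bos "foo") "xfoo" 1))

;; With end = 3, index 3 is treated as the end of the string,
;; so eos succeeds right after "foo" even though #\x follows it.
(define to-3 (regexp-search '(: "foo" eos) "foox" 0 3))

(display whole) (newline)                            ; prints #f
(display (regexp-match-submatch from-1 0)) (newline) ; prints foo
(display (regexp-match-submatch to-3 0)) (newline)   ; prints foo
```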
bol
eol
Matches at the beginning/end of a line without consuming any characters (a zero-width assertion). A line is a possibly empty sequence of characters followed by an end-of-line sequence as understood by the R7RS read-line procedure: a linefeed character, a carriage return character, or a carriage return followed by a linefeed character. The string is assumed to contain end-of-line sequences before the start and after the end of the string, even if the search was made on a substring and the actual surrounding characters differ.
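As an illustration of line anchoring, the sketch below uses regexp-extract to collect matches that start at a line beginning or finish at a line end. It again assumes SRFI 115 is available as (srfi 115). For simplicity it only uses linefeed separators.

```scheme
;; Assumption: SRFI 115 is importable as (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; Lines starting with "#": bol pins each match to a line start, and
;; (~ ("\r\n")) stops the match before the line's end-of-line sequence.
(define comments
  (regexp-extract '(: bol "#" (* (~ ("\r\n")))) "# one\ncode\n# two"))

;; Runs of digits that end a line: eol succeeds just before each "\n",
;; but not before the #\x that follows "3".
(define trailing
  (regexp-extract '(: (+ numeric) eol) "a1\nb22\nc3x"))

(write comments) (newline) ; prints ("# one" "# two")
(write trailing) (newline) ; prints ("1" "22")
```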
bow
eow
Matches at the beginning/end of a word without consuming any characters (a zero-width assertion). A word is a contiguous sequence of characters that are either alphanumeric or the underscore character, i.e. ‘(or alphanumeric "_")’, with the definition of alphanumeric depending on the Unicode or ASCII context. The string is assumed to contain non-word characters immediately before the start and after the end, even if the search was made on a substring and word-constituent characters appear immediately before the beginning or after the end.
(regexp-search '(: bow "foo") "foo") ⇒ #<regexp-match>
(regexp-search '(: bow "foo") "") ⇒ #f
(regexp-search '(: bow "foo") "snafoo") ⇒ #f
(regexp-search '(: "foo" eow) "foo") ⇒ #<regexp-match>
(regexp-search '(: "foo" eow) "foo!") ⇒ #<regexp-match>
(regexp-search '(: "foo" eow) "foobar") ⇒ #f
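The substring rule above can be seen by restricting the search range. In this sketch, which again assumes SRFI 115 is available as (srfi 115), bow and eow succeed at the edges of the range even though word characters lie just outside it.

```scheme
;; Assumption: SRFI 115 is importable as (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; Over the whole string, "foo" in "snafoo" follows the word character #\a,
;; so bow fails and the search returns #f.
(define whole (regexp-search '(: bow "foo") "snafoo"))

;; With start = 3 the text before the start is assumed to be a non-word
;; character, so bow succeeds just before "foo".
(define from-3 (regexp-search '(: bow "foo") "snafoo" 3))

;; With end = 3 the "bar" after "foo" is outside the range, so eow succeeds.
(define to-3 (regexp-search '(: "foo" eow) "foobar" 0 3))

(display whole) (newline)                            ; prints #f
(display (regexp-match-submatch from-3 0)) (newline) ; prints foo
(display (regexp-match-submatch to-3 0)) (newline)   ; prints foo
```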
nwb
Matches a non-word-boundary (i.e. ‘\B’ in PCRE). Equivalent to ‘(neg-look-ahead (or bow eow))’.
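A short sketch of nwb follows, again assuming SRFI 115 is available as (srfi 115). nwb succeeds where "foo" sits inside a longer word and fails where "foo" begins a word.

```scheme
;; Assumption: SRFI 115 is importable as (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; In "snafoo" the position before "foo" lies between two word characters,
;; so it is neither bow nor eow and nwb succeeds.
(define inside (regexp-search '(: nwb "foo") "snafoo"))

;; In "a foo" the only "foo" is preceded by a space, i.e. it starts a word,
;; so nwb fails there and the search returns #f.
(define alone (regexp-search '(: nwb "foo") "a foo"))

(display (regexp-match-submatch-start inside 0)) (newline) ; prints 3
(display alone) (newline)                                  ; prints #f
```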
(word sre ...)
Anchors a sequence to word boundaries. Equivalent to ‘(: bow sre ... eow)’.
(word+ cset-sre ...)
Matches a single word composed of characters in the intersection of the given cset-sre and the word constituent characters. Equivalent to:
(word (+ (and (or alphanumeric "_") (or cset-sre ...))))
word
A shorthand for ‘(word+ any)’.
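The sketch below compares word and word+, assuming SRFI 115 is available as (srfi 115). The bare word extracts every run of word constituents. (word+ (/ "az")) only accepts whole words made entirely of lower-case ASCII letters.

```scheme
;; Assumption: SRFI 115 is importable as (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; word = (word+ any): every maximal run of alphanumerics and underscores.
(define words (regexp-extract 'word "foo_bar, baz42!"))

;; (word+ (/ "az")) = (: bow (+ <lower-case letter that is a word char>) eow).
;; "Def" starts with an upper-case letter, and "ghi2" continues with a digit,
;; so the eow after "ghi" fails; only "abc" and "jkl" qualify.
(define lower (regexp-extract '(word+ (/ "az")) "abc Def ghi2 jkl"))

(write words) (newline) ; prints ("foo_bar" "baz42")
(write lower) (newline) ; prints ("abc" "jkl")
```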
bog
eog
Matches at the beginning/end of a single extended grapheme cluster without consuming any characters (a zero-width assertion). Grapheme cluster boundaries are defined in Unicode TR29. The string is assumed to contain non-combining code points immediately before the start and after the end. These always succeed in an ASCII context.
grapheme
Matches a single grapheme cluster (i.e. ‘\X’ in PCRE). This is what the end user typically thinks of as a single character: a base non-combining code point followed by zero or more combining marks. In an ASCII context this is equivalent to ‘any’.
Assuming char-set:mark contains all characters with the ‘Extend’ or ‘SpacingMark’ properties defined in TR29, and char-set:control, char-set:regional-indicator and char-set:hangul-* are defined similarly, then the following SRE can be used with regexp-extract to define grapheme:
`(or (: (* ,char-set:hangul-l) (+ ,char-set:hangul-v)
        (* ,char-set:hangul-t))
     (: (* ,char-set:hangul-l) ,char-set:hangul-v
        (* ,char-set:hangul-v) (* ,char-set:hangul-t))
     (: (* ,char-set:hangul-l) ,char-set:hangul-lvt
        (* ,char-set:hangul-t))
     (+ ,char-set:hangul-l)
     (+ ,char-set:hangul-t)
     (+ ,char-set:regional-indicator)
     (: "\r\n")
     (: (~ control ("\r\n"))
        (* ,char-set:mark))
     control)
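To see grapheme in use, the sketch below splits a string whose first cluster is two code points: "e" plus U+0301 COMBINING ACUTE ACCENT. It assumes SRFI 115 is available as (srfi 115) and wraps the pattern in w/unicode to request the Unicode context explicitly. In an ASCII context, grapheme would behave like any and yield three one-character strings.

```scheme
;; Assumption: SRFI 115 is importable as (srfi 115).
(import (scheme base) (scheme write) (srfi 115))

;; Three code points: #\e, U+0301 (a combining mark), and #\x.
(define s (string #\e (integer->char #x301) #\x))

;; In the Unicode context the mark attaches to the preceding "e",
;; giving two grapheme clusters.
(define clusters (regexp-extract '(w/unicode grapheme) s))

(display (string-length s)) (newline)            ; prints 3
(display (length clusters)) (newline)            ; prints 2
(display (map string-length clusters)) (newline) ; prints (2 1)
```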