Compiles a regexp if given an object whose structure matches the SRE syntax. This may be written as a literal or partial literal with quote or quasiquote, or may be generated entirely programmatically. Return re unmodified if it is already a regexp. Raise an error if re is neither a regexp nor a valid representation of an SRE.
Mutating re may invalidate the resulting regexp, causing unspecified results if subsequently used for matching.
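The three ways of calling the constructor described above can be sketched as follows. This is a minimal illustration, assuming an R7RS implementation that provides the library as `(srfi 115)`; the import form is an assumption and differs on other systems, and the names `digits`, `id-re` and `digits-again` are hypothetical.

```scheme
(import (scheme base) (srfi 115))

;; A fully literal SRE, written with quote.
(define digits (regexp '(+ numeric)))

;; A partially literal SRE: quasiquote inserts a string chosen at run time.
;; id-re and its prefix argument are hypothetical names for this sketch.
(define (id-re prefix)
  (regexp `(: ,prefix (+ numeric))))

;; An object that is already a regexp is returned unmodified.
(define digits-again (regexp digits))
```

With these definitions, `(regexp-matches? (id-re "ID-") "ID-42")` should return `#t`, while the same regexp applied to `"42"` should return `#f`.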
Macro shorthand for (regexp `(: sre ...)). May be able to perform some or all computation at compile time if sre is not unquoted.
NOTE Because of this equivalence with the procedural constructor regexp, the semantics of unquote differs from the original SCSH implementation in that unquoted expressions can expand into any object matching the SRE syntax, but not a compiled regexp object. Further, unquote and unquote-splicing both expand all matches.
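As a sketch of the macro shorthand, which SRFI 115 names `rx`, the following assumes an R7RS implementation exporting it from `(srfi 115)`. The names `decimal` and `file-with-extension` are hypothetical.

```scheme
(import (scheme base) (srfi 115))

;; Equivalent to (regexp '(: (+ numeric) "." (+ numeric))).
(define decimal (rx (+ numeric) "." (+ numeric)))

;; Because rx wraps its arguments in a quasiquote, an unquoted expression
;; may supply any object matching the SRE syntax, here a string.
(define (file-with-extension ext)
  (rx (+ any) "." ,ext))
```

The first form contains no unquoted parts, so an implementation is free to compile it ahead of time; the second has to be built at run time on each call.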
RATIONALE A procedural interface offers greater flexibility, and keeping the syntactic shorthand preserves the potential for compile-time optimizations. The alternative is to rely on eval to generate regular expressions dynamically. However, regexps in many cases come from untrusted sources, such as search parameters to a server, or from serialized sources such as config files or command-line arguments. Moreover, many applications may want to keep many thousands of regexps in memory at once. Given the relatively heavy cost and insecurity of eval, and the frequency with which SREs are read and written as text, we prefer the procedural interface.
Return an SRE corresponding to the given regexp re. The SRE will be equivalent to (will match the same strings) but not necessarily equal? to the SRE originally used to compile re. Mutating the result may invalidate re, causing unspecified results if subsequently used for matching.
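The round trip from regexp to SRE and back can be sketched as below, using the SRFI 115 name `regexp->sre` and assuming an R7RS implementation that provides `(srfi 115)`. The variable names are hypothetical. Since the recovered SRE need not be equal? to the original, the sketch compares matching behaviour rather than structure.

```scheme
(import (scheme base) (srfi 115))

(define original (regexp '(: (+ alpha) "-" (+ numeric))))

;; The recovered SRE may be written differently from the one used above,
;; so it is only recompiled here, never compared or mutated.
(define recovered-sre (regexp->sre original))
(define recompiled (regexp recovered-sre))
```

Both `original` and `recompiled` should match `"abc-123"` and reject `"abc123"`.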
Return an SRE corresponding to the given SRFI-14 character set. The resulting SRE expands the character set into notation which does not make use of embedded SRFI-14 character sets, and so is suitable for writing portably.
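A minimal sketch of converting a character set, using the SRFI 115 name `char-set->sre` and assuming an R7RS implementation that provides both `(srfi 14)` and `(srfi 115)`. The names `vowels`, `vowels-sre` and `vowel-run` are hypothetical, and the exact shape of the returned SRE is left to the implementation.

```scheme
(import (scheme base) (srfi 14) (srfi 115))

;; vowels is a hypothetical character set built with the SRFI-14 constructor.
(define vowels (char-set #\a #\e #\i #\o #\u))

;; The returned SRE contains no embedded character-set objects,
;; so it can be written out as text and read back elsewhere.
(define vowels-sre (char-set->sre vowels))

;; The converted SRE can be used wherever an SRE is expected.
(define vowel-run (regexp `(+ ,vowels-sre)))
```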
Return true if, and only if, obj can be safely passed to regexp.
Return true if, and only if, obj is a regexp.
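The two predicates above, named `valid-sre?` and `regexp?` in SRFI 115, answer different questions: the first asks whether an object may be compiled, the second whether it has been. A sketch, assuming an R7RS implementation that provides `(srfi 115)`; `compile-if-valid` is a hypothetical helper.

```scheme
(import (scheme base) (srfi 115))

;; compile-if-valid is a hypothetical helper: it compiles obj when it is
;; acceptable to regexp and returns #f otherwise, instead of raising an error.
(define (compile-if-valid obj)
  (and (valid-sre? obj) (regexp obj)))
```

A string such as `"abc"` is a valid SRE but not a regexp, so `(valid-sre? "abc")` is true while `(regexp? "abc")` is false. A check of this kind is one way to handle SREs read from untrusted text before compiling them.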
Return a regexp-match object if re successfully matches the entire string str from start (inclusive) to end (exclusive), or #f if the match fails. The regexp-match object will contain information needed to extract any submatches.
Return #t if re matches str as in regexp-matches, or #f otherwise. May be faster than regexp-matches since it doesn’t need to return submatch data.
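The difference between the match-returning procedure and the predicate can be sketched with a date parser. This assumes an R7RS implementation that provides `(srfi 115)`; `date-sre`, `parse-date` and `date-string?` are hypothetical names.

```scheme
(import (scheme base) (srfi 115))

;; Three numbered submatches: year, month, day.
(define date-sre
  '(: ($ (= 4 numeric)) "-" ($ (= 2 numeric)) "-" ($ (= 2 numeric))))

;; regexp-matches returns a match object from which submatches are extracted,
;; or #f when the whole string does not match.
(define (parse-date str)
  (let ((m (regexp-matches date-sre str)))
    (and m
         (map string->number
              (list (regexp-match-submatch m 1)
                    (regexp-match-submatch m 2)
                    (regexp-match-submatch m 3))))))

;; regexp-matches? only answers yes or no.
(define (date-string? str)
  (regexp-matches? date-sre str))
```

Here `(parse-date "2024-03-15")` yields `(2024 3 15)`, while `(parse-date "2024-03-15 noon")` yields `#f` because the entire string must match.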
Return a regexp-match object if re successfully matches a substring of str between start (inclusive) and end (exclusive), or #f if the match fails. The regexp-match object will contain information needed to extract any submatches.
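Whole-string matching and substring searching can be contrasted as in the following sketch, which assumes an R7RS implementation that provides `(srfi 115)`. The variable names are hypothetical.

```scheme
(import (scheme base) (srfi 115))

(define text "order 66 shipped")

;; regexp-matches must consume the whole string, so this is #f.
(define whole (regexp-matches '(+ numeric) text))

;; regexp-search finds a matching substring anywhere in the string.
(define found (regexp-search '(+ numeric) text))

;; With a start index, only the part of the string from there on is searched.
(define later (regexp-search '(+ numeric) "10 20 30" 2))
```

Extracting submatch 0 from `found` gives `"66"`, and from `later` gives `"20"`, since the search skips the first two characters.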
The fundamental regexp matching iterator. Repeatedly search str for the regexp re so long as a match can be found. On each successful match, applies:
(kons i regexp-match str acc)
where i is the index since the last match (beginning with start), regexp-match is the resulting match, and acc is the result of the previous kons application, beginning with knil. When no more matches can be found, calls finish with the same arguments, except that regexp-match is #f.
By default finish just returns acc.
(regexp-fold 'word
             (lambda (i m str acc)
               (let ((s (regexp-match-submatch m 0)))
                 (cond ((assoc s acc)
                        => (lambda (x) (set-cdr! x (+ 1 (cdr x))) acc))
                       (else `((,s . 1) ,@acc)))))
             '()
             "to be or not to be")
⇒ (("not" . 1) ("or" . 1) ("be" . 2) ("to" . 2))
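The optional finish argument can be sketched as follows: kons accumulates in reverse order, and finish restores the order once no more matches are found. This assumes an R7RS implementation that provides `(srfi 115)` and that finish is the first optional argument after str; `match-starts` is a hypothetical name.

```scheme
(import (scheme base) (srfi 115))

;; Collect the start index of every run of digits.
(define match-starts
  (regexp-fold '(+ numeric)
               ;; kons: push the start index of each match.
               (lambda (i m str acc)
                 (cons (regexp-match-submatch-start m 0) acc))
               '()
               "a1b22c333"
               ;; finish: called with m bound to #f; put indices in order.
               (lambda (i m str acc) (reverse acc))))
```

The digit runs begin at indices 1, 3 and 6, so `match-starts` is `(1 3 6)`.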
Extract all the non-empty substrings of str which match re between start and end as a list of strings.
(regexp-extract '(+ numeric) "192.168.0.1") ⇒ ("192" "168" "0" "1")
Split str into a list of strings separated by matches of re.
(regexp-split '(+ space) " fee fi fo\tfum\n") ⇒ ("fee" "fi" "fo" "fum")
Partition str into a list of non-empty strings matching re, interspersed with the unmatched portions of the string str. The first and every odd element is an unmatched substring, which will be the empty string if re matches at the beginning of the string or end of the previous match. The second and every even element will be a substring matching re. If the final match ends at the end of the string, no trailing empty string will be included. Thus, in the degenerate case where str is the empty string, the result is ‘("")’.
(regexp-partition '(+ (or space punct)) "") ⇒ ("")
(regexp-partition '(+ (or space punct)) "Hello, world!\n") ⇒ ("Hello" ", " "world" "!\n")
(regexp-partition '(+ (or space punct)) "¿Dónde Estás?") ⇒ ("" "¿" "Dónde" " " "Estás" "?")
Return a new string replacing the count-th match of re in str with the subst, where the zero-indexed count defaults to zero (i.e. the first match). If there are not count matches, return the selected substring unmodified.
subst can be a string, an integer or symbol indicating the contents of a numbered or named submatch of re, ‘pre’ for the substring to the left of the match, or ‘post’ for the substring to the right of the match.
The optional parameters start and end restrict both the matching and the substitution, to the given indices, such that the result is equivalent to omitting these parameters and replacing on (substring str start end). As a convenience, a value of #f for end is equivalent to (string-length str).
(regexp-replace '(+ space) "one two three" "_") ⇒ "one_two three"
(regexp-replace '(+ space) "one two three" "_" 0 #f 0) ⇒ "one_two three"
(regexp-replace '(+ space) "one two three" "_" 0 #f 1) ⇒ "one two_three"
(regexp-replace '(+ space) "one two three" "_" 0 #f 2) ⇒ "one two three"
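The non-string forms of subst can be sketched as below, assuming an R7RS implementation that provides `(srfi 115)`. The variable names are hypothetical.

```scheme
(import (scheme base) (srfi 115))

;; An integer selects a numbered submatch: the whole match "<hello>"
;; is replaced by the contents of submatch 1.
(define unwrapped
  (regexp-replace '(: "<" ($ (+ alpha)) ">") "say <hello> now" 1))

;; 'pre inserts the text to the left of the match in place of the match.
(define with-pre (regexp-replace "b" "abc" 'pre))

;; 'post inserts the text to the right of the match in place of the match.
(define with-post (regexp-replace "b" "abc" 'post))
```

Following the description above, `unwrapped` is `"say hello now"`, `with-pre` is `"aac"` and `with-post` is `"acc"`.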
Equivalent to regexp-replace, but replaces all occurrences of re in str.
(regexp-replace-all '(+ space) "one two three" "_") ⇒ "one_two_three"
Return true if, and only if, obj is a successful match from regexp-matches or regexp-search.
(regexp-match? (regexp-matches "x" "x")) ⇒ #t
(regexp-match? (regexp-matches "x" "y")) ⇒ #f
Return the number of submatches of regexp-match, regardless of whether they matched or not. Do not include the implicit zero full match in the count.
(regexp-match-count (regexp-matches "x" "x")) ⇒ 0
(regexp-match-count (regexp-matches '($ "x") "x")) ⇒ 1
Return the substring matched in regexp-match corresponding to field, either an integer or a symbol for a named submatch. Index ‘0’ refers to the entire match, index ‘1’ to the first lexicographic submatch, and so on. If there are multiple submatches with the same name, the first which matched is returned. If passed an integer outside the range of matches, or a symbol which does not correspond to a named submatch of the pattern, an error is raised. If the corresponding submatch did not match, return #f.
The result of extracting a submatch after the original matched string has been mutated is unspecified.
(regexp-match-submatch (regexp-search 'word "**foo**") 0) ⇒ "foo"
(regexp-match-submatch (regexp-search '(: "*" ($ word) "*") "**foo**") 0) ⇒ "*foo*"
(regexp-match-submatch (regexp-search '(: "*" ($ word) "*") "**foo**") 1) ⇒ "foo"
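Named submatches and submatches that took no part in the match can be sketched as follows. This assumes an R7RS implementation that provides `(srfi 115)` and uses the SRFI 115 form `(-> name sre ...)` for named submatches; `setting` and `either` are hypothetical names.

```scheme
(import (scheme base) (srfi 115))

;; (-> name sre ...) is the SRE form for a named submatch.
(define setting
  (regexp-matches '(: (-> key (+ alpha)) "=" (-> val (+ numeric)))
                  "width=80"))

;; Only the second alternative takes part in this match,
;; so submatch 1 exists in the pattern but did not match.
(define either (regexp-matches '(or ($ "a") ($ "b")) "b"))
```

Looking up `'key` and `'val` in `setting` gives `"width"` and `"80"`; named submatches are still numbered, so index 2 also gives `"80"`. In `either`, index 1 returns `#f` and index 2 returns `"b"`.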
Return the start index in regexp-match corresponding to field, as in regexp-match-submatch.
(regexp-match-submatch-start (regexp-search 'word "**foo**") 0) ⇒ 2
(regexp-match-submatch-start (regexp-search '(: "*" ($ word) "*") "**foo**") 0) ⇒ 1
(regexp-match-submatch-start (regexp-search '(: "*" ($ word) "*") "**foo**") 1) ⇒ 2
Return the end index in regexp-match corresponding to field, as in regexp-match-submatch.
(regexp-match-submatch-end (regexp-search 'word "**foo**") 0) ⇒ 5
(regexp-match-submatch-end (regexp-search '(: "*" ($ word) "*") "**foo**") 0) ⇒ 6
(regexp-match-submatch-end (regexp-search '(: "*" ($ word) "*") "**foo**") 1) ⇒ 5
Return a list of all submatches in regexp-match as strings or #f, beginning with the entire match ‘0’.
(regexp-match->list (regexp-search '(: ($ word) (+ (or space punct)) ($ word)) "cats & dogs")) ⇒ ("cats & dogs" "cats" "dogs")