Libraries for Vicare Scheme: srfi regexps syntax basic

2.39.7.2 Basic patterns

<string>

A literal string.

(regexp-search "needle" "hayneedlehay") ⇒ #<regexp-match>
(regexp-search "needle" "haynEEdlehay") ⇒ #f

(seq sre ...)

(: sre ...)

Sequencing. Matches if each sre matches in turn, adjacently and in the order given.

(regexp-search '(: "one" space "two" space "three")
               "one two three")
⇒ #<regexp-match>

(or sre ...)

(|\|| sre ...)

Alternation. Matches if any one of the sres matches.

(regexp-search '(or "eeney" "meeney" "miney") "meeney")
⇒ #<regexp-match>
(regexp-search '(or "eeney" "meeney" "miney") "moe")
⇒ #f

NOTE Vicare does not support the ‘|\||’ syntax; use ‘or’ instead.

(w/nocase sre ...)

Enclosed sres are case–insensitive. In a Unicode context character and string literals match with the default simple Unicode case–insensitive matching. Implementations may, but are not required to, handle variable length case conversions, such as ‘#\x00DF’ matching the two characters ‘SS’.

Character sets match if any character in the set matches case–insensitively to the input. Conceptually each cset-sre is expanded to contain all case variants for all of its characters. In a compound cset-sre the expansion is applied at the terminals consisting of characters, strings, embedded SRFI-14 char-sets, and named character sets. For simple unions this would be equivalent to computing the full union first and then expanding case variants, but the semantics can differ when differences and intersections are applied. For example:

(w/nocase (~ ("Aab")))

is equivalent to:

(~ ("AaBb"))

for which ‘B’ is clearly not a member. However, if you were to compute ‘(~ ("Aab"))’ first, you would have a char-set containing ‘B’, and after expanding case variants both ‘B’ and ‘b’ would be members.

In an ASCII context only the 52 ASCII letters ‘(/ "a-zA-Z")’ match case–insensitively to each other.

In a Unicode context the only named cset-sres affected by ‘w/nocase’ are ‘upper’ and ‘lower’. Note that the case–insensitive versions of these are not equivalent to ‘letter’, as there are characters with the letter property but no case.

(regexp-search "needle" "haynEEdlehay")
⇒ #f
(regexp-search '(w/nocase "needle") "haynEEdlehay")
⇒ #<regexp-match>

(regexp-search '(~ ("Aab")) "B") ⇒ #<regexp-match>
(regexp-search '(~ ("Aab")) "b") ⇒ #f
(regexp-search '(w/nocase (~ ("Aab"))) "B") ⇒ #f
(regexp-search '(w/nocase (~ ("Aab"))) "b") ⇒ #f
(regexp-search '(~ (w/nocase ("Aab"))) "B") ⇒ #f
(regexp-search '(~ (w/nocase ("Aab"))) "b") ⇒ #f
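The expand-then-complement rule can be checked with a small sketch. It assumes the SRFI-115 bindings are importable as ‘(srfi :115)’; ‘found?’ and ‘not-ab’ are hypothetical names introduced only for this illustration.

```scheme
;; Assumption: the SRFI-115 bindings are importable as (srfi :115).
(import (vicare) (srfi :115))

;; Hypothetical helper: reduce the result of a search to a boolean.
(define (found? sre str)
  (and (regexp-search sre str) #t))

;; The terminal ("Aab") is case-expanded to ("AaBb") first and only
;; then complemented, so none of a, A, b, B belongs to the set.
(define not-ab '(w/nocase (~ ("Aab"))))

(display (found? not-ab "a")) (newline)
(display (found? not-ab "B")) (newline)
(display (found? not-ab "c")) (newline)
```

Under the rule described above, the first two searches should fail and the third should succeed, because ‘c’ is the only input character outside the expanded set.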

(w/case sre ...)

Enclosed sres are case–sensitive. This is the default, and overrides any enclosing ‘w/nocase’ setting.

(regexp-search '(w/nocase "SMALL" (w/case "BIG"))
               "smallBIGsmall")
⇒ #<regexp-match>

(regexp-search '(w/nocase (~ (w/case ("Aab")))) "b")
⇒ #f

(w/ascii sre ...)

Enclosed sres are interpreted in an ASCII context. In practice many regular expressions are used for simple parsing and only ASCII characters are relevant. Switching to ASCII mode can improve performance in some implementations.

(regexp-search '(w/ascii bos (* alpha) eos) "English")
⇒ #<regexp-match>

(w/unicode sre ...)

Enclosed sres are interpreted in a Unicode context, which is the default; character sets with both an ASCII and a Unicode definition take the latter. Has no effect if the ‘regexp-unicode’ feature is not provided.

(regexp-search '(w/unicode bos (* letter) eos) "English")
⇒ #<regexp-match>
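The difference between the two contexts only shows up on non-ASCII input. The sketch below runs the same pattern over a word containing ‘ï’ in each context. It assumes the library is importable as ‘(srfi :115)’; ‘found?’ and ‘word’ are hypothetical names.

```scheme
;; Assumption: the SRFI-115 bindings are importable as (srfi :115).
(import (vicare) (srfi :115))

;; Hypothetical helper: reduce the result of a search to a boolean.
(define (found? sre str)
  (and (regexp-search sre str) #t))

;; "naive" spelled with U+00EF, written as an R6RS string escape.
(define word "na\xEF;ve")

;; ASCII context: alpha covers only a-z and A-Z.
(display (found? '(w/ascii bos (* alpha) eos) word)) (newline)

;; Unicode context: alpha follows the Unicode definition, provided
;; the implementation offers the regexp-unicode feature.
(display (found? '(w/unicode bos (* alpha) eos) word)) (newline)
```

In the ASCII context the search should fail, since ‘ï’ is not one of the 52 ASCII letters. The result of the second search depends on whether ‘regexp-unicode’ is provided, so no particular output is claimed for it here.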

(w/nocapture sre ...)

Disables capturing for all submatches (‘$’, ‘submatch’, ‘->’ and ‘submatch-named’) in the enclosed sres. The resulting SRE matches exactly the same strings, but without any associated submatch info. Useful for utility SREs which you want to incorporate without affecting your submatch positions.

(let ((number '($ (+ digit))))
  (cdr
   (regexp-match->list
    (regexp-search `(: ,number "-" ,number "-" ,number)
                   "555-867-5309")))
  ;; ⇒ ("555" "867" "5309")

  (cdr
   (regexp-match->list
    (regexp-search `(: ,number "-" (w/nocapture ,number) "-" ,number)
                   "555-867-5309"))))
  ⇒ ("555" "5309")
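A typical use is embedding a utility SRE that has its own internal submatch. The following is a minimal sketch, assuming the library is importable as ‘(srfi :115)’; ‘sign-sre’ and ‘match-obj’ are hypothetical names.

```scheme
;; Assumption: the SRFI-115 bindings are importable as (srfi :115).
(import (vicare) (srfi :115))

;; Hypothetical utility SRE carrying its own submatch.
(define sign-sre '(? ($ (or "+" "-"))))

;; Wrapping the utility in w/nocapture keeps its submatch out of the
;; numbering, so the digits stay at index 2.
(define match-obj
  (regexp-search `(: ($ (+ alpha)) "=" (w/nocapture ,sign-sre) ($ (+ digit)))
                 "offset=-42"))

(display (regexp-match-submatch match-obj 1)) (newline)
(display (regexp-match-submatch match-obj 2)) (newline)
```

With the wrapper in place, submatch 1 should be ‘offset’ and submatch 2 should be ‘42’. Without it, the sign would take index 2 and push the digits to index 3.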