There are several ways to control the number of times a pattern is
matched. The simplest of these is ?, which just optionally matches
the pattern:
    (irregex-search '(: "match" (? "es") "!") "matches!") ⇒ #<match>
    (irregex-search '(: "match" (? "es") "!") "match!")   ⇒ #<match>
    (irregex-search '(: "match" (? "es") "!") "matche!")  ⇒ #f
To optionally match any number of times we use *, the Kleene star:
    (irregex-search '(: "<" (* (~ #\>)) ">") "<html>") ⇒ #<match>
    (irregex-search '(: "<" (* (~ #\>)) ">") "<>")     ⇒ #<match>
    (irregex-search '(: "<" (* (~ #\>)) ">") "<html")  ⇒ #f
Often we want to match any number of times, but at least one time is
required, and for that we use +:
    (irregex-search '(: "<" (+ (~ #\>)) ">") "<html>") ⇒ #<match>
    (irregex-search '(: "<" (+ (~ #\>)) ">") "<a>")    ⇒ #<match>
    (irregex-search '(: "<" (+ (~ #\>)) ">") "<>")     ⇒ #f
More generally, to match at least a given number of times, we use >=:
    (irregex-search '(: "<" (>= 3 (~ #\>)) ">") "<table>") ⇒ #<match>
    (irregex-search '(: "<" (>= 3 (~ #\>)) ">") "<pre>")   ⇒ #<match>
    (irregex-search '(: "<" (>= 3 (~ #\>)) ">") "<tr>")    ⇒ #f
To match a specific number of times exactly we use =:
    (irregex-search '(: "<" (= 4 (~ #\>)) ">") "<html>")  ⇒ #<match>
    (irregex-search '(: "<" (= 4 (~ #\>)) ">") "<table>") ⇒ #f
And finally, the most general form is **, which specifies a range
of times to match. All of the earlier forms are special cases of this.
    (irregex-search '(: (= 3 (** 1 3 numeric) ".") (** 1 3 numeric))
                    "192.168.1.10")
    ⇒ #<match>
    (irregex-search '(: (= 3 (** 1 3 numeric) ".") (** 1 3 numeric))
                    "192.0168.1.10")
    ⇒ #f
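To illustrate the claim that the earlier forms are special cases of
**, the sketch below restates two of the earlier examples as ranges.
It assumes the library is imported as (vicare irregex) alongside
(rnrs), and that (? x) corresponds to (** 0 1 x) and (= n x) to
(** n n x); the names optional-es and four-chars are made up for the
example.

```scheme
(import (rnrs) (vicare irregex))

;; (? "es") written as a range of zero to one repetitions.
(define optional-es '(: "match" (** 0 1 "es") "!"))

;; (= 4 (~ #\>)) written as a range of exactly four repetitions.
(define four-chars '(: "<" (** 4 4 (~ #\>)) ">"))

(irregex-search optional-es "matches!")  ; a match object
(irregex-search optional-es "match!")    ; a match object
(irregex-search optional-es "matche!")   ; #f
(irregex-search four-chars "<html>")     ; a match object
(irregex-search four-chars "<table>")    ; #f
```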
There are also so-called “non-greedy” variants of these repetition
operators, by convention suffixed with an additional ?. Since the
normal repetition patterns can match any count in the allotted
repetition range, the non-greedy operators match a string if and only
if the normal versions match. However, when the endpoints of the
submatches are taken into account, using a non-greedy repetition can
change the result. This is always the case with irregex-search, since
the endpoints of the match itself matter.
So, whereas ? can be thought to mean “match or don’t match”, ??
means “don’t match or match”. The * operator typically consumes as
much as possible, but *? tries first to match zero times, and only
consumes one repetition at a time if that fails. If we have a greedy
operator followed by a non-greedy operator in the same pattern, they
can produce surprising results as they compete to make the match
longer or shorter.
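The difference shows up once we look at what was actually matched. The
sketch below is illustrative only: it assumes the library is imported
as (vicare irregex) and that irregex-match-substring is available as in
upstream irregex; the names input, greedy and lazy are made up for the
example.

```scheme
(import (rnrs) (vicare irregex))

(define input "<a><b>")

;; Greedy: (* any) consumes as much as it can while still letting
;; the final ">" match, so the match runs to the last ">".
(define greedy (irregex-search '(: "<" (* any) ">") input))

;; Non-greedy: (*? any) tries zero repetitions first and grows one
;; character at a time, so the match stops at the first ">".
(define lazy (irregex-search '(: "<" (*? any) ">") input))

(irregex-match-substring greedy)  ; ⇒ "<a><b>"
(irregex-match-substring lazy)    ; ⇒ "<a>"
```

Both patterns accept the same input string; only the reported endpoints
differ.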
If this seems confusing, that’s because it is. Non-greedy repetitions
are defined only in terms of the specific backtracking algorithm used to
implement them, which for compatibility purposes always means the Perl
algorithm. Thus, when using these patterns we force (vicare irregex)
to use a backtracking engine, and can’t rely on efficient execution.