

50.9.3 Repetition patterns

There are several ways to control the number of times a pattern is matched. The simplest is ?, which matches the pattern zero or one time:

(irregex-search '(: "match" (? "es") "!") "matches!")
⇒ #<match>

(irregex-search '(: "match" (? "es") "!") "match!")
⇒ #<match>

(irregex-search '(: "match" (? "es") "!") "matche!")
⇒ #f

To match zero or more times, we use *, the Kleene star:

(irregex-search '(: "<" (* (~ #\>)) ">") "<html>")
⇒ #<match>

(irregex-search '(: "<" (* (~ #\>)) ">") "<>")
⇒ #<match>

(irregex-search '(: "<" (* (~ #\>)) ">") "<html")
⇒ #f

Often we want to match any number of times but require at least one; for that we use +:

(irregex-search '(: "<" (+ (~ #\>)) ">") "<html>")
⇒ #<match>

(irregex-search '(: "<" (+ (~ #\>)) ">") "<a>")
⇒ #<match>

(irregex-search '(: "<" (+ (~ #\>)) ">") "<>")
⇒ #f

More generally, to match at least a given number of times, we use >=:

(irregex-search '(: "<" (>= 3 (~ #\>)) ">") "<table>")
⇒ #<match>

(irregex-search '(: "<" (>= 3 (~ #\>)) ">") "<pre>")
⇒ #<match>

(irregex-search '(: "<" (>= 3 (~ #\>)) ">") "<tr>")
⇒ #f

To match a specific number of times exactly we use =:

(irregex-search '(: "<" (= 4 (~ #\>)) ">") "<html>")
⇒ #<match>

(irregex-search '(: "<" (= 4 (~ #\>)) ">") "<table>")
⇒ #f

Finally, the most general form is **, which specifies a minimum and a maximum number of times to match. All of the earlier forms are special cases of this.

(irregex-search '(: (= 3 (** 1 3 numeric) ".")
                    (** 1 3 numeric))
                "192.168.1.10")
⇒ #<match>

(irregex-search '(: (= 3 (** 1 3 numeric) ".")
                    (** 1 3 numeric))
                "192.0168.1.10")
⇒ #f
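
As an illustration of how the earlier forms reduce to **, the following sketch rewrites two of the examples above as explicit ranges. It assumes the library is imported as (vicare irregex), the name used later in this section; only the bounded forms ? and = are shown, since how ** expresses an open upper bound is not covered here.

```scheme
(import (rnrs) (vicare irregex))

;; (? "es") written as a range of zero to one repetitions.
(irregex-search '(: "match" (** 0 1 "es") "!") "matches!")
;; ⇒ #<match>

(irregex-search '(: "match" (** 0 1 "es") "!") "matche!")
;; ⇒ #f

;; (= 4 (~ #\>)) written as a range whose two bounds coincide.
(irregex-search '(: "<" (** 4 4 (~ #\>)) ">") "<html>")
;; ⇒ #<match>

(irregex-search '(: "<" (** 4 4 (~ #\>)) ">") "<table>")
;; ⇒ #f
```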

There are also so-called “non-greedy” variants of these repetition operators, by convention suffixed with an additional ?. Since the normal repetition patterns can match any count within the allotted range, the non-greedy operators match a string if and only if the normal versions match it. However, once the endpoints of the submatches are taken into account, a non-greedy repetition can change the result. This applies to every use of irregex-search, since the endpoints of the overall match are themselves significant.

So, whereas ? can be thought to mean “match or don’t match”, ?? means “don’t match or match”. * typically consumes as much as possible, but *? first tries to match zero times, and consumes one more repetition at a time only if that fails. If a greedy operator is followed by a non-greedy operator in the same pattern, they can produce surprising results as they compete to make the match longer or shorter.

If this seems confusing, that’s because it is. Non-greedy repetitions are defined only in terms of the specific backtracking algorithm used to implement them, which for compatibility purposes always means the Perl algorithm. Thus, when using these patterns we force (vicare irregex) to use a backtracking engine, and can’t rely on efficient execution.
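
The difference between greedy and non-greedy operators is easiest to see by looking at the matched text rather than at whether a match occurred. The sketch below assumes that (vicare irregex) exports irregex-match-substring as the upstream IrRegex library does; the input strings are made up for illustration.

```scheme
(import (rnrs) (vicare irregex))

;; Greedy *: consumes as much as possible, so the match
;; extends to the last ">" in the string.
(irregex-match-substring
 (irregex-search '(: "<" (* any) ">") "<a><b>"))
;; ⇒ "<a><b>"

;; Non-greedy *?: tries zero repetitions first and grows one
;; character at a time, so the match stops at the first ">".
(irregex-match-substring
 (irregex-search '(: "<" (*? any) ">") "<a><b>"))
;; ⇒ "<a>"

;; Greedy ?: prefers to match the optional "es".
(irregex-match-substring
 (irregex-search '(: "match" (? "es")) "matches"))
;; ⇒ "matches"

;; Non-greedy ??: prefers to skip it.
(irregex-match-substring
 (irregex-search '(: "match" (?? "es")) "matches"))
;; ⇒ "match"
```

In each pair both searches succeed, which is the “if and only if” property described above; only the endpoints of the match differ.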

