Next: pregexp syntax look, Previous: pregexp syntax alternation, Up: pregexp syntax [Index]
We’ve already seen that greedy quantifiers match the maximal number of times, but the overriding priority is that the overall match succeed. Consider:
(pregexp-match "a*a" "aaaa")
The regexp consists of two subregexps, a*
followed by a
.
The subregexp a*
cannot be allowed to match all four a
’s
in the text string aaaa
, even though *
is a greedy
quantifier. It may match only the first three, leaving the last one for
the second subregexp. This ensures that the full regexp matches
successfully.
The regexp matcher accomplishes this via a process called backtracking.
The matcher tentatively allows the greedy quantifier to match all four
a
’s, but then when it becomes clear that the overall match is in
jeopardy, it backtracks to a less greedy match of three a
’s. If
even this fails, as in the call:
(pregexp-match "a*aa" "aaaa")
the matcher backtracks even further. Overall failure is conceded only when all possible backtracking has been tried with no success.
Backtracking is not restricted to greedy quantifiers. Nongreedy quantifiers match as few instances as possible, and progressively backtrack to more and more instances in order to attain an overall match. There is backtracking in alternation too, as the more rightward alternates are tried when locally successful leftward ones fail to yield an overall match.
Sometimes it is efficient to disable backtracking. For example, we may
wish to commit to a choice, or we know that trying alternatives is
fruitless. A nonbacktracking regexp is enclosed in (?>...)
.
(pregexp-match "(?>a+)." "aaaa") ⇒ #f
In this call, the subregexp ?>a+
greedily matches all four
a
’s, and is denied the opportunity to backpedal. So the overall
match is denied. The effect of the regexp is therefore to match one or
more a
’s followed by something that is definitely non–a
.
Next: pregexp syntax look, Previous: pregexp syntax alternation, Up: pregexp syntax [Index]