When searching for a pattern string in a text string:
(define pattern "ciao") (define text "Oh, ciao!")
we start looking for the first char in the pattern:
(define match-start (let loop ((i 0)) (if (char=? (string-ref text i) (string-ref pattern 0)) i (loop (+ 1 i))))) ⇒ 3
now that we know a match for the first char, we go on looking for the full pattern:
(define match-past (let loop ((pi 1) ;; pattern index (ti (+ 1 match-start))) ;; text index (cond ((= pi (string-length pattern)) pi) ;; found ((= ti (string-length text)) #f) ;; not found (else (if (char=? (string-ref ti text) (string-ref pi pattern)) (loop (+ 1 pi) (+ 1 ti)) #f))))) ;; not found
we have the following possible outcomes:
match-past
is true: We have found a match in the substring
[match-start, match-past)
of text
.
match-past
is false: We have to restart matching pattern
from character (+ 1 match-start)
of text
.
The action “restart matching” involves backtracking (to go back) in
the text
string; this can be avoided if we do the following
considerations.
(define pattern "ciao mamma") (define text "Oh, ciao papa e ciao mamma!") ;; 0123456789012345678901234567 ;; 0 1 2
we notice that the first char c
is never repeated in the pattern;
so after we have matched the first occurrence of "ciao "
in the
substring [4, 9) of text
, we do not need to restart from
index 5, we just try to match pattern
from index 10
of text
.
(define pattern "ciao ciao mamma") (define text "Oh, ciao ciao ciao mamma!") ;; 01234567890123456789012345 ;; 0 1 2
after we have matched the first occurrence of "ciao ciao "
in the
substring [4, 14) of text
, we find a c
rather than
a m
; we could restart matching pattern
from index
5, but we notice that we already have matched substring [0,
5) of pattern
in substring [9, 14) of text
; so we
can avoid backtracking and go on matching index 5 of
pattern
with index 14 of text
.
Summary: We can implement an efficient algorithm if we precompute where to restart from, after a partial match. Of course, since the algorithm needs no backtracking, we can also search for multiple patterns in “parallel”.