When searching for a pattern string in a text string:
(define pattern "ciao") (define text "Oh, ciao!")
we start looking for the first char in the pattern:
(define match-start
(let loop ((i 0))
(if (char=? (string-ref text i)
(string-ref pattern 0))
i
(loop (+ 1 i)))))
⇒ 3
now that we know a match for the first char, we go on looking for the full pattern:
(define match-past
(let loop ((pi 1) ;; pattern index
(ti (+ 1 match-start))) ;; text index
(cond
((= pi (string-length pattern)) pi) ;; found
((= ti (string-length text)) #f) ;; not found
(else
(if (char=? (string-ref ti text)
(string-ref pi pattern))
(loop (+ 1 pi) (+ 1 ti))
#f))))) ;; not found
we have the following possible outcomes:
match-past is true: We have found a match in the substring
[match-start, match-past) of text.
match-past is false: We have to restart matching pattern
from character (+ 1 match-start) of text.
The action “restart matching” involves backtracking (to go back) in
the text string; this can be avoided if we do the following
considerations.
(define pattern "ciao mamma") (define text "Oh, ciao papa e ciao mamma!") ;; 0123456789012345678901234567 ;; 0 1 2
we notice that the first char c is never repeated in the pattern;
so after we have matched the first occurrence of "ciao " in the
substring [4, 9) of text, we do not need to restart from
index 5, we just try to match pattern from index 10
of text.
(define pattern "ciao ciao mamma") (define text "Oh, ciao ciao ciao mamma!") ;; 01234567890123456789012345 ;; 0 1 2
after we have matched the first occurrence of "ciao ciao " in the
substring [4, 14) of text, we find a c rather than
a m; we could restart matching pattern from index
5, but we notice that we already have matched substring [0,
5) of pattern in substring [9, 14) of text; so we
can avoid backtracking and go on matching index 5 of
pattern with index 14 of text.
Summary: We can implement an efficient algorithm if we precompute where to restart from, after a partial match. Of course, since the algorithm needs no backtracking, we can also search for multiple patterns in “parallel”.