Using this function is equivalent to apply %kmp-step
across the
selected subsequence of text in search of the selected subsequence
of pattern; the pattern is (vector-length rv)
characters
long.
This utility is designed to allow searching for occurrences of a fixed sequence that might extend across multiple buffers of data. Notice that, in this case, when the returned value is negative: It is the index in the last buffer, not in the whole text.
A simple one–shot search over a given string looks like the following:
(let* ((text "ciao hello salut") ;; 01234567890123456 ;; 0 1 (text-start 0) (text-past (string-length text)) (pattern "hello") (pattern-start 0) (pattern-past (string-length pattern)) (restart-vector (%kmp-make-restart-vector char=? string-ref pattern pattern-start pattern-past))) (let ((i (%kmp-partial-search char=? string-ref restart-vector pattern-start text text-start text-past pattern pattern-start))) (or (<= 0 i) ;; not found (- i)))) ;; found, return match past index ⇒ 10
if the pattern was not found: Return #f
. If the pattern was
found: Return the index in the string of the character past the one that
matched the end of the pattern.
Generalising this to a generic source of strings (represented by a list of strings):
(let* ((strings '("ciao h " "he hel h" "ell hel" "lo salut")) (end-of-data? (lambda () (null? strings))) (get-next-chunk (lambda () (begin0 (car strings) (set! strings (cdr strings)))))) (let* ((pattern "hello") (pattern-start 0) (pattern-past (string-length pattern)) (restart-vector (%kmp-make-restart-vector char=? string-ref pattern pattern-start pattern-past))) (let loop ((pi 0)) (and (not (end-of-data?)) ; not found (let* ((buf (get-next-chunk)) (pi (%kmp-partial-search char=? string-ref restart-vector pi buf 0 (string-length buf) pattern pattern-start))) (if (< pi 0) (cons buf (- pi)) ; found (loop pi))))))) ⇒ ("lo salut" . 2)
if the pattern was not found: Return #f
. If the pattern was
found: Return a cons whose car is the string chunk holding the end of
the text that matched, and whose cdr is the index in the chunk of the
character past the one that matched the end of the pattern.