Next: , Previous: , Up: kmp   [Index]


29.5 Partial search

Function: %kmp-partial-search item= item-ref restart-vector next-index-in-pattern text text-start text-end pattern pattern-start

Using this function is equivalent to apply %kmp-step across the selected subsequence of text in search of the selected subsequence of pattern; the pattern is (vector-length rv) characters long.

This utility is designed to allow searching for occurrences of a fixed sequence that might extend across multiple buffers of data. Notice that, in this case, when the returned value is negative: It is the index in the last buffer, not in the whole text.

A simple one–shot search over a given string looks like the following:

(let* ((text            "ciao hello salut")
       ;;                01234567890123456
       ;;                0         1
       (text-start      0)
       (text-past       (string-length text))

       (pattern         "hello")
       (pattern-start   0)
       (pattern-past    (string-length pattern))

       (restart-vector  (%kmp-make-restart-vector
                           char=? string-ref
                           pattern pattern-start pattern-past)))
  (let ((i (%kmp-partial-search
              char=? string-ref restart-vector pattern-start
              text text-start text-past
              pattern pattern-start)))
    (or (<= 0 i) ;; not found
        (- i)))) ;; found, return match past index
⇒ 10

if the pattern was not found: Return #f. If the pattern was found: Return the index in the string of the character past the one that matched the end of the pattern.

Generalising this to a generic source of strings (represented by a list of strings):

(let* ((strings         '("ciao h " "he hel h"
                          "ell hel" "lo salut"))
       (end-of-data?    (lambda ()
                          (null? strings)))
       (get-next-chunk  (lambda ()
                          (begin0
                              (car strings)
                            (set! strings (cdr strings))))))

  (let* ((pattern         "hello")
         (pattern-start   0)
         (pattern-past    (string-length pattern))

         (restart-vector  (%kmp-make-restart-vector char=? string-ref
                             pattern pattern-start pattern-past)))

    (let loop ((pi 0))
      (and (not (end-of-data?))          ; not found
           (let* ((buf (get-next-chunk))
                  (pi  (%kmp-partial-search
                          char=? string-ref
                          restart-vector pi
                          buf 0 (string-length buf)
                          pattern pattern-start)))
             (if (< pi 0)
                 (cons buf (- pi)) ; found
               (loop pi)))))))
⇒ ("lo salut" . 2)

if the pattern was not found: Return #f. If the pattern was found: Return a cons whose car is the string chunk holding the end of the text that matched, and whose cdr is the index in the chunk of the character past the one that matched the end of the pattern.


Next: , Previous: , Up: kmp   [Index]