

51.2 Interface procedures

Function: pregexp urex

Takes a U-regexp, which is a string, and returns an S-regexp, which is a tree.

(pregexp "c.r")
⇒ (:sub (:or (:seq #\c :any #\r)))

There is rarely any need to look at the S-regexps returned by pregexp.
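
Because the matching procedures described below accept either a U- or an S-regexp, one plausible use of pregexp is to convert a pattern once and reuse the resulting S-regexp across several calls. A minimal sketch, in which the name rx is chosen only for illustration:

(define rx (pregexp "c.r"))

(pregexp-match rx "car")
⇒ ("car")

(pregexp-match-positions rx "my cdr")
⇒ ((3 . 6))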

Function: pregexp-match-positions rex str
Function: pregexp-match-positions rex str start
Function: pregexp-match-positions rex str start past

Takes a regexp pattern, either a U- or an S-regexp, and a text string, and tries to match the regexp against (some part of) the text string.

Returns #f if the regexp does not match the string, or a list of index pairs if it does.

(pregexp-match-positions "brain" "bird")
⇒ #f

(pregexp-match-positions "needle" "hay needle stack")
⇒ ((4 . 10))

In the second example, the integers 4 and 10 identify the substring that was matched. 4 is the starting (inclusive) index and 10 the ending (exclusive) index of the matching substring.

(substring "hay needle stack" 4 10)
⇒ "needle"

Here, pregexp-match-positions’s return list contains only one index pair, and that pair represents the entire substring matched by the regexp. When we discuss subpatterns later, we will see how a single match operation can yield a list of submatches.

pregexp-match-positions takes optional third and fourth arguments that specify the indices of the text string within which the matching should take place.

(pregexp-match-positions "needle"
  "his hay needle stack -- my hay needle stack -- her hay needle stack"
  24 43)
⇒ ((31 . 37))

Note that the returned indices are still reckoned relative to the full text string.
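
Since the indices refer to the full text string, they can be passed straight to substring. The following sketch reuses the example above; the names text and m are introduced here only for illustration:

(define text
  "his hay needle stack -- my hay needle stack -- her hay needle stack")

(define m (pregexp-match-positions "needle" text 24 43))

(substring text (caar m) (cdar m))
⇒ "needle"

A start index that lies beyond the beginning of the only occurrence leaves nothing to match:

(pregexp-match-positions "needle" "hay needle stack" 5)
⇒ #f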

Function: pregexp-match rex str
Function: pregexp-match rex str start
Function: pregexp-match rex str start past

Like pregexp-match-positions but instead of returning index pairs it returns the matching substrings:

(pregexp-match "brain" "bird")
⇒ #f

(pregexp-match "needle" "hay needle stack")
⇒ ("needle")
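
The optional start and past arguments restrict the search in the same way as for pregexp-match-positions. As a small sketch, a window that covers the whole occurrence matches, while one that ends before the occurrence is complete does not:

(pregexp-match "needle" "hay needle stack" 4 10)
⇒ ("needle")

(pregexp-match "needle" "hay needle stack" 0 9)
⇒ #f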

Function: pregexp-split rex str

Takes a regexp pattern and a text string, and returns a list of substrings of the text string, where the pattern identifies the delimiter separating the substrings.

(pregexp-split ":"
   "/bin:/usr/bin:/usr/bin/X11:/usr/local/bin")
⇒ ("/bin" "/usr/bin" "/usr/bin/X11" "/usr/local/bin")

(pregexp-split " " "pea soup")
⇒ ("pea" "soup")

If the first argument can match an empty string, then the list of all the single-character substrings is returned.

(pregexp-split "" "smithereens")
⇒ ("s" "m" "i" "t" "h" "e" "r" "e" "e" "n" "s")

To identify one-or-more spaces as the delimiter, take care to use the regexp " +", not " *".

(pregexp-split " +" "split pea     soup")
⇒ ("split" "pea" "soup")

(pregexp-split " *" "split pea     soup")
⇒ ("s" "p" "l" "i" "t" "p" "e" "a" "s" "o" "u" "p")

Function: pregexp-replace rex str replacement

Replace the first matched portion of the text string with another string. The first argument is the pattern, the second is the text string, and the third is the insert string (the string to be inserted).

(pregexp-replace "te" "liberte" "ty")
⇒ "liberty"

If the pattern doesn’t occur in the text string, the returned string is identical (eq?) to the text string.
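
Two short illustrations of these points: only the first occurrence is replaced, and a failed match hands back the very same string object. The name s is used here only for illustration:

(pregexp-replace "te" "liberte egalite fraternite" "ty")
⇒ "liberty egalite fraternite"

(define s "liberte")

(eq? s (pregexp-replace "xyz" s "ty"))
⇒ #t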

Function: pregexp-replace* rex str replacement

Replace all matches in the text string by the insert string:

(pregexp-replace* "te" "liberte egalite fraternite" "ty")
⇒ "liberty egality fratyrnity"

If the pattern doesn’t occur in the text string, the returned string is identical (eq?) to the text string.

Function: pregexp-quote str

Takes an arbitrary string and returns a U-regexp (string) that precisely represents it. In particular, characters in the input string that could serve as regexp metacharacters are escaped with a backslash, so that they safely match only themselves.

(pregexp-quote "cons")
⇒ "cons"

(pregexp-quote "list?")
⇒ "list\\?"

(pregexp-quote "([a-z]+) +([0-9]+,)? *([0-9]+)")
⇒ "\\(\\[a-z\\]\\+\\) \\+\\(\\[0-9\\]\\+,\\)\\? \\*\\(\\[0-9\\]\\+\\)"

pregexp-quote is useful when building a composite regexp from a mix of regexp strings and verbatim strings.
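
As a sketch of such a composite, the hypothetical helper literal-line-rx below (not part of pregexp) wraps a verbatim string in the anchors ^ and $ so that the pattern matches only a text string equal to it:

(define (literal-line-rx str)
  (string-append "^" (pregexp-quote str) "$"))

(pregexp-match (literal-line-rx "list?") "list?")
⇒ ("list?")

(pregexp-match (literal-line-rx "list?") "lis")
⇒ #f

Without the quoting step, the ? acts as a metacharacter that makes the preceding t optional:

(pregexp-match "^list?$" "lis")
⇒ ("lis")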

