

5.10 Pattern matching, globbing and regular expressions

The following bindings are exported by the (vicare glibc) library.

Function: fnmatch pattern string flags

Interface to the C function fnmatch(), (libc)fnmatch. Test whether string matches pattern according to flags; return #t if it matches, #f otherwise.

pattern and string must be Scheme strings or bytevectors; flags must be a fixnum resulting from the bitwise combination (fxior) of the FNM_ constants exported by (vicare platform constants).

#!r6rs
(import (vicare)
  (prefix (vicare glibc) glibc.)
  (vicare platform constants))

(glibc.fnmatch "ciao" "ciao"  0)        ⇒ #t
(glibc.fnmatch "ciao" "salut" 0)        ⇒ #f

(glibc.fnmatch "ciao*" "ciao a tutti" 0)
⇒ #t

(glibc.fnmatch "*(Ciao)" "CiaoCiao" FNM_EXTMATCH)
⇒ #t

(glibc.fnmatch "?(Ciao|Hello)" "Hello" FNM_EXTMATCH)
⇒ #t
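For comparison, here is a minimal C sketch of the call that fnmatch wraps. The helper name pattern_matches is hypothetical. It only turns the C convention (0 on a match, a nonzero value such as FNM_NOMATCH otherwise) into a boolean, which is roughly what the Scheme binding does with #t and #f. FNM_EXTMATCH and FNM_CASEFOLD are GNU extensions, so _GNU_SOURCE must be defined before any header is included.

```c
#define _GNU_SOURCE              /* exposes FNM_EXTMATCH and FNM_CASEFOLD */
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: true when STRING matches the shell wildcard
   PATTERN. fnmatch() returns 0 on a match; any nonzero result
   (FNM_NOMATCH or an error) is treated here as "no match". */
static bool pattern_matches(const char *pattern, const char *string, int flags)
{
    return fnmatch(pattern, string, flags) == 0;
}
```

In C the flags are combined with |, which corresponds to fxior on the Scheme side. For example, pattern_matches("?(Ciao|Hello)", "hello", FNM_EXTMATCH | FNM_CASEFOLD) combines extended patterns with case-insensitive matching.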
Function: glob pattern flags error-handler
Function: glob/string pattern flags error-handler

Interface to the C function glob(), (libc)glob. Perform file name expansion (globbing) of pattern according to flags. Relative patterns are resolved against the current working directory. If successful, glob returns a list of bytevectors representing the matching file names and glob/string returns a list of strings. Otherwise, both return a fixnum representing one of the error codes GLOB_ABORTED, GLOB_NOMATCH, GLOB_NOSPACE.

pattern must be a string or bytevector. flags must be a fixnum resulting from the bitwise combination (fxior) of the flags GLOB_ERR, GLOB_MARK, GLOB_NOCHECK, GLOB_NOSORT, GLOB_NOESCAPE, GLOB_PERIOD, GLOB_BRACE, GLOB_NOMAGIC, GLOB_TILDE, GLOB_TILDE_CHECK, GLOB_ONLYDIR.
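The flags and the error codes correspond directly to the C API. The following C sketch uses a hypothetical helper, glob_count, that calls glob() without an error handler. It reports how many paths were found and returns the C status code, which plays the same role as the fixnum that glob and glob/string return on failure. The sketch assumes a POSIX system with glibc.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: expand PATTERN with FLAGS (a bitwise OR of GLOB_*
   constants). On success store the number of paths in *COUNT and return 0;
   on failure store 0 and return GLOB_NOMATCH, GLOB_ABORTED or GLOB_NOSPACE. */
static int glob_count(const char *pattern, int flags, size_t *count)
{
    glob_t g;
    int rc = glob(pattern, flags, NULL, &g);   /* NULL: no error handler */
    *count = (rc == 0) ? g.gl_pathc : 0;
    globfree(&g);   /* safe after failure too: glibc initializes gl_pathv */
    return rc;
}
```

When nothing matches, the result depends on GLOB_NOCHECK. Without the flag, a pattern such as "/no-such-dir-xyz/*" yields GLOB_NOMATCH. With the flag, the pattern itself is returned as the only entry.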

error-handler must be #f or a callback pointer with the signature:

int error_handler (const char * filename, int error_code)

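glob calls the handler whenever it cannot open or read a directory. If the handler returns a nonzero value, or if GLOB_ERR is set in flags, glob stops scanning and returns GLOB_ABORTED. See the documentation for glob() and the flag GLOB_ERR for details.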
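The handler contract can be sketched in C, assuming glibc's glob(). The two callbacks and the glob_status helper are hypothetical examples. report_glob_error logs the failure and lets the scan continue. stop_on_error makes glob() give up with GLOB_ABORTED. How a Scheme procedure is turned into such a callback pointer in Vicare is not shown here.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical callback: log the unreadable directory and return 0,
   so glob() keeps scanning the remaining entries. */
static int report_glob_error(const char *path, int error_code)
{
    fprintf(stderr, "glob: cannot read %s (errno %d)\n", path, error_code);
    return 0;
}

/* Hypothetical callback: a nonzero result aborts the scan. */
static int stop_on_error(const char *path, int error_code)
{
    (void)path;
    (void)error_code;
    return 1;
}

/* Hypothetical helper: run glob() with HANDLER and return its status. */
static int glob_status(const char *pattern, int flags,
                       int (*handler)(const char *, int))
{
    glob_t g;
    int rc = glob(pattern, flags, handler, &g);
    globfree(&g);
    return rc;
}
```

Given a pattern below a missing directory, report_glob_error lets glob() finish with GLOB_NOMATCH. stop_on_error, or GLOB_ERR with no handler, produces GLOB_ABORTED instead.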

#!r6rs
(import (vicare)
  (prefix (vicare glibc) glibc.)
  (vicare platform constants))

(glibc.glob/string "*" 0 #f)   ;assuming the current directory is /
⇒ ("bin" "boot" "dev" "etc" "home" "lib"
    "libexec" "lost+found" "media" "mnt" "opt"
    "proc" "root" "sbin" "share" "srv" "sys"
    "tmp" "usr" "var")

(glibc.glob/string "~marco" GLOB_TILDE #f)
⇒ ("/home/marco")

POSIX regular expressions

The following examples show how to match strings against POSIX regular expressions:

#!r6rs
(import (vicare)
  (prefix (vicare glibc) glibc.)
  (vicare platform constants))

(let ((rex (glibc.regcomp "abc" 0)))
  (glibc.regexec rex "abc" 0))
⇒ #((0 . 3))    ;the regex matched the whole string

(let ((rex (glibc.regcomp "abc" 0)))
  (glibc.regexec rex "abcdef" 0))
⇒ #((0 . 3))    ;substring [0, 3) matched

(let ((rex (glibc.regcomp "ciao" 0)))
  (glibc.regexec rex "abc" 0))
⇒ #f            ;no match

(let ((rex (glibc.regcomp "\\(a\\)" 0)))
  (glibc.regexec rex "abc" 0))
⇒ #((0 . 1)     ;substring [0, 1) matched the whole regexp
     (0 . 1))    ;substring [0, 1) matched the 1st paren

(let ((rex (glibc.regcomp "\\(a\\)\\(b\\)\\(c\\)" 0)))
  (glibc.regexec rex "abc" 0))
⇒ #((0 . 3)     ;the regexp matched the whole string
     (0 . 1)     ;substring [0, 1) matched the 1st paren
     (1 . 2)     ;substring [1, 2) matched the 2nd paren
     (2 . 3))    ;substring [2, 3) matched the 3rd paren

(let ((rex (glibc.regcomp "\\(a\\(b\\(c\\)\\)\\)" 0)))
  (glibc.regexec rex "abc" 0))
⇒ #((0 . 3)     ;the regexp matched the whole string
     (0 . 3)     ;substring [0, 3) matched the 1st paren
     (1 . 3)     ;substring [1, 3) matched the 2nd paren
     (2 . 3))    ;substring [2, 3) matched the 3rd paren

(let* ((rex (glibc.regcomp/disown "[a-z]+" REG_EXTENDED))
       (rv  (glibc.regexec rex "abc" 0)))
  (glibc.regfree rex)
  rv)
⇒ #((0 . 3))

We have to remember that this API can be used only with bytevectors representing ASCII-encoded strings and with Scheme strings whose characters all have Unicode code points in the range [0, 255].
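The two input restrictions can be expressed as simple range checks. This C sketch is hypothetical and not part of (vicare glibc). It models a Scheme string as an array of Unicode code points and a bytevector as an array of octets, then tests the condition stated above for each.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check for Scheme strings: every code point in [0, 255]. */
static bool fits_in_octets(const uint32_t *code_points, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (code_points[i] > 255)
            return false;
    return true;
}

/* Hypothetical check for bytevectors: every octet is ASCII (below 128). */
static bool is_ascii(const unsigned char *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (bytes[i] > 127)
            return false;
    return true;
}
```

For example, a string containing U+00E8 (è) passes the first check. A UTF-8 bytevector encoding the same character (0xC3 0xA8) fails the second.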

POSIX regular expression patterns are described in the “Base Definitions” volume of IEEE Std 1003.1-2001, Chapter 9, Regular Expressions:

http://pubs.opengroup.org/onlinepubs/009695399/nframe.html

URL last verified Dec 9, 2011.

Function: regcomp pattern flags

Interface to the C function regcomp(), (libc)regcomp. Compile the regular expression in pattern according to flags. If successful, return a pointer referencing the compiled regexp; otherwise raise an exception with condition components &error, &who, &message, &irritants.

pattern must be a string or bytevector representing the regular expression. flags must be a fixnum resulting from the bitwise combination (fxior) of REG_ constants.

On success, the returned pointer references a regex_t data structure. Its resources can be released explicitly with regfree; otherwise, the garbage collector releases them automatically when it collects the pointer object.
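In C, regcomp() reports failure through a nonzero return code, and regerror() turns that code into a readable message. The Scheme binding raises an exception instead. The helper compile_or_describe below is a hypothetical sketch of the C side, assuming a POSIX <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN into *REX with CFLAGS. On failure,
   write regerror()'s description into MSG and return the nonzero error
   code; on success, leave MSG empty and return 0. */
static int compile_or_describe(regex_t *rex, const char *pattern, int cflags,
                               char *msg, size_t msg_size)
{
    int rc = regcomp(rex, pattern, cflags);
    if (rc != 0)
        regerror(rc, rex, msg, msg_size);   /* human-readable reason */
    else if (msg_size > 0)
        msg[0] = '\0';
    return rc;   /* after success the caller must eventually call regfree() */
}
```

An unbalanced extended pattern such as "a(b" fails to compile and produces a nonempty message. A successfully compiled pattern must later be released with regfree().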

Function: regcomp/disown pattern flags

Like regcomp, but nothing is released when the returned pointer object is garbage collected. We must explicitly apply regfree to the returned pointer object to release the allocated resources.

Function: regexec regex string flags

Interface to the C function regexec(), (libc)regexec. Attempt to match string against the precompiled regular expression regex according to flags.

regex must be a pointer returned by a previous call to regcomp or regcomp/disown; string must be a string or an ASCII-encoded bytevector; flags must be a fixnum representing matching flags, where zero means no flags.

If string matches, return a vector of pairs describing the portions of string that matched; if there is no match, return #f; if an error occurs, raise an exception with condition components &error, &who, &message, &irritants.

Each element of the returned vector is a pair. The car is a fixnum representing the starting offset of the matched substring, and the cdr is a fixnum representing its ending offset, which is exclusive. Each pair therefore describes the half-open range [car, cdr) of string.

The element at index 0 describes the portion of string that matched the whole regular expression. The element at index 1 describes the portion that matched the first parenthesized subexpression, the element at index 2 the portion that matched the second, and so on. If string matches, the returned vector has at least one element.
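The pairs appear to mirror the regmatch_t records that the C function fills in, with rm_so as the starting offset and rm_eo as the ending offset. This C sketch requests one record for the whole match plus one for each parenthesized subexpression, using the re_nsub field of regex_t. The helper match_offsets and the MAX_GROUPS limit are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 10   /* hypothetical capacity for this sketch */

/* Hypothetical helper: fill STARTS/ENDS with the half-open ranges
   [start, end) reported by regexec(). Index 0 is the whole regex, index i
   the i-th parenthesized subexpression. Returns the number of ranges, or
   -1 when STRING does not match. */
static int match_offsets(const regex_t *rex, const char *string,
                         regoff_t starts[], regoff_t ends[])
{
    regmatch_t m[MAX_GROUPS];
    size_t n = rex->re_nsub + 1;   /* whole match plus each subexpression */
    if (n > MAX_GROUPS)
        n = MAX_GROUPS;
    if (regexec(rex, string, n, m, 0) != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        starts[i] = m[i].rm_so;    /* -1 if subexpression i did not match */
        ends[i] = m[i].rm_eo;
    }
    return (int)n;
}
```

Applied to the basic regex \(a\)\(b\)\(c\) and the string "abc", this helper yields the ranges [0, 3), [0, 1), [1, 2) and [2, 3). These are the same offsets shown in the Scheme example earlier in this section.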

Function: regfree regex

Interface to the C function regfree(), (libc)regfree. Release the resources associated with the compiled regular expression regex, which must be a pointer.

It is safe to apply this function multiple times to the same regex pointer object. The resources are released only the first time, and subsequent applications do nothing. After the resources have been released, regex is reset to NULL.
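The release-once behavior described above can be sketched in C with a pointer-to-pointer wrapper. Both regfree_once and regcomp_new are hypothetical helpers written for this illustration. The actual binding keeps its state in a Scheme pointer object rather than a C variable.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical constructor: heap-allocate and compile a regex_t.
   Returns NULL if allocation or compilation fails. */
static regex_t *regcomp_new(const char *pattern, int cflags)
{
    regex_t *rex = malloc(sizeof *rex);
    if (rex != NULL && regcomp(rex, pattern, cflags) != 0) {
        free(rex);
        rex = NULL;
    }
    return rex;
}

/* Hypothetical wrapper: the first call releases the compiled regex and
   resets *HANDLE to NULL, so later calls on the same handle do nothing. */
static void regfree_once(regex_t **handle)
{
    if (*handle == NULL)
        return;                /* already released */
    regfree(*handle);          /* release the compiled pattern's resources */
    free(*handle);             /* release the regex_t structure itself */
    *handle = NULL;
}
```

Resetting the handle to NULL is what makes a second call harmless. Without that reset, calling regfree() twice on the same regex_t would be undefined behavior in C.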

