

4 Matching configuration

Compiled regular expressions can be configured, at construction time, with a number of options collected in a cre2_options_t object. By default, when asked to compile an invalid regular expression pattern, RE2 prints an error message to stderr; usually we want to avoid this logging by disabling the associated option:

cre2_options_t *  opt;

opt = cre2_opt_new();
cre2_opt_set_log_errors(opt, 0);

Opaque Typedef: cre2_options_t

Type of opaque pointers to options objects. Any instance of this type can be used to configure any number of regular expression objects.

Enumeration Typedef: cre2_encoding_t

Enumeration type for constants selecting encoding. It contains the following values:

CRE2_UNKNOWN
CRE2_UTF8
CRE2_Latin1

The value CRE2_UNKNOWN should never be used: it exists only in case there is a mismatch between the definitions of RE2 and CRE2.

Function: cre2_options_t * cre2_opt_new (void)

Allocate and return a new options object. If memory allocation fails, the return value is a NULL pointer.

Function: void cre2_opt_delete (cre2_options_t * opt)

Finalise an options object releasing all the associated resources. Compiled regular expressions configured with this object are not affected by its destruction.

All the following functions are getters and setters for regular expression options. The flag argument to a setter must be false (zero) to disable the option and true (non-zero) to enable it. Unless otherwise specified, the int return value of a getter is true if the option is enabled and false if it is disabled.

Function: cre2_encoding_t cre2_opt_encoding (cre2_options_t * opt)
Function: void cre2_opt_set_encoding (cre2_options_t * opt, cre2_encoding_t enc)

By default, the regular expression pattern and input text are interpreted as UTF-8. CRE2_Latin1 encoding causes them to be interpreted as Latin-1.

The getter returns CRE2_UNKNOWN if the encoding value returned by RE2 is unknown.

Function: int cre2_opt_posix_syntax (cre2_options_t * opt)
Function: void cre2_opt_set_posix_syntax (cre2_options_t * opt, int flag)

Restrict regexps to POSIX egrep syntax. Default is disabled.

Function: int cre2_opt_longest_match (cre2_options_t * opt)
Function: void cre2_opt_set_longest_match (cre2_options_t * opt, int flag)

Search for longest match, not first match. Default is disabled.

Function: int cre2_opt_log_errors (cre2_options_t * opt)
Function: void cre2_opt_set_log_errors (cre2_options_t * opt, int flag)

Log syntax and execution errors to stderr. Default is enabled.

Function: int cre2_opt_literal (cre2_options_t * opt)
Function: void cre2_opt_set_literal (cre2_options_t * opt, int flag)

Interpret the pattern string as literal, not as regular expression. Default is disabled.

Setting this option is equivalent to quoting all the special characters defining a regular expression pattern:

cre2_regexp_t *   rex;
cre2_options_t *  opt;
const char *      pattern = "(ciao) (hello)";
const char *      text    = pattern;
int               len     = strlen(pattern);  /* declared in <string.h> */
int               rv;

opt = cre2_opt_new();
cre2_opt_set_literal(opt, 1);
rex = cre2_new(pattern, len, opt);
{
  /* successful match: the parentheses and the space are
     matched literally, so rv is 1 */
  rv = cre2_match(rex, text, len, 0, len,
                  CRE2_UNANCHORED, NULL, 0);
}
cre2_delete(rex);
cre2_opt_delete(opt);

Function: int cre2_opt_never_nl (cre2_options_t * opt)
Function: void cre2_opt_set_never_nl (cre2_options_t * opt, int flag)

Never match a newline character, even if it is in the regular expression pattern; default is disabled. Turning on this option allows us to attempt a partial match, against the beginning of a multiline text, without using subpatterns to exclude the newline in the regexp pattern.

Function: int cre2_opt_dot_nl (cre2_options_t * opt)
Function: void cre2_opt_set_dot_nl (cre2_options_t * opt, int flag)

The dot matches everything, including the newline character; default is disabled.

Function: int cre2_opt_never_capture (cre2_options_t * opt)
Function: void cre2_opt_set_never_capture (cre2_options_t * opt, int flag)

Parse all the parentheses as non-capturing; default is disabled.

Function: int cre2_opt_case_sensitive (cre2_options_t * opt)
Function: void cre2_opt_set_case_sensitive (cre2_options_t * opt, int flag)

Match is case-sensitive; the regular expression pattern can override this setting with (?i) unless configured in POSIX syntax mode. Default is enabled.

Function: int cre2_opt_max_mem (cre2_options_t * opt)
Function: void cre2_opt_set_max_mem (cre2_options_t * opt, int m)

The max memory option controls how much memory can be used to hold the compiled form of the regular expression and its cached DFA graphs. These functions set and get that memory budget. See the documentation of RE2 for details.

The following options are only consulted when POSIX syntax is enabled. When POSIX syntax is disabled, these features are always enabled and cannot be turned off.

Function: int cre2_opt_perl_classes (cre2_options_t * opt)
Function: void cre2_opt_set_perl_classes (cre2_options_t * opt, int flag)

Allow Perl’s \d, \s, \w, \D, \S, \W. Default is disabled.

Function: int cre2_opt_word_boundary (cre2_options_t * opt)
Function: void cre2_opt_set_word_boundary (cre2_options_t * opt, int flag)

Allow Perl’s \b and \B (word boundary and non-word boundary). Default is disabled.

Function: int cre2_opt_one_line (cre2_options_t * opt)
Function: void cre2_opt_set_one_line (cre2_options_t * opt, int flag)

The patterns ^ and $ only match at the beginning and end of the text. Default is disabled.



This document describes version 0.4.0-devel.2 of CRE2.