

5 Matching regular expressions

Basic pattern matching goes as follows (with error checking omitted):

cre2_regexp_t *   rex;
cre2_options_t *  opt;
const char *      pattern = "(ciao) (hello)";

opt = cre2_opt_new();
cre2_opt_set_posix_syntax(opt, 1);

rex = cre2_new(pattern, strlen(pattern), opt);
{
  const char *   text     = "ciao hello";
  int            text_len = strlen(text);
  int            nmatch   = 3;
  cre2_string_t  match[nmatch];

  cre2_match(rex, text, text_len, 0, text_len, CRE2_UNANCHORED,
             match, nmatch);

  /* prints: full match: ciao hello */
  printf("full match: ");
  fwrite(match[0].data, match[0].length, 1, stdout);
  printf("\n");

  /* prints: first group: ciao */
  printf("first group: ");
  fwrite(match[1].data, match[1].length, 1, stdout);
  printf("\n");

  /* prints: second group: hello */
  printf("second group: ");
  fwrite(match[2].data, match[2].length, 1, stdout);
  printf("\n");
}
cre2_delete(rex);
cre2_opt_delete(opt);

Enumeration Typedef: cre2_anchor_t

Enumeration type for the anchor point of matching operations. It contains the following constants:

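CRE2_UNANCHORED

No anchoring: the match may start and end anywhere in the examined substring.

CRE2_ANCHOR_START

Anchor at the start only: the match must begin at the start of the examined substring.

CRE2_ANCHOR_BOTH

Anchor at the start and at the end: the match must span the whole examined substring.

Function: int cre2_match (const cre2_regexp_t * rex, const char * text, int text_len, int start_pos, int end_pos, cre2_anchor_t anchor, cre2_string_t * match, int nmatch)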

Match a substring of the text referenced by text and holding text_len bytes against the regular expression object rex. Return true (non-zero) if the text matched, false (zero) otherwise.

The zero-based indices start_pos (inclusive) and end_pos (exclusive) select the substring of text to be examined; anchor selects the anchor point for the matching operation.

Data about the matching groups is stored in the array match, which must have at least nmatch entries; the referenced substrings are portions of the text buffer. If we only need to know whether the text matches, ignoring the matching portions of text, we can pass NULL as the match argument and 0 as the nmatch argument.

The first element of match (index 0) references the portion of text matching the whole pattern; the second element (index 1) references the portion matching the first parenthetical subexpression; the third element (index 2) references the portion matching the second parenthetical subexpression; and so on. To retrieve N parenthetical subexpressions, nmatch must therefore be at least N+1.
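
Because the elements of match reference portions of the text buffer, they are not NUL-terminated and have to be handled through their length field. What follows is a minimal sketch of copying one element into a C string. Here match_view_t is a stand-in declared with only the data and length fields used in the example above, and match_to_cstring() is a hypothetical helper, not a CRE2 function; treating a NULL data pointer as "no text" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in exposing the two fields used in this chapter's example;
   real code would use cre2_string_t from <cre2.h> instead. */
typedef struct {
  const char *data;
  int         length;
} match_view_t;

/* Hypothetical helper (not part of CRE2): copy the bytes referenced by M
   into DST and NUL-terminate them.  The copy is truncated when DST is too
   small.  Return the number of bytes copied, terminator excluded. */
size_t
match_to_cstring (const match_view_t *m, char *dst, size_t dst_size)
{
  size_t n;

  assert(dst != NULL && dst_size > 0);

  /* Assumption of this sketch: an element with no data yields "". */
  if (m->data == NULL || m->length <= 0) {
    dst[0] = '\0';
    return 0;
  }

  n = (size_t)m->length;
  if (n > dst_size - 1)
    n = dst_size - 1;
  memcpy(dst, m->data, n);
  dst[n] = '\0';
  return n;
}
```

With a real match array, the call would look like match_to_cstring(&view, buf, sizeof(buf)) after filling view from match[1].data and match[1].length. When the text only needs to be printed, printf("%.*s", length, data) avoids the copy altogether.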

Function: int cre2_easy_match (const char * pattern, int pattern_len, const char * text, int text_len, cre2_string_t * match, int nmatch)

Like cre2_match(), but the pattern is given as the string pattern holding pattern_len bytes. The whole text is examined and the match is not anchored.

Return 1 if the text matches the pattern, 0 if the text does not match the pattern, and 2 if the pattern is invalid.
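
Since cre2_easy_match() has three possible outcomes, its return value should not be tested as a plain boolean: an invalid pattern would be mistaken for a match. The following sketch maps the three documented values to a description; easy_match_outcome() is a hypothetical helper introduced only for illustration, and the fallback branch covers values the text above does not document.

```c
/* Hypothetical helper (not part of CRE2): describe the value returned
   by cre2_easy_match(), following the three cases documented above. */
const char *
easy_match_outcome (int rv)
{
  switch (rv) {
  case 0:
    return "no match";
  case 1:
    return "match";
  case 2:
    return "invalid pattern";
  default:
    /* Not documented above; kept so the helper is total. */
    return "unexpected return value";
  }
}
```

A caller would pass the result of cre2_easy_match() straight to the helper, or switch on the value directly and handle the invalid-pattern case as an error rather than as a failed match.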

Struct Typedef: cre2_range_t

Structure type used to represent a substring of the text to be matched as starting and ending indices. It has the following fields:

long start

Inclusive start byte index.

long past

Exclusive end byte index.

Function: void cre2_strings_to_ranges (const char * text, cre2_range_t * ranges, cre2_string_t * strings, int nmatch)

Given strings, an array of nmatch elements holding the result of matching text against a regular expression, fill the array ranges with the index intervals in the text buffer that represent the same results.
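
The conversion amounts to pointer arithmetic: each string points into the text buffer, so its start index is its distance from text and its past index is the start plus its length. The sketch below illustrates the idea with stand-in types that mirror the fields described in this chapter; strings_to_ranges_sketch() is a hypothetical name, and the code is not taken from the CRE2 sources. It assumes every element references text, which it checks with an assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins mirroring the fields described in this chapter; real code
   would use cre2_string_t and cre2_range_t from <cre2.h>. */
typedef struct {
  const char *data;
  int         length;
} string_view_t;

typedef struct {
  long start;   /* inclusive start byte index */
  long past;    /* exclusive end byte index */
} range_view_t;

/* Illustrative sketch: turn substring references into index intervals
   relative to the start of TEXT. */
void
strings_to_ranges_sketch (const char *text, range_view_t *ranges,
                          const string_view_t *strings, int nmatch)
{
  int i;

  for (i = 0; i < nmatch; ++i) {
    /* Assumption: the element references a portion of TEXT. */
    assert(strings[i].data != NULL);
    ranges[i].start = (long)(strings[i].data - text);
    ranges[i].past  = ranges[i].start + strings[i].length;
  }
}
```

For the text "ciao hello" matched against "(ciao) (hello)", the three resulting intervals are 0 to 10 for the full match, 0 to 4 for the first group and 5 to 10 for the second group.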



This document describes version 0.4.0-devel.2 of CRE2.