Here’s an extended example (24) that covers many of the features in (vicare pregexp). The problem is to fashion a regexp that will match any and only IP addresses or dotted quads, i.e., four numbers separated by three dots, with each number between 0 and 255. We will use the commenting mechanism to build the final regexp with clarity. First, a subregexp n0-255 that matches 0 through 255.
(define n0-255
  "(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )")
The first two alternates simply get all single- and double-digit numbers. Since zero-padding is allowed, we need to match both 1 and 01. We need to be careful when getting 3-digit numbers, since numbers above 255 must be excluded. So we fashion alternates to get 000 through 199, then 200 through 249, and finally 250 through 255. (25)
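As a quick check of the subregexp on its own, the following sketch anchors n0-255 with ^ and $ and tries a few strings. The helper name octet? and the (rnrs) import are assumptions made for illustration; only n0-255 and pregexp-match come from the text above.

```scheme
(import (rnrs) (vicare pregexp))

(define n0-255
  "(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )")

;; Hypothetical helper (not part of the library): true when STR,
;; taken as a whole, is a possibly zero-padded number in 0..255.
(define (octet? str)
  (and (pregexp-match (string-append "^" n0-255 "$") str) #t))

;; Print each sample next to the verdict.
(for-each
  (lambda (s)
    (display s) (display " => ") (display (octet? s)) (newline))
  '("7" "007" "255" "256" "1234"))
```

Under these assumptions the first three samples should be accepted and "256" and "1234" rejected, since no alternate covers a 3-digit number above 255 or anything longer than three digits.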
An IP address is a string that consists of four n0-255s with three dots separating them.
(define ip-re1
  (string-append
    "^"        ;nothing before
    n0-255     ;the first n0-255,
    "(?x:"     ;then the subpattern of
    "\\."      ;a dot followed by
    n0-255     ;an n0-255,
    ")"        ;which is
    "{3}"      ;repeated exactly 3 times
    "$"        ;with nothing following
    ))
Let’s try it out.
(pregexp-match ip-re1 "1.2.3.4")
⇒ ("1.2.3.4")

(pregexp-match ip-re1 "55.155.255.265")
⇒ #f
which is fine, except that we also have:
(pregexp-match ip-re1 "0.00.000.00")
⇒ ("0.00.000.00")
All-zero sequences are not valid IP addresses! Lookahead to the rescue. Before starting to match ip-re1, we look ahead to ensure we don’t have all zeros. We could use positive lookahead to ensure there is a digit other than zero.
(define ip-re
  (string-append
    "(?=.*[1-9])" ;ensure there's a non-0 digit
    ip-re1))
Or we could use negative lookahead to ensure that what’s ahead isn’t composed of only zeros and dots.
(define ip-re
  (string-append
    "(?![0.]*$)" ;not just zeros and dots
                 ;(note: dot is not metachar inside [])
    ip-re1))
The regexp ip-re will match all and only valid IP addresses.
(pregexp-match ip-re "1.2.3.4")
⇒ ("1.2.3.4")

(pregexp-match ip-re "0.0.0.0")
⇒ #f
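The two lookahead variants are meant to accept the same strings. A minimal sketch that puts them side by side follows; the names ip-re/pos, ip-re/neg and ip-address? are hypothetical, introduced only so both definitions can coexist in one program, and the (rnrs) import is an assumption.

```scheme
(import (rnrs) (vicare pregexp))

(define n0-255
  "(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )")

;; Four n0-255 fields separated by three dots, anchored at both ends.
(define ip-re1
  (string-append "^" n0-255 "(?x:" "\\." n0-255 ")" "{3}" "$"))

;; Positive lookahead: some digit other than zero must lie ahead.
(define ip-re/pos (string-append "(?=.*[1-9])" ip-re1))

;; Negative lookahead: what lies ahead must not be only zeros and dots.
(define ip-re/neg (string-append "(?![0.]*$)" ip-re1))

;; Hypothetical predicate: true when STR matches the regexp RE.
(define (ip-address? re str)
  (and (pregexp-match re str) #t))

;; Print the verdict of both variants for each sample string.
(for-each
  (lambda (s)
    (display s)
    (display " => ")
    (display (ip-address? ip-re/pos s))
    (display " ")
    (display (ip-address? ip-re/neg s))
    (newline))
  '("1.2.3.4" "10.0.0.0" "0.0.0.0" "0.00.000.00" "55.155.255.265" "1.2.3"))
```

Both columns should agree on every sample: the first two strings are accepted, the all-zero quads are rejected by the lookahead, and the last two are already rejected by ip-re1 itself.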
(24) From: Jeffrey E. F. Friedl, Mastering Regular Expressions, 2/e, O’Reilly, 2002.
(25) Note that n0-255 lists prefixes as preferred alternates, something we cautioned against. However, since we intend to anchor this subregexp explicitly to force an overall match, the order of the alternates does not matter.
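The effect of listing prefixes first can be seen by matching n0-255 with and without anchors. This is a small sketch, assuming the (rnrs) import and the leftmost-alternate-first behaviour the note refers to.

```scheme
(import (rnrs) (vicare pregexp))

(define n0-255
  "(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )")

;; Unanchored: the first alternate, a single digit, succeeds at once,
;; so only the leading "2" is reported.
(write (pregexp-match n0-255 "255"))
(newline)

;; Anchored: the shorter alternates cannot reach the end of the string,
;; so the matcher backtracks until an alternate spans all of "255".
(write (pregexp-match (string-append "^" n0-255 "$") "255"))
(newline)
```

The first call should print ("2") and the second ("255"), which is why the anchors in ip-re1 make the order of the alternates harmless.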