regular expression
1. <text, operating system> (regexp, RE) One of the wild card patterns
used by Perl and other languages, following Unix utilities such as grep, sed,
and awk and editors such as vi and Emacs. Regular expressions use conventions
similar to but more elaborate than those described under glob. A regular
expression is a sequence of characters with the following meanings:
An ordinary character (not one of the special characters discussed below)
matches that character.
A backslash (\) followed by any special character matches the special character
itself. The special characters are:
"." matches any character except NEWLINE; "RE*" (where the "*" is called the
"Kleene star") matches zero or more occurrences of RE. If there is any choice,
the longest leftmost matching string is chosen, in most regexp flavours.
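As a rough illustration of ordinary characters, ".", the Kleene star and backslash escaping, here is a minimal sketch using Python's standard "re" module. Python is only one flavour, chosen here for convenience; its star is greedy, which gives the longest run in these simple cases. The variable names are illustrative only.

```python
import re

# "." matches any single character except a newline (Python's default mode).
dot_hits = re.findall(r"c.t", "cat cot c\nt cut")    # ['cat', 'cot', 'cut']

# "a*" matches zero or more "a"s; the greedy star takes the longest run it can.
star_match = re.match(r"a*", "aaab").group(0)        # 'aaa'

# Zero occurrences is also a valid match, giving the empty string.
empty_match = re.match(r"a*", "bbb").group(0)        # ''

# A backslash removes the special meaning: "\." matches a literal full stop only.
literal_dot = re.findall(r"a\.b", "a.b axb")         # ['a.b']
```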
"^" at the beginning of an RE matches the start of a line and "$" at the end of
an RE matches the end of a line.
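The anchors can be tried out with a short sketch in Python's "re" module. One flavour-specific detail, noted as such: Python anchors at line boundaries only when the re.MULTILINE flag is given, whereas line-oriented tools such as grep apply the RE to one line at a time. The sample text is made up for the example.

```python
import re

text = "error: disk full\nwarning: error ignored"

# With re.MULTILINE, "^" and "$" anchor at the start and end of every line.
starts = re.findall(r"^error", text, flags=re.MULTILINE)     # ['error']
ends = re.findall(r"ignored$", text, flags=re.MULTILINE)     # ['ignored']

# Without the flag, Python's "^" anchors only at the start of the whole string,
# so "warning" at the start of the second line is not found.
no_flag = re.findall(r"^warning", text)                      # []
```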
[string] matches any one character in that string. If the first character of the
string is a "^" it matches any character except the remaining characters in the
string (and also usually excluding NEWLINE). "-" may be used to indicate a range
of consecutive ASCII characters.
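A brief sketch of bracket expressions in Python's "re" module follows. Note that Python is one of the flavours in which a negated class does match NEWLINE, unlike the usual behaviour described above; the last line demonstrates this.

```python
import re

# [string] matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regexp")        # ['e', 'e']

# A leading "^" negates the class.
non_digits = re.findall(r"[^0-9]", "a1b2")       # ['a', 'b']

# "-" denotes a range of consecutive characters; ranges can be combined.
hex_digits = re.findall(r"[0-9a-f]", "x3fz")     # ['3', 'f']

# In Python, a negated class also matches a newline character.
negated_newline = re.findall(r"[^a]", "a\nb")    # ['\n', 'b']
```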
\( RE \) matches whatever RE matches and \n, where n is a digit, matches
whatever was matched by the RE between the nth \( and its corresponding \)
earlier in the same RE. Many flavours use ( RE ) instead of \( RE \).
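Grouping and back-references can be illustrated by a search for an accidentally doubled word. This is a sketch in Python's "re" module, which uses unescaped ( RE ) for groups; "\w" and "\b" are Python shorthands for a word character and a word boundary, and the sample sentence is invented.

```python
import re

# (\w+) captures a word; \1 must then match exactly the same text again.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best of times")

doubled_word = doubled.group(1)   # 'the'
whole_match = doubled.group(0)    # 'the the'
```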
The concatenation of REs is a RE that matches the concatenation of the strings
matched by each RE. RE1 | RE2 matches whatever RE1 or RE2 matches.
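Concatenation and alternation might be sketched as follows in Python's "re" module. The "(?:...)" form is Python's syntax for a group that does not capture. The last line shows a flavour difference: Python tries alternatives from left to right and keeps the first that succeeds, whereas POSIX tools are specified to return the longest leftmost match.

```python
import re

# "|" matches either alternative.
either = re.findall(r"cat|dog", "cat, dog, cow")             # ['cat', 'dog']

# Concatenation binds more tightly than "|", so the alternatives are grouped.
grouped = re.findall(r"gr(?:a|e)y", "gray grey groy")        # ['gray', 'grey']

# Python takes the first alternative that matches, not the longest one.
first_alternative = re.match(r"a|ab", "ab").group(0)         # 'a'
```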
\< matches the beginning of a word and \> matches the end of a word. In many
flavours of regexp, \> and \< are replaced by "\b", the special character for
"word boundary".
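Python's "re" module is one of the flavours that has no \< or \> and offers only "\b". A minimal sketch of its use, with an invented sample string:

```python
import re

text = "cat concat cats cat."

# "\b" on both sides matches "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", text)    # two matches

# A leading "\b" alone plays the role of "\<": the match must begin a word.
word_starts = re.findall(r"\bcat", text)      # three matches, including "cats"
```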
RE{m} matches m occurrences of RE. RE{m,} matches m or more occurrences of RE.
RE{m,n} matches between m and n occurrences.
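The counted forms can be checked with a short sketch in Python's "re" module. re.fullmatch requires the whole string to match, which makes the counts easy to see; the variable names are illustrative.

```python
import re

# RE{m}: exactly m occurrences.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None     # True
too_few = re.fullmatch(r"a{3}", "aa") is not None            # False

# RE{m,}: m or more occurrences.
at_least_two = re.fullmatch(r"a{2,}", "aaaaa") is not None   # True

# RE{m,n}: between m and n occurrences; the greedy match stops at n.
runs = re.findall(r"[0-9]{2,4}", "7 42 12345")               # ['42', '1234']
```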
The exact details of how regexp will work in a given application vary greatly
from flavour to flavour. A comprehensive survey of regexp flavours is found in
Friedl 1997 (see below).
[Jeffrey E.F. Friedl, "Mastering Regular Expressions", O'Reilly, 1997].
2. Any description of a pattern composed from combinations of symbols and the
three operators:
Concatenation - pattern A concatenated with B matches a match for A followed by
a match for B.
Or - pattern A-or-B matches either a match for A or a match for B.
Closure - zero or more matches for a pattern.
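The three operators are enough to build a tiny matcher. The following Python sketch is one possible approach, not a standard algorithm from the references: patterns are written as nested tuples (a representation invented for this example), and the hypothetical helper end_positions returns every position at which a match starting at a given position can end.

```python
def end_positions(pattern, text, start):
    """Return the set of positions where a match beginning at start can end."""
    kind = pattern[0]
    if kind == "sym":
        # A single symbol matches itself; the slice is empty past the end of text.
        return {start + 1} if text[start:start + 1] == pattern[1] else set()
    if kind == "cat":
        # Concatenation: a match for A followed by a match for B.
        result = set()
        for middle in end_positions(pattern[1], text, start):
            result |= end_positions(pattern[2], text, middle)
        return result
    if kind == "or":
        # Or: either a match for A or a match for B.
        return end_positions(pattern[1], text, start) | end_positions(pattern[2], text, start)
    if kind == "star":
        # Closure: zero or more matches, found by repeating until nothing new is reached.
        reached = {start}
        frontier = {start}
        while frontier:
            found = set()
            for position in frontier:
                found |= end_positions(pattern[1], text, position)
            frontier = found - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown pattern kind: {kind!r}")


def matches(pattern, text):
    """True if the pattern matches the whole of text."""
    return len(text) in end_positions(pattern, text, 0)


# The pattern (a|b)*c written with the three operators.
a_or_b_star_c = ("cat", ("star", ("or", ("sym", "a"), ("sym", "b"))), ("sym", "c"))
```

With these definitions, matches(a_or_b_star_c, "abbac") is true and matches(a_or_b_star_c, "abca") is false.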
The earliest form of regular expressions (and the term itself) was invented by
mathematician Stephen Cole Kleene in the mid-1950s, as a notation to easily
manipulate "regular sets", formal descriptions of the behaviour of finite state
machines, in regular algebra.
[S.C. Kleene, "Representation of events in nerve nets and finite automata",
1956, Automata Studies. Princeton].
[J.H. Conway, "Regular algebra and finite machines", 1971, Chapman & Hall].
[Sedgewick, "Algorithms in C", page 294].
(2004-02-01)