

gawk-Specific Regexp Operators

GNU software that deals with regular expressions provides a number of additional regexp operators. These operators are described in this section and are specific to gawk; they are not available in other awk implementations. Most of the additional operators deal with word matching. For our purposes, a word is a sequence of one or more letters, digits, or underscores (`_'):

\w
Matches any word-constituent character: any letter, digit, or underscore. Think of it as shorthand for [[:alnum:]_].
\W
Matches any character that is not word-constituent. Think of it as shorthand for [^[:alnum:]_].
\<
Matches the empty string at the beginning of a word. For example, /\<away/ matches `away' but not `stowaway'.
\>
Matches the empty string at the end of a word. For example, /stow\>/ matches `stow' but not `stowaway'.
\y
Matches the empty string at either the beginning or the end of a word (i.e., a word boundary). For example, /\yballs?\y/ matches either `ball' or `balls', as a separate word.
\B
Matches the empty string that occurs between two word-constituent characters. For example, /\Brat\B/ matches `crate' but it does not match `dirty rat'. `\B' is essentially the opposite of `\y'.

There are two other operators that work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, gawk's regexp library routines consider the entire string being matched to be the buffer.

\`
Matches the empty string at the beginning of a buffer (string).
\'
Matches the empty string at the end of a buffer (string).

Because `^' and `$' always work in terms of the beginning and end of strings, these operators don't add any new capabilities for awk. They are provided for compatibility with other GNU software.

In other GNU software, the word-boundary operator is `\b'. However, that conflicts with the awk language's definition of `\b' as backspace, so gawk uses a different letter. An alternative method would have been to require two backslashes in the GNU operators, but this was deemed too confusing. The current method of using `\y' for the GNU `\b' appears to be the lesser of two evils.

The various command-line options (see section Command-Line Options) control how gawk interprets characters in regexps:

No options
In the default case, gawk provides all the facilities of POSIX regexps and the previously described GNU regexp operators (see section Regular Expression Operators). However, interval expressions are not supported.
--posix
Only POSIX regexps are supported; the GNU operators are not special (e.g., `\w' matches a literal `w'). Interval expressions are allowed.
--traditional
Traditional Unix awk regexps are matched. The GNU operators are not special, interval expressions are not available, nor are the POSIX character classes ([[:alnum:]] and so on). Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters.
--re-interval
Allow interval expressions in regexps, even if --traditional has been provided.

