GNU software that deals with regular expressions provides a number of additional regexp operators. These operators are described in this section and are specific to gawk; they are not available in other awk implementations. Most of the additional operators deal with word matching. For our purposes, a word is a sequence of one or more letters, digits, or underscores (`_'):
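\w
Matches any word-constituent character, that is, any letter, digit, or underscore. Think of it as shorthand for [[:alnum:]_].
\W
Matches any character that is not word-constituent. Think of it as shorthand for [^[:alnum:]_].
\<
Matches the empty string at the beginning of a word. For example, /\<away/ matches `away' but not `stowaway'.
\>
Matches the empty string at the end of a word. For example, /stow\>/ matches `stow' but not `stowaway'.
\y
Matches the empty string at either the beginning or the end of a word (i.e., the word boundary). For example, /\yballs?\y/ matches either `ball' or `balls' as a separate word.
\B
Matches the empty string that occurs between two word-constituent characters. For example, /\Brat\B/ matches `crate' but it does not match `dirty rat'. `\B' is essentially the opposite of `\y'.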
There are two other operators that work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, gawk's regexp library routines consider the entire string to match as the buffer.
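\`
Matches the empty string at the beginning of a buffer (string).
\'
Matches the empty string at the end of a buffer (string).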
Because `^' and `$' always work in terms of the beginning and end of strings, these operators don't add any new capabilities for awk. They are provided for compatibility with other GNU software.
In other GNU software, the word-boundary operator is `\b'. However, that conflicts with the awk language's definition of `\b' as backspace, so gawk uses a different letter. An alternative method would have been to require two backslashes in the GNU operators, but this was deemed too confusing. The current method of using `\y' for the GNU `\b' appears to be the lesser of two evils.
The various command-line options (see section Command-Line Options) control how gawk interprets characters in regexps:
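--posix
Only POSIX regexps are supported; the GNU operators are not special (e.g., `\w' matches a literal `w'). Interval expressions are allowed.
--traditional
Traditional Unix awk regexps are matched. The GNU operators are not special, interval expressions are not available, nor are the POSIX character classes ([[:alnum:]] and so on). Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters.
--re-interval
Allow interval expressions in regexps, even if `--traditional' has been provided.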