

Escape Sequences

Some characters cannot be included literally in string constants ("foo") or regexp constants (/foo/). Instead, they should be represented with escape sequences, which are character sequences beginning with a backslash (`\'). One use of an escape sequence is to include a double quote character in a string constant. Because a plain double quote ends the string, you must use `\"' to represent an actual double quote character as a part of the string. For example:

$ awk 'BEGIN { print "He said \"hi!\" to her." }'
-| He said "hi!" to her.

The backslash character itself is another character that cannot be included normally; you must write `\\' to put one backslash in the string or regexp. Thus, the string whose contents are the two characters `"' and `\' must be written "\"\\".
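As a quick check of the rule above, the following sketch builds that two-character string and prints both it and its length. The shell variable names are made up for this illustration.

```shell
# Build the string whose contents are a double quote followed by a backslash.
quote_backslash=$(awk 'BEGIN { s = "\"\\"; print s }')

# The same string constant holds exactly two characters.
length_of_string=$(awk 'BEGIN { s = "\"\\"; print length(s) }')

printf '%s\n' "$quote_backslash"     # prints "\
printf '%s\n' "$length_of_string"    # prints 2
```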

Another use of backslash is to represent unprintable characters such as tab or newline. While there is nothing to stop you from entering most unprintable characters directly in a string constant or regexp constant, they may look ugly.

The following table lists all the escape sequences used in awk and what they represent. Unless noted otherwise, all these escape sequences apply to both string constants and regexp constants:

\\
A literal backslash, `\'.
\a
The "alert" character, Ctrl-g, ASCII code 7 (BEL). (This usually makes some sort of audible noise.)
\b
Backspace, Ctrl-h, ASCII code 8 (BS).
\f
Formfeed, Ctrl-l, ASCII code 12 (FF).
\n
Newline, Ctrl-j, ASCII code 10 (LF).
\r
Carriage return, Ctrl-m, ASCII code 13 (CR).
\t
Horizontal tab, Ctrl-i, ASCII code 9 (HT).
\v
Vertical tab, Ctrl-k, ASCII code 11 (VT).
\nnn
The octal value nnn, where nnn stands for 1 to 3 digits between `0' and `7'. For example, the code for the ASCII ESC (escape) character is `\033'.
\xhh...
The hexadecimal value hh, where hh stands for a sequence of hexadecimal digits (`0' through `9', and either `A' through `F' or `a' through `f'). Like the same construct in ISO C, the escape sequence continues until the first non-hexadecimal digit is seen. However, using more than two hexadecimal digits produces undefined results. (The `\x' escape sequence is not allowed in POSIX awk.)
\/
A literal slash (necessary for regexp constants only). This expression is used when you want to write a regexp constant that contains a slash. Because the regexp is delimited by slashes, you need to escape the slash that is part of the pattern, in order to tell awk to keep processing the rest of the regexp.
\"
A literal double quote (necessary for string constants only). This expression is used when you want to write a string constant that contains a double quote. Because the string is delimited by double quotes, you need to escape the quote that is part of the string, in order to tell awk to keep processing the rest of the string.
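A few of the entries in the table can be tried directly. The sketch below sticks to sequences that POSIX awk accepts (it avoids `\x' on purpose). The shell variable names and sample strings are only illustrative.

```shell
# \t and \n inside a string constant produce a real tab and a real newline.
tabbed=$(awk 'BEGIN { printf "name\tvalue\n" }')

# \101 is octal for decimal 65, the ASCII code of the letter A.
octal_letter=$(awk 'BEGIN { print "\101" }')

# \/ lets a regexp constant contain the slash that would otherwise end it.
slash_match=$(awk 'BEGIN { if ("usr/bin" ~ /usr\/bin/) print "match"; else print "no match" }')

printf '%s\n' "$tabbed" "$octal_letter" "$slash_match"
```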

In gawk, a number of additional two-character sequences that begin with a backslash have special meaning in regexps. See section gawk-Specific Regexp Operators.

In a regexp, a backslash before any character that is not in the above table and not listed in section gawk-Specific Regexp Operators means that the next character should be taken literally, even if it would normally be a regexp operator. For example, /a\+b/ matches the three characters `a+b'.
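To see the difference the backslash makes, the following sketch runs the escaped and unescaped forms of the pattern over the same two input lines. The input lines and variable names are invented for the illustration.

```shell
# With the backslash, + is an ordinary character: only the literal text a+b matches.
escaped=$(printf 'a+b\naab\n' | awk '/a\+b/ { print }')

# Without the backslash, + means "one or more of the preceding a".
unescaped=$(printf 'a+b\naab\n' | awk '/a+b/ { print }')

printf '%s\n' "$escaped"      # prints a+b
printf '%s\n' "$unescaped"    # prints aab
```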

For complete portability, do not use a backslash before any character not shown in the table above.

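To summarize: unless noted otherwise, the escape sequences in the table above apply to both string constants and regexp constants; gawk additionally gives special meaning in regexps to the sequences listed in section gawk-Specific Regexp Operators; and in a regexp, a backslash before any other character means that the character is taken literally.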

Advanced Notes: Backslash Before Regular Characters

If you place a backslash in a string constant before something that is not one of the characters listed above, POSIX awk purposely leaves what happens as undefined. There are two choices:

Strip the backslash out
This is what Unix awk and gawk both do. For example, "a\qc" is the same as "aqc". (Because this is such an easy bug both to introduce and to miss, gawk warns you about it.) Consider `FS = "[ \t]+\|[ \t]+"', which is meant to use vertical bars surrounded by whitespace as the field separator. Because the backslash before `|' is stripped from the string, it never reaches the regexp. There should be two backslashes in the string: `FS = "[ \t]+\\|[ \t]+"'.
Leave the backslash alone
Some other awk implementations do this. In such implementations, "a\qc" is the same as if you had typed "a\\qc".
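The doubled-backslash form of the field separator can be checked with a short run such as the one below. This is a minimal sketch; the input line and the variable name are made up for the example.

```shell
# Two backslashes in the string constant leave one backslash in the regexp,
# so | is a literal vertical bar rather than the alternation operator.
second_field=$(printf 'alpha | beta | gamma\n' |
    awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print $2 }')

printf '%s\n' "$second_field"    # prints beta
```

Writing the separator this way does not rely on what an implementation does with a backslash before `|' inside a string constant.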

Advanced Notes: Escape Sequences for Metacharacters

Suppose you use an octal or hexadecimal escape to represent a regexp metacharacter (see section Regular Expression Operators). Does awk treat the character as a literal character or as a regexp operator?

Historically, such characters were taken literally. (d.c.) However, the POSIX standard indicates that they should be treated as real metacharacters, which is what gawk does. In compatibility mode (see section Command-Line Options), gawk treats the characters represented by octal and hexadecimal escape sequences literally when used in regexp constants. Thus, /a\52b/ is equivalent to /a\*b/.
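Which of the two behaviors a particular awk exhibits can be checked with a small probe. The following is only a sketch and is not part of any awk distribution; the variable name and the two labels it prints are invented here. It relies on `\52' being the octal code for `*': if the escape is taken literally, the pattern matches the three characters `a*b', and if it is treated as an operator, the pattern means "zero or more a, then b" and does not match.

```shell
# Probe how this awk treats an octal escape that encodes the metacharacter *.
metachar_handling=$(awk 'BEGIN {
    if ("a*b" ~ /^a\52b$/)
        print "literal"     # \52 matched the * character itself
    else
        print "operator"    # \52 acted as the repetition operator
}')

printf '%s\n' "$metachar_handling"
```

The result depends on the implementation and on the mode it runs in, so no particular output is shown.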

