matches hmmm and hmmmm and hmmmmm and ... (It's equivalent to /hmmmm*/ and /hmmm+/. Do you see why?)
You CAN'T set an upper bound without a lower bound: /hm{,3}/ won't do what you want!
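If the equivalence isn't obvious, you can check it yourself. Here's a minimal sketch that tries all three patterns on a few strings; the test words and the variable names (@patterns, %results) are made up for illustration and aren't part of any listing in this chapter.

```perl
#!/usr/bin/perl -w
use strict;

# Three ways of writing "an h followed by three or more m's".
my @patterns = ('hm{3,}', 'hmmmm*', 'hmmm+');

# For each word, record a 1 or 0 for each pattern, in order.
my %results;
foreach my $word (qw(hm hmm hmmm hmmmm hmmmmmm)) {
    $results{$word} = [ map { $word =~ /$_/ ? 1 : 0 } @patterns ];
    print "$word: @{ $results{$word} }\n";
}
```

Every row should show the same answer in all three columns: each pattern insists on at least three m's and places no limit on how many more may follow.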
? means zero or one. You can use the ? special character as a shorthand for {0,1}.
/Joh?n/
matches Jon and John: Jo followed by 0 or 1 h's followed by an n. (An additional question mark has a special meaning: Check out the section titled "Greediness" in Session 7.)
These are special characters, as opposed to metacharacters (Figure 2-4). Here's how to tell the difference:
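To see that ? really is shorthand for {0,1}, here's a small sketch that runs both spellings over the same list of names. The names and the arrays @short and @long are illustrative assumptions, nothing more.

```perl
#!/usr/bin/perl -w
use strict;

my @names = qw(Jon John Johhn Joan);
my (@short, @long);

foreach my $name (@names) {
    push @short, $name if $name =~ /Joh?n/;      # ? means zero or one h
    push @long,  $name if $name =~ /Joh{0,1}n/;  # the longhand spelling
}

print "Joh?n matches:     @short\n";
print "Joh{0,1}n matches: @long\n";
```

Both lines list Jon and John only. Johhn has one h too many, and Joan has an a where the pattern allows nothing but an optional h.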
Figure 2-4 In regular expressions, metacharacters are normal characters with a special meaning when backslashed; special characters are weird characters with a special meaning when not backslashed.
Metacharacters vs. Special Characters
- Metacharacters are alphanumeric characters that, when backslashed, have a special meaning. Example: \s.
- Special characters are nonalphanumeric characters that have a special meaning unless they're backslashed. Example: +.
Here's another special character: the dot. Often, you'll want to express "any character" in your regexes. You can do that with the . special character, which matches any character except a newline.
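The backslash rule cuts both ways, and a quick experiment makes that concrete. The sketch below is only an illustration (the strings and the @tests table are invented for this purpose): it tries s with and without a backslash, then + with and without one.

```perl
#!/usr/bin/perl -w
use strict;

# Each entry is [ pattern, string to test ].
my @tests = (
    [ 's',   'yes' ],   # plain s: just the letter s
    [ '\s',  'yes' ],   # backslashed s: whitespace, and 'yes' has none
    [ '\s',  'a b' ],   # backslashed s: finds the space
    [ 'a+',  'aaa' ],   # plain +: one or more a's
    [ 'a\+', 'aaa' ],   # backslashed +: a literal plus sign, absent here
    [ 'a\+', 'a+b' ],   # backslashed +: finds the literal plus sign
);

my @outcome;
foreach my $test (@tests) {
    my ($pattern, $string) = @$test;
    my $matched = $string =~ /$pattern/ ? 'yes' : 'no';
    push @outcome, $matched;
    print "/$pattern/ on '$string': $matched\n";
}
```

Backslashing the letter s switches its special meaning on; backslashing the + switches its special meaning off.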
| /./ | matches any character (except newline) |
| /.../ | matches any three characters (except newlines) |
| /.*/ | matches any number (including zero) of characters (except newlines). |
. always matches exactly one character: /the../ matches there and their and theta and theme, but not the or then. If you wanted to match those as well, you could do it with the special characters you just learned, such as
/the.?.?/
or
/the.{0,2}/
Experiment!
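One way to start experimenting is to run the strict and the forgiving patterns over the same words and compare. This is just a sketch; the word list comes from the examples above, and the array names @exact and @flexible are assumptions.

```perl
#!/usr/bin/perl -w
use strict;

my @words = qw(the then there their theta theme);
my (@exact, @flexible);

foreach my $word (@words) {
    push @exact,    $word if $word =~ /the../;      # needs two more characters
    push @flexible, $word if $word =~ /the.{0,2}/;  # zero to two more will do
}

print "/the../ matches:     @exact\n";
print "/the.{0,2}/ matches: @flexible\n";
```

The first pattern skips the and then because neither has two characters after "the"; the second accepts all six words. Try swapping in /the.?.?/ to confirm it behaves like the second.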
Let's say a contract has been e-mailed to you and you want to make four changes.
Listing 2-6 shows a program that uses *, +, ?, and {} to get the job done.
Listing 2-6 destuff: Using *, ?, and {} in regular expressions
#!/usr/bin/perl -wn
s/_{3,}/Your Name/g; # Replaces any series of >= 3 underscores
s/Whereas/Since/g; # Replaces Whereas
s!one-half!1/2!g; # Replaces one-half
# Replaces dates in May
s/May\s\S\S?,\s*\S+/sometime/g;
print;
Let's run destuff on the text file contract, the contents of which are shown here:
Whereas the party of the first part, ___________, and the party of
the second part, known as That Guy, have entered into a contract in good
faith as of May 4,1996 and:
Whereas __________ was on May 5, 1996 rendered one-half of the
payment and will be rendered the remaining one-half as of May 30, 1997
upon request by That Guy.
Whereas and hereunto this day of May 16, 1996 forthwith
undersigned: Signature: _____________
Here's the result:
% destuff contract
Since the party of the first part, Your Name, and the party of
the second part, known as That Guy, have entered into a contract in good
faith as of sometime and:
Since Your Name was on sometime rendered 1/2 of the
payment and will be rendered the remaining 1/2 as of sometime
upon request by That Guy.
Since and hereunto this day of sometime forthwith
undersigned: Signature: Your Name
This uses several of the metacharacters and special characters that you've learned so far, as well as one new feature: the ability of s/// to use a delimiter other than a slash. Since destuff replaces one-half with 1/2, you might be tempted to write s/one-half/1/2/g. But because of the extra slash in 1/2, Perl would think "replace one-half with 1, using the modifier 2, and hey, what's that extra /g doing there?" Luckily, you can use different delimiters to separate the chunks of s///.
s!one-half!1/2!g
s#one-half#1/2#g
s@one-half@1/2@g
s&one-half&1/2&g
Most nonalphanumeric characters are valid delimiters.
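Here's a short sketch showing that the choice of delimiter doesn't change the result. The variable names are invented for illustration. The last substitution keeps the slash delimiters and backslashes the slash in 1/2 instead, an alternative this section doesn't cover but which Perl also accepts.

```perl
#!/usr/bin/perl -w
use strict;

# Five copies of the same string, one per substitution style.
my ($bang, $hash, $at, $amp, $escaped) = ('one-half') x 5;

$bang    =~ s!one-half!1/2!g;
$hash    =~ s#one-half#1/2#g;
$at      =~ s@one-half@1/2@g;
$amp     =~ s&one-half&1/2&g;
$escaped =~ s/one-half/1\/2/g;   # slash delimiters, with the inner slash escaped

print "$bang $hash $at $amp $escaped\n";
```

All five variables end up holding 1/2. Picking a delimiter that doesn't appear in your pattern or replacement usually reads better than a thicket of backslashes.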
$_ = 'E Pluribus Unum';
s/\s/-/;
s/Pluribus/of many/g;
s/Unum/1/;
s/E/Out/;
s/ /,\s/;
print;
Out of many 1
Out-of-many,-1
Out-of,many 1
Out-of,\smany 1
Difficulty: Easy
Write a program that spoofs British English by making three substitutions. First, if a word ends in "or" and has more than four letters, it should replace "or" with "our", so that color becomes colour. Second, all words ending in "zation" or "ze" should substitute s's for z's, so that realize becomes realise. Third, coffee should become tea.