Listing 2-11 nameswap: Using backreferences
#!/usr/bin/perl -w
print 'Type your name: ';
chomp($_ = <>);
s/(\S+) (\S+)/$2 $1/; # Swap the first and second words
print;
The two pairs of parentheses create two backreferences: $1 refers to the first (\S+), and $2 refers to the second.
% nameswap
RESULT:
Type your name: Abraham Lincoln
Lincoln Abraham
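To make the role of $1 and $2 easier to see away from the keyboard prompt, here is a minimal sketch that wraps the same substitution in a subroutine. The name swap_first_two is hypothetical and is not part of nameswap; the substitution itself is the one from Listing 2-11.

```perl
#!/usr/bin/perl -w
use strict;

# swap_first_two is a hypothetical helper, not part of nameswap.
sub swap_first_two {
    my ($text) = @_;
    # The two pairs of parentheses fill in $1 and $2,
    # which the replacement then emits in reverse order.
    $text =~ s/(\S+) (\S+)/$2 $1/;
    return $text;
}

print swap_first_two('Abraham Lincoln'), "\n";   # prints Lincoln Abraham
print swap_first_two('Cher'), "\n";              # prints Cher (no match, so unchanged)
```

When the pattern does not match, as with a single word, s/// leaves the string alone, so the subroutine simply returns its input.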
\1, \2, \3...
The backreferences \1 and \2 and \3 (and so on) have the same meaning as $1 and $2 and $3, but are valid only inside s/// or m// expressions. In particular, they're valid inside the pattern itself. That lets you represent repeated substrings inside the pattern.
Suppose you want to match any phrase that contains a double word, like Pago Pago or chocolate chocolate chip. You could use (\S+) to match the first word and \1 to match the second occurrence of the same word, as shown in Listing 2-12.
Listing 2-12 couplet: Using \1
#!/usr/bin/perl -w
while (<>) {
print "Echo! \n" if /(\S+)\s\1/;
}
couplet matches double double but not Hello world! or soy product.
% couplet
RESULT:
double double toil and trouble
Echo!
Hello world!
mime pantomime
A thousand thousand is a million.
Echo!
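The same test can be tried without typing lines by hand. The sketch below wraps couplet's pattern in a subroutine, has_double, which is a hypothetical name chosen for illustration. It also includes one extra input, this is fine, that is not in the session above.

```perl
#!/usr/bin/perl -w
use strict;

# has_double is a hypothetical helper wrapping couplet's pattern.
sub has_double {
    my ($line) = @_;
    # \1 must match exactly the text that (\S+) just captured.
    return $line =~ /(\S+)\s\1/ ? 1 : 0;
}

for my $line ('double double toil and trouble', 'Hello world!',
              'mime pantomime', 'this is fine') {
    print "$line: ", (has_double($line) ? 'Echo!' : '(no match)'), "\n";
}
```

The last input matches even though no whole word is doubled: (\S+) is free to capture just the is at the end of this, and \1 then finds the word is after the space. The pattern looks for a repeated run of characters around a single whitespace character, which is not quite the same thing as a repeated word.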
The programs you've seen so far in this chapter use \S+ to define words. That classifies strings such as 3.14159 and :-) and world! as words unto themselves, which might upset strict grammarians. They should use \w.
The Metacharacters \w and \W
- \w matches any letter, number, or underscore.
- \W matches any character not matched by \w.
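A quick way to see the difference between \S and \w is to pull every "word" out of the same string with each one. This is only a sketch: the sample string is made up for illustration, and it relies on the /g modifier in list context to return every match, which this section has not covered.

```perl
#!/usr/bin/perl -w
use strict;

# Sample text invented for this illustration.
my $text = 'Hello world! pi is 3.14159 :-)';

my @by_S = $text =~ /(\S+)/g;   # runs of non-whitespace characters
my @by_w = $text =~ /(\w+)/g;   # runs of letters, digits, and underscores

print join('|', @by_S), "\n";   # prints Hello|world!|pi|is|3.14159|:-)
print join('|', @by_w), "\n";   # prints Hello|world|pi|is|3|14159
```

With \w+, the exclamation point and the smiley disappear, and 3.14159 splits in two because the period is not a \w character.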
Listing 2-13 shows advent, the world's shortest and hardest adventure game. It lets you type an adventure-style command and then rejects whatever you typed, using \w to identify characters belonging to words and three backreferences.
Listing 2-13 advent: A simple yet difficult adventure game
#!/usr/bin/perl
while (<>) {
chomp;
# Match a first word (the verb), any middle words,
# and an optional last word (the noun).
# (?:...)? makes everything after the verb optional,
# so a one-word command such as "run" still matches.
($verb, $other_words, $noun) = (/\b(\w+)\b(?:\s+\b(.*)\b\s?\b(\w+)\b)?/);
if ($noun) { print "Where is $noun?\n"; }
else { print "How do I $verb?\n"; }
}
advent's regex matches a word and then, optionally, whitespace followed by zero or more middle words and a final word. That's a fairly contorted pattern; what matters is that it plops the first word into $verb, the last word into $noun, and any other words in the middle into $other_words. (For an uncluttered version of this regex, see page 111.)
Then, if a $noun exists, advent prints Where is $noun?. Otherwise, it prints How do I $verb?.
% advent
RESULT:
look up
Where is up?
run
How do I run?
read the perl book
Where is book?
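The verb-and-noun logic can also be exercised without an interactive session. The sketch below is a variant under stated assumptions, not the regex from Listing 2-13: it anchors the pattern with ^ and $, makes everything after the verb optional, and adds a fallback reply for input it cannot parse. The subroutine name respond and the fallback text are both hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# respond is a hypothetical helper; its pattern is a simplified
# variant, not the exact regex from Listing 2-13.
sub respond {
    my ($command) = @_;
    # First word is the verb; an optional tail holds any middle
    # words (matched lazily) and a final word, the noun.
    my ($verb, $other_words, $noun) =
        $command =~ /^\s*(\w+)(?:\s+(.*?)\s*(\w+))?\s*$/;
    # Fallback reply invented for this sketch.
    return "I don't understand." unless defined $verb;
    return defined $noun ? "Where is $noun?" : "How do I $verb?";
}

print respond('look up'), "\n";              # prints Where is up?
print respond('run'), "\n";                  # prints How do I run?
print respond('read the perl book'), "\n";   # prints Where is book?
```

Testing defined $noun rather than the truth of $noun means a noun of 0 still counts as a noun.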
$calculator_entry =~ /(\d+)\s*(.)\s*(\d+)/;
#!/usr/bin/perl
$_ = 'a_b8';
print 'yes ' if /\b\w{2,4}\b/;
print 'yup ' if /\b\w{2,3}\b/;
print 'yeah ' if /\b\w{2,}\b/;
print 'yo ' if /\b\w{2,3}\b/;
yes yeah yo
yes yup yeah yo
yes yup yo
yes yup yeah
$_ = "14:";
($hour, $minute) = /(\d+):(\d+)/;
print "--$hour hours, $minute minutes--";
-- hours, minutes--
-- 14 hours, 0 minutes--
-- 14 hours, minutes--
Difficulty: Easy
Modify advent's regular expression so that it uses the \b metacharacter.