

Listing 2-11 nameswap: Using backreferences

#!/usr/bin/perl -w

print 'Type your name: ';
chomp($_ = <>);

s/(\S+) (\S+)/$2 $1/;      # Swap the first and second words

print;

The two pairs of parentheses create two backreferences: $1 refers to the first (\S+), and $2 refers to the second.

% nameswap
RESULT: Type your name: Abraham Lincoln
Lincoln Abraham
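
To see the backreferences at work without typing at a prompt, the same substitution can be wrapped in a subroutine and run over a few fixed strings. This is only a sketch: the name swap_words and the sample names are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# swap_words is a hypothetical helper, not part of nameswap.
# It swaps the first two space-separated words of its argument.
sub swap_words {
    my ($text) = @_;
    $text =~ s/(\S+) (\S+)/$2 $1/;   # $1 = first word, $2 = second word
    return $text;
}

foreach my $name ('Abraham Lincoln', 'Ada Augusta Lovelace', 'Cher') {
    print swap_words($name), "\n";
}
```

Only the first two words change places, and a single word is left alone because the pattern needs a space between two words.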

\1, \2, \3...

The backreferences \1 and \2 and \3 (and so on) have the same meaning as $1 and $2 and $3, but are valid only inside s/// or m// expressions. In particular, they’re valid inside the pattern itself. That lets you represent repeated substrings inside the pattern.

Suppose you want to match any phrase that contains a double word, like Pago Pago or chocolate chocolate chip. You could use (\S+) to match the first word and \1 to match the second occurrence of the same word, as shown in Listing 2-12.

Listing 2-12 couplet: Using \1

#!/usr/bin/perl -w

while (<>) {
    print "Echo! \n" if /(\S+)\s\1/;
}

couplet matches double double but not Hello world! or soy product.

% couplet
RESULT: double double toil and trouble
Echo!
Hello world!
mime pantomime
A thousand thousand is a million.
Echo!
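
The text captured by the parentheses is still available in $1 after a successful match, so a variation on couplet can report which word was repeated. The following is a rough sketch run over fixed strings rather than standard input; the subroutine name doubled_word is hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# doubled_word is a hypothetical helper: it returns the repeated text
# if the line contains "X<whitespace>X", or undef otherwise.
sub doubled_word {
    my ($line) = @_;
    return $1 if $line =~ /(\S+)\s\1/;   # \1 must repeat whatever (\S+) matched
    return;
}

foreach my $line ('double double toil and trouble',
                  'mime pantomime',
                  'the theory') {
    my $word = doubled_word($line);
    print "$line: ", (defined $word ? "Echo! ($word)" : 'no echo'), "\n";
}
```

The last line echoes even though "theory" is a different word: \1 only requires the same characters to appear again, not a whole word.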

The programs you’ve seen so far in this chapter use \S+ to define words. That classifies strings such as “3.14159” and “:-)” and “world!” as words unto themselves, which might upset strict grammarians. They should use \w.


The Metacharacters \w and \W
\w matches any letter, digit, or underscore.
\W matches any character not matched by \w.
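
The difference between the two definitions of a word shows up when both patterns are applied to the same text. The sketch below relies on the /g modifier in list context to collect every match; the sample string is made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

my $text = 'Hello world! pi_2 is 3.14159 :-)';

my @chunks = $text =~ /\S+/g;   # runs of non-whitespace characters
my @words  = $text =~ /\w+/g;   # runs of letters, digits, and underscores

print join('|', @chunks), "\n";
print join('|', @words),  "\n";
```

With \S+, the punctuation stays attached ("world!", "3.14159", ":-)"); with \w+, the exclamation point and the smiley disappear and 3.14159 splits into 3 and 14159.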

Listing 2-13 shows advent, the world’s shortest and hardest adventure game. It lets you type an adventure-style command and then rejects whatever you typed, using \w to identify characters belonging to words and three backreferences.

Listing 2-13 advent: A simple yet difficult adventure game

#!/usr/bin/perl

while (<>) {
    chomp;

# Match a first word (the verb), any middle words,
# and an optional last word (the noun).

    ($verb, $other_words, $noun) = (/^(\w+)\s*(.*?)\s*(\w*)$/);
    if ($noun) { print "Where is $noun?\n"; }
    else      { print "How do I $verb?\n"; }
}

advent’s regex matches a word at the start of the line, followed by optional whitespace, followed by zero or more middle words (as few characters as will do), followed by optional whitespace, followed by an optional final word at the end of the line. That’s a fairly contorted pattern; what matters is that it plops the first word into $verb, the last word into $noun, and any other words in the middle into $other_words. (For an uncluttered version of this regex, see page 111.)

Then, if a $noun exists, advent prints Where is $noun?. Otherwise, it prints How do I $verb?.

% advent
RESULT: look up
Where is up?
run
How do I run?
read the perl book
Where is book?
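
The same verb/noun split can also be reached without one large regex. The sketch below is an alternative, not the listing's method: it collects every \w+ word into a list and treats the first as the verb and the last (when there is more than one) as the noun. The subroutine name respond and the "Huh?" reply are assumptions made for this example.

```perl
#!/usr/bin/perl -w
use strict;

# respond is a hypothetical helper that mimics advent's replies.
sub respond {
    my ($command) = @_;
    my @words = $command =~ /\w+/g;          # every run of word characters
    return "Huh?" unless @words;             # assumption: advent has no such reply
    my $verb = $words[0];
    return "How do I $verb?" if @words == 1; # a verb with no noun
    my $noun = $words[-1];                   # the last word is the noun
    return "Where is $noun?";
}

foreach my $command ('look up', 'run', 'read the perl book') {
    print respond($command), "\n";
}
```

Run on the three commands from the sample session, this produces the same three replies.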

QUIZ 3

1.  All of these one-liners will print Soon if the user asks a question beginning with the word when. Which would you consider the best?
a.  perl -ne 'print "Soon.\n" if /when\s/;'
b.  perl -ne 'print "Soon.\n" if /when\s/i;'
c.  perl -ne 'print "Soon.\n" if /when\s/g;'
d.  perl -ne 'print "Soon.\n" if /when\S/i;'
2.  Imagine you have a calculator program with the following line, used to match an addition statement (such as “15+17”) stored in $calculator_entry:
$calculator_entry =~ /(\d+)\s*(.)\s*(\d+)/;

which stores 15 in $1, + in $2, and 17 in $3.
Which of the following is not matched by this pattern?
a.  1729 + 1089
b.  1729 + 1089
c.  Neither a nor b is matched.
d.  Both a and b are matched.
3.  What will the following program print?
#!/usr/bin/perl
$_ = 'a_b8';
print 'yes '  if /\b\w{2,4}\b/;
print 'yup '  if /\b\w{2,3}\b/;
print 'yeah ' if /\b\w{2,}\b/;
print 'yo '   if /\b\w{2,3}\b/;
a.  yes yeah yo
b.  yes yup yeah yo
c.  yes yup yo
d.  yes yup yeah
4.  What will this code fragment print? (Caution! This is tricky.)
$_ = "14:";
($hour, $minute) = /(\d+):(\d+)/;
print "--$hour hours, $minute minutes--";
a.  -- hours,  minutes--
b.  -- 14 hours, 0 minutes--
c.  -- 14 hours,   minutes--
d.  Nothing—there’s a syntax error.

Exercise 3

Difficulty: Easy

Modify advent’s regular expression so that it uses the \b metacharacter.

