

Making Substitutions

So what can you do with regexes? Well, one common task is to make substitutions in a chunk of text using the s/// operator (Figure 2-2).


Figure 2-2  The s/// operator replaces strings matching a regex with another string


The s/// Operator
s/REGEX/STRING/MODIFIERS replaces strings matching REGEX with STRING.

s/// returns the number of substitutions made, and operates on the default variable $_ unless you supply another scalar with =~ or !~, introduced in Session 3 of this chapter.

The MODIFIERS /e, /g, /i, /m, /o, /s, and /x are characters that you can add to the end of the s/// expression to affect its behavior. They’ll be discussed throughout this chapter.
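To make the return value and the =~ binding concrete, here is a minimal sketch. The variable names and sample strings are made up for illustration, and =~ itself isn't covered until Session 3, so treat this as a preview rather than the chapter's own example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the listings.
my $text = 'the one which has the broomstick?';

# =~ makes s/// operate on $text instead of the default variable $_.
my $changed = ($text =~ s/which/that/);

print "substitutions: $changed\n";   # one match was replaced
print "$text\n";

# When nothing matches, s/// returns a false value and leaves the string alone.
my $other = 'no relative pronouns here';
my $none  = ($other =~ s/which/that/);
print "nothing replaced\n" unless $none;
```

Because the return value is false when nothing matches, you can use s/// directly as a condition.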

Listing 2-1 shows a simple program that uses s/// to replace occurrences of which with that.

Listing 2-1 strunk: Using s/// to replace one string with another

#!/usr/bin/perl
while (<>) {             # assign each input line to $_
    s/which/that/;       # in $_, replace "which" with "that"
    print;               # print $_
}

strunk’s pattern consists of five literal characters: which. That’s about as simple as patterns get: Only strings containing those five characters in that order will match.

% strunk
which
that
the one which has the broomstick?
the one that has the broomstick?
<CTRL-D>

Wouldn’t it be nice if strunk could read files, too? Surprise—it can! For a detailed explanation about the subtleties of <> that make this possible, see the beginning of Chapter 3. For now, however, we’ll demonstrate this ability using the text file grammar, which appears in Listing 2-2.

Listing 2-2 grammar

The videotape which malfunctioned is in the corner.
That which is, is not; that which is not, is.

On a UNIX system, you can pipe this file to the standard input of strunk by typing cat grammar | strunk. Or, you can just supply grammar as a command-line argument.

% strunk grammar
The videotape that malfunctioned is in the corner.
That that is, is not; that which is not, is.

Hold on! If you look closely at strunk’s output, you’ll notice that the substitution changed only the first which on each line. The second which in the last line remained unchanged. That’s because s///, by default, replaces only the first matching string. To change all strings, use the /g modifier, which makes global replacements (Figure 2-3).


Figure 2-3  s///g makes global replacements


The /g Modifier for s///
The /g modifier performs “global search and replace.”

Modifiers are tacked on to the end of s///, so

s/which/that/g

replaces all occurrences of which with that, and

s/2/20/g

replaces all 2s with 20s.
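Here is a small sketch that runs the same substitution with and without /g on the second line of grammar, so you can compare the results and the counts side by side. The variable names and the digits string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'That which is, is not; that which is not, is.';

# Without /g: only the first "which" is replaced.
my $first_only = $line;
my $n_first    = ($first_only =~ s/which/that/);

# With /g: every "which" is replaced, and s/// returns how many.
my $global   = $line;
my $n_global = ($global =~ s/which/that/g);

print "$n_first: $first_only\n";
print "$n_global: $global\n";

# The replacement text is not rescanned, so the 2 inside each new 20
# is not replaced again.
my $digits = '2 + 2 = 22';
$digits =~ s/2/20/g;
print "$digits\n";
```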

Alternatives

The vertical bar (|) is used to separate alternatives in regexes. (In this context, the | is pronounced “or.”) Listing 2-3, abbrevi8, replaces several similar-sounding words with the same abbreviation.

Listing 2-3 abbrevi8: Using s///g to make global substitutions

#!/usr/bin/perl -w

while (<>) {
    s/too|to|two/2/g;           # Change all toos and tos and twos to 2s
    s/four|fore|for/4/g;        # Change all fours and fores and fors to 4s
    s/ought|oh|owe|nothing/0/g; # etc.
    s/eight|ate/8/g;
    print;
}

Each of the four substitutions above is evaluated once for each line of text the user types.

% abbrevi8
I ought to owe nothing, for I ate nothing.
I 0 2 0 0, 4 I 8 0.

Now isn’t that much clearer?
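One detail worth noticing in abbrevi8 is the order of the alternatives. Perl tries them from left to right at each position and takes the first one that matches, which is presumably why too is listed before to. The sketch below, using a made-up sample string, shows what happens when the order is reversed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Order used in abbrevi8: the longer "too" is tried before "to".
my $good = 'too much to do';
$good =~ s/too|to|two/2/g;

# Reversed order: "to" matches first, stranding the final "o" of "too".
my $bad = 'too much to do';
$bad =~ s/to|too|two/2/g;

print "$good\n";   # 2 much 2 do
print "$bad\n";    # 2o much 2 do
```

When one alternative is a prefix of another, listing the longer one first is the safer habit.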

There’s nothing stopping you from using variables as replacement strings.

$replacement = 'that';
s/which/$replacement/;

replaces which with that. You can even use variables as patterns.

$pattern = 'eight|ate';
s/$pattern/8/;

replaces eight or ate with 8.
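Both ideas can be combined in one statement. The following is a minimal sketch with a made-up input string. The /g modifier is added so that every match is replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern     = 'eight|ate';   # interpolated as a regex, so | still means "or"
my $replacement = '8';

my $text = 'I ate at eight';     # hypothetical sample input
$text =~ s/$pattern/$replacement/g;

print "$text\n";                 # I 8 at 8
```

Note that at is left alone: neither alternative matches it.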

Pause for a minute to think about how you might use the s/// operator to swap two text strings, so that lion becomes tiger and tiger becomes lion. Why wouldn’t the following lines work?

#!/usr/bin/perl

while (<>) {
    s/lion/tiger/g;                  # Replace all lions with tigers
    s/tiger/lion/g;                  # Replace all tigers with lions
    print;
}

The problem is that the s/// operations are executed in sequence. First all lions become tigers, then all tigers (including the former lions) become lions: There won’t be any tigers left at all. Oops.
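You can watch the lions and tigers disappear by saving the string after each step. This sketch uses a made-up sentence and made-up variable names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'The lion chased the tiger.';

$text =~ s/lion/tiger/g;         # step 1: lions become tigers
my $after_first = $text;

$text =~ s/tiger/lion/g;         # step 2: ALL tigers become lions
my $after_second = $text;

print "$after_first\n";          # The tiger chased the tiger.
print "$after_second\n";         # The lion chased the lion.
```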

You’ll fare better if you transform lions into a string that isn’t in the text first. Let’s say the word tigon doesn’t appear anywhere in the text. Then you can swap lions and tigers by transforming lion into tigon, then tiger into lion, and finally tigon into tiger, as shown in Listing 2-4.

Listing 2-4 tigons: Using s/// to swap two words

#!/usr/bin/perl -w

while (<>) {
    s/lion/tigon/g;                  # Replace all lions with tigons
    s/tiger/lion/g;                  # Replace all tigers with lions
    s/tigon/tiger/g;                 # Replace all tigons with tigers
    print;
}
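The three-step swap depends on the assumption that tigon doesn't already occur in the input. One way to make that assumption explicit is sketched below. The subroutine name swap_words and the guard are illustrative additions, not part of Listing 2-4.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: swaps "lion" and "tiger" using "tigon" as a
# temporary placeholder, and refuses to run if the placeholder is
# already present in the text.
sub swap_words {
    my ($text) = @_;
    die "placeholder 'tigon' already appears in the text\n"
        if index($text, 'tigon') >= 0;

    $text =~ s/lion/tigon/g;     # lions  -> tigons
    $text =~ s/tiger/lion/g;     # tigers -> lions
    $text =~ s/tigon/tiger/g;    # tigons -> tigers
    return $text;
}

my $swapped = swap_words('The lion chased the tiger.');
print "$swapped\n";              # The tiger chased the lion.
```

Without the guard, a tigon in the original text would silently come out as a tiger.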

