So what can you do with regexes? Well, one common task is to make substitutions in a chunk of text using the s/// operator (Figure 2-2).
Figure 2-2 The s/// function removes strings matching regex and substitutes another string
The s/// Operator
s/REGEX/STRING/MODIFIERS replaces strings matching REGEX with STRING.
s/// returns the number of substitutions made, and operates on the default variable $_ unless you supply another scalar with =~ or !~, introduced in Session 3 of this chapter.
The MODIFIERS /e, /g, /i, /m, /o, /s, and /x are characters that you can add to the end of the s/// expression to affect its behavior. They'll be discussed throughout this chapter.
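To make the return value and the =~ binding concrete, here is a minimal sketch. The sample sentence and the variable names ($line, $count, $misses) are made up for illustration and are not part of the book's listings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the listings.
my $line = "which witch is which?";

# Bind s/// to $line with =~ instead of letting it work on $_.
# Without /g at most one substitution is made, so the count is 1.
my $count = ($line =~ s/which/that/);
print "$count substitution: $line\n";

# When nothing matches, s/// returns a false value and leaves
# the string untouched.
my $misses = ($line =~ s/zebra/horse/);
print "no match, text unchanged\n" unless $misses;
```

Because the return value is false when nothing matched, s/// can also serve as the condition of an if or unless.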
Listing 2-1 shows a simple program that uses s/// to replace occurrences of which with that.
Listing 2-1 strunk: Using s/// to replace one string with another
#!/usr/bin/perl
while (<>) {            # assign each input line to $_
    s/which/that/;      # in $_, replace "which" with "that"
    print;              # print $_
}
strunk's pattern consists of five literal characters: which. That's about as simple as patterns get: Only strings containing those five characters in that order will match.
% strunk
RESULT:
which
that
the one which has the broomstick?
the one that has the broomstick?
<CTRL-D>
Wouldn't it be nice if strunk could read files, too? Surprise: it can! For a detailed explanation about the subtleties of <> that make this possible, see the beginning of Chapter 3. For now, however, we'll demonstrate this ability using the text file grammar, which appears in Listing 2-2.
Listing 2-2 grammar
The videotape which malfunctioned is in the corner.
That which is, is not; that which is not, is.
On a UNIX system, you can pipe this file to the standard input of strunk by typing cat grammar | strunk. Or, you can just supply grammar as a command-line argument.
% strunk grammar
RESULT:
The videotape that malfunctioned is in the corner.
That that is, is not; that which is not, is.
Hold on! If you look closely at strunk's output, you'll notice that the substitution worked only for the first which on each line of input. The second which on the last line remained unchanged. That's because s///, by default, replaces only the first matching string. To change all strings, use the /g modifier, which makes global replacements (Figure 2-3).
Figure 2-3 s///g makes global replacements
The /g Modifier for s///
The /g modifier performs global search and replace.
Modifiers are tacked on to the end of s///, so
s/which/that/g
replaces all occurrences of which with that, and
s/2/20/g
replaces all 2s with 20s.
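The difference between the two forms shows up in the number of substitutions that s/// reports. The following is a small sketch, not one of the book's listings: it reuses a sentence from the grammar file, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sentence borrowed from Listing 2-2; variable names are illustrative.
my $once   = "That which is, is not; that which is not, is.";
my $global = $once;

my $n_once   = ($once   =~ s/which/that/);     # first match only
my $n_global = ($global =~ s/which/that/g);    # every match

print "$n_once: $once\n";
print "$n_global: $global\n";

# The replacement text is not rescanned, so the 2 inside each
# freshly inserted 20 is left alone.
my $sum = "2 + 2 = 4";
$sum =~ s/2/20/g;
print "$sum\n";
```

The last three lines are worth a second look: s/2/20/g does not loop forever, because each search resumes after the previous match rather than inside the text that was just inserted.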
The vertical bar (|) is used to separate alternatives in regexes. (In this context, the | is pronounced "or.") Listing 2-3, abbrevi8, replaces several similar-sounding words with the same abbreviation.
Listing 2-3 abbrevi8: Using s///g to make global substitutions
#!/usr/bin/perl -w
while (<>) {
    s/too|to|two/2/g;               # Change all toos and tos and twos to 2s
    s/four|fore|for/4/g;            # Change all fours and fores and fors to 4s
    s/ought|oh|owe|nothing/0/g;     # etc.
    s/eight|ate/8/g;
    print;
}
Each of the four lines above is evaluated once for each line of text the user types.
% abbrevi8
RESULT:
I ought to owe nothing, for I ate nothing.
I 0 2 0 0, 4 I 8 0.
Now isn't that much clearer?
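One detail in abbrevi8 is easy to miss: the order of the alternatives matters. Perl tries them from left to right at each position and takes the first one that matches, which is why too is listed before to. The sketch below uses a made-up test sentence and variable names to show what happens when the order is reversed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test sentence, not from the book.
my $longest_first  = "too much to do";
my $shortest_first = $longest_first;

# "too" is tried before "to", so the whole word is replaced.
$longest_first =~ s/too|to|two/2/g;

# "to" is tried first and wins, leaving a stray "o" behind.
$shortest_first =~ s/to|too|two/2/g;

print "$longest_first\n";     # 2 much 2 do
print "$shortest_first\n";    # 2o much 2 do
```

When one alternative is a prefix of another, putting the longer one first is the safer habit.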
There's nothing stopping you from using variables as replacement strings.
$replacement = 'that';
s/which/$replacement/;
replaces which with that. You can even use variables as patterns.
$pattern = 'eight|ate';
s/$pattern/8/;
replaces eight or ate with 8.
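Both ideas can be combined in a single substitution. The sketch below is only an illustration: the sample sentence and the $text and $result variables are assumptions, and the /g modifier has been added so that every match is replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern     = 'eight|ate';    # interpolated as a regex, so | still means "or"
my $replacement = '8';

# Hypothetical input; copy it first so the original stays intact.
my $text = "I ate eight pies";
(my $result = $text) =~ s/$pattern/$replacement/g;

print "$text\n";      # I ate eight pies
print "$result\n";    # I 8 8 pies
```

Note that the contents of $pattern are treated as a regex, not as plain text, so any special characters stored in the variable keep their special meaning.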
Pause for a minute to think about how you might use the s/// operator to swap two text strings, so that lion becomes tiger and tiger becomes lion. Why wouldn't the following lines work?
#!/usr/bin/perl
while (<>) {
    s/lion/tiger/g;     # Replace all lions with tigers
    s/tiger/lion/g;     # Replace all tigers with lions
    print;
}
The problem is that the s/// operations are executed in sequence. First all lions become tigers, then all tigers (including the former lions) become lions: There won't be any tigers left at all. Oops.
You'll fare better if you transform lions into a string that isn't in the text first. Let's say the word tigon doesn't appear anywhere in the text. Then you can swap lions and tigers by transforming lion into tigon, then tiger into lion, and finally tigon into tiger, as shown in Listing 2-4.
Listing 2-4 tigons: Using s/// to swap two words
#!/usr/bin/perl -w
while (<>) {
    s/lion/tigon/g;     # Replace all lions with tigons
    s/tiger/lion/g;     # Replace all tigers with lions
    s/tigon/tiger/g;    # Replace all tigons with tigers
    print;
}
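To see the two approaches side by side, the sketch below wraps each one in a subroutine and runs both on the same sentence. The subroutine names (naive_swap, placeholder_swap) and the sample sentences are hypothetical. The last call shows why the placeholder must not already occur in the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two substitutions in sequence: the second one undoes the first.
sub naive_swap {
    my ($text) = @_;
    $text =~ s/lion/tiger/g;
    $text =~ s/tiger/lion/g;
    return $text;
}

# Three substitutions, using "tigon" as a temporary placeholder.
sub placeholder_swap {
    my ($text) = @_;
    $text =~ s/lion/tigon/g;
    $text =~ s/tiger/lion/g;
    $text =~ s/tigon/tiger/g;
    return $text;
}

my $sample = "The lion chased the tiger.";
print naive_swap($sample), "\n";          # The lion chased the lion.
print placeholder_swap($sample), "\n";    # The tiger chased the lion.

# If the placeholder already appears in the input, it gets rewritten too.
print placeholder_swap("A tigon is a hybrid."), "\n";    # A tiger is a hybrid.
```

The placeholder trick is only as safe as the assumption behind it, so pick a string that you are confident never appears in your input.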