You can wrap an implicit while loop around your script by providing the -n flag on either the first line of your program or on the command line when you call perl.
#!/usr/bin/perl -n
s/lion/tiger/g;
print;
is exactly the same as
#!/usr/bin/perl
while (<>) {
s/lion/tiger/g;
print;
}
Or, if you'd prefer a one-liner, you can use the -e flag as well. Just say
% perl -n -e 's/which/that/g; print;' your_filename
or combine the flags:
% perl -ne 's/which/that/g; print;' your_filename
If you use the -p flag instead of -n, the loop supplies the print() for you.
% perl -pe 's/which/that/g' your_filename
How's that for brevity?
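To see what -p adds, it can help to write the implicit loop out by hand. The following is a minimal sketch, not the exact code perl generates: the sample text and the in-memory filehandle are assumptions made so the example runs without a real file, and the output is collected in a variable ($output, a hypothetical name) before being printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what `perl -pe 's/which/that/g'` does, written out by hand.
# The sample text and in-memory filehandle are assumptions for this demo;
# the real -p loop reads from <> (files on the command line, or STDIN).
my $sample = "the book which I read\nthe one which you lent me\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!";

my $output = '';
while (<$in>) {
    s/which/that/g;    # the code you would pass with -e
}
continue {
    $output .= $_;     # -p prints $_ here; this sketch collects it instead
}
close $in;

print $output;
```

With -n, the continue block is simply absent, which is why the earlier one-liners had to call print themselves.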
$_ = 'Bugs Bunny'; s/ Bunn//g; print;
Bugs Bunny
Bugsy
gsy
Bugsny
#!/usr/bin/perl
$_ = 'The Dead Zone';
s/e/i/;
s/d/e/;
s/e/u/g;
print;
Thi Duae Zonu
Thi Diae Zoni
Thi uuad Zonu
Thi Duau Zonu
% perl -ne 's/st|ar/*/g; print'
Difficulty: Easy
Write a program that reads in a file and prints it, with two modifications: It should replace every tab (\t) with a space and every occurrence of John with Jon.
The last session introduced regular expressions by showing you how to substitute one string for another. But regular expressions don't have to be literal strings. Listing 2-5 shows a program called cleanse that removes extraneous whitespace: All consecutive spaces, tabs, or newlines are replaced by a single space.
Listing 2-5 cleanse: Using \s and +
#!/usr/bin/perl -w
while (<>) {
    s/\s+/ /g;    # Replace each run of whitespace with a single space
    print $_, "\n";
}
There are two new building blocks here: the \s metacharacter and the + special character. cleanse combines them, resulting in the following behavior:
% cleanse
Here   is some    text with  extra   whitespace
And    here's     some   more.
RESULT:
Here is some text with extra whitespace
And here's some more.
The \s and \S Metacharacters
- \s matches any whitespace, including spaces, tabs, and newlines.
- \S matches anything that doesn't match \s.
\s is the first metacharacter you've encountered. A metacharacter is a normal alphanumeric character preceded by a backslash, which gives it a special meaning: In the case of \s, that meaning is "any whitespace." You'll see more metacharacters later in the chapter.
So if \s stands for any whitespace character, what's \S? There's a simple rule: Capitalized metacharacters are the opposite of their lower-cased counterparts. So \S represents anything that isn't whitespace, such as a letter, a digit, or a punctuation mark.
/\s\s/
matches any two whitespace characters: two spaces, or a space and a tab, or a newline and a space, and so on.
/hello\sthere/
matches hello, followed by a space or a tab or a newline, followed by there.
/\S\S\s/
matches any two non-whitespace characters followed by one character of whitespace: "we " or "us\t" or "64\n".
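The three patterns above can be tried against a few strings. This is only an illustrative sketch: matches() is a hypothetical helper, and the test strings are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if $string matches $pattern anywhere, 0 otherwise.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Each entry: a test string, a pattern, and a note on what is being tried.
my @checks = (
    [ "a \tb",        qr/\s\s/,         'space followed by tab' ],
    [ "a b",          qr/\s\s/,         'only one whitespace character' ],
    [ "hello\tthere", qr/hello\sthere/, 'tab between the two words' ],
    [ "hellothere",   qr/hello\sthere/, 'no whitespace between the words' ],
    [ "64\n",         qr/\S\S\s/,       'two digits, then a newline' ],
    [ "a b",          qr/\S\S\s/,       'never two non-whitespace in a row' ],
);

for my $check (@checks) {
    my ($string, $pattern, $note) = @$check;
    printf "%-3s %s\n", (matches($string, $pattern) ? 'yes' : 'no'), $note;
}
```

The second and sixth checks fail for the reason \s and \S are opposites: a position in the string is either whitespace or it is not, so "a b" can satisfy neither /\s\s/ nor /\S\S\s/.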
The + in \s+ is one of the regular expression special characters.
Special Characters +, *, ?, and {,}
- + means one or more
- * means zero or more
- ? means zero or one
- {a,b} means at least a but not more than b
"One or more what?" you ask. One or more of whatever precedes the symbol. In cleanse, that was an \s, so \s+ means "one or more whitespace characters."
The + special character (formally called the Kleene plus; Kleene is pronounced, believe it or not, "KLAY-nee") means "one or more." As you saw in cleanse, s/\s+/ /g replaces each run of consecutive whitespace characters with a single space.
/a/
means one a, but
/a+/
means one or more a's, and
/\s/
means one whitespace character, but
/\s+/
means one or more whitespace characters.
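The difference between \s and \s+ is easiest to see in a substitution. Here is a small sketch using a made-up string; the variable names are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "too   many\t\tgaps";    # three spaces, then two tabs

# Without +, every whitespace character is replaced individually,
# so a run of three spaces is still three spaces afterward.
(my $one_at_a_time = $text) =~ s/\s/ /g;

# With +, each whole run of whitespace is matched and replaced at once.
(my $collapsed = $text) =~ s/\s+/ /g;

print "[$one_at_a_time]\n";
print "[$collapsed]\n";
```

Only the second substitution shrinks the text; the first merely turns the tabs into spaces.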
* is like +, but means zero or more instead of one or more.
/\s*/
means zero or more whitespace characters. So
/hm+/
matches hm and hmm and hmmm and hmmmm, whereas
/hm*/
matches all of those as well, but also matches a single h, and
/b*/
matches every string because all strings contain zero or more b's. Do not move your eyes away from this page until that makes sense.
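If the claim about /b*/ still seems suspicious, it can be checked directly. The sketch below runs the three patterns against a handful of made-up strings, including the empty string; quantifier_report() is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns three 1/0 flags saying whether $string
# matches /hm+/, /hm*/, and /b*/ respectively.
sub quantifier_report {
    my ($string) = @_;
    return (
        ($string =~ /hm+/ ? 1 : 0),
        ($string =~ /hm*/ ? 1 : 0),
        ($string =~ /b*/  ? 1 : 0),
    );
}

for my $string ('h', 'hm', 'hmmm', 'xyz', '') {
    my ($plus, $star, $bstar) = quantifier_report($string);
    printf "%-8s hm+: %d  hm*: %d  b*: %d\n", "'$string'", $plus, $star, $bstar;
}
```

Even 'xyz' and the empty string match /b*/, because a match of zero b's succeeds at the very first position of any string.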
What if you want no fewer than 2 and no more than 4 m's? Then you want curly braces, which match a limited number of occurrences, in contrast to + and *, which are unbounded.
/hm{2,4}/
matches hmm and hmmm and hmmmm. It's equivalent to /hmm|hmmm|hmmmm/.
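The equivalence can be spot-checked by running both patterns over the same strings. This is a sketch with made-up inputs, and both_verdicts() is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns two 1/0 flags, one for the curly-brace
# pattern and one for the spelled-out alternation.
sub both_verdicts {
    my ($string) = @_;
    my $braces      = $string =~ /hm{2,4}/        ? 1 : 0;
    my $alternation = $string =~ /hmm|hmmm|hmmmm/ ? 1 : 0;
    return ($braces, $alternation);
}

for my $string ('h', 'hm', 'hmm', 'hmmmm', 'hmmmmmm') {
    my ($braces, $alternation) = both_verdicts($string);
    printf "%-10s {2,4}: %d  alternation: %d\n", "'$string'", $braces, $alternation;
}
```

Note that 'hmmmmmm', with six m's, still matches both patterns. The upper bound limits how many m's the pattern itself consumes, but the pattern is free to match just part of a longer string, and hmmmm is contained in hmmmmmm.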
You can set a lower bound without an upper bound:
/hm{3,}/