

The -n and -p Command-Line Flags

You can wrap an implicit while loop around your script by providing the -n flag either on the first line of your program or on the command line when you call perl.

#!/usr/bin/perl -n
s/lion/tiger/g;
print;

is exactly the same as

#!/usr/bin/perl
while (<>) {
    s/lion/tiger/g;
    print;
}

Or, if you’d prefer a one-liner, you can use the -e flag as well. Just say

% perl -n -e 's/which/that/g; print;' your_filename

or combine the flags:

% perl -ne 's/which/that/g; print;' your_filename

If you use the -p flag instead of -n, the loop supplies the print() for you.

% perl -pe 's/which/that/g' your_filename

How’s that for brevity?
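If you're curious what -p is doing on your behalf, here is a rough sketch of the loop it wraps around your code: the body runs for each line, and a continue block does the printing. To keep the sketch self-contained, it reads from an in-memory string instead of <>, and it collects the output in $result before printing it; the sample text and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input; a real -p script would read the files
# named on the command line through <> instead.
my $text = "which way\nthe one which works\n";
open my $in, '<', \$text or die "cannot open in-memory input: $!";

my $result = '';
while (<$in>) {
    s/which/that/g;      # the code you would pass to -e
} continue {
    $result .= $_;       # -p prints $_ at this point
}
close $in;

print $result;
```

Because the printing lives in the continue block, it happens once per line no matter what the loop body does to $_.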

Quiz 1

1.  Which of the following statements is false?
a.  A regular expression is a pattern that matches text.
b.  The s/// operator replaces a string matching a regex with something else.
c.  s/Chinatown/The Two Jakes/ replaces all occurrences of Chinatown with The Two Jakes.
d.  Few computer languages have built-in support for regular expressions.
2.  What will the following statements print?
$_ = 'Bugs Bunny';
s/ Bunn//g;
print;
a.  Bugs Bunny
b.  Bugsy
c.  gsy
d.  Bugsny
3.  What will the following program print?
#!/usr/bin/perl
$_ = 'The Dead Zone';
s/e/i/;
s/d/e/;
s/e/u/g;
print;
a.  Thi Duae Zonu
b.  Thi Diae Zoni
c.  Thi uuad Zonu
d.  Thi Duau Zonu
4.  Describe the behavior of the following one-liner.
% perl -ne 's/st|ar/*/g; print'
a.  It will replace every s, t, a, and r with * and print each line.
b.  It will replace every str or sar with * and print each line.
c.  It will replace every st or ar with * and print each line.
d.  There’s an error in the program.

Exercise 1

Difficulty: Easy

Write a program that reads in a file and prints it, with two modifications: It should replace every tab (\t) with a space and every occurrence of John with Jon.

Session 2
Metacharacters And Special Characters

The last session introduced regular expressions by showing you how to substitute one string for another. But regular expressions don’t have to be literal strings. Listing 2-5 shows a program called cleanse that removes extraneous whitespace: All consecutive spaces, tabs, or newlines are replaced by a single space.

Listing 2-5 cleanse: Using \s and +

#!/usr/bin/perl -w

while (<>) {
    s/\s+/ /g;   # Replace each run of whitespace with a single space
    print $_, "\n";
}

There are two new building blocks here: the \s metacharacter and the + special character. cleanse combines them, resulting in the following behavior:

% cleanse
Here is   some text   with   extra  whitespace
Here is some text with extra whitespace
And here’s some                                 more.
And here’s some more.
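One detail of cleanse is easy to miss: the newline at the end of each line counts as whitespace too, so the substitution turns it into a space. That is presumably why the listing prints "\n" itself. The sketch below applies the same substitution to a single string; cleanse_line is a hypothetical helper invented for this illustration, not part of the listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the cleanse listing): applies the same
# substitution to one line and returns the result.
sub cleanse_line {
    my ($line) = @_;
    $line =~ s/\s+/ /g;    # each run of whitespace becomes one space
    return $line;
}

# Made-up input: three spaces, a tab, and two spaces plus a newline.
my $cleaned = cleanse_line("Here is   some\ttext  \n");

# The brackets make the trailing space visible.
print "[$cleaned]\n";      # prints [Here is some text ]
```

The two spaces and the newline at the end of the input form one run of whitespace, so they collapse into the single trailing space you see before the closing bracket.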


The \s and \S Metacharacters
\s matches any whitespace, including spaces, tabs, and newlines.
\S matches anything that doesn’t match \s.

\s is the first metacharacter you’ve encountered. A metacharacter is a normal alphanumeric character preceded by a backslash, which gives it a special meaning: In the case of \s, that meaning is “any whitespace.” You’ll see more metacharacters later in the chapter.

So if \s stands for any whitespace character, what’s \S? There’s a simple rule: Capitalized metacharacters are the opposite of their lower-cased counterparts. So \S represents anything that isn’t whitespace, such as a letter, a digit, or a punctuation mark.

/\s\s/

matches any two whitespace characters: two spaces, or a space and a tab, or a newline and a space, and so on.

/hello\sthere/

matches hello, followed by a space or a tab or a newline, followed by there.

/\S\S\s/

matches any two non-whitespace characters followed by one character of whitespace: “we ” or “us\t” or “64\n”.
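You can check these claims yourself with a few lines of Perl. The following sketch runs two of the patterns above against some sample strings and records whether each one matches; the strings, labels, and variable names are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry: a label to print, the string to test, and the pattern.
# Labels are single-quoted so \t prints literally; the test strings
# are double-quoted where a real tab is wanted.
my @tests = (
    [ 'hello there',  'hello there',  qr/hello\sthere/ ],
    [ 'hello\tthere', "hello\tthere", qr/hello\sthere/ ],
    [ 'hellothere',   'hellothere',   qr/hello\sthere/ ],
    [ 'us\t',         "us\t",         qr/\S\S\s/       ],
    [ 'a b',          'a b',          qr/\S\S\s/       ],
);

my @results;
for my $test (@tests) {
    my ($label, $string, $pattern) = @$test;
    my $verdict = $string =~ $pattern ? 'matches' : 'does not match';
    push @results, $verdict;
    print "'$label' $verdict\n";
}
```

The third string fails because \s demands exactly one whitespace character between hello and there, and the last one fails because a b never has two non-whitespace characters in a row.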

The + in \s+ is one of the regular expression special characters.


Special Characters +, *, ?, and {,}
+ means one or more
* means zero or more
? means zero or one
{a,b} means at least a but not more than b

“One or more what?” you ask. One or more of whatever precedes the symbol. In cleanse, that was an \s, so \s+ means one or more whitespace characters.

Kleene Plus: +

The + special character (formally called the “Kleene plus”; Kleene is pronounced, believe it or not, “KLAY-nee”) means “one or more.” As you saw in cleanse, s/\s+/ /g replaces any run of consecutive whitespace characters with a single space.

/a/

means one a, but

/a+/

means one or more as, and

/\s/

means one whitespace character, but

/\s+/

means one or more whitespace characters.
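To see the difference in action, you can capture what each pattern actually grabs. This is a small sketch, assuming a made-up sample string; the parentheses capture the matched text so it can be inspected (they are only here to expose the match).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample: three a's, then four spaces built with the x operator
# so the count is unambiguous.
my $string = 'baaad' . (' ' x 4) . 'idea';

my ($one)  = $string =~ /(a)/;      # a single a
my ($many) = $string =~ /(a+)/;     # as many a's as are available
my ($gap)  = $string =~ /(\s+)/;    # the whole run of spaces

print "one=$one many=$many gap length=", length($gap), "\n";
# prints: one=a many=aaa gap length=4
```

Notice that a+ takes all three a's instead of stopping at the first one, and \s+ takes the whole run of spaces.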

Kleene Star: *

* is like +, but means “zero or more” instead of “one or more.”

/\s*/

means zero or more whitespace characters. So

/hm+/

matches hm and hmm and hmmm and hmmmm, whereas

/hm*/

matches all of those as well, but also matches a single h, and

/b*/

matches every string because all strings contain zero or more bs. Do not move your eyes away from this page until that makes sense.
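If the /b*/ claim still feels slippery, a short experiment may help. The sketch below tries /hm+/, /hm*/, and /b*/ against a few sample strings, then captures what /b*/ matched in a string with no b at all. The sample strings and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %matched;
for my $string ('h', 'hm', 'hmmm', 'xyz') {
    my $plus  = $string =~ /hm+/ ? 'yes' : 'no';
    my $star  = $string =~ /hm*/ ? 'yes' : 'no';
    my $any_b = $string =~ /b*/  ? 'yes' : 'no';
    $matched{$string} = "hm+ $plus, hm* $star, b* $any_b";
    print "$string: $matched{$string}\n";
}

# xyz contains no b, yet /b*/ succeeds: it matches zero b's,
# which is to say the empty string at the very start.
my ($empty) = 'xyz' =~ /(b*)/;
print "b* matched ", length($empty), " characters of xyz\n";
```

A lone h satisfies /hm*/ but not /hm+/, and every string satisfies /b*/, including xyz, where the match is zero characters long.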

Curly Braces

What if you want no less than 2 and no more than 4 ms? Then you want curly braces, which match a limited number of occurrences, in contrast to + and *, which are unbounded.

/hm{2,4}/

matches hmm and hmmm and hmmmm. It’s equivalent to /hmm|hmmm|hmmmm/.
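Here is a quick sketch of /hm{2,4}/ at work, with parentheses added to capture what was matched. The sample strings are made up. One thing to watch for: the pattern isn't anchored, so a string with more than four ms still matches; the pattern simply takes the first four.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @found;
for my $string ('hm', 'hmm', 'hmmmm', 'hmmmmmm') {
    if ($string =~ /(hm{2,4})/) {
        push @found, "$string: matched '$1'";
    } else {
        push @found, "$string: no match";
    }
}
print "$_\n" for @found;
# prints:
# hm: no match
# hmm: matched 'hmm'
# hmmmm: matched 'hmmmm'
# hmmmmmm: matched 'hmmmm'
```

The last line shows the upper bound doing its job: six ms are available, but {2,4} stops at four.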

You can set a lower bound without an upper bound:

/hm{3,}/

