

grep()

It’s been a while since we’ve seen any new functions. In fact, there haven’t been any in this chapter until now, because m//, s///, and tr/// are actually operators, not functions. But here’s one: grep() (Figure 2-10).


Figure 2-10  The grep() function extracts certain elements from an array


The grep() Function
grep(EXPRESSION, ARRAY) extracts any elements from ARRAY for which EXPRESSION is TRUE.

grep() returns a subarray of ARRAY’s elements. Often, but not always, EXPRESSION is a regular expression.

@has_digits = grep(/\d/, @array);

@has_digits now contains any elements of @array with digits.
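As a minimal sketch of that call in a complete script: the sample strings below are made up for illustration, and the use of strict and my is an assumption of this sketch, not something the text requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @array = ('route 66', 'no digits here', 'catch-22', 'plain');

# Keep only the elements that contain at least one digit.
# @array itself is left unchanged.
my @has_digits = grep(/\d/, @array);

print join(', ', @has_digits), "\n";   # prints: route 66, catch-22
```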

Within a grep(), $_ is temporarily set to each element of the array, which you can use in the EXPRESSION:

@odds = grep($_ % 2, @numbers);

@odds now contains all the odd numbers in @numbers.
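Here is a small sketch of why that works, assuming a sample @numbers of 1 through 10 (illustrative data; @evens is a name introduced here for contrast). $_ % 2 is 1 for odd numbers and 0 for even ones, and Perl treats 0 as false.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input: the integers 1 through 10.
my @numbers = (1 .. 10);

# $_ % 2 is 1 (true) for odd numbers and 0 (false) for even ones.
my @odds = grep($_ % 2, @numbers);

# Negating the test keeps the even numbers instead.
my @evens = grep(!($_ % 2), @numbers);

print "@odds\n";    # prints: 1 3 5 7 9
print "@evens\n";   # prints: 2 4 6 8 10
```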

You can even use grep() to emulate a foreach loop that modifies array elements.

grep(s/a/b/g, @strings);

replaces every a with a b in every element of @strings, because $_ stands for the element itself rather than a copy of it. It also returns a fresh array (the elements in which a substitution was made), which the above statement ignores. Try to avoid this: people who hate side effects consider it poor form, since a foreach loop is a “cleaner” alternative (not to mention faster).
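The two approaches can be compared side by side. This is only a sketch: the sample words and the names @copy and @changed are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data; @copy lets us try both approaches on the same input.
my @strings = ('banana', 'cabana', 'kiwi');
my @copy    = @strings;

# The foreach version: $_ is each element in turn, so the
# substitution changes @strings itself and nothing is discarded.
foreach (@strings) {
    s/a/b/g;
}

# The grep() version: the same side effect on @copy, plus a returned
# list holding only the elements in which a substitution was made.
my @changed = grep(s/a/b/g, @copy);

print "@strings\n";   # prints: bbnbnb cbbbnb kiwi
print "@copy\n";      # prints: bbnbnb cbbbnb kiwi
print "@changed\n";   # prints: bbnbnb cbbbnb
```

Both arrays end up modified in the same way; the foreach loop simply makes the modification the obvious purpose of the code.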

Movie sequels are never as good as the originals. Luckily, they’re usually easy to identify by the Roman numerals in their titles. We’d like to weed them out from an array of movies. We’ll do this using grep(). (The name comes from g/re/p, a command in the UNIX ed editor that globally searches for a regular expression and prints the matching lines. That’s not quite what the Perl grep does, however.) nosequel (Listing 2-22) demonstrates grep().

Listing 2-22 nosequel: Using grep() to extract array elements

#!/usr/bin/perl -w

@movies = ('Taxi Driver', 'Rocky IV', 'Casablanca', 'Godfather II',
           'Friday the 13th Part VI', 'I, Claudius', 'Pulp Fiction',
           'Police Academy III');

# extract elements that don't (!) contain a word boundary (\b) followed by
# one or more Is or Vs ((I|V)+), at the end of the string ($).

@good_movies = grep( ! /\b(I|V)+$/, @movies);
print "@good_movies\n";

nosequel excludes any movie whose title ends in a word made up entirely of Is and Vs. (Hopefully we’ll never need to exclude Xs or Ls.)

% nosequel
RESULT:Taxi Driver Casablanca I, Claudius Pulp Fiction
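If Xs or Ls ever did need excluding, one possible tweak is to swap the alternation for a character class. This is a sketch, not part of nosequel, and the titles below are illustrative rather than the list from Listing 2-22.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative titles only; not the list from Listing 2-22.
my @movies = ('Taxi Driver', 'Rocky IV', 'Saw X', 'Malcolm X', 'I, Claudius');

# [IVXL]+ matches one or more of I, V, X, or L, so a trailing
# word built only from those letters marks the title as a sequel.
my @good_movies = grep(!/\b[IVXL]+$/, @movies);

print join(' | ', @good_movies), "\n";   # prints: Taxi Driver | I, Claudius
```

Note that Malcolm X is dropped too, although it is not a sequel; a pattern this simple cannot tell a numeral from a name.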

Quiz 6

1.  Which string matches this pattern?
/\LEE\EEE/
a.  EEEEE
b.  eeEEE
c.  eeeee
d.  Both b and c
2.  Suppose Robert and Cybill develop a secret code in which each letter of a word is translated to its corresponding number on a telephone keypad: A, B, C become 2; D, E, F become 3, and so on. Because the 0 and 1 buttons don’t have letters, Robert and Cybill use those digits to separate words. Here’s a secret message:
$code = '82940374837';

Which statement places the two words of $code into $first and $second?
a.  
($first, $sep, $second) = ($code =~ /^([2-9])(0|1)([2-9])/);
b.  
($first, $sep, $second) = ($code =~ /^([^01]+)(0|1)([^01]+)/);
c.  
($first, $sep, $second) = ($code =~ /^([^0-1])+(0|1)([^0-1])+/);
d.  
($first, $sep, $second) = ($code =~ /^([^2-9+])(0|1)([^2-9+])/);
3.  Using Robert and Cybill’s secret code from Question 2, which command might miss a code word in @text?
a.  
@codelines = grep(!/[^2-9]+/, @text);
b.  
@codelines = grep(/./, @text);
c.  
@codelines = grep(/[^01]+/, @text);
d.  
@codelines = grep(/[23-89]+/, @text);
4.  Which statement extracts all numbers greater than 1234 from @numbers?
a.  
@bignums = grep($_ > 1234, @numbers);
b.  
@bignums = grep($_ = 1234, @numbers);
c.  
@bignums = grep(/\d\d\d\d/, @numbers) > 1234;
d.  
@bignums = grep('$_ > 1234', @numbers);

Exercise 6

Difficulty: Hard

Write a program that reads a Perl script and prints a modified version of the script: All comments should be removed and one-line if statements of the form if (CONDITION) {STATEMENT} should be changed to STATEMENT if CONDITION. Assume that STATEMENT doesn’t contain a semicolon.

Hint: Use grep() and backreferences.

Session 7
map() and Advanced Regexes II

Pat yourself on the back; you’ve now covered most of regular expressions. The remaining two sessions in this chapter cover some regex arcana: features that you should know about, but might well never need.

