Perl Monks 

Map: The Basics

by jeffa on Jun 30, 2000 at 00:00

Map Tutorial: The Basics

The map built-in function lets you build a new list from another list, transforming the elements as it goes. Map takes a list and applies either a block of code or an expression to each element of that list to produce a new list. I will limit the scope of this tutorial to the code block form.
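
To make that concrete before we touch any files, here is a minimal sketch of the code block form. The list @numbers and its contents are made up purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical input list, chosen only for illustration.
my @numbers = (1, 2, 3, 4);

# The block runs once per element with $_ set to the current element;
# the value of the block becomes an element of the new list.
my @doubled = map { $_ * 2 } @numbers;

print "@doubled\n";   # prints "2 4 6 8"
```

The original list is left alone here because the block only reads $_ and never assigns to it.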

When applied correctly, map performs list transformations quickly and compactly. When abused, it can produce some extremely obfuscated code, sacrificing readability and maintainability (and giving whoever inherits the code unnecessary headaches).

Vroom has an excellent tutorial on Complex Sorting - in it he has some more complex but extremely useful explanations of map. The purpose of this tutorial is to talk about the easy stuff: to allow a programmer new to the concept to stick one toe in the water at a time, so to speak.


Map: what is it good for?

Example 1:
Say that you have a text file that contains paragraphs of words. If you want a list in which each element is a single line, you can use:


    open (FILE, "foo.txt") or die "Cannot open foo.txt: $!";
    my @lines = <FILE>;
    close FILE;
But say that you wanted each element to contain a single word. As long as you didn't care about punctuation, you can use the map function like so:

    open (FILE, "foo.txt") or die "Cannot open foo.txt: $!";
    my @words = map { split } <FILE>;
    close FILE;
Remember that split, called with no arguments, splits on whitespace and operates on the special variable $_. Line 2 can be written as:

    my @words = map { split(' ', $_) } <FILE>;
The choice to use default arguments is a trade-off between understandability and laziness/elegance. Also, remember that a file handle read with <FILE> in list context returns all of its remaining lines at once, which is what hands map its list.
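
If you want to convince yourself that the two spellings agree, here is a small sketch using a made-up line instead of a file. It also shows why split(' ', $_) is the right long form and split(/\s/, $_) is not (the variable names are mine, not part of the examples above):

```perl
use strict;
use warnings;

# Hypothetical line with leading spaces and a run of spaces between words.
my $line = "  Just another   Perl hacker\n";

# Bare split and split(' ', $_) skip leading whitespace and treat any
# run of whitespace as a single separator.
my @default  = map { split } $line;
my @explicit = map { split(' ', $_) } $line;

# split(/\s/, $_) splits on every single whitespace character, so it
# also returns empty fields for the leading and doubled spaces.
my @per_char = map { split(/\s/, $_) } $line;

print scalar(@default),  "\n";   # prints "4"
print scalar(@explicit), "\n";   # prints "4"
print scalar(@per_char), "\n";   # a larger count, because of the empty fields
```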

Example 2:
Let's get rid of punctuation. First we need a suitable regular expression, but before we can derive one, we need to decide whether to split first and substitute second, or substitute first and then split. The former choice would require more CPU cycles, because we would be applying the regex to EACH word - the latter is more efficient, because the regex gets applied to a WHOLE line, and with the global modifier (g) Perl replaces every match on that line in one operation. If we only care about periods, commas, exclamation points, and question marks, we can use the substitution operator like so:


    s/[.,!\?]\B//g
Sorry, but the details of this regex are beyond the scope of the tutorial; be sure to check out root's tutorial on String matching and Regular Expressions. I will tell you what it does, though: it turns Hello World! into Hello World, and it does so without removing punctuation from acronyms like J.A.P.H. - okay, okay, half-truth: J.A.P.H. becomes J.A.P.H - good enough for this example (can anyone say "exercise for the reader"?).
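
You can check those two claims yourself with a quick sketch. The sample strings are the ones mentioned above (with a comma added to the greeting), and the variable names are only for illustration:

```perl
use strict;
use warnings;

# Sample strings; each lives in its own variable so that s/// can
# modify it in place.
my $greeting = "Hello, World!";
my $acronym  = "J.A.P.H.";

$greeting =~ s/[.,!\?]\B//g;
$acronym  =~ s/[.,!\?]\B//g;

print "$greeting\n";   # prints "Hello World"
print "$acronym\n";    # prints "J.A.P.H"
```

The periods inside the acronym survive because each one is followed directly by a letter, which is a word boundary, so \B does not match there.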

Moving on . . . now we can add this regex. The inner block of a map statement may contain a number of statements separated by semi-colons. The statements are executed in order, left to right, and the value of the last one is what map returns for that element:


    open (FILE, "foo.txt") or die "Cannot open foo.txt: $!";
    my @words = map { s/[.,!\?]\B//g; split; } <FILE>;
    close FILE;
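
To try the same block without a file, here is a minimal sketch that feeds map a made-up in-memory list instead of <FILE>:

```perl
use strict;
use warnings;

# Hypothetical lines standing in for the contents of foo.txt.
my @lines = ("Hello, World!\n", "Just another Perl hacker.\n");

# First statement: strip the punctuation from $_ in place.
# Second statement: split $_ on whitespace; being last, its result
# is what map adds to @words.
my @words = map { s/[.,!\?]\B//g; split; } @lines;

print join('|', @words), "\n";   # prints "Hello|World|Just|another|Perl|hacker"
```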

Example 3: (know what a function returns!)
Let's say that we don't want to split the line into words; we just want to remove punctuation:


    open (FILE, "foo.txt") or die "Cannot open foo.txt: $!";
    my @lines = map { s/[.,!\?]\B//g } <FILE>;
    close FILE;
Uh-oh. What happened? If you try this, you will not receive the output you might have expected. Instead, you will see numbers and/or blank lines. If the substitution operator finds no punctuation in a line it returns false (an empty string); otherwise it returns the number of substitutions made on that line. It does NOT return the line itself. In cases like this, the function or operator modifies its argument in place. Split does not work in this manner - it returns the list of pieces it split off. Look at example 2 again - the last thing that gets passed out of the map block is the return value of split. So, if we want to return the line altered by a substitution, we will have to tell Perl so - like this:

    open (FILE, "foo.txt") or die "Cannot open foo.txt: $!";
    my @lines = map { s/[.,!\?]\B//g; $_; } <FILE>;
    close FILE;
Much better.
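
Here is a small sketch that puts the two versions side by side, using made-up lines in place of foo.txt so you can see exactly what each block hands back:

```perl
use strict;
use warnings;

# Hypothetical lines standing in for the contents of foo.txt.
my @source = ("Hello, World!\n", "no punctuation here\n");

# Without a trailing $_ the block's value is the return value of s///:
# the number of substitutions, or an empty string when nothing matched.
my @copy   = @source;
my @counts = map { s/[.,!\?]\B//g } @copy;
print "[$_]" for @counts;   # prints "[2][]"
print "\n";

# With a trailing $_ the block's value is the altered line itself.
my @again = @source;
my @lines = map { s/[.,!\?]\B//g; $_ } @again;
print @lines;               # prints "Hello World" and "no punctuation here"

# $_ is an alias for the element, so the array fed to map changed too.
print $again[0];            # prints "Hello World"
```

That last point does not show up when map reads from <FILE>, because the list of lines is a temporary one. With a named array it is worth keeping in mind.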


Map: what is it NOT good for?

Remember, map returns a list - if you do not need a list, don't be tempted to use map as an alternative to more traditional iteration constructs, such as for and foreach.
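
A short sketch of that distinction, using made-up data: the first task wants a new list, so map fits; the second only wants a running total, so a plain foreach says what it means.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my @names = ('larry', 'randal', 'tom');

# A new list is wanted, so map fits.
my @lengths = map { length($_) } @names;

# Only a running total is wanted, so foreach states the intent directly.
my $total = 0;
foreach my $name (@names) {
    $total += length($name);
}

print "@lengths\n";   # prints "5 6 3"
print "$total\n";     # prints "14"
```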

Also, some built-in functions, such as chomp and reverse, can be applied to a whole list AT ONCE, so to speak. For example, if you wanted to slurp the contents of a text file into a list without the newlines, you might be tempted to use your new knowledge like so:


    open (FILE, "foo.txt") or die "Cannot open foo.txt: $!";
    my @lines = map { chomp; $_; } <FILE>;
    close FILE;
(remember what we learned from example 3 - chomp returns the number of characters it removed (1 for a single newline), so we have to explicitly let Perl know we want the remaining value). However, it turns out that chomp can do a much better job by itself:

    open (FILE, "foo.txt") or die "Cannot open foo.txt: $!";
    my @lines = <FILE>;
    chomp(@lines);
    close FILE;
The second example actually runs faster. Why? Because Perl reads the entire file straight into the array - no block of Perl code runs for each line. The same goes for the chomp - given an array, it handles every element in one call. By using a map statement, however, you are forcing Perl to execute the block once per element.

I used the Benchmark module to time these two examples using '/usr/dict/words' as the input file. Here were the results for 100 iterations:


Benchmark: timing 100 iterations of chomp, map...
    chomp: 31 wallclock secs (29.17 usr +  0.54 sys = 29.71 CPU)
      map: 37 wallclock secs (34.63 usr +  0.59 sys = 35.22 CPU)
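
A comparison along these lines can be sketched with the standard Benchmark module. This is not the exact script behind the numbers above: it assumes an in-memory list in place of '/usr/dict/words' so that it runs anywhere, and the iteration count and variable names are arbitrary choices of mine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Assumption: generated lines stand in for /usr/dict/words.
my @input = map { "word$_\n" } (1 .. 10_000);

# Each variant works on a fresh copy, because both chomp(@list) and
# chomp inside map modify the elements they are given.
my $results = timethese(20, {
    chomp => sub {
        my @lines = @input;
        chomp(@lines);
    },
    map => sub {
        my @copy  = @input;
        my @lines = map { chomp; $_ } @copy;
    },
});
```

Both variants pay for the same array copy, so the difference between them still reflects chomp on the whole list versus a map block run per element. Your timings will depend on your machine and Perl version.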

Something else to consider is readability and maintainability. If you want your code to be either, map statements might not be a good solution - let's face it, no other language really has this one-liner of death implemented, and unless you like watching ears bleed, keep it simple! (personally, I like watching ears bleed!)

Of course, there aren't too many obfuscated Perl scripts out there that don't use map. Keep up the higher learning!

RE: Map: The Basics
by splinky on Jul 04, 2000 at 23:55
    I like this tutorial. Good info. I did find a few minor errors, however, which I'll enumerate below.

    my @lines = map { split } <FILE>;

    should probably be

    my @words = map { split } <FILE>;

    Check all your uses of "it's". "It's" is a contraction of "it is". The possessive is "its". So, any time you're showing possession (such as in "split uses whitespace as it's default delimiter, and the special variable $_ as it's default variable"), "it's" should be "its". Ain't English wunnerful?

    Your third code sample, "my @lines = map { split(/\s/, $_) } <FILE>;", is not equivalent to the second. split(/\s/, $_) is not the same as raw split. It should be split(' ', $_), taking advantage of the special meaning of ' ' inside split.

    Toward the end, you say, "chomp returns true or false". Not quite correct. chomp returns the number of characters it chomped.

    And finally, a few misspellings, if you don't mind:

    "lightening" should be "lightning"
    "usefull" should be "useful"
    "headeaches" should be "headaches"
    "seperated" should be "separated"

    Overall, good stuff. Have a Scooby snack on me.

    *Woof*

RE: Map: The Basics
by ahunter on Jul 05, 2000 at 13:32
    splinky has got me doing it now... To pick a nit:

    Map works on all types of lists, not just those returned by the <> operators. In addition, inside the block $_ is an alias for the current list element, not a copy (see the map entry in perlfunc), which gives map yet another use, and makes one of your examples a bit confusing. Consider:

    
    my @array = <FILE>;
    my @newarray = map { s/\#.*$//; $_ } @array;
    
    Now @newarray and @array contain the same thing, as the s/// operator alters the contents of @array! Far better to write:
    
    my @array = <FILE>;
    map s/\#.*$//, @array;
    
    This way of doing things is deliberate, as it eliminates unnecessary assignments when you want the results to go back to the same array. The <FILE> case is special: in list context it returns a temporary, anonymous list of lines, so you never see that it gets altered.

    Andrew.

      my @array = <FILE>;
      map s/\#.*$//, @array;
      
      This way of doing things is deliberate, as it eliminates unnecessary assignments when you want the results to go back to the same array.

      Except that he explicitly said that if you aren't using the list returned, you shouldn't use map, you should look into another looping structure, such as:

      my @array = <FILE>;
      s/\#.*$// foreach @array;
      
      To do otherwise obfuscates the purpose of your loop.
RE: Map: The Basics
by MCauth on Nov 15, 2000 at 15:41
    Good Tutorial. I'm relatively new to Perl, and this has helped me understand this useful tool...especially regarding creating tables on the fly with CGI. Also thanks to the above comments; I'm starting to see what a great resource this site is! Matt
Re: Map: The Basics
by elusion on Jan 11, 2001 at 23:08
    Pretty good, but make sure you watch your regex.
    
    s/[.,!\?]\B//g
    will match anything that has a boundary at the end. You need to escape the period, like this:
    s/[\.,!\?]\B//g
    

    - p u n k k i d
    "Reality is merely an illusion, albeit a very persistent one." -Albert Einstein

      In fact, inside a character class, a period matches a literal period. Although you can escape a period inside a character class if you want to, doing so is not necessary.
Re: Map: The Basics
by scott on Jan 30, 2001 at 19:30

    ... let's face it, no other language really has this one-liner of death implemented ...

    Actually, Mathematica does (have map). And it's also got 'Fold' which is the same but ... different.


      ... let's face it, no other language really has this one-liner of death implemented ...

      Actually, Mathematica does (have map).

      Yes. As does Lisp. And ML. And Haskell. And ... well, you get the idea. :-)

