

Session 4
Translations

This session introduces the tr/// operator, which is often perceived as related to s/// and m//, but the similarity is cosmetic only—it has nothing to do with regular expressions. tr/// translates one set of characters into another set (Figure 2-7).


Figure 2-7  tr/// translates one set of characters into another


The tr/// Operator
tr/FROMCHARS/TOCHARS/ replaces each character in FROMCHARS with the character at the same position in TOCHARS, returning the number of characters translated.

A synonym for tr/// is y///, in deference to the UNIX sed utility.

FROMCHARS and TOCHARS can be lists of individual characters or ranges: a-z is a range of the 26 letters from a to z.

Some examples:

tr/a-z/A-Z/;

uppercases every letter in $_.

tr/pet/sod/;

replaces ps with ss, es with os, and ts with ds in $_.

$string =~ tr/0-9/a-j/;

replaces (in $string) all 0s with as, 1s with bs, and so on.
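
If you want to try the three examples above, a short script along these lines should do. It is only a sketch: the variable names and sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Sample data (hypothetical, for illustration only).
my $upper = "hello, world";
$upper =~ tr/a-z/A-Z/;                   # every lowercase letter is uppercased

my $sods = "pet step";
$sods =~ tr/pet/sod/;                    # p->s, e->o, t->d

my $digits = "2024";
my $count  = ($digits =~ tr/0-9/a-j/);   # translates in place, returns 4

print "$upper\n";           # HELLO, WORLD
print "$sods\n";            # sod sdos
print "$digits $count\n";   # cace 4
```

Note that tr/pet/sod/ works character by character; it knows nothing about the word "pet".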

Some people write as if their Caps Lock key were permanently on. YOU KNOW THE TYPE, DON’T YOU?????? Let’s mute their enthusiasm a little bit with tr/// (Listing 2-14).

Listing 2-14 mute: Using tr/// to translate uppercase into lowercase

#!/usr/bin/perl -wn
tr/A-Z/a-z/;
print;

% mute
HEY IZ ANYBODY HERE?
hey iz anybody here?
I AM THE GREATEST
i am the greatest

tr/A-Z/a-z/ translates the range of characters from A to Z, inclusive, to the range of characters from a to z. So A becomes a, B becomes b, and so forth. Ranges use ASCII order: tr/Z-A/z-a/ and tr/A-5/a-5/ won’t do what you want. (If you’re not familiar with ASCII, see Appendix B.)
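
To see why those two ranges fail, it can help to look at the ASCII codes involved. The following is a minimal sketch with invented variable names. It assumes that perl refuses a backwards range when it compiles the code, so the bad tr/// is wrapped in a string eval to keep the script itself runnable.

```perl
use strict;
use warnings;

# ASCII code points decide which end of a range comes first.
my ($five, $cap_a, $cap_z) = (ord('5'), ord('A'), ord('Z'));
print "5=$five A=$cap_a Z=$cap_z\n";    # 5=53 A=65 Z=90

# Z (90) comes after A (65), so Z-A runs backwards. The string eval
# returns false when the code inside it fails to compile.
my $ok = eval q{ my $s = "ABC"; $s =~ tr/Z-A/z-a/; 1 };
print "tr/Z-A/z-a/ was rejected\n" unless $ok;
```

Because 5 (53) also comes before A (65), A-5 is a backwards range too.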

You can mix individual characters with ranges.

tr/A-CF/W-Z/

translates A to W, B to X, C to Y, and F to Z.

tr/A-C;!X-Z/123##123/

translates A and X to 1, B and Y to 2, C and Z to 3, and ; and ! to #.
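
Here is a small sketch that applies both of these translations. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $first = "ABCDEF";
$first =~ tr/A-CF/W-Z/;             # A->W, B->X, C->Y, F->Z; D and E untouched

my $second = "A;B!C XYZ";
$second =~ tr/A-C;!X-Z/123##123/;   # letters become digits, ; and ! become #

print "$first\n";    # WXYDEZ
print "$second\n";   # 1#2#3 123
```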

If you delimit FROMCHARS with parentheses, brackets, or curly braces, TOCHARS gets its own pair of delimiters, which can also be parentheses, brackets, or curly braces. The two pairs need not be the same kind.

tr[0-9][##########]

translates all digits to pound signs, and

tr{.!}(!.)

swaps periods and exclamation points.
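
A quick sketch of both bracketed forms in action, using invented sample strings:

```perl
use strict;
use warnings;

my $phone = "555-1234";
$phone =~ tr[0-9][##########];   # every digit becomes #

my $text = "Stop. Go!";
$text =~ tr{.!}(!.);             # . becomes ! and ! becomes .

print "$phone\n";   # ###-####
print "$text\n";    # Stop! Go.
```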

If TOCHARS is shorter than FROMCHARS, the last character of TOCHARS is used repeatedly.

tr/A-Z/a-k/

translates A to a, B to b, C to c,..., J to j, and K through Z to k.
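
You can watch the last character of TOCHARS being reused in a sketch like this one (the sample string is invented):

```perl
use strict;
use warnings;

my $letters = "ABCJKLZ";
$letters =~ tr/A-Z/a-k/;   # A..K map to a..k; L through Z all map to k

print "$letters\n";        # abcjkkk
```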

Because tr/// returns the number of replaced characters, you can use it to count the occurrences of certain characters. (If you want to count the occurrences of substrings within a string, check out the /g modifier to m// described in Session 8 of this chapter.)

$number_of_letters  = ($string =~ tr/a-zA-Z/a-zA-Z/);

$number_of_spaces   = ($string =~ tr/ / /);

If FROMCHARS is the same as TOCHARS, tr/// lets you omit TOCHARS.

$number_of_digits   = ($string =~ tr/0-9//);

$number_of_capitals = ($string =~ tr/A-Z//);
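
The counting idiom can be sketched as follows. The sample sentence is invented; the string itself is left unchanged because every character is "translated" to itself.

```perl
use strict;
use warnings;

my $string = "Perl 5 was released in 1994.";

my $number_of_letters  = ($string =~ tr/a-zA-Z//);
my $number_of_spaces   = ($string =~ tr/ //);
my $number_of_digits   = ($string =~ tr/0-9//);
my $number_of_capitals = ($string =~ tr/A-Z//);

# prints: 17 letters, 5 spaces, 5 digits, 1 capitals
print "$number_of_letters letters, $number_of_spaces spaces, ",
      "$number_of_digits digits, $number_of_capitals capitals\n";
```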

tr/// has three optional modifiers: /c, /d, and /s.


The /c, /d, and /s Modifiers for tr///
/c complements FROMCHARS.
/d deletes any matched but unreplaced characters.
/s squashes runs of the same translated character into just one.

The /c modifier complements FROMCHARS: it translates any characters that aren’t in the FROMCHARS set. To change any nonalphabetic characters in $string to spaces, you could say:

$string =~ tr/A-Za-z/ /c;
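
As a sketch of /c at work, with an invented sample string:

```perl
use strict;
use warnings;

my $string  = "Hello, World! 42";
my $changed = ($string =~ tr/A-Za-z/ /c);   # every nonletter becomes a space

print "[$string]\n";   # [Hello  World    ]
print "$changed\n";    # 6
```

The spaces that were already in the string count as translated characters too, because a space is not a letter.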

You could remove all capital letters with the /d modifier, which deletes any characters in FROMCHARS lacking a counterpart in TOCHARS.

tr/A-Z//d;

deletes all capital letters.
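
The sketch below shows /d with an empty TOCHARS and with a TOCHARS that covers only part of FROMCHARS. The sample strings are invented.

```perl
use strict;
use warnings;

my $title   = "Hello World";
my $removed = ($title =~ tr/A-Z//d);   # deletes H and W, returns 2

my $partial = "abcdef";
$partial =~ tr/a-e/AB/d;               # a->A, b->B; c, d, e deleted; f untouched

print "$title ($removed removed)\n";   # ello orld (2 removed)
print "$partial\n";                    # ABf
```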

Finally, you can squash duplicate replaced characters with the /s modifier:

tr/!//s;

replaces each run of consecutive !s with a single !; equivalent to s/!+/!/g.
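
A short sketch comparing /s with the equivalent substitution, using an invented sample string:

```perl
use strict;
use warnings;

my $squashed = "Wow!!! Really!!";
$squashed =~ tr/!//s;        # each run of !s collapses to one !

my $substituted = "Wow!!! Really!!";
$substituted =~ s/!+/!/g;    # the same effect with a regular expression

print "$squashed\n";         # Wow! Really!
print "$substituted\n";      # Wow! Really!
```

Only consecutive duplicates are squashed, so the two !s that are separated by other text both survive.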

As with s/// and m//, you can combine tr/// modifiers.

tr/A-Z//dc

deletes every character that is not in the range A-Z, leaving only the capital letters.
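
Here is a sketch of two modifier combinations. The first is the /dc example above; the second, which pairs /c with /s, is an extra illustration that is not part of the original listing. The sample strings are invented.

```perl
use strict;
use warnings;

my $initials = "Hello, World!";
$initials =~ tr/A-Z//dc;        # delete everything except capital letters

my $words = "Hello, World!";
$words =~ tr/a-zA-Z/ /cs;       # nonletters become spaces, runs squashed to one

print "$initials\n";    # HW
print "[$words]\n";     # [Hello World ]
```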

Quiz 4

1.  Describe the behavior of
tr/A-Z/D-ZABC/
a.  It won’t work because D-ZABC isn’t a legal range.
b.  It shifts each capital letter forward three places, ignoring X, Y, and Z.
c.  It shifts each capital letter forward three places, wrapping around at the end of the alphabet so that X goes to A, Y goes to B, and Z goes to C.
d.  It counts the number of capital letters, nothing more.
2.  How would you use tr/// to remove punctuation?
a.  tr/;.?!:',"//c;
b.  tr/;.?!:',"//d;
c.  tr/A-Za-z0-9/A-Za-z0-9/c;
d.  tr/;.?!:',"//dc;
3.  Which of the following won’t shorten a string containing multiple consecutive spaces?
a.  tr/ / /s;
b.  tr/A-Za-z0-9//cs;
c.  tr/ //ds;
d.  tr/  / /;
4.  What will the following program print? (Careful—this is tricky!)
#!/usr/bin/perl
$gulliver = 'Lilliput';
($travels = $gulliver) =~ tr/a-z/A-Z/;
print $travels, $gulliver;
a.  LilliputLilliput
b.  LILLIPUTLilliput
c.  LilliputLILLIPUT
d.  7LILLIPUT

Exercise 4

Difficulty: Easy

Write a program that will Rot13 messages. Rot13 is a simple, popular encoding used to disguise messages (and offensive posts on USENET in particular). It shifts each letter forward 13 places in the alphabet, wrapping around at the end. Because each letter moves exactly halfway through the 26-letter alphabet, the encoding and decoding steps are the same: You Rot13 a message to encode it and then Rot13 it again to decode it. (But what would you do if your alphabet had 33 letters, like the Russian alphabet?)

Hint: You can do this in one line.

