Perl CD Bookshelf [Electronic resources] (text version)

24.4. Fluent Perl



We've touched on a few idioms in the preceding sections (not to mention
the preceding chapters), but there are many other idioms you'll
commonly see if you read programs by accomplished Perl programmers.
When we speak of idiomatic Perl in this context, we don't just mean a
set of arbitrary Perl expressions with fossilized meanings.
Rather, we mean Perl code that shows an understanding of the flow of
the language, what you can get away with when, and what that buys
you. And when to buy it.

We can't hope to list all the idioms you might see--that would take a
book as big as this one. Maybe two. (See the Perl Cookbook, for
instance.) But here are some of the
important idioms, where "important" might be defined as "that which
induces hissy fits in people who think they already know just how
computer languages ought to work".



  • Use => in place of a comma anywhere you think it improves readability:


    return bless $mess => $class;


    This reads, "Bless this mess into the specified class." Just be careful
    not to use it after a word that you don't want autoquoted:

    sub foo () { "FOO" }
    sub bar () { "BAR" }
    print foo => bar; # prints fooBAR, not FOOBAR;


    Another good place to use => is near a literal
    comma that might get confused visually:

    join(", " => @array);


    Perl provides you with more than one way to do things so that you can
    exercise your ability to be creative. Exercise it!
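The autoquoting caveat can be checked with a minimal sketch. It reuses the constant subs foo and bar from the example; the helper names quoted and called are hypothetical and exist only so the two spellings can be compared.

```perl
use strict;
use warnings;

sub foo () { "FOO" }
sub bar () { "BAR" }

# The identifier immediately to the left of => is treated as a string,
# so foo is never called here.
sub quoted { return join "", foo => bar }

# With parentheses, foo() is an ordinary expression and is called.
sub called { return join "", foo() => bar }

print quoted(), "\n";
print called(), "\n";
```

Only a bare identifier directly before => is quoted, which is why the parenthesized call behaves differently.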



  • Use the singular pronoun to increase readability:


    for (@lines) {
        $_ .= "\n";
    }


    The $_ variable is Perl's version of a pronoun, and it essentially
    means "it". So the code above means "for each line, append a newline to
    it." Nowadays you might even spell that:

    $_ .= "\n" for @lines;


    The $_ pronoun is so important to Perl that its use
    is mandatory in grep and map.
    Here is one way to set up a cache of common results of an expensive
    function:

    %cache = map { $_ => expensive($_) } @common_args;
    $xval = $cache{$x} || expensive($x);




  • Omit the pronoun to increase readability even further.[1]



    [1]In this section, multiple bullet items in a row all refer to the subsequent example, since some of our examples illustrate more than one idiom.




  • Use loop controls with statement modifiers.


    while (<>) {
        next if /^=for\s+(index|later)/;
        $chars += length;
        $words += split;
        $lines += y/\n//;
    }


    This is a fragment of code we used to do page counts for this book. When
    you're going to be doing a lot of work with the same variable, it's
    often more readable to leave out the pronouns entirely, contrary to
    common belief.

    The fragment also demonstrates the idiomatic use of next
    with a statement modifier to short-circuit a loop.

    The $_ variable is always the loop control variable
    in grep and map, but the
    program's reference to it is often implicit:


    @haslen = grep { length } @random;


    Here we take a list of random scalars and only pick the ones that have
    a length greater than 0.



  • Use for to set the antecedent for a pronoun:


    for ($episode) {
        s/fred/barney/g;
        s/wilma/betty/g;
        s/pebbles/bambam/g;
    }


    So what if there's only one element in the loop? It's a convenient
    way to set up "it", that is, $_. Linguistically, this is known
    as topicalization. It's not cheating, it's communicating.



  • Implicitly reference the plural pronoun, @_.



  • Use control flow operators to set defaults:


    sub bark {
        my Dog $spot = shift;
        my $quality = shift || "yapping";
        my $quantity = shift || "nonstop";
        ...
    }


    Here we're implicitly using the other Perl pronoun,
    @_, which means "them". The arguments to a
    function always come in as "them". The shift
    operator knows to operate on @_ if you omit it,
    just as the ride operator at Disneyland might call out "Next!" without
    specifying which queue is supposed to shift. (There's no point in
    specifying, because there's only one queue that matters.)

    The || can be used to set defaults despite its
    origins as a Boolean operator, since Perl returns the first true
    value. Perl programmers often manifest a cavalier attitude toward the
    truth; the line above would break if, for instance, you tried to
    specify a quantity of 0. But as long as you never want to set either
    $quality or $quantity to a false
    value, the idiom works great. There's no point in getting all
    superstitious and throwing in calls to defined and
    exists all over the place. You just have to
    understand what it's doing. As long as it won't accidentally be
    false, you're fine.
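When a false value such as 0 does have to be accepted, one alternative is the defined-or operator //. This is not part of the original example and assumes Perl 5.10 or later. The sub names below are hypothetical; the sketch only contrasts the two operators.

```perl
use strict;
use warnings;
use 5.010;    # the // operator needs Perl 5.10 or later

# || falls back on any false argument, including 0 and "".
sub quantity_or {
    my $quantity = shift || "nonstop";
    return $quantity;
}

# // falls back only when the argument is undefined or missing.
sub quantity_dor {
    my $quantity = shift;
    return $quantity // "nonstop";
}

print quantity_or(0),  "\n";    # the default replaces the 0
print quantity_dor(0), "\n";    # the 0 survives
```

The || form remains the shorter idiom whenever a false argument can never be meaningful.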



  • Use assignment forms of operators, including control flow operators:


    $xval = $cache{$x} ||= expensive($x);


    Here we don't initialize our cache at all. We just rely on the
    ||= operator to call
    expensive($x) and assign it to
    $cache{$x} only if $cache{$x} is
    false. The result of that is whatever the new value of
    $cache{$x} is. Again, we take the cavalier
    approach towards truth, in that if we cache a false value,
    expensive($x) will get called again. Maybe the
    programmer knows that's okay, because expensive($x)
    isn't expensive when it returns false. Or maybe the programmer knows
    that expensive($x) never returns a false value at
    all. Or maybe the programmer is just being sloppy. Sloppiness can be
    construed as a form of creativity.
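The effect of caching a false value can be made visible with a call counter. In this sketch, expensive is a hypothetical stand-in that returns 0 for odd input, and lookup and calls are illustrative helper names.

```perl
use strict;
use warnings;

my %cache;
my $calls = 0;    # how many times expensive() has actually run

# Hypothetical stand-in for a costly function: false (0) for odd input.
sub expensive {
    my $x = shift;
    $calls++;
    return $x % 2 ? 0 : $x * 10;
}

sub lookup { my $x = shift; return $cache{$x} ||= expensive($x) }
sub calls  { return $calls }

lookup(4) for 1 .. 3;    # true result: computed once, then served from the cache
my $after_true = calls();

lookup(3) for 1 .. 3;    # false result: ||= recomputes it on every lookup
my $after_false = calls() - $after_true;

print "true value: $after_true call(s); false value: $after_false call(s)\n";
```

Whether the repeated calls matter depends entirely on what expensive costs when it returns false, as the text says.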



  • Use loop controls as operators, not just as
    statements. And...



  • Use commas like small semicolons:


    while (<>) {
        $comments++, next if /^#/;
        $blank++, next if /^\s*$/;
        last if /^__END__/;
        $code++;
    }
    print "comment = $comments\nblank = $blank\ncode = $code\n";


    This shows an understanding that statement modifiers
    modify statements, while next is a mere operator. It also shows
    the comma being idiomatically used to separate expressions much like
    you'd ordinarily use a semicolon. (The difference being that the
    comma keeps the two expressions as part of the same statement, under the
    control of the single statement modifier.)



  • Use flow control to your advantage:


    while (<>) {
        /^#/ and $comments++, next;
        /^\s*$/ and $blank++, next;
        /^__END__/ and last;
        $code++;
    }
    print "comment = $comments\nblank = $blank\ncode = $code\n";


    Here's the exact same loop again, only this time with the patterns out in front. The
    perspicacious Perl programmer understands that it compiles down to exactly the
    same internal codes as the previous example. The if modifier is
    just a backward and (or &&) conjunction, and the unless
    modifier is just a backward or (or ||) conjunction.
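To compare the two spellings side by side, the loops can be wrapped in subs that take their lines as arguments instead of reading from <>. The sub names are hypothetical; the loop bodies are the ones from the text.

```perl
use strict;
use warnings;

# Statement-modifier form: the action comes first, the condition last.
sub count_with_if {
    my ($comments, $blank, $code) = (0, 0, 0);
    for (@_) {
        $comments++, next if /^#/;
        $blank++, next    if /^\s*$/;
        last              if /^__END__/;
        $code++;
    }
    return ($comments, $blank, $code);
}

# Conjunction form: "and" binds more loosely than the comma,
# so each pattern guards the whole "increment, next" pair.
sub count_with_and {
    my ($comments, $blank, $code) = (0, 0, 0);
    for (@_) {
        /^#/       and $comments++, next;
        /^\s*$/    and $blank++, next;
        /^__END__/ and last;
        $code++;
    }
    return ($comments, $blank, $code);
}

my @lines = ("# note\n", "\n", "x = 1\n", "__END__\n", "ignored\n");
print join(",", count_with_if(@lines)),  "\n";
print join(",", count_with_and(@lines)), "\n";
```

Both calls print the same three counts for the same input.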



  • Use the implicit loops provided by the -n and -p switches.



  • Don't put a semicolon at the end of a one-line block:


    #!/usr/bin/perl -n
    $comments++, next LINE if /#/;
    $blank++, next LINE if /^\s*$/;
    last LINE if /^__END__/;
    $code++;
    END { print "comment = $comments\nblank = $blank\ncode = $code\n" }


    This is essentially the same program as before. We put an explicit
    LINE label on the loop control operators because we felt like it, but
    we didn't really need to, since the implicit LINE loop supplied by -n is the innermost
    enclosing loop. We used an END to get the final print statement
    outside the implicit main loop, just as in awk.



  • Use here docs when the printing gets ferocious.



  • Use a meaningful delimiter on the here doc:


    END { print <<"COUNTS" }
    comment = $comments
    blank = $blank
    code = $code
    COUNTS


    Rather than using multiple prints, the fluent Perl programmer uses a
    multiline string with interpolation. And despite our calling it a
    Common Goof earlier, we've brazenly left off the trailing
    semicolon because it's not necessary at the end of the END block. (If we
    ever turn it into a multiline block, we'll put the semicolon back in.)



  • Do substitutions and translations en passant on a scalar:


    ($new = $old) =~ s/bad/good/g;


    Since lvalues are lvaluable, so to speak, you'll often see people
    changing a value "in passing" while it's being assigned. This could
    actually save a string copy internally (if we ever get around to
    implementing the optimization):

    chomp($answer = <STDIN>);


    Any function that modifies an argument in place can do the en passant
    trick. But wait, there's more!



  • Don't limit yourself to changing scalars en passant:


    for (@new = @old) { s/bad/good/g }


    Here we copy @old into @new, changing everything in passing
    (not all at once, of course--the block is executed repeatedly, one "it" at a time).
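A small sketch of the en passant forms, wrapped in hypothetical helper subs so the results are easy to inspect. The second helper uses the /r modifier, which is not mentioned in the text and assumes Perl 5.14 or later: it returns the modified copy instead of changing its target.

```perl
use strict;
use warnings;
use 5.014;    # s///r needs Perl 5.14 or later

# Copy first, then edit the copy in passing.
sub fixed_copy {
    my $old = shift;
    (my $new = $old) =~ s/bad/good/g;
    return $new;
}

# Same result with /r: the substitution returns the edited string.
sub fixed_copy_r {
    my $old = shift;
    return $old =~ s/bad/good/gr;
}

# List version: copy the arguments, then edit each element of the copy.
sub fixed_list {
    my @new;
    for (@new = @_) { s/bad/good/g }
    return @new;
}

print fixed_copy("bad dog, bad cat"), "\n";
print join("; ", fixed_list("bad apple", "bad egg")), "\n";
```

In all three helpers the original values are left untouched, since the edit happens on the copy.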



  • Pass named parameters using the fancy => comma operator.



  • Rely on assignment to a hash to do even/odd argument processing:


    sub bark {
        my Dog $spot = shift;
        my %parm = @_;
        my $quality = $parm{QUALITY} || "yapping";
        my $quantity = $parm{QUANTITY} || "nonstop";
        ...
    }
    $fido->bark( QUANTITY => "once",
                 QUALITY => "woof" );


    Named parameters are often an affordable luxury. And with Perl, you
    get them for free, if you don't count the cost of the hash assignment.
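A common variation, sketched here as an assumption rather than as the book's method, puts the defaults into the same hash assignment ahead of @_. Later pairs win, so caller-supplied values override the defaults, even false ones. The plain function below is hypothetical and drops the object argument to stay short.

```perl
use strict;
use warnings;

sub bark {
    my %parm = (
        QUALITY  => "yapping",    # defaults come first...
        QUANTITY => "nonstop",
        @_,                       # ...so the caller's pairs override them
    );
    return "$parm{QUALITY} $parm{QUANTITY}";
}

print bark(QUANTITY => "once", QUALITY => "woof"), "\n";
print bark(QUANTITY => 0), "\n";
```

The trade-off is one slightly larger hash assignment per call in exchange for not needing || on each parameter.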



  • Repeat Boolean expressions until false.



  • Use minimal matching when appropriate.



  • Use the /e modifier to evaluate a replacement expression:


    #!/usr/bin/perl -p
    1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;


    This program fixes any file you receive from someone who mistakenly
    thinks they can redefine hardware tabs to occupy 4 spaces instead
    of 8. It makes use of several important idioms. First, the 1 while idiom
    is handy when all the work you want to do in the loop is actually done
    by the conditional. (Perl is smart enough not to warn you that you're
    using 1 in a void context.) We have to repeat this substitution because
    each time we substitute some number of spaces in for tabs, we have to
    recalculate the column position of the next tab from the beginning.

    The (.*?) matches the smallest string it can up until the first tab,
    using the minimal matching modifier (the question mark). In this case,
    we could have used an ordinary greedy * like this: ([^\t]*). But
    that only works because a tab is a single character, so we can use a
    negated character class to avoid running past the first tab. In general,
    the minimal matcher is much more elegant, and doesn't break if the next
    thing that must match happens to be longer than one character.

    The /e modifier does a substitution using an expression rather than
    a mere string. This lets us do the calculations we need right when
    we need them.
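To see the arithmetic at work, the same substitution can be applied to a string held in a variable instead of to $_ under -p. The sub name expand_tabs4 is hypothetical; the pattern and replacement expression are the ones from the one-liner.

```perl
use strict;
use warnings;

# Expand tabs to 4-column tab stops, one run of tabs per pass.
sub expand_tabs4 {
    my $line = shift;
    1 while $line =~ s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
    return $line;
}

# "ab" occupies columns 0-1, so the tab needs 4 - 2 = 2 spaces
# to put "c" at column 4.
print expand_tabs4("ab\tc"), "|\n";
```

For a line with several tabs, each pass expands the leftmost run, which is why the column of the next tab has to be recomputed from the start of the line.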



  • Use creative formatting and comments on complex substitutions:


    #!/usr/bin/perl -p
    1 while s{
        ^            # anchor to beginning
        (            # start first subgroup
            .*?      # match minimal number of characters
        )            # end first subgroup
        (            # start second subgroup
            \t+      # match one or more tabs
        )            # end second subgroup
    }
    {
        my $spacelen = length($2) * 4;    # account for full tabs
        $spacelen -= length($1) % 4;      # account for the uneven tab
        $1 . ' ' x $spacelen;             # make correct number of spaces
    }ex;


    This is probably overkill, but some people find it more impressive
    than the previous one-liner. Go figure.



  • Go ahead and use $` if you feel like it:


    1 while s/(\t+)/' ' x (length($1) * 4 - length($`) % 4)/e;


    Here's the shorter version, which uses $`, which is
    known to impact performance. Except that we're only using the length
    of it, so it doesn't really count as bad.



  • Use the offsets directly from the @-
    (@LAST_MATCH_START) and @+
    (@LAST_MATCH_END) arrays:


    1 while s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;


    This one's even shorter. (If you don't see any arrays there, try looking for array elements instead.) See @- and @+ in Chapter 28, "Special Names".
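As a quick illustration of what those array elements hold, the hypothetical helper below reports the start offset, end offset, and length of a match. $-[0] is where the whole match begins and $+[0] is the offset just past its end, so their difference plays the role of length($2) in the earlier versions and $-[0] plays the role of length($1).

```perl
use strict;
use warnings;

# Return (start, end, length) of the first match of $re in $str,
# or an empty list if nothing matches.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $+[0], $+[0] - $-[0]);
}

# Two tabs after "ab": the match starts at offset 2 and ends before offset 4.
print join(",", match_span("ab\t\tcd", qr/\t+/)), "\n";
```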



  • Use eval with a constant return value:


    sub is_valid_pattern {
        my $pat = shift;
        return eval { "" =~ /$pat/; 1 } || 0;
    }


    You don't have to use the eval {} operator to return a real value. Here we always return 1 if it gets to the end. However, if the pattern
    contained in $pat blows up, the eval catches it and returns undef
    to the Boolean conditional of the || operator, which turns it into
    a defined 0 (just to be polite, since undef is also false but might
    lead someone to believe that the is_valid_pattern subroutine is
    misbehaving, and we wouldn't want that, now would we?).
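A short usage sketch follows. The sample patterns are made up for illustration: the first is well formed, while the other two have an unclosed group and a quantifier with nothing to quantify, both of which make the pattern compilation die inside the eval.

```perl
use strict;
use warnings;

sub is_valid_pattern {
    my $pat = shift;
    # Matching against the empty string forces $pat to be compiled;
    # a compile failure dies inside eval, which then returns undef.
    return eval { "" =~ /$pat/; 1 } || 0;
}

for my $pat ('^\d+$', '(unclosed', '*oops') {
    printf "%-12s %s\n", $pat, is_valid_pattern($pat) ? "ok" : "invalid";
}
```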



  • Use modules to do all the dirty work.



  • Use object factories.



  • Use callbacks.



  • Use stacks to keep track of context.



  • Use negative subscripts to access the end of an array or string:


    use XML::Parser;
    $p = new XML::Parser Style => 'subs';
    setHandlers $p Char => sub { $out[-1] .= $_[1] };
    push @out, "";
    sub literal {
        $out[-1] .= "C<";
        push @out, "";
    }
    sub literal_ {
        my $text = pop @out;
        $out[-1] .= $text . ">";
    }
    ...


    This is a snippet from the 250-line program we used to translate the
    XML version of the old Camel book back into pod format so we could edit
    it for this edition with a Real Text Editor.

    The first thing you'll notice is that we rely on the XML::Parser
    module (from CPAN) to parse our XML correctly, so we don't have to
    figure out how. That cuts a few thousand lines out of our program
    right there (presuming we're reimplementing in Perl everything
    XML::Parser does for us,[2]
    including translation from almost any character set into UTF-8).



    [2]Actually, XML::Parser is just a
    fancy wrapper around James Clark's expat XML parser.


    XML::Parser uses a high-level idiom called an object factory. In
    this case, it's a parser factory. When we create an XML::Parser
    object, we tell it which style of parser interface we want, and it
    creates one for us. This is an excellent way to build a testbed
    application when you're not sure which kind of interface will turn out
    to be the best in the long run. The subs style is just one of
    XML::Parser's interfaces. In fact, it's one of the oldest
    interfaces, and probably not even the most popular one these days.

    The setHandlers line shows a method call on the parser, not in arrow
    notation, but in "indirect object" notation, which lets you omit the
    parens on the arguments, among other things. The line also uses the
    named parameter idiom we saw earlier.

    The line also shows another powerful concept, the notion of a
    callback. Instead of us calling the parser to get the next item, we
    tell it to call us. For named XML tags like <literal>, this
    interface style will automatically call a subroutine of that name (or the name
    with an underline on the end for the corresponding end tag). But the
    data between tags doesn't have a name, so we set up a Char callback
    with the setHandlers method.

    Next we initialize the @out array, which is a stack of outputs. We
    put a null string into it to represent that we haven't collected any
    text at the current tag embedding level (0 initially).

    Now is when that callback comes back in. Whenever we see text, it
    automatically gets appended to the final element of the array, via the
    $out[-1] idiom in the callback. At the outer tag level, $out[-1]
    is the same as $out[0], so $out[0] ends up with our whole
    output. (Eventually. But first we have to deal with tags.)

    Suppose we see a <literal> tag. Then the literal subroutine
    gets called, appends some text to the current output, then pushes a new
    context onto the @out stack. Now any text up until the closing tag
    gets appended to that new end of the stack. When we hit the closing
    tag, we pop the $text we've collected back off the @out stack,
    and append the rest of the transmogrified data to the new (that is, the
    old) end of stack, the result of which is to translate the XML string, <literal>text</literal>, into the corresponding pod string, C<text>.

    The subroutines for the other tags are just the same, only different.
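The stack mechanics can be tried without a parser. The sketch below is an assumption-laden stand-in: instead of XML::Parser it walks a hand-written list of events, and the char sub and @events list are hypothetical. The literal and literal_ handlers and the $out[-1] idiom are the ones described above.

```perl
use strict;
use warnings;

our @out = ("");    # stack of output buffers; $out[0] collects the final text

sub literal  { $out[-1] .= "C<"; push @out, "" }                 # start tag
sub literal_ { my $text = pop @out; $out[-1] .= $text . ">" }    # end tag
sub char     { $out[-1] .= $_[0] }                               # character data

# Hand-written events standing in for what a parser would report for
#   See <literal>print</literal> here.
my @events = (
    [ \&char, "See " ],
    [ \&literal ],
    [ \&char, "print" ],
    [ \&literal_ ],
    [ \&char, " here." ],
);

for my $event (@events) {
    my ($handler, @args) = @$event;
    $handler->(@args);
}

print "$out[0]\n";
```

While the literal tag is open, text lands in the buffer pushed by literal; once literal_ pops it, the stack is back to one element and $out[-1] is $out[0] again.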



  • Use my without assignment to create an empty array or hash.



  • Split the default string on whitespace.



  • Assign to lists of variables to collect however many you want.



  • Use autovivification of undefined references to create them.



  • Autoincrement undefined array and hash elements to create them.



  • Use autoincrement of a %seen hash to determine uniqueness.



  • Assign to a handy my temporary in the conditional.



  • Use the autoquoting behavior of braces.



  • Use an alternate quoting mechanism to interpolate double quotes.



  • Use the ?: operator to switch between two arguments to a printf.



  • Line up printf args with their % field:


    my %seen;
    while (<>) {
        my ($a, $b, $c, $d) = split;
        print unless $seen{$a}{$b}{$c}{$d}++;
    }
    if (my $tmp = $seen{fee}{fie}{foe}{foo}) {
        printf qq(Saw "fee fie foe foo" [sic] %d time%s.\n),
                                              $tmp,  $tmp == 1 ? "" : "s";
    }


    These nine lines are just chock full of idioms. The first line makes
    an empty hash because we don't assign anything to it. We iterate over
    input lines setting "it", that is, $_, implicitly,
    then using an argumentless split which splits "it"
    on whitespace. Then we pick off the first four words with a list
    assignment, throwing any subsequent words away. Then we remember the
    first four words in a four-dimensional hash, which automatically
    creates (if necessary) the first three reference elements and final
    count element for the autoincrement to increment. (Under use
    warnings, the autoincrement will never warn that you're
    using undefined values, because autoincrement is an accepted way to
    define undefined values.) We then print out the line if we've never
    seen a line starting with these four words before, because the
    autoincrement is a postincrement, which, in addition to incrementing
    the hash value, will return the old true value if there was one.

    After the loop, we test %seen again to see if a
    particular combination of four words was seen. We make use of the
    fact that we can put a literal identifier into braces and it will be
    autoquoted. Otherwise, we'd have to say
    $seen{"fee"}{"fie"}{"foe"}{"foo"}, which is a drag
    even when you're not running from a giant.

    We assign the result of $seen{fee}{fie}{foe}{foo}
    to a temporary variable even before testing it in the Boolean context
    provided by the if. Because assignment returns its
    left value, we can still test the value to see if it was true. The
    my tells your eye that it's a new variable, and
    we're not testing for equality but doing an assignment. It would also
    work fine without the my, and an expert Perl
    programmer would still immediately notice that we used one
    = instead of two ==. (A
    semiskilled Perl programmer might be fooled, however. Pascal
    programmers of any skill level will foam at the mouth.)

    Moving on to the printf statement, you can see the
    qq() form of double quotes we used so that we could
    interpolate ordinary double quotes as well as a newline. We could've
    directly interpolated $tmp there as well, since
    it's effectively a double-quoted string, but we chose to do further
    interpolation via printf. Our temporary
    $tmp variable is now quite handy, particularly
    since we don't just want to interpolate it, but also test it in the
    conditional of a ?: operator to see whether we
    should pluralize the word "time". Finally, note that we lined up the
    two fields with their corresponding % markers in
    the printf format. If an argument is too long to
    fit, you can always go to the next line for the next argument, though
    we didn't have to in this case.
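Two of these idioms are easy to try in isolation. The sketch below uses hypothetical helper names: unique applies the %seen postincrement test to a flat list rather than to four-word keys, and times_seen uses sprintf in place of printf so the pluralized string can be returned.

```perl
use strict;
use warnings;

# Keep only the first occurrence of each item, preserving order.
# The postincrement returns the old count, which is false the first time.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Pick the plural ending with ?: and line the arguments up
# under their % fields.
sub times_seen {
    my $count = shift;
    return sprintf qq(Saw "fee fie foe foo" %d time%s.),
                                            $count, $count == 1 ? "" : "s";
}

print join(" ", unique(qw(fee fie fee foe fie foo))), "\n";
print times_seen(1), "\n";
print times_seen(3), "\n";
```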



Whew! Had enough? There are many more idioms we could discuss, but
this book is already sufficiently heavy. But we'd like to
talk about one more idiomatic use of Perl, the writing of program
generators.






