Perl CD Bookshelf [Electronic resources] (text version)

24.2. Efficiency



While most of the work of programming may be simply getting your program
working properly, you may find yourself wanting more bang for the buck
out of your Perl program. Perl's rich set of operators, data types, and
control constructs are not necessarily intuitive when it comes to speed
and space optimization. Many trade-offs were made during Perl's design,
and such decisions are buried in the guts of the code. In general, the
shorter and simpler your code is, the faster it runs, but there are
exceptions. This section attempts to help you make it work just a wee
bit better.

If you want it to work a lot better, you can play with the Perl
compiler backend described in Chapter 18, "Compiling", or rewrite your
inner loop as a C extension as illustrated in Chapter 21, "Internals and Externals".


Note that optimizing for time may sometimes cost you in space or
programmer efficiency (indicated by conflicting hints below). Them's
the breaks. If programming was easy, they wouldn't need something as
complicated as a human being to do it, now would they?

24.2.1. Time Efficiency





  • Use hashes instead of linear searches. For example, instead of searching
    through @keywords to see if $_ is a keyword, construct a hash
    with:


    my %keywords;
    for (@keywords) {
        $keywords{$_}++;
    }


    Then you can quickly tell if $_ contains a keyword by testing
    $keywords{$_} for a nonzero value.




  • Avoid subscripting when a foreach or list operator
    will do. Not only is subscripting an extra operation, but if your
    subscript variable happens to be in floating point because you did
    arithmetic, an extra conversion from floating point back to integer is
    necessary. There's often a better way to do it. Consider using
    foreach, shift, and splice operations. Consider saying use integer.




  • Avoid goto. It scans outward from your current location for the
    indicated label.




  • Avoid printf when print will do.




  • Avoid $& and its two buddies, $` and $'. Any occurrence in
    your program causes all matches to save the searched string for
    possible future reference. (However, once you've blown it, it doesn't
    hurt to have more of them.)




  • Avoid using eval on a string. An eval of a string (although not of a
    BLOCK) forces recompilation every time through. The Perl parser is
    pretty fast for a parser, but that's not saying much. Nowadays there's
    almost always a better way to do what you want anyway. In particular,
    any code that uses eval merely to construct variable names is obsolete
    since you can now do the same directly using symbolic references:


    no strict 'refs';
    $name = "variable";
    $$name = 7; # Sets $variable to 7





  • Avoid eval STRING inside
    a loop. Put the loop into the eval instead, to
    avoid redundant recompilations of the code. See the
    study operator in Chapter 29, "Functions"
    for an example of this.



  • Avoid run-time-compiled patterns. Use the /pattern/o
    (once only) pattern modifier to avoid pattern recompilation when the
    pattern doesn't change over the life of the process. For patterns that
    change occasionally, you can use the fact that a null pattern refers
    back to the previous pattern, like this:


    "foundstring" =~ /$currentpattern/;        # Dummy match (must succeed).
    while (<>) {
        print if //;
    }


    Alternatively, you can precompile your regular expression using the qr
    quote construct. You can also use eval to recompile a subroutine
    that does the match (if you only recompile occasionally). That works even better if you compile a bunch of matches into a single subroutine, thus amortizing the subroutine call overhead.



  • Short-circuit alternation is often faster than the corresponding regex. So:


    print if /one-hump/ || /two/;


    is likely to be faster than:

    print if /one-hump|two/;


    at least for certain values of one-hump and two. This is because the
    optimizer likes to hoist certain simple matching operations up into
    higher parts of the syntax tree and do very fast matching with a
    Boyer-Moore algorithm. A complicated pattern tends to defeat this.




  • Reject common cases early with next if. As with simple regular
    expressions, the optimizer likes this. And it just makes sense to avoid
    unnecessary work. You can typically discard comment lines and blank
    lines even before you do a split or chop:


    while (<>) {
        next if /^#/;
        next if /^$/;
        chop;
        @piggies = split(/,/);
        ...
    }





  • Avoid regular expressions with many quantifiers or with big
    {MIN,MAX} numbers on parenthesized expressions. Such patterns
    can result in exponentially slow backtracking behavior unless the
    quantified subpatterns match on their first "pass". You can also
    use the (?>...) construct to force a subpattern to either
    match completely or fail without backtracking.




  • Try to maximize the length of any nonoptional literal strings in
    regular expressions. This is counterintuitive, but longer patterns
    often match faster than shorter patterns. That's because the optimizer
    looks for constant strings and hands them off to a Boyer-Moore search,
    which benefits from longer strings. Compile your pattern with Perl's
    -Dr debugging switch to see what Dr. Perl thinks the longest literal
    string is.




  • Avoid expensive subroutine calls in tight loops. There is overhead
    associated with calling subroutines, especially when you pass lengthy
    parameter lists or return lengthy values. In order of increasing
    desperation, try passing values by reference, passing values as
    dynamically scoped globals, inlining the subroutine, or rewriting the
    whole loop in C. (Better than all of those solutions is if you can define the
    subroutine out of existence by using a smarter algorithm.)




  • Avoid getc for anything but single-character terminal I/O. In fact,
    don't use it for that either. Use sysread.



  • Avoid frequent substrs on long strings, especially if the string
    contains UTF-8. It's okay to use substr at the front of a string,
    and for some tasks you can keep the substr at the front by "chewing up"
    the string as you go with a four-argument substr, replacing the
    part you grabbed with "":


    # Test the length, since a leftover "0" would count as false.
    while (length $buffer) {
        process(substr($buffer, 0, 10, ""));
    }





  • Use pack and unpack instead of multiple substr invocations.



  • Use substr as an lvalue rather than concatenating substrings. For
    example, to replace the fourth through seventh characters of $foo with
    the contents of the variable $bar, don't do this:


    $foo = substr($foo,0,3) . $bar . substr($foo,7);


    Instead, simply identify the part of the string to be replaced and
    assign into it, as in:

    substr($foo, 3, 4) = $bar;


    But be aware that if $foo is a huge string and $bar isn't
    exactly the length of the "hole", this can do a lot of copying too.
    Perl tries to minimize that by copying from either the front or the
    back, but there's only so much it can do if the substr is in the
    middle.



  • Use s/// rather than concatenating substrings. This is especially
    true if you can replace one constant with another of the same size. This results in an in-place substitution.



  • Use statement modifiers and equivalent and and or operators
    instead of full-blown conditionals. Statement modifiers (like
    $ring = 0 unless $engaged) and logical operators avoid the overhead of
    entering and leaving a block. They can often be more readable too.



  • Use $foo = $a || $b || $c. This is much faster (and shorter to say)
    than:


    if ($a) {
        $foo = $a;
    }
    elsif ($b) {
        $foo = $b;
    }
    elsif ($c) {
        $foo = $c;
    }


    Similarly, set default values with:

    $pi ||= 3;




  • Group together any tests that want the same initial string. When testing
    a string for various prefixes in anything resembling a switch structure,
    put together all the /^a/ patterns, all the /^b/ patterns, and so
    on.




  • Don't test things you know won't match. Use last or elsif to
    avoid falling through to the next case in your switch statement.



  • Use special operators like study, logical string operations, pack 'u',
    and unpack '%' formats.



  • Beware of the tail wagging the dog. Misstatements resembling
    (<STDIN>)[0] can cause Perl much unnecessary work. In accordance
    with Unix philosophy, Perl gives you enough rope to hang yourself.



  • Factor operations out of loops. The Perl optimizer does not attempt to
    remove invariant code from loops. It expects you to exercise some sense.



  • Strings can be faster than arrays.



  • Arrays can be faster than strings. It all depends on
    whether you're going to reuse the strings or arrays and which
    operations you're going to perform. Heavy modification of each element
    implies that arrays will be better, and occasional modification of some
    elements implies that strings will be better. But you just have to try
    it and see.



  • my variables are faster than local variables.



  • Sorting on a manufactured key array may be faster than using a fancy
    sort subroutine. A given array value will usually be compared multiple
    times, so if the sort subroutine has to do much recalculation, it's
    better to factor out that calculation to a separate pass before the
    actual sort.



  • If you''re deleting characters, tr/abc//d is faster than s/[abc]//g.



  • print with a comma separator may be faster than concatenating
    strings. For example:


    print $fullname{$name} . " has a new home directory " .
        $home{$name} . "\n";


    has to glue together the two hash values and the two fixed strings before
    passing them to the low-level print routines, whereas:

    print $fullname{$name}, " has a new home directory ",
        $home{$name}, "\n";


    doesn't. On the other hand, depending on the values and the
    architecture, the concatenation may be faster. Try it.



  • Prefer join("", ...) to a series of concatenated strings. Multiple
    concatenations may cause strings to be copied back and forth multiple
    times. The join operator avoids this.



  • split on a fixed string is generally faster than split on a
    pattern. That is, use split(/ /, ...) rather than split(/ +/, ...)
    if you know there will only be one space. However, the patterns
    /\s+/, /^/, and / / are specially optimized, as is the special split
    on whitespace.




  • Pre-extending an array or string can save some time. As strings and
    arrays grow, Perl extends them by allocating a new copy with some room
    for growth and copying in the old value. Pre-extending a string with
    the x operator or an array by setting $#array can prevent this
    occasional overhead and reduce memory fragmentation.




  • Don't undef long strings and arrays if they'll be reused for the same
    purpose. This helps prevent reallocation when the string or array must
    be re-extended.



  • Prefer "\0" x 8192 over unpack("x8192",()).



  • system("mkdir ...") may be faster on multiple directories if the
    mkdir syscall isn't available.




  • Avoid using eof if return values will already indicate it.




  • Cache entries from files (like passwd and group files) that are apt
    to be reused. It's particularly important to cache entries from the
    network. For example, to cache the return value from gethostbyaddr
    when you are converting numeric addresses (like 204.148.40.9) to
    names (like "www.oreilly.com"), you can use something like:


    my %numtoname;                      # cache: numeric address => name
    sub numtoname {
        local ($_) = @_;
        unless (defined $numtoname{$_}) {
            # 2 is the usual value of AF_INET
            my (@a) = gethostbyaddr(pack('C4', split(/\./)),2);
            $numtoname{$_} = @a > 0 ? $a[0] : $_;
        }
        return $numtoname{$_};
    }





  • Avoid unnecessary syscalls. Operating system calls tend to be rather
    expensive. So for example, don't call the time operator when a
    cached value of $now would do. Use the special _ filehandle to
    avoid unnecessary stat(2) calls. On some systems, even a minimal
    syscall may execute a thousand instructions.




  • Avoid unnecessary system calls. The system function has to fork a
    subprocess in order to execute the program you specify--or worse, execute a
    shell to execute the program. This can easily execute a
    million instructions.




  • Worry about starting subprocesses, but only if they're frequent.
    Starting a single pwd, hostname, or find process isn't going to
    hurt you much--after all, a shell starts subprocesses all day long. We
    do occasionally encourage the toolbox approach, believe it or not.



  • Keep track of your working directory yourself rather than calling pwd
    repeatedly. (A standard module is provided for this. See
    Cwd in Chapter 30, "The Standard Perl Library".)




  • Avoid shell metacharacters in commands--pass lists to system and
    exec where appropriate.




  • Set the sticky bit on the Perl interpreter on machines without demand
    paging:


    chmod +t /usr/bin/perl




  • Allowing built-in functions' arguments to default to $_ doesn't make your
    program faster.
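
A few of the hints above can be tried side by side. The following is a minimal sketch rather than a benchmark: it shows a hash lookup replacing a linear search, a pattern precompiled with qr, a sort on manufactured keys, and tr///d used for deletion. The names %is_keyword, $prefix, @records, and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hash lookup instead of a linear search through @keywords.
my @keywords   = qw(if else while for);
my %is_keyword = map { $_ => 1 } @keywords;
my @hits       = grep { $is_keyword{$_} } qw(for each while loop);

# Precompile a run-time pattern once with qr// and reuse it.
my $prefix  = "two";                      # assumed run-time value
my $re      = qr/^\Q$prefix\E/;
my @matched = grep { $_ =~ $re } qw(two-hump one-hump twofold);

# Sort on a manufactured key: extract each key once, sort the
# [key, record] pairs, then throw the keys away.
my @records = ("carol:30", "alice:7", "bob:19");
my @by_num  = map  { $_->[1] }
              sort { $a->[0] <=> $b->[0] }
              map  { [ (split /:/, $_)[1], $_ ] } @records;

# Delete characters with tr///d rather than s/[abc]//g.
(my $stripped = "aabbccxyz") =~ tr/abc//d;

print "@hits\n@matched\n@by_num\n$stripped\n";
```

Whether any of these is actually faster for your data is something to measure, for instance with the standard Benchmark module.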



24.2.2. Space Efficiency





  • You can use vec for compact integer array storage
    if the integers are of fixed width. (Integers of variable width can
    be stored in a UTF-8 string.)



  • Prefer numeric values over equivalent string values--they require less
    memory.



  • Use substr to store constant-length strings in a longer string.



  • Use the Tie::SubstrHash module for very compact storage of a hash array,
    if the key and value lengths are fixed.



  • Use __END__ and the DATA filehandle to avoid storing program data
    as both a string and an array.



  • Prefer each to keys where order doesn't matter.



  • Delete or undef globals that are no longer in use.



  • Use some kind of DBM to store hashes.



  • Use temp files to store arrays.



  • Use pipes to offload processing to other tools.



  • Avoid list operations and entire file slurps.



  • Avoid using tr///. Each tr/// expression must store a
    sizable translation table.



  • Don't unroll your loops or inline your subroutines.
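
As a rough illustration of two of the space hints above, the sketch below packs small integers into a string with vec and walks a hash with each instead of keys. The 4-bit element width and the sample data are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Store small fixed-width integers in a string with vec.
# Each element here takes 4 bits, so values must be in 0..15.
my $packed = '';
my @values = (3, 15, 0, 9, 7, 12);
vec($packed, $_, 4) = $values[$_] for 0 .. $#values;

my @unpacked    = map { vec($packed, $_, 4) } 0 .. $#values;
my $packed_size = length $packed;     # 6 elements * 4 bits = 3 bytes

# Walk a hash with each, which avoids building the full list of
# keys that the keys function would return.
my %size_of = (apple => 1, pear => 2, plum => 3);
my $total   = 0;
while (my ($name, $size) = each %size_of) {
    $total += $size;
}

print "@unpacked\n$packed_size\n$total\n";
```

The packed string trades access speed for size, which is the kind of conflict with the time hints that this section warns about.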



24.2.3. Programmer Efficiency





  • Use defaults.



  • Use funky shortcut command-line switches like -a, -n, -p, -s, and -i.



  • Use for to mean foreach.



  • Run system commands with backticks.



  • Use <*> and such.



  • Use patterns created at run time.



  • Use *, +, and {} liberally in your patterns.



  • Process whole arrays and slurp entire files.



  • Use getc.



  • Use $`, $&, and $'.



  • Don't check error values on open, since <HANDLE> and
    print HANDLE will simply behave as no-ops when given an invalid handle.



  • Don't close your files--they'll be closed on the next open.



  • Don't pass subroutine arguments. Use globals.



  • Don't name your subroutine parameters. You can access them directly as
    $_[EXPR].



  • Use whatever you think of first.



24.2.4. Maintainer Efficiency




  • Don't use defaults.



  • Use foreach to mean foreach.



  • Use meaningful loop labels with next and last.



  • Use meaningful variable names.



  • Use meaningful subroutine names.



  • Put the important thing first on the line using and, or, and
    statement modifiers (like exit if $done).



  • Close your files as soon as you're done with them.



  • Use packages, modules, and classes to hide your implementation details.



  • Pass arguments as subroutine parameters.



  • Name your subroutine parameters using my.



  • Parenthesize for clarity.



  • Put in lots of (useful) comments.



  • Include embedded pod documentation.



  • use warnings.



  • use strict.
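
Taken together, the maintainer hints above might look something like the following sketch. The subroutine count_code_lines and its sample input are hypothetical, invented only to show named parameters, a labeled loop, and strict and warnings in one place.

```perl
use strict;
use warnings;

# Count the lines that are neither comments nor blank.
sub count_code_lines {
    my (@lines) = @_;                     # name the parameters with my
    my $count = 0;

  LINE:
    foreach my $line (@lines) {
        next LINE if $line =~ /^\s*#/;    # skip comment lines
        next LINE if $line =~ /^\s*$/;    # skip blank lines
        $count++;
    }
    return $count;
}

my $code_lines = count_code_lines("# header", "", 'my $x = 1;', 'print $x;');
print "$code_lines\n";
```

The label is not needed for a single loop, but it documents which loop a next applies to once loops are nested.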



24.2.5. Porter Efficiency





  • Wave a handsome tip under his nose.



  • Avoid functions that aren't implemented everywhere. You can use eval
    tests to see what's available.



  • Use the Config module or the $^O variable to find out what kind of
    machine you're running on.



  • Don't expect native float and double to pack and unpack on foreign
    machines.



  • Use network byte order (the "n" and "N" formats for pack) when
    sending binary data over the network.



  • Don't send binary data over the network. Send ASCII. Better, send UTF-8.
    Better yet, send money.



  • Check $] or $^V to see if the current version supports all the
    features you use.



  • Don't use $] or $^V. Use require or use with a version number.



  • Put in the eval exec hack even if you don't use it, so your program
    will run on those few systems that have Unix-like shells but don't
    recognize the #! notation.



  • Put the #!/usr/bin/perl line in even if you don't use it.



  • Test for variants of Unix commands. Some find programs can't handle the -xdev switch,
    for example.



  • Avoid variant Unix commands if you can do it internally. Unix commands
    don't work too well on MS-DOS or VMS.



  • Put all your scripts and manpages into a single network filesystem that's
    mounted on all your machines.



  • Publish your module on CPAN. You'll get lots of feedback if it's not
    portable.
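
Several of the portability checks above fit in a few lines. This is a sketch under stated assumptions: the minimum version 5.006, the sample values 0x01020304 and 8080, and the choice of symlink as the function to probe are all picked for illustration.

```perl
use strict;
use warnings;
use 5.006;        # let Perl enforce the minimum version (assumed here)
use Config;

# Ask Perl what kind of machine it is running on.
my $os   = $^O;                  # operating system name Perl was built for
my $arch = $Config{archname};    # build-time architecture name

# Probe for a function that may be unimplemented on this platform.
# An unimplemented builtin dies when called, and eval traps that.
my $has_symlink = eval { symlink("", ""); 1 } ? 1 : 0;

# Use network byte order for binary data that crosses machines:
# "N" is a 32-bit and "n" a 16-bit big-endian unsigned integer.
my $wire = pack("N n", 0x01020304, 8080);
my ($id, $port) = unpack("N n", $wire);

printf "%s %s symlink=%d bytes=%d port=%d\n",
    $os, $arch, $has_symlink, length($wire), $port;
```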



24.2.6. User Efficiency





  • Instead of making users enter data line by line, pop users into
    their favorite editor.




  • Better yet, use a GUI like the Perl/Tk extension, where users can
    control the order of events. (Perl/Tk is available on CPAN.)



  • Put up something for users to read while you continue doing work.



  • Use autoloading so that the program appears to run faster.



  • Give the option of helpful messages at every prompt.



  • Give a helpful usage message if users don't give correct input.



  • Display the default action at every prompt, and maybe a few
    alternatives.



  • Choose defaults for beginners. Allow experts to change the defaults.



  • Use single character input where it makes sense.



  • Pattern the interaction after other things the user is familiar with.




  • Make error messages clear about what needs fixing. Include all
    pertinent information such as filename and error code, like this:


    open(my $fh, "<", $file) or die "$0: Can't open $file for reading: $!\n";




  • Use fork && exit to detach from the terminal when the rest of the script is just batch
    processing.



  • Allow arguments to come from either the command line or standard
    input.



  • Don't put arbitrary limitations into your program.



  • Prefer variable-length fields over fixed-length fields.



  • Use text-oriented network protocols.



  • Tell everyone else to use text-oriented network protocols!



  • Tell everyone else to tell everyone else to use text-oriented network protocols!!!



  • Be vicariously lazy.



  • Be nice.
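
Some of the hints above (a usage message, arguments from either the command line or standard input, and error messages that name the file and the system error) could be sketched as follows. The subroutines usage, input_names, and count_lines are hypothetical helpers for a made-up line-counting script, not part of any library.

```perl
use strict;
use warnings;

# Build a usage message; $0 is the name the program was invoked as.
sub usage {
    return "usage: $0 [-h] [FILE ...]\n"
         . "  Counts lines in each FILE, or in standard input if none is given.\n";
}

# Take input names from the argument list, falling back to "-",
# which this sketch treats as standard input.
sub input_names {
    my (@args) = @_;
    return @args ? @args : ("-");
}

# Count lines, reporting the filename and system error on failure.
sub count_lines {
    my ($file) = @_;
    my $fh;
    if ($file eq "-") {
        $fh = \*STDIN;
    }
    else {
        open($fh, "<", $file)
            or die "$0: Can't open $file for reading: $!\n";
    }
    my $n = 0;
    $n++ while <$fh>;
    return $n;
}

# A main program might then read:
#   if (@ARGV && $ARGV[0] eq "-h") { print usage(); exit 0 }
#   printf "%8d %s\n", count_lines($_), $_ for input_names(@ARGV);
```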








