Perl Best Practices

Damian Conway

19.10. Caching

Look for opportunities to use caches.

It makes sense not to do the same calculation twice, if the result is small enough that it can reasonably be stored for reuse. The simplest form of that is putting a result into an interim variable whenever it will be used more than once. That is, instead of calling the same functions twice on the same data:

print form(
    'hash alone: {>>>,>>>,>>} bytes', size(\%lookup),
    'data alone: {>>>,>>>,>>} bytes', total_size(\%lookup)-size(\%lookup),
    '==============================',
    'total:      {>>>,>>>,>>} bytes', total_size(\%lookup),
);

call them once, store the results temporarily, and retrieve them each time they're needed:

my $hash_mem  = size(\%lookup);
my $total_mem = total_size(\%lookup);
my $data_mem  = $total_mem - $hash_mem;

print form(
    'hash alone: {>>>,>>>,>>} bytes',  $hash_mem,
    'data alone: {>>>,>>>,>>} bytes',  $data_mem,
    '==============================',
    'total:      {>>>,>>>,>>} bytes',  $total_mem,
);

This often has the additional benefit of allowing you to name the interim values in ways that make the code more comprehensible.

Subroutines like size( ) and total_size( ) and functions like rand( ) or readline( ) don't always return the same result when called with the same arguments. Such subroutines are good candidates for temporary and localized reuse of results, but not for longer-term caching.
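To illustrate why such results should not be cached long-term, here is a minimal sketch. The subroutines next_id( ) and cached_next_id( ) are hypothetical, invented only for this illustration; next_id( ) stands in for any subroutine whose result depends on hidden state rather than on its arguments alone.

```perl
use strict;
use warnings;

# Hypothetical impure subroutine: the result depends on a hidden counter,
# not just on the argument passed in.
my $calls = 0;
sub next_id {
    my ($prefix) = @_;
    $calls++;
    return "$prefix-$calls";
}

# A long-term cache keyed on the argument (an assumption of this sketch).
# It keeps returning the first result, even though next_id() would by now
# return something different.
my %id_cache;
sub cached_next_id {
    my ($prefix) = @_;
    if (!exists $id_cache{$prefix}) {
        $id_cache{$prefix} = next_id($prefix);
    }
    return $id_cache{$prefix};
}

# Two direct calls with the same argument give two different results...
my @uncached = ( next_id('job'), next_id('job') );

# ...but two cached calls give the same (stale) result twice.
my @cached = ( cached_next_id('job'), cached_next_id('job') );

print "uncached: @uncached\n";
print "cached:   @cached\n";
```

Storing such a result in an interim variable is still safe, because the variable only lives for as long as the value is known to be current.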

On the other hand, pure functions like sqrt( ) and int( ) and crypt( ) do always return the same result for the same list of arguments, so their return values can be stored long-term and reused whenever they're needed again. For example, if you have a subroutine that returns a case-insensitive SHA-512 digest:

sub lc_digest {
    my ($text) = @_;

    use Digest::SHA qw( sha512 );

    return sha512(lc $text);
}

then you could (potentially) speed it up over many calls by giving it a private look-up table in which results can be cached as they're computed, as shown in Example 19-9.

Example 19-9. Adding a cache to a digest subroutine

{
    my %cache;

    sub lc_digest {
        my $text = lc shift;

        # Compute the answer only if it's not already known...
        if (!exists $cache{$text}) {
            use Digest::SHA qw( sha512 );
            $cache{$text} = sha512($text);
        }

        return $cache{$text};
    }
}
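On Perl 5.10 or later, the same private look-up table can also be written with a state variable instead of an enclosing block. The following is only a sketch of that alternative, not the form used in Example 19-9; it assumes that the state feature and the defined-or assignment operator (//=) are available.

```perl
use strict;
use warnings;
use feature 'state';              # Assumes Perl 5.10 or later
use Digest::SHA qw( sha512 );

sub lc_digest {
    my $text = lc shift;

    # Declared once, and retained between calls to this subroutine...
    state %cache;

    # Compute the digest only if it's not already known...
    $cache{$text} //= sha512($text);

    return $cache{$text};
}

# Differently cased inputs share a single cache entry,
# because the key is lowercased before the look-up...
my $first  = lc_digest('Hello World');
my $second = lc_digest('HELLO WORLD');

print $first eq $second ? "same digest\n" : "different digests\n";
```

Either way, the cache grows by one entry for every distinct input, so this approach only pays off when the same texts recur often enough to justify the memory.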

On the other hand, if the range of possible data for a computation is small and the number of computations is large, then it's often simpler and more efficient to pre-compute the entire look-up table and then access it directly, thereby eliminating the cost of a subroutine call. For example, suppose you were doing some kind of image processing and needed square roots for pixel intensity values in the range 0 to 255. You could write:

for my $row (@image_rows) {
    for my $pixel_value (@{$row}) {
        $pixel_value = sqrt($pixel_value);
    }
}

or you could dramatically reduce the number of sqrt operations by precomputing all possible values and creating a look-up table:

my @sqrt_of = map { sqrt $_ } 0..255;

for my $row (@image_rows) {
    for my $pixel_value (@{$row}) {
        $pixel_value = $sqrt_of[$pixel_value];
    }
}
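Before relying on a precomputed table, it can be worth checking that it produces the same answers as the direct computation. The sketch below does that for a tiny 2x3 image; the sample data and the @expected and $mismatches variables are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical image: integer intensities in the range 0 to 255...
my @image_rows = ( [ 0, 4, 255 ], [ 16, 100, 9 ] );

# Reference results, computed directly with sqrt...
my @expected = map { [ map { sqrt $_ } @{$_} ] } @image_rows;

# Precompute every possible answer once...
my @sqrt_of = map { sqrt $_ } 0..255;

# Then replace each pixel via a table look-up (no sqrt calls in the loop)...
for my $row (@image_rows) {
    for my $pixel_value (@{$row}) {
        $pixel_value = $sqrt_of[$pixel_value];
    }
}

# Count any positions where the two approaches disagree...
my $mismatches = 0;
for my $r (0..$#image_rows) {
    for my $c (0..$#{ $image_rows[$r] }) {
        $mismatches++ if $image_rows[$r][$c] != $expected[$r][$c];
    }
}

print "mismatches: $mismatches\n";
```

Note that the table is only valid for integer values within the precomputed range: Perl truncates a fractional array index to an integer, and an index beyond 255 yields undef, so data outside those assumptions would silently produce wrong results.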

For a thorough discussion of the many applications and advantages of caching, see Chapter 3 of Higher-Order Perl, by Mark Jason Dominus (Morgan Kaufmann, 2005).