19.10. Caching
Look for opportunities to use caches.
It makes sense not to do the same calculation twice, if the result is small enough that it can reasonably be stored for reuse. The simplest form of that is putting a result into an interim variable whenever it will be used more than once. That is, instead of calling the same functions twice on the same data:
print form(
    'hash alone: {>>>,>>>,>>} bytes', size(\%lookup),
    'data alone: {>>>,>>>,>>} bytes', total_size(\%lookup)-size(\%lookup),
    '==============================',
    'total: {>>>,>>>,>>} bytes', total_size(\%lookup),
);
call them once, store the results temporarily, and retrieve them each time they're needed:
my $hash_mem = size(\%lookup);
my $total_mem = total_size(\%lookup);
my $data_mem = $total_mem - $hash_mem;
print form(
    'hash alone: {>>>,>>>,>>} bytes', $hash_mem,
    'data alone: {>>>,>>>,>>} bytes', $data_mem,
    '==============================',
    'total: {>>>,>>>,>>} bytes', $total_mem,
);
This often has the additional benefit of allowing you to name the interim values in ways that make the code more comprehensible.
Subroutines like size( ) and total_size( ) and functions like rand( ) or readline( ) don't always return the same result when called with the same arguments. Such subroutines are good candidates for temporary and localized reuse of results, but not for longer-term caching.
On the other hand, pure functions like sqrt( ) and int( ) and crypt( ) do always return the same result for the same list of arguments, so their return values can be stored long-term and reused whenever they're needed again. For example, if you have a subroutine that returns a case-insensitive SHA-512 digest:
sub lc_digest {
    my ($text) = @_;
    use Digest::SHA qw( sha512 );
    return sha512(lc $text);
}
then you could (potentially) speed it up over many calls by giving it a private look-up table in which results can be cached as they're computed, as shown in Example 19-9.
Example 19-9. Adding a cache to a digest subroutine
{
    my %cache;
    sub lc_digest {
        my $text = lc shift;
        # Compute the answer only if it's not already known...
        if (!exists $cache{$text}) {
            use Digest::SHA qw( sha512 );
            $cache{$text} = sha512($text);
        }
        return $cache{$text};
    }
}
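To check that the cache really is sparing work, one option is to count how often the digest is actually computed. The following is only a minimal sketch of that idea: the $computations counter and the computations( ) accessor are hypothetical names added for illustration and are not part of the example above, and it assumes that Digest::SHA is available (it has shipped with Perl since 5.10).

```perl
use strict;
use warnings;
use Digest::SHA qw( sha512 );

{
    my %cache;
    my $computations = 0;    # Hypothetical counter, for illustration only

    sub lc_digest {
        my $text = lc shift;

        # Compute the answer only if it's not already known...
        if (!exists $cache{$text}) {
            $computations++;
            $cache{$text} = sha512($text);
        }

        return $cache{$text};
    }

    # Hypothetical accessor: how many digests were really computed?
    sub computations { return $computations }
}

# Three calls, but the arguments differ only in case...
my @digests = map { lc_digest($_) } 'Perl', 'PERL', 'perl';

printf "%d calls, %d computation(s)\n", scalar @digests, computations();
```

Because the key is lower-cased before the look-up, all three spellings share a single cache entry, so sha512( ) runs only once. Bear in mind that the cache gains an entry for every distinct input, so the trade-off pays only when the same inputs recur often enough to justify the memory.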
On the other hand, if the range of possible data for a computation is small and the number of computations is large, then it's often simpler and more efficient to pre-compute the entire look-up table and then access it directly, thereby eliminating the cost of a subroutine call. For example, suppose you were doing some kind of image processing and needed square roots for pixel intensity values in the range 0 to 255. You could write:
for my $row (@image_rows) {
    for my $pixel_value (@{$row}) {
        $pixel_value = sqrt($pixel_value);
    }
}
or you could dramatically reduce the number of sqrt operations by precomputing all possible values and creating a look-up table:
my @sqrt_of = map { sqrt $_ } 0..255;
for my $row (@image_rows) {
    for my $pixel_value (@{$row}) {
        $pixel_value = $sqrt_of[$pixel_value];
    }
}
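The look-up table is a safe substitute only when every pixel value is an integer in the range 0 to 255, because any other value would index the wrong entry or no entry at all. The following minimal sketch checks that the two approaches agree; the contents of @image_rows are made-up sample data, and the @direct and $mismatches variables are introduced purely for the comparison.

```perl
use strict;
use warnings;

# Made-up sample data: two rows of integer intensities in 0..255
my @image_rows = ( [ 0, 1, 4, 255 ], [ 16, 100, 4, 4 ] );

# Precompute every possible answer once...
my @sqrt_of = map { sqrt $_ } 0..255;

# Approach 1: call sqrt once per pixel, keeping the results separately
my @direct;
for my $row (@image_rows) {
    push @direct, [ map { sqrt $_ } @{$row} ];
}

# Approach 2: replace each pixel in place via the look-up table
for my $row (@image_rows) {
    for my $pixel_value (@{$row}) {
        $pixel_value = $sqrt_of[$pixel_value];
    }
}

# Count any positions where the two approaches disagree
my $mismatches = 0;
for my $r (0..$#image_rows) {
    for my $c (0..$#{ $image_rows[$r] }) {
        $mismatches++ if $image_rows[$r][$c] != $direct[$r][$c];
    }
}

print "mismatches: $mismatches\n";
```

With this sample data the table version performs 256 sqrt operations up front, which is more than the eight pixels it then processes. The saving appears only once the image holds far more pixels than there are distinct intensity values.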
For a thorough discussion of the many applications and advantages of caching, see Chapter 3 of Higher-Order Perl, by Mark Jason Dominus (Morgan Kaufmann, 2005).