6.10. Speeding Up Interpolated Matches
6.10.1. Problem
You want your function or program to take one or more regular
expressions as arguments, but doing so seems to run slower than using
literals.
6.10.2. Solution
To overcome this bottleneck, if you have only one pattern whose value
won't change during the entire run of a program, store it in a string
and use /$pattern/o:
    while ($line = <>) {
        if ($line =~ /$pattern/o) {
            # do something
        }
    }
However, that won't work for more than one pattern. Precompile the
pattern strings using the qr// operator, then match each result
against each of the targets:
    @pats = map { qr/$_/ } @strings;
    while ($line = <>) {
        for $pat (@pats) {
            if ($line =~ /$pat/) {
                # do something;
            }
        }
    }
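Since the problem is about functions that take patterns as arguments, here is a minimal sketch of passing precompiled qr// objects to a subroutine. The helper name first_match and the sample strings are illustrative assumptions, not part of the recipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# first_match (hypothetical helper): return the first pattern in @$pats
# that matches $text, or undef if none does.
sub first_match {
    my ($text, $pats) = @_;
    for my $pat (@$pats) {
        return $pat if $text =~ $pat;   # qr// object used by itself
    }
    return;
}

# Compile each pattern string once, before any matching is done.
my @pats = map { qr/$_/ } ('\bfoo\b', 'ba[rz]');

for my $line ("a foo here", "bazaar", "nothing") {
    my $hit = first_match($line, \@pats);
    print defined $hit ? "matched: $line\n" : "no match: $line\n";
}
```

The caller compiles the patterns once and the function only ever sees compiled objects, so calling it repeatedly does not trigger recompilation.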
6.10.3. Discussion
When Perl compiles a program, it converts patterns into an internal
form. This conversion occurs at compile time for patterns without
variables, but at runtime for those that do. Interpolating variables
into patterns, as in /$pattern/, can slow your
program down, sometimes substantially. This is particularly
noticeable when $pattern changes often.

The /o modifier locks in the values from variables
interpolated into the pattern. That is, variables are interpolated
only once: the first time the match is run. Because Perl ignores any
later changes to those variables, make sure to use it only on
unchanging variables.

Using /o on patterns without interpolated
variables doesn't hurt, but it also doesn't help. The
/o modifier is also of no help when you have an
unknown number of regular expressions and need to check one or more
strings against all of these patterns, since you need to vary the
patterns' contents. Nor is it of any use when the interpolated
variable is a function argument, since each call to the function
gives the variable a new value.

Example 6-4 is an example of the slow but
straightforward technique for matching many patterns against many
lines. The array @popstates contains the standard
two-letter abbreviations for some of the places in the heartland of
North America where we normally refer to soft drinks as
pop (soda to us means
either plain soda water or else handmade delicacies from the soda
fountain at the corner drugstore, preferably with ice cream). The
goal is to print any line of input that contains any of those places,
matching them at word boundaries only. It doesn't use
/o, because the variable that holds the pattern
keeps changing.
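To see why /o would be wrong for a changing variable, here is a minimal sketch that runs the same loop with and without the modifier. The sample line and the shortened state list are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @popstates = qw(CO ON MI);
my $line = "Shipped to MI last week";

my (@with_o, @without_o);
for my $state (@popstates) {
    # /o interpolates $state only the first time this match runs,
    # so every later iteration still tests for "CO".
    push @with_o, $state if $line =~ /\b$state\b/o;
}
for my $state (@popstates) {
    # Without /o the pattern tracks the current value of $state.
    push @without_o, $state if $line =~ /\b$state\b/;
}

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

With /o the loop never finds MI, because the pattern stays frozen at the first state it saw.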
Example 6-4. popgrep1
#!/usr/bin/perl
# popgrep1 - grep for abbreviations of places that say "pop"
# version 1: slow but obvious way
@popstates = qw(CO ON MI WI MN);
LINE: while (defined($line = <>)) {
    for $state (@popstates) {
        if ($line =~ /\b$state\b/) {    # this is s l o o o w
            print $line;                # the line is in $line, not $_
            next LINE;
        }
    }
}
Such a direct, obvious, brute-force approach is also distressingly
slow: the interpolated pattern changes on every pass through the
inner loop, so Perl has to recompile all patterns with each line of
input. A better solution is the qr// operator
(used in Example 6-5), which first appeared in v5.005
and offers a way to step around this bottleneck. The
qr// operator quotes and possibly compiles its
string argument, returning a scalar to use in later pattern matches.
If that scalar is used by itself in the interpolated match, Perl uses
the cached compiled form and so avoids recompiling the pattern.
Example 6-5. popgrep2
#!/usr/bin/perl
# popgrep2 - grep for abbreviations of places that say "pop"
# version 2: fast way using qr//
@popstates = qw(CO ON MI WI MN);
@poppats = map { qr/\b$_\b/ } @popstates;    # compile each pattern once
LINE: while (defined($line = <>)) {
    for $pat (@poppats) {
        if ($line =~ /$pat/) {    # this is fast
            print $line;          # the line is in $line, not $_
            next LINE;
        }
    }
}
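A rough way to compare the two versions on your own machine is the core Benchmark module. The sketch below is only that: the synthetic input lines and the subroutine names are assumptions, and the size of any difference depends on your Perl version and data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @popstates = qw(CO ON MI WI MN);
my @poppats   = map { qr/\b$_\b/ } @popstates;

# Synthetic input (an assumption): mostly non-matching lines.
my @lines = (("no abbreviation on this line\n") x 200,
             "Detroit, MI\n", "Toronto, ON\n");

# Version 1: interpolate the state string into the pattern each time.
sub count_interpolated {
    my $n = 0;
    LINE: for my $line (@lines) {
        for my $state (@popstates) {
            if ($line =~ /\b$state\b/) { $n++; next LINE }
        }
    }
    return $n;
}

# Version 2: match against the precompiled qr// objects.
sub count_precompiled {
    my $n = 0;
    LINE: for my $line (@lines) {
        for my $pat (@poppats) {
            if ($line =~ /$pat/) { $n++; next LINE }
        }
    }
    return $n;
}

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    interpolated => \&count_interpolated,
    precompiled  => \&count_precompiled,
});
```

Both subroutines count the same matching lines, so any difference in the table comes from how the patterns are supplied rather than from the work done.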
Print the array @poppats and you'll see strings
like this:
(?-xism:\bCO\b)
(?-xism:\bON\b)
(?-xism:\bMI\b)
(?-xism:\bWI\b)
(?-xism:\bMN\b)
Those strings are what you get when you print the result of
qr// or interpolate it into a larger pattern. (The (?-xism:...)
wrapper is how older releases show the flags; v5.14 and later print
(?^:\bCO\b) and so on instead.) Associated with each one is also a
cached, compiled version of that string as a pattern, and this is
what Perl uses when the interpolation into a match or substitution
operator contains nothing else.
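As an illustration of building a larger pattern from qr// results, the following minimal sketch joins the stringified patterns into one alternation and compiles that. The variable names $alternation and $any_state and the sample lines are assumptions for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @popstates = qw(CO ON MI WI MN);
my @poppats   = map { qr/\b$_\b/ } @popstates;

# Joining qr// objects uses their stringified forms; the resulting
# string is then compiled once as a single combined pattern.
my $alternation = join '|', @poppats;
my $any_state   = qr/$alternation/;

for my $line ("Ann Arbor, MI", "Austin, TX") {
    print "$line\n" if $line =~ $any_state;
}
```

Here the individual compiled forms are not reused, because each pattern is interpolated alongside other text. The combined pattern is still compiled only once, when $any_state is built.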
6.10.4. See Also
The qr// operator in perlop(1) and in the section on "The qr// quote
regex operator" in Chapter 5 of Programming Perl