
6.5. Finding the Nth Occurrence of a Match
6.5.1. Problem
You want to find the Nth match in a string, not just the first one.
For example, you'd like to find the word preceding the third
occurrence of "fish":
One fish two fish red fish blue fish
6.5.2. Solution
Use the /g modifier in a while loop, keeping count of matches:
$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't 'last' out of this loop
    }
}
The third fish is a red one.
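The warning in the loop's comment is not explained in the text. One plausible reading, offered here as an assumption rather than the authors' stated reason, is that a scalar-context /g match remembers where it stopped (see pos), so leaving the loop early leaves that position set and the next /g match on the same string resumes from there. A minimal sketch, with $pond, $third, $resumed, and $restarted as illustrative names:

```perl
use strict;
use warnings;

my $pond  = 'One fish two fish red fish blue fish';
my $count = 0;
my $third;

while ($pond =~ /(\w+)\s+fish\b/gi) {
    if (++$count == 3) {
        $third = $1;
        last;                       # exits with pos($pond) still set
    }
}

my $pos_after_last = pos($pond);    # offset just past "red fish"

# A later scalar /g match on the same string resumes from that offset.
my $resumed;
if ($pond =~ /(\w+)\s+fish\b/gi) {
    $resumed = $1;                  # 'blue', not 'One'
}

# Clearing pos() makes the next /g match start from the beginning again.
pos($pond) = undef;
my $restarted;
if ($pond =~ /(\w+)\s+fish\b/gi) {
    $restarted = $1;                # 'One'
}
pos($pond) = undef;
```

If an early exit is wanted anyway, resetting pos() as shown is one way to avoid surprising a later match on the same variable.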
Or use a repetition count and repeated pattern like this:
/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
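The repetition count need not be hard-coded. The sketch below interpolates N - 1 into the quantifier; nth_fish_by_repetition is a hypothetical helper name, not something from the recipe. Note that this form only steps over consecutive "WORD fish" pairs separated by whitespace, so it is less forgiving than the counting loop.

```perl
use strict;
use warnings;

# Hypothetical helper: return the word before the Nth "fish", or undef.
sub nth_fish_by_repetition {
    my ($string, $n) = @_;
    my $skip = $n - 1;              # number of "WORD fish" pairs to step over
    return $string =~ /(?:\w+\s+fish\s+){$skip}(\w+)\s+fish/i ? $1 : undef;
}

my $pond = 'One fish two fish red fish blue fish';
print nth_fish_by_repetition($pond, 3), "\n";    # red
```

Asking for more fish than the string holds makes the match fail, so the helper returns undef in that case.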
6.5.3. Discussion
As explained in this chapter's
Introduction, using the /g modifier in scalar
context creates something of a progressive
match, useful in while loops. This is
commonly used to count the number of times a pattern matches in a
string:
# simple way with while loop
$count = 0;
while ($string =~ /PAT/g) {
    $count++;               # or whatever you'd like to do here
}
# same thing with trailing while
$count = 0;
$count++ while $string =~ /PAT/g;
# or with for loop
for ($count = 0; $string =~ /PAT/g; $count++) { }
# Similar, but this time count overlapping matches
$count++ while $string =~ /(?=PAT)/g;
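To make the difference between the plain and overlapping counts concrete, here is a small sketch using an illustrative string 'aaaa' and the pattern aa in place of PAT:

```perl
use strict;
use warnings;

my $string = 'aaaa';

# Non-overlapping: each match consumes "aa", so the scan finds two.
my $plain = 0;
$plain++ while $string =~ /aa/g;

# Overlapping: the lookahead consumes nothing, so every starting
# position that is followed by "aa" is counted.
my $overlapping = 0;
$overlapping++ while $string =~ /(?=aa)/g;

print "plain=$plain overlapping=$overlapping\n";    # plain=2 overlapping=3
```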
To find the Nth match, it's easiest to keep your
own counter. When you reach the appropriate N,
do whatever you care to. A similar technique could be used to find
every Nth match by checking for multiples of
N using the modulus operator. For example,
(++$count % 3) == 0
would be used to find every third match.
If this is too much bother, you can always extract all matches and
then hunt for the ones you'd like.
$pond = 'One fish two fish red fish blue fish';
# using a temporary
@colors = ($pond =~ /(\w+)\s+fish\b/gi); # get all matches
$color = $colors[2]; # then the one we want
# or without a temporary array
$color = ( $pond =~ /(\w+)\s+fish\b/gi )[2]; # just grab the third element
print "The third fish in the pond is $color.\n";
The third fish in the pond is red.
To find all even-numbered fish:
$count = 0;
$_ = 'One fish two fish red fish blue fish';
@evens = grep { ++$count % 2 == 0 } /(\w+)\s+fish\b/gi;
print "Even numbered fish are @evens.\n";
Even numbered fish are two blue.
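The same selection can be written with the counting loop and the modulus test described earlier. This is a sketch only; $every and @picked are illustrative names:

```perl
use strict;
use warnings;

my $pond  = 'One fish two fish red fish blue fish';
my $every = 2;                      # illustrative: pick every second match
my $count = 0;
my @picked;

while ($pond =~ /(\w+)\s+fish\b/gi) {
    # Pre-increment so the first match is number 1, not number 0.
    push @picked, $1 if ++$count % $every == 0;
}

print "Picked: @picked\n";          # Picked: two blue
```

The loop form avoids building the full list of matches first, which may matter on long strings.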
For substitution, the replacement value should be a code expression
that returns the proper string. Make sure to return the original as a
replacement string for cases you aren't interested in changing. Here
we fish out the fourth specimen and turn it into a snack:
$count = 0;
s{
    \b               # makes next \w more efficient
    ( \w+ )          # this is what we'll be changing
    (
        \s+ fish \b
    )
}{
    if (++$count == 4) {
        "sushi" . $2;
    } else {
        $1 . $2;
    }
}gex;
One fish two fish red fish sushi fish
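The substitution can be wrapped so that the position and the replacement become parameters. A minimal sketch, assuming a hypothetical helper named replace_nth_fish that works on a copy of the string rather than on $_:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the word before the Nth "fish" with $new.
sub replace_nth_fish {
    my ($string, $n, $new) = @_;
    my $count = 0;
    # Every match is rewritten; only the Nth gets the new word,
    # all others are put back unchanged as $1 . $2.
    $string =~ s{ \b (\w+) (\s+ fish \b) }
                { ++$count == $n ? $new . $2 : $1 . $2 }gex;
    return $string;
}

print replace_nth_fish('One fish two fish red fish blue fish', 4, 'sushi'), "\n";
# One fish two fish red fish sushi fish
```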
Picking out the last match instead of the first one is a fairly
common task. The easiest way is to skip the beginning part greedily.
After /.*\b(\w+)\s+fish\b/s, for example, the
$1 variable has the last fish.
Another way to get arbitrary counts is to make a global match in list
context to produce all hits, then extract the desired element of that
list:
$pond = 'One fish two fish red fish blue fish swim here.';
$color = ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
print "Last fish is $color.\n";
Last fish is blue.
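For comparison, the greedy-skip pattern mentioned at the start of this passage can be run against the same string. A short sketch, with $last as an illustrative name:

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish swim here.';

my $last;
if ($pond =~ /.*\b(\w+)\s+fish\b/s) {
    # The greedy .* grabs as much as it can, then backs off only far
    # enough for the rest of the pattern to match, so $1 is the word
    # before the rightmost "fish".
    $last = $1;
}
print "Last fish is $last.\n";      # Last fish is blue.
```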
To express this same notion of finding the last match in a single
pattern without /g, use the negative lookahead
assertion (?!THING). When you want the last match
of arbitrary pattern P, you find P followed by any amount of not P
through the end of the string. The general construct is
P(?!.*P), which can be broken up for legibility:
m{
    P               # find some pattern P
    (?!             # mustn't be able to find
        .*          # something
        P           # and P
    )
}xs
That leaves us with this approach for selecting the last fish:
$pond = 'One fish two fish red fish blue fish swim here.';
if ($pond =~ m{
        \b ( \w+) \s+ fish \b
        (?! .* \b fish \b )
    }six )
{
    print "Last fish is $1.\n";
} else {
    print "Failed!\n";
}
Last fish is blue.
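The general construct can also be applied to a pattern supplied at run time. The following is a sketch under that assumption; last_match is a hypothetical helper, and it returns the whole text matched by the pattern rather than a capture inside it:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the last match of $pat, or undef.
sub last_match {
    my ($string, $pat) = @_;
    # Match P only where no further P can be found later in the string.
    return $string =~ /($pat)(?!.*$pat)/s ? $1 : undef;
}

my $pond = 'One fish two fish red fish blue fish swim here.';
print last_match($pond, qr/\w+\s+fish\b/i), "\n";    # blue fish
```

Passing the pattern as a qr// object keeps its own modifiers (here /i) intact when it is interpolated into the larger pattern.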
This approach has the advantage that it can fit in just one pattern,
which makes it suitable for situations similar to those shown in
Recipe 6.18. It has its disadvantages, though. It's
much harder to read and understand, although once you learn
the formula, it's not too bad. It also runs more
slowly: around half as fast on the data set tested here.
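Timings like this depend on the data and the Perl build, so it may be worth measuring on your own input. The sketch below uses the core Benchmark module to compare the list-slice and lookahead approaches; the sub names and the iteration count are illustrative, and no particular ratio is implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $pond = 'One fish two fish red fish blue fish swim here.';

# Last fish via a global match in list context and a list slice.
sub last_by_list {
    return ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
}

# Last fish via a single pattern with a negative lookahead.
sub last_by_lookahead {
    return $pond =~ m{ \b (\w+) \s+ fish \b (?! .* \b fish \b ) }six
        ? $1
        : undef;
}

# Run each sub a fixed number of times and print a comparison table.
cmpthese(100_000, {
    list_slice => \&last_by_list,
    lookahead  => \&last_by_lookahead,
});
```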
6.5.4. See Also
The behavior of m//g in scalar context is given in
the "Regexp Quote-like Operators" section of
perlop(1), and in the "Pattern Matching
Operators" section of Chapter 5 of Programming
Perl; zero-width positive lookahead assertions are shown
in the "Regular Expressions" section of
perlre(1), and in the "Fancy Patterns" section
of Chapter 5 of Programming Perl.

Copyright © 2003 O'Reilly & Associates. All rights reserved.