4.8. Finding Elements in One Array but Not Another
4.8.1. Problem
You want to find elements that are in one
array but not another.
4.8.2. Solution
To find elements in @A that aren't in
@B, build a hash whose keys are the elements of
@B to use as a lookup table. Then check each
element in @A to see whether it is in
@B.
4.8.2.1. Straightforward implementation
# assume @A and @B are already loaded
%seen = ( ); # lookup table to test membership of B
@aonly = ( ); # answer
# build lookup table
foreach $item (@B) { $seen{$item} = 1 }
# find only elements in @A and not in @B
foreach $item (@A) {
    unless ($seen{$item}) {
        # it's not in %seen, so add to @aonly
        push(@aonly, $item);
    }
}
4.8.2.2. More idiomatic version
my %seen; # lookup table
my @aonly; # answer
# build lookup table
@seen{@B} = ( );
foreach my $item (@A) {
    push(@aonly, $item) unless exists $seen{$item};
}
4.8.2.3. Loopless version
my @A = ...;
my @B = ...;
my %seen;
@seen{@A} = ( );
delete @seen{@B};
my @aonly = keys %seen;
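As a quick check of the loopless version, here is a minimal sketch with made-up sample data (the list contents are assumptions for illustration only). Because keys returns its results in no particular order and a hash holds each key only once, this approach loses both the original ordering and any duplicates in @A; the sketch sorts the result to get stable output.

```perl
use strict;
use warnings;

# sample data, purely illustrative
my @A = qw(apple banana cherry banana date);
my @B = qw(banana date elderberry);

my %seen;
@seen{@A} = ();      # every element of @A becomes a key
delete @seen{@B};    # drop keys that also appear in @B

# keys() has no defined order, so sort for stable output
my @aonly = sort keys %seen;
print "@aonly\n";    # prints "apple cherry"
```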
4.8.3. Discussion
As with nearly any problem in Perl
that asks whether a scalar is in one list or
another, this one uses a hash. First, process @B
so that the %seen hash records each element from
@B by setting its value to 1. Then process
@A one element at a time, checking whether that
particular element had been in @B by consulting
the %seen hash.
The given code retains duplicate elements in @A.
This can be fixed easily by adding the elements of
@A to %seen as they are
processed (this loop tests for truth, so build %seen
with true values, as in the first solution):
foreach $item (@A) {
    push(@aonly, $item) unless $seen{$item};
    $seen{$item} = 1; # mark as seen
}
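The duplicate-removing loop can be tried out with a small self-contained sketch. The sample lists are assumptions chosen for illustration. Since the loop tests the truth of $seen{$item} rather than its existence, the lookup table here is filled with true values.

```perl
use strict;
use warnings;

# sample data, purely illustrative
my @A = qw(red blue red green blue yellow);
my @B = qw(green);

my %seen;
@seen{@B} = (1) x @B;    # true values, since the loop tests truth

my @aonly;
foreach my $item (@A) {
    push(@aonly, $item) unless $seen{$item};
    $seen{$item} = 1;    # mark as seen so repeats are skipped
}
print "@aonly\n";        # prints "red blue yellow"
```

Unlike the loopless version, this keeps the elements in the order in which they first appear in @A.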
The first two solutions differ mainly in how they build the hash. The
first iterates through @B. The second uses a
hash slice to initialize the hash. A hash slice
is most easily illustrated by this
example:
$hash{"key1"} = 1;
$hash{"key2"} = 2;
which is equivalent to:
@hash{"key1", "key2"} = (1,2);
The list in the curly braces holds the keys; the list on the right
holds the values. We initialize %seen in the first
solution by looping over each element in @B and
setting the appropriate value of %seen to 1. In
the second, we simply say:
@seen{@B} = ( );
This uses items in @B as keys for
%seen, setting each corresponding value to
undef, because there are fewer values on the right
than places to put them. This works out here because we check for
existence of the key, not logical truth or
definedness of the value. If we needed true
values, a slice could still shorten our code:
@seen{@B} = (1) x @B;
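The difference between the two slice assignments can be seen in a short sketch. The variable names %flag, $exists, $defined, $true, and $flag_true are hypothetical, introduced only for this illustration.

```perl
use strict;
use warnings;

my @B = qw(x y);

my %seen;
@seen{@B} = ();                           # keys exist, values are undef
my $exists  = exists  $seen{x} ? 1 : 0;   # key is present
my $defined = defined $seen{x} ? 1 : 0;   # value is undef
my $true    = $seen{x}         ? 1 : 0;   # undef is false

my %flag;
@flag{@B} = (1) x @B;                     # keys exist, values are 1
my $flag_true = $flag{x} ? 1 : 0;         # value is true

print "exists=$exists defined=$defined true=$true flag=$flag_true\n";
# prints "exists=1 defined=0 true=0 flag=1"
```

A test such as unless $seen{$item} therefore needs the second form, while exists $seen{$item} works with either.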
In the third solution, we make use of this property even further and
avoid explicit loops altogether. (Not that avoiding loops should be
construed as being particularly virtuous; we're just showing you that
there's more than one way to do it.) The slice assignment makes any
element that was in @A a key, and the slice
deletion removes from the hash any keys that were elements of
@B, leaving those that were only in
@A.
A fairly common situation where this might arise is when you have two
files and would like to know which lines from the second file either
were or weren't in the first. Here's a simple solution based on this
recipe:
open(OLD, $path1) || die "can't open $path1: $!";
@seen{ <OLD> } = ( );
open(NEW, $path2) || die "can't open $path2: $!";
while (<NEW>) {
    print if exists $seen{$_};
}
This shows the lines in the second file that were already seen in the
first one. Use unless instead of
if to show the lines in the second file that were
not in the first.
Imagine two files, the first containing the lines:
red
yellow
green
blue
and the second containing:
green
orange
purple
black
yellow
The output using if would be:
green
yellow
and the output using unless would be:
orange
purple
black
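To reproduce these results without creating files on disk, the following sketch reads the same sample lines from in-memory filehandles. The variable names ($old_text, $new_text, @common, @new_only) are assumptions made for this illustration, and it uses lexical filehandles with three-argument open rather than the bareword handles shown earlier.

```perl
use strict;
use warnings;

# the two sample files, held in strings for illustration
my $old_text = "red\nyellow\ngreen\nblue\n";
my $new_text = "green\norange\npurple\nblack\nyellow\n";

# in-memory filehandles stand in for real files
open(my $old, '<', \$old_text) or die "can't open in-memory file: $!";
open(my $new, '<', \$new_text) or die "can't open in-memory file: $!";

my %seen;
@seen{ <$old> } = ();    # each line, newline included, becomes a key

my (@common, @new_only);
while (my $line = <$new>) {
    if (exists $seen{$line}) { push @common,   $line }
    else                     { push @new_only, $line }
}

print "in both:\n", @common;
print "only in second:\n", @new_only;
```

Because the keys keep their trailing newlines, a final line that lacks one would not match an otherwise identical line that has one; chomp both inputs if that matters.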
You could even do this from the command line; given a suitable
cat(1) program, it's easy:
% perl -e '@s{`cat OLD`}=( ); exists $s{$_} && print for `cat NEW`'
% perl -e '@s{`cat OLD`}=( ); exists $s{$_} || print for `cat NEW`'
You'd find that you just emulated these calls to the Unix
fgrep(1) program (the -x option restricts matches to
whole lines, as the hash lookup does):
% fgrep -xf OLD NEW
% fgrep -vxf OLD NEW
4.8.4. See Also
Hash slices are explained in perldata(1) and the
"Variables" section of Chapter 2 of Programming
Perl; Chapter 5; we use hashes in a
similar fashion in Recipe 4.7 and Recipe 4.9.