
![]() | ![]() |
4.9. Computing Union, Intersection, or Difference of Unique Lists
4.9.1. Problem
You have a pair of lists, each holding
unduplicated items. You'd like to find out which items are in both
lists (intersection), one but not the other
(difference), or either
(union).
4.9.2. Solution
The following solutions need the listed initializations:@a = (1, 3, 5, 6, 7, 8);
@b = (2, 3, 5, 7, 9);
@union = @isect = @diff = ( );
%union = %isect = ( );
%count = ( );
4.9.2.1. Simple solution for union and intersection
foreach $e (@a) { $union{$e} = 1 }
foreach $e (@b) {
if ( $union{$e} ) { $isect{$e} = 1 }
$union{$e} = 1;
}
@union = keys %union;
@isect = keys %isect;
4.9.2.2. More idiomatic version
foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ }
@union = keys %union;
@isect = keys %isect;
4.9.2.3. Union, intersection, and symmetric difference
foreach $e (@a, @b) { $count{$e}++ }
@union = keys %count;
foreach $e (keys %count) {
if ($count{$e} = = 2) {
push @isect, $e;
} else {
push @diff, $e;
}
}
4.9.2.4. Indirect solution
@isect = @diff = @union = ( );
foreach $e (@a, @b) { $count{$e}++ }
@union = keys %count;
foreach $e (keys %count) {
push @{ $count{$e} = = 2 ? \@isect : \@diff }, $e;
}
4.9.3. Discussion
The first solution most directly computes the union and intersection
of two lists, neither containing duplicates. Two hashes are used to
record whether a particular item goes in the union or the
intersection. We put every element of the first array in the union
hash, giving it a true value. Then, processing each element of the
second array, we check whether that element is already present in the
union. If it is, we put it in the intersection as well. In any event,
it goes into the union. When we're done, we extract the keys of both
the union and intersection hashes. The values aren't needed.The second solution (Recipe 4.8.2.2) is essentially
the same but relies on familiarity with the Perl (and
awk, C, C++, and Java) ++ and
&& operators. By placing the
++ after the variable, we first look at its old
value before incrementing it. The first time through it won't be in
the union, which makes the first part of the
&& false, so the second part is
consequently ignored. The second time that we encounter the same
element, it's already present in the union, so we put it in the
intersection.The third solution uses just one hash to track how many times each
element is seen. Once both arrays have their elements recorded in the
hash, we grab those keys and put them in the union. Then we process
those hash keys one at a time. Keys whose values are 2 were in both
arrays, so they go in the intersection array. Keys whose values are 1
were in just one of the two arrays, so they go in the difference
array. Elements of the output arrays are not in the same order as
those in the input arrays.The last solution, like the previous one, uses just one hash to count
how many times each element is encountered. Here, though, we
dynamically select one of two possible arrays by placing within the
@{...} array-dereferencing block an expression
whose evaluation yields a reference to whichever array is demanded by
the situation.
In this recipe we compute the
symmetric difference, not the simple difference. These are terms from
set theory. A symmetric difference is the set
of all elements that are members of either @A or
@B, but not both. A simple
difference is the set of members of @A
but not @B, which we calculated in Recipe 4.8.
4.9.4. See Also
The "Hashes" section of Chapter 2 of Programming
Perl; Chapter 5; we use hashes in a
similar fashion in Recipe 4.7 and Recipe 4.8
![]() | ![]() | ![]() |
4.8. Finding Elements in One Array but Not Another | ![]() | 4.10. Appending One Array to Another |

Copyright © 2003 O'Reilly & Associates. All rights reserved.