The set-covering problem is an optimization problem that models many resource-selection problems. Its corresponding decision problem generalizes the NP-complete vertex-cover problem and is therefore also NP-hard. The approximation algorithm developed to handle the vertex-cover problem doesn't apply here, however, and so we need to try other approaches. We shall examine a simple greedy heuristic with a logarithmic approximation ratio. That is, as the size of the instance gets larger, the size of the approximate solution may grow, relative to the size of an optimal solution. Because the logarithm function grows rather slowly, however, this approximation algorithm may nonetheless give useful results.
An instance (X,
We say that a subset
(35.8) |
We say that any
Figure 35.3: An instance (X,
The set-covering problem is an abstraction of many commonly arising combinatorial problems. As a simple example, suppose that X represents a set of skills that are needed to solve a problem and that we have a given set of people available to work on the problem. We wish to form a committee, containing as few people as possible, such that for every requisite skill in X, there is a member of the committee having that skill. In the decision version of the set-covering problem, we ask whether or not a covering exists with size at most k, where k is an additional parameter specified in the problem instance. The decision version of the problem is NP-complete, as Exercise 35.3-2 asks you to show.
The greedy method works by picking, at each stage, the set S that covers the greatest number of remaining elements that are uncovered.
GREEDY-SET-COVER(X,) 1 U ← X 2 3 while U ≠ Ø 4 do select an that maximizes |S ∪ U | 5 U ← U - S 6 7 return
In the example of Figure 35.3, GREEDY-SET-COVER adds to
The algorithm works as follows. The set U contains, at each stage, the set of remaining uncovered elements. The set
The algorithm GREEDY-SET-COVER can easily be implemented to run in time polynomial in |X| and |
We now show that the greedy algorithm returns a set cover that is not too much larger than an optimal set cover. For convenience, in this chapter we denote the dth harmonic number
Theorem 35.4
GREEDY-SET-COVER is a polynomial-time ρ(n)-approximation algorithm, where
Proof We have already shown that GREEDY-SET-COVER runs in polynomial time.
To show that GREEDY-SET-COVER is a ρ(n)-approximation algorithm, we assign a cost of 1 to each set selected by the algorithm, distribute this cost over the elements covered for the first time, and then use these costs to derive the desired relationship between the size of an optimal set cover
At each step of the algorithm, 1 unit of cost is assigned, and so
The cost assigned to the optimal cover is
and since each x ∈ X is in at least one set
Combining the previous two inequalities, we have that
(35.9) |
The remainder of the proof rests on the following key inequality, which we shall prove shortly. For any set S belonging to the family
(35.10) |
From inequalities (35.9) and (35.10), it follows that
thus proving the theorem.
All that remains is to prove inequality (35.10). Consider any set
ui = |S - (S1 ∪; S2 ∪; ··· ∪ Si)|
be the number of elements in S remaining uncovered after S1, S2, ..., Si have been selected by the algorithm. We define u0 = |S| to be the number of elements of S, which are all initially uncovered. Let k be the least index such that uk = 0, so that each element in S is covered by at least one of the sets S1, S2, ..., Sk. Then, ui-1 ≥ ui, and ui-1 - ui elements of S are covered for the first time by Si, for i = 1, 2, ..., k. Thus,
Observe that
|Si - (S1 ∪ S2 ∪ ··· ∪ Si-1)| |
≥ |
|S - (S1 ∪ S2 ∪ ··· ∪ Si-1)| |
= |
ui-1, |
because the greedy choice of Si guarantees that S cannot cover more new elements than Si does (otherwise, S would have been chosen instead of Si). Consequently, we obtain
We now bound this quantity as follows:
which completes the proof of inequality (35.10).
GREEDY-SET-COVER is a polynomial-time (ln |X|+ 1)-approximation algorithm.
Proof Use inequality (A.14) and Theorem 35.4.
In some applications,
Consider each of the following words as a set of letters: {arid, dash, drain, heard, lost, nose, shun, slate, snare, thread}. Show which set cover GREEDY-SET-COVER produces when ties are broken in favor of the word that appears first in the dictionary.
Show that the decision version of the set-covering problem is NP-complete by reduction from the vertex-cover problem.
Show how to implement GREEDY-SET-COVER in such a way that it runs in time
Show that the following weaker form of Theorem 35.4 is trivially true:
GREEDY-SET-COVER can return a number of different solutions, depending on how we break ties in line 4. Give a procedure BAD-SET-COVER-INSTANCE(n) that returns an n-element instance of the set-covering problem for which, depending on how ties are broken in line 4, GREEDY-SET-COVER can return number of different solutions that is exponential in n.