35.3 The set-covering problem
The set-covering problem is an optimization problem that models many resource-selection problems. Its corresponding decision problem generalizes the NP-complete vertex-cover problem and is therefore also NP-hard. The approximation algorithm developed to handle the vertex-cover problem doesn't apply here, however, and so we need to try other approaches. We shall examine a simple greedy heuristic with a logarithmic approximation ratio. That is, as the size of the instance gets larger, the size of the approximate solution may grow, relative to the size of an optimal solution. Because the logarithm function grows rather slowly, however, this approximation algorithm may nonetheless give useful results.An instance (X,




We say that a subset


(35.8) | ![]() |
We say that any



Figure 35.3: An instance (X,



The set-covering problem is an abstraction of many commonly arising combinatorial problems. As a simple example, suppose that X represents a set of skills that are needed to solve a problem and that we have a given set of people available to work on the problem. We wish to form a committee, containing as few people as possible, such that for every requisite skill in X, there is a member of the committee having that skill. In the decision version of the set-covering problem, we ask whether or not a covering exists with size at most k, where k is an additional parameter specified in the problem instance. The decision version of the problem is NP-complete, as Exercise 35.3-2 asks you to show.
A greedy approximation algorithm
The greedy method works by picking, at each stage, the set S that covers the greatest number of remaining elements that are uncovered.
GREEDY-SET-COVER(X,)
1 U ← X
2
3 while U ≠ Ø
4 do select anthat maximizes |S ∪ U |
5 U ← U - S
6
7 return
In the example of Figure 35.3, GREEDY-SET-COVER adds to










Analysis
We now show that the greedy algorithm returns a set cover that is not too much larger than an optimal set cover. For convenience, in this chapter we denote the dth harmonic number

GREEDY-SET-COVER is a polynomial-time ρ(n)-approximation algorithm, where

Proof We have already shown that GREEDY-SET-COVER runs in polynomial time.To show that GREEDY-SET-COVER is a ρ(n)-approximation algorithm, we assign a cost of 1 to each set selected by the algorithm, distribute this cost over the elements covered for the first time, and then use these costs to derive the desired relationship between the size of an optimal set cover




At each step of the algorithm, 1 unit of cost is assigned, and so

The cost assigned to the optimal cover is

and since each x ∈ X is in at least one set


Combining the previous two inequalities, we have that
(35.9) | ![]() |
The remainder of the proof rests on the following key inequality, which we shall prove shortly. For any set S belonging to the family

(35.10) | ![]() |
From inequalities (35.9) and (35.10), it follows that

thus proving the theorem.All that remains is to prove inequality (35.10). Consider any set



Observe that
|Si - (S1 ∪ S2 ∪ ··· ∪ Si-1)| | ≥ | |S - (S1 ∪ S2 ∪ ··· ∪ Si-1)| |
= | ui-1, |
because the greedy choice of Si guarantees that S cannot cover more new elements than Si does (otherwise, S would have been chosen instead of Si). Consequently, we obtain

We now bound this quantity as follows:

which completes the proof of inequality (35.10).
Corollary 35.5
GREEDY-SET-COVER is a polynomial-time (ln |X|+ 1)-approximation algorithm.Proof Use inequality (A.14) and Theorem 35.4.
In some applications,

Consider each of the following words as a set of letters: {arid, dash, drain, heard, lost, nose, shun, slate, snare, thread}. Show which set cover GREEDY-SET-COVER produces when ties are broken in favor of the word that appears first in the dictionary.
Exercises 35.3-2
Show that the decision version of the set-covering problem is NP-complete by reduction from the vertex-cover problem.
Exercises 35.3-3
Show how to implement GREEDY-SET-COVER in such a way that it runs in time

Exercises 35.3-4
Show that the following weaker form of Theorem 35.4 is trivially true:

Exercises 35.3-5
GREEDY-SET-COVER can return a number of different solutions, depending on how we break ties in line 4. Give a procedure BAD-SET-COVER-INSTANCE(n) that returns an n-element instance of the set-covering problem for which, depending on how ties are broken in line 4, GREEDY-SET-COVER can return number of different solutions that is exponential in n.