Maximum coverage problem

Maximum coverage problem

The maximum coverage problem is a classical question in computer science and computational complexity theory. It is a problem that is widely taught in approximation algorithms.

As input you are given several sets and a number k. The sets may have some elements in common. You must select at most k of these sets such that the maximum number of elements are covered, i.e. the union of the selected sets has maximal size.

Formally, (unweighted) Maximum Coverage

Instance: A number k and a collection of sets  S = S_1, S_2, \ldots, S_m , where S_i \subseteq \left\{e_1, e_2, \ldots, e_n \right\} .
Objective: Find a subset  S^{'} \subseteq S of sets, such that  \left| S^{'} \right| \leq k and the number of covered elements  \left| \bigcup_{S_i \in S^{'}}{S_i} \right| is maximized.

The maximum coverage problem is NP-hard, and cannot be approximated within \frac{e}{e-1}-o(1) \approx 1.58 under standard assumptions. This result essentially matches the approximation ratio achieved by the greedy algorithm.

Contents

ILP formulation

The maximum coverage problem can be formulated as the following integer linear program.

maximize \sum_{e_j \in E} y_j. (maximizing the sum of covered elements).
subject to  \sum{x_i}  \leq k ; (no more than k sets are selected).
 \sum_{\, e_j \in S_i} x_i \geq y_j ; (if y_j \geq 0 then at least one set e_j \in S_i is selected).
0 \leq y_j \leq 1; (if yj = 1 then ej is covered)
x_i \in \{0,1\} (if xi = 1 then Si is selected for the cover).

Greedy algorithm

The greedy algorithm for maximum coverage chooses sets according to one rule: at each stage, choose a set which contains the largest number of uncovered elements. It can be shown that this algorithm achieves an approximation ratio of \frac{e}{e-1}.[1] Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for maximum coverage.[2]

Known extensions

The inapproximability results apply to all extension all maximum coverage since they hold maximum coverage as a special case.

Weighted version

In the weighted version every element ej has a weight w(ej). The task is to find a maximum coverage which has maximum weight. The basic version is a special case when all weights are 1.

maximize \sum_{e \in E} w(e_j) \cdot y_j . (maximizing the weighted sum of covered elements).
subject to  \sum{x_i}  \leq k ; (no more than k sets are selected).
 \sum_{e_j \in S_i} x_i \geq y_j ; (if y_j \geq 0 then at least one set e_j \in S_i is selected).
0 \leq y_j \leq 1; (if yj = 1 then ej is covered)
x_i \in \{0,1\} (if xi = 1 then Si is selected for the cover).

The greedy algorithm for the weighted maximum coverage at each stage chooses a set which contains the maximum weight of uncovered elements. This algorithm achieves an approximation ratio of \frac{e}{e-1}.[citation needed]

Budgeted maximum coverage

In the budgeted maximum coverage version, not only does every element ej have a weight w(ej), but also every set Si has a cost c(Si). Instead of k that limits the number of sets in the cover a budget B is given. This budget B limits the weight of the cover that can be chosen.

maximize \sum_{e \in E} w(e_j) \cdot y_j . (maximizing the weighted sum of covered elements).
subject to  \sum{c(S_i) \cdot x_i}  \leq B ; (the cost of the selected sets cannot exceed B).
 \sum_{e_j \in S_i} x_i \geq y_j ; (if y_j \geq 0 then at least one set e_j \in S_i is selected).
0 \leq y_j \leq 1; (if yj = 1 then ej is covered)
x_i \in \{0,1\} (if xi = 1 then Si is selected for the cover).

A greedy algorithm will no longer produce solutions with a performance guarantee. Namely, the worst case behavior of this algorithm might be very far from the optimal solution. The approximation algorithm is extended by the following way. First, after finding a solution using the greedy algorithm, return the better of the greedy algorithm's solution and the set of largest weight. Call this algorithm the modified greedy algorithm. Second, starting with all possible families of sets of sizes from one to (at least) three, augment these solutions with the modified greedy algorithm. Third, return the best out of all augmented solutions. This algorithm achieves an approximation ratio of \frac{e}{e-1}. This is the best possible approximation ratio unless NP \subseteq DTIME(n^{O(\log\log n)}).[3]

Generalized maximum coverage

In the generalized maximum coverage version every set Si has a cost c(Si), element ej has a different weight and cost depending on which set covers it. Namely, if ej is covered by set Si the weight of ej is wi(ej) and its cost is ci(ej). A budget B is given for the total cost of the solution.

maximize \sum_{e \in E, S_i} w_i(e_j) \cdot y_{ij} . (maximizing the weighted sum of covered elements in the sets in which they are covered).
subject to  \sum{c_i(e_j) \cdot y_{ij}} + \sum{c(S_i) \cdot x_i}  \leq B ; (the cost of the selected sets cannot exceed B).
 \sum_{i} y_{ij} \leq 1 ; (element ej = 1 can only be covered by at most one set).
 \sum_{S_i} x_i \geq y_{ij} ; (if y_j \geq 0 then at least one set e_j \in S_i is selected).
y_{ij} \in \{0,1\} ; (if yij = 1 then ej is covered by set Si)
x_i \in \{0,1\} (if xi = 1 then Si is selected for the cover).

Generalized maximum coverage algorithm

The algorithm uses the concept of residual cost/weight. The residual cost/weight is measured against a tentative solution and its the difference of the cost/weight from the cost/weight gained by a tentative solution.

The algorithm has several stages. First, find a solution using greedy algorithm. In each iteration of the greedy algorithm the tentative solution is added the set which contains the maximum residual weight of elements divided by the residual cost of these elements along with the residual cost of the set. Second, compare the solution gained by the first step to the best solution which uses a small number of sets. Third, return the best out of all examined solutions. This algorithm achieves an approximation ratio of \frac{e}{e-1} + o(1).[4]

Related problems

References

  1. ^ Hochbaum, D. S. (1997), "Approximating covering and packing problems: Set cover, vertex cover, independent set, and related problems", in Approximation algorithms for NP-hard problems, PWS Publishing Company, Boston, 94-143.
  2. ^ Feige, U., "A threshold of ln n for approximating set cover", J. ACM 45, 634-652.
  3. ^ Khuller, S., Moss, A., and Naor, J. 1999. The budgeted maximum coverage problem. Inf. Process. Lett. 70, 1 (Apr. 1999), 39-45.
  4. ^ Cohen, R. and Katzir, L. 2008. The Generalized Maximum Coverage Problem. Inf. Process. Lett. 108, 1 (Sep. 2008), 15-22.
  • Vazirani, Vijay V. (2001). Approximation Algorithms. Springer-Verlag. ISBN 3-540-65367-8. 
  • Uriel Feige, A Threshold of ln n for Approximating Set Cover, Journal of the ACM (JACM), v.45 n.4, p. 634 - 652, July 1998.

External links


Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

  • Media coverage of the Arab–Israeli conflict — This article is part of the Arab Israeli conflict series. History Views on the conflict …   Wikipedia

  • Hidden node problem — In wireless networking, the hidden node problem occurs when a Node (networking) node is visible from a wireless access point (AP), but not from other nodes communicating with said AP. This leads to difficulties in media access control.… …   Wikipedia

  • broadcasting — /brawd kas ting, kah sting/, n. 1. the act of transmitting speech, music, visual images, etc., as by radio or television. 2. radio or television as a business or profession: She s training for a career in broadcasting. [1920 25; BROADCAST + ING1] …   Universalium

  • Zero tolerance — policies are studied in criminology and are common in formal and informal policing systems around the world.Fact|date=December 2007 The policies also appear in informal situations where there may be sexual harassment or Internet misuse in… …   Wikipedia

  • Orthogonal frequency-division multiple access — Multiplex techniques Circuit mode (constant bandwidth) TDM · FDM · SDM Polarization multiplexing Spatial multiplexing (MIMO) …   Wikipedia

  • HRS antenna — HRS type antennas are more or less the standard antenna used for long distance high power shortwave broadcasting (> 1000 km). Contents 1 History of HRS design 2 Nomenclature 3 A versatile antenna type …   Wikipedia

  • Nominal power — is a measurement of a mediumwave radio station s output used in the United States. AM broadcasters are licensed by the Federal Communications Commission to operate at a specific nominal power, which may be (and usually is) different from the… …   Wikipedia

  • Kharavela — Maximum extent of Kharavela (ଖାରବେଳ) Kalingan Empire: 2nd century B.C.EKharavela (ଖାରେବଳ) (IAST: Khāravela, Devanagari: खारवेल, Oriya: ଖାରେବଳ) (?209 – after 170 BCE) was the greatest Oriya emperor of Kalinga, the ancient name of Orissa state of… …   Wikipedia

  • Social Protection — ▪ 2006 Introduction With medical costs skyrocketing and government programs scaled back, citizens bore more responsibility for their health care costs; irregular migration, human trafficking, and migrant smuggling posed challenges for… …   Universalium

  • United States — a republic in the N Western Hemisphere comprising 48 conterminous states, the District of Columbia, and Alaska in North America, and Hawaii in the N Pacific. 267,954,767; conterminous United States, 3,022,387 sq. mi. (7,827,982 sq. km); with… …   Universalium

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”