 Knapsack problem

"BKP" redirects here. For other uses, see BKP (disambiguation).
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.
The problem often arises in resource allocation with financial constraints. A similar problem also appears in combinatorics, complexity theory, cryptography and applied mathematics.
The decision problem form of the knapsack problem is the question "can a value of at least V be achieved without exceeding the weight W?"
Definition
In the following, we have n kinds of items, 1 through n. Each kind of item i has a value v_{i} and a weight w_{i}. We usually assume that all values and weights are nonnegative. To simplify the representation, we can also assume that the items are listed in increasing order of weight. The maximum weight that we can carry in the bag is W.
The most common formulation of the problem is the 0-1 knapsack problem, which restricts the number x_{i} of copies of each kind of item to zero or one. Mathematically the 0-1 knapsack problem can be formulated as:
 maximize ∑_{i=1}^{n} v_{i}x_{i}
 subject to ∑_{i=1}^{n} w_{i}x_{i} ≤ W and x_{i} ∈ {0, 1}
The bounded knapsack problem restricts the number x_{i} of copies of each kind of item to a maximum integer value c_{i}. Mathematically the bounded knapsack problem can be formulated as:
 maximize ∑_{i=1}^{n} v_{i}x_{i}
 subject to ∑_{i=1}^{n} w_{i}x_{i} ≤ W and x_{i} ∈ {0, 1, ..., c_{i}}
The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item.
Of particular interest is the special case of the problem with these properties:
 it is a decision problem,
 it is a 0-1 problem,
 for each kind of item, the weight equals the value: w_{i} = v_{i}.
Notice that in this special case, the problem is equivalent to this: given a set of nonnegative integers, does any subset of it add up to exactly W? Or, if negative weights are allowed and W is chosen to be zero, the problem is: given a set of integers, does any nonempty subset add up to exactly 0? This special case is called the subset sum problem. In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem.
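The nonnegative-integer form of the subset sum question can be answered by tracking the set of reachable sums. The following is a minimal Python sketch; the function name and example values are illustrative, not taken from the article's sources.

```python
def subset_sum(numbers, target):
    """Decide whether some subset of `numbers` sums to exactly `target`.

    Assumes nonnegative integers, as in the special case above.
    """
    reachable = {0}  # the empty subset sums to 0
    for x in numbers:
        # every previously reachable sum s stays reachable (skip x)
        # and additionally spawns s + x (take x)
        reachable |= {s + x for s in reachable}
    return target in reachable

# Example: {3, 34, 4, 12, 5, 2} contains a subset summing to 9 (4 + 5)
```

The set `reachable` can grow to the number of distinct subset sums, so this sketch is exponential in the worst case but pseudo-polynomial when the target bound is small.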
If multiple knapsacks are allowed, the problem is better thought of as the bin packing problem.
Computational complexity
The knapsack problem is interesting from the perspective of computer science because
 there is a pseudo-polynomial time algorithm using dynamic programming
 there is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine
 the problem is NP-complete to solve exactly, thus it is expected that no algorithm can be both correct and fast (polynomial-time) on all cases
 many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.
The subset sum version of the knapsack problem is commonly known as one of Karp's 21 NP-complete problems.
There have been attempts to use subset sum as the basis for public key cryptography systems, such as the Merkle–Hellman knapsack cryptosystem. These attempts typically used some group other than the integers. Merkle–Hellman and several similar algorithms were later broken, because the particular subset sum problems they produced were in fact solvable by polynomial-time algorithms.
One theme in research literature is to identify what the "hard" instances of the knapsack problem look like,^{[1]}^{[2]} or, viewed another way, to identify what properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests.^{[3]}
Several algorithms are freely available to solve knapsack problems, based on the dynamic programming approach,^{[4]} the branch and bound approach,^{[5]} or hybridizations of both approaches.^{[3]}^{[6]}^{[7]}^{[8]}
Dynamic programming solution
Unbounded knapsack problem
If all weights (w_{1}, ..., w_{n}) are nonnegative integers, the knapsack problem can be solved in pseudo-polynomial time using dynamic programming. The following describes a dynamic programming solution for the unbounded knapsack problem.
To simplify things, assume all weights are strictly positive (w_{i} > 0). We wish to maximize total value subject to the constraint that total weight is less than or equal to W. Then for each w ≤ W, define m[w] to be the maximum value that can be attained with total weight less than or equal to w. m[W] then is the solution to the problem.
Observe that m[w] has the following properties:
 m[0] = 0 (the sum of zero items, i.e., the summation of the empty set)
 m[w] = max(v_{i} + m[w − w_{i}] : w_{i} ≤ w)
where v_{i} is the value of the ith kind of item.
Here the maximum of the empty set is taken to be zero. Tabulating the results from m[0] up through m[W] gives the solution. Since the calculation of each m[w] involves examining n items, and there are W values of m[w] to calculate, the running time of the dynamic programming solution is O(nW). Dividing W, w_{1}, ..., w_{n} by their greatest common divisor is an obvious way to improve the running time.
The O(nW) complexity does not contradict the fact that the knapsack problem is NPcomplete, since W, unlike n, is not polynomial in the length of the input to the problem. The length of the W input to the problem is proportional to the number of bits in W, log W, not to W itself.
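The tabulation described above can be sketched in Python as follows; the function name is illustrative, and weights are assumed to be strictly positive integers, as in the text.

```python
def unbounded_knapsack(values, weights, W):
    """Tabulate m[w] = best value with total weight <= w, for w = 0..W."""
    m = [0] * (W + 1)  # m[0] = 0: the empty sum
    for w in range(1, W + 1):
        # m[w] = max(v_i + m[w - w_i] : w_i <= w), with the maximum of
        # the empty set taken to be zero
        m[w] = max([0] + [v + m[w - wt]
                          for v, wt in zip(values, weights) if wt <= w])
    return m[W]
```

Each of the W table entries inspects all n items once, matching the O(nW) running time stated above.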
0-1 knapsack problem
A similar dynamic programming solution for the 0-1 knapsack problem also runs in pseudo-polynomial time. As above, assume w_{1}, ..., w_{n}, W are strictly positive integers. Define m[i,w] to be the maximum value that can be attained with weight less than or equal to w using the first i items.
We can define m[i,w] recursively as follows:
 m[0, w] = 0
 m[i, w] = m[i − 1, w] if w_{i} > w (the new item is more than the current weight limit)
 m[i, w] = max(m[i − 1, w], m[i − 1, w − w_{i}] + v_{i}) if w_{i} ≤ w.
The solution can then be found by calculating m[n,W]. To do this efficiently we can use a table to store previous computations. This solution will therefore run in O(nW) time and O(nW) space. Additionally, if we use only a 1-dimensional array m[w] to store the current optimal values and pass over this array n times, rewriting from m[W] down to m[1] each time, we get the same result for only O(W) space.
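The 1-dimensional array variant described above can be sketched in Python; the function name is illustrative, and weights are assumed to be strictly positive integers.

```python
def knapsack_01(values, weights, W):
    """0-1 knapsack with a single array: after processing item i,
    m[w] holds the best value using the first i items within weight w."""
    m = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # sweep w downwards, rewriting from m[W] toward m[wt], so that
        # m[w - wt] still refers to the previous item's row and each
        # item is therefore counted at most once
        for w in range(W, wt - 1, -1):
            m[w] = max(m[w], m[w - wt] + v)
    return m[W]
```

The downward sweep is what distinguishes the 0-1 case from the unbounded one: sweeping upwards instead would allow an item to be reused and would compute the unbounded answer.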
Another algorithm for 0-1 knapsack, discovered in 1974^{[9]} and sometimes called "meet-in-the-middle" due to parallels to a similarly named algorithm in cryptography, is exponential in the number of different items but may be preferable to the DP algorithm when W is large compared to n. In particular, if the w_{i} are nonnegative but not integers, we could still use the dynamic programming algorithm by scaling and rounding (i.e. using fixed-point arithmetic), but if the problem requires d fractional digits of precision to arrive at the correct answer, W will need to be scaled by 10^{d}, and the DP algorithm will require O(W * 10^{d}) space and O(nW * 10^{d}) time.
The "meetinthemiddle" algorithm is as follows:
 Partition the set {1...n} into two sets A and B of approximately equal size
 Compute the weights and values of all subsets of each set.
 For each subset of A, find the "best matching" subset of B, i.e. the subset of B of greatest value such that the combined weight is less than W. Keep track of the greatest combined value seen so far.
The algorithm takes O(2^{n / 2}) space, and efficient implementations of step 3 (for instance, sorting the subsets of B by weight, discarding subsets of B which weigh more than other subsets of B of greater or equal value, and using binary search to find the best match) result in a runtime of O(n * 2^{n / 2}). As with the meet-in-the-middle attack in cryptography, this improves on the O(n * 2^{n}) runtime of a naive brute force approach (examining all subsets of {1...n}), at the cost of using exponential rather than constant space.
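The three steps, including the pruning and binary search from step 3, can be sketched in Python for small n; all function names here are illustrative.

```python
from bisect import bisect_right
from itertools import combinations

def subset_sums(values, weights):
    """All (weight, value) pairs over the subsets of the given items."""
    n = len(values)
    out = []
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            out.append((sum(weights[i] for i in combo),
                        sum(values[i] for i in combo)))
    return out

def knapsack_mitm(values, weights, W):
    half = len(values) // 2
    a = subset_sums(values[:half], weights[:half])       # subsets of A
    b = subset_sums(values[half:], weights[half:])       # subsets of B
    # sort B by weight and discard subsets dominated by a
    # lighter-or-equal subset of greater-or-equal value
    b.sort()
    pruned, best_v = [], -1
    for w, v in b:
        if v > best_v:
            pruned.append((w, v))
            best_v = v
    b_weights = [w for w, _ in pruned]
    best = 0
    for w, v in a:
        if w > W:
            continue
        # binary search for the most valuable pruned B-subset that fits
        idx = bisect_right(b_weights, W - w) - 1
        if idx >= 0:
            best = max(best, v + pruned[idx][1])
    return best
```

After pruning, values in `pruned` increase with weight, so the heaviest fitting B-subset is also the most valuable one, which is what makes the binary search correct.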
Greedy approximation algorithm
George Dantzig proposed a greedy approximation algorithm to solve the unbounded knapsack problem.^{[10]} His version sorts the items in decreasing order of value per unit of weight, v_{i} / w_{i}. It then proceeds to insert them into the sack, starting with as many copies as possible of the first kind of item until there is no longer space in the sack for more. Provided that there is an unlimited supply of each kind of item, if m is the maximum value of items that fit into the sack, then the greedy algorithm is guaranteed to achieve at least a value of m / 2. However, for the bounded problem, where the supply of each kind of item is limited, the algorithm may be far from optimal.
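Dantzig's heuristic as described above can be sketched in a few lines of Python; the function name is illustrative, and weights are assumed positive.

```python
def greedy_unbounded(values, weights, W):
    """Dantzig's greedy heuristic for the unbounded problem: repeatedly
    take as many copies as fit of the densest remaining item."""
    # sort item indices by value per unit weight, densest first
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total = 0
    for i in order:
        copies = W // weights[i]      # copies of item i that still fit
        total += copies * values[i]
        W -= copies * weights[i]      # remaining capacity
    return total
```

For example, with values (10, 7), weights (6, 5) and W = 10, the heuristic packs one copy of the denser first item for a value of 10, while the optimum is 14 (two copies of the second item); 10 is still at least half the optimum, consistent with the m / 2 guarantee.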
Dominance relations in the UKP
Solving the unbounded knapsack problem can be made easier by throwing away items which will never be needed. For a given item i, suppose we could find a set of items J such that their total weight is less than the weight of i, and their total value is greater than the value of i. Then i cannot appear in the optimal solution, because we could always improve any potential solution containing i by replacing i with the set J. Therefore we can disregard the ith item altogether. In such cases, J is said to dominate i. (Note that this does not apply to bounded knapsack problems, since we may have already used up the items in J.)
Finding dominance relations allows us to significantly reduce the size of the search space. There are several different types of dominance relations,^{[3]} which all satisfy an inequality of the form:
 ∑_{j∈J} w_{j}x_{j} ≤ αw_{i} and ∑_{j∈J} v_{j}x_{j} ≥ αv_{i} for some α ∈ Z_{+} and x ∈ Z_{+}^{|J|}
where J ⊆ {1, ..., n} is a set of items and the vector x denotes the number of copies of each member of J.
Collective dominance
The ith item is collectively dominated by J, written as i ≺ J, if the total weight of some combination of items in J is less than w_{i} and their total value is greater than v_{i}. Formally, ∑_{j∈J} w_{j}x_{j} ≤ w_{i} and ∑_{j∈J} v_{j}x_{j} ≥ v_{i} for some x ∈ Z_{+}^{|J|}, i.e. α = 1. Verifying this dominance is computationally hard, so it can only be used with a dynamic programming approach. In fact, this is equivalent to solving a smaller knapsack decision problem where V = v_{i}, W = w_{i}, and the items are restricted to J.
Threshold dominance
The ith item is threshold dominated by J, written as i ≺≺ J, if some number of copies of i are dominated by J. Formally, ∑_{j∈J} w_{j}x_{j} ≤ αw_{i} and ∑_{j∈J} v_{j}x_{j} ≥ αv_{i} for some x ∈ Z_{+}^{|J|} and α ≥ 1. This is a generalization of collective dominance, first introduced in^{[4]} and used in the EDUK algorithm. The smallest such α defines the threshold of the item i, written t_{i} = (α − 1)w_{i}. In this case, the optimal solution could contain at most α − 1 copies of i.
Multiple dominance
The ith item is multiply dominated by a single item j, written as i ≺_{m} j, if i is dominated by some number of copies of j. Formally, x_{j}w_{j} ≤ w_{i} and x_{j}v_{j} ≥ v_{i} for some x_{j} ∈ Z_{+}, i.e. J = {j}, α = 1, x_{j} = ⌊w_{i}/w_{j}⌋. This dominance could be efficiently used during preprocessing because it can be detected relatively easily.
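As a preprocessing sketch of multiple dominance (the function name and duplicate tie-break are my own choices, not from the cited algorithms):

```python
def remove_multiply_dominated(values, weights):
    """Return indices of items not multiply dominated by any single item.

    Item i is dropped when some other item j packs x = floor(w_i / w_j)
    >= 1 copies into weight w_i with total value x * v_j >= v_i.
    Assumes strictly positive weights.
    """
    keep = []
    for i in range(len(values)):
        dominated = False
        for j in range(len(values)):
            if i == j:
                continue
            if (weights[j], values[j]) == (weights[i], values[i]) and j > i:
                continue  # of identical items, keep only the first
            x = weights[i] // weights[j]  # copies of j fitting in w_i
            if x >= 1 and x * values[j] >= values[i]:
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep
```

This O(n^2) pass only shrinks the item set; the surviving items are then handed to whichever exact algorithm solves the reduced instance.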
Modular dominance
Let b be the best item, i.e. v_{b}/w_{b} ≥ v_{i}/w_{i} for all i. This is the item with the greatest density of value.
The ith item is modularly dominated by a single item j, written as i ≺≺_{m} j, if i is dominated by j plus several copies of b. Formally, w_{j} + tw_{b} ≤ w_{i} and v_{j} + tv_{b} ≥ v_{i} for some t ∈ Z_{+}, i.e. J = {b, j}, α = 1, x_{b} = t, x_{j} = 1.
Applications
Knapsack problems can be applied to real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials,^{[11]} selection of capital investments and financial portfolios,^{[12]} selection of assets for asset-backed securitization,^{[13]} and generating keys for the Merkle–Hellman knapsack cryptosystem.^{[14]}
One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. On tests with a homogeneous distribution of point values for each question, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values—that is, when different questions or sections are worth different amounts of points—it is more difficult to provide choices. Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score.^{[15]}
History
The knapsack problem has been studied for more than a century, with early works dating as far back as 1897.^{[16]} It is not known how the name "knapsack problem" originated, though the problem was referred to as such in the early works of mathematician Tobias Dantzig (1884–1956)^{[citation needed]}, suggesting that the name could have existed in folklore before a mathematical problem had been fully defined.^{[17]}
The quadratic knapsack problem was first introduced by Gallo, Hammer, and Simeone in 1980.^{[18]}
A 1998 study of the Stony Brook University algorithms repository showed that, out of 75 algorithmic problems, the knapsack problem was the 18th most popular and the 4th most needed after kd-trees, suffix trees, and the bin packing problem.^{[19]}
Notes
 ^ Pisinger, D. 2003. Where are the hard knapsack problems? Technical Report 2003/08, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.
 ^ L. Caccetta, A. Kulanoot, Computational Aspects of Hard Knapsack Problems, Nonlinear Analysis 47 (2001) 5547–5558.
 ^ ^{a} ^{b} ^{c} Vincent Poirriez, Nicola Yanev, Rumen Andonov (2009) A Hybrid Algorithm for the Unbounded Knapsack Problem Discrete Optimization http://dx.doi.org/10.1016/j.disopt.2008.09.004
 ^ ^{a} ^{b} Rumen Andonov, Vincent Poirriez, Sanjay Rajopadhye (2000) Unbounded knapsack problem: dynamic programming revisited. European Journal of Operational Research 123: 2. 168–181 http://dx.doi.org/10.1016/S0377-2217(99)00265-9
 ^ S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementation, John Wiley and Sons, 1990
 ^ S. Martello, D. Pisinger, P. Toth, Dynamic programming and strong bounds for the 0-1 knapsack problem, Manag. Sci., 45:414–424, 1999.
 ^ G. Plateau, M. Elkihel, A hybrid algorithm for the 0-1 knapsack problem, Methods of Oper. Res., 49:277–293, 1985.
 ^ S. Martello, P. Toth, A mixture of dynamic programming and branch-and-bound for the subset-sum problem, Manag. Sci., 30:765–771
 ^ Horowitz, Ellis; Sahni, Sartaj (1974), "Computing partitions with applications to the knapsack problem", Journal of the Association for Computing Machinery 21: 277–292, doi:10.1145/321812.321823, MR0354006
 ^ George B. Dantzig, Discrete-Variable Extremum Problems, Operations Research Vol. 5, No. 2, April 1957, pp. 266–288, DOI: http://dx.doi.org/10.1287/opre.5.2.266
 ^ Kellerer, Pferschy, and Pisinger 2004, p. 449
 ^ Kellerer, Pferschy, and Pisinger 2004, p. 461
 ^ Kellerer, Pferschy, and Pisinger 2004, p. 465
 ^ Kellerer, Pferschy, and Pisinger 2004, p. 472
 ^ Feuerman, Martin; Weiss, Harvey (April 1973). "A Mathematical Programming Model for Test Construction and Scoring". Management Science 19 (8): 961–966. JSTOR 2629127.
 ^ Mathews, G. B. (25 June 1897). "On the partition of numbers". Proceedings of the London Mathematical Society 28: 486–490. http://plms.oxfordjournals.org/content/s1-28/1/486.full.pdf.
 ^ Kellerer, Pferschy, and Pisinger 2004, p. 3
 ^ Gallo, G.; Hammer, P. L.; Simeone, B. (1980). "Quadratic knapsack problems". Mathematical Programming Studies 12: 132–149. doi:10.1007/BFb0120892. http://www.springerlink.com/content/x804231403086x51/.
 ^ Skiena, S. S. (September 1999). "Who is Interested in Algorithms and Why? Lessons from the Stony Brook Algorithm Repository". ACM SIGACT News 30 (3): 65–74. ISSN 0163-5700.
References
 Garey, Michael R.; David S. Johnson (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman. ISBN 0-7167-1045-5. A6: MP9, p. 247.
 Kellerer, Hans; Pferschy, Ulrich; Pisinger, David (2004). Knapsack Problems. Springer. ISBN 3-540-40286-1. MR2161720.
 Martello, Silvano; Toth, Paolo (1990). Knapsack problems: Algorithms and computer implementations. Wiley-Interscience. ISBN 0-471-92420-2. MR1086874.
External links
 Lecture slides on the knapsack problem
 PYAsUKP: Yet Another solver for the Unbounded Knapsack Problem, with code taking advantage of the dominance relations in a hybrid algorithm, benchmarks and downloadable copies of some papers.
 Home page of David Pisinger with downloadable copies of some papers on the publication list (including "Where are the hard knapsack problems?")
 Knapsack Problem solutions in many languages at Rosetta Code
 Dynamic Programming algorithm to 0/1 Knapsack problem
 0-1 Knapsack Problem in Python
 Interactive JavaScript branchandbound solver
 Solving 01KNAPSACK with Genetic Algorithms in Ruby