8 The Knapsack Problem

The knapsack problem is a classic optimization problem in the field of operations research. It involves selecting a subset of items from a given set of items to maximize the total value/profit while satisfying certain constraints. This problem is NP-hard, meaning that it is computationally difficult to find an optimal solution for large instances of the problem. Various algorithms have been developed to solve this problem, including dynamic programming, branch and bound, and heuristic methods. The knapsack problem has applications in a variety of fields, including computer science, finance, and logistics, among others. In this chapter, we’ll examine two variants of the knapsack problem, namely, the single-dimensional knapsack problem and the multi-dimensional knapsack problem.

8.1 Single-dimensional Knapsack Problem

In the single-dimensional knapsack problem (SDKP), we are given a set of \(n\) items, each with a weight \(w_j\) and a value \(v_j\), and a knapsack with a maximum weight capacity \(W\), the goal is to select a subset of items such that the sum of their weights is less than or equal to \(W\), and the sum of their values is maximized. The SDKP can be formally formulated as follows (Dudziński and Walukiewicz (1987)).

\[\begin{align} \text{max.} &\quad \sum_{j = 1}^nv_i x_j \label{knapsack1-obj} \\ \text{s.t.} &\quad \sum_{j=1}^n w_j x_j \leq W \label{knapsack1-cons1} \\ &\quad x_j \in \{0, 1\}, \ j = 1, \cdots, n \label{knapsack1-cons2} \end{align}\]

In the model, the \(n\) represents the total amount of items that are being considered. The binary decision variable \(x_i\) has a value of 1 when item \(i\) is chosen, and 0 when it’s not. The value or profit of selecting item \(i\) is indicated by \(v_i\), and the weight of item \(i\) is represented by \(w_i\). Finally, \(W\) specifies the maximum weight capacity of the knapsack.

To represent the list of items in the problem, we first define a class Item that has three attributes:

_index: this is the index of an item and starts from 0
_profit: this is the profit or value of selecting the item
_properties: this dictionary saves an item’s properties, including weight

class Item:
    """An item represents an object that can be placed within a knapsack
    """
    
    def __init__(self, index, profit):
        """constructor

        Args:
            index (int): index of the item, starting from 0
            profit (float): profit of choosing the item
        """
        self._index = index
        self._profit = profit
        self._class = 0
        self._properties = {}
    
    @property
    def index(self): return self._index
    
    @property
    def profit(self): return self._profit
    
    def get_class(self): return self._class
    
    def set_class(self, cls): self._class = cls
    
    def get_property(self, name):
        return self._properties[name]

    def set_property(self, name, value):
        self._properties[name] = value
        
    def __str__(self):
        p_str = ""
        for attr in self._properties:
            p_str += f'{attr}: {self._properties[attr]}'
        return f"index: {self._index}, profit: {self._profit}, " + \
                p_str

Next, we define a class KnapsackDataCenter to hold all the information we need to solve a knapsack problem. The class has two attributes:

_items: this is the full list of items being considered for putting into the knapsack
_capacities: this saves the maximal capacity of the knapsack corresponding to different properties, including weight.

The class also defines a method read_data_set_f() that reads and parses some benchmarking instancese available online.

class KnapsackDataCenter:
    
    def __init__(self):
        self._items = []
        self._capacities = {}
                
    @property
    def items(self): return self._items
    
    @property
    def capacities(self): return self._capacities

class SDKPDataCenter(KnapsackDataCenter):
    
    def __init__(self):
        super().__init__()
        
    def read_sdkp_dataset_f(self, data_file: str):
        """this function reads and parses data presented in 
        http://artemisa.unicauca.edu.co/~johnyortega/instances_01_KP/

        Args:
            data_file (str): path to the data file
            num_items (int): number of items in the file
            capacity (int): knapsack capacity
        """
        with open(data_file) as f:
            first_line = f.readline()
            num_items, capacity = first_line.split()
            self._capacities['weight'] = float(capacity)
            
            item_idx = 0
            rest_lines = f.readlines()
            for line in rest_lines:
                profit, weight = line.split()
                item = Item(item_idx, float(profit))
                item.set_property('weight', float(weight))
                self._items.append(item)
                item_idx += 1
                if item_idx == int(num_items): break

The code snippet below reads the instance f1_l-d_kp_10_269 and shows its key information.

data_file = "./data/knapsack/SDKP/instances_01_KP/low-dimensional/f1_l-d_kp_10_269"

data_center = SDKPDataCenter()
data_center.read_sdkp_dataset_f(data_file)

capacities = data_center.capacities
print(f"Knapsack info: {capacities}")

items = data_center.items
print("Item info: ")
for item in items:
    print(item)

Knapsack info: {'weight': 269.0}
Item info: 
index: 0, profit: 55.0, weight: 95.0
index: 1, profit: 10.0, weight: 4.0
index: 2, profit: 47.0, weight: 60.0
index: 3, profit: 5.0, weight: 32.0
index: 4, profit: 4.0, weight: 23.0
index: 5, profit: 50.0, weight: 72.0
index: 6, profit: 8.0, weight: 80.0
index: 7, profit: 61.0, weight: 62.0
index: 8, profit: 85.0, weight: 65.0
index: 9, profit: 87.0, weight: 46.0

Now we are ready to solve the SDKP using OR-Tools. The code below shows the completion definition of the class SDKnapsackSolver which has a number of attributes:

_data_center: this should be an instantiated KnapsackDataCenter object
_solver: this is the solver object used for modeling and problem solving
_var_x: this is the object that holds all the decision variables used in the model
_opt_obj: this is the optimal solution value found by the solver
_opt_x: this is the optimal solution identified when the solving process completes

The class has two major methods:

build_model(): this is responsible for instantiating the solver object, creating variables, generating constraints and defining the objective function
optimize(): this is the place where the OR-Tools searches for the optimal solution

from ortools.linear_solver import pywraplp

class SDKnapsackSolver:
    
    def __init__(self, data_center):
        self._data_center = data_center

        self._solver = None
        self._var_x = None

        self._opt_obj = None
        self._opt_x = None
        
    def build_model(self):
        self._solver = pywraplp.Solver.CreateSolver('SCIP')

        self._create_variables()
        self._create_objective()
        self._create_constraints()
        
    def optimize(self):
        status = self._solver.Solve()
        if status == pywraplp.Solver.OPTIMAL:
            self._opt_obj = self._solver.Objective().Value()
            items = self._data_center.items
            self._opt_x = [
                    self._var_x[item.index].solution_value()
                    for item in items
                ]
    
    def _create_variables(self):
        items = self._data_center.items
        self._var_x = [self._solver.BoolVar(name=f'x_{i}')
                    for i, item in enumerate(items)]
    
    def _create_objective(self):
        items = self._data_center.items
        obj_expr = [
                self._var_x[item.index] * item.profit
                for item in items
                ]
        self._solver.Maximize(self._solver.Sum(obj_expr))
        
    def _create_constraints(self):
        items = self._data_center.items
        capacities = self._data_center.capacities
        expr = [
            self._var_x[item.index] * 
                item.get_property('weight')
            for item in items
            ]
        self._solver.Add(
                self._solver.Sum(expr) <= capacities['weight']
            )
        
    @property
    def opt_obj(self): return self._opt_obj
    
    def get_num_chosen_items(self):
        return sum(self._opt_x)

We employ the model to tackle 10 benchmark instances that were downloaded from an online source (http://artemisa.unicauca.edu.co/~johnyortega/instances_01_KP/). Table 8.1 displays the computational results.

More specifically, the first column of the table lists the names of the instances, while the second and third columns indicate the number of items and the capacity of the knapsack for each instance, respectively. The last two columns report the optimal solution’s objective value and the number of selected items, respectively.

Table 8.1: Computational results of Knapsack problems

Instance	No. of Items	Capacity	Optimal Value	No. of Chosen Items
f1_l-d_kp_10_269	10	269	295	6
f2_l-d_kp_20_878	20	878	1024	17
f3_l-d_kp_4_20	4	20	35	3
f4_l-d_kp_4_11	4	11	23	2
f5_l-d_kp_15_375	15	375	481.069	9
f6_l-d_kp_10_60	10	60	52	7
f7_l-d_kp_7_50	7	50	107	2
f8_l-d_kp_23_10000	23	10000	9767	11
f9_l-d_kp_5_80	5	80	130	4
f10_l-d_kp_20_879	20	879	1025	17

8.2 Multi-Dimensional Knapsack Problem

The multi-dimensional knapsack problem is a variant of the classical knapsack problem where there are multiple candidate items and each item has multiple attributes or dimensions (Petersen (1967)). The goal is to select a subset of items that maximizes the total value or profit subject to the constraint that the sum of the attribute values of the selected items does not exceed certain limits.

Formally, let there be \(n\) items, and for each item \(j \ (1 ≤ j ≤ n)\), let:

\(v_j\) be the profit of selecting it
\(w_{ij}\) be the \(i\)th attribute value of item \(j\)
\(m\) is the number of dimensions or attributes
\(W_i\) is the capacity of the knapsack for dimension \(i\)

The multi-dimensional knapsack problem can then be formulated as the following optimization problem:

\[\begin{align} \text{max.} &\quad \sum_{j = 1}^n v_j x_j \label{knapsack2-obj} \\ \text{s.t.} &\quad \sum_{j=1}^n w_{ij} x_j \leq W_i, \ \forall i = 1, \cdots, m \label{knapsack2-cons1} \\ &\quad x_j \in \{0, 1\}, \ j = 1, \cdots, n \label{knapsack2-cons2} \end{align}\]

To test the model, we take data from online source (http://people.brunel.ac.uk/~mastjjb/jeb/orlib/mknapinfo.html) and define a reading function read_mdkp_dataset_1() in class MDKPDataCenter. Note that the class inherits the KnapsackDataCenter class defined in the previous section, and all the parsed contents of an instance will be saved in the _items and _capacities attributes.

class MDKPDataCenter(KnapsackDataCenter):
    
    def read_mdkp_dataset_1(self, data_file: str):
        """read data from testing instance

        Args:
            data_file (str): data file
        """
        with open(data_file) as f:
            first_line = f.readline()
            num_items, \
            num_constraints, \
            opt_val = first_line.split()
            
            second_line = f.readline()
            profits = second_line.split()
            item_idx = 0
            for p in profits:
                item = Item(item_idx, float(p))
                self._items.append(item)
                item_idx += 1

            for i in range(int(num_constraints)):
                line = f.readline()
                weights = line.split()
                prop = f'prop_{i}'
                for idx, val in enumerate(weights):
                    self._items[idx].set_property(
                        prop,
                        float(val)
                    )

            last_line = f.readline()
            for idx, val in enumerate(last_line.split()):
                prop = f'prop_{idx}'
                self._capacities[prop] = float(val)

Now we define in the code below the class MDKnapsackSolver that performs a couple of things:

create variables in lines 31 - 34
define objective in lines 36 - 42
create constraints in line 44 - 58

from ortools.linear_solver import pywraplp

class MDKnapsackSolver:
    
    def __init__(self, data_center):
        self._data_center = data_center

        self._solver = None
        self._var_x = None

        self._opt_obj = None
        self._opt_x = None
        
    def build_model(self):
        self._solver = pywraplp.Solver.CreateSolver('SCIP')

        self._create_variables()
        self._create_objective()
        self._create_constraints()
        
    def optimize(self):
        status = self._solver.Solve()
        if status == pywraplp.Solver.OPTIMAL:
            self._opt_obj = self._solver.Objective().Value()
            items = self._data_center.items
            self._opt_x = [
                    self._var_x[item.index].solution_value()
                    for item in items
                ]
    
    def _create_variables(self):
        items = self._data_center.items
        self._var_x = [self._solver.BoolVar(name=f'x_{i}')
                    for i, item in enumerate(items)]
    
    def _create_objective(self):
        items = self._data_center.items
        obj_expr = [
                self._var_x[item.index] * item.profit
                for item in items
                ]
        self._solver.Maximize(self._solver.Sum(obj_expr))
        
    def _create_constraints(self):
        items = self._data_center.items
        capacities = self._data_center.capacities
        num_properties = len(capacities)
        for p_idx in range(num_properties):
            prop = f'prop_{p_idx}'
            expr = [
                self._var_x[item.index] * 
                    item.get_property(prop)
                for item in items
                ]
            self._solver.Add(
                    self._solver.Sum(expr) <= 
                    capacities[prop], name=f'cons_{p_idx}'
                )
        
    @property
    def opt_obj(self): return self._opt_obj
    
    def get_num_chosen_items(self):
        return sum(self._opt_x)

Table 8.2 shows the computational results of some testing instances.

Table 8.2: Computational results of multi-dimensional knapsack problems

Instance	No. of Items	Optimal Value	No. of Chosen Items
inst1.txt	6	3800	3
inst2.txt	10	8706.1	5
inst3.txt	15	4015	9
inst4.txt	20	6120	9
inst5.txt	28	12400	18
inst6.txt	39	10618	27
inst7.txt	50	16537	35

8.3 Multi-Choice Knapsack Problem

In the multi-choice knapsack problem, there are \(n\) candidate items to be placed in a knapsack and each item belongs to a specific class \(k = 1, \cdots m\). It is required that only one item can be selected from each class. The problem can be formulated as below (Sinha and Zoltners (1979)).

\[\begin{align} \text{max.} &\quad \sum_{j=1}^n v_j x_j \label{knapsack3-obj} \\ \text{s.t.} &\quad \sum_{j=1}^n w_j x_j \leq W \label{knapsack3-cons1}\\ &\quad \sum_{j=1}^n c_{jk} x_j = 1, \ \forall k = 1, \cdots m \label{knapsack3-cons2}\\ &\quad x_j = \{0, 1\}, \ \forall j = 1, \cdots n \label{knapsack3-cons3} \end{align}\]

The class MCKPDataCenter defines a reader function to parse the instances available online (https://or-dii.unibs.it/index.php?page=tiks).

class MCKPDataCenter(KnapsackDataCenter):
    
    def __init__(self):
        super().__init__()
        
        self._num_classes = 0
        self._num_choices = 0
    
    def read_mcmdkp_dataset_1(self, data_file: str):
        with open(data_file) as f:
            f.readline()
            num_classes,\
            num_choices,\
            num_constraints = f.readline().split()
            self._num_classes = int(num_classes)
            self._num_choices = int(num_choices)
            
            capacities = f.readline().split()
            for p_idx in range(int(num_constraints)):
                prop = f'prop_{p_idx}'
                self._capacities[prop] = float(capacities[p_idx])
            
            item_idx = 0
            for i in range(int(num_classes)):
                f.readline()
                for c in range(int(num_choices)):
                    line = f.readline().split()
                    item = Item(item_idx, profit=float(line[0]))
                    for p_idx in range(int(num_constraints)):
                        prop = f'prop_{p_idx}'
                        item.set_property(prop, float(line[p_idx + 1]))
                        item.set_class(i)
                    item_idx += 1
                    self._items.append(item)
                    
    @property
    def num_classes(self): return self._num_classes
    
    @property
    def num_choices(self): return self._num_choices

The complete code to solve the problem is given below.

from ortools.linear_solver import pywraplp

class MCKnapsackSolver:
    
    def __init__(self, data_center):
        self._data_center = data_center

        self._solver = None
        self._var_x = None

        self._opt_obj = None
        self._opt_x = None
        
    def build_model(self):
        self._solver = pywraplp.Solver.CreateSolver('SCIP')

        self._create_variables()
        self._create_objective()
        self._create_constraints()
        
    def optimize(self):
        status = self._solver.Solve()
        if status == pywraplp.Solver.OPTIMAL:
            self._opt_obj = self._solver.Objective().Value()
            items = self._data_center.items
            self._opt_x = [
                    self._var_x[item.index].solution_value()
                    for item in items
                ]
    
    def _create_variables(self):
        items = self._data_center.items
        self._var_x = [self._solver.BoolVar(name=f'x_{i}')
                    for i, item in enumerate(items)]
    
    def _create_objective(self):
        items = self._data_center.items
        obj_expr = [
                self._var_x[item.index] * item.profit
                for item in items
                ]
        self._solver.Maximize(self._solver.Sum(obj_expr))
        
    def _create_constraints(self):
        items = self._data_center.items
        capacities = self._data_center.capacities
        p_idx = 0
        prop = f'prop_{p_idx}'
        expr = [
            self._var_x[item.index] * 
                item.get_property(prop)
            for item in items
            ]
        self._solver.Add(
                self._solver.Sum(expr) <= 
                capacities[prop], name=f'cons_{p_idx}'
            )
        
        num_classes = self._data_center.num_classes
        for k in range(num_classes):
            expr = [self._var_x[item.index]
                for item in items
                if item.get_class() == k]
            self._solver.Add(
                self._solver.Sum(expr) == 1
            )
        
    @property
    def opt_obj(self): return self._opt_obj
    
    def get_num_chosen_items(self):
        return sum(self._opt_x)

Table 8.3 shows some computational experiments.

Table 8.3: Computational results of multi-choice knapsack problems

Instance	No. of Items	Optimal Value	No. of Chosen Items
INST01.txt	500	13411	50
INST02.txt	500	13953	50
INST03.txt	600	15727	60
INST04.txt	700	18928	70
INST05.txt	750	20314	75
INST06.txt	750	20277	75
INST07.txt	800	21372	80
INST08.txt	800	21556	80
INST09.txt	800	21581	80
INST10.txt	900	24232	90
INST11.txt	900	24267	90
INST12.txt	1000	26206	100
INST13.txt	3000	24382	100
INST14.txt	4500	36971	150
INST15.txt	5400	44001	180
INST16.txt	6000	48833	200
INST17.txt	7500	61056	250
INST18.txt	5600	68021	280
INST19.txt	6000	73054	300
INST20.txt	7000	84958	350

8.4 Multi-Choice Multi-Dimensional Knapsack Problem

In the multi-choice multi-dimensional knapsack problem (MCMDKP), there are \(n\) candidate items and each item belongs to a specific class \(k = 1, \cdots, m\). Each item also has \(d\) attributes. The requirement is to select exactly one item from each class such that the total profit is maximized. Note that the knapsack capacity cannot be violated for any item attribute. The MCMDKP can be formulated as follows (Chen and Hao (2014)).

\[\begin{align} \text{max.} &\quad \sum_{j=1}^n v_j x_j \label{knapsack4-obj} \\ \text{s.t.} &\quad \sum_{j=1}^n w_{ij} x_j \leq W_i, \ \forall i = 1, \cdots d \\ &\quad \sum_{j=1}^n c_{jk} x_j = 1, \ \forall k = 1, \cdots m\\ &\quad x_j = \{0, 1\}, \ \forall j = 1, \cdots n \end{align}\]

The code snippet below gives the complete program to solve this problem.

from ortools.linear_solver import pywraplp

class MCMDKnapsackSolver:
    
    def __init__(self, data_center):
        self._data_center = data_center

        self._solver = None
        self._var_x = None

        self._opt_obj = None
        self._opt_x = None
        
    def build_model(self):
        self._solver = pywraplp.Solver.CreateSolver('SCIP')

        self._create_variables()
        self._create_objective()
        self._create_constraints()
        
    def optimize(self):
        status = self._solver.Solve()
        if status == pywraplp.Solver.OPTIMAL:
            self._opt_obj = self._solver.Objective().Value()
            items = self._data_center.items
            self._opt_x = [
                    self._var_x[item.index].solution_value()
                    for item in items
                ]
    
    def _create_variables(self):
        items = self._data_center.items
        self._var_x = [self._solver.BoolVar(name=f'x_{i}')
                    for i, item in enumerate(items)]
    
    def _create_objective(self):
        items = self._data_center.items
        obj_expr = [
                self._var_x[item.index] * item.profit
                for item in items
                ]
        self._solver.Maximize(self._solver.Sum(obj_expr))
        
    def _create_constraints(self):
        items = self._data_center.items
        capacities = self._data_center.capacities
        num_properties = len(capacities)
        for p_idx in range(num_properties):
            prop = f'prop_{p_idx}'
            expr = [
                self._var_x[item.index] * 
                    item.get_property(prop)
                for item in items
                ]
            self._solver.Add(
                    self._solver.Sum(expr) <= 
                    capacities[prop], name=f'cons_{p_idx}'
                )
        
        num_classes = self._data_center.num_classes
        for k in range(num_classes):
            expr = [self._var_x[item.index]
                for item in items
                if item.get_class() == k]
            self._solver.Add(
                self._solver.Sum(expr) == 1
            )
        
    @property
    def opt_obj(self): return self._opt_obj
    
    def get_num_chosen_items(self):
        return sum(self._opt_x)

Table 8.4 shows some empirical computational results.

Table 8.4: Computational results of multi-choice multi-dimensional knapsack problems

Instance	No. of Items	Optimal Value	No. of Chosen Items
INST01.txt	250	7059	25
INST02.txt	250	6998	25
INST03.txt	300	8418	30
INST04.txt	300	8518	30
INST05.txt	300	8418	30
INST06.txt	300	8418	30
INST07.txt	300	8418	30
INST08.txt	300	8418	30
INST09.txt	300	8418	30
INST10.txt	300	8418	30
INST11.txt	300	8418	30
INST12.txt	300	8418	30
INST13.txt	900	8833	30
INST14.txt	900	8841	30
INST15.txt	900	8833	30
INST16.txt	900	8788	30
INST17.txt	900	8820	30
INST18.txt	600	8664	30
INST19.txt	600	8667	30
INST20.txt	600	8714	30