2 Benders Decomposition Theories
In this chapter, we will explain the theory behind Benders decomposition and demonstrate its use on a small linear programming problem. Keep in mind that Benders decomposition is not limited to solving linear programming problems; in fact, it is one of the most powerful techniques for solving some large-scale mixed-integer linear programming problems.
In the following sections, we will go through the critical steps of the decomposition process when applying the algorithm to optimization problems in standard form. This is important because it builds intuition for when we should consider applying Benders decomposition to a problem at hand. Oftentimes, recognizing the applicability of Benders decomposition is the most important and challenging step in solving an optimization problem. Once we know that the problem structure is suitable for Benders decomposition, it is straightforward to follow the decomposition steps and put the algorithm to work.
Generally speaking, Benders decomposition is a good candidate when the problem becomes much easier to solve once some of its variables are fixed. We will illustrate this point with an example in the following sections. In the optimization world, the first thing that should come to mind when we say a problem is easy to solve is a linear program, and that is indeed the case in Benders decomposition applications.
2.1 The Decomposition Logic
To explain the workings of Benders decomposition, let us look at the standard form of a linear programming problem that involves two vector variables, \(\mathbf{x}\) and \(\mathbf{y}\). Let \(p\) and \(q\) denote the dimensions of \(\mathbf{x}\) and \(\mathbf{y}\), respectively. Below is the original problem \(\mathbf{P}\) we intend to solve.
\[ \begin{aligned} \text{min.} &\quad \mathbf{c}^T \mathbf{x} + \mathbf{f}^T \mathbf{y} \\ \text{s.t.} &\quad \mathbf{A} \mathbf{x} + \mathbf{B} \mathbf{y} = \mathbf{b} \\ &\quad \mathbf{x} \geq 0, \mathbf{y} \geq 0 \end{aligned} \]
In this formulation, \(\mathbf{c}\) and \(\mathbf{f}\) in the objective function represent the cost coefficients associated with decision variables \(\mathbf{x}\) and \(\mathbf{y}\), respectively. Both of them are column vectors of corresponding dimensions. In the constraints, matrix \(\mathbf{A}\) is of dimension \(m \times p\), and matrix \(\mathbf{B}\) is of dimension \(m \times q\). \(\mathbf{b}\) is a column vector of dimension \(m\).
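To make the formulation concrete, here is a minimal numeric sketch in Python that solves a small instance of \(\mathbf{P}\) directly with SciPy as a baseline. The data \(\mathbf{A}\), \(\mathbf{B}\), \(\mathbf{b}\), \(\mathbf{c}\), and \(\mathbf{f}\) below are illustrative assumptions, not taken from any particular application; we will reuse them in the sketches that follow.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (assumed for demonstration): p = 2, q = 1, m = 2.
c = np.array([2.0, 3.0])      # cost coefficients of x
f = np.array([4.0])           # cost coefficients of y
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])    # m x p
B = np.array([[1.0],
              [1.0]])         # m x q
b = np.array([10.0, 12.0])    # right-hand side of dimension m

# Solve min c^T x + f^T y  s.t.  A x + B y = b,  x, y >= 0,
# by stacking [x, y] into a single variable vector.
res = linprog(c=np.concatenate([c, f]),
              A_eq=np.hstack([A, B]),
              b_eq=b,
              bounds=[(0, None)] * (len(c) + len(f)),
              method="highs")
print(res.fun, res.x)         # optimal value and the stacked [x, y]
```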
Suppose \(\mathbf{y}\) is a complicating variable, in the sense that the problem becomes substantially easier to solve once the value of \(\mathbf{y}\) is fixed. In this case, we can rewrite problem \(\mathbf{P}\) in the following form:
\[ \begin{aligned} \text{min.} &\quad \mathbf{f}^T \mathbf{y} + g(\mathbf{y}) \\ \text{s.t.} &\quad \mathbf{y} \geq 0 \end{aligned} \]
where \(g(\mathbf{y})\) is a function of \(\mathbf{y}\), defined as the optimal objective value of the subproblem \(\mathbf{SP}\) below:
\[ \begin{aligned} \text{min.} &\quad \mathbf{c}^T \mathbf{x} \\ \text{s.t.} &\quad \mathbf{A} \mathbf{x} = \mathbf{b} - \mathbf{B} \mathbf{y} \label{bd-cons1} \\ &\quad \mathbf{x} \geq 0 \end{aligned} \]
Note that \(\mathbf{y}\) in constraint \(\eqref{bd-cons1}\) takes on known values when the subproblem is solved, and the only decision variable in the above formulation is \(\mathbf{x}\).
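Evaluating \(g(\mathbf{y})\) for a trial value \(\bar{\mathbf{y}}\) therefore amounts to solving one LP. Below is a sketch reusing the illustrative data above; the helper name `solve_sp` is our own.

```python
def solve_sp(y_bar):
    """Solve SP: min c^T x  s.t.  A x = b - B y_bar,  x >= 0."""
    rhs = b - B @ y_bar
    return linprog(c=c, A_eq=A, b_eq=rhs,
                   bounds=[(0, None)] * len(c), method="highs")

y_bar = np.array([1.0])        # an arbitrary trial value for y
res_sp = solve_sp(y_bar)
if res_sp.status == 0:         # status 0 means an optimum was found
    print("g(y_bar) =", res_sp.fun)
```

The dual problem of \(\mathbf{SP}\), denoted \(\mathbf{DSP}\), is given below.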
\[ \begin{aligned} \text{max.} &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u} \\ \text{s.t.} &\quad \mathbf{A}^T \mathbf{u} \leq \mathbf{c} \label{bd-cons2} \\ &\quad \mathbf{u}\ \text{unrestricted} \end{aligned} \]
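In practice, we rarely need to form the \(\mathbf{DSP}\) explicitly: LP solvers report the dual solution alongside the primal one. With SciPy's HiGHS backend, the equality-constraint duals appear in the `eqlin.marginals` field of the result object, so a sketch of recovering \(\mathbf{u}\) from the subproblem solve above looks like this:

```python
res_sp = solve_sp(np.array([1.0]))
u = res_sp.eqlin.marginals     # dual vector u of the DSP
# By strong duality, (b - B y_bar)^T u matches the SP optimum g(y_bar).
print((b - B @ np.array([1.0])) @ u, res_sp.fun)
```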
A key characteristic of the above \(\mathbf{DSP}\) is that its solution space does not depend on the value of \(\mathbf{y}\), which affects only the objective function. According to Minkowski's representation theorem, any \(\bar{\mathbf{u}}\) satisfying constraints \(\eqref{bd-cons2}\) can be expressed as
\[ \begin{aligned} \bar{\mathbf{u}} = \sum_{j \in \mathbf{J}} \lambda_j \mathbf{u}_{j}^{point} + \sum_{k \in \mathbf{K}} \mu_k \mathbf{u}_k^{ray} \end{aligned} \]
where \(\mathbf{u}_j^{point}\) and \(\mathbf{u}_k^{ray}\) represent an extreme point and an extreme ray of the feasible region defined by constraints \(\eqref{bd-cons2}\), respectively. In addition, \(\lambda_j \geq 0\) for all \(j \in \mathbf{J}\) with \(\sum_{j \in \mathbf{J}}\lambda_j = 1\), and \(\mu_k \geq 0\) for all \(k \in \mathbf{K}\).
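To make these objects concrete, consider the illustrative data from the sketches above: constraints \(\eqref{bd-cons2}\) read \(u_1 + 3u_2 \leq 2\) and \(2u_1 + u_2 \leq 3\). This region has a single extreme point \(\mathbf{u}_1^{point} = (1.4, 0.2)^T\), where both constraints are tight, and two extreme rays, \(\mathbf{u}_1^{ray} = (-3, 1)^T\) and \(\mathbf{u}_2^{ray} = (1, -2)^T\), along which the boundary of each constraint recedes to infinity; every feasible \(\bar{\mathbf{u}}\) is the extreme point plus a nonnegative combination of the two rays. It follows that the \(\mathbf{DSP}\) is equivalent to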
\[ \begin{aligned} \text{max.} &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} (\sum_{j \in \mathbf{J}} \lambda_j \mathbf{u}_{j}^{point} + \sum_{k \in \mathbf{K}} \mu_k \mathbf{u}_k^{ray}) \\ \text{s.t.} &\quad \sum_{j \in \mathbf{J}}\lambda_j = 1 \\ &\quad \lambda_j \geq 0, \ \forall j \in \mathbf{J} \\ &\quad \mu_k \geq 0, \ \forall k \in \mathbf{K} \end{aligned} \]
We can therefore conclude that
- The \(\mathbf{DSP}\) becomes unbounded if there exists a \(\mathbf{u}_k^{ray}\) such that \((\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_k^{ray} > 0\). Note that an unbounded \(\mathbf{DSP}\) implies an infeasible \(\mathbf{SP}\); to prevent this from happening, we must ensure that \((\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_k^{ray} \leq 0\) for all \(k \in \mathbf{K}\).
- If an optimal solution to \(\mathbf{DSP}\) exists, it must occur at one of the extreme points. Letting \(g\) denote the optimal objective value, it follows that \((\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_j^{point} \leq g\) for all \(j \in \mathbf{J}\). These two cases drive the cut-generation sketch shown after this list.
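The two cases are exactly what cut generation does in each iteration of the algorithm. Below is a conceptual sketch; `solve_dsp` is a hypothetical helper assumed to classify the outcome, since extracting an extreme ray (a Farkas certificate of the infeasible \(\mathbf{SP}\)) requires solver-specific facilities that SciPy does not expose, such as Gurobi's `FarkasDual` attribute.

```python
def generate_cut(y_bar):
    """Classify the DSP outcome at y_bar into a Benders cut.

    solve_dsp is a hypothetical helper assumed to return either
    ("optimal", u_point) or ("unbounded", u_ray).
    """
    status, u = solve_dsp(y_bar)
    if status == "unbounded":
        # SP infeasible: add the feasibility cut (b - B y)^T u_ray <= 0.
        return ("feasibility", u)
    # SP solved to optimality: add the optimality cut
    # (b - B y)^T u_point <= g.
    return ("optimality", u)
```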
Based on this idea, the \(\mathbf{DSP}\) can be reformulated as follows:
\[ \begin{aligned} \text{min.} &\quad g \\ \text{s.t.} &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_k^{ray} \leq 0, \ \forall k \in \mathbf{K} \label{bd-feas} \\ &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_j^{point} \leq g, \ \forall j \in \mathbf{J} \label{bd-opt} \end{aligned} \]
Constraints \(\eqref{bd-feas}\) are called Benders feasibility cuts, while constraints \(\eqref{bd-opt}\) are called Benders optimality cuts. Now we are ready to define the Benders Master Problem (\(\mathbf{BMP}\)) as follows:
\[ \begin{aligned} \text{min.} &\quad \mathbf{f}^T \mathbf{y} + g \\ \text{s.t.} &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_k^{ray} \leq 0, \ \forall k \in \mathbf{K} \\ &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_j^{point} \leq g, \ \forall j \in \mathbf{J} \\ &\quad \mathbf{y} \geq 0 \end{aligned} \]
Typically, \(\mathbf{J}\) and \(\mathbf{K}\) are too large to enumerate upfront, so we work with subsets of them, denoted by \(\mathbf{J}_s\) and \(\mathbf{K}_s\), respectively. Hence we have the following Restricted Benders Master Problem (\(\mathbf{RBMP}\)):
\[ \begin{aligned} \text{min.} &\quad \mathbf{f}^T \mathbf{y} + g \\ \text{s.t.} &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_k^{ray} \leq 0, \ \forall k \in \mathbf{K}_s \\ &\quad (\mathbf{b} - \mathbf{B} \mathbf{y})^{T} \mathbf{u}_j^{point} \leq g, \ \forall j \in \mathbf{J}_s \\ &\quad \mathbf{y} \geq 0 \end{aligned} \]
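Putting the pieces together, the algorithm alternates between the \(\mathbf{RBMP}\) and the subproblem, adding one cut per iteration until the bounds meet. Below is a minimal sketch on the illustrative data above. Feasibility cuts are omitted for brevity (SciPy does not expose extreme rays, and on this data the subproblem stays feasible), bounding \(g\) below by a large constant is a common practical safeguard before any optimality cut exists, and the names `benders`, `big_m`, and `tol` are our own.

```python
def benders(max_iters=20, tol=1e-6, big_m=1e6):
    cut_lhs, cut_rhs = [], []      # cut rows over the variables [y, g]
    upper = np.inf
    for it in range(max_iters):
        # RBMP: min f^T y + g subject to the optimality cuts collected
        # so far; g is bounded below by -big_m until cuts bound it.
        rbmp = linprog(c=np.concatenate([f, [1.0]]),
                       A_ub=np.array(cut_lhs) if cut_lhs else None,
                       b_ub=np.array(cut_rhs) if cut_rhs else None,
                       bounds=[(0, None)] * len(f) + [(-big_m, None)],
                       method="highs")
        y_bar, lower = rbmp.x[:len(f)], rbmp.fun
        res_sp = solve_sp(y_bar)
        if res_sp.status != 0:
            # A full implementation would add a feasibility cut here,
            # built from the solver's extreme-ray (Farkas) certificate.
            raise NotImplementedError("feasibility cuts not shown")
        upper = min(upper, f @ y_bar + res_sp.fun)
        if upper - lower < tol:
            return y_bar, res_sp.x, upper
        u = res_sp.eqlin.marginals  # extreme point of the DSP
        # Optimality cut (b - B y)^T u <= g, rewritten as
        # -(u^T B) y - g <= -u^T b to fit linprog's A_ub x <= b_ub form.
        cut_lhs.append(np.concatenate([-(u @ B), [-1.0]]))
        cut_rhs.append(-(u @ b))
    return y_bar, res_sp.x, upper

print(benders())
```

The \(\mathbf{RBMP}\) objective value provides a lower bound on the optimum of \(\mathbf{P}\), since it relaxes the full set of cuts, while any \(\bar{\mathbf{y}}\) with a feasible subproblem yields the upper bound \(\mathbf{f}^T \bar{\mathbf{y}} + g(\bar{\mathbf{y}})\); the loop terminates once the gap between the two bounds closes.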