9  Column Generation Theories

If I had to choose my favorite decomposition algorithm, column generation would be the one!

I used column generation in my Ph.D. dissertation and in a couple of industry projects. It has proved to be a very powerful modeling and problem-solving technique that I resort to frequently.

In this chapter, I will explain the basic theories behind column generation.

9.1 Linear Programs that are Conducive to Column Generation

Column generation is a linear programming (LP) technique primarily used for LP problems with a very large number of variables. The idea behind column generation is to start with a smaller, more manageable subset of variables and iteratively add new variables (or “columns”) to the model, as needed, to improve the solution.

Define the following linear program as the Master Problem (MP):

\[ \begin{aligned} \text{min.} &\quad \sum_{j \in \Omega} c_jx_j \\ \text{s.t.} &\quad \sum_{j \in \Omega} a_j x_j = b \\ &\quad x_j \geq 0, j \in \Omega \end{aligned} \]

In this formulation, \(\Omega\) represents the set of all candidate columns. In many real-world applications, \(\Omega\) can be so large that it is impractical to include all the columns in the model at once. For these problems, column generation enables us to work with a subset of columns and employ a special pricing subproblem to iteratively identify new columns.
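As a concrete illustration, consider the classic cutting stock problem, which is often used to motivate column generation (this example is mine, not part of the formulation above). A mill must cut wide rolls into smaller items, and each column corresponds to one feasible cutting pattern:

\[ \begin{aligned} \text{min.} &\quad \sum_{j \in \Omega} x_j \\ \text{s.t.} &\quad \sum_{j \in \Omega} a_{ij} x_j \geq d_i, \quad i = 1, \dots, m \\ &\quad x_j \geq 0, j \in \Omega \end{aligned} \]

Here \(x_j\) is the number of rolls cut with pattern \(j\), \(a_{ij}\) is the number of pieces of item \(i\) produced by pattern \(j\), and \(d_i\) is the demand for item \(i\). The set \(\Omega\) of feasible patterns grows exponentially with the number of item types, so enumerating all of them up front is out of the question.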

9.2 Restricted Master Problem

Let \(\bar{\Omega}\) represent a subset of \(\Omega\), that is, \(\bar{\Omega} \subset \Omega\). The restricted master problem (RMP) is defined as follows:

\[ \begin{aligned} \text{min.} &\quad \sum_{j \in \bar{\Omega}} c_jx_j \\ \text{s.t.} &\quad \sum_{j \in \bar{\Omega}} a_j x_j = b \\ &\quad x_j \geq 0, j \in \bar{\Omega} \end{aligned} \]

Let \(\pi\) denote the dual variables associated with the constraints \(\sum_{j \in \bar{\Omega}} a_j x_j = b\). After solving the RMP, we obtain the optimal dual values \(\pi^*\), which are used to identify new columns to be added to the RMP.
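As a quick sketch of this step (with made-up data, and assuming SciPy's HiGHS-based `linprog` is available), the RMP can be solved and its dual values retrieved like this:

```python
import numpy as np
from scipy.optimize import linprog

# A tiny restricted master problem (illustrative data only):
# min  c^T x   s.t.  A x = b,  x >= 0
c = np.array([3.0, 2.0, 5.0])            # costs of the columns currently in the RMP
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])          # each column of A is one a_j
b = np.array([4.0, 3.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * len(c), method="highs")

x_star = res.x                  # primal solution of the RMP
pi_star = res.eqlin.marginals   # dual values associated with A x = b
print("objective:", res.fun)
print("duals pi*:", pi_star)
```

The array `pi_star` plays the role of \(\pi^*\) in the pricing step described next.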

9.3 Pricing Problem

The linear programming optimality conditions tell us that, for a minimization problem, the current basic feasible solution is optimal if no variable has a negative reduced cost. Recall that the reduced cost of the variable \(x_j\) is defined as:

\[ \begin{aligned} \bar{c_j} = c_j - \pi^* a_j \end{aligned} \]

Then the subproblem aims to find a column with \(\bar{c_j} < 0\), which could be formulated as:

\[ \begin{aligned} \text{min.} &\quad \bar{c_j} \\ \text{s.t.} &\quad \bar{c_j} = c_j - \pi^* a_j \\ &\quad j \in \Omega \setminus \bar{\Omega} \end{aligned} \]

This problem is also called the pricing problem.
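In practice the pricing problem is usually a structured optimization problem (a knapsack, a shortest path, and so on) rather than an explicit search over \(\Omega \setminus \bar{\Omega}\). When the candidate columns are few enough to enumerate, though, the reduced-cost check can be written directly. The helper below is a hypothetical sketch, not code referenced elsewhere in this chapter:

```python
import numpy as np

def price_by_enumeration(candidates, costs, pi_star, tol=1e-9):
    """Toy pricing step: enumerate candidate columns not yet in the RMP
    and return the one with the most negative reduced cost, or None.

    candidates : (m, k) array, each column is a candidate a_j
    costs      : (k,) array of the costs c_j of the candidates
    pi_star    : (m,) array of optimal duals from the RMP
    """
    reduced_costs = costs - pi_star @ candidates   # c_j - pi* a_j for every j
    best = int(np.argmin(reduced_costs))
    if reduced_costs[best] < -tol:                 # tolerance for "negative"
        return candidates[:, best], costs[best], reduced_costs[best]
    return None                                    # no improving column exists
```

If the function returns `None`, the optimality condition above holds and column generation can stop.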

9.4 Algorithm Workflow

Figure 9.1 shows the workflow of the column generation algorithm. Here is how it typically operates:

  1. Initialization
    • Identify a subset of columns to initialize the RMP. This subset must contain enough columns for the RMP to be feasible, though the resulting solution need not be optimal.
  2. Solving the RMP
    • Solve the RMP with a linear programming solver. The solution provides the current objective value and the dual values, which are used to evaluate the potential of columns not included in the RMP.
  3. Pricing Problem
    • Based on the dual prices obtained from the RMP, the subproblem is formulated to identify new columns that have the potential to improve the objective of the master problem. The subproblem is typically specific to the application domain, such as route generation in vehicle routing problems.
    • The goal here is to find columns whose reduced costs indicate that they can improve the objective function if added to the RMP.
  4. Column Evaluation
    • If the subproblem generates new columns that can improve the solution, these columns are added to the RMP. If no such columns are found, the current solution to the RMP is optimal under the current column set.
  5. Iteration
    • The process returns to step 2, where the RMP is solved again with the new set of columns. This iterative process continues until the pricing problem indicates that no further improvements can be made (i.e., no columns with negative reduced cost can be found).
  6. Termination
    • The algorithm stops when either no new improving columns can be generated, or another stopping criterion (such as a maximum number of iterations or a satisfactory solution quality) is reached.
  7. Result
    • If the algorithm terminates because no improving column exists, the final RMP solution is optimal for the full master problem: no column in \(\Omega\), included or not, has a negative reduced cost. If it stops early due to another criterion, the final RMP solution is the best one found over the columns generated so far.
```mermaid
flowchart TD
    A[Start] --> B[Solve RMP]
    B -->|optimal dual values| D[Solve pricing problem]
    D --> NewColumnFound{New column found?}
    NewColumnFound -- Yes --> C[Retrieve new columns]
    NewColumnFound -- No --> E[End]
    C --> B
```

Figure 9.1: Column generation workflow
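To make the workflow above concrete, here is a minimal end-to-end sketch of column generation for the cutting stock example from Section 9.1. The roll width, item sizes, and demands are made-up data, the solver calls assume SciPy's HiGHS backend, and the pricing problem is an unbounded knapsack solved by dynamic programming; treat this as a sketch of the technique rather than production code.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative cutting stock data (made up for this sketch).
ROLL_WIDTH = 100
sizes = np.array([45, 36, 31, 14])      # item widths
demand = np.array([97, 610, 395, 211])  # required pieces of each item
m = len(sizes)

# 1. Initialization: one single-item pattern per item type gives a feasible RMP.
columns = [np.eye(m)[:, i] * (ROLL_WIDTH // sizes[i]) for i in range(m)]

def solve_rmp(columns):
    """2. Solve the RMP: minimize the number of rolls while covering demand."""
    A = np.column_stack(columns)
    c = np.ones(A.shape[1])
    res = linprog(c, A_ub=-A, b_ub=-demand,
                  bounds=[(0, None)] * A.shape[1], method="highs")
    # Duals of "A x >= demand"; sign flip because we passed -A x <= -demand.
    pi = -res.ineqlin.marginals
    return res, pi

def solve_pricing(pi):
    """3. Pricing: unbounded knapsack by dynamic programming.
    Maximize pi^T a over patterns fitting the roll width, i.e. minimize 1 - pi^T a."""
    best_value = np.zeros(ROLL_WIDTH + 1)
    best_pattern = [np.zeros(m) for _ in range(ROLL_WIDTH + 1)]
    for w in range(1, ROLL_WIDTH + 1):
        for i in range(m):
            if sizes[i] <= w and best_value[w - sizes[i]] + pi[i] > best_value[w]:
                best_value[w] = best_value[w - sizes[i]] + pi[i]
                best_pattern[w] = best_pattern[w - sizes[i]].copy()
                best_pattern[w][i] += 1
    reduced_cost = 1.0 - best_value[ROLL_WIDTH]
    return reduced_cost, best_pattern[ROLL_WIDTH]

# 2-5. Iterate until the pricing problem finds no improving column.
while True:
    res, pi = solve_rmp(columns)
    reduced_cost, pattern = solve_pricing(pi)
    if reduced_cost >= -1e-9:            # 6. Termination: no negative reduced cost
        break
    columns.append(pattern)              # 4. Add the improving column to the RMP

print("LP lower bound on number of rolls:", res.fun)   # 7. Result of the final RMP
```

Note that the final RMP objective is only an LP bound; producing an integer cutting plan requires an extra step such as rounding or branch-and-price, which is beyond the scope of this sketch.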

We will illustrate the usage of column generation in the following chapters.