Karush-Kuhn-Tucker conditions
Created: July 07, 2022
Modified: July 07, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Given a constrained optimization problem over a convex function $f$,

$$\begin{align*} \min_\mathbf{x}\ & f(\mathbf{x})\\ \text{s.t. } g(\mathbf{x}) &\le \mathbf{0}\\ h(\mathbf{x}) &= \mathbf{0} \end{align*}$$
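
To make the machinery below concrete, here's a toy instance (my own example, carried through the rest of the page):

$$\min_x x^2 \quad \text{s.t. } 1 - x \le 0,$$

i.e., minimize $x^2$ subject to $x \ge 1$. By inspection, the optimum is $x^* = 1$ with $f(x^*) = 1$.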

we consider the Lagrangian function

$$\mathcal{L}\left(\mathbf{x}, \mu, \nu \right) = f(\mathbf{x}) + \mu^T g(\mathbf{x}) + \nu^T h(\mathbf{x})$$

introducing variables $\mu \ge 0$ and $\nu$ which we'll call KKT multipliers (a generalization of Lagrange multipliers). This has associated primal and dual functions

$$\begin{align*}p(\mathbf{x}) &= \max_{\mu\ge 0,\nu} \mathcal{L}(\mathbf{x}, \mu, \nu) = \left\{\begin{array}{ll}f(\mathbf{x}) &\text{if } g(\mathbf{x})\le\mathbf{0},\ h(\mathbf{x})=\mathbf{0}\\\infty &\text{otherwise}\end{array}\right.\\ d(\mu, \nu) &= \min_\mathbf{x} \mathcal{L}(\mathbf{x}, \mu, \nu)\end{align*}$$
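
In the toy problem there is no equality constraint, so the Lagrangian is just $\mathcal{L}(x, \mu) = x^2 + \mu(1 - x)$. Minimizing over $x$ gives $x = \mu/2$, so the dual function works out to

$$d(\mu) = \mu - \frac{\mu^2}{4}.$$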

Now suppose we have (somehow) found primal and dual optimal points

$$\begin{align*} \mathbf{x}^* &= \arg\min_\mathbf{x} p(\mathbf{x})\\ (\mu^*, \nu^*) &= \arg\max_{\mu\ge 0,\nu} d(\mu, \nu)\end{align*}$$

and that the duality gap is zero, i.e., we have

$$p(\mathbf{x}^*) = \mathcal{L}(\mathbf{x}^*, \mu^*, \nu^*) = d(\mu^*, \nu^*)$$

so in particular all of these quantities equal $f(\mathbf{x}^*)$, by definition of $p$. Together, $(\mathbf{x}^*, \mu^*, \nu^*)$ thus constitute a saddle point of the Lagrangian, meaning that

$$\begin{align*} \mathbf{x}^* &= \arg\min_\mathbf{x} \mathcal{L}(\mathbf{x}, \mu^*, \nu^*)\\ (\mu^*, \nu^*) &= \arg\max_{\mu\ge 0,\nu} \mathcal{L}(\mathbf{x}^*, \mu, \nu) \end{align*}$$
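
In the toy example, $d(\mu) = \mu - \mu^2/4$ is maximized at $\mu^* = 2$, where $d(\mu^*) = 1 = f(x^*) = p(x^*)$ at $x^* = 1$: the duality gap is zero, and $(x^*, \mu^*) = (1, 2)$ is a saddle point of $\mathcal{L}$.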

In this case we can say a few things:

Feasibility: we have $g(\mathbf{x}^*) \le \mathbf{0}$ and $h(\mathbf{x}^*) = \mathbf{0}$ ("primal feasibility"), as well as $\mu^* \ge \mathbf{0}$ ("dual feasibility"). That is, the constraints are satisfied and the multipliers on the inequality constraints are nonnegative.

Complementary slackness: because we have $\mathcal{L}(\mathbf{x}^*, \mu^*, \nu^*) = f(\mathbf{x}^*)$, it follows that the remaining Lagrangian terms sum to zero,

$$\sum_i \mu^*_i g_i(\mathbf{x}^*) + \sum_j \nu^*_j h_j(\mathbf{x}^*) = 0,$$

which together with the feasibility conditions (the second sum vanishes because $h(\mathbf{x}^*) = \mathbf{0}$, and every term of the first sum is nonpositive because $\mu^*_i \ge 0$ and $g_i(\mathbf{x}^*) \le 0$) implies for each constraint $i$ that

$$\mu^*_i g_i(\mathbf{x}^*) = 0\;\;\forall i.$$

This means that we've either pushed the constraint right up to its boundary, $g_i(\mathbf{x}^*) = 0$, or its Lagrange multiplier $\mu^*_i$ is zero (the case where the constraint just happens to end up satisfied 'for free', without the need for an explicit penalty).
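
In the toy example the constraint is active: $g(x^*) = 1 - x^* = 0$ while $\mu^* = 2 > 0$. If we had instead imposed $-1 - x \le 0$ (i.e., $x \ge -1$), the unconstrained minimum $x^* = 0$ would already satisfy it, and we'd be in the other case, with $\mu^* = 0$.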

First-order condition: because $\mathbf{x}^*$ minimizes $\mathcal{L}(\cdot, \mu^*, \nu^*)$, in particular the gradient at that point is zero: $\nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \mu^*, \nu^*) = \mathbf{0}$, i.e., we are at a critical point in $\mathbf{x}$.
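
Checking this in the toy example: $\frac{\partial \mathcal{L}}{\partial x} = 2x - \mu^*$, which is $2 \cdot 1 - 2 = 0$ at $(x^*, \mu^*) = (1, 2)$, as expected.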

These three conditions (feasibility, complementary slackness, and the first-order condition) are collectively known as the Karush-Kuhn-Tucker (KKT) conditions. They hold necessarily whenever we have $\mathbf{x}^*, \mu^*, \nu^*$ that realize a duality gap of zero.

For a convex optimization problem (where $f, g$ are convex and $h$ is affine), it turns out that satisfying the KKT conditions is also sufficient for points $\mathbf{x}^*, \mu^*, \nu^*$ to be primal and dual optimal with zero duality gap (though note that such a set of points may not exist, e.g., if a zero duality gap isn't possible because strong duality doesn't hold).
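
As a numerical sanity check, here's a minimal sketch of verifying the KKT conditions on the toy problem, assuming cvxpy is available (it exposes each constraint's optimal multiplier as `constraint.dual_value`):

```python
# Minimal sketch: numerically verify the KKT conditions on the toy
# problem  min x^2  s.t. 1 - x <= 0  (assumes cvxpy is installed).
import cvxpy as cp

x = cp.Variable()
constraint = 1 - x <= 0                  # g(x) <= 0
prob = cp.Problem(cp.Minimize(cp.square(x)), [constraint])
prob.solve()

x_star = x.value                         # expect x* = 1
mu_star = constraint.dual_value          # expect mu* = 2

print("primal feasibility: g(x*) =", 1 - x_star)                  # <= 0
print("dual feasibility:   mu* =", mu_star)                       # >= 0
print("comp. slackness:    mu* g(x*) =", mu_star * (1 - x_star))  # = 0
print("first-order cond.:  dL/dx =", 2 * x_star - mu_star)        # = 0
```

Up to solver tolerance, all four printed quantities should match the analytic values derived above.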
