Karush-Kuhn-Tucker conditions
Created: July 07, 2022
Modified: July 07, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Given a constrained optimization problem over a convex function $f$,

$$\begin{align*} \min_\mathbf{x}\ & f(\mathbf{x})\\ \text{s.t. } g(\mathbf{x}) &\le \mathbf{0}\\ h(\mathbf{x}) &= \mathbf{0} \end{align*}$$
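
To make the machinery below concrete, here's a toy instance (my own example, carried through the rest of the page):

$$\min_x x^2 \quad \text{s.t. } 1 - x \le 0,$$

i.e., minimize $x^2$ subject to $x \ge 1$. By inspection, the optimum is $x^* = 1$ with $f(x^*) = 1$.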

we consider the Lagrangian function

$$\mathcal{L}\left(\mathbf{x}, \mu, \nu \right) = f(\mathbf{x}) + \mu^T g(\mathbf{x}) + \nu^T h(\mathbf{x})$$

introducing variables $\mu \ge 0$ and $\nu$ which we'll call KKT multipliers (a generalization of Lagrange multipliers). This has associated primal and dual functions

$$\begin{align*}p(\mathbf{x}) &= \max_{\mu\ge 0,\nu} \mathcal{L}(\mathbf{x}, \mu, \nu) = \left\{\begin{array}{ll}f(\mathbf{x}) &\text{if } g(\mathbf{x})\le\mathbf{0},\ h(\mathbf{x})=\mathbf{0}\\\infty &\text{otherwise}\end{array}\right.\\ d(\mu, \nu) &= \min_\mathbf{x} \mathcal{L}(\mathbf{x}, \mu, \nu)\end{align*}$$
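
In the toy problem there is no equality constraint, so the Lagrangian is just $\mathcal{L}(x, \mu) = x^2 + \mu(1 - x)$. Minimizing over $x$ gives $x = \mu/2$, so the dual function works out to

$$d(\mu) = \mu - \frac{\mu^2}{4}.$$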

Now suppose we have (somehow) found primal and dual optimal points

$$\begin{align*} \mathbf{x}^* &= \arg\min_\mathbf{x} p(\mathbf{x})\\ (\mu^*, \nu^*) &= \arg\max_{\mu\ge 0,\nu} d(\mu, \nu)\end{align*}$$

and that the duality gap is zero, i.e., we have

$$p(\mathbf{x}^*) = \mathcal{L}(\mathbf{x}^*, \mu^*, \nu^*) = d(\mu^*, \nu^*)$$

so in particular all of these quantities equal $f(\mathbf{x}^*)$, by definition of $p$. Together, $(\mathbf{x}^*, \mu^*, \nu^*)$ thus constitute a saddle point of the Lagrangian, meaning that

$$\begin{align*} \mathbf{x}^* &= \arg\min_\mathbf{x} \mathcal{L}(\mathbf{x}, \mu^*, \nu^*)\\ (\mu^*, \nu^*) &= \arg\max_{\mu\ge 0,\nu} \mathcal{L}(\mathbf{x}^*, \mu, \nu) \end{align*}$$
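
In the toy example, $d(\mu) = \mu - \mu^2/4$ is maximized at $\mu^* = 2$, where $d(\mu^*) = 1 = f(x^*) = p(x^*)$ at $x^* = 1$: the duality gap is zero, and $(x^*, \mu^*) = (1, 2)$ is a saddle point of $\mathcal{L}$.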

In this case we can say a few things:

Feasibility: we have $g(\mathbf{x}^*) \le \mathbf{0}$ and $h(\mathbf{x}^*) = \mathbf{0}$ ("primal feasibility"), as well as $\mu^* \ge \mathbf{0}$ ("dual feasibility"). That is, the constraints are satisfied and the multipliers on the inequality constraints are nonnegative.

Complementary slackness: because we have $\mathcal{L}(\mathbf{x}^*, \mu^*, \nu^*) = f(\mathbf{x}^*)$, it follows that the remaining Lagrangian terms sum to zero,

$$\sum_i \mu^*_i g_i(\mathbf{x}^*) + \sum_j \nu^*_j h_j(\mathbf{x}^*) = 0,$$

which together with the feasibility conditions (the second sum vanishes because $h(\mathbf{x}^*) = \mathbf{0}$, and every term of the first sum is nonpositive because $\mu^*_i \ge 0$ and $g_i(\mathbf{x}^*) \le 0$) implies for each constraint $i$ that

$$\mu^*_i g_i(\mathbf{x}^*) = 0\;\;\forall i.$$

This means that we've either pushed the constraint right up to its boundary, $g_i(\mathbf{x}^*) = 0$, or its Lagrange multiplier $\mu^*_i$ is zero (the case where the constraint just happens to end up satisfied 'for free', without the need for an explicit penalty).
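
In the toy example the constraint is active: $g(x^*) = 1 - x^* = 0$ while $\mu^* = 2 > 0$. If we had instead imposed $-1 - x \le 0$ (i.e., $x \ge -1$), the unconstrained minimum $x^* = 0$ would already satisfy it, and we'd be in the other case, with $\mu^* = 0$.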

First-order condition: because $\mathbf{x}^*$ minimizes $\mathcal{L}(\cdot, \mu^*, \nu^*)$, in particular the gradient at that point is zero: $\nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \mu^*, \nu^*) = \mathbf{0}$, i.e., we are at a critical point in $\mathbf{x}$.
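
Checking this in the toy example: $\frac{\partial \mathcal{L}}{\partial x} = 2x - \mu^*$, which is $2 \cdot 1 - 2 = 0$ at $(x^*, \mu^*) = (1, 2)$, as expected.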

These three conditions (feasibility, complementary slackness, and the first-order condition) are collectively known as the Karush-Kuhn-Tucker (KKT) conditions. They hold necessarily whenever we have $\mathbf{x}^*, \mu^*, \nu^*$ that realize a duality gap of zero.

For a convex optimization problem (where $f, g$ are convex and $h$ is affine), it turns out that satisfying the KKT conditions is also sufficient for points $\mathbf{x}^*, \mu^*, \nu^*$ to be primal and dual optimal with zero duality gap (though note that such a set of points may not exist, e.g., if a zero duality gap isn't possible because strong duality doesn't hold).
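
As a numerical sanity check, here's a minimal sketch of verifying the KKT conditions on the toy problem, assuming cvxpy is available (it exposes each constraint's optimal multiplier as `constraint.dual_value`):

```python
# Minimal sketch: numerically verify the KKT conditions on the toy
# problem  min x^2  s.t. 1 - x <= 0  (assumes cvxpy is installed).
import cvxpy as cp

x = cp.Variable()
constraint = 1 - x <= 0                  # g(x) <= 0
prob = cp.Problem(cp.Minimize(cp.square(x)), [constraint])
prob.solve()

x_star = x.value                         # expect x* = 1
mu_star = constraint.dual_value          # expect mu* = 2

print("primal feasibility: g(x*) =", 1 - x_star)                  # <= 0
print("dual feasibility:   mu* =", mu_star)                       # >= 0
print("comp. slackness:    mu* g(x*) =", mu_star * (1 - x_star))  # = 0
print("first-order cond.:  dL/dx =", 2 * x_star - mu_star)        # = 0
```

Up to solver tolerance, all four printed quantities should match the analytic values derived above.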
