6 minute read

总结自 section note 5: Convex Optimization Overview, Part II


0. IntroductionPermalink

In the study of convex optimization, the mathematical optimization problems follow the form below:

minimizexRnf(x)(1)subject toxC.

where

  • xRn is a vector known as the optimization variable
  • f:RnR is a convex function that we want to minimize
  • CRn is a convex set describing the set of feasible (capable of being done or carried out) solutions

A necessary and sufficient condition for xRn to be the globally optimal of problem (1) is that xf(x)=0.

In the more general setting of convex optimization problem with constraints, however, this simple optimality condition does not work. One primary goal of duality theory is to characterize the optimal points of convex programs in a mathematically rigorous way.

Here we propose a form of generic differentiable convex optimization problems, as:

minimizexRnf(x)subject togi(x)0,i=1,,m(OPT)hi(x)=0,i=1,,p

where

  • xRn is the optimization variable
  • f:RnR and gi:RnR are differentiable convex functions
  • hi:RnR are affine functions
    • An affine function is a function of the form f(x)=aTx+b for some aRn, bR
  • gi is one of the m convex inequality constraints
  • hi is one of the p affine equality constraints

1. The LagrangianPermalink

Given a convex constrained minimization problem of the form (OPT), the (generalized) Lagrangian is a function L:Rn×Rm×RpR defined as

(2)L(x,α,β)=f(x)+i=1mαigi(x)+i=1pβihi(x)

where

  • xRn is the primal variables of the Lagrangian
  • αRm and βRp are the dual variables of the Lagrangian, a.k.a Lagrange multipliers

Intuitively, the Lagrangian can be thought of as a modified version of the objective function to the original convex optimization problem (OPT) which accounts for each of the constraints. The Lagrange multipliers αi and βi can be thought of penalties of violating the constraints. The key intuition behind the theory of Lagrange duality is the following:

For any convex optimization problem, there always exist settings of the dual variables such that the unconstrained minimum of the Lagrangian with respect to the primal variables (keeping the dual variables fixed) coincides with the solution of the original constrained minimization problem.

We formalize this intuition when we describe the KKT conditions in Section 6.

2. Primal and dual problemsPermalink

To show the relationship between the Lagrangian and the original convex optimization problem (OPT), we introduce the notions of the “primal” and “dual problems” associated with a Lagrangian:

The primal problemPermalink

The primal objective function θP:RnR is defined as

θP(x)=maxα,β:αi0,iL(x,α,β)

Then the unconstrained minimization problem

(P)minxθP(x)

is known as the primal problem.

  • Generally, we say that a point xRn is primal feasible if gi(x)0, i=1,,m and hi(x)=0, i=1,,p
  • We typically use the vector xRn to denote the solution of (P), and we let p=θP(x) denote the optimal value of the primal objective.

The dual problemPermalink

By switching the order of the minimization and maximization above, we obtain an entirely different optimization problem.

The dual objective function θD:Rm×RpR is defined as

θD(α,β)=minxL(x,α,β)

Then the constrained maximization problem

(D)maxα,β:αi0,iθD(α,β)

is known as the dual problem.

  • Generally, we say that (α,β) are dual feasible if αi(x)0, i=1,,m
  • We typically use the vector (α,β)Rm×Rp to denote the solution of (D), and we let d=θD(α,β) denote the optimal value of the dual objective.

3. Interpreting the primal problemPermalink

θP(x)=maxα,β:αi0,iL(x,α,β)=maxα,β:αi0,i[f(x)+i=1mαigi(x)+i=1pβihi(x)]=f(x)+maxα,β:αi0,i[i=1mαigi(x)+i=1pβihi(x)]

x is primal feasible 时,gi(x)0, hi(x)=0,又因为 αi0,所以 αigi(x) 最大为 0,βihi(x) 恒为 0。 此时 θP(x)=f(x)

x is not primal feasible 时,αigi(x)βihi(x) 都可以任意大,取 max 直接可以得到 。此时 θP(x)=f(x)+

考虑到我们最原始的问题 (OPT) 是求 minf(x),primal problem (P)是求 minθP(x)。所以我们可以认为:当 x is primal feasible 时,求 minθP(x) 等价于求 minf(x)

4. Interpreting the dual problemPermalink

The dual objective, θD(α,β), is a concave function of α and β. To interpret the dual problem, first we make the following observation:

Lemma 1: If (α,β) are dual feasible, then θD(α,β)p.

Proof:

θD(α,β)=minxL(x,α,β)L(x,α,β)=f(x)+i=1mαigi(x)+i=1pβihi(x)f(x)=p

Lemma 2 (Weak Duality): For any pair of primal and dual problems, d=θD(α,β)p.

This is obvious following Lemma 1.

Lemma 3 (Strong Duality): For any pair of primal and dual problems which satisfy certain technical conditions called constraint qualifications, then d=p.

Slater’s condition: a primal/dual problem pair satisfy Slater’s condition if there exists some feasible primal solution x for which all inequality constraints are strictly satisfied (i.e., gi(x)<0, i=1,,m).

注意 Slater’s condition 并不能实现 Strong Duality,书上只是拿它举个例子说明啥叫 “technical conditions”。要实现 Strong Duality,还得靠下文的 KTT conditions。

注意,Strong Duality 的意义并不是简单地得到 d=p,而是说明了 “只要满足了 constraint qualifications,那么 primal problem 等价于 dual problem”,进一步就有:

  • 求 dual problem (D)
    • 等价于
  • 求 primal problem (P)
    • 等价于
  • 求最原始的 optimization 问题 (OPT)

5. Complementary slacknessPermalink

在我们讲 KTT conditions 之前,我们先来说一下 Complementary slackness。

Lemma 4 (Complementary Slackness): If strong duality holds, then αigi(x)=0 for each i=1,,m.

证明比较简单。之前 Lemma 2 有 dp,现在 strong duality holds 了有 d=p。所以按 Lemma 1 的证明走,所有的 现在都要取 =,必然要求 αigi(x)=0

又因为 αi0,所以有:

αi>0gi(x)=0

gi(x)<0αi=0

这个 αigi(x)=0 的 constraint 我们又称为 active constraint。在 SVM 中,support vector 其实就是以 active constraint 的形式存在的。

6. The KKT (Karush-Kuhn-Tucker) conditionsPermalink

Theorem 1: Suppose that xRn, αRm and βRp satisfy the following conditions:

  1. (Primal feasibility) gi(x)0, i=1,,m and hi(x)=0, i=1,,p,
  2. (Dual feasibility) αi0, i=1,,m,
  3. (Complementary slackness) αigi(x)=0, i=1,,m, and
  4. (Lagrangian stationarity) xL(x,α,β)=0.

Then x is primal optimal and (α,β) are dual optimal. Furthermore, if strong duality holds, then any primal optimal x and dual optimal (α,β) must satisfy the conditions 1 through 4.

These conditions are known as the Karush-Kuhn-Tucker (KKT) conditions.

Comments