Misplaced Pages

Karush–Kuhn–Tucker conditions

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

In mathematical optimization , the Karush–Kuhn–Tucker ( KKT ) conditions , also known as the Kuhn–Tucker conditions , are first derivative tests (sometimes called first-order necessary conditions ) for a solution in nonlinear programming to be optimal , provided that some regularity conditions are satisfied.

#496503

99-518: Allowing inequality constraints, the KKT approach to nonlinear programming generalizes the method of Lagrange multipliers , which allows only equality constraints. Similar to the Lagrange approach, the constrained maximization (minimization) problem is rewritten as a Lagrange function whose optimal point is a global maximum or minimum over the domain of the choice variables and a global minimum (maximum) over

198-794: A i {\displaystyle a_{i}} is interpreted as a resource constraint, the coefficients tell you how much increasing a resource will increase the optimum value of our function f {\displaystyle f} . This interpretation is especially important in economics and is used, for instance, in utility maximization problems . With an extra multiplier μ 0 ≥ 0 {\displaystyle \mu _{0}\geq 0} , which may be zero (as long as ( μ 0 , μ , λ ) ≠ 0 {\displaystyle (\mu _{0},\mu ,\lambda )\neq 0} ), in front of ∇ f ( x ∗ ) {\displaystyle \nabla f(x^{*})}

297-473: A i , i ∈ { 1 , … , m } } . {\displaystyle \{a\in \mathbb {R} ^{m}\mid {\text{for some }}x\in X,g_{i}(x)\leq a_{i},i\in \{1,\ldots ,m\}\}.} Given this definition, each coefficient μ i {\displaystyle \mu _{i}} is the rate at which the value function increases as a i {\displaystyle a_{i}} increases. Thus if each

396-450: A profit maximizing firm, which operates at a level at which they are equal. If we reconsider the optimization problem as a maximization problem with constant inequality constraints: The value function is defined as so the domain of V {\displaystyle V} is { a ∈ R m ∣ for some  x ∈ X , g i ( x ) ≤

495-709: A smooth manifold of dimension   m   . {\displaystyle \ m~.} Suppose that we wish to find the stationary points   x   {\displaystyle \ x\ } of a smooth function   f : M → R   {\displaystyle \ f:M\to \mathbb {R} \ } when restricted to the submanifold   N   {\displaystyle \ N\ } defined by   g ( x ) = 0   , {\displaystyle \ g(x)=0\ ,} where   g : M → R   {\displaystyle \ g:M\to \mathbb {R} \ }

594-445: A Euclidean space, or even a Riemannian manifold. All appearances of the gradient   ∇   {\displaystyle \ \nabla \ } (which depends on a choice of Riemannian metric) can be replaced with the exterior derivative   d ⁡   . {\displaystyle \ \operatorname {d} ~.} Let   M   {\displaystyle \ M\ } be

693-490: A constrained minimizer also satisfies the KKT conditions. Some common examples for conditions that guarantee this are tabulated in the following, with the LICQ the most frequently used one: The strict implications can be shown and In practice weaker constraint qualifications are preferred since they apply to a broader selection of problems. In some cases, the necessary conditions are also sufficient for optimality. In general,

792-408: A function f ( x ) {\displaystyle f(x)} in an unconstrained problem has to satisfy the condition ∇ f ( x ∗ ) = 0 {\displaystyle \nabla f(x^{*})=0} . For the constrained case, the situation is more complicated, and one can state a variety of (increasingly complicated) "regularity" conditions under which

891-429: A function of x {\displaystyle x} and the Lagrange multiplier λ   {\displaystyle \lambda ~} . This means that all partial derivatives should be zero, including the partial derivative with respect to λ   {\displaystyle \lambda ~} . or equivalently The solution corresponding to the original constrained optimization

990-552: A new variable ( λ {\displaystyle \lambda } ) called a Lagrange multiplier (or Lagrange undetermined multiplier ) and study the Lagrange function (or Lagrangian or Lagrangian expression ) defined by L ( x , y , λ ) = f ( x , y ) + λ ⋅ g ( x , y ) , {\displaystyle {\mathcal {L}}(x,y,\lambda )=f(x,y)+\lambda \cdot g(x,y),} where

1089-699: A point x ∗ ∈ R n {\displaystyle x^{*}\in \mathbb {R} ^{n}} . If x ∗ {\displaystyle x^{*}} is a local optimum and the optimization problem satisfies some regularity conditions (see below), then there exist constants μ i   ( i = 1 , … , m ) {\displaystyle \mu _{i}\ (i=1,\ldots ,m)} and λ j   ( j = 1 , … , ℓ ) {\displaystyle \lambda _{j}\ (j=1,\ldots ,\ell )} , called KKT multipliers, such that

SECTION 10

#1732783390497

1188-444: A positive first derivative and with a zero value at zero output, C ( Q ) {\displaystyle C(Q)} be production costs with a positive first derivative and with a non-negative value at zero output, and G min {\displaystyle G_{\min }} be the positive minimal acceptable level of profit , then the problem is a meaningful one if the revenue function levels off so it eventually

1287-563: A reformulation of the original problem, known as the Lagrangian function or Lagrangian. In the general case, the Lagrangian is defined as for functions f , g {\displaystyle f,g} ; the notation ⟨ ⋅ , ⋅ ⟩ {\displaystyle \langle \cdot ,\cdot \rangle } denotes an inner product . The value λ {\displaystyle \lambda }

1386-423: A similar argument. Consider a paraboloid subject to two line constraints that intersect at a single point. As the only feasible solution, this point is obviously a constrained extremum. However, the level set of f {\displaystyle f} is clearly not parallel to either constraint at the intersection point (see Figure 3); instead, it is a linear combination of the two constraints' gradients. In

1485-521: A unique Lagrange multiplier λ ⋆ ∈ R c {\displaystyle \lambda _{\star }\in \mathbb {R} ^{c}} such that D ⁡ f ( x ⋆ ) = λ ⋆ T D ⁡ g ( x ⋆ )   . {\displaystyle \operatorname {D} f(x_{\star })=\lambda _{\star }^{\mathsf {T}}\operatorname {D} g(x_{\star })~.} (Note that this

1584-1245: Is   g ( x , y )   . {\displaystyle \ g(x,y)~.} To summarize ∇ x , y , λ L ( x , y , λ ) = 0 ⟺ { ∇ x , y f ( x , y ) = − λ ∇ x , y g ( x , y ) g ( x , y ) = 0 {\displaystyle \nabla _{x,y,\lambda }{\mathcal {L}}(x,y,\lambda )=0\iff {\begin{cases}\nabla _{x,y}f(x,y)=-\lambda \,\nabla _{x,y}g(x,y)\\g(x,y)=0\end{cases}}} The method generalizes readily to functions on n {\displaystyle n} variables ∇ x 1 , … , x n , λ L ( x 1 , … , x n , λ ) = 0 {\displaystyle \nabla _{x_{1},\dots ,x_{n},\lambda }{\mathcal {L}}(x_{1},\dots ,x_{n},\lambda )=0} which amounts to solving n + 1 equations in n + 1 unknowns. The constrained extrema of f are critical points of

1683-400: Is a stationary point for the Lagrange function (stationary points are those points where the first partial derivatives of L {\displaystyle {\mathcal {L}}} are zero). The assumption ∇ g ≠ 0 {\displaystyle \nabla g\neq 0} is called constraint qualification. However, not all stationary points yield a solution of

1782-1238: Is a regular value . Let N {\displaystyle N} be the submanifold of   M   {\displaystyle \ M\ } defined by   G ( x ) = 0   . {\displaystyle \ G(x)=0~.}   x   {\displaystyle \ x\ } is a stationary point of f | N {\displaystyle f|_{N}} if and only if   ker ⁡ ( d ⁡ f x )   {\displaystyle \ \ker(\operatorname {d} f_{x})\ } contains   ker ⁡ ( d ⁡ G x )   . {\displaystyle \ \ker(\operatorname {d} G_{x})~.} For convenience let   L x = d ⁡ f x   {\displaystyle \ L_{x}=\operatorname {d} f_{x}\ } and   K x = d ⁡ G x   , {\displaystyle \ K_{x}=\operatorname {d} G_{x}\ ,} where   d ⁡ G {\displaystyle \ \operatorname {d} G} denotes

1881-468: Is a good counter-example, see also Peano surface . Often in mathematical economics the KKT approach is used in theoretical models in order to obtain qualitative results. For example, consider a firm that maximizes its sales revenue subject to a minimum profit constraint. Letting Q {\displaystyle Q} be the quantity of output produced (to be chosen), R ( Q ) {\displaystyle R(Q)} be sales revenue with

1980-554: Is a saddle point of L ( x , α ) {\displaystyle L(\mathbf {x} ,\mathbf {\alpha } )} . Since the idea of this approach is to find a supporting hyperplane on the feasible set Γ = { x ∈ X : g i ( x ) ≤ 0 , i = 1 , … , m } {\displaystyle \mathbf {\Gamma } =\left\{\mathbf {x} \in \mathbf {X} :g_{i}(\mathbf {x} )\leq 0,i=1,\ldots ,m\right\}} ,

2079-473: Is a smooth function for which 0 is a regular value . Let   d ⁡ f   {\displaystyle \ \operatorname {d} f\ } and   d ⁡ g   {\displaystyle \ \operatorname {d} g\ } be the exterior derivatives of   f   {\displaystyle \ f\ } and   g   {\displaystyle \ g\ } . Stationarity for

SECTION 20

#1732783390497

2178-481: Is a somewhat conventional thing where λ ⋆ {\displaystyle \lambda _{\star }} is clearly treated as a column vector to ensure that the dimensions match. But, we might as well make it just a row vector without taking the transpose.) The Lagrange multiplier theorem states that at any local maximum (or minimum) of the function evaluated under the equality constraints, if constraint qualification applies (explained below), then

2277-408: Is a strict constrained local minimum in the case the inequality is also strict. If s T ∇ x x 2 L ( x ∗ , λ ∗ , μ ∗ ) s = 0 {\displaystyle s^{T}\nabla _{xx}^{2}L(x^{*},\lambda ^{*},\mu ^{*})s=0} , the third order Taylor expansion of

2376-445: Is always a saddle point of the Lagrangian function, which can be identified among the stationary points from the definiteness of the bordered Hessian matrix . The great advantage of this method is that it allows the optimization to be solved without explicit parameterization in terms of the constraints. As a result, the method of Lagrange multipliers is widely used to solve challenging constrained optimization problems. Further,

2475-945: Is an optimal vector for the above optimization problem. (necessity) Suppose that f ( x ) {\displaystyle f(\mathbf {x} )} and g i ( x ) {\displaystyle g_{i}(\mathbf {x} )} , i = 1 , … , m {\displaystyle i=1,\ldots ,m} , are convex in X {\displaystyle \mathbf {X} } and that there exists x 0 ∈ relint ⁡ ( X ) {\displaystyle \mathbf {x} _{0}\in \operatorname {relint} (\mathbf {X} )} such that g ( x 0 ) < 0 {\displaystyle \mathbf {g} (\mathbf {x} _{0})<\mathbf {0} } (i.e., Slater's condition holds). Then with an optimal vector x ∗ {\displaystyle \mathbf {x} ^{\ast }} for

2574-496: Is called the Lagrange multiplier. In simple cases, where the inner product is defined as the dot product , the Lagrangian is The method can be summarized as follows: in order to find the maximum or minimum of a function f {\displaystyle f} subject to the equality constraint g ( x ) = 0 {\displaystyle g(x)=0} , find the stationary points of L {\displaystyle {\mathcal {L}}} considered as

2673-524: Is equivalent to primal stationarity. Fix x ∗ {\displaystyle x^{*}} , and vary ( μ , λ ) {\displaystyle (\mu ,\lambda )} : equilibrium is equivalent to primal feasibility and complementary slackness. Sufficiency: the solution pair x ∗ , ( μ ∗ , λ ∗ ) {\displaystyle x^{*},(\mu ^{*},\lambda ^{*})} satisfies

2772-442: Is less steep than the cost function. The problem expressed in the previously given minimization form is and the KKT conditions are Since Q = 0 {\displaystyle Q=0} would violate the minimum profit constraint, we have Q > 0 {\displaystyle Q>0} and hence the third condition implies that the first condition holds with equality. Solving that equality gives Because it

2871-626: Is necessary and sufficient that the following system of   1 2 m ( m − 1 )   {\displaystyle \ {\tfrac {1}{2}}m(m-1)\ } equations holds: d ⁡ f x ∧ d ⁡ g x = 0 ∈ Λ 2 ( T x ∗ M ) {\displaystyle \operatorname {d} f_{x}\wedge \operatorname {d} g_{x}=0\in \Lambda ^{2}(T_{x}^{\ast }M)} where   ∧   {\displaystyle \ \wedge \ } denotes

2970-603: Is not necessary to explicitly find the Lagrange multipliers, the numbers   λ 1 , … , λ p   {\displaystyle \ \lambda _{1},\ldots ,\lambda _{p}\ } such that   d ⁡ f x = ∑ i = 1 p λ i d ⁡ ( g i ) x   . {\displaystyle \ \operatorname {d} f_{x}=\sum _{i=1}^{p}\lambda _{i}\operatorname {d} (g_{i})_{x}~.} In this section, we modify

3069-1340: Is perpendicular to ∇ f ( x ) {\displaystyle \nabla f(\mathbf {x} )} (otherwise we could increase f {\displaystyle f} by moving along that allowable direction). In other words, ∇ f ( x ) ∈ A ⊥ = S . {\displaystyle \nabla f(\mathbf {x} )\in A^{\perp }=S\,.} Thus there are scalars λ 1 , λ 2 ,   … , λ M {\displaystyle \lambda _{1},\lambda _{2},\ \dots ,\lambda _{M}} such that ∇ f ( x ) = ∑ k = 1 M λ k ∇ g k ( x ) ⟺ ∇ f ( x ) − ∑ k = 1 M λ k ∇ g k ( x ) = 0   . {\displaystyle \nabla f(\mathbf {x} )=\sum _{k=1}^{M}\lambda _{k}\,\nabla g_{k}(\mathbf {x} )\quad \iff \quad \nabla f(\mathbf {x} )-\sum _{k=1}^{M}{\lambda _{k}\nabla g_{k}(\mathbf {x} )}=0~.} These scalars are

Karush–Kuhn–Tucker conditions - Misplaced Pages Continue

3168-406: Is positive and so the revenue-maximizing firm operates at a level of output at which marginal revenue d R / d Q {\displaystyle {\text{d}}R/{\text{d}}Q} is less than marginal cost d C / d Q {\displaystyle {\text{d}}C/{\text{d}}Q} — a result that is of interest because it contrasts with the behavior of

3267-475: Is shown separately rather than being included in g {\displaystyle g} , in which case the constraint is written g ( x , y ) = c , {\displaystyle g(x,y)=c,} as in Figure 1.) We assume that both f {\displaystyle f} and g {\displaystyle g} have continuous first partial derivatives . We introduce

3366-408: Is that the constraint gradients at the relevant point are linearly independent. The problem of finding the local maxima and minima subject to constraints can be generalized to finding local maxima and minima on a differentiable manifold   M   . {\displaystyle \ M~.} In what follows, it is not necessary that M {\displaystyle M} be

3465-512: Is the marginal cost of the constraint, and is referred to as the shadow price . Sufficient conditions for a constrained local maximum or minimum can be stated in terms of a sequence of principal minors (determinants of upper-left-justified sub-matrices) of the bordered Hessian matrix of second derivatives of the Lagrangian expression. Suppose we wish to maximize   f ( x , y ) = x + y   {\displaystyle \ f(x,y)=x+y\ } subject to

3564-553: Is the method of Lagrange multipliers. Note that   ∇ λ L ( x , y , λ ) = 0   {\displaystyle \ \nabla _{\lambda }{\mathcal {L}}(x,y,\lambda )=0\ } implies   g ( x , y ) = 0   , {\displaystyle \ g(x,y)=0\ ,} as the partial derivative of L {\displaystyle {\mathcal {L}}} with respect to λ {\displaystyle \lambda }

3663-577: Is the optimization variable chosen from a convex subset of R n {\displaystyle \mathbb {R} ^{n}} , f {\displaystyle f} is the objective or utility function, g i   ( i = 1 , … , m ) {\displaystyle g_{i}\ (i=1,\ldots ,m)} are the inequality constraint functions and h j   ( j = 1 , … , ℓ ) {\displaystyle h_{j}\ (j=1,\ldots ,\ell )} are

3762-420: Is the rate of change of the quantity being optimized as a function of the constraint parameter. As examples, in Lagrangian mechanics the equations of motion are derived by finding stationary points of the action , the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential, F = −∇ V , can be interpreted as a Lagrange multiplier determining

3861-754: The λ {\displaystyle \lambda } term may be either added or subtracted. If f ( x 0 , y 0 ) {\displaystyle f(x_{0},y_{0})} is a maximum of f ( x , y ) {\displaystyle f(x,y)} for the original constrained problem and ∇ g ( x 0 , y 0 ) ≠ 0 , {\displaystyle \nabla g(x_{0},y_{0})\neq 0,} then there exists λ 0 {\displaystyle \lambda _{0}} such that ( x 0 , y 0 , λ 0 {\displaystyle x_{0},y_{0},\lambda _{0}} )

3960-418: The ∂ g i ( x ∗ ) {\displaystyle \partial g_{i}(x^{*})} forces must be one-sided, pointing inwards into the feasible set for x {\displaystyle x} . Complementary slackness states that if g i ( x ∗ ) < 0 {\displaystyle g_{i}(x^{*})<0} , then

4059-487: The exterior product . The stationary points   x   {\displaystyle \ x\ } are the solutions of the above system of equations plus the constraint   g ( x ) = 0   . {\displaystyle \ g(x)=0~.} Note that the   1 2 m ( m − 1 )   {\displaystyle \ {\tfrac {1}{2}}m(m-1)\ } equations are not independent, since

Karush–Kuhn–Tucker conditions - Misplaced Pages Continue

4158-406: The gradient of the function (at that point) can be expressed as a linear combination of the gradients of the constraints (at that point), with the Lagrange multipliers acting as coefficients . This is equivalent to saying that any direction perpendicular to all gradients of the constraints is also perpendicular to the gradient of the function. Or still, saying that the directional derivative of

4257-566: The objective function f : R n → R {\displaystyle f\colon \mathbb {R} ^{n}\rightarrow \mathbb {R} } and the constraint functions g i : R n → R {\displaystyle g_{i}\colon \mathbb {R} ^{n}\rightarrow \mathbb {R} } and h j : R n → R {\displaystyle h_{j}\colon \mathbb {R} ^{n}\rightarrow \mathbb {R} } have subderivatives at

4356-441: The KKT conditions turn into the Lagrange conditions, and the KKT multipliers are called Lagrange multipliers . Theorem  —  (sufficiency) If there exists a solution x ∗ {\displaystyle x^{*}} to the primal problem, a solution ( μ ∗ , λ ∗ ) {\displaystyle (\mu ^{*},\lambda ^{*})} to

4455-432: The KKT conditions, thus is a Nash equilibrium, and therefore closes the duality gap. Necessity: any solution pair x ∗ , ( μ ∗ , λ ∗ ) {\displaystyle x^{*},(\mu ^{*},\lambda ^{*})} must close the duality gap, thus they must constitute a Nash equilibrium (since neither side could do any better), thus they satisfy

4554-520: The KKT conditions. First, for the x ∗ , ( μ ∗ , λ ∗ ) {\displaystyle x^{*},(\mu ^{*},\lambda ^{*})} to satisfy the KKT conditions is equivalent to them being a Nash equilibrium . Fix ( μ ∗ , λ ∗ ) {\displaystyle (\mu ^{*},\lambda ^{*})} , and vary x {\displaystyle x} : equilibrium

4653-708: The KKT conditions. The primal problem can be interpreted as moving a particle in the space of x {\displaystyle x} , and subjecting it to three kinds of force fields: Primal stationarity states that the "force" of ∂ f ( x ∗ ) {\displaystyle \partial f(x^{*})} is exactly balanced by a linear sum of forces ∂ h j ( x ∗ ) {\displaystyle \partial h_{j}(x^{*})} and ∂ g i ( x ∗ ) {\displaystyle \partial g_{i}(x^{*})} . Dual feasibility additionally states that all

4752-513: The KKT stationarity conditions turn into which are called the Fritz John conditions . This optimality conditions holds without constraint qualifications and it is equivalent to the optimality condition KKT or (not-MFCQ) . Lagrange multipliers In mathematical optimization , the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equation constraints (i.e., subject to

4851-2139: The Lagrange multipliers. We now have M {\displaystyle M} of them, one for every constraint. As before, we introduce an auxiliary function L ( x 1 , … , x n , λ 1 , … , λ M ) = f ( x 1 , … , x n ) − ∑ k = 1 M λ k g k ( x 1 , … , x n )   {\displaystyle {\mathcal {L}}\left(x_{1},\ldots ,x_{n},\lambda _{1},\ldots ,\lambda _{M}\right)=f\left(x_{1},\ldots ,x_{n}\right)-\sum \limits _{k=1}^{M}{\lambda _{k}g_{k}\left(x_{1},\ldots ,x_{n}\right)}\ } and solve ∇ x 1 , … , x n , λ 1 , … , λ M L ( x 1 , … , x n , λ 1 , … , λ M ) = 0 ⟺ { ∇ f ( x ) − ∑ k = 1 M λ k ∇ g k ( x ) = 0 g 1 ( x ) = ⋯ = g M ( x ) = 0 {\displaystyle \nabla _{x_{1},\ldots ,x_{n},\lambda _{1},\ldots ,\lambda _{M}}{\mathcal {L}}(x_{1},\ldots ,x_{n},\lambda _{1},\ldots ,\lambda _{M})=0\iff {\begin{cases}\nabla f(\mathbf {x} )-\sum _{k=1}^{M}{\lambda _{k}\,\nabla g_{k}(\mathbf {x} )}=0\\g_{1}(\mathbf {x} )=\cdots =g_{M}(\mathbf {x} )=0\end{cases}}} which amounts to solving n + M {\displaystyle n+M} equations in   n + M   {\displaystyle \ n+M\ } unknowns. The constraint qualification assumption when there are multiple constraints

4950-459: The Lagrangian L {\displaystyle {\mathcal {L}}} , but they are not necessarily local extrema of L {\displaystyle {\mathcal {L}}} (see § Example 2 below). One may reformulate the Lagrangian as a Hamiltonian , in which case the solutions are local minima for the Hamiltonian. This is done in optimal control theory, in

5049-1510: The Lagrangian expression L {\displaystyle {\mathcal {L}}} . Often the Lagrange multipliers have an interpretation as some quantity of interest. For example, by parametrising the constraint's contour line, that is, if the Lagrangian expression is L ( x 1 , x 2 , … ; λ 1 , λ 2 , … ; c 1 , c 2 , … ) = f ( x 1 , x 2 , … ) + λ 1 ( c 1 − g 1 ( x 1 , x 2 , … ) ) + λ 2 ( c 2 − g 2 ( x 1 , x 2 , … ) ) + ⋯ {\displaystyle {\begin{aligned}&{\mathcal {L}}(x_{1},x_{2},\ldots ;\lambda _{1},\lambda _{2},\ldots ;c_{1},c_{2},\ldots )\\[4pt]={}&f(x_{1},x_{2},\ldots )+\lambda _{1}(c_{1}-g_{1}(x_{1},x_{2},\ldots ))+\lambda _{2}(c_{2}-g_{2}(x_{1},x_{2},\dots ))+\cdots \end{aligned}}} then   ∂ L ∂ c k = λ k   . {\displaystyle \ {\frac {\partial {\mathcal {L}}}{\partial c_{k}}}=\lambda _{k}~.} So, λ k

SECTION 50

#1732783390497

5148-501: The Lagrangian function, L ( x , y , λ ) = f ( x , y ) + λ ⋅ g ( x , y ) = x + y + λ ( x 2 + y 2 − 1 )   , {\displaystyle {\begin{aligned}{\mathcal {L}}(x,y,\lambda )&=f(x,y)+\lambda \cdot g(x,y)\\[4pt]&=x+y+\lambda (x^{2}+y^{2}-1)\ ,\end{aligned}}}

5247-437: The Lagrangian should be used to verify if x ∗ {\displaystyle x^{*}} is a local minimum. The minimization of f ( x 1 , x 2 ) = ( x 2 − x 1 2 ) ( x 2 − 3 x 1 2 ) {\displaystyle f(x_{1},x_{2})=(x_{2}-x_{1}^{2})(x_{2}-3x_{1}^{2})}

5346-648: The above optimization problem there is associated a vector α ∗ = [ μ ∗ λ ∗ ] {\displaystyle \mathbf {\alpha } ^{\ast }={\begin{bmatrix}\mu ^{*}\\\lambda ^{*}\end{bmatrix}}} satisfying μ ∗ ≥ 0 {\displaystyle \mathbf {\mu } ^{*}\geq \mathbf {0} } such that ( x ∗ , α ∗ ) {\displaystyle (\mathbf {x} ^{\ast },\mathbf {\alpha } ^{\ast })}

5445-496: The above section is a constrained local minimum if for the Lagrangian, then, where s ≠ 0 {\displaystyle s\neq 0} is a vector satisfying the following, where only those active inequality constraints g i ( x ) {\displaystyle g_{i}(x)} corresponding to strict complementarity (i.e. where μ i > 0 {\displaystyle \mu _{i}>0} ) are applied. The solution

5544-629: The above section regarding the case of a single constraint. Rather than the function g {\displaystyle g} described there, now consider a smooth function   G : M → R p ( p > 1 )   , {\displaystyle \ G:M\to \mathbb {R} ^{p}(p>1)\ ,} with component functions   g i : M → R   , {\displaystyle \ g_{i}:M\to \mathbb {R} \ ,} for which 0 ∈ R p {\displaystyle 0\in \mathbb {R} ^{p}}

5643-430: The case of multiple constraints, that will be what we seek in general: The method of Lagrange seeks points not at which the gradient of f {\displaystyle f} is a multiple of any single constraint's gradient necessarily, but in which it is a linear combination of all the constraints' gradients. Concretely, suppose we have M {\displaystyle M} constraints and are walking along

5742-401: The change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In control theory this is formulated instead as costate equations . Moreover, by the envelope theorem the optimal value of a Lagrange multiplier has an interpretation as the marginal effect of the corresponding constraint constant upon the optimal attainable value of

5841-435: The condition that one or more equations have to be satisfied exactly by the chosen values of the variables ). It is named after the mathematician Joseph-Louis Lagrange . The basic idea is to convert a constrained problem into a form such that the derivative test of an unconstrained problem can still be applied. The relationship between the gradient of the function and gradients of the constraints rather naturally leads to

5940-517: The constraint   x 2 + y 2 = 1   . {\displaystyle \ x^{2}+y^{2}=1~.} The feasible set is the unit circle, and the level sets of f are diagonal lines (with slope −1), so we can see graphically that the maximum occurs at   ( 1 2 , 1 2 )   , {\displaystyle \ \left({\tfrac {1}{\sqrt {2}}},{\tfrac {1}{\sqrt {2}}}\right)\ ,} and that

6039-485: The constraint equations from the form g i ( x ) = 0 {\displaystyle g_{i}({\bf {x}})=0} to the form   g i ( x ) = c i   , {\displaystyle \ g_{i}({\bf {x}})=c_{i}\ ,} where the   c i   {\displaystyle \ c_{i}\ } are m real constants that are considered to be additional arguments of

SECTION 60

#1732783390497

6138-1520: The constraint functions. Let g ( x ) : R n → R m {\displaystyle \mathbf {g} (x):\,\!\mathbb {R} ^{n}\rightarrow \mathbb {R} ^{m}} be defined as g ( x ) = ( g 1 ( x ) , … , g m ( x ) ) ⊤ {\displaystyle \mathbf {g} (x)=\left(g_{1}(x),\ldots ,g_{m}(x)\right)^{\top }} and let h ( x ) : R n → R ℓ {\displaystyle \mathbf {h} (x):\,\!\mathbb {R} ^{n}\rightarrow \mathbb {R} ^{\ell }} be defined as h ( x ) = ( h 1 ( x ) , … , h ℓ ( x ) ) ⊤ {\displaystyle \mathbf {h} (x)=\left(h_{1}(x),\ldots ,h_{\ell }(x)\right)^{\top }} . Let μ = ( μ 1 , … , μ m ) ⊤ {\displaystyle {\boldsymbol {\mu }}=\left(\mu _{1},\ldots ,\mu _{m}\right)^{\top }} and λ = ( λ 1 , … , λ ℓ ) ⊤ {\displaystyle {\boldsymbol {\lambda }}=\left(\lambda _{1},\ldots ,\lambda _{\ell }\right)^{\top }} . Then

6237-500: The dual problem, such that together they satisfy the KKT conditions, then the problem pair has strong duality, and x ∗ , ( μ ∗ , λ ∗ ) {\displaystyle x^{*},(\mu ^{*},\lambda ^{*})} is a solution pair to the primal and dual problems. (necessity) If the problem pair has strong duality, then for any solution x ∗ {\displaystyle x^{*}} to

6336-2948: The equality constraint functions. The numbers of inequalities and equalities are denoted by m {\displaystyle m} and ℓ {\displaystyle \ell } respectively. Corresponding to the constrained optimization problem one can form the Lagrangian function L ( x , μ , λ ) = f ( x ) + μ ⊤ g ( x ) + λ ⊤ h ( x ) = L ( x , α ) = f ( x ) + α ⊤ ( g ( x ) h ( x ) ) {\displaystyle {\mathcal {L}}(\mathbf {x} ,\mathbf {\mu } ,\mathbf {\lambda } )=f(\mathbf {x} )+\mathbf {\mu } ^{\top }\mathbf {g} (\mathbf {x} )+\mathbf {\lambda } ^{\top }\mathbf {h} (\mathbf {x} )=L(\mathbf {x} ,\mathbf {\alpha } )=f(\mathbf {x} )+\mathbf {\alpha } ^{\top }{\begin{pmatrix}\mathbf {g} (\mathbf {x} )\\\mathbf {h} (\mathbf {x} )\end{pmatrix}}} where g ( x ) = [ g 1 ( x ) ⋮ g i ( x ) ⋮ g m ( x ) ] , h ( x ) = [ h 1 ( x ) ⋮ h j ( x ) ⋮ h ℓ ( x ) ] , μ = [ μ 1 ⋮ μ i ⋮ μ m ] , λ = [ λ 1 ⋮ λ j ⋮ λ ℓ ] and α = [ μ λ ] . {\displaystyle \mathbf {g} \left(\mathbf {x} \right)={\begin{bmatrix}g_{1}\left(\mathbf {x} \right)\\\vdots \\g_{i}\left(\mathbf {x} \right)\\\vdots \\g_{m}\left(\mathbf {x} \right)\end{bmatrix}},\quad \mathbf {h} \left(\mathbf {x} \right)={\begin{bmatrix}h_{1}\left(\mathbf {x} \right)\\\vdots \\h_{j}\left(\mathbf {x} \right)\\\vdots \\h_{\ell }\left(\mathbf {x} \right)\end{bmatrix}},\quad \mathbf {\mu } ={\begin{bmatrix}\mu _{1}\\\vdots \\\mu _{i}\\\vdots \\\mu _{m}\\\end{bmatrix}},\quad \mathbf {\lambda } ={\begin{bmatrix}\lambda _{1}\\\vdots \\\lambda _{j}\\\vdots \\\lambda _{\ell }\end{bmatrix}}\quad {\text{and}}\quad \mathbf {\alpha } ={\begin{bmatrix}\mu \\\lambda \end{bmatrix}}.} The Karush–Kuhn–Tucker theorem then states

6435-695: The exterior product of the columns of the matrix of   K x ∗   , {\displaystyle \ K_{x}^{\ast }\ ,} the stationary condition for   f | N   {\displaystyle \ f|_{N}\ } at   x   {\displaystyle \ x\ } becomes L x ∧ ω x = 0 ∈ Λ p + 1 ( T x ∗ M ) {\displaystyle L_{x}\wedge \omega _{x}=0\in \Lambda ^{p+1}\left(T_{x}^{\ast }M\right)} Once again, in this formulation it

6534-1182: The first possibility (we touch a contour line of f ), notice that since the gradient of a function is perpendicular to the contour lines, the tangents to the contour lines of f and g are parallel if and only if the gradients of f and g are parallel. Thus we want points ( x , y ) where g ( x , y ) = c and ∇ x , y f = λ ∇ x , y g , {\displaystyle \nabla _{x,y}f=\lambda \,\nabla _{x,y}g,} for some λ {\displaystyle \lambda } where ∇ x , y f = ( ∂ f ∂ x , ∂ f ∂ y ) , ∇ x , y g = ( ∂ g ∂ x , ∂ g ∂ y ) {\displaystyle \nabla _{x,y}f=\left({\frac {\partial f}{\partial x}},{\frac {\partial f}{\partial y}}\right),\qquad \nabla _{x,y}g=\left({\frac {\partial g}{\partial x}},{\frac {\partial g}{\partial y}}\right)} are

6633-500: The following four groups of conditions hold: The last condition is sometimes written in the equivalent form: μ i g i ( x ∗ ) = 0 ,  for  i = 1 , … , m . {\displaystyle \mu _{i}g_{i}(x^{*})=0,{\text{ for }}i=1,\ldots ,m.} In the particular case m = 0 {\displaystyle m=0} , i.e., when there are no inequality constraints,

6732-934: The following optimization problem such that, for the matrix of partial derivatives [ D ⁡ g ( x ⋆ ) ] j , k =   ∂ g j   ∂ x k {\displaystyle {\Bigl [}\operatorname {D} g(x_{\star }){\Bigr ]}_{j,k}={\frac {\ \partial g_{j}\ }{\partial x_{k}}}} , rank ⁡ ( D ⁡ g ( x ⋆ ) ) = c ≤ n {\displaystyle \operatorname {rank} (\operatorname {D} g(x_{\star }))=c\leq n} : maximize  f ( x ) subject to:  g ( x ) = 0 {\displaystyle {\begin{aligned}&{\text{maximize }}f(x)\\&{\text{subject to: }}g(x)=0\end{aligned}}} Then there exists

6831-682: The following. Theorem  —  (sufficiency) If ( x ∗ , α ∗ ) {\displaystyle (\mathbf {x} ^{\ast },\mathbf {\alpha } ^{\ast })} is a saddle point of L ( x , α ) {\displaystyle L(\mathbf {x} ,\mathbf {\alpha } )} in x ∈ X {\displaystyle \mathbf {x} \in \mathbf {X} } , μ ≥ 0 {\displaystyle \mathbf {\mu } \geq \mathbf {0} } , then x ∗ {\displaystyle \mathbf {x} ^{\ast }}

6930-459: The force coming from ∂ g i ( x ∗ ) {\displaystyle \partial g_{i}(x^{*})} must be zero i.e., μ i ( x ∗ ) = 0 {\displaystyle \mu _{i}(x^{*})=0} , since the particle is not on the boundary, the one-sided constraint force cannot activate. The necessary conditions can be written with Jacobian matrices of

7029-567: The form of Pontryagin's minimum principle . The fact that solutions of the method of Lagrange multipliers are not necessarily extrema of the Lagrangian, also poses difficulties for numerical optimization. This can be addressed by minimizing the magnitude of the gradient of the Lagrangian, as these minima are the same as the zeros of the magnitude, as illustrated in Example 5: Numerical optimization . The method of Lagrange multipliers can be extended to solve problems with multiple constraints using

7128-539: The function is 0 in every feasible direction. For the case of only one constraint and only two choice variables (as exemplified in Figure 1), consider the optimization problem maximize x , y f ( x , y ) subject to g ( x , y ) = 0. {\displaystyle {\begin{aligned}{\underset {x,y}{\text{maximize}}}\quad &f(x,y)\\{\text{subject to}}\quad &g(x,y)=0.\end{aligned}}} (Sometimes an additive constant

7227-401: The image of   K x ∗ : R p ∗ → T x ∗ M   . {\displaystyle \ K_{x}^{\ast }:\mathbb {R} ^{p\ast }\to T_{x}^{\ast }M~.} Computationally speaking, the condition is that L x {\displaystyle L_{x}} belongs to the row space of

7326-420: The inequality constraints g j {\displaystyle g_{j}} are differentiable convex functions , the equality constraints h i {\displaystyle h_{i}} are affine functions , and Slater's condition holds. Similarly, if the objective function f {\displaystyle f} of a minimization problem is a differentiable convex function ,

7425-681: The kernel   ker ⁡ ( d ⁡ f x )   {\displaystyle \ \ker(\operatorname {d} f_{x})\ } contains   T x N = ker ⁡ ( d ⁡ g x )   . {\displaystyle \ T_{x}N=\ker(\operatorname {d} g_{x})~.} In other words,   d ⁡ f x   {\displaystyle \ \operatorname {d} f_{x}\ } and   d ⁡ g x   {\displaystyle \ \operatorname {d} g_{x}\ } are proportional 1-forms. For this it

7524-853: The left-hand side of the equation belongs to the subvariety of   Λ 2 ( T x ∗ M )   {\displaystyle \ \Lambda ^{2}(T_{x}^{\ast }M)\ } consisting of decomposable elements . In this formulation, it is not necessary to explicitly find the Lagrange multiplier, a number   λ   {\displaystyle \ \lambda \ } such that   d ⁡ f x = λ ⋅ d ⁡ g x   . {\displaystyle \ \operatorname {d} f_{x}=\lambda \cdot \operatorname {d} g_{x}~.} Let   M   {\displaystyle \ M\ } and   f   {\displaystyle \ f\ } be as in

7623-494: The matrix of   K x   , {\displaystyle \ K_{x}\ ,} or equivalently the column space of the matrix of K x ∗ {\displaystyle K_{x}^{\ast }} (the transpose). If   ω x ∈ Λ p ( T x ∗ M )   {\displaystyle \ \omega _{x}\in \Lambda ^{p}(T_{x}^{\ast }M)\ } denotes

7722-600: The maximum. Viewed in this way, it is an exact analogue to testing if the derivative of an unconstrained function is 0 , that is, we are verifying that the directional derivative is 0 in any relevant (viable) direction. We can visualize contours of f given by f ( x , y ) = d for various values of d , and the contour of g given by g ( x , y ) = c . Suppose we walk along the contour line with g = c . We are interested in finding points where f almost does not change as we walk, since these points might be maxima. There are two ways this could happen: To check

7821-587: The method of Lagrange multipliers is generalized by the Karush–Kuhn–Tucker conditions , which can also take into account inequality constraints of the form h ( x ) ≤ c {\displaystyle h(\mathbf {x} )\leq c} for a given constant c {\displaystyle c} . The following is known as the Lagrange multiplier theorem. Let f : R n → R {\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} } be

7920-502: The minimum occurs at   ( − 1 2 , − 1 2 )   . {\displaystyle \ \left(-{\tfrac {1}{\sqrt {2}}},-{\tfrac {1}{\sqrt {2}}}\right)~.} For the method of Lagrange multipliers, the constraint is g ( x , y ) = x 2 + y 2 − 1 = 0   , {\displaystyle g(x,y)=x^{2}+y^{2}-1=0\ ,} hence

8019-552: The multipliers. The Karush–Kuhn–Tucker theorem is sometimes referred to as the saddle-point theorem. The KKT conditions were originally named after Harold W. Kuhn and Albert W. Tucker , who first published the conditions in 1951. Later scholars discovered that the necessary conditions for this problem had been stated by William Karush in his master's thesis in 1939. Consider the following nonlinear optimization problem in standard form : where x ∈ X {\displaystyle \mathbf {x} \in \mathbf {X} }

8118-530: The necessary conditions are also sufficient for optimality. It was shown by Martin in 1985 that the broader class of functions in which KKT conditions guarantees global optimality are the so-called Type 1 invex functions . For smooth, non-linear optimization problems, a second order sufficient condition is given as follows. The solution x ∗ , λ ∗ , μ ∗ {\displaystyle x^{*},\lambda ^{*},\mu ^{*}} found in

8217-523: The necessary conditions are not sufficient for optimality and additional information is required, such as the Second Order Sufficient Conditions (SOSC). For smooth functions, SOSC involve the second derivatives, which explains its name. The necessary conditions are sufficient for optimality if the objective function f {\displaystyle f} of a maximization problem is a differentiable concave function ,

8316-403: The necessary conditions can be written as: One can ask whether a minimizer point x ∗ {\displaystyle x^{*}} of the original, constrained optimization problem (assuming one exists) has to satisfy the above KKT conditions. This is similar to asking under what conditions the minimizer x ∗ {\displaystyle x^{*}} of

8415-432: The objective function, g : R n → R c {\displaystyle g:\mathbb {R} ^{n}\to \mathbb {R} ^{c}} be the constraints function, both belonging to C 1 {\displaystyle C^{1}} (that is, having continuous first derivatives). Let x ⋆ {\displaystyle x_{\star }} be an optimal solution to

8514-395: The optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the change in the optimal value of the objective function (profit) due to the relaxation of a given constraint (e.g. through a change in income); in such a context   λ ⋆ k   {\displaystyle \ \lambda _{\star k}\ }

8613-839: The original objective function: If we denote values at the optimum with a star ( ⋆ {\displaystyle \star } ), then it can be shown that   d ⁡ f (   x 1 ⋆ ( c 1 , c 2 , … ) ,   x 2 ⋆ ( c 1 , c 2 , … ) ,   …   )   d ⁡ c k = λ ⋆ k   . {\displaystyle {\frac {\ \operatorname {d} f\left(\ x_{1\star }(c_{1},c_{2},\dots ),\ x_{2\star }(c_{1},c_{2},\dots ),\ \dots \ \right)\ }{\operatorname {d} c_{k}}}=\lambda _{\star k}~.} For example, in economics

8712-453: The original problem, as the method of Lagrange multipliers yields only a necessary condition for optimality in constrained problems. Sufficient conditions for a minimum or maximum also exist , but if a particular candidate solution satisfies the sufficient conditions, it is only guaranteed that that solution is the best one locally – that is, it is better than any permissible nearby points. The global optimum can be found by comparing

8811-407: The primal problem and any solution ( μ ∗ , λ ∗ ) {\displaystyle (\mu ^{*},\lambda ^{*})} to the dual problem, the pair x ∗ , ( μ ∗ , λ ∗ ) {\displaystyle x^{*},(\mu ^{*},\lambda ^{*})} must satisfy

8910-513: The proof of the Karush–Kuhn–Tucker theorem makes use of the hyperplane separation theorem . The system of equations and inequalities corresponding to the KKT conditions is usually not solved directly, except in the few special cases where a closed-form solution can be derived analytically. In general, many optimization algorithms can be interpreted as methods for numerically solving the KKT system of equations and inequalities. Suppose that

9009-430: The respective gradients. The constant λ {\displaystyle \lambda } is required because although the two gradient vectors are parallel, the magnitudes of the gradient vectors are generally not equal. This constant is called the Lagrange multiplier. (In some conventions λ {\displaystyle \lambda } is preceded by a minus sign). Notice that this method also solves

9108-454: The restriction   f | N   {\displaystyle \ f|_{N}\ } at   x ∈ N   {\displaystyle \ x\in N\ } means   d ⁡ ( f | N ) x = 0   . {\displaystyle \ \operatorname {d} (f|_{N})_{x}=0~.} Equivalently,

9207-968: The second possibility, that f is level: if f is level, then its gradient is zero, and setting λ = 0 {\displaystyle \lambda =0} is a solution regardless of ∇ x , y g {\displaystyle \nabla _{x,y}g} . To incorporate these conditions into one equation, we introduce an auxiliary function L ( x , y , λ ) ≡ f ( x , y ) + λ ⋅ g ( x , y ) , {\displaystyle {\mathcal {L}}(x,y,\lambda )\equiv f(x,y)+\lambda \cdot g(x,y)\,,} and solve ∇ x , y , λ L ( x , y , λ ) = 0   . {\displaystyle \nabla _{x,y,\lambda }{\mathcal {L}}(x,y,\lambda )=0~.} Note that this amounts to solving three equations in three unknowns. This

9306-426: The set of points satisfying g i ( x ) = 0 , i = 1 , … , M . {\displaystyle g_{i}(\mathbf {x} )=0,i=1,\dots ,M\,.} Every point x {\displaystyle \mathbf {x} } on the contour of a given constraint function g i {\displaystyle g_{i}} has a space of allowable directions:

9405-426: The space of vectors perpendicular to ∇ g i ( x ) . {\displaystyle \nabla g_{i}(\mathbf {x} )\,.} The set of directions that are allowed by all constraints is thus the space of directions perpendicular to all of the constraints' gradients. Denote this space of allowable moves by   A   {\displaystyle \ A\ } and denote

9504-674: The span of the constraints' gradients by S . {\displaystyle S\,.} Then A = S ⊥ , {\displaystyle A=S^{\perp }\,,} the space of vectors perpendicular to every element of S . {\displaystyle S\,.} We are still interested in finding points where f {\displaystyle f} does not change as we walk, since these points might be (constrained) extrema. We therefore seek x {\displaystyle \mathbf {x} } such that any allowable direction of movement away from x {\displaystyle \mathbf {x} }

9603-1502: The tangent map or Jacobian   T M → T R p   {\displaystyle \ TM\to T\mathbb {R} ^{p}~} (   T x R p {\displaystyle \ T_{x}\mathbb {R} ^{p}} can be canonically identified with   R p {\displaystyle \ \mathbb {R} ^{p}} ). The subspace ker ⁡ ( K x ) {\displaystyle \ker(K_{x})} has dimension smaller than that of ker ⁡ ( L x ) {\displaystyle \ker(L_{x})} , namely   dim ⁡ ( ker ⁡ ( L x ) ) = n − 1   {\displaystyle \ \dim(\ker(L_{x}))=n-1\ } and   dim ⁡ ( ker ⁡ ( K x ) ) = n − p   . {\displaystyle \ \dim(\ker(K_{x}))=n-p~.} ker ⁡ ( K x ) {\displaystyle \ker(K_{x})} belongs to   ker ⁡ ( L x )   {\displaystyle \ \ker(L_{x})\ } if and only if L x ∈ T x ∗ M {\displaystyle L_{x}\in T_{x}^{\ast }M} belongs to

9702-413: The values of the original objective function at the points satisfying the necessary and locally sufficient conditions. The method of Lagrange multipliers relies on the intuition that at a maximum, f ( x , y ) cannot be increasing in the direction of any such neighboring point that also has g = 0 . If it were, we could walk along g = 0 to get higher, meaning that the starting point wasn't actually

9801-422: Was given that d R / d Q {\displaystyle {\text{d}}R/{\text{d}}Q} and d C / d Q {\displaystyle {\text{d}}C/{\text{d}}Q} are strictly positive, this inequality along with the non-negativity condition on μ {\displaystyle \mu } guarantees that μ {\displaystyle \mu }

#496503