
Levenberg–Marquardt algorithm

Article snapshot taken from Wikipedia, available under the Creative Commons Attribution-ShareAlike license.

In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, meaning that in many cases it finds a solution even if it starts very far from the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. The LMA can also be viewed as Gauss–Newton using a trust region approach.
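The interpolation between Gauss–Newton and gradient descent can be made concrete with a minimal sketch of the iteration. This is an illustrative toy implementation, not a reference one: the function names, the simple accept/reject rule, and the up/down damping schedule are all assumptions chosen for brevity.

```python
import numpy as np

def levenberg_marquardt(f, jac, beta0, y, lam=1e-3, nu=2.0, max_iter=100, tol=1e-10):
    """Minimal LM sketch: minimize ||y - f(beta)||^2.

    `f` maps parameters to model predictions and `jac` returns the
    m x n Jacobian of `f`. Small `lam` behaves like Gauss-Newton,
    large `lam` like a short gradient-descent step.
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = y - f(beta)                  # current residuals
        J = jac(beta)
        A = J.T @ J
        g = J.T @ r
        # Damped normal equations: (J^T J + lam * I) delta = J^T r
        delta = np.linalg.solve(A + lam * np.eye(A.shape[0]), g)
        if np.linalg.norm(delta) < tol:
            break
        r_new = y - f(beta + delta)
        if r_new @ r_new < r @ r:        # step reduced the residual
            beta = beta + delta
            lam /= nu                    # trust the model more (toward GN)
        else:
            lam *= nu                    # damp harder (toward gradient descent)
    return beta
```

For a linear model the damped step is nearly the exact least-squares solution, so convergence takes only a few iterations.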


The algorithm was first published in 1944 by Kenneth Levenberg, while working at the Frankford Army Arsenal. It was rediscovered in 1963 by Donald Marquardt, who worked as a statistician at DuPont, and independently by Girard, Wynne and Morrison. The LMA is used in many software applications for solving generic curve-fitting problems. By using the Gauss–Newton algorithm it often converges faster than first-order methods. However, like other iterative optimization algorithms,

a greatest element m, then m is a maximal element of the set, also denoted max(S). Furthermore, if S is a subset of an ordered set T and m is the greatest element of S with respect to the order induced by T, then m is a least upper bound of S in T. Similar results hold for the least element, minimal element and greatest lower bound. The maximum and minimum functions for sets are used in databases, and can be computed rapidly, since

a set are the greatest and least elements in the set, respectively. Unbounded infinite sets, such as the set of real numbers, have no minimum or maximum. In statistics, the corresponding concept is the sample maximum and minimum. A real-valued function f defined on a domain X has a global (or absolute) maximum point at x∗, if f(x∗) ≥ f(x) for all x in X. Similarly,

a bounded differentiable function f defined on a closed interval in the real line has a single critical point, which is a local minimum, then it is also a global minimum (use the intermediate value theorem and Rolle's theorem to prove this by contradiction). In two and more dimensions, this argument fails. This is illustrated by the function whose only critical point is at (0,0), which is

a domain must occur at critical points (or points where the derivative equals zero). However, not all critical points are extrema. One can often distinguish whether a critical point is a local maximum, a local minimum, or neither by using the first derivative test, second derivative test, or higher-order derivative test, given sufficient differentiability. For any function that is defined piecewise, one finds

a factor of 2 and a decrease by a factor of 3 has been shown to be effective in most cases, while for large problems more extreme values can work better, with an increase by a factor of 1.5 and a decrease by a factor of 5. When interpreting the Levenberg–Marquardt step as the velocity v_k along a geodesic path in the parameter space, it
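The delayed-gratification damping schedule discussed above fits in a one-line rule. The function name and the boolean flag are illustrative; the default factors are the increase-by-2 / decrease-by-3 values cited for most cases.

```python
def update_damping(lam, step_improved, up=2.0, down=3.0):
    """Delayed gratification: raise the damping by a small factor
    after an uphill (rejected) step, lower it by a larger factor
    after a downhill (accepted) step."""
    return lam / down if step_improved else lam * up
```

For large problems the text suggests the more extreme pair `up=1.5, down=5.0`.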

a local minimum with f(0,0) = 0. However, it cannot be a global one, because f(2,3) = −5. If the domain of a function for which an extremum is to be found consists itself of functions (i.e. if an extremum is to be found of a functional), then the extremum is found using the calculus of variations. Maxima and minima can also be defined for sets. In general, if an ordered set S has

a maximum (or minimum) by finding the maximum (or minimum) of each piece separately, and then seeing which one is greatest (or least). For a practical example, assume a situation where someone has 200 feet of fencing and is trying to maximize the square footage of a rectangular enclosure, where x is the length, y

a minimum point. An important example is a function whose domain is a closed and bounded interval of real numbers (see the graph above). Finding global maxima and minima is the goal of mathematical optimization. If a function is continuous on a closed interval, then by the extreme value theorem, global maxima and minima exist. Furthermore, a global maximum (or minimum) either must be a local maximum (or minimum) in

a modified problem with each component of the gradient scaled according to the curvature. This provides larger movement along the directions where the gradient is smaller, which avoids slow convergence in the direction of small gradient. Fletcher, in his 1971 paper A modified Marquardt subroutine for non-linear least squares, simplified the form, replacing the identity matrix I with

a set can have at most one minimal element and at most one maximal element. Then, due to mutual comparability, the minimal element will also be the least element, and the maximal element will also be the greatest element. Thus in a totally ordered set, we can simply use the terms minimum and maximum. If a chain is finite, then it will always have a maximum and a minimum. If a chain is infinite, then it need not have


a worse residual, but using λ resulted in a better residual, then λ is left unchanged and the new optimum is taken as the value obtained with λ as damping factor. An effective strategy for

a zero gradient with respect to β. The above first-order approximation of f(x_i, β + δ) gives S(β + δ) ≈ Σ_i [y_i − f(x_i, β) − J_i δ]², or in vector notation, S(β + δ) ≈ ‖y − f(β) − J δ‖². Taking

is a strict local maximum point if there exists some ε > 0 such that, for all x in X within distance ε of x∗ with x ≠ x∗, we have f(x∗) > f(x). Note that a point is a strict global maximum point if and only if it is the unique global maximum point, and similarly for minimum points. A continuous real-valued function with a compact domain always has a maximum point and

is a topological space, since the definition just given can be rephrased in terms of neighbourhoods. Mathematically, the given definition is written as follows: The definition of local minimum point can also proceed similarly. In both the global and local cases, the concept of a strict extremum can be defined. For example, x∗ is a strict global maximum point if for all x in X with x ≠ x∗, we have f(x∗) > f(x), and x∗

is adjusted at each iteration. If reduction of S is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss–Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, λ can be increased, giving a step closer to

is an example of very sensitive initial conditions for the Levenberg–Marquardt algorithm. One reason for this sensitivity is the existence of multiple minima: the function cos(βx) has minima at the best-fit parameter value β̂ and at β̂ + 2nπ.

Kenneth Levenberg

Kenneth Levenberg (August 6, 1919 – September 1973)

is especially useful when the algorithm is moving through narrow canyons in the landscape of the objective function, where the allowed steps are smaller and the higher accuracy due to the second order term gives significant improvements. In this example we try to fit the function y = a cos(bX) + b sin(aX) using

is minimized: Like other numeric minimization algorithms, the Levenberg–Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector β. In cases with only one minimum, an uninformed standard guess like βᵀ = (1, 1, …, 1) will work fine; in cases with multiple minima,

is not necessary, as the update is well-approximated by the small gradient step λ⁻¹ Jᵀ [y − f(β)]. To make the solution scale invariant, Marquardt's algorithm solved
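The approximation of the damped step by a small gradient step for large λ is easy to verify numerically. In this sketch the Jacobian and residual are random placeholders, not data from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(30, 4))   # stand-in m x n Jacobian
r = rng.normal(size=30)        # stand-in residual y - f(beta)
lam = 1e6                      # strong damping

# Exact damped step: (J^T J + lam * I) delta = J^T r
exact = np.linalg.solve(J.T @ J + lam * np.eye(4), J.T @ r)
# Large-lambda approximation: a small gradient step
approx = (J.T @ r) / lam

rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

The relative error scales like ‖JᵀJ‖/λ, so it shrinks as the damping grows.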

is possible to improve the method by adding a second-order term that accounts for the acceleration a_k along the geodesic, where a_k is the solution of a corresponding linear system. Since this geodesic acceleration term depends only on the directional derivative f_vv = Σ_{μν} v_μ v_ν ∂_μ ∂_ν f(x) along


is restricted. Since width is positive, then x > 0, and since x = 100 − y, it follows that x < 100. Plug in the critical point 50, as well as the endpoints 0 and 100, into xy = x(100 − x), and

is the Jacobian matrix, whose i-th row equals J_i, and where f(β) and y are vectors with i-th components f(x_i, β) and y_i respectively. The above expression obtained for β comes under

is the width, and xy is the area. With 200 feet of fencing, 2x + 2y = 200, so y = 100 − x and the area is xy = x(100 − x) = 100x − x². The derivative with respect to x is 100 − 2x. Setting this equal to 0 reveals that x = 50 is our only critical point. Now retrieve the endpoints by determining the interval to which x

is to replace this equation by a "damped version", (JᵀJ + λI) δ = Jᵀ [y − f(β)], where I is the identity matrix, giving as the increment δ to the estimated parameter vector β. The (non-negative) damping factor λ

the figure on the right, the necessary conditions for a local maximum are similar to those of a function with only one variable. The first partial derivatives with respect to z (the variable to be maximized) are zero at the maximum (the glowing dot on top in the figure). The second partial derivatives are negative. These are only necessary, not sufficient, conditions for a local maximum, because of

the Gauss–Newton method. The Jacobian matrix as defined above is not (in general) a square matrix, but a rectangular matrix of size m × n, where n is the number of parameters (the size of the vector β). The matrix multiplication JᵀJ yields

the LMA finds only a local minimum, which is not necessarily the global minimum. The primary application of the Levenberg–Marquardt algorithm is in the least-squares curve fitting problem: given a set of m empirical pairs (x_i, y_i) of independent and dependent variables, find

the Levenberg–Marquardt algorithm implemented in GNU Octave as the leasqr function. The graphs show progressively better fitting for the parameters a = 100, b = 102 used in the initial curve. Only when the parameters in the last graph are chosen closest to the original do the curves fit exactly. This equation
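Assuming SciPy's MINPACK-based solver as a stand-in for Octave's leasqr, the same fit can be sketched as below. The data range and starting guess are illustrative choices, not the article's exact setup; as the text notes, the problem is extremely sensitive, so the guess must be close to the true parameters.

```python
import numpy as np
from scipy.optimize import least_squares

a_true, b_true = 100.0, 102.0
x = np.linspace(0.0, 0.5, 200)
y = a_true * np.cos(b_true * x) + b_true * np.sin(a_true * x)

def residuals(p):
    a, b = p
    return a * np.cos(b * x) + b * np.sin(a * x) - y

# method="lm" dispatches to MINPACK's Levenberg-Marquardt routine;
# a guess far from (100, 102) would land in one of the many local minima.
fit = least_squares(residuals, x0=[100.1, 101.9], method="lm")
```

With noiseless data and a guess in the right basin, the fit recovers a and b essentially exactly.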

the acceleration may point in the opposite direction to the velocity. To prevent it from stalling the method in case the damping is too small, an additional criterion on the acceleration is added in order to accept a step, requiring that the acceleration stay small relative to the velocity, where the bounding constant α is usually fixed to a value less than 1, with smaller values for harder problems. The addition of a geodesic acceleration term can allow a significant increase in convergence speed, and it

the algorithm converges to the global minimum only if the initial guess is already somewhat close to the final solution. In each iteration step, the parameter vector β is replaced by a new estimate β + δ. To determine δ,


the algorithm, therefore requiring only one additional function evaluation to compute f(x + hδ). The choice of the finite difference step h can affect the stability of the algorithm, and a value of around 0.1 is usually reasonable in general. Since

the best choice for the damping parameter λ. Theoretical arguments exist showing why some of these choices guarantee local convergence of the algorithm; however, these choices can make the global convergence of the algorithm suffer from the undesirable properties of steepest descent, in particular, very slow convergence close to

the control of the damping parameter, called delayed gratification, consists of increasing the parameter by a small amount for each uphill step, and decreasing it by a large amount for each downhill step. The idea behind this strategy is to avoid moving downhill too fast in the beginning of optimization, which would restrict the steps available in later iterations and therefore slow down convergence. An increase by

the damping factor λ/ν results in a reduction in squared residual, then this is taken as the new value of λ (and the new optimum location is taken as that obtained with this damping factor) and the process continues; if using λ/ν resulted in

the derivative of this approximation of S(β + δ) with respect to δ and setting the result to zero gives (JᵀJ) δ = Jᵀ [y − f(β)], where J

the diagonal matrix consisting of the diagonal elements of JᵀJ, giving (JᵀJ + λ diag(JᵀJ)) δ = Jᵀ [y − f(β)]. A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems, as well as in ridge regression, an estimation technique in statistics. Various more or less heuristic arguments have been put forward for
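One step with Fletcher's scaled damping can be sketched as follows (the function name is illustrative). A useful property to observe is that rescaling a parameter, i.e. rescaling a column of J, rescales the corresponding component of the step inversely, so the method is scale invariant:

```python
import numpy as np

def lm_step_scaled(J, r, lam):
    """One LM step with Fletcher's scaling:
    solve (J^T J + lam * diag(J^T J)) delta = J^T r."""
    A = J.T @ J
    D = np.diag(np.diag(A))   # diagonal of J^T J, as a matrix
    return np.linalg.solve(A + lam * D, J.T @ r)
```

With lam = 0 this reduces to the plain Gauss–Newton step.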

the direction of the velocity v, it does not require computing the full second-order derivative matrix, requiring only a small overhead in terms of computational cost. Since the second-order derivative can be a fairly complex expression, it can be convenient to replace it with a finite difference approximation, in which f(x) and J have already been computed by
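Under the Taylor expansion f(x + hv) ≈ f(x) + hJv + (h²/2) f_vv, the second directional derivative can be estimated from one extra function evaluation, as the surrounding text describes. A sketch (function and argument names are illustrative):

```python
import numpy as np

def fvv_finite_difference(f, x, v, Jv, h=0.1):
    """Approximate the second directional derivative f_vv along v:
        f(x + h v) ~ f(x) + h J v + (h^2 / 2) f_vv
    so  f_vv ~ (2 / h) * ((f(x + h v) - f(x)) / h - J v).
    Only one additional evaluation of f is needed; J v reuses the
    already-computed Jacobian."""
    return (2.0 / h) * ((f(x + h * v) - f(x)) / h - Jv)
```

For a quadratic model function the approximation is exact for any step size h.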

the domain X is a metric space, then f is said to have a local (or relative) maximum point at the point x∗, if there exists some ε > 0 such that f(x∗) ≥ f(x) for all x in X within distance ε of x∗. Similarly, the function has a local minimum point at x∗, if f(x∗) ≤ f(x) for all x in X within distance ε of x∗. A similar definition can be used when X

the function f(x_i, β + δ) is approximated by its linearization f(x_i, β + δ) ≈ f(x_i, β) + J_i δ, where J_i = ∂f(x_i, β)/∂β is the gradient (a row-vector in this case) of f with respect to β. The sum S(β) of square deviations has its minimum at

the function has a global (or absolute) minimum point at x∗, if f(x∗) ≤ f(x) for all x in X. The value of the function at a maximum point is called the maximum value of the function, denoted max(f(x)), and the value of the function at a minimum point is called the minimum value of the function (denoted min(f(x)) for clarity). Symbolically, this can be written as follows: The definition of global minimum point also proceeds similarly. If


the function. Known generically as extrema, they may be defined either within a given range (the local or relative extrema) or on the entire domain (the global or absolute extrema) of a function. Pierre de Fermat was one of the first mathematicians to propose a general technique, adequality, for finding the maxima and minima of functions. As defined in set theory, the maximum and minimum of

the gradient-descent direction. Note that the gradient of S with respect to β equals −2 (Jᵀ [y − f(β)])ᵀ. Therefore, for large values of λ,

the initial point, then the damping is increased by successive multiplication by ν until a better point is found with a new damping factor of λ₀νᵏ for some k. If use of

the interior of the domain, or must lie on the boundary of the domain. So a method of finding a global maximum (or minimum) is to look at all the local maxima (or minima) in the interior, also look at the maxima (or minima) of the points on the boundary, and take the greatest (or least) one. For differentiable functions, Fermat's theorem states that local extrema in the interior of

the last parameter vector β is considered to be the solution. When the damping factor λ is large relative to ‖JᵀJ‖, inverting JᵀJ + λI

the mathematics department at the University of Hawaiʻi at Hilo and died in Hawaii in 1973. Levenberg was listed in American Men of Science in 1970.

Local minimum

In mathematical analysis, the maximum and minimum of a function are, respectively, the greatest and least value taken by

the maximum (or minimum) of a set can be computed from the maxima of a partition; formally, they are self-decomposable aggregation functions. In the case of a general partial order, the least element (i.e., one that is less than all others) should not be confused with a minimal element (nothing is lesser). Likewise, a greatest element of a partially ordered set (poset) is an upper bound of

the optimum. The absolute values of any choice depend on how well-scaled the initial problem is. Marquardt recommended starting with a value λ₀ and a factor ν > 1. Initially setting λ = λ₀ and computing

the parameters β of the model curve f(x, β) so that the sum of the squares of the deviations S(β)

the possibility of a saddle point. For these conditions to be used to solve for a maximum, the function z must also be differentiable throughout. The second partial derivative test can help classify the point as a relative maximum or relative minimum. In contrast, there are substantial differences between functions of one variable and functions of more than one variable in the identification of global extrema. For example, if


the required n × n square matrix, and the matrix-vector product on the right-hand side yields a vector of size n. The result is a set of n linear equations, which can be solved for δ. Levenberg's contribution

the residual sum of squares S(β) after one step from the starting point, first with the damping factor λ = λ₀ and secondly with λ₀/ν. If both of these are worse than

the results are 2500, 0, and 0 respectively. Therefore, the greatest area attainable with a rectangle of 200 feet of fencing is 50 × 50 = 2500. For functions of more than one variable, similar conditions apply. For example, in
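The comparison of the critical point against the endpoints in the fencing example can be checked mechanically:

```python
def area(x):
    """Area x * y of the rectangular pen with perimeter 200, so y = 100 - x."""
    return x * (100 - x)

# Evaluate the objective at the critical point and both endpoints.
candidates = {x: area(x) for x in (0, 50, 100)}
best = max(candidates, key=candidates.get)
```

The dictionary holds {0: 0, 50: 2500, 100: 0}, so the maximum is attained at x = 50.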

the set which is contained within the set, whereas a maximal element m of a poset A is an element of A such that if m ≤ b (for any b in A), then m = b. Any least element or greatest element of a poset is unique, but a poset can have several minimal or maximal elements. If a poset has more than one maximal element, then these elements will not be mutually comparable. In a totally ordered set, or chain, all elements are mutually comparable, so such

the step will be taken approximately in the direction opposite to the gradient. If either the length of the calculated step δ or the reduction of the sum of squares from the latest parameter vector β + δ falls below predefined limits, iteration stops, and

was an American statistician and original author of the widely used nonlinear least squares fitting algorithm later improved by Donald Marquardt, known as the Levenberg–Marquardt algorithm. Levenberg first published the algorithm in 1944 while working at the Frankford Arsenal. He later worked for Boeing, where he developed mathematical models used to design the Boeing 737. He ended his career in
