Levenberg-Marquardt algorithm (with numeric Jacobians)

1. Introduction

This page first describes the Levenberg-Marquardt optimization algorithm, then shows how to use its implementation within the mrpt-base C++ library. All the source code discussed here, both the implementation of the algorithm itself and the examples, is available for download within the MRPT packages.

The following notation and algorithm have been extracted from the report [1].

2. Algorithm description

The Levenberg-Marquardt (LM) method consists of an iterative least-squares minimization of a cost function, based on a modification of the Gauss-Newton method. Let us state the problem formally before defining the algorithm. We will assume that closed-form derivatives of the cost function are not available, so they will be estimated by finite-difference approximation.

Let \mathbf{x} \in \mathcal{R}^n be the parameter vector to be optimized. We want to find the optimal \mathbf{x}^\star that minimizes the scalar error function F(\cdot):

\mathbf{x}^\star = \arg \min_\mathbf{x} F(\mathbf{x})

F(\mathbf{x}) = \frac{1}{2} || \mathbf{f}(\mathbf{x}) ||^2 = \frac{1}{2} \mathbf{f}^T(\mathbf{x}) \mathbf{f}(\mathbf{x})

The function \mathbf{f}: \mathcal{R}^n \to \mathcal{R}^m may sometimes include a comparison to some reference, or observed, data. A very simple, linear example would be \mathbf{f}(\mathbf{x}) = \mathbf{b} - \mathbf{A}\mathbf{x}. However, in the following we assume \mathbf{f}(\cdot) can have any form:

\mathbf{f}(\mathbf{x}) = \left( f_1(\mathbf{x}) ~~ \ldots ~~ f_m(\mathbf{x}) \right) ^T
We define the Jacobian of the error functions as the $ m \times n $ matrix:

\mathbf{J}_{ij}(\mathbf{x}) = \frac{\partial f_i}{\partial x_j}, ~~~~ i=1,\ldots,m ~~~~ j=1,\ldots,n

The Hessian of the error function is the $ n \times n $ matrix of second-order derivatives ($n$ being the length of the parameter vector). In the Gauss-Newton and LM methods it is approximated by:

\mathbf{H}(x) = \mathbf{J}(x)^T \mathbf{J}(x)

If we don't have closed-form expressions for the derivatives needed for the Jacobian, we can estimate them from finite differences, using a small increment \Delta x_j for each individual variable:

\mathbf{J}_{ij}(\mathbf{x}) \simeq \frac{ f_i(\mathbf{x}+ \Delta x_j \mathbf{e}_j) - f_i(\mathbf{x}- \Delta x_j \mathbf{e}_j) }{2\Delta x_j}

where \mathbf{e}_j denotes the $j$-th column of the identity matrix, so only the $j$-th variable is perturbed in each central difference.
Then, the LM method minimizes the following approximation of the actual error function, obtained by linearizing \mathbf{f}(\cdot) around the current estimate:

F(\mathbf{x}+\mathbf{h}) \simeq L(\mathbf{h}) = F(\mathbf{x}) + \mathbf{h}^T \mathbf{g}(x) + \frac{1}{2} \mathbf{h}^T \mathbf{H}(x) \mathbf{h}
where the gradient \mathbf{g}(x) is given by \mathbf{J}(x)^T \mathbf{f}(x) .

Now, denote by \mathbf{x}^\star_{t}, for $t=0,1,2,\ldots$, the sequence of iterative approximations to the optimal set of parameters \mathbf{x}^\star. The initial guess \mathbf{x}^\star_0 must be provided by the user. Then, each iteration of the LM method performs:

\mathbf{x}^\star_{t+1} = \mathbf{x}^\star_{t} + \mathbf{h}_{lm}
where each step is obtained from:

 \mathbf{h}_{lm} = - ( \mathbf{H}(x) + \lambda \mathbf{I} )^{-1} \mathbf{g}(x)
The damping factor $\lambda$ is adapted dynamically according to a heuristic rule, as shown in the listing of the whole algorithm below. Basically, the method iterates until a maximum number of iterations is reached or the change in the parameter vector becomes very small.


\begin{array}{l}
k = 0 ~;~~ v = 2 \\
\mathbf{H} = \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x}) \\
\mathbf{g} = \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) \\
found = (|| \mathbf{g} ||_\infty \leq \epsilon_1) \\
\lambda = \tau \max(diag(\mathbf{H})) \\
while ~ (\text{not } found) ~ and ~ (k < k_{max}) \\
\quad k = k + 1 \\
\quad \mathbf{h}_{lm} = - ( \mathbf{H} + \lambda \mathbf{I} )^{-1} \mathbf{g} \\
\quad if ~ ( ||\mathbf{h}_{lm}|| < \epsilon_2 ( ||\mathbf{x}|| + \epsilon_2) ) \\
\quad\quad found = true \\
\quad else \\
\quad\quad \mathbf{x}' = \mathbf{x} + \mathbf{h}_{lm} \\
\quad\quad l = \frac{ F(\mathbf{x})-F(\mathbf{x}') }{L(\mathbf{0})-L(\mathbf{h}_{lm})} = \frac{ F(\mathbf{x})-F(\mathbf{x}') }{\frac{1}{2} \mathbf{h}_{lm}^T ( \lambda \mathbf{h}_{lm} - \mathbf{g}) } \\
\quad\quad if ~ (l>0) \\
\quad\quad\quad \mathbf{x} = \mathbf{x}' \\
\quad\quad\quad \mathbf{H} = \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x}) \\
\quad\quad\quad \mathbf{g} = \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) \\
\quad\quad\quad found = (|| \mathbf{g} ||_\infty \leq \epsilon_1) \\
\quad\quad\quad \lambda = \lambda \max \{ \frac{1}{3}, 1- (2l-1)^3 \} \\
\quad\quad\quad v = 2 \\
\quad\quad else \\
\quad\quad\quad \lambda = \lambda v \\
\quad\quad\quad v = 2 v \\
\quad\quad end \\
\quad end \\
end
\end{array}

3. C++ Implementation

The LM algorithm is implemented in the C++ template class mrpt::math::CLevenbergMarquardtTempl<T>, and there is an example in MRPT/samples/optimize-lm, which is described next.

The type mrpt::math::CLevenbergMarquardt is a shortcut for the template instantiation CLevenbergMarquardtTempl<double>.

The image below represents the result for this example. The displayed equation is the error function, \mathbf{f}(\mathbf{x}), in this case one-dimensional:


See the source code of the example here: https://raw.github.com/MRPT/mrpt/master/samples/optimize-lm/test.cpp


[1] K. Madsen, H.B. Nielsen, O. Tingleff, "Methods for Non-Linear Least Squares Problems," Informatics and Mathematical Modelling, Technical University of Denmark, April 2004.