Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function f(x):

f(x)=\|Ax-b\|^2

The minimum of f is obtained when the gradient is 0:

\nabla_x f=2 A^\top(Ax-b)=0.

Whereas linear conjugate gradient seeks a solution to the linear equation A^\top Ax=A^\top b, the nonlinear conjugate gradient method is generally used to find a local minimum of a nonlinear function using its gradient \nabla_x f alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.
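To make this connection concrete, the following minimal sketch (assuming NumPy and SciPy are available; the matrix A and vector b are arbitrary illustrative data) minimizes f(x)=\|Ax-b\|^2 with SciPy's nonlinear conjugate gradient routine and checks that the result agrees with the least-squares solution of A^\top Ax=A^\top b:

    # Sketch: nonlinear CG applied to the quadratic f(x) = ||Ax - b||^2
    # (illustrative data; assumes NumPy and SciPy are installed).
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))        # overdetermined linear model
    b = rng.standard_normal(20)

    f = lambda x: np.sum((A @ x - b) ** 2)  # f(x) = ||Ax - b||^2
    grad = lambda x: 2 * A.T @ (A @ x - b)  # gradient 2 A^T (Ax - b)

    # method="CG" is SciPy's nonlinear conjugate gradient (Polak-Ribiere variant).
    res = minimize(f, np.zeros(5), jac=grad, method="CG", options={"gtol": 1e-8})

    # For this quadratic, the minimizer also solves A^T A x = A^T b.
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(res.x, x_lstsq, atol=1e-6))   # expected: True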

Given a function \displaystyle f(x) of N variables to minimize, its gradient \nabla_x f indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

\Delta x_0=-\nabla_x f (x_0)

with an adjustable step length \alpha and performs a line search in this direction until it reaches the minimum of f:

\alpha_0:=\arg \min_\alpha f(x_0+\alpha \Delta x_0),
x_1=x_0+\alpha_0 \Delta x_0
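For the quadratic example f(x)=\|Ax-b\|^2 above, this first line search can be done in closed form: setting the derivative of f(x_0+\alpha \Delta x_0) with respect to \alpha to zero gives

2 (A\Delta x_0)^\top (Ax_0+\alpha A\Delta x_0-b)=0,
\alpha_0=-\frac{\Delta x_0^\top A^\top (Ax_0-b)}{\Delta x_0^\top A^\top A \Delta x_0}=\frac{\Delta x_0^\top \Delta x_0}{2\,\Delta x_0^\top A^\top A \Delta x_0},

where the last equality uses \Delta x_0=-\nabla_x f(x_0)=2A^\top(b-Ax_0). For a general nonlinear f no such formula is available, and the line search must be carried out numerically.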

After this first iteration in the steepest direction \Delta x_0, the following steps constitute one iteration of moving along a subsequent conjugate direction \Delta s_n, where \Delta s_0=\Delta x_0:

  1. Calculate the steepest direction: \Delta x_n=-\nabla_x f (x_n),
  2. Compute \beta_n according to one of the formulas below,
  3. Update the conjugate direction: \Delta s_n=\Delta x_n+\beta_n \Delta s_{n-1},
  4. Perform a line search: \alpha_n=\arg \min_\alpha f(x_n+\alpha \Delta s_n),
  5. Update the position: x_{n+1}=x_n+\alpha_n \Delta s_n.

With a pure quadratic function the minimum is reached within N iterations (barring roundoff error), but a non-quadratic function makes slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration would turn the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.
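Putting these pieces together, the loop (including the periodic reset to steepest descent and a gradient-based stopping test) might be sketched as follows. This is a minimal illustration rather than a robust implementation: it assumes NumPy and SciPy, delegates the line search to SciPy's scalar minimizer, and takes the choice of \beta_n as a pluggable function implementing one of the formulas given below.

    # Minimal sketch of nonlinear conjugate gradient with periodic restarts.
    import numpy as np
    from scipy.optimize import minimize_scalar

    def nonlinear_cg(f, grad, x0, beta_rule, tol=1e-6, max_iter=1000):
        x = np.asarray(x0, dtype=float)
        n = x.size
        dx = -grad(x)                  # steepest descent direction (Delta x_0)
        s = dx                         # conjugate direction (Delta s_0 = Delta x_0)
        for k in range(max_iter):
            if np.linalg.norm(dx) < tol:        # tolerance criterion on the gradient
                break
            # line search: alpha_k = argmin_alpha f(x + alpha s)
            alpha = minimize_scalar(lambda a: f(x + a * s)).x
            x = x + alpha * s                   # update the position
            dx_new = -grad(x)                   # new steepest descent direction
            if (k + 1) % n == 0:
                beta = 0.0                      # reset to steepest descent every n steps
            else:
                beta = beta_rule(dx_new, dx, s) # one of the formulas given below
            s = dx_new + beta * s               # update the conjugate direction
            dx = dx_new
        return x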

Within a linear approximation, the parameters \alpha and \beta are the same as in the linear conjugate gradient method, but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Three of the best-known formulas for \beta_n are named Fletcher–Reeves (FR), Polak–Ribière (PR), and Hestenes–Stiefel (HS) after their developers. They are given by the following formulas:

  • Fletcher–Reeves:
\beta_{n}^{FR} = \frac{\Delta x_n^\top \Delta x_n}{\Delta x_{n-1}^\top \Delta x_{n-1}},
  • Polak–Ribière:
\beta_{n}^{PR} = \frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})}{\Delta x_{n-1}^\top \Delta x_{n-1}},
  • Hestenes–Stiefel:
\beta_{n}^{HS} = \frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})}{-\Delta s_{n-1}^\top (\Delta x_n-\Delta x_{n-1})}.

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is \beta=\max\{0,\beta^{PR}\}, which provides a direction reset automatically.
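In code, the three formulas and the \max\{0,\beta^{PR}\} variant could be written as small functions matching the beta_rule signature used in the sketch above (dx_new and dx_old stand for \Delta x_n and \Delta x_{n-1}, s_old for \Delta s_{n-1}; only Hestenes–Stiefel uses the previous conjugate direction):

    # Candidate beta formulas for the nonlinear CG sketch above
    # (arguments are NumPy vectors).
    def beta_fr(dx_new, dx_old, s_old):      # Fletcher-Reeves
        return (dx_new @ dx_new) / (dx_old @ dx_old)

    def beta_pr(dx_new, dx_old, s_old):      # Polak-Ribiere
        return (dx_new @ (dx_new - dx_old)) / (dx_old @ dx_old)

    def beta_hs(dx_new, dx_old, s_old):      # Hestenes-Stiefel
        return (dx_new @ (dx_new - dx_old)) / (-(s_old @ (dx_new - dx_old)))

    def beta_pr_plus(dx_new, dx_old, s_old): # max(0, beta_PR): automatic reset
        return max(0.0, beta_pr(dx_new, dx_old, s_old))

For example, nonlinear_cg(f, grad, np.zeros(5), beta_pr_plus) applies the loop above with the automatic-reset Polak–Ribière rule to the quadratic example given earlier.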


Newton-based methods (the Newton–Raphson algorithm and quasi-Newton methods such as BFGS) tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration: Newton's method needs the Hessian (the matrix of second derivatives) in addition to the gradient, and quasi-Newton methods build up an approximation to it from successive gradients. Quasi-Newton methods also require more memory to operate (see also the limited-memory variant, L-BFGS).
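One rough way to see this trade-off (a sketch, not a benchmark; the Rosenbrock function is just a standard test problem shipped with SciPy) is to run SciPy's nonlinear CG and BFGS implementations on the same problem and compare the iteration and gradient-evaluation counts they report:

    # Sketch: compare nonlinear CG and BFGS on the 10-dimensional Rosenbrock function.
    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    x0 = np.full(10, -1.0)
    for method in ("CG", "BFGS"):
        res = minimize(rosen, x0, jac=rosen_der, method=method)
        print(method, "iterations:", res.nit, "gradient evaluations:", res.njev)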


External links

  • William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd edition, Cambridge University Press, 1992; chapter 10, section 6: Conjugate Gradient Methods in Multidimensions.
