Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function f(x):

f(x)=\|Ax-b\|^2

The minimum of f is obtained when the gradient is 0:

\nabla_x f=2 A^\top(Ax-b)=0.

Whereas linear conjugate gradient seeks a solution to the linear equation A^\top Ax=A^\top b, the nonlinear conjugate gradient method is generally used to find a local minimum of a nonlinear function using its gradient \nabla_x f alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.
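To make this connection concrete, the following minimal sketch (assuming NumPy and SciPy are available; the matrix A and vector b are arbitrary illustrative data) minimizes f(x)=\|Ax-b\|^2 with SciPy's nonlinear conjugate gradient routine and checks that the result agrees with the least-squares solution of A^\top Ax=A^\top b:

    # Sketch: nonlinear CG applied to the quadratic f(x) = ||Ax - b||^2
    # (illustrative data; assumes NumPy and SciPy are installed).
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))        # overdetermined linear model
    b = rng.standard_normal(20)

    f = lambda x: np.sum((A @ x - b) ** 2)  # f(x) = ||Ax - b||^2
    grad = lambda x: 2 * A.T @ (A @ x - b)  # gradient 2 A^T (Ax - b)

    # method="CG" is SciPy's nonlinear conjugate gradient (Polak-Ribiere variant).
    res = minimize(f, np.zeros(5), jac=grad, method="CG", options={"gtol": 1e-8})

    # For this quadratic, the minimizer also solves A^T A x = A^T b.
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(res.x, x_lstsq, atol=1e-6))   # expected: True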

Given a function \displaystyle f(x) of N variables to minimize, its gradient \nabla_x f indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

\Delta x_0=-\nabla_x f (x_0)

with an adjustable step length \alpha and performs a line search in this direction until it reaches the minimum of f:

\alpha_0:=\arg \min_\alpha f(x_0+\alpha \Delta x_0),
x_1=x_0+\alpha_0 \Delta x_0
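For the quadratic example f(x)=\|Ax-b\|^2 above, this first line search can be done in closed form: setting the derivative of f(x_0+\alpha \Delta x_0) with respect to \alpha to zero gives

2 (A\Delta x_0)^\top (Ax_0+\alpha A\Delta x_0-b)=0,
\alpha_0=-\frac{\Delta x_0^\top A^\top (Ax_0-b)}{\Delta x_0^\top A^\top A \Delta x_0}=\frac{\Delta x_0^\top \Delta x_0}{2\,\Delta x_0^\top A^\top A \Delta x_0},

where the last equality uses \Delta x_0=-\nabla_x f(x_0)=2A^\top(b-Ax_0). For a general nonlinear f no such formula is available, and the line search must be carried out numerically.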

After this first iteration in the steepest direction \Delta x_0, the following steps constitute one iteration of moving along a subsequent conjugate direction \Delta s_n, where \Delta s_0=\Delta x_0:

  1. Calculate the steepest direction: \Delta x_n=-\nabla_x f (x_n),
  2. Compute \beta_n according to one of the formulas below,
  3. Update the conjugate direction: \Delta s_n=\Delta x_n+\beta_n \Delta s_{n-1},
  4. Perform a line search: \alpha_n=\arg \min_\alpha f(x_n+\alpha \Delta s_n),
  5. Update the position: x_{n+1}=x_n+\alpha_n \Delta s_n.

With a pure quadratic function the minimum is reached within N iterations (barring roundoff error), but a non-quadratic function makes slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration would turn the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.
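Putting these pieces together, the loop (including the periodic reset to steepest descent and a gradient-based stopping test) might be sketched as follows. This is a minimal illustration rather than a robust implementation: it assumes NumPy and SciPy, delegates the line search to SciPy's scalar minimizer, and takes the choice of \beta_n as a pluggable function implementing one of the formulas given below.

    # Minimal sketch of nonlinear conjugate gradient with periodic restarts.
    import numpy as np
    from scipy.optimize import minimize_scalar

    def nonlinear_cg(f, grad, x0, beta_rule, tol=1e-6, max_iter=1000):
        x = np.asarray(x0, dtype=float)
        n = x.size
        dx = -grad(x)                  # steepest descent direction (Delta x_0)
        s = dx                         # conjugate direction (Delta s_0 = Delta x_0)
        for k in range(max_iter):
            if np.linalg.norm(dx) < tol:        # tolerance criterion on the gradient
                break
            # line search: alpha_k = argmin_alpha f(x + alpha s)
            alpha = minimize_scalar(lambda a: f(x + a * s)).x
            x = x + alpha * s                   # update the position
            dx_new = -grad(x)                   # new steepest descent direction
            if (k + 1) % n == 0:
                beta = 0.0                      # reset to steepest descent every n steps
            else:
                beta = beta_rule(dx_new, dx, s) # one of the formulas given below
            s = dx_new + beta * s               # update the conjugate direction
            dx = dx_new
        return x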

Within a linear approximation, the parameters \alpha and \beta are the same as in the linear conjugate gradient method, but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Three of the best-known formulas for \beta_n are named Fletcher–Reeves (FR), Polak–Ribière (PR), and Hestenes–Stiefel (HS) after their developers. They are given by the following formulas:

  • Fletcher–Reeves:
\beta_{n}^{FR} = \frac{\Delta x_n^\top \Delta x_n}{\Delta x_{n-1}^\top \Delta x_{n-1}},
  • Polak–Ribière:
\beta_{n}^{PR} = \frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})}{\Delta x_{n-1}^\top \Delta x_{n-1}},
  • Hestenes–Stiefel:
\beta_{n}^{HS} = \frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})}{-\Delta s_{n-1}^\top (\Delta x_n-\Delta x_{n-1})}.

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is \beta=\max\{0,\beta^{PR}\}, which provides a direction reset automatically.
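In code, the three formulas and the \max\{0,\beta^{PR}\} variant could be written as small functions matching the beta_rule signature used in the sketch above (dx_new and dx_old stand for \Delta x_n and \Delta x_{n-1}, s_old for \Delta s_{n-1}; only Hestenes–Stiefel uses the previous conjugate direction):

    # Candidate beta formulas for the nonlinear CG sketch above
    # (arguments are NumPy vectors).
    def beta_fr(dx_new, dx_old, s_old):      # Fletcher-Reeves
        return (dx_new @ dx_new) / (dx_old @ dx_old)

    def beta_pr(dx_new, dx_old, s_old):      # Polak-Ribiere
        return (dx_new @ (dx_new - dx_old)) / (dx_old @ dx_old)

    def beta_hs(dx_new, dx_old, s_old):      # Hestenes-Stiefel
        return (dx_new @ (dx_new - dx_old)) / (-(s_old @ (dx_new - dx_old)))

    def beta_pr_plus(dx_new, dx_old, s_old): # max(0, beta_PR): automatic reset
        return max(0.0, beta_pr(dx_new, dx_old, s_old))

For example, nonlinear_cg(f, grad, np.zeros(5), beta_pr_plus) applies the loop above with the automatic-reset Polak–Ribière rule to the quadratic example given earlier.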


Newton-based methods (the Newton–Raphson algorithm and quasi-Newton methods such as BFGS) tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration: Newton's method needs the Hessian (the matrix of second derivatives) in addition to the gradient, and quasi-Newton methods build up an approximation to it from successive gradients. Quasi-Newton methods also require more memory to operate (see also the limited-memory variant, L-BFGS).
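One rough way to see this trade-off (a sketch, not a benchmark; the Rosenbrock function is just a standard test problem shipped with SciPy) is to run SciPy's nonlinear CG and BFGS implementations on the same problem and compare the iteration and gradient-evaluation counts they report:

    # Sketch: compare nonlinear CG and BFGS on the 10-dimensional Rosenbrock function.
    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    x0 = np.full(10, -1.0)
    for method in ("CG", "BFGS"):
        res = minimize(rosen, x0, jac=rosen_der, method=method)
        print(method, "iterations:", res.nit, "gradient evaluations:", res.njev)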


External links

  • William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd edition, Cambridge University Press, 1992; chapter 10, section 6: Conjugate Gradient Methods in Multidimensions.
