Gradient descent

Gradient descent is an optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.

Gradient descent is also known as steepest descent, or the method of steepest descent. When known as the latter, gradient descent should not be confused with the method of steepest descent for approximating integrals.

Description

Gradient descent is based on the observation that if the real-valued function $F(\mathbf{x})$ is defined and differentiable in a neighborhood of a point $\mathbf{a}$, then $F(\mathbf{x})$ decreases fastest if one goes from $\mathbf{a}$ in the direction of the negative gradient of $F$ at $\mathbf{a}$, $-\nabla F(\mathbf{a})$. It follows that, if

$$\mathbf{b} = \mathbf{a} - \gamma\,\nabla F(\mathbf{a})$$

for $\gamma > 0$ a small enough number, then $F(\mathbf{a}) \geq F(\mathbf{b})$. With this observation in mind, one starts with a guess $\mathbf{x}_0$ for a local minimum of $F$, and considers the sequence $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \dots$ such that

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma_n\,\nabla F(\mathbf{x}_n), \quad n \ge 0.$$

We have

$$F(\mathbf{x}_0) \ge F(\mathbf{x}_1) \ge F(\mathbf{x}_2) \ge \dots,$$

so hopefully the sequence $(\mathbf{x}_n)$ converges to the desired local minimum. Note that the value of the step size $\gamma$ is allowed to change at every iteration.

This process is illustrated in the picture to the right. Here $F$ is assumed to be defined on the plane, and its graph is assumed to have a bowl shape. The blue curves are the contour lines, that is, the regions on which the value of $F$ is constant. A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is orthogonal to the contour line going through that point. We see that gradient descent leads us to the bottom of the bowl, that is, to the point where the value of the function $F$ is minimal.
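As a minimal sketch of the iteration above, the following C program applies the update $\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma\,\nabla F(\mathbf{x}_n)$ with a fixed step size to an assumed bowl-shaped function $F(x,y) = x^2 + y^2$; this particular function, the starting point and the step size are illustrative choices, not taken from the figure:

#include <stdio.h>

/* Minimal sketch of the update x_{n+1} = x_n - gamma * grad F(x_n)
   for the assumed example F(x, y) = x^2 + y^2, whose gradient is (2x, 2y). */
int main(void) {
    double x = 3.0, y = -4.0;   /* arbitrary starting guess */
    double gamma = 0.1;         /* fixed step size */
    for (int n = 0; n < 50; n++) {
        double gx = 2.0 * x;    /* dF/dx */
        double gy = 2.0 * y;    /* dF/dy */
        x -= gamma * gx;
        y -= gamma * gy;
    }
    printf("approximate minimizer: (%g, %g)\n", x, y);  /* near the minimum (0, 0) */
    return 0;
}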

Examples

Gradient descent has problems with pathological functions such as the Rosenbrock function shown here. The Rosenbrock function has a narrow curved valley which contains the minimum, and the bottom of the valley is very flat. Because of the curved flat valley, the optimisation zig-zags slowly with small step sizes towards the minimum.
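The following sketch illustrates this behaviour on the Rosenbrock function in its common form $f(x,y) = (1-x)^2 + 100\,(y - x^2)^2$; the article does not state the constants, so this parameterisation, the starting point and the step size are assumptions. The minimum lies at $(1, 1)$, and fixed-step descent creeps along the valley very slowly:

#include <stdio.h>

/* Fixed-step gradient descent on the (assumed) Rosenbrock function
   f(x, y) = (1 - x)^2 + 100 * (y - x^2)^2, minimum at (1, 1). */
int main(void) {
    double x = -1.0, y = 1.0;   /* start on the far side of the curved valley */
    double gamma = 0.0005;      /* a larger fixed step makes the iteration diverge */
    for (long n = 0; n < 200000; n++) {
        double gx = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x);  /* df/dx */
        double gy = 200.0 * (y - x * x);                         /* df/dy */
        x -= gamma * gx;
        y -= gamma * gy;
    }
    printf("after many iterations: (%g, %g)\n", x, y);  /* progress towards (1, 1) is slow */
    return 0;
}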

The gradient ascent method applied to $F(x,y) = \sin\left(\frac{1}{2}x^2 - \frac{1}{4}y^2 + 3\right)\cos\left(2x + 1 - e^{y}\right)$.
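A sketch of gradient ascent on this function is given below; rather than differentiating the expression by hand, the partial derivatives are approximated with central finite differences. The starting point, step size and difference step are arbitrary illustrative choices, not taken from the original illustration:

#include <stdio.h>
#include <math.h>

/* The function from the text: F(x, y) = sin(x^2/2 - y^2/4 + 3) * cos(2x + 1 - e^y). */
static double F(double x, double y) {
    return sin(0.5 * x * x - 0.25 * y * y + 3.0) * cos(2.0 * x + 1.0 - exp(y));
}

int main(void) {
    double x = 0.1, y = -0.1;   /* arbitrary starting guess */
    double gamma = 0.01;        /* fixed step size */
    double h = 1e-6;            /* finite-difference step */
    for (int n = 0; n < 1000; n++) {
        double gx = (F(x + h, y) - F(x - h, y)) / (2.0 * h);  /* approx. dF/dx */
        double gy = (F(x, y + h) - F(x, y - h)) / (2.0 * h);  /* approx. dF/dy */
        x += gamma * gx;        /* ascent: step along the positive gradient */
        y += gamma * gy;
    }
    printf("approximate local maximum near (%g, %g), F = %g\n", x, y, F(x, y));
    return 0;
}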

Comments

Gradient descent works in spaces of any number of dimensions, even in infinite-dimensional ones. In the latter case the search space is typically a function space, and one calculates the Gâteaux derivative of the functional to be minimized to determine the descent direction.

Two weaknesses of gradient descent are:
1. The algorithm can take many iterations to converge towards a local minimum, if the curvature in different directions is very different.
2. Finding the optimal $\gamma$ per step can be time-consuming; conversely, using a fixed $\gamma$ can yield poor results (a backtracking sketch of per-step selection is given after this list). Methods based on Newton's method and inversion of the Hessian using conjugate gradient techniques are often a better alternative.
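One common way to choose $\gamma$ at each step, mentioned here only as an illustrative contrast with a fixed step and not prescribed by this article, is a simple backtracking search: start from a trial step and halve it until the function value actually decreases. The sketch below applies this to the one-dimensional example function used later in this article; the trial step, the halving factor, the safeguard and the iteration count are arbitrary choices:

#include <stdio.h>

/* Backtracking step-size selection: shrink gamma until F(x - gamma * F'(x)) < F(x).
   Assumed example: F(x) = x^4 - 3x^3 + 2 with F'(x) = 4x^3 - 9x^2. */
static double F(double x)  { return x * x * x * x - 3.0 * x * x * x + 2.0; }
static double dF(double x) { return 4.0 * x * x * x - 9.0 * x * x; }

int main(void) {
    double x = 6.0;                       /* same starting point as the example below */
    for (int n = 0; n < 100; n++) {
        double g = dF(x);
        double gamma = 1.0;               /* optimistic trial step */
        while (gamma > 1e-12 && F(x - gamma * g) >= F(x))
            gamma *= 0.5;                 /* backtrack until F decreases */
        x -= gamma * g;
    }
    printf("local minimum near x = %g\n", x);
    return 0;
}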

A more powerful algorithm is given by the BFGS method, which consists of calculating at every step a matrix by which the gradient vector is multiplied to go in a "better" direction, combined with a more sophisticated line search algorithm to find the "best" value of $\gamma$.

Gradient descent is in fact Euler's method for solving ordinary differential equations, applied to a gradient flow. As the goal is to find the minimum rather than to follow the flow line exactly, the discretisation error of the finite-step method is less significant.
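Spelled out, the gradient flow is the ordinary differential equation below, and one explicit Euler step of size $\gamma$ recovers the gradient-descent update:

$$\frac{d\mathbf{x}(t)}{dt} = -\nabla F(\mathbf{x}(t)), \qquad \mathbf{x}_{n+1} = \mathbf{x}_n - \gamma\,\nabla F(\mathbf{x}_n).$$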

A computational example

The gradient descent algorithm is applied to find a local minimum of the function $f(x) = x^4 - 3x^3 + 2$, with derivative $f'(x) = 4x^3 - 9x^2$. Here is an implementation in the C programming language.


#include <stdio.h>
#include <math.h>

int main(void) {
    // From calculation, we expect that the local minimum occurs at x = 9/4.
    // The algorithm starts at x = 6.
    double xOld = 0;
    double xNew = 6;
    double eps = 0.01;            // step size
    double precision = 0.00001;

    while (fabs(xNew - xOld) > precision) {
        xOld = xNew;
        xNew = xNew - eps * (4*xNew*xNew*xNew - 9*xNew*xNew);  // x - eps * f'(x)
    }
    printf("Local minimum occurs at %f\n", xNew);
    return 0;
}

With this precision, the algorithm converges to the local minimum at x ≈ 2.24996 in 70 iterations.
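For reference, a sketch of how the program might be compiled and run on a Unix-like system (the file name is an assumption; the math library is linked for fabs):

cc gradient_descent.c -o gradient_descent -lm
./gradient_descent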

A more robust implementation of the algorithm would also check whether the function value indeed decreases at every iteration and would make the step size smaller otherwise. One can also use an adaptive step size which may make the algorithm converge faster.
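One possible realisation of both ideas is sketched below: each iteration is accepted only if the function value decreases, the step size is halved on rejection and enlarged slightly on acceptance. The growth and shrink factors (1.1 and 0.5) and the small safeguard on the step size are arbitrary illustrative choices:

#include <stdio.h>
#include <math.h>

/* Variant of the example above: accept an iteration only if f decreases,
   shrink the step on rejection, and enlarge it slightly on acceptance. */
static double f(double x)  { return x * x * x * x - 3.0 * x * x * x + 2.0; }
static double df(double x) { return 4.0 * x * x * x - 9.0 * x * x; }

int main(void) {
    double x = 6.0;                 /* same starting point as above */
    double eps = 0.01;              /* initial step size */
    double precision = 0.00001;
    double xOld = x + 1.0;          /* force at least one iteration */

    while (fabs(x - xOld) > precision && eps > 1e-12) {
        double trial = x - eps * df(x);
        if (f(trial) < f(x)) {      /* accepted: move and cautiously grow the step */
            xOld = x;
            x = trial;
            eps *= 1.1;
        } else {                    /* rejected: keep x and shrink the step */
            eps *= 0.5;
        }
    }
    printf("Local minimum occurs at %f\n", x);
    return 0;
}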



