Stochastic gradient descent

Stochastic gradient descent is a general optimization algorithm, but is typically used to fit the parameters of a machine learning model.

In standard (or "batch") gradient descent, the true gradient is used to update the parameters of the model. The true gradient is usually the sum of the gradients of the cost function evaluated at each individual training example. The parameter vector is adjusted by the negative of the true gradient multiplied by a step size (the learning rate). Batch gradient descent therefore requires one full sweep through the training set before any parameter can be changed.
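
A minimal sketch of one way this can look in Python; the least-squares cost, the function name, and the hyperparameter values are assumptions made for the example, not details taken from the text above:

    import numpy as np

    def batch_gradient_descent(X, y, step_size=0.01, num_sweeps=100):
        # Fit w to minimize 0.5 * ||X w - y||^2 using the true (full-data) gradient.
        w = np.zeros(X.shape[1])
        for _ in range(num_sweeps):
            # True gradient: sum of the per-example gradients over the whole training set.
            grad = X.T @ (X @ w - y)
            # Adjust the parameter vector by the negative gradient times the step size.
            w -= step_size * grad
        return w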

In stochastic (or "on-line") gradient descent, the true gradient is approximated by the gradient of the cost function evaluated on only a single training example. The parameters are then adjusted by an amount proportional to this approximate gradient, so the parameters of the model are updated after each training example. For large data sets, on-line gradient descent can be much faster than batch gradient descent.
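
The corresponding on-line sketch for the same assumed least-squares cost (again purely illustrative; visiting the examples in a random order each sweep is a common convention, not something stated above):

    import numpy as np

    def stochastic_gradient_descent(X, y, step_size=0.01, num_sweeps=10, seed=0):
        # Fit w to minimize 0.5 * ||X w - y||^2, updating after every single example.
        w = np.zeros(X.shape[1])
        rng = np.random.default_rng(seed)
        for _ in range(num_sweeps):
            for i in rng.permutation(len(y)):
                # Approximate gradient computed from training example i alone.
                grad = (X[i] @ w - y[i]) * X[i]
                w -= step_size * grad
        return w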

A compromise between the two forms, often called "mini-batch" gradient descent, approximates the true gradient by a sum over a small number of training examples.
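
In the same illustrative setting, the mini-batch variant sums the gradient over a small slice of the data before each update (the batch size of 32 is an arbitrary choice for the sketch):

    import numpy as np

    def minibatch_gradient_descent(X, y, step_size=0.01, batch_size=32, num_sweeps=10, seed=0):
        # Fit w to minimize 0.5 * ||X w - y||^2, updating once per mini-batch.
        w = np.zeros(X.shape[1])
        rng = np.random.default_rng(seed)
        for _ in range(num_sweeps):
            order = rng.permutation(len(y))
            for start in range(0, len(y), batch_size):
                batch = order[start:start + batch_size]
                # Gradient approximated by a sum over the mini-batch only.
                grad = X[batch].T @ (X[batch] @ w - y[batch])
                w -= step_size * grad
        return w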

Stochastic gradient descent is a form of stochastic approximation. The theory of stochastic approximation gives conditions under which stochastic gradient descent converges.
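
For example, a classical sufficient condition of Robbins–Monro type (stated informally here; the usual regularity assumptions on the cost function are omitted) is that the step sizes \eta_t decrease so that

    \sum_{t=1}^{\infty} \eta_t = \infty \qquad \text{and} \qquad \sum_{t=1}^{\infty} \eta_t^2 < \infty,

which is satisfied, for instance, by \eta_t = 1/t: the steps are large enough in total to reach a minimum, yet shrink fast enough that the gradient noise averages out.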

Some of the most popular stochastic gradient descent algorithms are the least mean squares (LMS) adaptive filter and the backpropagation algorithm.
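
To illustrate the connection for the first of these (the variable names and step size below are assumptions for the sketch), the LMS filter is a stochastic-gradient update on the instantaneous squared prediction error of a linear filter:

    import numpy as np

    def lms_filter(x, d, num_taps=4, mu=0.01):
        # Adapt weights w so that the filter output w . [x[n], ..., x[n-num_taps+1]] tracks d[n].
        x, d = np.asarray(x, dtype=float), np.asarray(d, dtype=float)
        w = np.zeros(num_taps)
        for n in range(num_taps - 1, len(d)):
            u = x[n - num_taps + 1:n + 1][::-1]   # most recent input samples, newest first
            e = d[n] - w @ u                      # instantaneous prediction error
            w += mu * e * u                       # stochastic-gradient (LMS) weight update
        return w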

References

* [http://leon.bottou.org/papers/bottou-mlss-2004 "Stochastic Learning"]. Lecture by Léon Bottou for the Machine Learning Summer School 2003 in Tübingen. Also in "Advanced Lectures on Machine Learning", edited by Olivier Bousquet and Ulrike von Luxburg, ISBN 3-540-23122-6, 2004.
* "Introduction to Stochastic Search and Optimization" by James C. Spall, ISBN 0-471-33052-3, 2003.
* "Pattern Classification" by Richard O. Duda, Peter E. Hart, and David G. Stork, ISBN 0-471-05669-3, 2000.

Implementation

* [http://leon.bottou.org/projects/sgd sgd]: An LGPL C++ library implementing stochastic gradient descent, with applications to learning Support Vector Machines and Conditional Random Fields.

