Delta rule

The delta rule is a gradient descent learning rule for updating the weights of the artificial neurons in a single-layer perceptron. It is a special case of the more general backpropagation algorithm. For a neuron j with activation function g(x), the delta rule for j's ith weight w_{ji} is given by

\Delta w_{ji}=\alpha(t_j-y_j) g'(h_j) x_i  \,,

where

\alpha is a small constant called the learning rate,
g(x) is the neuron's activation function,
t_j is the target output,
h_j is the weighted sum of the neuron's inputs,
y_j is the actual output,
x_i is the ith input.

It holds that h_j=\sum_{i} x_i w_{ji} and y_j=g(h_j).
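As a concrete illustration of the rule and the two identities above, the following minimal Python sketch applies a single update to one neuron. The logistic sigmoid is an assumed choice of g (the text does not fix an activation function), and the names `sigmoid`, `sigmoid_prime`, and `delta_rule_step` are hypothetical.

```python
import math

def sigmoid(h):
    """Assumed activation: g(h) = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + math.exp(-h))

def sigmoid_prime(h):
    """Derivative of the assumed activation: g'(h) = g(h) * (1 - g(h))."""
    s = sigmoid(h)
    return s * (1.0 - s)

def delta_rule_step(weights, inputs, target, alpha, g=sigmoid, g_prime=sigmoid_prime):
    """Return the weights of one neuron after a single delta rule update."""
    h = sum(x * w for x, w in zip(inputs, weights))   # h_j = sum_i x_i w_ji
    y = g(h)                                          # y_j = g(h_j)
    scale = alpha * (target - y) * g_prime(h)         # alpha (t_j - y_j) g'(h_j)
    return [w + scale * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
inputs = [1.0, 2.0]
new_weights = delta_rule_step(weights, inputs, target=1.0, alpha=0.5)
print(new_weights)  # [0.0625, 0.125]
```

With zero weights, h_j is 0, y_j is 0.5 and g'(h_j) is 0.25, so each weight moves by 0.5 · 0.5 · 0.25 = 0.0625 times its input.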

The delta rule is commonly stated in simplified form for a perceptron with a linear activation function g(h)=h, for which g'(h_j)=1, as

\Delta w_{ji}=\alpha(t_j-y_j) x_i.  \,
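The simplified form can be sketched in a few lines of Python. This is only an illustration: it assumes the identity activation y_j = h_j, and the function names `linear_output` and `linear_delta_step`, as well as the sample numbers, are hypothetical.

```python
def linear_output(weights, inputs):
    """Output of a linear neuron: y_j = h_j = sum_i x_i w_ji."""
    return sum(x * w for x, w in zip(inputs, weights))

def linear_delta_step(weights, inputs, target, alpha):
    """One update of the simplified rule: w_ji += alpha (t_j - y_j) x_i."""
    y = linear_output(weights, inputs)
    return [w + alpha * (target - y) * x for w, x in zip(weights, inputs)]

weights = [0.5, -0.5]
inputs = [1.0, 2.0]
target = 1.0

error_before = abs(target - linear_output(weights, inputs))
weights = linear_delta_step(weights, inputs, target, alpha=0.1)
error_after = abs(target - linear_output(weights, inputs))

print(error_before, error_after)  # the second value should be smaller
```

For this sample the output starts at -0.5, so the error is 1.5; after one step the weights are approximately 0.65 and -0.2, and the error has shrunk to roughly 0.75.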

Derivation of the delta rule

The delta rule is derived by attempting to minimize the error in the output of the perceptron through gradient descent. The error for a perceptron with several outputs, indexed by j, can be measured as

E=\sum_{j} \frac{1}{2}(t_j-y_j)^2 \,.
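As a small sketch, this error measure can be evaluated directly from lists of targets and outputs. The function name `squared_error` and the sample values are hypothetical.

```python
def squared_error(targets, outputs):
    """E = sum_j 1/2 (t_j - y_j)^2 over all output neurons j."""
    return sum(0.5 * (t - y) ** 2 for t, y in zip(targets, outputs))

targets = [1.0, 0.0]
outputs = [0.5, 0.5]
print(squared_error(targets, outputs))  # 0.25
```

Each of the two outputs is off by 0.5, so each contributes 0.5 · 0.25 = 0.125 to the total.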

In this case, we wish to move through the "weight space" of the neuron (the space of all possible values of all of the neuron's weights) in proportion to the gradient of the error function with respect to each weight. In order to do that, we calculate the partial derivative of the error with respect to each weight. For the ith weight, this derivative can be written as

\frac{\partial E}{ \partial w_{ji} } \,.

Because the weight w_{ji} affects only the output of the jth neuron, every other term of the sum has zero derivative with respect to it, so we can substitute the error formula above while omitting the summation:

\frac{\partial E}{ \partial w_{ji} } = \frac{ \partial \left ( \frac{1}{2} \left( t_j-y_j \right ) ^2 \right ) }{ \partial w_{ji} } \,

Next we use the chain rule to split this into two derivatives:

= \frac{ \partial \left ( \frac{1}{2} \left( t_j-y_j \right ) ^2 \right ) }{ \partial y_j } \frac{ \partial y_j }{ \partial w_{ji} } \,

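To find the left derivative, we apply the power rule together with the chain rule; the inner derivative of t_j-y_j with respect to y_j is -1, which produces the minus sign: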

= - \left ( t_j-y_j \right ) \frac{ \partial y_j }{ \partial w_{ji} } \,

To find the right derivative, we again apply the chain rule, this time differentiating with respect to the total input to j, h_j:

= - \left ( t_j-y_j \right ) \frac{ \partial y_j }{ \partial h_j } \frac{ \partial h_j }{ \partial w_{ji} } \,

Note that the output of the neuron y_j is just the neuron's activation function g applied to the neuron's input h_j. We can therefore write the derivative of y_j with respect to h_j simply as the first derivative of g:

= - \left ( t_j-y_j \right ) g'(h_j) \frac{ \partial h_j }{ \partial w_{ji} } \,

Next we rewrite h_j in the last term as a sum, indexed by k, over all of the neuron's weights w_{jk}, each multiplied by its corresponding input x_k:

= - \left ( t_j-y_j \right ) g'(h_j) \frac{ \partial \left ( \sum_{k} x_k w_{jk} \right ) }{ \partial w_{ji} } \,

Because we are differentiating with respect to the ith weight only, the only term of the summation with a nonzero derivative is x_i w_{ji}. Clearly,

\frac{ \partial \left( x_i w_{ji} \right) }{ \partial w_{ji} }=x_i \,,

giving us our final equation for the gradient:

\frac{\partial E}{ \partial w_{ji} } = - \left ( t_j-y_j \right ) g'(h_j) x_i \,
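One way to gain confidence in this gradient formula is to compare it with a central finite-difference estimate of the same derivative. The sketch below does this for a single neuron; the logistic sigmoid is an assumed activation, and the function names, sample values, and step size `eps` are hypothetical choices for illustration.

```python
import math

def g(h):
    """Assumed activation: logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-h))

def g_prime(h):
    """Derivative of the logistic sigmoid."""
    s = g(h)
    return s * (1.0 - s)

def error(weights, inputs, target):
    """E = 1/2 (t_j - y_j)^2 for one neuron."""
    h = sum(x * w for x, w in zip(inputs, weights))
    return 0.5 * (target - g(h)) ** 2

def analytic_gradient(weights, inputs, target):
    """dE/dw_ji = -(t_j - y_j) g'(h_j) x_i, as derived in the text."""
    h = sum(x * w for x, w in zip(inputs, weights))
    y = g(h)
    return [-(target - y) * g_prime(h) * x for x in inputs]

def numeric_gradient(weights, inputs, target, eps=1e-6):
    """Central finite-difference estimate of dE/dw_ji for each i."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((error(up, inputs, target) - error(down, inputs, target)) / (2 * eps))
    return grads

weights = [0.2, -0.4]
inputs = [1.0, 0.5]
target = 1.0

analytic = analytic_gradient(weights, inputs, target)
numeric = numeric_gradient(weights, inputs, target)
print(analytic)
print(numeric)  # should agree with the analytic values to several decimal places
```

Both components are negative here because the output lies below the target and both inputs are positive, so increasing either weight reduces the error.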

As noted above, gradient descent tells us that the change in each weight should be proportional to the gradient. Choosing a proportionality constant \alpha and eliminating the minus sign, so that each weight moves in the negative direction of the gradient and the error decreases, we arrive at our target equation:

\Delta w_{ji}=\alpha(t_j-y_j) g'(h_j) x_i \,.
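To show the update rule in use, the following sketch applies it repeatedly to a small data set, one sample at a time. It is an illustration under stated assumptions rather than a reference implementation: the activation is taken to be the identity (so g'(h_j) = 1), and the function names, the three training samples, the learning rate, and the number of passes are all hypothetical.

```python
def identity(h):
    """Assumed activation g(h) = h."""
    return h

def identity_prime(h):
    """Derivative of the identity activation: g'(h) = 1."""
    return 1.0

def total_error(weights, samples, g=identity):
    """Sum of 1/2 (t - y)^2 over all training samples."""
    total = 0.0
    for inputs, target in samples:
        h = sum(x * w for x, w in zip(inputs, weights))
        total += 0.5 * (target - g(h)) ** 2
    return total

def train(samples, alpha=0.1, epochs=200, g=identity, g_prime=identity_prime):
    """Apply w_ji += alpha (t_j - y_j) g'(h_j) x_i for every sample, every epoch."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, target in samples:
            h = sum(x * w for x, w in zip(inputs, weights))
            scale = alpha * (target - g(h)) * g_prime(h)
            weights = [w + scale * x for w, x in zip(weights, inputs)]
    return weights

# Hypothetical targets generated by t = 2*x_1 - x_2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

initial_error = total_error([0.0, 0.0], samples)
weights = train(samples)
final_error = total_error(weights, samples)

print(weights)                     # expected to be close to [2.0, -1.0]
print(initial_error, final_error)  # the final error should be far smaller
```

Because the targets here are an exact linear function of the inputs, the error can be driven arbitrarily close to zero; with noisy targets or a learning rate that is too large, the behaviour would differ.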

Wikimedia Foundation. 2010.
