Generalized Hebbian Algorithm

The Generalized Hebbian Algorithm (GHA), also known in the literature as Sanger's rule, is a linear feedforward neural network model for unsupervised learning with applications primarily in principal components analysis. First defined in 1989 [Sanger, Terence D. (1989). "Optimal unsupervised learning in a single-layer linear feedforward neural network". Neural Networks 2 (6): 459–473. doi:10.1016/0893-6080(89)90044-0], it is similar to Oja's rule in its formulation and stability, except that it can be applied to networks with multiple outputs.

Theory

GHA combines Oja's rule with the Gram-Schmidt process to produce a learning rule of the form

:$\Delta w_{ij} = \eta \left( y_j x_i - y_j \sum_{k=1}^{j} w_{ik} y_k \right)$,

where $w_{ij}$ defines the synaptic weight or connection strength between the $i$th input and $j$th output neurons, $x$ and $y$ are the input and output vectors, respectively, with each output a linear combination of the inputs, $y_j = \sum_i w_{ij} x_i$, and $\eta$ is the "learning rate" parameter.
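A minimal pure-Python sketch of one update step, assuming the weights are held in a nested list indexed `w[i][j]` (input $i$, output $j$) as in the rule above and that each output is linear, $y_j = \sum_i w_{ij} x_i$. The function name `gha_step` and the sample values are hypothetical, chosen only for illustration.

```python
def gha_step(w, x, eta):
    """Return the weights after one Sanger's-rule update (illustrative sketch).

    w[i][j] is the weight from input i to output j, x is one input sample
    of length n, and eta is the learning rate. The input table is not modified.
    """
    n, m = len(w), len(w[0])
    # Linear outputs: y_j = sum_i w_ij * x_i
    y = [sum(w[i][j] * x[i] for i in range(n)) for j in range(m)]
    new_w = [row[:] for row in w]
    for i in range(n):
        for j in range(m):
            # Feedback term sum_{k=1}^{j} w_ik y_k (0-based here: k = 0..j)
            back = sum(w[i][k] * y[k] for k in range(j + 1))
            new_w[i][j] += eta * (y[j] * x[i] - y[j] * back)
    return new_w


# Hypothetical example: two inputs, two outputs, identity start, one sample
print(gha_step([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], 0.1))
```

For this sample only $w_{21}$ changes (to 0.2), so the call prints `[[1.0, 0.0], [0.2, 1.0]]`; in the other three entries the Hebbian term $y_j x_i$ is exactly cancelled by the feedback term.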

Derivation

In matrix form, Oja's rule can be written

:$\frac{d w(t)}{d t} = w(t) Q - \textrm{diag}\left(w(t) Q w(t)^T\right) w(t)$,

and the Gram-Schmidt algorithm is

:$\Delta w(t) = -\textrm{lower}\left[w(t) w(t)^T\right] w(t)$,

where $w(t)$ is any matrix, in this case representing the synaptic weights; $Q = \eta \textbf{x} \textbf{x}^T$ is the autocorrelation matrix, here simply the outer product of the input vector with itself, scaled by $\eta$; $\textrm{diag}$ is the function that keeps the diagonal of a matrix and sets all off-diagonal elements to 0; and $\textrm{lower}$ is the function that sets all matrix elements on or above the diagonal equal to 0. We can combine these equations to get our original rule in matrix form,
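The operators in these two equations can be sketched on nested lists as below. This is an illustration under stated assumptions, not a reference implementation: the helper names (`matmul`, `transpose`, `diag_part`, `lower_part`, `oja_rate`, `gram_schmidt_step`) are hypothetical, rows of `w` are taken to be output neurons, and `q` is supplied directly as a matrix rather than built from $\eta \textbf{x}\textbf{x}^T$.

```python
def matmul(a, b):
    """Matrix product of nested lists: (r x s) times (s x c)."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def diag_part(a):
    """Keep the diagonal of a square matrix and zero every off-diagonal entry."""
    return [[a[r][c] if r == c else 0.0 for c in range(len(a))] for r in range(len(a))]


def lower_part(a):
    """Zero every entry on or above the diagonal (strictly lower triangle)."""
    return [[a[r][c] if c < r else 0.0 for c in range(len(a))] for r in range(len(a))]


def oja_rate(w, q):
    """dw/dt = w Q - diag(w Q w^T) w: one independent Oja neuron per row of w."""
    wq = matmul(w, q)
    decay = matmul(diag_part(matmul(wq, transpose(w))), w)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(wq, decay)]


def gram_schmidt_step(w):
    """Delta w = -lower(w w^T) w: row j loses its projections onto rows k < j."""
    proj = matmul(lower_part(matmul(w, transpose(w))), w)
    return [[-v for v in row] for row in proj]


# Hypothetical check 1: a unit-norm eigenvector of Q is a fixed point of Oja's rule.
print(oja_rate([[1.0, 0.0]], [[2.0, 0.0], [0.0, 1.0]]))
# Hypothetical check 2: row 0 has unit norm, so one step removes row 1's component along it.
w = [[1.0, 0.0], [0.6, 0.8]]
dw = gram_schmidt_step(w)
print([[a + b for a, b in zip(ra, rb)] for ra, rb in zip(w, dw)])
```

The first call prints `[[0.0, 0.0]]`; the second prints `[[1.0, 0.0], [0.0, 0.8]]`. The Gram-Schmidt step only subtracts projections and does not renormalize, which is why the second row's norm drops to 0.8.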

:$\Delta w(t) = \eta(t) \left( \textbf{y}(t) \textbf{x}(t)^T - \textrm{LT}\left[\textbf{y}(t) \textbf{y}(t)^T\right] w(t) \right)$,

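where the function $\textrm{LT}$ sets all matrix elements above the diagonal equal to 0 (unlike $\textrm{lower}$, it keeps the diagonal), and the output $\textbf{y}(t) = w(t) \textbf{x}(t)$ is that of a linear neuron. Because $\textbf{y} = w \textbf{x}$, row $j$ of $w(t)$ holds the weights of output $j$, so this matrix is the transpose of the $w_{ij}$ array used in the component form above.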
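A sketch of the matrix form on nested lists, assuming (as $\textbf{y} = w\textbf{x}$ implies) that row $j$ of `w` holds the weights of output $j$. The name `gha_matrix_step` and the sample values are hypothetical. Building $\textrm{LT}[\textbf{y}\textbf{y}^T]$ explicitly makes the difference from $\textrm{lower}$ visible: the diagonal entries $y_j^2$ are kept, and they supply the Oja-style normalization of each output, while the entries below the diagonal supply the Gram-Schmidt deflation.

```python
def gha_matrix_step(w, x, eta):
    """One update Delta w = eta * (y x^T - LT[y y^T] w), with y = w x (sketch).

    w is an m x n nested list whose row j holds the weights of output j.
    """
    m, n = len(w), len(x)
    y = [sum(w[j][i] * x[i] for i in range(n)) for j in range(m)]
    # LT keeps the diagonal and everything below it: entry (j, k) is y_j y_k if k <= j
    lt = [[y[j] * y[k] if k <= j else 0.0 for k in range(m)] for j in range(m)]
    return [
        [w[j][i] + eta * (y[j] * x[i] - sum(lt[j][k] * w[k][i] for k in range(m)))
         for i in range(n)]
        for j in range(m)
    ]


# Hypothetical example: identity start, one sample
print(gha_matrix_step([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], 0.1))
```

With this identity start and the sample `[1.0, 2.0]` it prints `[[1.0, 0.2], [0.0, 1.0]]`, which is the transpose of what the component-wise rule gives when the same weights are indexed input-first.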

Stability and PCA

[Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation (2nd ed.). Prentice Hall. ISBN 0132733501.] [Oja, Erkki (November 1982). "Simplified neuron model as a principal component analyzer". Journal of Mathematical Biology 15 (3): 267–273. doi:10.1007/BF00275687.]
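GHA's convergence result, following Sanger's analysis, is usually stated as: for a small enough learning rate, the rows of $w$ converge to the leading eigenvectors of the input autocorrelation matrix, ordered by decreasing eigenvalue. The simulation below is a rough illustration of that statement, not a proof. The synthetic data (three independent zero-mean Gaussian inputs with variances 4, 1 and 0.25, so the eigenvectors are the coordinate axes), the learning rate, the step count, the seed and the helper names are all hypothetical choices.

```python
import math
import random


def gha_step(w, x, eta):
    """Sanger's rule in matrix form; row j of w holds the weights of output j."""
    m, n = len(w), len(x)
    y = [sum(w[j][i] * x[i] for i in range(n)) for j in range(m)]
    return [
        [w[j][i] + eta * y[j] * (x[i] - sum(y[k] * w[k][i] for k in range(j + 1)))
         for i in range(n)]
        for j in range(m)
    ]


def train(outputs=2, steps=20000, eta=0.002, seed=0):
    rng = random.Random(seed)
    scales = [2.0, 1.0, 0.5]  # std devs of the hypothetical inputs: variances 4, 1, 0.25
    # Small random starting weights (hypothetical initialization)
    w = [[rng.uniform(-0.1, 0.1) for _ in scales] for _ in range(outputs)]
    for _ in range(steps):
        x = [s * rng.gauss(0.0, 1.0) for s in scales]
        w = gha_step(w, x, eta)
    return w


w = train()
for j, row in enumerate(w):
    norm = math.sqrt(sum(v * v for v in row))
    print(f"output {j}: weights {[round(v, 3) for v in row]}, norm {norm:.3f}")
```

If the rule behaves as described, output 0 ends close to $\pm(1, 0, 0)$ and output 1 close to $\pm(0, 1, 0)$, both with norm near 1; the rule does not determine the signs.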

Applications

GHA is used in applications where a self-organizing map is necessary, or where a feature or principal components analysis can be used. Examples of such cases include artificial intelligence and speech and image processing.

Its importance comes from the fact that learning is a single-layer process: each synaptic weight changes depending only on the inputs and outputs of that layer, which avoids the multi-layer dependence of the backpropagation algorithm. It also has a simple and predictable trade-off between learning speed and accuracy of convergence, set by the learning rate parameter $\eta$.
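One way to see this trade-off is to feed the same hypothetical data stream to GHA with two learning rates and track how far the first output's weight vector is from the leading eigenvector. The data, the fixed starting weights, the two rates, the 0.05 threshold, the seed and the helper names are all illustrative assumptions. With the larger $\eta$, the first output should cross the threshold sooner but keep fluctuating more once it has converged.

```python
import math
import random


def gha_step(w, x, eta):
    """Sanger's rule in matrix form; row j of w holds the weights of output j."""
    m, n = len(w), len(x)
    y = [sum(w[j][i] * x[i] for i in range(n)) for j in range(m)]
    return [
        [w[j][i] + eta * y[j] * (x[i] - sum(y[k] * w[k][i] for k in range(j + 1)))
         for i in range(n)]
        for j in range(m)
    ]


def misalignment(row, axis):
    """1 - |cos| of the angle between a weight vector and a coordinate axis."""
    norm = math.sqrt(sum(v * v for v in row))
    return 1.0 - abs(row[axis]) / norm


def run(eta, steps=20000, tail=5000, threshold=0.05, seed=1):
    """Return (first step below threshold, mean misalignment over the last `tail` steps)."""
    rng = random.Random(seed)                  # same data stream for every eta
    scales = [2.0, 1.0, 0.5]                   # hypothetical inputs: variances 4, 1, 0.25
    w = [[0.05, 0.1, 0.1], [0.1, 0.05, -0.1]]  # hypothetical fixed starting weights
    first_hit, tail_sum = None, 0.0
    for t in range(steps):
        x = [s * rng.gauss(0.0, 1.0) for s in scales]
        w = gha_step(w, x, eta)
        err = misalignment(w[0], 0)            # output 0 should align with axis 0
        if first_hit is None and err < threshold:
            first_hit = t + 1
        if t >= steps - tail:
            tail_sum += err
    return first_hit, tail_sum / tail


for eta in (0.01, 0.001):
    hit, tail_err = run(eta)
    print(f"eta={eta}: below threshold after {hit} steps, mean tail error {tail_err:.5f}")
```

The larger `eta` should report a smaller first step count but a larger tail error; the exact numbers depend on the seed and are not meaningful on their own.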

