Oja's rule

Oja's rule

Oja's learning rule, or simply Oja's rule, named after a Finnish computer scientist Erkki Oja, is a model of how neurons in the brain or in artificial neural networks change connection strength, or learn, over time. It is a modification of the standard Hebb's Rule (see Hebbian learning) that, through multiplicative normalization, solves all stability problems and generates an algorithm for principal components analysis. This is a computational form of an effect which is believed to happen in biological neurons.

Contents

Theory

Oja's rule requires a number of simplifications to derive, but in its final form it is demonstrably stable, unlike Hebb's rule. It is a single-neuron special case of the Generalized Hebbian Algorithm. However, Oja's rule can also be generalized in other ways to varying degrees of stability and success.

Formula

Oja's rule defines the change in presynaptic weights w given the output response y of a neuron to its inputs x to be

\,\Delta \mathbf{w} ~ = ~ \mathbf{w}_{n+1}-\mathbf{w}_{n} ~ = ~ \eta \, y_{n} (\mathbf{x}_{n} - y_{n}\mathbf{w}_{n}),

where η is the learning rate which can also change with time. Note that the bold symbols are vectors and n defines a discrete time iteration. The rule can also be made for continuous iterations as

\,\frac{d\mathbf{w}}{d t} ~ = ~ \eta \, y(t) (\mathbf{x}(t) - y(t)\mathbf{w}(t)).

Derivation

The simplest learning rule known is Hebb's rule, which states in conceptual terms that neurons that fire together, wire together. In component form as a difference equation, it is written

\,\Delta\mathbf{w} ~ = ~ \eta\, y(\mathbf{x}_n) \mathbf{x}_{n},

or with implicit n-dependence,

\,w_{i}(n+1) ~ = ~ w_{i} + \eta\, y(\mathbf{x}) x_{i},

where y(xn) is again the output, this time explicitly dependent on its input vector x.

Hebb's rule has synaptic weights approaching infinity with a positive learning rate. We can stop this by normalizing the weights so that each weight's magnitude is restricted between 0, corresponding to no weight, and 1, corresponding to being the only input neuron with any weight. Mathematically, this takes the form

\,w_i (n+1) ~ = ~ \frac{w_i + \eta\, y(\mathbf{x}) x_i}{\left(\sum_{j=1}^m [w_j + \eta\, y(\mathbf{x}) x_j]^p \right)^{1/p}}.

Note that in Oja's original paper[1], p=2, corresponding to quadrature (root sum of squares), which is the familiar Cartesian normalization rule. However, any type of normalization, even linear, will give the same result without loss of generality.

Our next step is to expand this into a Taylor series for a small learning rate | \eta | \ll 1, giving

\,w_i (n+1) ~ = ~ \frac{w_i}{\left( \sum_j w_j^p \right)^{1/p}} ~ + ~ \eta \left( \frac{y x_i}{\left(\sum_j w_j^p \right)^{1/p}} - \frac{w_i \sum_j y x_j w_j}{\left(\sum_j w_j^p \right)^{(1 + 1/p)}} \right) ~ + ~ O(\eta^2).

For small η, our higher-order terms O(η2) go to zero. We again make the specification of a linear neuron, that is, the output of the neuron is equal to the sum of the product of each input and its synaptic weight, or

\,y(\mathbf{x}) ~ = ~ \sum_{j=1}^m x_j w_j .

We also specify that our weights normalize to 1, which will be a necessary condition for stability, so

\,| \mathbf{w} | ~ = ~ \left( \sum_{j=1}^m w_j^p \right)^{1/p} ~ = ~ 1,

which, when substituted into our expansion, gives Oja's rule, or

\,w_i (n+1) ~ = ~ w_i + \eta\, y(x_i - w_i y).

Stability and PCA

In analyzing the convergence of a single neuron evolving by Oja's rule, one extracts the first principal component, or feature, of a data set. Furthermore, with extensions using the Generalized Hebbian Algorithm, one can create a multi-Oja neural network that can extract as many features as desired, allowing for principal components analysis.

A principal component aj is extracted from a dataset x through some associated vector qj, or aj = qjx, and we can restore our original dataset by taking

\mathbf{x} ~ = ~ \sum_j a_j \mathbf{q}_j.

In the case of a single neuron trained by Oja's rule, we find the weight vector converges to q1, or the first principal component, as time or number of iterations approaches infinity. We can also define, given a set of input vectors Xi, that its correlation matrix Rij = XiXj has an associated eigenvector given by qj with eigenvalue λj. The variance of outputs of our Oja neuron σ2(n) = ⟨y2(n)⟩ then converges with time iterations to the principal eigenvalue, or

\lim_{n\rightarrow\infty} \sigma^2(n) ~ = ~ \lambda_1.

These results are derived using Lyapunov function analysis, and they show that Oja's neuron necessarily converges on strictly the first principal component if certain conditions are met in our original learning rule. Most importantly, our learning rate η is allowed to vary with time, but only such that its sum is divergent but its power sum is convergent, that is

\sum_{n=1}^\infty \eta(n) = \infty, ~~~ \sum_{n=1}^\infty \eta(n)^p < \infty, ~~~ p > 1.

Our output activation function y(x(n)) is also allowed to be nonlinear and nonstatic, but it must be continuously differentiable in both x and w and have derivatives bounded in time.[2]

Applications

Oja's rule was originally described in Oja's 1982 paper[1], but the principle of self-organization to which it is applied is first attributed to Alan Turing in 1952[2]. PCA has also had a long history of use before Oja's rule formalized its use in network computation in 1989. The model can thus be applied to any problem of self-organizing mapping, in particular those in which feature extraction is of primary interest. Therefore, Oja's rule has an important place in image and speech processing. It is also useful as it expands easily to higher dimensions of processing, thus being able to integrate multiple outputs quickly. A canonical example is its use in binocular vision[3].

Biology and Oja's subspace rule

There is clear evidence for both long-term potentiation and long-term depression in biological neural networks, along with a normalization effect in both input weights and neuron outputs. However, while as of yet there is no direct experimental evidence of Oja's rule active in a biological neural network, a biophysical derivation of a generalization of the rule is possible. Such a derivation requires retrograde signalling from the postsynaptic neuron, which is biologically plausible (see neural backpropagation), and takes the form of

\Delta w_{ij} ~ \propto ~ \langle x_i y_j \rangle - \epsilon \left\langle \left(c_\mathrm{pre} * \sum_k w_{ik} y_k \right) \cdot \left(c_\mathrm{post} * y_j \right) \right\rangle,

where as before wij is the synaptic weight between the ith input and jth output neurons, x is the input, y is the postsynaptic output, and we define ε to be a constant analogous the learning rate, and cpre and cpost are presynaptic and postsynaptic functions that model the weakening of signals over time. Note that the angle brackets denote the average and the operator is a convolution. By taking the pre- and post-synaptic functions into frequency space and combining integration terms with the convolution, we find that this gives an arbitrary-dimensional generalization of Oja's rule known as Oja's Subspace[4], namely

\Delta w ~ = ~ C x\cdot w - w\cdot C y.[5]


See also

References

  1. ^ a b Oja, Erkki (November 1982). "Simplified neuron model as a principal component analyzer". Journal of Mathematical Biology 15 (3): 267–273. doi:10.1007/BF00275687. PMID 7153672. BF00275687. http://www.springerlink.com/content/u9u6120r003825u1/. Retrieved 2007-11-22. 
  2. ^ a b Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation (2 ed.). Prentice Hall. ISBN 0132733501. 
  3. ^ Intrator, Nathan (2007). "Unsupervised Learning". Neural Computation lectures. Tel-Aviv University. http://www.cs.tau.ac.il/~nin/Courses/NC06/Hebb_PCA.ppt. Retrieved 2007-11-22. 
  4. ^ Oja, Erkki (1989). "Neural Networks, Principal Components, and Subspaces". International Journal of Neural Systems (IJNS) 1 (1): 61–68. doi:10.1142/S0129065789000475. http://www.worldscinetarchives.com/cgi-bin/details.cgi?id=pii:S0129065789000475&type=html. Retrieved 2007-11-22. 
  5. ^ Friston, K.J.; C.D. Frith, R.S.J. Frackowiak (October 22 1993). "Principal Component Analysis Learning Algorithms: A Neurobiological Analysis". Proceedings: Biological Sciences 254 (1339): 47–54. doi:10.1098/rspb.1993.0125. JSTOR 49565. PMID 8265675. 

External links


Wikimedia Foundation. 2010.

Игры ⚽ Поможем решить контрольную работу

Look at other dictionaries:

  • Generalized Hebbian Algorithm — The Generalized Hebbian Algorithm (GHA), also known in the literature as Sanger s rule, is a linear feedforward neural network model for unsupervised learning with applications primarily in principal components analysis. First defined in 1989cite …   Wikipedia

  • Hebbian theory — describes a basic mechanism for synaptic plasticity wherein an increase in synaptic efficacy arises from the presynaptic cell s repeated and persistent stimulation of the postsynaptic cell. Introduced by Donald Hebb in 1949, it is also called… …   Wikipedia

  • Synaptic weight — In neuroscience and computer science, synaptic weight refers to the strength or amplitude of a connection between two nodes, corresponding in biology to the amount of influence the firing of one neuron has on another. The term is typically used… …   Wikipedia

  • Principal components analysis — Principal component analysis (PCA) is a vector space transform often used to reduce multidimensional data sets to lower dimensions for analysis. Depending on the field of application, it is also named the discrete Karhunen Loève transform (KLT),… …   Wikipedia

  • Principal component analysis — PCA of a multivariate Gaussian distribution centered at (1,3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction. The vectors shown are the eigenvectors of the covariance matrix scaled by… …   Wikipedia

  • Q'umarkaj — Gumarcaj Q umarkaj Utatlán   Ruined City (Archaeological Site)   Gumarcaaj (Utatlán) …   Wikipedia

  • Automatic differentiation — In mathematics and computer algebra, automatic differentiation, or AD, sometimes alternatively called algorithmic differentiation, is a method to numerically evaluate the derivative of a function specified by a computer program. Two classical… …   Wikipedia

  • Oyo Empire — before the fourteenth century CE–still existant in 2011 …   Wikipedia

  • Leo Ornstein — Infobox Person name = Leo Ornstein (Лев Орнштейн) image size = 185px caption = Leo Ornstein as a young man birth date = ca. December 2, 1893 birth place = Kremenchug, Poltava, Russian Empire death date = February 24, 2002 (age 108) death place =… …   Wikipedia

  • List of Pokémon episodes — This is a list of episodes of the Pokémon animated series (ポケットモンスター, Poketto Monsutā?, Pocket Monsters). The division between seasons of Pokémon is based on the openings of each episode, and may not reflect the actual production season. The… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”