# Self-organizing map

Self-organizing map

A self-organizing map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two dimensional), discretized representation of the input space of the training samples, called a map. The map seeks to preserve the topological properties of the input space. This makes SOM useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map.

Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map using input examples. It is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector.

Network structure

A self-organizing map consists of components called nodes or neurons. Associated with each node is a weight vector of the same dimension as the input data vectors and a position in the map space. The usual arrangement of nodes is a regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher dimensional input space to a lower dimensional map space. The procedure for placing a vector from data space onto the map is to find the node with the closest weight vector to the vector taken from data space and to assign the map coordinates of this node to our vector.

While it is typical to consider this type of network structure as related to feedforward networks where the nodes are visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation.

Useful extensions include using toroidal grids where opposite edges are connected and using large numbers of nodes. It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.

It is also common to use the U-matrix. The U-matrix value of a particular node is the average distance between the node and its closest neighbors. In a rectangular grid for instance, we might consider the closest 4 or 8 nodes.

Large SOMs display properties which are emergent. Therefore, large maps are preferable to smaller ones. In maps consisting of thousands of nodes, it is possible to perform cluster operations on the map itself. [cite book | first=Alfred | last=Ultsch | title=Emergence in Self-Organizing Feature Maps, In Proceedings Workshop on Self-Organizing Maps (WSOM '07) | publisher=Bielefeld, Germany | year=2007 | id=ISBN 978-3-00-022473-7]

Learning algorithm

The goal of learning in the self-organizing map is to cause different parts of the network to respond similarly to certain input patterns. This is partly motivated by how visual, auditory or other sensory information is handled in separate parts of the cerebral cortex in the human brain.cite book | first=Simon | last=Haykin | title=Neural networks - A comprehensive foundation | chapter=9. Self-organizing maps | edition=2nd edition | publisher=Prentice-Hall | year=1999 | id=ISBN 0-13-908385-5]

The weights of the neurons are initialized either to small random values or sampled evenly from the subspace spanned by the two largest principal component eigenvectors. With the latter alternative, learning is much faster because the initial weights already give good approximation of SOM weights.]

The network must be fed a large number of example vectors that represent, as close as possible, the kinds of vectors expected during mapping. The examples are usually administered several times.

The training utilizes competitive learning. When a training example is fed to the network, its Euclidean distance to all weight vectors is computed. The neuron with weight vector most similar to the input is called the best matching unit (BMU). The weights of the BMU and neurons close to it in the SOM lattice are adjusted towards the input vector. The magnitude of the change decreases with time and with distance from the BMU. The update formula for a neuron with weight vector Wv(t) is:Wv(t + 1) = Wv(t) + Θ (v, t) α(t)(D(t) - Wv(t)),where α(t) is a monotonically decreasing learning coefficient and D(t) is the input vector. The neighborhood function Θ (v, t) depends on the lattice distance between the BMU and neuron "v". In the simplest form it is one for all neurons close enough to BMU and zero for others, but a gaussian function is a common choice, too. Regardless of the functional form, the neighborhood function shrinks with time. At the beginning when the neighborhood is broad, the self-organizing takes place on the global scale. When the neighborhood has shrunk to just a couple of neurons the weights are converging to local estimates.

This process is repeated for each input vector for a (usually large) number of cycles λ. The network winds up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net.

During mapping, there will be one single "winning" neuron: the neuron whose weight vector lies closest to the input vector. This can be simply determined by calculating the Euclidean distance between input vector and weight vector.

While representing input data as vectors has been emphasized in this article, it should be noted that any kind of object which can be represented digitally and which has an appropriate distance measure associated with it and in which the necessary operations for training are possible can be used to construct a self-organizing map. This includes matrices, continuous functions or even other self-organizing maps.

Example

Preliminary definitions

Consider a 10×10 array of nodes each of which contains a weight vector and is aware of its location in the array. Each weight vector is of the same dimension as the node's input vector. The weights are initially set to random values.

Now we need input to feed the map. (The generated map and the given input exist in separate subspaces) We will create three vectors to represent colors. Colors can be represented by their red, green, and blue components. Consequently our input vectors will have three components, each corresponding to a color space. The input vectors will be: R = <255, 0, 0> G = <0, 255, 0> B = <0, 0, 255>

Variables

Vectors are in bold t = current iteration λ = limit on time iteration Wv = current weight vector D = target input Θ(t) = restraint due to distance from BMU - usually called the neighbourhood function α(t) = learning restraint due to time

Stepping through the algorithm

# Randomize the map's nodes' weight vectors
# Grab an input vector
# Traverse each node in the map
## Use Euclidean distance formula to find similarity between the input vector and the map's node's weight vector
## Track the node that produces the smallest distance (this node is the best matching unit, BMU)
# Update the nodes in the neighbourhood of BMU by pulling them closer to the input vector
## Wv(t + 1) = Wv(t) + Θ(t)α(t)(D(t) - Wv(t))
# Increment t and repeat while t < λ

Interpretation

There are two ways to interpret a SOM. Because in the training phaseweights of the whole neighborhood are moved in the same direction,similar items tend to excite adjacent neurons. Therefore, SOM forms a semantic map where similar samples are mapped close together anddissimilar apart.

The other way is to think of neuronal weights as pointers to the input space. They form a discrete approximation of the distribution of training samples. More neurons point to regions with high training sample concentration and fewer where the samples are scarce.

SOM may be considered a nonlinear generalization of principal component analysis ["Yin H." [http://pca.narod.ru/contentsgkwz.htm Learning Nonlinear Principal Manifolds by Self-Organising Maps] , In: Gorban A. N. et al (Eds.), LNCSE 58, Springer, 2007 ISBN 978-3-540-73749-0] .

Alternatives

Generative topographic maps (GTM) are a potential alternative to SOMs. In the sense that GTM explicitly requires a smooth and continuous mapping from the input space to the map space, it is topology preserving. However, in a practical sense, this measure of topological preservation is lacking. [cite paper | author=Kaski, S. | title=Data exploration using self-organizing maps, Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series No. 82, Espoo 1997, 57 pp.]

References

* Rustum R., Adeloye A.J., and Scholz M. , 2008. Applying Kohonen Self-organizing Map as a Software Sensor to Predict the Biochemical Oxygen Demand. Water Environment Research, 80 (1), 2008.

* Rustum R and A.J.Adeloye, 2007. Replacing outliers and missing values from activated sludge data using Kohonen Self Organizing Map. Journal of Environmental Engineering, Vol. 133,No. 9, September 1, 2007, pp. 909-916.

ee also

* Artificial intelligence
* Biologically-inspired computing
* Connectionism
* Data mining
* Growing self-organizing map
* Growing neural gas
* Generative topographic map
* Nonlinear dimensionality reduction
* Machine learning
* Pattern recognition
* Predictive analytics
* Artificial Neural Network
* Sammon's projection
* J. Brant Arseneau

* Chapter 15 [http://page.mi.fu-berlin.de/rojas/neural/chapter/K15.pdf Kohonen networks] of [http://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/pmwiki/pmwiki.php?n=Books.NeuralNetworksBook "Neural Networks - A Systematic Introduction"] by Raúl Rojas (ISBN 978-3540605058)
* [http://www.cis.hut.fi/teuvo/ Prof. Kohonen's website in Helsinki University of Technology] See ->research ->software for Toolkits and C and Matlab code for SOM's
* [http://www.scholarpedia.org/wiki/index.php?title=Kohonen_Network Kohonen network] a Scholarpedia article on the Self-Organizing Map
* [http://blog.peltarion.com/2007/04/10/the-self-organized-gene-part-1 "The Self-Organized Gene, Part 1"] , and [http://blog.peltarion.com/2007/06/13/the-self-organized-gene-part-2 Part 2] beginners' level introduction to competitive learning and self-organizing maps.
* [http://www.shef.ac.uk/psychology/gurney/notes/l7/l7.html Chapter 7: "Competition and self organisation: Kohonen nets"] , part of Kevin Gurney's web-book on Neural Nets.
* [http://neurondotnet.freehostia.com/index.html NeuronDotNet] , Implementation in C# along with sample applications
* [http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG.html Nice application to visualize some neural-network algorithms.] Written in Java, so you need a Java-plugin in your browser.
* [http://www.heatonresearch.com/articles/6/page1.html Programming a Kohonen Neural Network in Java] part of a Java Neural Network web book.
* [http://www.mathematik.uni-marburg.de/~databionics/de Databionics] Prof. A. Ultsch's websites on Visualization and Datamining with SOM
* [http://www.e-nuts.net/en/self-organizing-map Kohonen Neural Network] applied to the Traveling Salesman Problem (using three dimensions).
* [http://www.len.ro/work/ai/som-neural-networks Python SOM example] : Simple SOM python library along with a 2D implementation and some very suggestive images.
* [http://www.viscovery.net/somine/index.php?sprache=en Viscovery SOMine] : SOM Technology Tool from Viscovery Software (formerly Eudaptics Software)
* [http://ai-depot.com/Tutorial/SomColour.html SomColour tutorial] : Self-Organising Maps For Colour Recognition
* [http://websom.hut.fi/websom/ WEBSOM] , a Kohonen network project
* [http://www.staffmapper.com/ Staff-Mapper] , Solution for mapping knowledge based on survey results
* [http://www.samhill.co.uk/kohonen/ Kohonen Neural Network Simulation] , Demonstration of and detailed discussion about Kohonen Neural Networks, a form of SOM
* [http://netzspannung.org/about/tools/semantic-map/index.xsp?lang=en The Semantic Map Interface based on Kohonen Map] for Text analysis and visualization on [http://netzspannung.org/index_en_flash.html netzspannung.org] - the German platform for media art & electronic culture.
* [http://homepages.feis.herts.ac.uk/~nngroup/software.html Java Competitve Learning Application] A suite of Unsupervised Neural Networks (including Self-organizing map) written in Java. Complete with all source code.
* [http://www.sarstech.com/kohonen/ Kohonen Map Java Applet]
* [http://www.visipoint.fi/ SOM and Sammon's mapping software for multivariate data analysis]

*

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• Self-organizing map — Carte auto adaptative Carte auto adaptative ou auto organisatrice est une classe de réseau de neurones artificiels fondée sur des méthodes d apprentissage non supervisée. On la désigne souvent par le terme anglais self organizing map (SOM), on… …   Wikipédia en Français

• Growing self-organizing map — A growing self organizing map (GSOM) is a growing variant of the popular self organizing map (SOM). The GSOM was developed to address the issue of identifying a suitable map size in the SOM. It starts with a minimal number of nodes (usually 4)… …   Wikipedia

• Self-Organizing Maps — Als Selbstorganisierende Karten, Kohonenkarten oder Kohonennetze (nach Teuvo Kohonen) bezeichnet man eine Art von künstlichen neuronalen Netzen. Sie sind als unüberwachtes Lernverfahren ein leistungsfähiges Werkzeug des Data Mining. Ihr… …   Deutsch Wikipedia

• Generative topographic map — (GTM) is a machine learning method that is a probabilistic counterpart of the self organizing map (SOM), is provably convergent and does not require a shrinking neighborhood or a decreasing step size. It is a generative model: the data is assumed …   Wikipedia

• Mind map — A hand drawn mind map A mind map is a diagram used to represent words, ideas, tasks, or other items linked to and arranged around a central key word or idea. Especially in British English, the terms spidergram and spidergraph are more common,[1]… …   Wikipedia

• Data mining in meteorology — Meteorology is the interdisciplinary scientific study of the atmosphere. It observes the changes in temperature, air pressure, moisture and wind direction. Usually, temperature, pressure, wind measurements and humidity are the variables that are… …   Wikipedia

• Selbstorganisierende Karte — Als Selbstorganisierende Karten, Kohonenkarten oder Kohonennetze (nach Teuvo Kohonen; engl. self organizing map, SOM) bezeichnet man eine Art von künstlichen neuronalen Netzen. Sie sind als unüberwachtes Lernverfahren ein leistungsfähiges… …   Deutsch Wikipedia

• Nonlinear dimensionality reduction — High dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lies on an embedded non linear manifold within… …   Wikipedia

• Kohonen-karten — Als Selbstorganisierende Karten, Kohonenkarten oder Kohonennetze (nach Teuvo Kohonen) bezeichnet man eine Art von künstlichen neuronalen Netzen. Sie sind als unüberwachtes Lernverfahren ein leistungsfähiges Werkzeug des Data Mining. Ihr… …   Deutsch Wikipedia

• Kohonenkarte — Als Selbstorganisierende Karten, Kohonenkarten oder Kohonennetze (nach Teuvo Kohonen) bezeichnet man eine Art von künstlichen neuronalen Netzen. Sie sind als unüberwachtes Lernverfahren ein leistungsfähiges Werkzeug des Data Mining. Ihr… …   Deutsch Wikipedia