Multilayer perceptron

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. The MLP uses a supervised learning technique called backpropagation for training the network.[1][2] It is a modification of the standard linear perceptron and, unlike that model, can distinguish data that are not linearly separable.[3]

Theory

Activation function

If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function mapping each neuron's weighted inputs to its output, then it is easily shown with linear algebra that any number of layers can be reduced to the standard two-layer input-output model (see perceptron). What makes a multilayer perceptron different is that each neuron uses a nonlinear activation function, originally developed to model the frequency of action potentials, or firing, of biological neurons in the brain. This function can be modeled in several ways, but it must be differentiable for backpropagation to apply, and in practice it is usually bounded.
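
For example, composing two layers with weight matrices W_1 and W_2 and linear activations gives

y = W_2 (W_1 x) = (W_2 W_1) x,

a single linear map, so the extra layer adds no representational power.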

The two main activation functions used in current applications are both sigmoids, and are described by

\phi(v_i) = \tanh(v_i) ~~ \textrm{and} ~~ \phi(v_i) = (1+e^{-v_i})^{-1},

in which the former is a hyperbolic tangent that ranges from -1 to 1, and the latter, the logistic function, is similar in shape but ranges from 0 to 1. Here v_i is the weighted sum of the inputs to the ith node (neuron) and y_i = \phi(v_i) is its output. More specialized activation functions include radial basis functions, which are used in another class of supervised neural network models.
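
As a concrete illustration, both activations are one-liners in, for example, Python with NumPy (a minimal sketch; the function names here are ours, not from any particular library):

    import numpy as np

    def tanh(v):
        # hyperbolic-tangent activation: output in (-1, 1)
        return np.tanh(v)

    def logistic(v):
        # logistic (sigmoid) activation: output in (0, 1)
        return 1.0 / (1.0 + np.exp(-v))

    v = np.array([-2.0, 0.0, 2.0])   # example induced local fields
    print(tanh(v))                   # approx. [-0.964  0.     0.964]
    print(logistic(v))               # approx. [ 0.119  0.5    0.881]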

Layers

The multilayer perceptron consists of three or more layers (an input layer, an output layer, and one or more hidden layers) of nonlinearly activating nodes. Each node in one layer connects with a certain weight w_{ij} to every node in the following layer.
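
Because each layer is fully connected to the next, a layer's computation can be written as a matrix-vector product followed by the activation function. The sketch below (assumed shapes and initialization, not a reference implementation) runs one forward pass through a small 3-4-2 network:

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, W, b):
        # one fully connected layer: induced local fields v = W x + b, then tanh
        return np.tanh(W @ x + b)

    # 3 inputs, one hidden layer of 4 nodes, 2 outputs
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

    x = np.array([0.5, -1.0, 2.0])
    h = layer(x, W1, b1)   # hidden-layer outputs
    y = layer(h, W2, b2)   # network outputs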

Learning through backpropagation

Learning occurs in the perceptron by changing connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. This is an example of supervised learning, and is carried out through backpropagation, a generalization of the least mean squares algorithm in the linear perceptron.

We represent the error in output node j for the nth data point by e_j(n) = d_j(n) − y_j(n), where d is the target value and y is the value produced by the perceptron. We then adjust the weights of the nodes so as to minimize the error in the entire output, given by

\mathcal{E}(n)=\frac{1}{2}\sum_j e_j^2(n).
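
For a single training example this error is straightforward to compute (a sketch with made-up target and output vectors):

    import numpy as np

    d = np.array([1.0, 0.0])       # target values d_j(n)
    y = np.array([0.8, 0.2])       # outputs y_j(n) produced by the network
    e = d - y                      # per-node errors e_j(n)
    E = 0.5 * np.sum(e ** 2)       # error energy E(n) = 0.04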

Using gradient descent, we find our change in each weight to be

\Delta w_{ji} (n) = -\eta\frac{\partial\mathcal{E}(n)}{\partial v_j(n)} y_i(n)

where y_i is the output of the previous neuron and η is the learning rate, which is carefully selected so that the weights converge to a response quickly enough without oscillating. In programming applications, this parameter typically ranges from 0.2 to 0.8.
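
Writing δ_j(n) for the local gradient −∂E(n)/∂v_j(n), the update for a single connection is just a product of three numbers (illustrative values only):

    eta = 0.5        # learning rate
    delta_j = 0.1    # local gradient at node j (assumed value)
    y_i = 0.9        # output of the upstream node i
    dw_ji = eta * delta_j * y_i   # Δw_ji(n) = η δ_j(n) y_i(n) = 0.045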

The derivative to be calculated depends on the induced local field v_j, which itself varies. It is easy to show that for an output node this derivative simplifies to

-\frac{\partial\mathcal{E}(n)}{\partial v_j(n)} = e_j(n)\phi^\prime (v_j(n))

where \phi^\prime is the derivative of the activation function described above, which itself does not vary. The analysis is more difficult for the change in weights to a hidden node, but it can be shown that the relevant derivative is

-\frac{\partial\mathcal{E}(n)}{\partial v_j(n)} = \phi^\prime (v_j(n))\sum_k -\frac{\partial\mathcal{E}(n)}{\partial v_k(n)} w_{kj}(n).

This depends on the local gradients of the kth nodes, which belong to the next layer (for the last hidden layer, the output layer). So to change the hidden-layer weights, we must first compute the output-layer gradients using the derivative of the activation function; because the error terms propagate backward from the output toward the input in this way, the algorithm is called backpropagation.[4]
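
Putting the two gradient formulas together gives a complete training loop. The sketch below (a minimal NumPy illustration under our own naming, not a reference implementation) trains a 2-2-1 perceptron with logistic activations on XOR, a problem that is not linearly separable; note that φ′(v) = y(1 − y) for the logistic function:

    import numpy as np

    rng = np.random.default_rng(0)

    # XOR truth table: not linearly separable
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    D = np.array([[0], [1], [1], [0]], dtype=float)

    def phi(v):
        return 1.0 / (1.0 + np.exp(-v))

    # 2 inputs, 2 hidden nodes, 1 output
    W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
    W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)

    eta = 0.5
    for epoch in range(20000):      # convergence can depend on the seed
        for x, d in zip(X, D):
            v1 = W1 @ x + b1; y1 = phi(v1)    # forward pass, hidden layer
            v2 = W2 @ y1 + b2; y2 = phi(v2)   # forward pass, output layer

            e = d - y2
            delta2 = e * y2 * (1 - y2)                 # output: e_j * phi'(v_j)
            delta1 = y1 * (1 - y1) * (W2.T @ delta2)   # hidden: phi'(v_j) * sum_k delta_k w_kj

            # gradient-descent updates: Δw_ji = η δ_j y_i
            W2 += eta * np.outer(delta2, y1); b2 += eta * delta2
            W1 += eta * np.outer(delta1, x);  b1 += eta * delta1

    # trained outputs, ideally close to [0, 1, 1, 0]
    print(phi(W2 @ phi(W1 @ X.T + b1[:, None]) + b2[:, None]).round(2))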

Applications

Multilayer perceptrons trained with a backpropagation algorithm are a standard approach to supervised pattern recognition and a subject of ongoing research in computational neuroscience and parallel distributed processing. They are useful in research for their ability to solve problems stochastically, which often yields approximate solutions to extremely complex problems such as fitness approximation.

Currently, they are most commonly seen in speech recognition, image recognition, and machine translation software, but they have also found applications in other fields such as cyber security. In general, their most important use has been in the growing field of artificial intelligence, although the multilayer perceptron bears only a loose resemblance to biological neural networks, unlike the earliest neurally inspired models.[5]

References

  1. ^ Rosenblatt, Frank. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington DC, 1961.
  2. ^ Rumelhart, David E.; Hinton, Geoffrey E.; Williams, R. J. "Learning Internal Representations by Error Propagation". In Rumelhart, David E.; McClelland, James L.; and the PDP Research Group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, 1986.
  3. ^ Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function". Mathematics of Control, Signals, and Systems, 2(4), 303–314.
  4. ^ Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation (2nd ed.). Prentice Hall. ISBN 0-13-273350-1.
  5. ^ Wasserman, P. D.; Schwartz, T. (1988). "Neural networks. II. What are they and why is everybody so interested in them now?". IEEE Expert, 3(1), 10–15.
