Prior knowledge for pattern recognition

Pattern recognition is a very active field of research closely tied to machine learning. Also known as classification or statistical classification, pattern recognition aims at building a classifier that can determine the class of an input pattern. Building the classifier, a procedure known as training, corresponds to learning an unknown decision function based only on a set of input-output pairs $(\boldsymbol{x}_i,y_i)$ that form the training data (or training set). In real-world applications such as character recognition, however, a certain amount of information on the problem is usually known beforehand. Incorporating this prior knowledge into the training is a key means of improving performance in many applications.

Definition

Prior knowledge, as defined in [Scholkopf02], refers to all information about the problem available in addition to the training data. This definition is very general: determining a model from a finite set of samples without any prior knowledge is an ill-posed problem, in the sense that a unique model may not exist. For instance, many classifiers incorporate the general smoothness assumption that a test pattern similar to one of the training samples tends to be assigned to the same class.

The importance of prior knowledge in machine learning is suggested by its role in search and optimization. Loosely, the no free lunch theorem states that all search algorithms have the same average performance over all problems, and thus implies that to gain in performance on a certain application one must use a specialized algorithm that includes some prior knowledge about the problem.

The different types of prior knowledge encountered in pattern recognition can be grouped into two main categories: class-invariance and knowledge on the data.

Class-invariance

A very common type of prior knowledge in pattern recognition is the invariance of the class (or the output of the classifier) to a transformation of the input pattern. This type of knowledge is referred to as transformation-invariance. The transformations most commonly used in image recognition are:

* translation;
* rotation;
* skewing;
* scaling.

Incorporating the invariance to a transformation $T_{\theta}: \boldsymbol{x} \mapsto T_{\theta}\boldsymbol{x}$, parametrized by $\theta$, into a classifier with output $f(\boldsymbol{x})$ for an input pattern $\boldsymbol{x}$ corresponds to enforcing the equality

$f(\boldsymbol{x}) = f(T_{\theta}\boldsymbol{x}), \quad \forall \boldsymbol{x}, \theta$
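As a rough illustration of this equality, the following sketch checks it numerically on a finite grid of points and parameter values. The toy decision function `f`, the rotation `rotate` and the helper `is_invariant` are assumptions made for this example only; passing such a check is a necessary condition for invariance, not a proof of it.

```python
import math

def rotate(x, theta):
    """Apply T_theta: rotate a 2-D point x by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def f(x):
    """Toy decision function: +1 outside the unit circle, -1 inside."""
    return 1 if math.hypot(x[0], x[1]) > 1.0 else -1

def is_invariant(f, transform, points, thetas):
    """Check f(x) == f(T_theta x) for every tested point and parameter."""
    return all(f(p) == f(transform(p, t)) for p in points for t in thetas)

# Hypothetical test points, chosen away from the decision boundary.
points = [(0.2, 0.1), (1.5, -0.3), (-0.4, 2.0)]
thetas = [k * math.pi / 6 for k in range(12)]

rotation_invariant = is_invariant(f, rotate, points, thetas)
print(rotation_invariant)
```

Because `f` depends on the input only through its norm, rotating the input cannot change the output. A function that thresholds the first coordinate would fail the same check.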

Local invariance can also be considered for a transformation centered at $\theta=0$, so that $T_0\boldsymbol{x} = \boldsymbol{x}$, by the constraint

$\left.\frac{\partial}{\partial \theta}\right|_{\theta=0} f(T_{\theta} \boldsymbol{x}) = 0$
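The local constraint can be probed numerically with a central finite difference in $\theta$ around $\theta=0$. The sketch below is only an illustration: the real-valued outputs `f` and `g`, the rotation `rotate` and the step size `eps` are assumptions chosen for the example.

```python
import math

def rotate(x, theta):
    """Apply T_theta: rotate a 2-D point x by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def f(x):
    """Real-valued output that depends only on the squared norm of x."""
    return x[0] ** 2 + x[1] ** 2 - 1.0

def g(x):
    """Real-valued output that depends on the first coordinate only."""
    return x[0] - 0.5

def invariance_derivative(out, transform, x, eps=1e-5):
    """Central-difference estimate of d/dtheta out(T_theta x) at theta = 0."""
    return (out(transform(x, eps)) - out(transform(x, -eps))) / (2.0 * eps)

x = (0.8, -0.3)
d_f = invariance_derivative(f, rotate, x)  # close to 0: locally invariant
d_g = invariance_derivative(g, rotate, x)  # close to -x[1] = 0.3: not invariant
print(d_f, d_g)
```

For `g`, the derivative of $\cos(\theta)x_1 - \sin(\theta)x_2 - 0.5$ at $\theta=0$ is $-x_2$, which is why the estimate is nonzero at this point.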

Note that $f$ in these equations can be either the decision function of the classifier or its real-valued output.

Another approach is to consider class-invariance with respect to a "domain of the input space" instead of a transformation. In this case, the problem becomes finding $f$ such that

$f(\boldsymbol{x}) = y_{\mathcal{P}},\ \forall \boldsymbol{x}\in \mathcal{P}$

where $y_{\mathcal{P}}$ is the membership class of the region $\mathcal{P}$ of the input space.
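One simple way to test such a region constraint is to sample points inside the region and compare the classifier output with the known class. In the sketch below the region is assumed to be an axis-aligned box, and `sample_box`, `respects_region` and the toy classifier `f` are hypothetical names introduced for illustration. Sampling can only reveal violations; it cannot certify that the constraint holds everywhere in the region.

```python
import random

def sample_box(lower, upper, n, seed=0):
    """Draw n points uniformly from the box [lower, upper] (the region P)."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))
            for _ in range(n)]

def f(x):
    """Toy linear classifier: +1 if x1 + x2 > 0, otherwise -1."""
    return 1 if x[0] + x[1] > 0 else -1

def respects_region(f, samples, y_region):
    """Check f(x) == y_P on every sampled point of the region."""
    return all(f(x) == y_region for x in samples)

# Assumed prior knowledge: every pattern in this box belongs to class +1.
lower, upper = (0.5, 0.5), (2.0, 3.0)
samples = sample_box(lower, upper, 1000)
ok = respects_region(f, samples, +1)
print(ok)
```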

A different type of class-invariance found in pattern recognition is the permutation-invariance, i.e. invariance of the class to a permutation of elements in a structured input. A typical application of this type of prior knowledge is a classifier invariant to permutations of rows in matrix inputs.
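A common way to obtain invariance to row permutations is to let the classifier depend on the matrix only through a symmetric function of its rows, such as a sum. The sketch below uses that idea as an assumption; it is not a method taken from the cited references, and `column_sums` and `f` are hypothetical names.

```python
from itertools import permutations

def column_sums(matrix):
    """Symmetric pooling: summing over rows ignores the order of the rows."""
    return tuple(sum(col) for col in zip(*matrix))

def f(matrix):
    """Toy classifier that sees the matrix only through its column sums."""
    s = column_sums(matrix)
    return 1 if s[0] - s[1] > 0 else -1

X = [[3, 1], [0, 2], [4, 1]]

# Evaluate f on every reordering of the rows and collect the distinct outputs.
outputs = {f(list(p)) for p in permutations(X)}
print(outputs)
```

Integer entries are used so that the sums are exact regardless of the order of summation. With floating-point entries the pooled values could differ in the last bits between permutations.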

Knowledge on the data

Forms of prior knowledge other than class-invariance concern the data more specifically and are thus of particular interest for real-world applications. The three cases that occur most often when gathering data are:

* unlabeled samples are available with presumed class memberships;
* the training set is imbalanced because a high proportion of the samples belong to one class;
* the quality of the data varies from one sample to another.

Prior knowledge on these can enhance the quality of the recognition if included in the learning. Moreover, not taking into account the poor quality of some data or a large imbalance between the classes can mislead the decision of a classifier.
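As an illustration of how two of these cases might enter the learning, the sketch below computes per-sample weights that a weighted training procedure could use. The inverse-frequency heuristic, the `quality` scores in $[0, 1]$ and the function names are assumptions made for this example. Unlabeled samples are not covered here.

```python
from collections import Counter

def class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def sample_weights(labels, quality):
    """Combine the class weight with a per-sample quality score."""
    w = class_weights(labels)
    return [w[y] * q for y, q in zip(labels, quality)]

# Hypothetical data: six samples of class +1, two of class -1,
# and two samples marked as being of lower quality (score 0.5).
labels = [+1, +1, +1, +1, +1, +1, -1, -1]
quality = [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5]

weights = sample_weights(labels, quality)
print(class_weights(labels))
print(weights)
```

With these weights the minority class contributes as much in total as the majority class before the quality scores are applied, and low-quality samples count for less in the training criterion.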

References

* [Scholkopf02] B. Schölkopf and A. Smola, "Learning with Kernels", MIT Press, 2002.

* [Krupka07] E. Krupka and N. Tishby, "Incorporating Prior Knowledge on Features into Learning", Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 07).

Wikimedia Foundation. 2010.
