Isomap

In statistics, Isomap is a widely used low-dimensional embedding method that combines geodesic distances on a weighted neighborhood graph with classical scaling (metric multidimensional scaling). It computes a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points. As a representative isometric mapping method, it extends metric multidimensional scaling (MDS) by using shortest-path (Dijkstra) geodesic distances on a weighted graph in place of Euclidean distances. The algorithm provides a simple method for estimating the intrinsic geometry of a data manifold from a rough estimate of each data point's neighbors on the manifold, and it is efficient and applicable to a broad range of data sources and dimensionalities.

Classical scaling, a form of metric MDS, is a method of low-dimensional embedding based on pairwise dissimilarities between data points; Euclidean distance is the usual measure of dissimilarity. The basic idea of Isomap is to use geodesic distances on a neighborhood graph within the classical-scaling framework, so that the embedding reflects the manifold structure of the data rather than a linear subspace. The geodesic distance between two points is taken to be the sum of edge weights along the shortest path between the corresponding nodes. The top n eigenvectors of the doubly centered geodesic distance matrix then give the coordinates of the points in the n-dimensional Euclidean space.
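The whole pipeline described above — neighborhood graph, shortest-path geodesic distances, classical scaling — can be sketched in a few lines of Python. This is a minimal illustration using NumPy and SciPy; the function and parameter names are illustrative, not taken from any particular library:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def isomap(X, n_neighbors=5, n_components=2):
    """Minimal Isomap sketch: kNN graph -> geodesic distances -> classical scaling."""
    n = X.shape[0]
    dist = cdist(X, X)                        # pairwise Euclidean distances
    # Step 1: weighted k-nearest-neighbor graph (np.inf marks non-edges).
    graph = np.full((n, n), np.inf)
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:n_neighbors + 1]
        graph[i, nbrs] = dist[i, nbrs]
    # Step 2: shortest paths (Dijkstra) approximate geodesic distances.
    D = shortest_path(graph, method="D", directed=False)
    # Step 3: classical scaling (metric MDS) on the geodesic distances.
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    K = -0.5 * H @ (D ** 2) @ H               # doubly centered kernel
    vals, vecs = np.linalg.eigh(K)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

If `n_neighbors` is too small the graph may disconnect (infinite geodesic distances); if it is too large, short-circuit edges appear, as discussed below.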

Following the connection between the classical scaling and PCA, metric MDS can be interpreted as kernel PCA. In a similar manner, the geodesic distance matrix in Isomap can be viewed as a kernel matrix. The doubly centered geodesic distance matrix "K" in Isomap is of the form

: K = -\frac{1}{2} H D^2 H,

where D^2 = [D_{ij}^2] denotes the elementwise square of the geodesic distance matrix D = [D_{ij}], and H is the centering matrix, given by

: H = I_N - \frac{1}{N} e_N e_N^T, \quad \text{where } e_N = [1 \dots 1]^T \in \mathbb{R}^N.
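The effect of double centering can be checked numerically on a toy distance matrix. The following sketch uses made-up distance values purely for illustration:

```python
import numpy as np

N = 4
e = np.ones((N, 1))
H = np.eye(N) - (e @ e.T) / N        # centering matrix H = I - e e^T / N

# Toy geodesic distance matrix (values are illustrative only).
D = np.array([[0., 1., 2., 3.],
              [1., 0., 1., 2.],
              [2., 1., 0., 1.],
              [3., 2., 1., 0.]])
K = -0.5 * H @ (D ** 2) @ H          # doubly centered kernel matrix

# Because H annihilates the all-ones vector (H e = 0), every row
# and every column of K sums to zero.
row_sums = K.sum(axis=1)
col_sums = K.sum(axis=0)
```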

However, the kernel matrix K is not always positive semidefinite. The main idea of kernel Isomap is to convert K into a Mercer kernel matrix (one that is positive semidefinite) using a constant-shifting method, relating it to kernel PCA so that a generalization property naturally emerges.
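One simple way to see the constant-shifting idea is to add the smallest multiple of the identity that cancels the most negative eigenvalue. This is a sketch of the general idea only; the kernel-Isomap construction in the literature uses a more elaborate additive-constant correction:

```python
import numpy as np

def shift_to_psd(K):
    """Return K + c*I with the smallest c >= 0 making K positive semidefinite.

    Illustrative only: adding c*I raises every eigenvalue by c, so choosing
    c = -lambda_min (when lambda_min < 0) yields a valid Mercer kernel matrix.
    """
    K = (K + K.T) / 2.0                     # symmetrize against round-off
    lam_min = np.linalg.eigvalsh(K)[0]      # smallest eigenvalue
    if lam_min < 0.0:
        K = K + (-lam_min) * np.eye(K.shape[0])
    return K

# An indefinite symmetric matrix with eigenvalues 3 and -1:
K_bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])
K_psd = shift_to_psd(K_bad)
```

The shift changes only the diagonal of K, which in distance terms corresponds to adding a constant to the squared distances.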

The success of Isomap depends on choosing a neighborhood size (ε or K) that is neither so large that it introduces "short-circuit" edges into the neighborhood graph, nor so small that the graph becomes too sparse to approximate geodesic paths accurately. Isomap defines the connectivity of each data point via its nearest Euclidean neighbors in the high-dimensional space, and this step is vulnerable to short-circuit errors if the neighborhood is too large relative to folds in the manifold on which the data points lie, or if noise in the data moves points slightly off the manifold. Short-circuit edges can produce low-dimensional embeddings that do not preserve the manifold's true topology: even a single short-circuit error can alter many entries in the geodesic distance matrix, which in turn can lead to a drastically different (and incorrect) low-dimensional embedding.
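This sensitivity can be demonstrated numerically. For points sampled along an open circular arc, a moderate neighborhood size keeps geodesic distances faithful to the arc, while a larger one adds a short-circuit edge across the gap and collapses the distance between the two ends. The arc and the particular k values below are illustrative assumptions:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def knn_graph(X, k):
    """Dense k-nearest-neighbor graph; np.inf marks non-edges."""
    d = cdist(X, X)
    g = np.full_like(d, np.inf)
    for i in range(len(X)):
        nbrs = np.argsort(d[i])[1:k + 1]
        g[i, nbrs] = d[i, nbrs]
    return g

# 30 points on an open arc covering 1.6*pi of the unit circle: the two
# ends are close in Euclidean space but far apart along the curve.
t = np.linspace(0.0, 1.6 * np.pi, 30)
X = np.c_[np.cos(t), np.sin(t)]

# k = 4: shortest paths between the ends must follow the arc (~5.0).
D_small = shortest_path(knn_graph(X, 4), method="D", directed=False)
# k = 8: a direct edge bridges the gap, short-circuiting the geodesic.
D_large = shortest_path(knn_graph(X, 8), method="D", directed=False)
```

Here `D_small[0, 29]` approximates the arc length between the endpoints, while `D_large[0, 29]` collapses to roughly the Euclidean gap, illustrating how one extra edge rewrites large parts of the geodesic distance matrix.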

External links

* [http://isomap.stanford.edu/ Isomap webpage at Stanford university]


Wikimedia Foundation. 2010.
