Medoid

Medoid

Medoids are representative objects of a data set or a cluster with a data set whose average dissimilarity to all the objects in the cluster is minimal [1] . Medoids are similar in concept to means or centroids, but medoids are always members of the data set. Medoids are most commonly used on data when a mean or centroid cannot be defined such as 3-D trajectories or in the gene expression context[2]. The term is used in computer science in data clustering algorithms.

A common application of the medoid is the k-medoids clustering algorithm, which is similar to the k-means algorithm but works when a mean or centroid is not definable. This algorithm basically works as follows. First, a set of medoids is chosen at random. Second, the distances to the other points are computed. Third, data are clustered according to the medoid they are most similar to. Fourth, the medoid set is optimized via an iterative process.

Note that a medoid is not equivalent to a median or a geometric median. A median is only defined on 1-dimensional data, and it only minimizes dissimilarity to other points for a specific distance metric (Manhattan norm). A geometric median is not necessarily a point from within the original dataset.

See also

References

  1. ^ Struyf, Anja; Mubert, Mia and Rousseeuw, Peter (1997). "Clustering in an Object-Oriented Environment". Journal of Statistical Software 1 (4): 1–30. http://www.jstatsoft.org/v01/i04. 
  2. ^ Van Der Lann, Mark J; Pollard, Katherine S; Bryan, Jennifer; E (2003). "A New Partitioning Around Medoids Algorithm". Journal of Statistical Computation and Simulation (Taylor & Francis Group) 73 (8): 575–584. doi:10.1080/0094965031000136012. 

Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать реферат

Look at other dictionaries:

  • medoid — noun A mathematically representative object in a set of objects; it has the smallest average dissimilarity to all other objects in the set …   Wiktionary

  • K-medoids — The K medoids algorithm is a clustering algorithm related to the K means algorithm and the medoidshift algorithm. Both the K means and K medoids algorithms are partitional (breaking the dataset up into groups) and both attempt to minimize squared …   Wikipedia

  • Clustering high-dimensional data — is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high dimensional data spaces are often encountered in areas such as medicine, where DNA microarray technology can produce a large number of… …   Wikipedia

  • Median — This article is about the statistical concept. For other uses, see Median (disambiguation). In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability… …   Wikipedia

  • Shape context — is the term given by Serge Belongie and Jitendra Malik to the feature descriptor they first proposed in their paper Matching with Shape Contexts in 2000cite conference author = S. Belongie and J. Malik title = Matching with Shape Contexts url =… …   Wikipedia

  • k-means clustering — In statistics and data mining, k means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. This results into a partitioning of… …   Wikipedia

  • Hierarchische Clusteranalyse — Als Hierarchische Clusteranalyse bezeichnet man eine bestimmte Familie von distanzbasierten Verfahren zur Clusteranalyse (Strukturentdeckung in Datenbeständen). Cluster bestehen hierbei aus Objekten, die zueinander eine geringere Distanz (oder… …   Deutsch Wikipedia

  • Silhouettenkoeffizient — Die Silhouette gibt für eine Beobachtung an, wie gut die Zuordnung zu den beiden nächstgelegenen Clustern ist. Der Silhouettenkoeffizient gibt ein von der Cluster Anzahl unabhängige Maßzahl für die Qualität eines Clusterings an. Der… …   Deutsch Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”