K-medoids

K-medoids

The K-medoids algorithm is a clustering algorithm related to the K-means algorithm and the medoidshift algorithm. Both the K-means and K-medoids algorithms are partitional (breaking the dataset up into groups) and both attempt to minimize squared error, the distance between points labeled to be in a cluster and a point designated as the center of that cluster. In contrast to the K-means algorithm K-medoids chooses datapoints as centers (medoids or exemplars).



k-medoid is a classical partitioning technique of clustering that clusters the data set of n objects into k clusters known apriori.

It is more robust to noise and outliers as compared to k-means

A medoid can be defined as that object of a cluster, whose average dissimilarity to all the objects in the cluster is minimal i.e. it is a most centrally located point in the given data set.

K-medoid clustering algorithm is as follows

1) The algorithm begins with arbitrary selection of the k objects as medoid points out of n data points (n>k)

2) After selection of the k medoid points, associate each data object in the given data set to most similar medoid. The similarity here is defined using distance measure that can be euclidean distance, manhattan distance or minkowski distance

3) Randomly select nonmedoid object O’

4) compute total cost , S of swapping initial medoid object to O’

5) If S<0, then swap initial medoid with the new one ( if S<0 then there will be new set of medoids)

6) repeat steps 2 to 5 until there is no change in the medoid.

Demonstration of algorithm

Cluster the following data set of 10 objects into 2 clusters i.e k=2.

Consider data set of 10 data set of objects as follows:


Figure 1.1 distribution of the data




Step 1:

Initialize k centre Let us assume C1=(3,4) and C2=(7,4)So here C1 and C2 are selected as medoid.Calculating distance so as to associate each data object to its nearest medoid. Cost is calculated using Minkowski distance metric with r = 1.

So the clusters become:

Cluster1={(3,4)(2,6)(3,8)(4,7)}Cluster2={(7,4)(6,2)(6,4)(7,3)(8,5)(7,6)}

Since the points (2,6) ( 3,8) and (4,7) are close to c1 hence they form one cluster while remaining points form another cluster.

So the total cost involved is 20.

Where cost between any 2 points is found using formula

where x is any data object, c is the medoid, and d is the dimension of the object which in this case is 2.

Total cost is summation of the cost of data object from its medoid in its cluster so here


Total cost= {cost((3,4),(2,6))+ cost((3,4),(3,8))+ cost((3,4),(4,7))} + {cost((7,4),(6,2))+ cost((7,4),(6,4)) + cost((7,4),(7,3)) + cost((7,4),(8,5)) + cost((7,4),(7,6)) }

=3+4+4+3+1+1+2+2 =20


Figure 1.2 clusters after step 1


Step 2:Selection of nonmedoid O’ randomlyLet us assume O’=(7,3)

So now the medoids are c1(3,4) and O’(7,3)If c1 and O’ are new medoids, calculate the total cost involvedBy using the formula in the step 1







Figure 1.3 clusters after step 2


Total cost = 3+4+4+2+2+1+3+3 =22

So cost of swapping medoid from c2 to O’ is

S=Current total cost – past total cost = 22-20 = 2>0

So moving to O’ would be bad idea,so the previous choice was good and algorithm terminates here(i.e there is no change in the medoids).

It may happen some data points may shift from one cluster to another cluster depending upon their closeness to medoid.


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать реферат

Look at other dictionaries:

  • K-medoids — En statistiques, un médoide[1] est le représentant le plus central d une classe. L algorithme des k medoids est un algorithme de partitionnement plus robuste aux données aberrantes (outliers) que celui des k means. Sommaire 1 Algorithme 2 Voir… …   Wikipédia en Français

  • Medoid — Medoids are representative objects of a data set or a cluster with a data set whose average dissimilarity to all the objects in the cluster is minimal [1] . Medoids are similar in concept to means or centroids, but medoids are always members of… …   Wikipedia

  • k-means clustering — In statistics and data mining, k means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. This results into a partitioning of… …   Wikipedia

  • Minería de datos — La minería de datos (DM, Data Mining) consiste en la extracción no trivial de información que reside de manera implícita en los datos. Dicha información era previamente desconocida y podrá resultar útil para algún proceso. En otras palabras, la… …   Wikipedia Español

  • Медоид — (в кластерном анализе) объект, принадлежащий набору данных или кластеру, различие (например, по координатам) которого с другими объектами в наборе данных или кластере минимально. Медоиды близки по смыслу центроидам, но в отличие от них, являются… …   Википедия

  • k-means — (метод k средних)  наиболее популярный метод кластеризации. Был изобретён в 1950 х годах математиком Гуго Штейнгаузом[1] и почти одновременно Стюартом Ллойдом[2]. Особую популярность приобрёл после работы Маккуина[3]. Действие алгоритма… …   Википедия

  • List of statistics topics — Please add any Wikipedia articles related to statistics that are not already on this list.The Related changes link in the margin of this page (below search) leads to a list of the most recent changes to the articles listed below. To see the most… …   Wikipedia

  • PAM — or PAM may refer to: Contents 1 Companies 2 Organisations 3 Military …   Wikipedia

  • Cluster analysis — The result of a cluster analysis shown as the coloring of the squares into three clusters. Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more… …   Wikipedia

  • List of mathematics articles (K) — NOTOC K K approximation of k hitting set K ary tree K core K edge connected graph K equivalence K factor error K finite K function K homology K means algorithm K medoids K minimum spanning tree K Poincaré algebra K Poincaré group K set (geometry) …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”