- Geometric median
The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions. It is also known as the Fermat–Weber point or 1-median. [The more general "k"-median problem asks for the location of "k" cluster centers minimizing the sum of distances from each sample point to its nearest center.]
The geometric median is an important estimator of location in statistics. It is also a standard problem in facility location, where it models the problem of locating a facility to minimize the cost of transportation.
The special case of the problem for three points in the plane (that is, "m" = 3 and "n" = 2, where "m" is the number of sample points and "n" is the dimension of the space) is sometimes also known as Fermat's problem; it arises in the construction of minimal Steiner trees, and was originally posed as a problem by Pierre de Fermat to Evangelista Torricelli, who solved it. Its solution is now known as the Fermat point of the triangle formed by the three sample points. Alfred Weber's name is associated with the more general Fermat–Weber problem due to a discussion of the problem in his 1909 book on facility location.
Wesolowsky (1993) provides a survey of the problem. See Fekete, Mitchell, and Beurer (2003) for generalizations of the problem to non-discrete point sets.
Definition
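Formally, for a given set of "m" points $x_1, x_2, \dots, x_m$ with each $x_j \in \mathbb{R}^n$, the geometric median is defined as
: $\text{Geometric Median} = \underset{y \in \mathbb{R}^n}{\operatorname{arg\,min}} \sum_{j=1}^{m} \lVert x_j - y \rVert_2$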
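The quantity being minimized can be evaluated directly for any candidate point. The following is a minimal sketch in Python; the function name `sum_of_distances` and the unit-square sample data are illustrative assumptions, not part of the definition.

```python
import math

def sum_of_distances(y, points):
    """Sum of Euclidean distances from candidate point y to every sample point."""
    return sum(math.dist(y, x) for x in points)

# Hypothetical sample data: the four corners of the unit square.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Compare the objective at the center of the square and at one corner.
center_cost = sum_of_distances((0.5, 0.5), samples)
corner_cost = sum_of_distances((0.0, 0.0), samples)
```

Here the center of the square gives a smaller sum than the corner, as expected for a symmetric configuration; the geometric median is the candidate for which this sum is smallest.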
Note that "argmin" means the argument for which the sum is minimized. In this case, it is the point $y$ from which the sum of all Euclidean distances to the $x_j$ is minimum.
Properties
* For the 1-dimensional case, the geometric median coincides with the median. This is because the univariate median also minimizes the sum of distances from the points.
* The geometric median is unique whenever the points are not collinear.
* The geometric median is equivariant for Euclidean similarity transformations, including translation and rotation. This means that one would get the same result either by transforming the geometric median, or by applying the same transformation to the sample data and finding the geometric median of the transformed data. This property follows from the fact that the geometric median is defined only from pairwise distances, and doesn't depend on the system of orthogonal Cartesian coordinates by which the sample data is represented. In contrast, the component-wise median for a multivariate data set is not in general rotation invariant, nor is it independent of the choice of coordinates.
* The geometric median has a breakdown point of 0.5. [Lopuhaä and Rousseeuw (1991).] That is, up to half of the sample data may be arbitrarily corrupted, and the median of the samples will still provide a robust estimator for the location of the uncorrupted data.
Special cases
* For 3 points, if any angle of the triangle is 120° or more, then the geometric median is the vertex at that angle. If all the angles are less than 120°, the geometric median is the point inside the triangle at which each of the three pairs of points subtends an angle of 120°. This is also known as the Fermat point of the triangle formed by the three points.
* For 4 coplanar points, if one of the four points is inside the triangle formed by the other three points, then the geometric median is that point. Otherwise, the points form a convex quadrilateral and the geometric median is the crossing point of the diagonals of the quadrilateral. The geometric median of four points is also known as the Radon point of the four points.
Computation
Although the concept is easy to understand, computing the geometric median poses a challenge. The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each sample, can be found by a simple formula: its coordinates are the averages of the coordinates of the samples. No such formula is known for the geometric median, and it has been shown that no formula involving only arithmetic operations and "k"th roots can exist in general. [Cockayne and Melzak (1969); Bajaj (1988).]
However, it is straightforward to calculate an approximation to the geometric median using an iterative procedure in which each step produces a more accurate approximation. Procedures of this type can be derived from the fact that the sum of distances is a convex function, since the distance to each sample point is convex and the sum of convex functions remains convex. Therefore, procedures that decrease the sum of distances at each step cannot get trapped in a local optimum.
One common approach of this type, called Weiszfeld's algorithm [Weiszfeld (1937); Kuhn (1973); Chandrasekaran and Tamir (1989).], is a form of iteratively re-weighted least squares. This algorithm defines a set of weights that are inversely proportional to the distances from the current estimate to the samples, and creates a new estimate that is the weighted average of the samples according to these weights. That is,
: $y_{i+1} = \left( \sum_{j=1}^{m} \frac{x_j}{\lVert x_j - y_i \rVert} \right) \Big/ \left( \sum_{j=1}^{m} \frac{1}{\lVert x_j - y_i \rVert} \right)$
Bose et al. (2003) describe more sophisticated geometric optimization procedures for finding approximately optimal solutions to this problem.
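The reweighted-averaging step of Weiszfeld's algorithm described above might be sketched in Python as follows. The function name `weiszfeld`, the centroid starting point, the stopping tolerance `tol`, the iteration cap `max_iter`, and the handling of an estimate that lands on a sample point are all assumptions of this sketch rather than details taken from the cited papers.

```python
import math

def weiszfeld(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median by repeated inverse-distance weighted averaging."""
    dim = len(points[0])
    # Assumed starting point: the centroid of the samples.
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim   # weighted sum of the samples
        den = 0.0           # sum of the weights
        for p in points:
            d = math.dist(y, p)
            if d < 1e-12:
                # The weight 1/d is undefined when the estimate sits on a
                # sample point. This sketch simply stops there; it does not
                # verify that the sample point is actually optimal.
                return tuple(y)
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        new_y = [c / den for c in num]
        # Stop once successive estimates are closer than the tolerance.
        if math.dist(new_y, y) < tol:
            return tuple(new_y)
        y = new_y
    return tuple(y)

# Hypothetical data: a convex quadrilateral whose diagonals cross at (4/3, 4/3).
quad = [(0.0, 0.0), (4.0, 0.0), (3.0, 3.0), (0.0, 2.0)]
estimate = weiszfeld(quad)
```

For this quadrilateral the estimate should approach the crossing point of the diagonals, in line with the four-point special case. A more careful implementation would treat the case where an iterate coincides with a sample point explicitly instead of stopping.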
Implicit formula
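If $y$ is distinct from all the given points $x_j$, then $y$ is the geometric median if and only if it satisfies:
: $0 = \sum_{j=1}^{m} \frac{x_j - y}{\lVert x_j - y \rVert}$
This is equivalent to:
: $y = \left( \sum_{j=1}^{m} \frac{x_j}{\lVert x_j - y \rVert} \right) \Big/ \left( \sum_{j=1}^{m} \frac{1}{\lVert x_j - y \rVert} \right)$
which is closely related to Weiszfeld's algorithm.
If $y$ is equal to some of the given points, then $y$ is the geometric median if and only if there are vectors $u_j$ such that:
: $0 = \sum_{j=1}^{m} u_j$
where for $x_j \ne y$,
: $u_j = \frac{x_j - y}{\lVert x_j - y \rVert}$
and for $x_j = y$,
: $\lVert u_j \rVert \le 1$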
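Both conditions can be checked numerically for a candidate point. The sketch below rests on one added reasoning step: vectors of norm at most 1, one for each sample point equal to $y$, can cancel the sum of the remaining unit vectors exactly when the norm of that sum is at most the number of sample points equal to $y$. With no coincident points this reduces to requiring that the sum be zero. The function name `satisfies_median_condition` and the tolerance `tol` are assumptions of the sketch.

```python
import math

def satisfies_median_condition(y, points, tol=1e-9):
    """Check the implicit optimality condition for candidate point y."""
    residual = [0.0] * len(y)  # sum of unit vectors toward samples that differ from y
    coincident = 0             # number of samples equal to y
    for x in points:
        d = math.dist(x, y)
        if d == 0.0:
            coincident += 1
        else:
            for k in range(len(y)):
                residual[k] += (x[k] - y[k]) / d
    # Vectors u_j with norm <= 1 can cancel the residual only if its norm
    # does not exceed the number of coincident samples.
    return math.hypot(*residual) <= coincident + tol

# Hypothetical data: a convex quadrilateral whose diagonals cross at (4/3, 4/3).
quad = [(0.0, 0.0), (4.0, 0.0), (3.0, 3.0), (0.0, 2.0)]
at_crossing = satisfies_median_condition((4.0 / 3.0, 4.0 / 3.0), quad)
at_centroid = satisfies_median_condition((1.75, 1.25), quad)

# Hypothetical data: a triangle with a very wide angle at the vertex (0, 0).
wide = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.1)]
at_wide_vertex = satisfies_median_condition((0.0, 0.0), wide)
at_other_vertex = satisfies_median_condition((1.0, 0.0), wide)
```

The condition holds at the crossing point of the diagonals and at the wide-angle vertex, and fails at the centroid of the quadrilateral and at the other triangle vertex tested.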
See also
* Central tendency
* Centroid, which minimizes the sum of squares of Euclidean distances
References
* Bajaj, C. (1988). "The algebraic degree of geometric optimization problems". Discrete and Computational Geometry 3: 177–191. doi:10.1007/BF02187906.
* Bose, Prosenjit; Maheshwari, Anil; Morin, Pat (2003). "Fast approximations for sums of distances, clustering and the Fermat–Weber problem". Computational Geometry: Theory and Applications 24 (3): 135–146. doi:10.1016/S0925-7721(02)00102-5. http://www.scs.carleton.ca/~jit/publications/papers/bmm01.ps
* Chandrasekaran, R.; Tamir, A. (1989). "Open questions concerning Weiszfeld's algorithm for the Fermat-Weber location problem". Mathematical Programming, Series A 44: 293–295. doi:10.1007/BF01587094.
* Cockayne, E. J.; Melzak, Z. A. (1969). "Euclidean constructability in graph minimization problems". Mathematics Magazine 42: 206–208.
* Fekete, Sándor P.; Mitchell, Joseph S. B.; Beurer, Karin (2003). "On the continuous Fermat-Weber problem". arXiv:cs.CG/0310027.
* Kuhn, Harold W. (1973). "A note on Fermat's problem". Mathematical Programming 4 (1): 98–107. doi:10.1007/BF01584648.
* Lopuhaä, Hendrick P.; Rousseeuw, Peter J. (1991). "Breakdown points of affine equivariant estimators of multivariate location and covariance matrices". Annals of Statistics 19 (1): 229–248. doi:10.1214/aos/1176347978. http://links.jstor.org/sici?sici=0090-5364(199103)19%3A1%3C229%3ABPOAEE%3E2.0.CO%3B2-1
* Weber, Alfred (1909). Über den Standort der Industrien, Erster Teil: Reine Theorie des Standortes. Tübingen: Mohr.
* Wesolowsky, G. (1993). "The Weber problem: History and perspective". Location Science 1: 5–23.
* Weiszfeld, E. (1937). "Sur le point pour lequel la somme des distances de n points donnes est minimum". Tohoku Math. Journal 43: 355–386.
Wikimedia Foundation. 2010.