Sørensen similarity index

The Sørensen index, also known as Sørensen's similarity coefficient, is a statistic used for comparing the similarity of two samples. It was developed by the botanist Thorvald Sørensen and published in 1948 [Sørensen, T. (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Biologiske Skrifter / Kongelige Danske Videnskabernes Selskab, 5 (4): 1-34].

For qualitative (presence/absence) data, the formula is $QS = \frac{2C}{A + B}$, where A and B are the numbers of species in samples A and B, respectively, and C is the number of species shared by the two samples. The expression extends readily from species incidence to species abundance; this quantitative version of the Sørensen index is also known as the Czekanowski index. The Sørensen index is identical to Dice's coefficient and always lies in the range [0, 1]. Used as a distance measure, 1 - QS, it is identical to the Bray-Curtis dissimilarity.
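As a minimal sketch of the formula above, the following Python computes QS for two species lists, together with one common abundance-based form. The function names and the example species are illustrative only. The abundance formula, 2·Σ min(a_i, b_i) / (Σ a_i + Σ b_i), is the usual Czekanowski form and is an assumption here, since the text above does not spell it out.

```python
def sorensen_index(sample_a, sample_b):
    """Qualitative Sørensen index QS = 2C / (A + B) for two species collections."""
    species_a, species_b = set(sample_a), set(sample_b)
    total = len(species_a) + len(species_b)  # A + B
    if total == 0:
        raise ValueError("QS is undefined when both samples are empty")
    shared = len(species_a & species_b)  # C
    return 2 * shared / total


def czekanowski_index(abundance_a, abundance_b):
    """Abundance-based form (assumed): 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b)).

    Each argument maps a species name to a non-negative abundance.
    """
    total = sum(abundance_a.values()) + sum(abundance_b.values())
    if total == 0:
        raise ValueError("index is undefined when both samples are empty")
    species = set(abundance_a) | set(abundance_b)
    shared = sum(min(abundance_a.get(s, 0), abundance_b.get(s, 0)) for s in species)
    return 2 * shared / total


# Hypothetical samples: A = 3 species, B = 4 species, C = 2 shared species.
plot_a = ["oak", "birch", "pine"]
plot_b = ["birch", "pine", "ash", "elm"]
print(round(sorensen_index(plot_a, plot_b), 3))  # 2*2 / (3+4) -> 0.571

# Hypothetical abundances: only birch is shared, with min(2, 6) = 2.
counts_a = {"oak": 4, "birch": 2}
counts_b = {"birch": 6, "pine": 2}
print(round(czekanowski_index(counts_a, counts_b), 3))  # 2*2 / (6+8) -> 0.286
```

With presence/absence data (every abundance equal to 1), the abundance form reduces to the qualitative QS.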

The Sørensen coefficient is mainly useful for ecological community data (e.g. Looman & Campbell, 1960 [Looman, J. and Campbell, J.B. (1960) Adaptation of Sorensen's K (1948) for estimating unit affinities in prairie vegetation. Ecology 41 (3): 409-416. http://links.jstor.org/sici?sici=0012-9658%28196007%2941%3A3%3C409%3AAOSK%28F%3E2.0.CO%3B2-1]). Justification for its use is primarily empirical rather than theoretical, although it can be justified theoretically as the intersection of two fuzzy sets [Roberts, D.W. (1986) Ordination on the basis of fuzzy set theory. Vegetatio 66 (3): 123-131. http://doi.dx.org/10.1007/BF00039905]. Compared with Euclidean distance, Sørensen distance retains sensitivity in more heterogeneous data sets and gives less weight to outliers [McCune, Bruce & Grace, James (2002) Analysis of Ecological Communities. MjM Software Design; ISBN 0972129006].
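One way to see the contrast with Euclidean distance is a small numerical comparison. The sketch below uses four hypothetical plots with made-up abundances and the abundance-based Sørensen distance (1 minus the Czekanowski form, an assumption here). It is an illustration of the general behaviour, not a reproduction of any analysis in the cited works.

```python
import math


def sorensen_distance(x, y):
    """1 - QS on two equal-length abundance vectors (abundance-based form, assumed)."""
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("distance is undefined when both samples are empty")
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1 - 2 * shared / total


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical abundances for four species in four plots.
plot_1 = [1, 1, 0, 0]    # shares no species with plot_2
plot_2 = [0, 0, 1, 1]
plot_3 = [10, 10, 0, 0]  # same species as plot_4, different abundances
plot_4 = [5, 5, 0, 0]

print(euclidean_distance(plot_1, plot_2))            # 2.0
print(round(euclidean_distance(plot_3, plot_4), 3))  # 7.071
print(sorensen_distance(plot_1, plot_2))             # 1.0
print(round(sorensen_distance(plot_3, plot_4), 3))   # 0.333
```

In this example Euclidean distance ranks the two plots with no species in common as closer than the two plots with the same species, because large abundance differences dominate the squared terms. The Sørensen distance is bounded by 1 and puts the plots with no shared species at the maximum.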

See also

* Jaccard index
* Kulczyński similarity index
* Renkonen similarity index
* Czekanowski similarity index
* Hamming distance
* Correlation
* Dice's coefficient


Wikimedia Foundation. 2010.
