Matthews correlation coefficient

The Matthews correlation coefficient (MCC) is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and −1 total disagreement between prediction and observation. The statistic is also known as the phi coefficient. The MCC is related to the chi-square statistic for a 2×2 contingency table by

$|\text{MCC}| = \sqrt{\frac{\chi^2}{n}}$

where n is the total number of observations.
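
This relation can be checked numerically. The following is a minimal Python sketch rather than a reference implementation: the function names and the example counts are hypothetical, the chi-square statistic is computed without continuity correction, and all row and column totals are assumed to be non-zero. TP, FN, FP and TN denote the numbers of true positives, false negatives, false positives and true negatives.

```python
import math


def chi_square_2x2(tp, fn, fp, tn):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[tp, fn], [fp, tn]]; assumes all marginal totals are non-zero."""
    n = tp + fn + fp + tn
    observed = [[tp, fn], [fp, tn]]
    row_totals = [tp + fn, fp + tn]   # actual positives, actual negatives
    col_totals = [tp + fp, fn + tn]   # predicted positives, predicted negatives
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def mcc(tp, fn, fp, tn):
    """MCC from the four confusion-matrix counts (non-zero marginals assumed)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts chosen only for illustration.
tp, fn, fp, tn = 6, 2, 1, 3
n = tp + fn + fp + tn

lhs = abs(mcc(tp, fn, fp, tn))
rhs = math.sqrt(chi_square_2x2(tp, fn, fp, tn) / n)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```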

While there is no perfect way of describing the confusion matrix of true and false positives and negatives by a single number, the Matthews correlation coefficient is generally regarded as being one of the best such measures. Other measures, such as the proportion of correct predictions (also termed accuracy), can be misleading when the two classes are of very different sizes. For example, assigning every object to the larger class achieves a high proportion of correct predictions, but is not generally a useful classification.
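
The effect of class imbalance can be illustrated with a small Python sketch. The counts below are hypothetical: 95 positives and 5 negatives, scored by a classifier that labels nearly every object as positive. The variable names are chosen for illustration only, with `tp`, `fn`, `fp` and `tn` standing for true positives, false negatives, false positives and true negatives.

```python
import math

# Hypothetical imbalanced data set: 95 actual positives, 5 actual negatives.
# The classifier labels 95 of the 100 objects as positive.
tp, fn = 90, 5   # the 95 actual positives
fp, tn = 5, 0    # the 5 actual negatives

# Accuracy is the proportion of correct predictions.
accuracy = (tp + tn) / (tp + fn + fp + tn)

# MCC from the same four counts (all four marginal sums are non-zero here).
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy = {accuracy:.2f}")
print(f"MCC      = {mcc:.4f}")
```

With these counts the accuracy is 0.90, yet the MCC is slightly negative (about −0.05), which reflects that the predictions carry essentially no information about the true class.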

The MCC can be calculated directly from the confusion matrix using the formula:

$\text{MCC} = \frac{ TP \times TN - FP \times FN } {\sqrt{ (TP + FP) ( TP + FN ) ( TN + FP ) ( TN + FN ) } }$

In this equation, TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives. If any of the four sums in the denominator is zero, the denominator can be arbitrarily set to one. In that case both products in the numerator are also zero, so this results in a Matthews correlation coefficient of zero, which can be shown to be the correct limiting value.
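
The formula and the zero-denominator convention can be sketched in Python as follows. This is an illustrative sketch under the conventions stated above, not a library implementation; the function name `matthews_corrcoef_from_counts` and the example counts are hypothetical.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Compute the MCC from the four confusion-matrix counts.

    If any of the four sums in the denominator is zero, the denominator
    is set to one, which yields an MCC of zero.
    """
    numerator = tp * tn - fp * fn
    product = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    denominator = math.sqrt(product) if product > 0 else 1.0
    return numerator / denominator


# Perfect prediction: no false positives or false negatives.
print(matthews_corrcoef_from_counts(tp=5, tn=5, fp=0, fn=0))    # 1.0

# Inverse prediction: every object is misclassified.
print(matthews_corrcoef_from_counts(tp=0, tn=0, fp=5, fn=5))    # -1.0

# Every object assigned to the positive class: TN + FN = 0,
# so the denominator is set to one and the result is zero.
print(matthews_corrcoef_from_counts(tp=95, tn=0, fp=5, fn=0))   # 0.0
```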

References

Baldi, P.; Brunak, S.; Chauvin, Y.; Andersen, C. A. F.; Nielsen, H. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16, 412–424.
Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442–451.
Carugo, O. Detailed estimation of bioinformatics prediction reliability through the Fragmented Prediction Performance Plots. BMC Bioinformatics 2007.


Wikimedia Foundation. 2010.
