# Binary classification


Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. Some typical binary classification tasks are

* medical testing to determine if a patient has a certain disease or not (the classification property is the presence of the disease)
* quality control in factories; i.e. deciding if a new product is good enough to be sold, or if it should be discarded (the classification property is being good enough)
* deciding whether a page or an article should be in the result set of a search or not (the classification property is the relevance of the article - typically the presence of a certain word in it)

Classification in general is one of the problems studied in computer science with the aim of automatically learning classification systems; methods suitable for learning binary classifiers include decision trees, Bayesian networks, support vector machines, and neural networks.

Sometimes, classification tasks are trivial. Given 100 balls, some of them red and some blue, a human with normal color vision can easily separate them into red ones and blue ones. However, some tasks, like those in practical medicine, and those interesting from the computer science point of view, are far from trivial, and also produce faulty results.

## Hypothesis testing

In traditional statistical hypothesis testing, the tester starts with a null hypothesis and an alternative hypothesis, performs an experiment, and then decides whether to reject the null hypothesis in favour of the alternative.

A positive or statistically significant result is one which rejects the null hypothesis. Doing this when the null hypothesis is in fact true - a false positive - is a Type I error; doing this when the null hypothesis is false is a true positive.

A negative or not statistically significant result is one which does not reject the null hypothesis. Doing this when the null hypothesis is in fact false - a false negative - is a Type II error; doing this when the null hypothesis is true is a true negative.
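The four outcomes described above can be tabulated by whether the null hypothesis is true and whether the test rejects it. A minimal sketch (the function name `classify` is hypothetical, chosen for illustration):

```python
# The four possible outcomes of a test against a null hypothesis,
# indexed by (null_is_true, null_rejected).
outcomes = {
    (True, True): "false positive (Type I error)",
    (False, True): "true positive",
    (False, False): "false negative (Type II error)",
    (True, False): "true negative",
}

def classify(null_is_true, null_rejected):
    # Look up the standard name for this combination of truth and decision.
    return outcomes[(null_is_true, null_rejected)]

print(classify(True, True))    # rejecting a true null: Type I error
print(classify(False, False))  # not rejecting a false null: Type II error
```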

## Evaluation of binary classifiers

:"See also: sensitivity and specificity"To measure the performance of a medical test, the concepts sensitivity and specificity are often used; these concepts are readily usable for the evaluation of any binary classifier. Say we test some people for the presence of a disease. Some of these people have the disease, and our test says they are positive. They are called "true positives" (TP). Some have the disease, but the test claims they don't. They are called "false negatives" (FN). Some don't have the disease, and the test says they don't - "true negatives" (TN). Finally, we might have healthy people who have a positive test result "false positives" (FP). Thus, the number of true positives, false negatives, true negatives, and false positives add up to 100% of the set.

Sensitivity (TPR, the true positive rate) is the proportion of people who tested positive among all those who actually have the disease; that is, (true positives) / (true positives + false negatives). It can be seen as "the probability that the test is positive given that the patient is sick". The higher the sensitivity, the fewer real cases of the disease go undetected (or, in the case of factory quality control, the fewer faulty products go to the market).

Specificity (TNR, the true negative rate) is the proportion of people who tested negative among all those who do not have the disease; that is, (true negatives) / (true negatives + false positives). As with sensitivity, it can be looked at as "the probability that the test is negative given that the patient is not sick". The higher the specificity, the fewer healthy people are labeled as sick (or, in the factory case, the less money the factory loses by discarding good products instead of selling them).
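The two definitions translate directly into code. A minimal sketch; the counts below are made up for illustration:

```python
def sensitivity(tp, fn):
    # TPR: proportion of actual positives that test positive.
    return tp / (tp + fn)

def specificity(tn, fp):
    # TNR: proportion of actual negatives that test negative.
    return tn / (tn + fp)

# Hypothetical counts: 90 sick people detected and 10 missed;
# 950 healthy people cleared and 50 wrongly flagged.
print(sensitivity(90, 10))   # 0.9
print(specificity(950, 50))  # 0.95
```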

The relationship between sensitivity and specificity, as well as the performance of the classifier, can be visualized and studied using the ROC curve.

In theory, sensitivity and specificity are independent in the sense that it is possible to achieve 100 % in both (for instance, the human classifying the red and blue balls most likely does). In practice, there is often a trade-off, and you can't achieve both. This is because the characteristics that determine whether a sample tests positive or negative are usually not as clear-cut as red or blue colors; often they form a continuous scale on which a threshold must be chosen. A widely known example is the Body Mass Index used as an indicator of obesity. If high sensitivity is desired, a very low threshold could be set, which would declare many people obese: the number of true positives increases and the number of false negatives decreases, so sensitivity increases. The obvious disadvantage is that the number of false positives also increases, as some people of normal weight are incorrectly labeled obese; as a result, specificity decreases.
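The threshold trade-off can be sketched with made-up BMI-like scores; all the values and cutoffs below are hypothetical, chosen only to show sensitivity and specificity moving in opposite directions:

```python
# Hypothetical BMI values for two groups of five people each.
obese = [31.0, 33.5, 27.0, 36.0, 29.5]
not_obese = [22.0, 24.5, 28.5, 26.0, 21.5]

def rates(threshold):
    # Classify everyone at or above the threshold as "obese",
    # then compute (sensitivity, specificity) for that cutoff.
    tp = sum(x >= threshold for x in obese)
    fn = len(obese) - tp
    fp = sum(x >= threshold for x in not_obese)
    tn = len(not_obese) - fp
    return tp / (tp + fn), tn / (tn + fp)

for t in (25.0, 28.0, 31.0):
    sens, spec = rates(t)
    print(t, sens, spec)
# Raising the threshold lowers sensitivity and raises specificity.
```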

In addition to sensitivity and specificity, the performance of a binary classification test can be measured with the positive (PPV) and negative (NPV) predictive values. These are possibly more intuitively clear: the positive predictive value answers the question "how likely is it that I really have the disease, given that my test result was positive?". It is calculated as (true positives) / (true positives + false positives); that is, it is the proportion of true positives out of all positive results. (The negative predictive value is defined analogously for negative results: (true negatives) / (true negatives + false negatives).)
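The predictive values follow the same pattern as sensitivity and specificity, but condition on the test result rather than the true state. A minimal sketch with made-up counts:

```python
def ppv(tp, fp):
    # Positive predictive value: P(disease | positive test result).
    return tp / (tp + fp)

def npv(tn, fn):
    # Negative predictive value: P(no disease | negative test result).
    return tn / (tn + fn)

# Hypothetical counts: 99 true positives, 19 false positives,
# 1881 true negatives, 1 false negative.
print(round(ppv(99, 19), 2))   # 0.84
print(round(npv(1881, 1), 4))  # 0.9995
```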

One important difference between the two pairs of concepts should be noted. Sensitivity and specificity are independent of the population in the sense that they do not change with the proportion of positives and negatives tested; indeed, you can determine the sensitivity of a test by testing only positive cases. The predictive values, however, depend on the population - specifically, on the prevalence of the condition.

## Example

As an example, say that you have a test for a disease with 99 % sensitivity and 99 % specificity. Say you test 2000 people, 1000 of whom are sick and 1000 healthy. You are likely to get about 990 true positives, 990 true negatives, 10 false positives and 10 false negatives. The positive and negative predictive values would be 99 %, so people can be quite confident about the result.

Say, however, that of the 2000 people only 100 are really sick. Now you are likely to get 99 true positives, 1 false negative, 1881 true negatives and 19 false positives. Of the 19 + 99 people who tested positive, only 99 really have the disease - that means, intuitively, that given that your test result is positive, there is only about an 84 % chance that you really have the disease. On the other hand, given that your test result is negative, you can really be reassured: there is only 1 chance in 1882, or about 0.05 % probability, that you have the disease despite your negative test result.
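The two scenarios above can be checked numerically. A sketch; `confusion` is a helper defined here for illustration, not a standard function:

```python
def confusion(sick, healthy, sens, spec):
    # Expected confusion-matrix counts for a test with the given
    # sensitivity and specificity applied to this population.
    tp = round(sick * sens)
    fn = sick - tp
    tn = round(healthy * spec)
    fp = healthy - tn
    return tp, fn, tn, fp

# Balanced population: 1000 sick, 1000 healthy.
tp, fn, tn, fp = confusion(1000, 1000, 0.99, 0.99)
print(round(tp / (tp + fp), 2))  # PPV = 0.99

# Low prevalence: only 100 sick out of 2000.
tp, fn, tn, fp = confusion(100, 1900, 0.99, 0.99)
print(round(tp / (tp + fp), 2))   # PPV drops to about 0.84
print(round(fn / (fn + tn), 4))   # P(disease | negative) is about 0.0005
```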

## Measuring a classifier with sensitivity and specificity

Suppose you are training your own classifier, and you wish to measure its performance using the well-accepted measures of sensitivity and specificity. It may be instructive to compare your classifier to a random classifier that flips a coin based on the prevalence of a disease. Suppose that the probability a person has the disease is $p$ and the probability that they do not is $q=1-p$. Suppose then that we have a random classifier that guesses that you have the disease with that same probability $p$ and guesses you do not with probability $q$.

The probability of a true positive is the probability that you have the disease and the random classifier guesses that you do, or $p^2$. With similar reasoning, the probability of a false negative is $pq$. From the definitions above, the sensitivity of this classifier is $p^2/\left(p^2+pq\right)=p$. With more similar reasoning, we can calculate the specificity as $q^2/\left(q^2+pq\right)=q$.
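This calculation can be checked with a quick Monte Carlo simulation; the prevalence `p = 0.1` below is an arbitrary choice, and the sample size is large enough for the estimates to settle near $p$ and $q$:

```python
import random

random.seed(0)
p = 0.1        # assumed disease prevalence
n = 200_000    # number of simulated people

tp = fn = tn = fp = 0
for _ in range(n):
    sick = random.random() < p    # true state
    guess = random.random() < p   # random classifier's coin flip
    if sick and guess:
        tp += 1
    elif sick:
        fn += 1
    elif guess:
        fp += 1
    else:
        tn += 1

print(round(tp / (tp + fn), 2))  # sensitivity, close to p = 0.1
print(round(tn / (tn + fp), 2))  # specificity, close to q = 0.9
```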

So, while sensitivity and specificity are defined independently of disease prevalence, the performance of this random baseline classifier depends on the prevalence. Your classifier may behave like this random classifier, but with a better-weighted coin (higher sensitivity and specificity), so in this sense these measures may be influenced by disease prevalence.

## See also

* binary
* prosecutor's fallacy
* Examples of Bayesian inference

Wikimedia Foundation. 2010.
