Accuracy paradox

The accuracy paradox for predictive analytics states that a predictive model with a given level of accuracy may have greater predictive power than a model with higher accuracy. It may therefore be better to evaluate models with other metrics, such as precision and recall, instead of accuracy.

Accuracy is often the starting point for analyzing the quality of a predictive model, and it seems an obvious criterion: it measures the ratio of correct predictions to the total number of cases evaluated. However, a predictive model may have high accuracy and still be useless.

In an example predictive model for an insurance fraud application, all cases that the model predicts as high-risk will be investigated. To evaluate the performance of the model, the insurance company has created a validation sample of 10,000 claims. All 10,000 cases in the sample have been carefully checked, so it is known which of them are fraudulent. To analyze the quality of the model, the insurer uses the table of confusion. The definition of accuracy, the table of confusion for model M1Fraud, and the calculation of accuracy for model M1Fraud are shown below.

A(M) = (TN + TP) / (TN + FP + FN + TP)

where TN is the number of true negative cases, FP is the number of false positive cases, FN is the number of false negative cases, and TP is the number of true positive cases.

"Formula 1: Definition of Accuracy"

Actual class      Predicted Negative    Predicted Positive
Negative Cases    9,700                 150
Positive Cases    50                    100

"Table 1: Table of Confusion for Fraud Model M1Fraud."

A(M) = (9,700 + 100) / (9,700 + 150 + 50 + 100) = 98.0%

"Formula 2: Accuracy for model M1Fraud"
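The calculation in Formula 2 can be reproduced with a few lines of Python. This is a minimal sketch: the function name `accuracy` and its keyword arguments are illustrative choices, not part of the original article, and the counts are taken from Table 1.

```python
def accuracy(tn: int, fp: int, fn: int, tp: int) -> float:
    """Share of correct predictions among all evaluated cases (Formula 1)."""
    total = tn + fp + fn + tp
    if total == 0:
        raise ValueError("the table of confusion is empty")
    return (tn + tp) / total


# Counts from Table 1 (model M1Fraud).
m1_accuracy = accuracy(tn=9_700, fp=150, fn=50, tp=100)
print(f"Accuracy of M1Fraud: {m1_accuracy:.1%}")
```

Keeping the four counts as separate named arguments makes it harder to mix up false positives and false negatives when reading values off the table.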

With an accuracy of 98.0%, model M1Fraud appears to perform fairly well. The paradox lies in the fact that accuracy can easily be improved to 98.5% by always predicting "no fraud". The table of confusion and the accuracy for this trivial "always predict negative" model M2Fraud are shown below.

Actual class      Predicted Negative    Predicted Positive
Negative Cases    9,850                 0
Positive Cases    150                   0

"Table 2: Table of Confusion for Fraud Model M2Fraud."

A(M) = (9,850 + 0) / (9,850 + 0 + 150 + 0) = 98.5%

"Formula 3: Accuracy for model M2Fraud"

Model M2Fraud reduces the rate of inaccurate predictions from 2% to 1.5%, an apparent improvement of 25%. The new model M2Fraud makes fewer incorrect predictions and has markedly improved accuracy compared to the original model M1Fraud, but it is obviously useless: it never flags a claim, so none of the 150 fraudulent cases would be investigated.
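The error rates and the 25% figure can be checked with exact arithmetic. The sketch below assumes the counts from Table 1 and Table 2; the helper name `error_rate` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def error_rate(tn: int, fp: int, fn: int, tp: int) -> Fraction:
    """Share of incorrect predictions (false positives plus false negatives)."""
    return Fraction(fp + fn, tn + fp + fn + tp)


# Counts from Table 1 (M1Fraud) and Table 2 (M2Fraud).
m1_error = error_rate(tn=9_700, fp=150, fn=50, tp=100)
m2_error = error_rate(tn=9_850, fp=0, fn=150, tp=0)

# Relative reduction in the error rate when moving from M1Fraud to M2Fraud.
relative_reduction = (m1_error - m2_error) / m1_error

print(f"M1Fraud error rate: {float(m1_error):.1%}")
print(f"M2Fraud error rate: {float(m2_error):.1%}")
print(f"Relative reduction: {float(relative_reduction):.0%}")
```

`Fraction` is used here so that the comparison is exact and not subject to floating-point rounding; plain floats would work equally well for display purposes.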

The alternative model M2Fraud does not offer any value to the company for preventing fraud. The less accurate model is more useful than the more accurate model.

Model improvements should not be measured in terms of accuracy gains. It may be going too far to say that accuracy is irrelevant, but caution is advised when using accuracy in the evaluation of predictive models.
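Precision and recall, the alternative metrics named in the introduction, make the difference between the two models visible. The sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), which the article itself does not spell out; the function names and the choice to return `None` when a ratio is undefined are assumptions made for this example.

```python
from typing import Optional


def precision(tp: int, fp: int) -> Optional[float]:
    """Share of flagged cases that are truly positive; None if nothing was flagged."""
    flagged = tp + fp
    return tp / flagged if flagged else None


def recall(tp: int, fn: int) -> Optional[float]:
    """Share of truly positive cases that were flagged; None if there are no positives."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else None


# Counts from Table 1 (M1Fraud) and Table 2 (M2Fraud).
models = {
    "M1Fraud": {"tn": 9_700, "fp": 150, "fn": 50, "tp": 100},
    "M2Fraud": {"tn": 9_850, "fp": 0, "fn": 150, "tp": 0},
}

for name, counts in models.items():
    p = precision(counts["tp"], counts["fp"])
    r = recall(counts["tp"], counts["fn"])
    print(name, "precision:", p, "recall:", r)
```

Under these definitions M1Fraud finds two thirds of the fraudulent claims, while M2Fraud has a recall of zero and an undefined precision because it never predicts a positive case.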

See also

* Receiver operating characteristic, for other measures of how good model predictions are.

Wikimedia Foundation. 2010.
