Gain (information retrieval)


The gain, also called improvement over random, is a measure of classifier performance: it states how much better a classifier performs than a random classifier on the same data.

Definition

In the following, a random classifier is one that assigns classes at random, predicting each class as often as it occurs in the dataset.
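The random classifier can be illustrated with a short simulation. The following is a minimal Python sketch, assuming binary labels encoded as 1 (positive) and 0 (negative) and a classifier that predicts "positive" with probability equal to the dataset's fraction of positives. The function `simulate_random_classifier` and the 30/70 dataset are hypothetical. With enough instances, the simulated precision should approach the fraction of positives and the simulated accuracy should approach the sum of the squared class fractions, which are the random baselines defined in the sections that follow.

```python
import random

def simulate_random_classifier(labels, seed=0):
    """Hypothetical helper: predicts each instance at random, choosing
    'positive' with probability equal to the fraction of positives in
    `labels` (an assumed reading of the random classifier).
    Returns the confusion-matrix counts (TP, TN, FP, FN)."""
    rng = random.Random(seed)              # fixed seed for reproducibility
    p_pos = sum(labels) / len(labels)      # fraction of positive instances
    tp = tn = fp = fn = 0
    for actual in labels:
        predicted = rng.random() < p_pos   # random guess, ignores the data
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

# Hypothetical dataset: 30% positives (1) and 70% negatives (0).
labels = [1] * 30_000 + [0] * 70_000
tp, tn, fp, fn = simulate_random_classifier(labels)
n = tp + tn + fp + fn
precision = tp / (tp + fp)
accuracy = (tp + tn) / n
print(f"precision ~ {precision:.3f} (fraction of positives: 0.300)")
print(f"accuracy  ~ {accuracy:.3f} (0.3**2 + 0.7**2 = 0.580)")
```

Because the predictions are random, the printed values only approximate 0.300 and 0.580; the fixed seed keeps a given run reproducible.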

The gain is defined as follows.

Gain in Precision

The random precision of a classifier is defined as

r = \frac{TP + FN}{TP + TN + FP + FN} = \frac{\mathit{Positives}}{N}

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively; Positives = TP + FN is the number of positive instances in the target dataset; and N is the size of the dataset.

The random precision is the baseline a classifier should exceed: it is the precision expected from random guessing.

The gain is then defined as

G = \frac{\mathit{precision}}{r}

which is the factor by which the classifier outperforms its random counterpart. A gain of 1 indicates a classifier that is no better than random; the larger the gain, the better.
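As a worked illustration, the gain in precision can be computed directly from confusion-matrix counts. This is a minimal Python sketch; the function name `precision_gain` and the example counts are hypothetical, and it assumes at least one predicted positive (TP + FP > 0) and at least one positive instance, so that neither ratio divides by zero.

```python
def precision_gain(tp, tn, fp, fn):
    """Gain in precision: precision divided by the random precision
    r = (TP + FN) / N. Hypothetical helper for illustration."""
    n = tp + tn + fp + fn
    precision = tp / (tp + fp)         # fraction of predicted positives that are correct
    random_precision = (tp + fn) / n   # fraction of positive instances in the dataset
    return precision / random_precision

# Hypothetical counts for 1000 instances, 100 of them positive.
print(precision_gain(tp=60, tn=860, fp=40, fn=40))
```

For these counts the precision is 60/100 = 0.6 and the random precision is 100/1000 = 0.1, so the classifier is about six times as precise as random guessing (the printed value may differ from 6 in the last floating-point digit).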

Gain in Overall Accuracy

In general, the accuracy of a classifier is defined as

\mathit{Acc} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{\mathit{Corrects}}{N}

Here, the random accuracy of a classifier can be defined as

r = \left(\frac{\mathit{Positives}}{N}\right)^2 + \left(\frac{\mathit{Negatives}}{N}\right)^2 = f(\mathit{Positives})^2 + f(\mathit{Negatives})^2

where f(Positives) and f(Negatives) are the fractions of positive and negative instances in the dataset.

Again, the gain is

G = \frac{\mathit{Acc}}{r}

This time the gain is measured not only with respect to predicting the so-called positive class, but with respect to the classifier's overall ability to distinguish the two classes, which are treated as equally important.
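The gain in overall accuracy can be sketched the same way. This is a minimal Python illustration; the function name `accuracy_gain` and the example counts (1000 instances, 100 of them positive) are hypothetical, and it assumes a binary problem with at least one instance.

```python
def accuracy_gain(tp, tn, fp, fn):
    """Gain in overall accuracy: Acc divided by the random accuracy
    r = f(Positives)^2 + f(Negatives)^2. Hypothetical helper."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n        # fraction of correct predictions
    f_pos = (tp + fn) / n           # fraction of positive instances
    f_neg = (tn + fp) / n           # fraction of negative instances
    random_accuracy = f_pos ** 2 + f_neg ** 2
    return accuracy / random_accuracy

# Hypothetical counts for 1000 instances, 100 of them positive.
print(accuracy_gain(tp=60, tn=860, fp=40, fn=40))
```

Here Acc = 920/1000 = 0.92 and r = 0.1² + 0.9² = 0.82, so G = 0.92 / 0.82 ≈ 1.12. When one class dominates, random guessing already reaches a high accuracy, so the accuracy gain of a given classifier can be far smaller than its precision gain.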

Application

In bioinformatics, for example, the gain is used to evaluate methods that predict residue contacts in proteins.

See also

* Performance Measures a summary
* Accuracy
* Precision
* Recall = Sensitivity
* Specificity


Wikimedia Foundation. 2010.
