# False discovery rate


False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses (type I errors) (Benjamini and Hochberg 1995). It is less conservative than familywise error rate (FWER) control, offering greater power at the cost of an increased likelihood of type I errors (Shaffer, J. P. (1995). "Multiple hypothesis testing", *Annual Review of Psychology* 46: 561–584, doi:10.1146/annurev.ps.46.020195.003021).

The q-value is defined to be the FDR analogue of the p-value: the q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to estimate q-values directly rather than fixing a level at which to control the FDR (Storey 2002).

## Classification of $m$ hypothesis tests

The following quantities are defined for $m$ hypothesis tests:

* $H_1, \ldots, H_m$ are the null hypotheses being tested
* $m_0$ is the number of true null hypotheses
* $m - m_0$ is the number of false null hypotheses
* $U$ is the number of true negatives
* $V$ is the number of false positives
* $T$ is the number of false negatives
* $S$ is the number of true positives
* $R = V + S$ is the number of rejected null hypotheses

In $m$ hypothesis tests of which $m_0$ are true null hypotheses, $R$ is an observable random variable, while $S$, $T$, $U$, and $V$ are unobservable random variables.

The false discovery rate is given by $\mathrm{E}\!\left[\frac{V}{V+S}\right] = \mathrm{E}\!\left[\frac{V}{R}\right]$, and one wants to keep this value below a threshold $\alpha$.

($\frac{V}{R}$ is defined to be $0$ when $R = 0$.)
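To make the definition concrete, the quantities above can be estimated by simulation. The following is a minimal Monte Carlo sketch under a hypothetical setup ($m_0$ true nulls with uniform p-values, and the false nulls assumed always detected); it estimates $\mathrm{E}[V/R]$ when every test is performed at an uncorrected per-test level $\alpha$, illustrating why a correction is needed:

```python
import random

random.seed(0)

# Hypothetical setup: m0 true nulls with Uniform(0,1) p-values,
# and m - m0 false nulls that are assumed always detected.
# Each test is performed at the uncorrected per-test level alpha.
m, m0, alpha = 100, 90, 0.05
trials = 10_000
total = 0.0
for _ in range(trials):
    V = sum(random.random() < alpha for _ in range(m0))  # false positives
    S = m - m0                                           # true positives
    R = V + S                                            # total rejections
    total += V / R if R > 0 else 0.0                     # V/R := 0 when R = 0

print(round(total / trials, 2))  # estimated E[V/R] of the uncorrected procedure
```

With these numbers, roughly $90 \times 0.05 = 4.5$ false positives are expected per experiment, so nearly a third of all rejections are false despite each individual test being run at level $0.05$.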

## Controlling procedures

### Independent tests

The *Simes* procedure ensures that its expected value $\mathrm{E}\!\left[\frac{V}{V+S}\right]$ is less than a given $\alpha$ (Benjamini and Hochberg 1995). This procedure is valid when the $m$ tests are independent. Let $H_1, \ldots, H_m$ be the null hypotheses and $P_1, \ldots, P_m$ their corresponding p-values. Order these values in increasing order and denote them by $P_{(1)}, \ldots, P_{(m)}$. For a given $\alpha$, find the largest $k$ such that $P_{(k)} \leq \frac{k}{m}\,\alpha$.

Then reject (i.e., declare positive) all $H_{(i)}$ for $i = 1, \ldots, k$.

Note that the mean $\alpha$ for these $m$ tests is $\frac{\alpha(m+1)}{2m}$, which could be used as a rough FDR (RFDR), or "$\alpha$ adjusted for $m$ independent tests." (This RFDR calculation is not part of the Benjamini and Hochberg method.)
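The step-up procedure above can be sketched in Python. This is a minimal illustration, not a production implementation; the p-values and $\alpha$ in the usage example are arbitrary:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level alpha
    (assumes independent tests). Returns indices of rejected hypotheses."""
    m = len(p_values)
    # Sort p-values while remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest k with P_(k) <= (k/m) * alpha (k is 1-based).
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank
    # Reject the hypotheses with the k smallest p-values.
    return sorted(order[:k])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, alpha=0.05))  # → [0, 1]
```

Here $P_{(2)} = 0.008 \leq \frac{2}{10} \times 0.05 = 0.01$ but $P_{(3)} = 0.039 > \frac{3}{10} \times 0.05 = 0.015$, so the two smallest p-values are rejected.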

### Dependent tests

The *Benjamini and Yekutieli* procedure controls the false discovery rate under dependence assumptions (Benjamini and Yekutieli 2001). This refinement modifies the threshold and finds the largest $k$ such that:

$P_{(k)} \leq \frac{k}{m \cdot c(m)}\,\alpha$

* If the tests are independent: $c(m) = 1$ (same as above)
* If the tests are positively correlated: $c(m) = 1$
* If the tests are negatively correlated: $c(m) = \sum_{i=1}^{m} \frac{1}{i}$

In the case of negative correlation, $c(m)$ can be approximated using the Euler–Mascheroni constant $\gamma$:

$\sum_{i=1}^{m} \frac{1}{i} \approx \ln(m) + \gamma.$

Using the RFDR above, an approximate FDR (AFDR) for $m$ dependent tests is $\mathrm{RFDR} / (\ln(m) + 0.57721\ldots)$.
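The corrected step-up rule can be sketched by adapting the earlier procedure with the $c(m)$ harmonic-sum factor. This is again a minimal illustration with arbitrary example p-values; it also checks numerically that $c(m) \approx \ln(m) + \gamma$ for large $m$:

```python
import math

def benjamini_yekutieli(p_values, alpha=0.05):
    """Step-up procedure with the harmonic-sum correction c(m),
    valid under the dependence assumptions discussed above."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # c(m) = sum_{i=1}^m 1/i
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * alpha:
            k = rank  # largest k satisfying the corrected threshold
    return sorted(order[:k])  # indices of rejected hypotheses

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_yekutieli(p, alpha=0.05))  # → [0]; stricter than BH on the same inputs

# Numerical check of the approximation c(m) ≈ ln(m) + gamma for large m:
m = 1000
print(abs(sum(1.0 / i for i in range(1, m + 1)) - (math.log(m) + 0.57721)) < 0.001)  # → True
```

On the same p-values used earlier, the extra factor $c(10) \approx 2.93$ shrinks every threshold, so only the single smallest p-value is rejected where the independent-test procedure rejected two.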

## References

* Benjamini, Yoav; Hochberg, Yosef (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". *Journal of the Royal Statistical Society, Series B (Methodological)* 57 (1): 289–300. MR1325392. [PDF](http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_hochberg1995.pdf)
* Benjamini, Yoav; Yekutieli, Daniel (2001). "The control of the false discovery rate in multiple testing under dependency". *Annals of Statistics* 29 (4): 1165–1188. doi:10.1214/aos/1013699998. MR1869245. [PDF](http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_yekutieli_ANNSTAT2001.pdf)
* Storey, John D. (2002). "A direct approach to false discovery rates". *Journal of the Royal Statistical Society, Series B (Methodological)* 64 (3): 479–498. doi:10.1111/1467-9868.00346. MR1924302.
* Storey, John D. (2003). "The positive false discovery rate: a Bayesian interpretation and the *q*-value". *Annals of Statistics* 31 (6): 2013–2035. doi:10.1214/aos/1074290335. MR2036398. [Project Euclid](http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1074290335)
* [False Discovery Rate Analysis in R](http://strimmerlab.org/notes/fdr.html) – links to popular R packages

