# Resampling (statistics)

In statistics, resampling is any of a variety of methods for doing one of the following:
1. Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknife) or drawing randomly with replacement from a set of data points (bootstrapping)
2. Exchanging labels on data points when performing significance tests (permutation test, also called exact test, randomization test, or re-randomization test)
3. Validating models by using random subsets (bootstrap, cross-validation)

Common resampling techniques include bootstrapping, jackknifing and permutation tests.

Bootstrap

Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. It may also be used for constructing hypothesis tests. It is often used as a robust alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors.
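The resampling step described above can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names `bootstrap_replicates` and `bootstrap_se_and_ci`, the default of 2,000 resamples, the fixed seed and the sample data are all assumptions made for the example, and the interval shown is the simple percentile interval, which is only one of several bootstrap interval constructions.

```python
import random
import statistics


def bootstrap_replicates(sample, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `sample`."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the run repeatable
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def bootstrap_se_and_ci(sample, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Return the bootstrap standard error and a percentile confidence interval."""
    reps = sorted(bootstrap_replicates(sample, statistic, n_resamples, seed))
    standard_error = statistics.stdev(reps)
    lower = reps[int(alpha / 2 * n_resamples)]
    upper = reps[int((1 - alpha / 2) * n_resamples) - 1]
    return standard_error, (lower, upper)


# Hypothetical data, used only for illustration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
se, (lower, upper) = bootstrap_se_and_ci(data, statistics.median)
print(f"bootstrap SE of the median: {se:.3f}")
print(f"95% percentile interval: ({lower:.2f}, {upper:.2f})")
```

Any function of a list of numbers can be passed as `statistic`, which reflects the point that the bootstrap needs no formula for the standard error of the estimator.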

See also particle filter for the general theory of "Sequential Monte Carlo" methods, as well as details on some common implementations.

Jackknife

Jackknifing, which is similar to bootstrapping, is used in statistical inference to estimate the bias and standard error of a statistic computed from a random sample of observations. The basic idea behind the jackknife estimator is to systematically recompute the statistic, leaving out one observation at a time from the sample. From this new set of "observations" of the statistic, an estimate of the bias and an estimate of the variance of the statistic can be calculated.
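A minimal sketch of the leave-one-out procedure follows, assuming the usual delete-one jackknife formulas: bias estimate $(n-1)(\bar{\theta}_{(\cdot)} - \hat{\theta})$ and variance estimate $\frac{n-1}{n}\sum_i (\hat{\theta}_{(i)} - \bar{\theta}_{(\cdot)})^2$. The function name `jackknife` and the data are hypothetical.

```python
import math
import statistics


def jackknife(sample, statistic):
    """Leave-one-out jackknife estimates of the bias and standard error."""
    sample = list(sample)
    n = len(sample)
    full_estimate = statistic(sample)
    # Recompute the statistic n times, each time leaving out one observation.
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    bias = (n - 1) * (mean_loo - full_estimate)
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return bias, math.sqrt(variance)


# Hypothetical data, used only for illustration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
bias, se = jackknife(data, statistics.fmean)
print(f"jackknife bias of the mean: {bias:.6f}")
print(f"jackknife SE of the mean:   {se:.6f}")
```

For the sample mean, the jackknife standard error coincides with the familiar $s/\sqrt{n}$ and the bias estimate is zero up to rounding, which makes the mean a convenient check. No random numbers are involved, so repeated runs give identical results.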

Both methods estimate the variability of a statistic from the variability of that statistic between subsamples, rather than from parametric assumptions. The jackknife is a less general technique than the bootstrap, and explores the sample variation differently. However the jackknife is easier to apply to complex sampling schemes, such as multi-stage sampling with varying sampling weights, than the bootstrap.

The jackknife and bootstrap may in many situations yield similar results. But when used to estimate the standard error of a statistic, bootstrap gives slightly different results when repeated on the same data, whereas the jackknife gives exactly the same result each time (assuming the subsets to be removed are the same).

Cross-validation

Cross-validation is a statistical method for validating a predictive model. Subsets of the data are held out, to be used as validating sets; a model is fit to the remaining data (a training set) and used to predict for the validation set. Averaging the quality of the predictions across the validation sets yields an overall measure of prediction accuracy.

One form of cross-validation leaves out a single observation at a time; this is similar to the jackknife. Another, K-fold cross-validation, splits the data into K subsets; each is held out in turn as the validation set.

This avoids "self-influence". For comparison, in regression analysis methods such as linear regression, each y value draws the regression line toward itself, making the predictions appear more accurate than they really are, on average. Cross-validation applied to linear regression predicts the y value for each observation without using that observation.

This is often used for deciding how many predictor variables to use in regression. Without cross-validation, adding predictors always reduces the residual sum of squares (or possibly leaves it unchanged). In contrast, the cross-validated mean-square error will tend to decrease if valuable predictors are added, but increase if worthless predictors are added.
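The following sketch illustrates both points on a small, made-up data set: it compares the in-sample mean squared error with the K-fold and leave-one-out cross-validated error, for a model with no predictor and for a least-squares line with one predictor. The helper names (`fit_mean`, `fit_line`, `in_sample_mse`, `k_fold_cv_mse`), the interleaved assignment of observations to folds and the data are assumptions chosen for brevity, not part of any standard library.

```python
def fit_mean(xs, ys):
    """Intercept-only model: predicts the training mean and ignores x."""
    return 0.0, sum(ys) / len(ys)


def fit_line(xs, ys):
    """Least-squares line with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def in_sample_mse(fit, xs, ys):
    """Mean squared error when the model predicts the data it was fit to."""
    slope, intercept = fit(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_cv_mse(fit, xs, ys, k):
    """Mean squared prediction error over k held-out folds."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th observation forms one fold
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        slope, intercept = fit(train_x, train_y)
        squared_errors.extend(
            (ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out
        )
    return sum(squared_errors) / n


# Hypothetical data lying close to a straight line.
xs = [float(i) for i in range(12)]
ys = [1.2, 2.9, 5.3, 6.8, 9.1, 11.2, 12.7, 15.1, 17.3, 18.8, 21.2, 22.9]
for name, fit in [("mean only", fit_mean), ("line", fit_line)]:
    print(name,
          "in-sample:", round(in_sample_mse(fit, xs, ys), 4),
          "4-fold:", round(k_fold_cv_mse(fit, xs, ys, k=4), 4),
          "leave-one-out:", round(k_fold_cv_mse(fit, xs, ys, k=len(xs)), 4))
```

Setting `k` equal to the number of observations gives leave-one-out cross-validation. For least-squares fits, the leave-one-out error is never smaller than the in-sample error, which is the "self-influence" effect in numerical form.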

Permutation tests

A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which a reference distribution is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. In other words, the method by which treatments are allocated to subjects in an experimental design is mirrored in the analysis of that design. If the labels are exchangeable under the null hypothesis, then the resulting tests yield exact significance levels. Confidence intervals can then be derived from the tests. The theory has evolved from the works of R.A. Fisher and E.J.G. Pitman in the 1930s.

To illustrate the basic idea of a permutation test, suppose we have two groups $A$ and $B$ whose sample means are $\bar{x}_A$ and $\bar{x}_B$, and that we want to test, at the 5% significance level, whether they come from the same distribution. Let $n_A$ and $n_B$ be the sample sizes of the two groups. The permutation test is designed to determine whether the observed difference between the sample means is large enough to reject the null hypothesis $H_0$ that the two groups have identical probability distributions.

The test proceeds as follows. First, the difference in means between the two samples is calculated: this is the observed value of the test statistic, T(obs). Then the observations of groups $A$ and $B$ are pooled.

Next, the difference in sample means is calculated and recorded for every possible way of dividing these pooled values into two groups of size $n_A$ and $n_B$ (i.e., for every permutation of the group labels $A$ and $B$). The set of these calculated differences is the exact distribution of possible differences under the null hypothesis that group label does not matter.

The one-sided p-value of the test is calculated as the proportion of permutations where the difference in means was greater than or equal to T(obs). The two-sided p-value of the test is calculated as the proportion of permutations where the absolute difference was greater than or equal to |T(obs)|.
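The procedure can be sketched as follows for small samples, where complete enumeration is feasible. The function name `exact_permutation_test` and the two tiny groups are hypothetical; a small tolerance is used in the comparisons so that floating-point rounding does not exclude ties with the observed value.

```python
from itertools import combinations


def exact_permutation_test(a, b):
    """Exact two-sample permutation test on the difference in means.

    Returns (t_obs, one_sided_p, two_sided_p).
    """
    pooled = list(a) + list(b)
    n_a, n = len(a), len(pooled)
    total = sum(pooled)

    def mean_difference(sum_a):
        # mean of the group labelled A minus mean of the group labelled B
        return sum_a / n_a - (total - sum_a) / (n - n_a)

    t_obs = mean_difference(sum(a))
    tolerance = 1e-12
    n_splits = at_least = abs_at_least = 0
    # Every way of choosing which pooled values carry the label A.
    for indices in combinations(range(n), n_a):
        t = mean_difference(sum(pooled[i] for i in indices))
        n_splits += 1
        at_least += t >= t_obs - tolerance
        abs_at_least += abs(t) >= abs(t_obs) - tolerance
    return t_obs, at_least / n_splits, abs_at_least / n_splits


# Hypothetical groups, chosen so the result can be checked by hand.
group_a = [10, 11, 12]
group_b = [1, 2, 3]
t_obs, p_one_sided, p_two_sided = exact_permutation_test(group_a, group_b)
print(t_obs, p_one_sided, p_two_sided)  # 9.0 0.05 0.1
```

With three observations per group there are 20 possible relabellings. Only the observed one reaches a difference of 9, and only its mirror image reaches -9, giving p-values of 1/20 and 2/20. The number of relabellings grows combinatorially with the sample sizes, so this brute-force enumeration is practical only for small data sets.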

If the only purpose of the test is to reject or not reject the null hypothesis, we can as an alternative sort the recorded differences, and then observe whether T(obs) is contained within the middle 95% of them. If it is not, we reject the hypothesis of identical probability distributions at the 5% significance level.

Relation to parametric tests

Permutation tests are a subset of non-parametric statistics. The basic premise is to use only the assumption that it is possible that all of the treatment groups are equivalent, and that every member of them is the same before sampling began (i.e. the slot that they fill is not differentiable from other slots before the slots are filled). From this, one can calculate a statistic and then see to what extent this statistic is special by seeing how likely it would be if the treatment assignments had been jumbled.

In contrast to permutation tests, the reference distributions for many popular "classical" statistical tests, such as the t-test, F-test, z-test and chi-squared test, are obtained from theoretical probability distributions.
Fisher's exact test, a commonly used test for evaluating the association between two dichotomous variables, is an example of a permutation test. When sample sizes are large, Pearson's chi-squared test will give accurate results, but for small samples the chi-squared reference distribution cannot be assumed to give a correct description of the probability distribution of the test statistic, and in this situation the use of Fisher's exact test becomes more appropriate. A rule of thumb is that the expected count in each cell of the table should be greater than 5 before Pearson's chi-squared test is used.
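As an illustration, the sketch below computes a two-sided Fisher exact p-value for a 2 x 2 table directly from the hypergeometric distribution, together with the smallest expected cell count used in the rule of thumb. The function names and the example table are assumptions made for this example. The two-sided p-value here sums the probabilities of all tables that are no more probable than the observed one, which is a common convention but not the only one.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def probability(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = probability(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum over tables that are no more probable than the observed one.
    return sum(
        probability(k)
        for k in range(k_min, k_max + 1)
        if probability(k) <= p_observed * (1 + 1e-9)
    )


def smallest_expected_count(table):
    """Smallest expected cell count under independence (row total * column total / n)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return min(r * col / n for r in (a + b, c + d) for col in (a + c, b + d))


# Hypothetical table with small counts.
table = [[3, 1], [1, 3]]
print(fisher_exact_two_sided(table))   # 34/70, about 0.4857
print(smallest_expected_count(table))  # 2.0, below the rule-of-thumb threshold of 5
```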

Permutation tests exist in many situations where parametric tests do not. One example is the derivation of an optimal test when losses are proportional to the size of an error rather than its square. All simple and many relatively complex parametric tests have a corresponding permutation test version that is defined by using the same test statistic as the parametric test, but obtains the p-value from the sample-specific permutation distribution of that statistic, rather than from the theoretical distribution derived from the parametric assumption. For example, it is possible in this manner to construct a permutation t-test, a permutation chi-squared test of association, a permutation version of Aly's test for comparing variances and so on.

The major downsides of permutation tests are that:

* They can be computationally intensive, and may require "custom" code for difficult-to-calculate statistics. This must be rewritten for every case.
* They are primarily used to provide a p-value. The inversion of the test to get confidence regions/intervals requires even more computation.

Examples

Permutation tests exist for any test statistic, regardless of whether or not its distribution is known. Thus one is always free to choose the statistic which best discriminates between hypothesis and alternative and which minimizes losses.

Permutation tests can be used for analyzing unbalanced designs (http://tbf.coe.wayne.edu/jmasm/vol1_no2.pdf) and for combining dependent tests on mixtures of categorical, ordinal, and metric data (Pesarin, 2001).

Before the 1980s, the burden of creating the reference distribution was overwhelming except for data sets with small sample sizes. Since the 1980s, the confluence of cheap fast computers and the development of new sophisticated path algorithms applicable in special situations has made the application of permutation test methods practical for a wide range of problems. It has also led to the addition of exact-test options in the main statistical software packages and to the appearance of specialized software for performing a wide range of uni- and multi-variable exact tests and computing test-based "exact" confidence intervals.

Limitations

An important assumption behind a permutation test is that the observations are exchangeable under the null hypothesis. An important consequence of this assumption is that tests of difference in location (like a permutation t-test) require equal variance. In this respect, the permutation t-test shares the same weakness as the classical Student’s t-test. A third alternative in this situation is to use a bootstrap-based test. Good (2000) explains the difference between permutation tests and bootstrap tests the following way: "Permutations test hypotheses concerning distributions; bootstraps tests hypotheses concerning parameters. As a result, the bootstrap entails less-stringent assumptions." Of course, bootstrap tests are not exact.

Monte Carlo testing

An asymptotically equivalent permutation test can be created when there are too many possible orderings of the data to conveniently allow complete enumeration. This is done by generating the reference distribution by Monte Carlo sampling, which takes a small (relative to the total number of permutations) random sample of the possible replicates.
The realization that this could be applied to any permutation test on any dataset was an important breakthrough in the area of applied statistics. The earliest known reference to this approach is Dwass (1957) [Meyer Dwass, "Modified Randomization Tests for Nonparametric Hypotheses", "The Annals of Mathematical Statistics", 28:181-187, 1957]. This type of permutation test is known under various names: "approximate permutation test", "Monte Carlo permutation tests" or "random permutation tests" [Thomas E. Nichols, Andrew P. Holmes, "Nonparametric Permutation Tests For Functional Neuroimaging: A Primer with Examples", "Human Brain Mapping", 15:1-25, 2001, http://www.fil.ion.ucl.ac.uk/spm/doc/papers/NicholsHolmes.pdf]. However, all permutation tests are theoretically the same test, so those different names refer only to one small and unimportant practical difference: the level of detail to which the p-value is calculated.
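A Monte Carlo version of the two-sample test on the difference in means can be sketched as below. Instead of enumerating every relabelling, it shuffles the pooled data a fixed number of times. The function name `monte_carlo_permutation_test`, the default of 10,000 shuffles, the fixed seed and the data are assumptions made for illustration.

```python
import random


def monte_carlo_permutation_test(a, b, n_permutations=10_000, seed=0):
    """Approximate two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the run repeatable
    pooled = list(a) + list(b)
    n_a = len(a)

    def mean_difference(values):
        # The first n_a values are treated as group A, the rest as group B.
        return sum(values[:n_a]) / n_a - sum(values[n_a:]) / (len(values) - n_a)

    t_obs = abs(mean_difference(pooled))
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # a random rearrangement of the group labels
        if abs(mean_difference(pooled)) >= t_obs - 1e-12:
            hits += 1
    return hits / n_permutations


# Hypothetical groups; the exact two-sided p-value for these data is 2/20 = 0.1.
p_value = monte_carlo_permutation_test([10, 11, 12], [1, 2, 3])
print(p_value)
```

The printed value varies with the seed and the number of shuffles, and it approaches the exact p-value as `n_permutations` grows.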

The necessary size of the Monte Carlo sample depends on the need for accuracy of the test. If one merely wants to know whether the p-value is significant, sometimes as few as 400 rearrangements are sufficient to generate a reliable answer. However, for most scientific applications the required size is much higher. For observed p=0.05, the accuracy from 10,000 random permutations is 0.0056 and for 50,000 it is 0.0025. For observed p=0.10, the corresponding accuracies are 0.0077 and 0.0035. Accuracy is defined from the binomial 99% confidence interval: p ± accuracy.
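The quoted accuracies are consistent with the normal approximation to the binomial confidence interval, $z_{0.995}\sqrt{p(1-p)/N}$, where $N$ is the number of random permutations. The short sketch below reproduces them under that assumption; the function name `monte_carlo_accuracy` is hypothetical.

```python
import math
from statistics import NormalDist


def monte_carlo_accuracy(p, n_permutations, confidence=0.99):
    """Half-width of the normal-approximation binomial confidence interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    return z * math.sqrt(p * (1 - p) / n_permutations)


for p in (0.05, 0.10):
    for n_permutations in (10_000, 50_000):
        print(p, n_permutations, round(monte_carlo_accuracy(p, n_permutations), 4))
```

Because the half-width shrinks with the square root of $N$, halving it requires roughly four times as many permutations.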

See also

* Particle filter
* Random permutation
* Nonparametric statistics

Bibliography

Introductory statistics

*Good, P. (2005) Introduction to Statistics Through Resampling Methods and R/S-PLUS. Wiley. ISBN 0-471-71575-1

*Good, P. (2005) Introduction to Statistics Through Resampling Methods and Microsoft Office Excel. Wiley. ISBN 0-471-73191-9

* Hesterberg, T. C., D. S. Moore, S. Monaghan, A. Clipson, and R. Epstein (2005): [http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf Bootstrap Methods and Permutation Tests] , [http://www.insightful.com/Hesterberg/bootstrap software] .

Resampling methods

*Good, P. (2006) Resampling Methods. 3rd Ed. Birkhauser.

Bootstrapping

*Bradley Efron (1979). "Bootstrap methods: Another look at the jackknife", "The Annals of Statistics", 7, 1-26.
*Bradley Efron (1981). "Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods", "Biometrika", 68, 589-599.
*Bradley Efron (1982). "The jackknife, the bootstrap, and other resampling plans", In "Society of Industrial and Applied Mathematics CBMS-NSF Monographs", 38.
* P. Diaconis, Bradley Efron (1983), "Computer-intensive methods in statistics," "Scientific American", May, 116-130.
* Bradley Efron, Robert J. Tibshirani, (1993). "An introduction to the bootstrap", New York: Chapman & Hall, [http://lib.stat.cmu.edu/S/bootstrap.funs software] .
*Davison, A. C. and Hinkley, D. V. (1997): Bootstrap Methods and their Applications, [http://statwww.epfl.ch/davison/BMA/library.html software] .
*Mooney, C Z & Duval, R D (1993). Bootstrapping. A Nonparametric Approach to Statistical Inference. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-095. Newbury Park, CA: Sage.
* Simon, J. L. (1997): [http://www.resample.com/content/text/index.shtml Resampling: The New Statistics] .

Permutation test

Original references:
*R. A. Fisher, "The Design of Experiments", New York: Hafner, 1935.
*Pitman, E. J. G., "Significance tests which may be applied to samples from any population", "Royal Statistical Society Supplement", 1937; 4: 119-130 and 225-32 (parts I and II).
*Pitman, E. J. G., "Significance tests which may be applied to samples from any population. Part III. The analysis of variance test", "Biometrika", 1938; 29: 322-335.

Modern references:
*E. S. Edgington, "Randomization tests", 3rd ed. New York: Marcel-Dekker, 1995.
*Phillip I. Good, "Permutation, Parametric and Bootstrap Tests of Hypotheses", 3rd ed., Springer, 2005. ISBN 0-387-98898-X
* Good, P. (2002) Extensions of the concept of exchangeability and their applications, "J. Modern Appl. Statist. Methods", 1:243-247.
*Lunneborg, Cliff. "Data Analysis by Resampling", Duxbury Press, 1999. ISBN 0-534-22110-6.
*Pesarin, F. 2001. "Multivariate Permutation Tests", Wiley.
*Welch, W. J., "Construction of permutation tests", "Journal of the American Statistical Association", 85:693-698, 1990.

Computational methods:
*Mehta, C. R. and Patel, N. R. (1983). "A network algorithm for performing Fisher’s exact test in r x c contingency tables", "J. Amer. Statist. Assoc.", 78(382):427–434.
*Mehta, C. R., Patel, N. R. and Senchaudhuri, P. (1988). "Importance sampling for estimating exact probabilities in permutational inference", "J. Amer. Statist. Assoc.", 83(404):999–1005.

References

* [http://people.revoledu.com/kardi/tutorial/Bootstrap/index.html Bootstrap Sampling tutorial]
* Moore, D. S., G. McCabe, W. Duckworth, and S. Sclove (2003): [http://bcs.whfreeman.com/pbs/cat_140/chap18.pdf Bootstrap Methods and Permutation Tests]
* Yu, Chong Ho (2003): [http://PAREonline.net/getvn.asp?v=8&n=19 Resampling methods: concepts, applications, and justification. Practical Assessment, Research & Evaluation, 8(19)] . "(statistical bootstrapping)"
* [http://www.ericdigests.org/1993/marriage.htm Resampling: A Marriage of Computers and Statistics (ERIC Digests)]

Software

* [http://www.statistics101.net Statistics101: Resampling, Bootstrap, Monte Carlo Simulation program]

Wikimedia Foundation. 2010.
