Stepwise regression
In statistics, stepwise regression includes regression models in which the choice of predictive variables is carried out by an automatic procedure. [Hocking, R. R. (1976) "The Analysis and Selection of Variables in Linear Regression," Biometrics, 32.] [Draper, N. and Smith, H. (1981) Applied Regression Analysis, 2d Edition, New York: John Wiley & Sons, Inc.] [SAS Institute Inc. (1989) SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2, Cary, NC: SAS Institute Inc.] Usually, this takes the form of a sequence of F-tests, but other techniques are possible, such as t-tests, adjusted R-square, the Akaike information criterion, the Bayesian information criterion, Mallows' Cp, or the false discovery rate.
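As a concrete illustration of one such criterion, the following minimal sketch computes the Akaike information criterion for two nested ordinary-least-squares fits (smaller AIC is preferred). It assumes NumPy and simulated data; the function names and the Gaussian-OLS form of the criterion, n·ln(RSS/n) + 2k up to an additive constant, are illustrative rather than taken from the cited sources.

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def aic_ols(X, y):
    """AIC for a Gaussian OLS model: n*ln(RSS/n) + 2k, up to an additive constant."""
    n, k = X.shape
    return n * np.log(ols_rss(X, y) / n) + 2 * k

# Compare two nested candidate models on simulated data.
rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)          # x2 is irrelevant to y
X_small = np.column_stack([np.ones(n), x1])
X_large = np.column_stack([np.ones(n), x1, x2])
print(aic_ols(X_small, y), aic_ols(X_large, y))  # the smaller model usually wins
```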
As an additional consideration, when planning an experiment, computer simulation, or scientific survey to collect data for this model, one must keep in mind the number of parameters, P, to estimate and adjust the sample size accordingly. For K variables, P = 1 (Start) + K (Stage I) + (K² - K)/2 (Stage II) + 3K (Stage III) = 0.5K² + 3.5K + 1.
For K < 17, an efficient design of experiments exists for this type of model: a Box-Behnken design [Box-Behnken designs (http://www.itl.nist.gov/div898/handbook/pri/section3/pri3362.htm) from a handbook on engineering statistics at NIST (http://www.itl.nist.gov/div898/handbook/)] augmented with positive and negative axial points of length min(2, sqrt(int(1.5 + K/4))), plus point(s) at the origin. There are more efficient designs, requiring fewer runs, even for K > 16.
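The two formulas above are simple to evaluate; the sketch below computes the parameter count P and the axial-point length for a few values of K (the function names and test values are illustrative assumptions):

```python
import math

def parameter_count(k):
    """P = 1 + K + (K^2 - K)/2 + 3K = 0.5K^2 + 3.5K + 1 parameters for K variables."""
    return 1 + k + (k * k - k) // 2 + 3 * k

def axial_length(k):
    """Axial-point length min(2, sqrt(int(1.5 + K/4))) from the design described above."""
    return min(2.0, math.sqrt(int(1.5 + k / 4)))

for k in (3, 4, 10, 16):
    print(k, parameter_count(k), axial_length(k))
```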
The main approaches are:

a) Forward selection, which involves starting with no variables in the model, trying out the variables one by one, and including them if they are 'statistically significant' (a code sketch of this approach follows the list).

b) Backward elimination, which involves starting with all candidate variables and testing them one by one for statistical significance, deleting any that are not significant.

c) Methods that are a combination of the above, testing at each stage for variables to be included or excluded.
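A minimal sketch of forward selection (approach a), assuming NumPy and SciPy: at each step, the candidate whose partial F-test p-value is smallest is added, as long as that p-value stays below an entry threshold. The 0.05 threshold, the function names, and the simulated data are illustrative assumptions, not a canonical implementation.

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_selection(X, y, alpha_enter=0.05):
    """Repeatedly add the candidate column with the smallest partial
    F-test p-value, as long as that p-value is below alpha_enter."""
    n, k = X.shape
    intercept = np.ones((n, 1))
    selected, remaining = [], list(range(k))
    while remaining:
        rss_base = rss(np.hstack([intercept, X[:, selected]]), y)
        p_base = 1 + len(selected)            # parameters in the current model
        best_p, best_j = None, None
        for j in remaining:
            rss_new = rss(np.hstack([intercept, X[:, selected + [j]]]), y)
            df2 = n - (p_base + 1)
            F = (rss_base - rss_new) / (rss_new / df2)
            p_val = stats.f.sf(F, 1, df2)     # p-value for adding one variable
            if best_p is None or p_val < best_p:
                best_p, best_j = p_val, j
        if best_p is None or best_p > alpha_enter:
            break                             # nothing left is 'significant'
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=200)
print(forward_selection(X, y))                # typically selects columns 0 and 3
```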
A widely used algorithm was proposed by Efroymson (1960). [Efroymson, M.A. (1960) "Multiple regression analysis." In Ralston, A. and Wilf, H.S., editors, Mathematical Methods for Digital Computers. Wiley.] This is an automatic procedure for statistical model selection in cases where there are a large number of potential explanatory variables and no underlying theory on which to base the model selection. The procedure is used primarily in regression analysis, though the basic approach is applicable in many forms of model selection. It is a variation on forward selection: at each stage in the process, after a new variable is added, a test is made to check whether some variables can be deleted without appreciably increasing the residual sum of squares (RSS). The procedure terminates when the measure is (locally) maximized, or when the available improvement falls below some critical value.
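A sketch, under the same assumptions as above, of an Efroymson-style procedure: after each forward step, a deletion check asks whether any included variable can be dropped without appreciably increasing the RSS. The F-to-enter and F-to-remove thresholds below are illustrative, not values prescribed by Efroymson (1960); keeping f_remove below f_enter guards against cycling.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def efroymson_stepwise(X, y, f_enter=4.0, f_remove=3.9):
    """Forward selection with a deletion check after each addition.
    f_remove < f_enter prevents adding and removing the same variable
    forever; both thresholds here are illustrative."""
    n, k = X.shape
    ones = np.ones((n, 1))
    design = lambda cols: np.hstack([ones, X[:, cols]])
    selected = []
    for _ in range(2 * k):                    # hard cap as an extra cycling guard
        # Forward step: find the excluded variable with the largest F-to-enter.
        rss_cur = rss(design(selected), y)
        df2 = n - len(selected) - 2
        best_f, best_j = 0.0, None
        for j in range(k):
            if j in selected:
                continue
            rss_j = rss(design(selected + [j]), y)
            F = (rss_cur - rss_j) / (rss_j / df2)
            if F > best_f:
                best_f, best_j = F, j
        if best_j is None or best_f < f_enter:
            break                             # no variable earns its way in
        selected.append(best_j)
        # Deletion check: drop any variable whose removal barely raises the RSS.
        changed = True
        while changed and len(selected) > 1:
            changed = False
            rss_full = rss(design(selected), y)
            df2 = n - len(selected) - 1
            for j in list(selected):
                rest = [c for c in selected if c != j]
                F = (rss(design(rest), y) - rss_full) / (rss_full / df2)
                if F < f_remove:
                    selected.remove(j)
                    changed = True
                    break
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=200)
print(efroymson_stepwise(X, y))               # typically selects columns 0 and 3
```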
Stepwise regression procedures are used in data mining, but are controversial. Several points of criticism have been made:

1. A sequence of F-tests is often used to control the inclusion or exclusion of variables, but these are carried out on the same data, and so there will be problems of multiple comparisons, for which many correction criteria have been developed.

2. It is difficult to interpret the p-values associated with these tests, since each is conditional on the previous tests of inclusion and exclusion (see "dependent tests" in false discovery rate).

3. The tests themselves are biased, since they are based on the same data (Rencher and Pun, 1980; Copas, 1983). [Rencher, A.C. and Pun, F.C. (1980) "Inflation of R² in Best Subset Regression." Technometrics, 22, 49-54.] [Copas, J.B. (1983) "Regression, prediction and shrinkage." J. Roy. Statist. Soc. Series B, 45, 311-354.] Wilkinson and Dallal (1981) [Wilkinson, L. and Dallal, G.E. (1981) "Tests of significance in forward selection regression with an F-to-enter stopping rule." Technometrics, 23, 377-380.] computed percentage points of the multiple correlation coefficient by simulation and showed that a final regression obtained by forward selection, said by the F-procedure to be significant at 0.1%, was in fact only significant at 5%.
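The bias described in point 3 is easy to demonstrate by simulation: when the best of many pure-noise candidates is selected and then tested naively, the test rejects far more often than its nominal 5% level. The sketch below illustrates this general phenomenon; it does not reproduce the computations of the papers cited above.

```python
import numpy as np
from scipy import stats

# With 20 pure-noise candidates, picking the single best predictor and then
# reading its t-test p-value at face value "rejects" far more often than the
# nominal 5% rate.  Illustrative simulation, not from the cited papers.
rng = np.random.default_rng(2)
n, k, trials = 50, 20, 2000
false_hits = 0
for _ in range(trials):
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)                    # y is unrelated to every column
    corrs = [abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(k)]
    r = max(corrs)                            # the best-looking candidate
    t = r * np.sqrt((n - 2) / (1 - r * r))    # t-statistic for a correlation
    p = 2 * stats.t.sf(abs(t), n - 2)
    false_hits += p < 0.05
print(false_hits / trials)                    # far above 0.05
```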
Critics regard the procedure as a paradigmatic example of data dredging, intense computation often being an inadequate substitute for subject-area expertise.

See also
* Backward regression
* Forward regression
* Logistic regression
* Occam's Razor