DFFITS

DFFITS is a diagnostic meant to show how influential a point is in a statistical regression. It was proposed in 1980 by Belsley, Kuh, and Welsch.[1] It is defined as the change ("DFFIT") in the predicted value for a point, obtained when that point is left out of the regression, "Studentized" by dividing by the estimated standard deviation of the fit at that point:

\text{DFFITS} = \frac{\widehat{y}_i - \widehat{y}_{i(i)}}{s_{(i)} \sqrt{h_{ii}}}

where \widehat{y}_i and \widehat{y}_{i(i)} are the predictions for point i with and without point i included in the regression, s_{(i)} is the standard error estimated without the point in question, and h_{ii} is the leverage for the point.
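
Following this definition directly, DFFITS can be computed by refitting the regression once with each observation deleted. The sketch below (plain NumPy; the function name and the assumption that the design matrix X already contains an intercept column are illustrative, not part of the cited reference) shows one such leave-one-out implementation:

    import numpy as np

    def dffits_loo(X, y):
        """DFFITS via explicit leave-one-out refits (illustrative sketch)."""
        n, p = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)       # leverages h_ii
        y_hat = X @ (XtX_inv @ X.T @ y)                   # full-data fitted values
        out = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i
            beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
            resid_i = y[keep] - X[keep] @ beta_i
            s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))   # s_(i): std. error without point i
            y_hat_i = X[i] @ beta_i                          # prediction for point i without point i
            out[i] = (y_hat[i] - y_hat_i) / (s_i * np.sqrt(h[i]))
        return out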

DFFITS is very similar to the externally Studentized residual, and is in fact equal to the latter times \sqrt{h_{ii}/(1-h_{ii})}.[2]
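
Written out, with t_i denoting the externally Studentized residual and e_i the ordinary residual (this is the standard identity, stated here for convenience rather than quoted from the cited appendix):

\text{DFFITS}_i = t_i \sqrt{\frac{h_{ii}}{1-h_{ii}}}, \qquad t_i = \frac{e_i}{s_{(i)} \sqrt{1-h_{ii}}}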

When the errors are Gaussian, the externally Studentized residual follows a Student's t distribution (with degrees of freedom equal to the number of residual degrees of freedom minus one). The DFFITS value for a particular point is therefore distributed as that same Student's t variate multiplied by the leverage factor \sqrt{h_{ii}/(1-h_{ii})} for that point. Thus, for low-leverage points DFFITS is expected to be small, whereas as the leverage approaches 1 the distribution of the DFFITS value widens without bound.

For a perfectly balanced experimental design (such as a factorial design or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points. This means that the DFFITS values will be distributed (in the Gaussian case) as \sqrt{\frac{p}{n-p}} \approx \sqrt{\frac{p}{n}} times a t variate. Therefore, the authors suggest investigating those points with |DFFITS| greater than 2\sqrt{\frac{p}{n}}.
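
In practice DFFITS is usually obtained from a single full-data fit via the identity above, rather than by n refits, and then compared with the 2\sqrt{\frac{p}{n}} cut-off. A minimal NumPy sketch of that shortcut (the function name and the intercept-column assumption are again illustrative):

    import numpy as np

    def flag_influential(X, y):
        """Flag observations with |DFFITS| > 2*sqrt(p/n), without any refitting."""
        n, p = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)       # leverages h_ii
        e = y - X @ (XtX_inv @ X.T @ y)                   # ordinary residuals
        s2 = e @ e / (n - p)                              # full-sample error variance
        # Deletion variance s_(i)^2 from the standard leave-one-out formula.
        s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
        t = e / np.sqrt(s2_i * (1 - h))                   # externally Studentized residuals
        dffits = t * np.sqrt(h / (1 - h))
        return dffits, np.abs(dffits) > 2 * np.sqrt(p / n)

Standard statistics packages expose the same quantities through their regression influence diagnostics.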

A similar measure of influence is Cook's distance.

Why DFFIT?

Before such diagnostics were available, potential outliers were typically assessed by inspecting histograms and scatterplots prior to running a linear regression. Both approaches were subjective, and gave little indication of how much leverage each potential outlier exerted on the fitted model. DFFIT and DFBETA were introduced to quantify this influence observation by observation.


References

  1. ^ Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons. ISBN 0471058564.
  2. ^ Montgomery, Douglas C.; Peck, Elizabeth A. (1992). "Appendix C.4". Introduction to Linear Regression Analysis (2nd ed.). New York: John Wiley & Sons. pp. 504–505. ISBN 0-471-53387-4.
