Cook's distance

In statistics, Cook's distance is a commonly used estimate of the influence of a data point in a least squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to flag data points that are particularly worth checking for validity, and to indicate regions of the design space where it would be good to obtain more data points.

Definition

Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression, so points with a large Cook's distance are considered to merit closer examination in the analysis. For observation i, Cook's distance is defined as:

D_i = \frac{\sum_{j=1}^n (\hat Y_j - \hat Y_{j(i)})^2}{p \, \mathrm{MSE}} .

The following expression is algebraically equivalent:

D_i = \frac{e_i^2}{p \, \mathrm{MSE}} \left[\frac{h_{ii}}{(1-h_{ii})^2}\right] .

In the above equations:

\hat Y_j \, is the prediction from the full regression model for observation j;
\hat Y_{j(i)}\, is the prediction for observation j from a refitted regression model in which observation i has been omitted;
h_{ii} \, is the i-th diagonal element of the hat matrix \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T;
e_i \, is the crude residual (i.e., the difference between the observed value and the value fitted by the proposed model);
\mathrm{MSE} is the mean squared error of the regression model;
p is the number of fitted parameters in the model.
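
As a rough illustration of why the two expressions agree, the following sketch computes Cook's distance both ways for a simple linear regression (intercept plus one slope, so p = 2). It is not taken from the cited references. The helper names `fit_line` and `cooks_distance` and the small data set are hypothetical, MSE is assumed to be the residual sum of squares divided by n - p, and the leverage formula used is the special case of the hat matrix diagonal for a single predictor with an intercept.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cooks_distance(xs, ys):
    """Return (closed_form, deletion_form) lists of Cook's distances."""
    n, p = len(xs), 2  # p = 2 fitted parameters: intercept and slope
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    # Assumption: MSE is the residual sum of squares divided by (n - p).
    mse = sum(e * e for e in resid) / (n - p)

    # Leverage h_ii for simple linear regression with an intercept.
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    h = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

    # Closed form: e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
    closed = [e * e / (p * mse) * hi / (1 - hi) ** 2
              for e, hi in zip(resid, h)]

    # Deletion form: refit without observation i, then compare the
    # predictions for all n observations against the full-model fit.
    deleted = []
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        shift = sum((f - (a_i + b_i * x)) ** 2 for f, x in zip(fitted, xs))
        deleted.append(shift / (p * mse))
    return closed, deleted


# Made-up data: the last point sits far from the others in x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 14.0]
closed, deleted = cooks_distance(xs, ys)
for i, (c, d) in enumerate(zip(closed, deleted)):
    print(i, round(c, 4), round(d, 4))
```

The two columns printed for each observation should match up to floating-point error. The closed form needs only one fit, whereas the deletion form refits the model n times.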

Detecting highly influential observations using Cook's distance

Opinions differ on what cut-off values to use for spotting highly influential points. A simple operational guideline of D_i > 1 has been suggested.[1] Others have indicated that D_i > 4 / n, where n is the number of observations, might be used.[2]
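
A minimal sketch of how the two suggested cut-offs could be applied to a list of already computed distances is given below. The function name `flag_influential`, its `rule` argument and the example values are all hypothetical and chosen only for illustration.

```python
def flag_influential(distances, rule="4/n"):
    """Return the indices whose Cook's distance exceeds the chosen cut-off.

    rule="1"   uses the fixed guideline D_i > 1.
    rule="4/n" uses D_i > 4 / n, with n the number of observations.
    """
    n = len(distances)
    if rule == "1":
        cutoff = 1.0
    elif rule == "4/n":
        cutoff = 4.0 / n
    else:
        raise ValueError("rule must be '1' or '4/n'")
    return [i for i, d in enumerate(distances) if d > cutoff]


# Made-up distances for eight observations (not from any real data set).
example = [0.02, 0.62, 0.05, 1.40, 0.01, 0.09, 0.03, 0.04]
print(flag_influential(example, "4/n"))
print(flag_influential(example, "1"))
```

With n = 8 the second cut-off is 0.5, so it flags more points here than the fixed cut-off of 1. Either rule only marks points for closer inspection; it does not by itself justify removing them.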

Interpreting Cook's distance

Specifically, D_i can be interpreted as the distance one's estimates move within the confidence ellipsoid that represents a region of plausible values for the parameters. This is shown by an alternative but equivalent representation of Cook's distance in terms of the change in the estimated regression parameters between the fit that includes the particular observation and the fit that excludes it.
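
The parameter-based representation mentioned above can be written out as follows. The symbols \hat\beta (the parameter estimates from the full fit) and \hat\beta_{(i)} (the estimates with observation i omitted) are notation introduced here and do not appear in the definitions earlier in the article; \mathbf{X}, p and \mathrm{MSE} are as defined there.

```latex
D_i = \frac{(\hat\beta - \hat\beta_{(i)})^T \, \mathbf{X}^T\mathbf{X} \, (\hat\beta - \hat\beta_{(i)})}{p \, \mathrm{MSE}}
```

Because the vector of prediction differences \hat Y_j - \hat Y_{j(i)} equals \mathbf{X}(\hat\beta - \hat\beta_{(i)}), its squared length is exactly the quadratic form in the numerator, so this expression matches the prediction-based definition term by term.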

References

  1. ^ Cook, R. D. & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.
  2. ^ Bollen, K. A. & Jackman, R. (1990). Regression diagnostics: An expository treatment of outliers and influential cases. In: J. Fox & J. Scott Long (eds.) Modern Methods of Data Analysis (pp. 257-91). Newbury Park: Sage.

Wikimedia Foundation. 2010.

Игры ⚽ Поможем сделать НИР

Look at other dictionaries:

  • Cook — may refer to: Cook (profession) Cook (servant), a servant who cooks food for his or her employer Contents 1 Personal name 2 Place name …   Wikipedia

  • Cook Records — was a record label founded by Emory Cook (born 1913, San Francisco, died 2002). Cook was an audio engineer and inventor. From 1952 to 1966, Cook used his Sounds of our Times and Cook Laboratories record labels to demonstrate his philosophy about… …   Wikipedia

  • Cook Islands at the 2008 Summer Olympics — Cook Islands at the Olympic Games Flag of the Cook Islands – Flag bear …   Wikipedia

  • Cook Strait — …   Wikipedia

  • Cook Forest State Park — Pennsylvania State Park Natural Monument (IUCN III) Tom s Run …   Wikipedia

  • Cook's Swift — Scientific classification Kingdom: Animalia Phylum: Chordata Class: Aves …   Wikipedia

  • Cook Islands Māori — This article is about the language. For the people of the Cook Islands, the majority of whom are Cook Islands Māori, see Cook Islanders. Cook Islands Māori Māori Kūki Āirani Spoken in …   Wikipedia

  • Cook Inlet — This article is about the body of water. For other meanings, see Cook Inlet (disambiguation). Cook Inlet, showing Knik and Turnagain Arms Cook Inlet stretches 180 miles (290 km) from the Gulf of Alaska to Anchorage in south central Alaska …   Wikipedia

  • Cook Challenger — JC 1 Challenger Role Cabin Monoplane Manufacturer Cook Aircraft Corporation Designer John Cook First flight May 1969 Number built 4 The Cook JC 1 Challenger was a 1960s American cabin monoplane built by the Cook Aircraft Corporation …   Wikipedia

  • Cook , James — (1728–1778) British navigator and explorer Cook, the son of a Scottish farm laborer, was born at Marston in England. He was educated at the local village school and joined the Royal Navy as an able seaman in 1755. He became a ship s master in… …   Scientists

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”