Cook's distance
In statistics, Cook's distance is a commonly used estimate of the influence of a data point when doing least squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to indicate data points that are particularly worth checking for validity, or to indicate regions of the design space where it would be good to be able to obtain more data points.
Definition
Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance are considered to merit closer examination in the analysis.
Cook's distance D_i of observation i (for i = 1, \dots, n) is defined as

D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \, \mathrm{MSE}}

The following is an algebraically equivalent expression:

D_i = \frac{e_i^2}{p \, \mathrm{MSE}} \left[ \frac{h_{ii}}{(1 - h_{ii})^2} \right]
In the above equations:
- \hat{y}_j is the prediction from the full regression model for observation j;
- \hat{y}_{j(i)} is the prediction for observation j from a refitted regression model in which observation i has been omitted;
- h_{ii} is the i-th diagonal element of the hat matrix H = X \left( X^{\mathsf T} X \right)^{-1} X^{\mathsf T};
- e_i is the crude residual (i.e., the difference between the observed value y_i and the value \hat{y}_i fitted by the proposed model);
- \mathrm{MSE} is the mean square error of the regression model;
- p is the number of fitted parameters in the model.
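As an illustration only (not part of the original article), the quantities above can be computed directly from the design matrix and the response vector. The following minimal NumPy sketch assumes an ordinary least squares fit with an explicit intercept column; the function name cooks_distance and the variable names X, y, D are hypothetical.

import numpy as np

def cooks_distance(X, y):
    # X : (n, p) design matrix (include a column of ones for the intercept).
    # y : (n,) response vector.
    n, p = X.shape

    # Hat matrix H = X (X'X)^{-1} X'; its diagonal gives the leverages h_ii.
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    h = np.diag(H)

    # Fitted values and crude residuals e_i = y_i - yhat_i.
    beta_hat = XtX_inv @ X.T @ y
    y_hat = X @ beta_hat
    e = y - y_hat

    # Mean square error of the model with p fitted parameters.
    mse = e @ e / (n - p)

    # Algebraically equivalent form: D_i = e_i^2 / (p MSE) * h_ii / (1 - h_ii)^2.
    return (e ** 2 / (p * mse)) * h / (1.0 - h) ** 2

# Example usage with simulated data:
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 2.0 + 3.0 * x + rng.normal(size=30)
X = np.column_stack([np.ones_like(x), x])
D = cooks_distance(X, y)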
Detecting highly influential observations using Cook's distance
There are different opinions regarding what cut-off values to use for spotting highly influential points. A simple operational guideline of D_i > 1 has been suggested.[1] Others have indicated that D_i > 4/n, where n is the number of observations, might be used.[2]
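Continuing the illustrative sketch above (the variable D holding the vector of Cook's distances is an assumption carried over from it), either cut-off can be applied directly:

import numpy as np

n = len(D)
influential_simple = np.flatnonzero(D > 1.0)      # operational guideline D_i > 1
influential_scaled = np.flatnonzero(D > 4.0 / n)  # alternative guideline D_i > 4/n

Both rules are heuristics; points they flag merit closer examination rather than automatic removal.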
Interpreting Cook's distance
Cook's distance D_i can be interpreted as the distance one's estimates move within the confidence ellipsoid that represents a region of plausible values for the regression parameters. This is shown by an alternative but equivalent representation of Cook's distance in terms of changes to the estimates of the regression parameters between the cases where the particular observation is either included in or excluded from the regression analysis.
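One common way to write this equivalent representation (supplied here for completeness, with \hat{\beta} the coefficient estimate from the full data and \hat{\beta}_{(i)} the estimate with observation i deleted) is

D_i = \frac{\left( \hat{\beta} - \hat{\beta}_{(i)} \right)^{\mathsf T} \left( X^{\mathsf T} X \right) \left( \hat{\beta} - \hat{\beta}_{(i)} \right)}{p \, \mathrm{MSE}}

so a large D_i means that deleting observation i moves the coefficient estimates a long way relative to their estimated covariance.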
See also
- Outlier
- Leverage (statistics)
- Partial leverage
- DFFITS
- Studentized residual
References
- ^ Cook, R. D. & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.
- ^ Bollen, K. A. & Jackman, R. (1990). Regression diagnostics: An expository treatment of outliers and influential cases. In: J. Fox & J. Scott Long (eds.) Modern Methods of Data Analysis (pp. 257-91). Newbury Park: Sage.
- Cook, R. Dennis (Feb 1977). "Detection of Influential Observations in Linear Regression". Technometrics (American Statistical Association) 19 (1): 15–18. doi:10.2307/1268249. JSTOR 1268249. MR0436478.
- Cook, R. Dennis (Mar 1979). "Influential Observations in Linear Regression". Journal of the American Statistical Association (American Statistical Association) 74 (365): 169–174. doi:10.2307/2286747. JSTOR 2286747. MR0529533.
- Lorenz, Frederick O. (Apr 1987). "Teaching about Influence in Simple Regression". Teaching Sociology (American Sociological Association) 15 (2): 173–177. doi:10.2307/1318032. JSTOR 1318032.
- Chatterjee, S.; Hadi, A. S. (2006). Regression Analysis by Example. John Wiley and Sons. ISBN 0471746967.