Mallows' Cp

In statistics, Mallows' C_p,^[1] named for Colin L. Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the goal is to find the best model involving a subset of these predictors. For example, one may be interested in predicting by how much a particular cholesterol-lowering drug will lower a particular person's cholesterol level, based on the person's age, gender, weight, and various dietary and lifestyle factors.

Definition and properties

Mallows' C_p addresses the issue of overfitting, in which model selection statistics such as the residual sum of squares always get smaller as more variables are added to a model. Thus, if we aim to select the model giving the smallest residual sum of squares, the model including all variables would always be selected. The C_p statistic calculated on a sample of data estimates the mean squared prediction error (MSPE) as its population target

$E\sum_j (\hat{Y}_j - E(Y_j|X_j))^2/\sigma^2,$

where $\hat{Y}_j$ is the fitted value from the regression model for the j^th case, E(Y_j | X_j) is the expected value for the j^th case, and σ² is the error variance (assumed constant across the cases). The MSPE will not automatically get smaller as more variables are added. The optimum model under this criterion is a compromise influenced by the sample size, the effect sizes of the different predictors, and the degree of collinearity between them.

If P regressors are selected from a set of K > P, the C_p statistic for that particular set of regressors is defined as :

$C_p={SSE_p \over S^2} - N + 2P,$

where

$SSE_p = \sum_{i=1}^N(Y_i-Y_{pi})^2$ is the error sum of squares for the model with P regressors,
Y_pi is the predicted value of the i'th observation of Y from the P regressors,
S² is the residual mean square after regression on the complete set of K regressors and can be estimated by mean square error MSE,
and N is the sample size.

Practical use

The C_p statistic is often used as a stopping rule for various forms of stepwise regression. Mallows proposed the statistic as a criterion for selecting among many alternative subset regressions. Under a model not suffering from appreciable lack of fit (bias), C_p has expectation nearly equal to P; otherwise the expectation is roughly P plus a positive bias term. Nevertheless, even though it has expectation greater than or equal to P, there is nothing to prevent C_p < P or even C_p < 0 in extreme cases. It is suggested that one should choose a subset that has C_p approaching P,^[2] from above, for a list of subsets ordered by increasing P. In practice, the positive bias can be adjusted for by selecting a model from the ordered list of subsets, such that C_p < 2P.

Since the sample-based C_p statistic is an estimate of the MSPE, using C_p for model selection does not completely guard against overfitting. For instance, it is possible that the selected model will be one in which the sample C_p was a particularly severe underestimate of the MSPE.

Model selection statistics such as C_p are generally not used blindly, but rather information about the field of application, the intended use of the model, and any known biases in the data are taken into account in the process of model selection.

References

^ Mallows, C.L. (1973). "Some Comments on C_P". Technometrics 15 (4): 661–675. doi:10.2307/1267380. JSTOR 1267380. .
^ Daniel, C. and Wood, F. (1980) Fitting Equations to Data, Rev. Ed., NY: Wiley & Sons, Inc.

Hocking, R.R. (1976). "The Analysis and Selection of Variables in Linear Regression". Biometrics 32 (1): 1–50. doi:10.2307/2529336. JSTOR 2529336.

Categories:

Regression analysis
Regression variable selection

Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

Mallows — This unusual name is of Anglo Saxon origin, and is a locational surname deriving from the hamlet now called Mallows Green, near Bishop s Stortford in Essex. The placename means the cross or mark on the hill , derived from the Olde English pre 7th … Surnames reference
Mallows — Mallow Mal low, Mallows Mal lows, n. [OE. malwe, AS. mealwe, fr. L. malva, akin to Gr. mala chh; cf. mala ssein to soften, malako s soft. Named either from its softening or relaxing properties, or from its soft downy leaves. Cf. {Mauve},… … The Collaborative International Dictionary of English
Mallows Bay — is a small bay on the Maryland side of the Potomac River in the United States located at 38°28′21.4″N 77°16′6.9″W / 38.472611°N 77.268583°W / … Wikipedia
Mallows — Occurs only in Job 30:4 (R.V., saltwort ). The word so rendered (malluah, from melah, salt ) most probably denotes the Atriplex halimus of Linnaeus, a species of sea purslane found on the shores of the Dead Sea, as also of the Mediterranean,… … Easton's Bible Dictionary
mallows — malÂ·low || mÃ¦lÉ™ÊŠ n. any of a number of plants belonging to the genus Malva having hairy stems and pink or white flowers … English contemporary dictionary
Charles Edward Mallows — House with a bridge approach. Designed by C. E. Mallows and F. L . Griggs. Drawn by F. L. Griggs. Charles Edward Mallows (5 May 1864–2 June 1915, age 51), often known as C. E. Mallows, was an English landscape architect. He was born in Chelsea,… … Wikipedia
The Royal Mallows — The 117th Regiment of Foot ( The Royal Mallows ) is a fictional Irish regiment mentioned in the works of Arthur Conan Doyle. By its title it appears to have been a Royal regiment, thus it probably had royal blue facings on their uniforms and a… … Wikipedia
Malves — mallows … Medieval glossary
Malwys — mallows … Medieval glossary
mauls — mallows. N … A glossary of provincial and local words used in England

Academic Dictionaries and Encyclopedias

Mallows' Cp

Definition and properties

Practical use

References

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Mallows' Cp

Definition and properties

Practical use

References

Look at other dictionaries:

Share the article and excerpts

Direct link