Regression model validation

In statistics, model validation is possibly the most important step in the model-building sequence. It is also one of the most overlooked. Often the validation of a model seems to consist of nothing more than quoting the R² statistic from the fit (which measures the fraction of the total variability in the response that is accounted for by the model).
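As a rough illustration of what the R² statistic summarizes, the following Python sketch computes it as one minus the ratio of the residual sum of squares to the total sum of squares, which is the usual form for a least-squares fit that includes an intercept. The function name `r_squared` and the sample numbers are assumptions made for illustration; they do not come from the source.

```python
def r_squared(y, y_hat):
    """Fraction of the total variability in y accounted for by the predictions y_hat.

    Assumes y_hat comes from a least-squares fit with an intercept, so that
    the value lies between 0 and 1.
    """
    mean_y = sum(y) / len(y)
    # Total variability of the response around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variability left unexplained by the model.
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1.0 - ss_resid / ss_total


# Hypothetical observed responses and model predictions.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
print(round(r2, 4))
```

A single number of this kind says how much variability is accounted for, but nothing about whether the remaining variability is random or structured.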

1 R² is not enough
2 Analysis of residuals
- 2.1 Graphical analysis of residuals
- 2.2 Quantitative analysis of residuals
3 See also
4 References
5 External links

R² is not enough

Analysis of residuals

The residuals from a fitted model are the differences between the responses observed at each combination of values of the explanatory variables and the corresponding predictions of the response computed using the regression function. Mathematically, the residual for the i^th observation in the data set is written

$e_i = y_i - f(x_i;\hat{\beta}),$

with y_i denoting the i^th response in the data set, x_i the vector of explanatory variables, each set at the corresponding values found in the i^th observation in the data set, f the regression function, and \hat{\beta} the vector of estimated parameters.
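The definition above can be sketched in Python for the simplest case, a straight-line regression function f(x; β) = β0 + β1·x fitted by ordinary least squares. The choice of a straight line, the helper names `fit_line` and `residuals`, and the data values are all assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(x, y, beta_hat):
    """Residuals e_i = y_i - f(x_i; beta_hat) for the straight-line model."""
    b0, b1 = beta_hat
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data set with one explanatory variable.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

beta_hat = fit_line(x, y)
e = residuals(x, y, beta_hat)
print(beta_hat)
print(e)
```

For a least-squares fit with an intercept, the residuals sum to zero up to rounding error, which is one of the constraints that estimation imposes on them.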

If the model fit to the data were correct, the residuals would approximate the random errors that make the relationship between the explanatory variables and the response variable a statistical relationship. Therefore, if the residuals appear to behave randomly, it suggests that the model fits the data well. On the other hand, if non-random structure is evident in the residuals, it is a clear sign that the model fits the data poorly. The next section details the types of plots to use to test different aspects of a model and gives guidance on the correct interpretation of the different results that could be observed for each type of plot.
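A small Python sketch can make the idea of non-random structure concrete. It fits a straight line to noise-free data that actually follow y = x², then reports R² together with the number of sign runs in the residuals. The data, the helper names, and the use of a sign-run count as an informal check are assumptions chosen for illustration; the source does not prescribe this check.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def count_sign_runs(values):
    """Number of maximal runs of same-sign values (zero is treated as negative)."""
    signs = [v > 0 for v in values]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


# Hypothetical data: the true relationship is quadratic, the fitted model is linear.
x = [float(i) for i in range(11)]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

mean_y = sum(y) / len(y)
r2 = 1.0 - sum(ei ** 2 for ei in e) / sum((yi - mean_y) ** 2 for yi in y)
runs = count_sign_runs(e)

print(round(r2, 4), runs)
```

With these numbers R² is about 0.93, yet the residuals are positive at both ends and negative in the middle, giving only 3 sign runs across 11 observations. Residuals that behaved randomly would be expected to change sign far more often, so the pattern points to a wrong functional form despite the high R².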

Graphical analysis of residuals

Quantitative analysis of residuals

Numerical methods for model validation, such as the R² statistic, are also useful, but usually to a lesser degree than graphical methods. They tend to be narrowly focused on a particular aspect of the relationship between the model and the data, and often try to compress that information into a single descriptive number or test result. Numerical methods do play an important role as confirmatory methods for graphical techniques, however. For example, the lack-of-fit test for assessing the correctness of the functional part of the model can aid in interpreting a borderline residual plot.

There are also a few modeling situations in which graphical methods cannot easily be used. In these cases, numerical methods provide a fallback position for model validation. One common situation in which numerical validation methods take precedence over graphical methods arises when the number of parameters being estimated is relatively close to the size of the data set. In this situation residual plots are often difficult to interpret because of the constraints that the estimation of the unknown parameters imposes on the residuals. One area in which this typically happens is optimization applications using designed experiments. Logistic regression with binary data is another area in which graphical residual analysis can be difficult.
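The lack-of-fit test mentioned above can be sketched in Python in its classical form, which requires replicate observations at some values of the explanatory variable. The residual sum of squares is split into a pure-error part (scatter of replicates around their group mean) and a lack-of-fit part (distance of the group means from the fitted line), and the ratio of the two mean squares gives an F statistic. The straight-line model, the function names, and the data are assumptions made for illustration.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def lack_of_fit_f(x, y, n_params=2):
    """Return (F statistic, lack-of-fit df, pure-error df) for a straight-line fit."""
    b0, b1 = fit_line(x, y)

    # Group the replicate responses by the value of the explanatory variable.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    ss_pure_error = 0.0
    ss_lack_of_fit = 0.0
    for xi, ys in groups.items():
        group_mean = sum(ys) / len(ys)
        ss_pure_error += sum((yi - group_mean) ** 2 for yi in ys)
        ss_lack_of_fit += len(ys) * (group_mean - (b0 + b1 * xi)) ** 2

    df_lof = len(groups) - n_params   # distinct x levels minus fitted parameters
    df_pe = len(x) - len(groups)      # observations minus distinct x levels
    f_stat = (ss_lack_of_fit / df_lof) / (ss_pure_error / df_pe)
    return f_stat, df_lof, df_pe


# Hypothetical data: two replicates at each of four x levels, curved response.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.3, 15.7]

f_stat, df_lof, df_pe = lack_of_fit_f(x, y)
print(round(f_stat, 2), df_lof, df_pe)
```

The statistic would be compared with an F distribution on (df_lof, df_pe) degrees of freedom. Here it is roughly 53.3 on (2, 4) degrees of freedom, far above the 5% critical value of about 6.94, so the straight-line form would be rejected for these made-up data.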

References

External links

This article incorporates public domain material from websites or documents of the National Institute of Standards and Technology.


Wikimedia Foundation. 2010.
