Regression dilution

Regression dilution is a statistical phenomenon also known as "attenuation".

Consider fitting a straight line for the relationship of an outcome variable y to a predictor variable x, and estimating the gradient (slope) of the line. Statistical variability, measurement error or random noise in the y variable causes imprecision in the estimated gradient, but not bias: on average, the procedure calculates the right gradient. However, variability, measurement error or random noise in the x variable causes bias in the estimated gradient (as well as imprecision). The greater the variance in the x measurement, the more the estimated slope is pulled towards 0 and away from the true gradient. This "dilution" of the gradient towards 0 is referred to as "regression dilution," "attenuation," or "attenuation bias."
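The asymmetry between noise in y and noise in x can be checked by simulation. The sketch below is illustrative only: it assumes a true slope of 2, a normally distributed x with variance 1, and extra normal noise of variance 1 added either to y or to x (never both). `ols_slope` is a hypothetical helper written for this example.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
n, beta, sd_noise = 100_000, 2.0, 1.0

# True predictor x (variance 1) and outcome y = beta * x + residual error
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

# Add the same amount of extra noise either to the outcome or to the predictor
y_noisy = [yi + rng.gauss(0.0, sd_noise) for yi in y]
x_noisy = [xi + rng.gauss(0.0, sd_noise) for xi in x]

slope_noisy_y = ols_slope(x, y_noisy)  # still close to beta = 2.0
slope_noisy_x = ols_slope(x_noisy, y)  # close to 2.0 * 1 / (1 + 1) = 1.0
print(f"noise in y only: {slope_noisy_y:.3f}")
print(f"noise in x only: {slope_noisy_x:.3f}")
```

Noise in y only widens the scatter around the line, while noise in x spreads the observed x values out without a matching spread in y, which flattens the fitted line.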

It may seem counter-intuitive that noise in the predictor variable x induces a bias, but noise in the outcome variable y does not. Recall that linear regression is not symmetric: the line of best fit for predicting y from x (the usual linear regression) is not the same as the line of best fit for predicting x from y (see, for example, Draper & Smith, "Applied Regression Analysis", page 5 of the 1966 edition).
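A quick numerical illustration of this asymmetry, assuming (for the sketch only) y = 0.5x + e with x and e independent standard normal. The regression of x on y, redrawn in the same x-y plane, has slope 1 / b_xy, which differs from the slope of y on x; the product of the two fitted slopes is the squared sample correlation. `ols_slope` is a hypothetical helper.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

b_yx = ols_slope(x, y)  # line for predicting y from x: slope about 0.5
b_xy = ols_slope(y, x)  # line for predicting x from y: slope about 0.5 / 1.25 = 0.4

# Drawn in the same (x, y) plane, the second line has slope 1 / b_xy (about 2.5),
# so the two "best fit" lines differ unless the points lie exactly on a line.
print(f"y on x: {b_yx:.3f}, x on y drawn in the x-y plane: {1 / b_xy:.3f}")

# The product of the two slopes equals the squared sample correlation (below 1)
r_squared = b_yx * b_xy
```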

Contents

1 How to correct for regression dilution

1.1 The case of a randomly distributed x variable

1.2 The case of a fixed x variable

1.3 Multiple x variables

2 Is correction necessary?

2.1 Caveats

3 Further reading

4 See also

5 References

How to correct for regression dilution

Main article: correction for attenuation

The case of a randomly distributed x variable

The case that the x variable arises randomly is known as the structural model or structural relationship. For example, in a medical study patients are recruited as a sample from a population, and their characteristics such as blood pressure may be viewed as arising from a random sample.

Under certain assumptions (typically, normal distribution assumptions) there is a known ratio between the true gradient and the expected estimated gradient. Frost and Thompson (2000) ^[1] review several methods for estimating this ratio and hence correcting the estimated gradient. The term regression dilution ratio (beware – not defined in quite the same way by all authors) is used for this general approach, in which the usual linear regression is fitted, and then a correction applied. The reply to Frost & Thompson by Longford (2001) ^[2] refers the reader to other methods, expanding the regression model to acknowledge the variability in the x variable, so that no bias arises. Fuller (1987) ^[3] is one of the standard references for assessing and correcting for regression dilution.
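As a sketch of where the ratio comes from, assume the classical additive error model: the observed predictor is w = x + u, where the error u has mean 0 and variance σ_u², and is independent of x and of the outcome error ε. The least-squares slope of y on w then converges to the true slope multiplied by a ratio λ, which is one common definition of the regression dilution ratio:

```latex
\begin{aligned}
w_i &= x_i + u_i, \qquad y_i = \alpha + \beta x_i + \varepsilon_i,\\
\operatorname{plim}\hat\beta_{yw} &= \frac{\operatorname{cov}(y,w)}{\operatorname{var}(w)}
  = \frac{\beta\,\sigma_x^2}{\sigma_x^2+\sigma_u^2} = \lambda\beta,
  \qquad \lambda = \frac{\sigma_x^2}{\sigma_x^2+\sigma_u^2},\\
\hat\beta_{\text{corrected}} &= \hat\beta_{yw}/\hat\lambda .
\end{aligned}
```

Because 0 < λ ≤ 1, the naive slope is shrunk towards 0, and dividing by an estimate of λ undoes the shrinkage on average.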

Hughes (1993) ^[4] shows that the regression dilution ratio methods apply approximately in survival models. Rosner (1992) ^[5] shows that the ratio methods apply approximately to logistic regression models. Carroll et al. (1995) ^[6] give more detail on regression dilution in nonlinear models, presenting the regression dilution ratio methods as the simplest case of regression calibration methods, in which additional covariates may also be incorporated.
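In the linear case the link between the ratio method and regression calibration can be seen directly. Regression calibration replaces each observed w by an estimate of E[x | w], which under a normal structural model with additive error is mean(w) + λ(w − mean(w)); regressing y on these calibrated values gives exactly the naive slope divided by λ. The sketch below assumes λ is known (true only because the data are simulated); `calibrate` and `ols_slope` are hypothetical helpers. In nonlinear models such as logistic or survival models the same substitution is only an approximation.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def calibrate(w, lam):
    """Replace each w by mean(w) + lam * (w - mean(w)), an estimate of E[x | w]
    under a normal structural model with additive error (assumed)."""
    m = sum(w) / len(w)
    return [m + lam * (wi - m) for wi in w]

rng = random.Random(3)
n, sd_x, sd_u, beta = 50_000, 1.0, 0.5, 1.5
x = [rng.gauss(0.0, sd_x) for _ in range(n)]
w = [xi + rng.gauss(0.0, sd_u) for xi in x]        # observed with error
y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

lam = sd_x**2 / (sd_x**2 + sd_u**2)  # 0.8; known here only because we simulated
naive = ols_slope(w, y)              # about 1.5 * 0.8 = 1.2
ratio_corrected = naive / lam        # about 1.5
calibrated = ols_slope(calibrate(w, lam), y)  # identical to ratio_corrected
```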

In general, methods for the structural model require some estimate of the variability of the x variable. This requires repeated measurements of the x variable in the same individuals, either in a sub-study of the main data set or in a separate data set. Without this information it is not possible to make a correction.
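A minimal sketch of how repeated measurements can be used, assuming the classical additive error model with independent errors on each occasion and a repeatability sub-study whose participants have the same distribution of x as the main study. Under these assumptions the slope of the second measurement regressed on the first estimates λ = var(x) / (var(x) + var(u)); this is one simple estimator chosen for illustration, not necessarily the one any particular reference recommends. All sample sizes and parameter values are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(11)
sd_x, sd_u, beta = 1.0, 0.7, 2.0

# Main study: one error-prone measurement w per person, plus the outcome y
n_main = 40_000
x = [rng.gauss(0.0, sd_x) for _ in range(n_main)]
w = [xi + rng.gauss(0.0, sd_u) for xi in x]
y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

# Hypothetical repeatability sub-study: two independent measurements per person
n_rep = 10_000
x_rep = [rng.gauss(0.0, sd_x) for _ in range(n_rep)]
w1 = [xi + rng.gauss(0.0, sd_u) for xi in x_rep]
w2 = [xi + rng.gauss(0.0, sd_u) for xi in x_rep]

lam_hat = ols_slope(w1, w2)   # estimates 1 / (1 + 0.49), about 0.671
naive = ols_slope(w, y)       # attenuated, about 2.0 * 0.671
corrected = naive / lam_hat   # close to the true slope 2.0
```

The uncertainty in `lam_hat` propagates into the corrected slope, so a small repeatability study gives a noisy correction.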

The case of a fixed x variable

The case that x is fixed, but measured with noise, is known as the functional model or functional relationship. See, for example, Riggs et al. (1978).^[7]
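Riggs et al. compare many methods for fitting lines when both variables carry error. As one illustration (not necessarily their recommendation), Deming regression gives a closed-form slope when the ratio δ = var(error in y) / var(error in x) is known or assumed. The sketch below assumes fixed, evenly spaced true x values, independent normal errors with standard deviation 1 on both axes (so δ = 1), and a nonzero covariance between the observed variables; `deming_slope` and `ols_slope` are hypothetical helpers.

```python
import math
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def deming_slope(xs, ys, delta):
    """Deming regression slope; delta = var(error in y) / var(error in x), assumed known."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    d = syy - delta * sxx
    return (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)

rng = random.Random(5)
n, beta = 20_000, 0.8
# Fixed (non-random) true x values: an evenly spaced design from -5 to 5
true_x = [-5.0 + 10.0 * i / (n - 1) for i in range(n)]
# Both variables are observed with noise of standard deviation 1, so delta = 1
x_obs = [xi + rng.gauss(0.0, 1.0) for xi in true_x]
y_obs = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in true_x]

naive = ols_slope(x_obs, y_obs)                 # attenuated, roughly 0.71
deming = deming_slope(x_obs, y_obs, delta=1.0)  # close to 0.8
```

As δ grows large the Deming slope approaches the ordinary regression of y on x; as δ approaches 0 it approaches the reverse regression, so the answer depends on the assumed error ratio.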

Multiple x variables

The case of multiple predictor variables (possibly correlated) subject to variability (possibly correlated) has been well-studied for linear regression, and for some non-linear regression models. ^[3]^[6] Other non-linear models, such as proportional hazards models for survival analysis, have been considered only with a single predictor subject to variability.^[4]
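For several predictors, the scalar ratio generalizes to a matrix. Assuming (for this sketch) a vector of additive errors u with covariance Σ_uu, uncorrelated with the true predictors x and with the outcome error, the naive least-squares coefficients converge to a matrix-weighted version of the true coefficients, and a method-of-moments correction in the spirit of Fuller (1987) subtracts an estimate of Σ_uu (for example, from replicate measurements) from the sample covariance matrix S_ww of the observed predictors; S_wy is the vector of sample covariances between the observed predictors and y:

```latex
\begin{aligned}
\mathbf{w}_i &= \mathbf{x}_i + \mathbf{u}_i, \qquad \operatorname{cov}(\mathbf{u}_i) = \Sigma_{uu},\\
\operatorname{plim}\hat{\boldsymbol\beta}_{\text{naive}} &= (\Sigma_{xx} + \Sigma_{uu})^{-1}\,\Sigma_{xx}\,\boldsymbol\beta,\\
\hat{\boldsymbol\beta}_{\text{corrected}} &= \bigl(S_{ww} - \hat\Sigma_{uu}\bigr)^{-1} S_{wy}.
\end{aligned}
```

Because the matrix multiplying β is generally not diagonal, error in one predictor can bias the coefficients of other, correlated predictors, and not necessarily towards 0.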

Is correction necessary?

In many (perhaps most) applications, correction is neither necessary nor appropriate. To understand this, consider the measurement error as follows. Let y be the outcome variable, x be the true predictor variable, and w be an approximate observation of x. Frost and Thompson^[1] suggest, for example, that x may be the true, long-term blood pressure of a patient, and w may be the blood pressure observed on one particular clinic visit. Regression dilution arises if we are interested in the relationship between y and x, but estimate the relationship between y and w. Because w is measured with variability, the gradient of the regression line of y on w is smaller in magnitude than the gradient of the regression line of y on x.

Does this matter? In predictive modelling, no. Standard methods can fit a regression of y on w without bias. There is bias only if we then use the regression of y on w as an approximation to the regression of y on x. In the example, assuming that blood pressure measurements are similarly variable in future patients, our regression line of y on w (observed blood pressure) gives unbiased predictions.
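The point about prediction can be illustrated by simulation. The sketch assumes the classical additive error model with var(x) = var(u) = 1 (so λ = 0.5), a true slope of 1, and new patients whose w is measured with the same variability as in the original data. The uncorrected line of y on w predicts y for the new patients better than a line using the corrected slope; all helper names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def mse(intercept, slope, ws, ys):
    """Mean squared error when predicting ys from ws with the given line."""
    return sum((y - (intercept + slope * w)) ** 2 for w, y in zip(ws, ys)) / len(ws)

def make_patients(rng, n, beta=1.0, sd_x=1.0, sd_u=1.0):
    """One noisy observation w = x + u of the true x, and outcome y = beta * x + error."""
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return w, y

rng = random.Random(2)
w_old, y_old = make_patients(rng, 20_000)
w_new, y_new = make_patients(rng, 20_000)  # future patients, similarly variable

a_naive, b_naive = fit_line(w_old, y_old)  # slope about 0.5
lam = 0.5                                  # true ratio here: 1 / (1 + 1)
b_corr = b_naive / lam                     # about 1.0, the slope of y on x
a_corr = sum(y_old) / len(y_old) - b_corr * sum(w_old) / len(w_old)

mse_naive = mse(a_naive, b_naive, w_new, y_new)  # about 1.5
mse_corr = mse(a_corr, b_corr, w_new, y_new)     # about 2.0
```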

An example of a circumstance in which correction is desired is prediction of change. Suppose the change in x is known under some new circumstance: to estimate the likely change in an outcome variable y, the gradient of the regression of y on x is needed, not y on w. This arises in epidemiology. To continue the example in which x denotes blood pressure, perhaps a large clinical trial has provided an estimate of the change in blood pressure under a new treatment; then the possible effect on y, under the new treatment, should be estimated from the gradient in the regression of y on x.

Another circumstance is predictive modelling in which future observations are also variable, but not (in the phrase used above) "similarly variable". This occurs, for example, if the current data set includes blood pressure measured with greater precision than is common in clinical practice. One specific example^[8] arose when a regression equation was developed from a clinical trial, in which blood pressure was the average of six measurements, for use in clinical practice, where blood pressure is usually a single measurement.
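Under the classical additive error model with independent errors on each occasion (assumed here), the mean of k measurements has error variance var(u)/k, so its dilution ratio is λ_k = var(x) / (var(x) + var(u)/k). A slope estimated on k-measurement averages can then be rescaled by λ_1 / λ_k for use with single measurements. This is a sketch of the general idea, not the specific adjustment used in the cited work; `dilution_ratio` is a hypothetical helper and the variances are illustrative.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def dilution_ratio(var_x, var_u, k=1):
    """Dilution ratio when the predictor is the mean of k independent measurements."""
    return var_x / (var_x + var_u / k)

rng = random.Random(9)
var_x, var_u, beta, n = 1.0, 1.0, 1.0, 30_000
sd_x, sd_u = var_x**0.5, var_u**0.5
x = [rng.gauss(0.0, sd_x) for _ in range(n)]
y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

# "Trial" data: predictor is the mean of six measurements
w_mean6 = [xi + sum(rng.gauss(0.0, sd_u) for _ in range(6)) / 6 for xi in x]
# "Clinical practice" data: a single measurement
w_single = [xi + rng.gauss(0.0, sd_u) for xi in x]

b_trial = ols_slope(w_mean6, y)  # about beta * 6/7
b_transferred = b_trial * dilution_ratio(var_x, var_u, 1) / dilution_ratio(var_x, var_u, 6)
b_single_direct = ols_slope(w_single, y)  # about beta * 0.5, for comparison
```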

Caveats

All of these results can be shown mathematically, in the case of simple linear regression assuming normal distributions throughout (the framework of Frost & Thompson). However, it has been pointed out^[9] that a poorly executed correction for regression dilution may do more damage to an estimate than no correction.

Further reading

Regression dilution was first mentioned, under the name attenuation, by Spearman (1904).^[10] Those seeking a readable mathematical treatment might like to start with Frost and Thompson (2000),^[1] or see correction for attenuation.

See also

Correction for attenuation

Errors-in-variables models

References

1. Frost, C. and S. Thompson (2000). "Correcting for regression dilution bias: comparison of methods for a single predictor variable." Journal of the Royal Statistical Society Series A 163: 173–190.

2. Longford, N. T. (2001). Correspondence. Journal of the Royal Statistical Society Series A 164: 565.

3. Fuller, W. A. (1987). Measurement Error Models. New York, Wiley.

4. Hughes, M. D. (1993). "Regression dilution in the proportional hazards model." Biometrics 49: 1056–1066.

5. Rosner, B., D. Spiegelman, et al. (1992). "Correction of Logistic Regression Relative Risk Estimates and Confidence Intervals for Random Within-Person Measurement Error." American Journal of Epidemiology 136: 1400–1403.

6. Carroll, R. J., Ruppert, D., and Stefanski, L. A. (1995). Measurement Error in Nonlinear Models. New York, Wiley.

7. Riggs, D. S., J. A. Guarnieri, et al. (1978). "Fitting straight lines when both variables are subject to error." Life Sciences 22: 1305–60.

8. Stevens, R. J., Kothari, V., Adler, A. I., Stratton, I. M. and Holman, R. R. (2001). Appendix to "The UKPDS Risk Engine: a model for the risk of coronary heart disease in type 2 diabetes (UKPDS 56)." Clinical Science 101: 671–679.

9. Davey-Smith, G. and A. N. Phillips (1996). "Inflation in epidemiology: 'The proof and measurement of association between two things' revisited." BMJ 3: 1659–1661.

10. Spearman, C. (1904). "The proof and measurement of association between two things." American Journal of Psychology 15: 72–101.
