Deming regression

Figure: Deming regression. The red lines show the error in both x and y. This differs from the traditional least squares method, which measures error parallel to the y axis only. The case shown, with deviations measured perpendicularly, arises when the errors in x and y have equal variances (δ = 1).

In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model that tries to find the line of best fit for a two-dimensional dataset. It differs from simple linear regression in that it accounts for errors in observations on both the x-axis and the y-axis.

Deming regression is equivalent to the maximum likelihood estimation of an errors-in-variables model in which the errors for the two variables are assumed to be independent and normally distributed, and the ratio of their variances, denoted δ, is known.[1] In practice, this ratio might be estimated from related data sources; however, the regression procedure takes no account of possible errors in estimating this ratio.

Deming regression is only slightly more difficult to compute than simple linear regression. Many software packages used in clinical chemistry, such as Analyse-it, EP Evaluator, MedCalc and S-PLUS, offer Deming regression.

The model was originally introduced by Adcock (1878), who considered the case δ = 1, and then more generally by Kummell (1879) with arbitrary δ. However, their ideas remained largely unnoticed for more than 50 years, until they were revived by Koopmans (1937) and later popularized further by Deming (1943). The latter book became so popular in clinical chemistry and related fields that the method was even dubbed Deming regression in those fields.[2]

Specification

Assume that the available data (y_i, x_i) are mismeasured observations of the “true” values (y_i*, x_i*):

\begin{align}
  y_i &= y^*_i + \varepsilon_i, \\
  x_i &= x^*_i + \eta_i,
  \end{align}

where the errors ε and η are independent and the ratio of their variances is assumed to be known:

 \delta = \frac{\sigma_\varepsilon^2}{\sigma_\eta^2}.

In practice, the error variances of x and y are often unknown, which complicates the estimation of δ. When x and y are measured by the same method, however, the two variances are likely to be equal, so that δ = 1.

We seek the line of “best fit” y* = β_0 + β_1 x*, such that the weighted sum of squared residuals of the model is minimized:[3]

SSR = \sum_{i=1}^n\bigg(\frac{\varepsilon_i^2}{\sigma_\varepsilon^2} + \frac{\eta_i^2}{\sigma_\eta^2}\bigg) = \frac{1}{\sigma_\varepsilon^2} \sum_{i=1}^n\Big((y_i-\beta_0-\beta_1x^*_i)^2 + \delta(x_i-x^*_i)^2\Big) \ \to\ \min_{\beta_0,\beta_1,x_1^*,\ldots,x_n^*} SSR
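
As a worked step (a sketch that follows from the objective above, not a derivation quoted from the cited sources), the latent values x_i* can be eliminated first. Writing r_i = y_i − β_0 − β_1 x_i for the ordinary vertical residual (a shorthand introduced here for illustration), setting the derivative of the i-th summand with respect to x_i* to zero gives

\begin{align}
  x_i^* &= x_i + \frac{\beta_1}{\beta_1^2+\delta}\, r_i, \\
  (y_i-\beta_0-\beta_1 x_i^*)^2 + \delta(x_i-x_i^*)^2 &= \frac{\delta\, r_i^2}{\beta_1^2+\delta}, \\
  SSR &= \frac{\delta}{\sigma_\varepsilon^2} \sum_{i=1}^n \frac{(y_i-\beta_0-\beta_1 x_i)^2}{\beta_1^2+\delta},
  \end{align}

so the problem reduces to a minimization over β_0 and β_1 only. For δ = 1 each summand is the squared perpendicular distance from the point (x_i, y_i) to the line.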

Solution

The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from i = 1 to n):

\begin{align}
  & \overline{x} = \frac{1}{n}\sum x_i, \quad \overline{y} = \frac{1}{n}\sum y_i, \\
  & s_{xx} = \tfrac{1}{n-1}\sum (x_i-\overline{x})^2, \\
  & s_{xy} = \tfrac{1}{n-1}\sum (x_i-\overline{x})(y_i-\overline{y}), \\
  & s_{yy} = \tfrac{1}{n-1}\sum (y_i-\overline{y})^2.
  \end{align}

Finally, the least-squares estimates of the model's parameters are[4]

\begin{align}
  & \hat\beta_1 = \frac{s_{yy}-\delta s_{xx} + \sqrt{(s_{yy}-\delta s_{xx})^2 + 4\delta s_{xy}^2}}{2s_{xy}}, \\
  & \hat\beta_0 = \overline{y} - \hat\beta_1\overline{x}, \\
  & \hat{x}_i^* = x_i + \frac{\hat\beta_1}{\hat\beta_1^2+\delta}(y_i-\hat\beta_0-\hat\beta_1x_i).
  \end{align}
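
The closed-form estimates above translate almost line by line into code. The following is a minimal sketch in plain Python, not a reference implementation: the function name `deming_fit`, its argument names, and the choice to raise an error when s_xy is zero are assumptions made here for illustration.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression via the closed-form moment formulas.

    `delta` is the assumed-known ratio var(eps) / var(eta), i.e. the
    variance of the y-errors divided by the variance of the x-errors.
    Returns (beta0, beta1, x_star), where x_star holds the estimated
    "true" x values.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")

    # Sample means.
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Second-degree sample moments with the 1/(n-1) normalisation.
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

    # Illustrative choice: the slope formula divides by s_xy.
    if s_xy == 0:
        raise ValueError("slope formula is undefined when s_xy is zero")

    d = s_yy - delta * s_xx
    beta1 = (d + math.sqrt(d * d + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
    beta0 = y_bar - beta1 * x_bar

    # Estimated true x values: each observed x_i shifted along the fit.
    k = beta1 / (beta1 ** 2 + delta)
    x_star = [xi + k * (yi - beta0 - beta1 * xi) for xi, yi in zip(x, y)]
    return beta0, beta1, x_star
```

For points lying exactly on y = 1 + 2x, for example `deming_fit([0, 1, 2, 3], [1, 3, 5, 7])`, the function recovers an intercept of 1 and a slope of 2 up to floating-point rounding, and the estimated x values coincide with the observed ones. With `delta=1.0`, swapping the roles of x and y yields the reciprocal slope, which ordinary least squares does not guarantee.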

References

  1. ^ Linnet (1993)
  2. ^ Cornbleet, Gochman (1979)
  3. ^ Fuller (1987), ch. 1.3.3
  4. ^ Glaister (2001)
  • Adcock, R. J. (1878). "A problem in least squares". The Analyst (Annals of Mathematics) 5 (2): 53–54. doi:10.2307/2635758. JSTOR 2635758. 
  • Cornbleet, P.J.; Gochman, N. (1979). "Incorrect Least–Squares Regression Coefficients". Clin. Chem. 25 (3): 432–438. PMID 262186. 
  • Deming, W. E. (1943). Statistical adjustment of data. Wiley, NY (Dover Publications edition, 1985). ISBN 0-486-64685-8. 
  • Fuller, Wayne A. (1987). Measurement error models. John Wiley & Sons, Inc. ISBN 0-471-86187-1. 
  • Glaister, P. (March 2001). "Least squares revisited". The Mathematical Gazette 85: 104–107.
  • Koopmans, T. C. (1937). Linear regression analysis of economic time series. De Erven F. Bohn, Haarlem, Netherlands. 
  • Kummell, C. H. (1879). "Reduction of observation equations which contain more than one observed quantity". The Analyst (Annals of Mathematics) 6 (4): 97–105. doi:10.2307/2635646. JSTOR 2635646. 
  • Linnet, K. (1993). "Evaluation of regression procedures for method comparison studies". Clinical Chemistry 39 (3): 424–432. PMID 8448852. http://www.clinchem.org/cgi/reprint/39/3/424. 

Wikimedia Foundation. 2010.
