Grubbs' test for outliers

Many statistical techniques are sensitive to the presence of outliers. For example, simple calculations of the mean and standard deviation may be distorted by a single grossly inaccurate data point.

Checking for outliers should be a routine part of any data analysis. Potential outliers should be examined to see whether they are erroneous. If a data point is in error, it should be corrected if possible and deleted if correction is not possible. If there is no reason to believe that the outlying point is in error, it should not be deleted without careful consideration. However, the use of more robust techniques may be warranted. Robust techniques will often downweight the effect of outlying points without deleting them.

Definition

Grubbs' test (also known as the maximum normed residual test) is used to detect outliers in a univariate data set. It is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying Grubbs' test.

Grubbs' test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or less since it frequently tags most of the points as outliers.
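The iteration described above can be sketched roughly as follows. This is only an illustration, not a reference implementation: the function name `iterative_grubbs` and the `critical_value` parameter are hypothetical, and the caller is assumed to supply a function that returns the critical value of G for a given sample size. The loop uses the two-sided statistic from the "Test statistic" section and stops once six or fewer points remain, following the caution in the text.

```python
import statistics


def iterative_grubbs(values, critical_value):
    """Remove one suspected outlier per pass until none is detected.

    `critical_value` is a caller-supplied function (an assumption of this
    sketch) mapping the current sample size to the critical value of G.
    """
    data = list(values)
    removed = []
    # The text advises against the test for sample sizes of six or less.
    while len(data) > 6:
        mean = statistics.mean(data)
        s = statistics.stdev(data)  # sample standard deviation
        if s == 0:
            break  # all remaining values are identical
        # The point farthest from the mean is the only candidate per pass.
        suspect = max(data, key=lambda y: abs(y - mean))
        g = abs(suspect - mean) / s
        if g <= critical_value(len(data)):
            break  # no outlier detected, stop iterating
        data.remove(suspect)
        removed.append(suspect)
    return data, removed


# Placeholder threshold for illustration only; a real analysis would
# compute the critical value from the t-distribution for each sample size.
sample = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 50.0]
kept, removed = iterative_grubbs(sample, lambda n: 2.0)
```

Because each pass changes the sample size, the critical value has to be recomputed every time, which is why it is passed in as a function rather than as a single number.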

Grubbs' test is defined for the hypothesis:

:H0: There are no outliers in the data set
:Ha: There is at least one outlier in the data set

Test statistic

The Grubbs' test statistic is defined as

:$G = \frac{\max_{i=1,\ldots,N} \left\vert Y_i - \bar{Y} \right\vert}{s}$

with $\bar{Y}$ and $s$ denoting the sample mean and standard deviation, respectively. The Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
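As a minimal sketch of the two-sided statistic, the following uses only the Python standard library. The function name `grubbs_statistic` is an illustrative choice, and the sample standard deviation is assumed to use the usual N − 1 denominator, as `statistics.stdev` does.

```python
import statistics


def grubbs_statistic(values):
    """Two-sided Grubbs statistic: max |Y_i - mean| / s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (N - 1)
    return max(abs(y - mean) for y in values) / s


g = grubbs_statistic([1, 2, 3, 4, 10])
```

For the sample `[1, 2, 3, 4, 10]` the mean is 4, the sample standard deviation is the square root of 12.5, and the largest absolute deviation is 6, so G is about 1.70.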

This is the two-sided version of the test. The Grubbs test can also be defined as one of the following one-sided tests:

test whether the minimum value is an outlier

:$G = \frac{\bar{Y} - Y_{\min}}{s}$

with $Y_{\min}$ denoting the minimum value.

test whether the maximum value is an outlier

:$G = \frac{Y_{\max} - \bar{Y}}{s}$

with $Y_{\max}$ denoting the maximum value.
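The two one-sided statistics can be sketched in the same way. The function names below are hypothetical, and the sample standard deviation is again assumed to use the N − 1 denominator.

```python
import statistics


def grubbs_min_statistic(values):
    """One-sided statistic for the minimum: (mean - Y_min) / s."""
    return (statistics.mean(values) - min(values)) / statistics.stdev(values)


def grubbs_max_statistic(values):
    """One-sided statistic for the maximum: (Y_max - mean) / s."""
    return (max(values) - statistics.mean(values)) / statistics.stdev(values)


data = [1, 2, 3, 4, 10]
g_min = grubbs_min_statistic(data)
g_max = grubbs_max_statistic(data)
```

The larger of the two one-sided values equals the two-sided statistic, since the largest absolute deviation from the mean always occurs at either the minimum or the maximum.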

Critical region

For the two-sided test, the hypothesis of no outliers is rejected at significance level α if

:$G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),N-2}^2}{N - 2 + t_{\alpha/(2N),N-2}^2}}$

with $t_{\alpha/(2N),N-2}$ denoting the upper critical value of the t-distribution with $N - 2$ degrees of freedom and a significance level of $\alpha/(2N)$. For the one-sided tests, replace $\alpha/(2N)$ with $\alpha/N$.
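The threshold on the right-hand side can be evaluated as in the sketch below. To keep the example free of third-party dependencies, the t critical value is passed in as an argument. The function name `grubbs_critical_value` and the parameter `t_crit` are illustrative assumptions, and the value of `t_crit` would normally come from a t-table or a statistics library.

```python
import math


def grubbs_critical_value(n, t_crit):
    """Critical value of G for sample size n.

    `t_crit` is the upper critical value of the t-distribution with
    n - 2 degrees of freedom, at significance level alpha / (2 * n) for
    the two-sided test or alpha / n for the one-sided tests. With SciPy
    available, it could be obtained (as an assumption of this sketch) via
    scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2).
    """
    t2 = t_crit ** 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t2 / (n - 2 + t2))


# Arbitrary illustrative input, not a tabulated critical value.
example = grubbs_critical_value(4, math.sqrt(2))
```

The square-root factor is always below 1, so the threshold can never exceed (N − 1)/√N, which is also the largest value the statistic G can take for a sample of size N.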

Related techniques

Several graphical techniques can, and should, be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. A normal probability plot or lag plot may also be useful.

See also

* Chauvenet's criterion
* Peirce's criterion
* Q test

External links

* [http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm Grubbs' Test for Outliers]
* [http://www.graphpad.com/quickcalcs/Grubbs1.cfm Grubbs' Test online calculator]



Wikimedia Foundation. 2010.
