- Grubbs' test for outliers
Many statistical techniques are sensitive to the presence of outliers. For example, simple calculations of the mean and standard deviation may be distorted by a single grossly inaccurate data point. Checking for outliers should be a routine part of any data analysis. Potential outliers should be examined to see whether they are possibly erroneous. If a data point is in error, it should be corrected if possible and deleted if not. If there is no reason to believe that the outlying point is in error, it should not be deleted without careful consideration. However, the use of more robust techniques may be warranted: robust techniques often downweight the effect of outlying points without deleting them.
Definition
Grubbs' test (also known as the maximum normed residual test) is used to detect outliers in a univariate data set. It is based on the assumption of normality; that is, one should first verify that the data can reasonably be approximated by a normal distribution before applying Grubbs' test.
Grubbs' test detects one outlier at a time. This outlier is expunged from the data set, and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer, since it frequently tags most of the points as outliers.
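The detect, remove, repeat procedure above can be sketched in Python as follows. This is a minimal, non-authoritative sketch using only the standard library: it computes the two-sided statistic from the Test statistic section and the critical value from the Critical region section, and it obtains the Student's t quantile by bisection on a closed-form series for the t-distribution function that holds for integer degrees of freedom. The function names, the default α = 0.05, and the `min_size=7` stopping rule (an assumption drawn from the advice against using the test on six or fewer points) are illustrative only, and the loop does not correct α for the repeated testing.

```python
import math
import statistics


def t_sf(t: float, df: int) -> float:
    """Upper-tail probability P(T > t) for Student's t with integer df >= 1.

    Uses the finite cosine series for the t-distribution function, which is
    exact for integer degrees of freedom (odd and even df handled separately).
    """
    theta = math.atan(t / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # Odd df: cos(theta) + (2/3)cos^3(theta) + ... up to cos^(df-2)(theta).
        total = 0.0
        if df > 1:
            term = total = c
            for k in range(1, (df - 1) // 2):
                term *= c * c * (2 * k) / (2 * k + 1)
                total += term
        a = 2.0 / math.pi * (theta + s * total)
    else:
        # Even df: 1 + (1/2)cos^2(theta) + ... up to cos^(df-2)(theta).
        term = total = 1.0
        for k in range(1, df // 2):
            term *= c * c * (2 * k - 1) / (2 * k)
            total += term
        a = s * total
    # a = P(-t < T < t) (signed for negative t), so the upper tail is (1 - a) / 2.
    return 0.5 * (1.0 - a)


def t_upper_quantile(p: float, df: int) -> float:
    """Return t such that P(T > t) = p, for 0 < p < 0.5, by bisection."""
    lo, hi = 0.0, 1.0
    while t_sf(hi, df) > p:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def grubbs_critical(n: int, alpha: float = 0.05, two_sided: bool = True) -> float:
    """Critical value of G for sample size n (requires n >= 3)."""
    p = alpha / (2 * n) if two_sided else alpha / n
    t = t_upper_quantile(p, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_statistic(data):
    """Two-sided G and the index of the point farthest from the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (N - 1 divisor)
    idx = max(range(len(data)), key=lambda i: abs(data[i] - mean))
    if s == 0:
        return 0.0, idx  # all values identical: nothing can be an outlier
    return abs(data[idx] - mean) / s, idx


def grubbs_iterative(data, alpha=0.05, min_size=7):
    """Remove one outlier per pass until none is detected.

    min_size is a hypothetical stopping rule: stop once fewer than seven
    points remain. alpha is not adjusted for the repeated tests.
    """
    remaining = list(data)
    outliers = []
    while len(remaining) >= min_size:
        g, idx = grubbs_statistic(remaining)
        if g <= grubbs_critical(len(remaining), alpha):
            break
        outliers.append(remaining.pop(idx))
    return outliers, remaining


if __name__ == "__main__":
    sample = [2.1, 2.3, 2.2, 2.4, 2.2, 2.3, 2.1, 2.2, 2.3, 9.8]
    found, kept = grubbs_iterative(sample)
    print(found)  # [9.8]
```

In this example the first pass flags 9.8 (G of about 2.84 against a critical value of about 2.29 for N = 10); the second pass, on the nine remaining points, detects nothing further and stops. Where SciPy is available, `scipy.stats.t.ppf` would normally replace the hand-written quantile.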
Grubbs' test is defined for the hypotheses:
H0: There are no outliers in the data set.
Ha: There is at least one outlier in the data set.
Test statistic
For a sample $Y_1, \ldots, Y_N$, the Grubbs test statistic is defined as
$$G = \frac{\max_{i=1,\ldots,N} \left| Y_i - \bar{Y} \right|}{s}$$
with $\bar{Y}$ and $s$ denoting the sample mean and standard deviation, respectively. The Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
This is the two-sided version of the test. The Grubbs test can also be defined as one of the following one-sided tests:
* To test whether the minimum value is an outlier:
$$G = \frac{\bar{Y} - Y_{\min}}{s}$$
with $Y_{\min}$ denoting the minimum value.
* To test whether the maximum value is an outlier:
$$G = \frac{Y_{\max} - \bar{Y}}{s}$$
with $Y_{\max}$ denoting the maximum value.
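As a small illustration of these definitions, the sketch below computes the two-sided statistic and both one-sided statistics with Python's standard `statistics` module. It assumes that $s$ is the usual sample standard deviation (divisor $N - 1$, as returned by `statistics.stdev`); the function names and the sample values are illustrative only.

```python
import statistics


def grubbs_two_sided(data):
    """Two-sided statistic: max |Y_i - mean| / s."""
    mean = statistics.mean(data)
    return max(abs(y - mean) for y in data) / statistics.stdev(data)


def grubbs_min(data):
    """One-sided statistic for the minimum: (mean - Y_min) / s."""
    return (statistics.mean(data) - min(data)) / statistics.stdev(data)


def grubbs_max(data):
    """One-sided statistic for the maximum: (Y_max - mean) / s."""
    return (max(data) - statistics.mean(data)) / statistics.stdev(data)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, s = sqrt(32/7)
print(round(grubbs_two_sided(sample), 3))  # 1.871
print(round(grubbs_min(sample), 3))        # 1.403
print(round(grubbs_max(sample), 3))        # 1.871
```

The two-sided statistic always equals the larger of the two one-sided statistics, because the point farthest from the mean is either the minimum or the maximum.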
Critical region
For the two-sided test, the hypothesis of no outliers is rejected at significance level $\alpha$ if
$$G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t^2_{\alpha/(2N),\,N-2}}{N-2+t^2_{\alpha/(2N),\,N-2}}}$$
with $t_{\alpha/(2N),\,N-2}$ denoting the upper critical value of the t-distribution with $N - 2$ degrees of freedom and a significance level of $\alpha/(2N)$. For the one-sided tests, replace $\alpha/(2N)$ with $\alpha/N$.
Related techniques
Several graphical techniques can, and should, be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. A normal probability plot or lag plot may also be useful.
See also
* Chauvenet's criterion
* Peirce's criterion
* Q test
External links
* [http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm Grubbs' Test for Outliers]
* [http://www.graphpad.com/quickcalcs/Grubbs1.cfm Grubbs' Test online calculator]
Wikimedia Foundation. 2010.