Winsorising
Winsorising or Winsorization is the transformation of statistics by limiting extreme values in the statistical data. It is named for the engineer-turned-biostatistician Charles P. Winsor (1895–1951).
The distribution of many statistics can be heavily influenced by outliers. A typical strategy is to set all outliers to a specified percentile of the data; for example, a 90% Winsorisation would see all data below the 5th percentile set to the 5th percentile, and data above the 95th percentile set to the 95th percentile. Winsorised estimators are usually more robust to outliers than their unwinsorised counterparts.

Distinction from trimming
Note that Winsorizing is not equivalent to simply excluding data, which is a simpler procedure, called trimming.
In a trimmed estimator, the extreme values are "discarded"; in a Winsorized estimator, the extreme values are instead "replaced" by certain percentiles (the trimmed minimum and maximum).
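The contrast between discarding and replacing can be illustrated with a minimal Python sketch. It is an illustration only, not a reference implementation. The function names `winsorise`, `trim` and `mean`, the parameter `k` (the number of observations treated as extreme in each tail) and the sample data are all assumptions introduced here. The sketch works directly with order statistics rather than interpolated percentiles, so with 20 observations and `k = 1` it corresponds to a 90% Winsorisation and a 5% trimmed mean.

```python
def winsorise(data, k):
    """Replace the k smallest and k largest values with the nearest
    retained value (the trimmed minimum and maximum)."""
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(data)")
    xs = sorted(data)
    low, high = xs[k], xs[n - k - 1]
    # Clip every observation into [low, high]; the sample size is unchanged.
    return [min(max(x, low), high) for x in data]


def trim(data, k):
    """Discard the k smallest and k largest values."""
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(data)")
    xs = sorted(data)
    # Slice off both tails; the sample shrinks by 2*k observations.
    return xs[k:n - k]


def mean(values):
    return sum(values) / len(values)


# Hypothetical sample: 1..18 plus two large values, one of them extreme.
data = list(range(1, 19)) + [40, 1000]

raw_mean = mean(data)                      # 60.55
winsorised_mean = mean(winsorise(data, 1)) # 12.6
trimmed_mean = mean(trim(data, 1))         # about 11.67
```

In this sample the Winsorised data still contain 20 values, with 1 replaced by 2 and 1000 replaced by 40, whereas the trimmed data contain only 18 values. The two means therefore differ, and both are far less affected by the extreme value than the raw mean.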
For example, a Winsorized mean is not the same as a truncated mean: the 5% trimmed mean is the average of the data between the 5th and 95th percentiles, while the 90% Winsorised mean sets the bottom 5% to the 5th percentile, the top 5% to the 95th percentile, and then averages the data. More formally, they are distinct because the order statistics are not independent.

References
* "Simplified Estimation from Censored Normal Samples", W. J. Dixon, The Annals of Mathematical Statistics, 31, pp. 385-391, 1960
* "The Future of Data Analysis", J. W. Tukey, The Annals of Mathematical Statistics, 33, p. 18, 1962
Wikimedia Foundation. 2010.