Winsorising

Winsorising or Winsorization is the transformation of statistical data by limiting extreme values, in order to reduce the effect of possibly spurious outliers. It is named after the engineer-turned-biostatistician Charles P. Winsor (1895–1951).

The distribution of many statistics can be heavily influenced by outliers. A typical strategy is to set all outliers to a specified percentile of the data; for example, a 90% Winsorisation sets all data below the 5th percentile to the 5th percentile, and all data above the 95th percentile to the 95th percentile. Winsorised estimators are usually more robust to outliers than their unwinsorised counterparts.
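The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `winsorise` and its parameters are hypothetical, and the cut-offs are taken directly from the order statistics (the floor of n times the fraction gives the number of values clamped at each end), which is one of several percentile conventions in use.

```python
import math


def winsorise(data, lower=0.05, upper=0.05):
    """Return a copy of data with the extreme values clamped.

    Assumption: the lowest floor(n * lower) values are raised to the
    smallest retained value, and the highest floor(n * upper) values
    are lowered to the largest retained value.
    """
    n = len(data)
    if n == 0:
        return []
    ordered = sorted(data)
    k_low = math.floor(n * lower)    # number of values clamped at the bottom
    k_high = math.floor(n * upper)   # number of values clamped at the top
    if k_low + k_high >= n:
        raise ValueError("fractions leave no data to retain")
    low_cut = ordered[k_low]             # smallest retained value
    high_cut = ordered[n - 1 - k_high]   # largest retained value
    # Clamp every observation into [low_cut, high_cut], keeping input order.
    return [min(max(x, low_cut), high_cut) for x in data]


# 20 observations with one gross outlier; a 90% Winsorisation clamps
# one value at each end (5% of 20).
sample = list(range(1, 20)) + [1000]
clamped = winsorise(sample)
```

In this sketch the sample size stays at 20: the value 1 is raised to 2 and the value 1000 is lowered to 19, while the other observations are unchanged. For small samples the floor may be zero, in which case nothing is clamped.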

Distinction from trimming

Note that Winsorising is not equivalent to simply excluding data, which is a simpler procedure called trimming.

In a trimmed estimator, the extreme values are discarded; in a Winsorised estimator, the extreme values are instead replaced by certain percentiles (the trimmed minimum and maximum).

For example, a Winsorised mean is not the same as a truncated mean: the 5% trimmed mean is the average of the data between the 5th and 95th percentiles, while the 90% Winsorised mean sets the bottom 5% to the 5th percentile and the top 5% to the 95th percentile, and then averages all of the data.

More formally, they are distinct because the order statistics are not independent.
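A small worked comparison makes the difference between the trimmed and the Winsorised mean concrete. The helper names `trimmed_mean` and `winsorised_mean` are hypothetical, and for simplicity the sketch takes the number of values `k` affected at each end rather than a percentage.

```python
def trimmed_mean(data, k):
    """Average after discarding the k smallest and k largest values."""
    ordered = sorted(data)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorised_mean(data, k):
    """Average after replacing the k smallest and k largest values
    with the nearest retained value (the sample size is unchanged)."""
    ordered = sorted(data)
    n = len(ordered)
    if 2 * k >= n:
        raise ValueError("k is too large for this sample")
    low, high = ordered[k], ordered[n - 1 - k]
    clamped = [min(max(x, low), high) for x in ordered]
    return sum(clamped) / n


sample = [1, 2, 3, 4, 5, 6, 7, 8, 20, 100]
plain = sum(sample) / len(sample)       # 156 / 10
trimmed = trimmed_mean(sample, 1)       # (2 + ... + 8 + 20) / 8
winsorised = winsorised_mean(sample, 1) # (2 + 2 + ... + 8 + 20 + 20) / 10
```

With this sample the plain mean is 15.6, the trimmed mean is 6.875 (eight values averaged), and the Winsorised mean is 7.7 (all ten values averaged after clamping). The two robust estimates differ because the replaced values still contribute to the Winsorised average. On a sample that is symmetric apart from its extremes, the two can coincide.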

References

* "Simplified Estimation from Censored Normal Samples", W. J. Dixon, The Annals of Mathematical Statistics, 31, pp. 385-391, 1960
* "The Future of Data Analysis", J. W. Tukey, The Annals of Mathematical Statistics, 33, p. 18, 1962


Wikimedia Foundation. 2010.
