Robust confidence intervals

Robust confidence intervals

In statistics a robust confidence interval is the outcome of a specialized set of calculations constructed in such a way as to produce confidence intervals which are not badly affected by outlying or aberrant observations in a data-set.

In the process of weighing 1000 objects, under practical conditions, it is easy to believe that the operator might make a mistake in procedure and so report an incorrect mass (thereby making one type of systematic error). Suppose he has 100 objects and he weighed them all, one at a time, and repeated the whole process ten times. Then he can calculate a sample standard deviation for each object, and look for outliers. Any object with an unusually large standard deviation probably has an outlier in its data. These can be removed by various non-parametric techniques. If he repeated the process only three times, he would simply take the median of the three measurements and use σ to give a confidence interval. The 200 extra weighings served only to detect and correct for operator error and did nothing to improve the confidence interval. With more repetitions, he could use a truncated mean, discarding say the largest and smallest values and averaging the rest. He could then use a bootstrap calculation to determine a confidence interval narrower than that calculated from σ, and so obtain some benefit from a large amount of extra work.

These procedures are robust against procedural errors which are not modeled by the assumption that the balance has a fixed known standard deviation σ. In practical applications where the occasional operator error can occur, or the balance can malfunction, the assumptions behind simple statistical calculations cannot be taken for granted. Before trusting the results of 100 objects weighed just three times each to have confidence intervals calculated from σ, it is necessary to test for and remove a reasonable number of outliers (testing the assumption that the operator is careful and correcting for the fact that he is not perfect), and to test the assumption that the data really have a normal distribution with standard deviation σ.

The theoretical analysis of such an experiment is complicated, but it is easy to set up a spreadsheet which draws random numbers from a normal distribution with standard deviation σ to simulate the situation (use =norminv(rand(),0,σ)); see for example [Wittwer, J.W., [http://vertex42.com/ExcelArticles/mc/ "Monte Carlo Simulation in Excel: A Practical Guide"] , June 1, 2004] . These techniques also work in Open Office and gnumeric.

After removing obvious outliers, one could subtract the median from the other two values for each object, and examine the distribution of the 200 resulting numbers. It should be normal with mean near zero and standard deviation a little larger than σ. A simple Monte Carlo spreadsheet calculation would reveal typical values for the standard deviation (around 105 to 115% of σ). Or, one could subtract the mean of each triplet from the values, and examine the distribution of 300 values. The mean is identically zero, but the standard deviation should be somewhat smaller (around 75 to 85% of σ).

ee also

Robust statistics

References


Wikimedia Foundation. 2010.

Игры ⚽ Нужна курсовая?

Look at other dictionaries:

  • Robust statistics — provides an alternative approach to classical statistical methods. The motivation is to produce estimators that are not unduly affected by small departures from model assumptions. Contents 1 Introduction 2 Examples of robust and non robust… …   Wikipedia

  • Confidence interval — This article is about the confidence interval. For Confidence distribution, see Confidence Distribution. In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter and is used to indicate the… …   Wikipedia

  • Confidence band — A confidence band is used in statistical analysis to represent the uncertainty in an estimate of a curve or function based on limited or noisy data. Confidence bands are often used as part of the graphical presentation of results in a statistical …   Wikipedia

  • Robust regression — In robust statistics, robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non parametric methods. Regression analysis seeks to find the effect of one or more independent… …   Wikipedia

  • List of statistics topics — Please add any Wikipedia articles related to statistics that are not already on this list.The Related changes link in the margin of this page (below search) leads to a list of the most recent changes to the articles listed below. To see the most… …   Wikipedia

  • Credible interval — Bayesian statistics Theory Bayesian probability Probability interpretations Bayes theorem Bayes rule · Bayes factor Bayesian inference Bayesian network Prior · Posterior · Likelihood …   Wikipedia

  • Multiple comparisons — In statistics, the multiple comparisons or multiple testing problem occurs when one considers a set of statistical inferences simultaneously.[1] Errors in inference, including confidence intervals that fail to include their corresponding… …   Wikipedia

  • Effect size — In statistics, an effect size is a measure of the strength of the relationship between two variables in a statistical population, or a sample based estimate of that quantity. An effect size calculated from data is a descriptive statistic that… …   Wikipedia

  • Resampling (statistics) — In statistics, resampling is any of a variety of methods for doing one of the following: # Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknife) or drawing randomly with… …   Wikipedia

  • Bootstrapping (statistics) — In statistics, bootstrapping is a modern, computer intensive, general purpose approach to statistical inference, falling within a broader class of resampling methods.Bootstrapping is the practice of estimating properties of an estimator (such as… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”