Prediction interval

In statistics, a prediction interval bears the same relationship to a future observation that a confidence interval bears to an unobservable population parameter. Prediction intervals predict the distribution of individual points, whereas confidence intervals estimate the true population mean or other quantity of interest that cannot be observed.

In other words, an interval estimate of a parameter, such as a population mean, is usually called a confidence interval. An interval estimate of a variable, such as a future observation, is sometimes called a prediction interval.

A common example given in statistics classes is the prediction interval for a response variable in least squares regression. If the data comprise the entire population, the regression line is known exactly and need not be estimated. If the data are only a sample, the true regression line is unknown, so the predicted value of the response variable "y", computed from the regression line fitted to the sample, carries a margin of error. The predicted "y" value is a statistic, not a parameter, and serves as a point estimate around which a prediction interval can be built. The width of that interval is based on the standard error of the prediction, which combines the residual standard deviation with the uncertainty in the estimated intercept and slope.
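The regression case can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes simple linear regression with one predictor and independent normal errors of constant variance, and it uses the standard textbook formula in which the predicted value at a new point x0 is widened by s·sqrt(1 + 1/n + (x0 − x̄)²/Sxx). The function name and its parameters are hypothetical, and the Student's t quantile (with n − 2 degrees of freedom) is passed in by the caller, for example from a printed table.

```python
import math


def regression_prediction_interval(xs, ys, x0, t_quantile):
    """Return (y_hat, lower, upper) for a new response at x0.

    Assumes simple linear regression with normal, constant-variance errors.
    t_quantile is the caller-supplied Student's t quantile with n - 2
    degrees of freedom for the desired coverage.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Least squares estimates of the slope and intercept.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation, using n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # Standard error of prediction: the leading 1 accounts for the noise in
    # the new observation; the other terms account for the estimated line.
    y_hat = intercept + slope * x0
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

    return y_hat, y_hat - t_quantile * se_pred, y_hat + t_quantile * se_pred
```

With five data points, a 95% interval would use the 97.5th percentile of the t-distribution with 3 degrees of freedom (about 3.182 in standard tables). The interval is narrowest at the mean of the observed x values and widens as x0 moves away from it.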

Example

Suppose one has drawn a sample from a normally distributed population. The mean and standard deviation of the population are unknown except insofar as they can be estimated based on the sample. It is desired to predict the next observation. Let "n" be the sample size; let μ and σ be respectively the true (unobservable) mean and standard deviation of the population. Let X_1, ..., X_n be the sample; let X_{n+1} be the future observation to be predicted. Let the sample mean be

:\overline{X}_n=(X_1+\cdots+X_n)/n

and the sample variance be

:S_n^2={1 \over n-1}\sum_{i=1}^n (X_i-\overline{X}_n)^2.

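The difference between the future observation and the sample mean is normally distributed with mean 0 and variance σ²(1 + 1/"n"), and it is independent of the sample variance. From this it is fairly routine to show that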

:{X_{n+1}-\overline{X}_n \over \sqrt{S_n^2+S_n^2/n}}={X_{n+1}-\overline{X}_n \over S_n\sqrt{1+1/n}}

has a Student's t-distribution with "n" − 1 degrees of freedom. Consequently we have

:\Pr\left(\overline{X}_n-T_a S_n\sqrt{1+(1/n)}\leq X_{n+1} \leq\overline{X}_n+T_a S_n\sqrt{1+(1/n)}\,\right)=p

where T_a is the 100((1 + "p")/2)th percentile of Student's t-distribution with "n" − 1 degrees of freedom. Therefore the numbers

:\overline{X}_n\pm T_a S_n\sqrt{1+(1/n)}

are the endpoints of a 100"p"% prediction interval for X_{n+1}.
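The interval above can be computed and checked numerically. The following Python sketch is illustrative only: the function names, the simulation settings (sample size 10, mean 5, standard deviation 2, fixed seed) and the hard-coded constant are assumptions introduced here, not part of the example. The constant 2.262 is the tabulated 97.5th percentile of Student's t-distribution with 9 degrees of freedom, which is T_a for "p" = 0.95 and "n" = 10.

```python
import math
import random

# Tabulated 97.5th percentile of Student's t with 9 degrees of freedom,
# i.e. T_a for p = 0.95 and n = 10 (looked up, not computed here).
T_975_DF9 = 2.262


def prediction_interval(sample, t_quantile):
    """Return (lower, upper): sample mean +/- T_a * S_n * sqrt(1 + 1/n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    # Sample variance with the n - 1 divisor, matching S_n^2 in the text.
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half_width = t_quantile * math.sqrt(variance) * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


def estimate_coverage(trials=20000, n=10, mu=5.0, sigma=2.0, seed=0):
    """Estimate by simulation how often the interval contains X_{n+1}.

    The default n must match the degrees of freedom behind T_975_DF9.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = prediction_interval(sample, T_975_DF9)
        next_observation = rng.gauss(mu, sigma)
        if lower <= next_observation <= upper:
            hits += 1
    return hits / trials
```

Calling `estimate_coverage()` should return a proportion close to 0.95, whatever values of μ and σ are chosen, because the interval is built without knowing either of them. For other sample sizes or coverage levels, the t quantile has to be replaced by the matching tabulated value.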

See also

*Confidence interval
*Extrapolation
*Prediction
*Regression analysis
*Seymour Geisser
*Trend estimation


Wikimedia Foundation. 2010.
