Normal probability plot

The normal probability plot is a graphical technique for normality testing: assessing whether or not a data set is approximately normally distributed.

Example of a normal probability plot.

The data are plotted against a theoretical normal distribution in such a way that the points should form an approximate straight line. Departures from this straight line indicate departures from normality.

The normal probability plot is a special case of the probability plot, for the case of a normal distribution.

Definition

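The normal probability plot is formed by plotting the ordered data values on the vertical axis against the normal order statistic medians on the horizontal axis.

The medians are calculated according to the following formula. For each index i = 1, \ldots, n of the ordered data, find z_i such that: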


P(Z<z_i)=\begin{cases}
1-0.5^{1/n} &\text{for } i=1\\[8pt]
0.5^{1/n} &\text{for } i=n\\[8pt]
\frac{i-0.3175}{n+0.365} &\text{otherwise}
\end{cases}

That is, the ordered observations are plotted as a function of the corresponding normal order statistic medians. Another way to think about this is that the sample values are plotted against the values we would expect to see if the sample were strictly consistent with the normal distribution.
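The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `order_statistic_medians` and `normal_plot_points` are hypothetical, and the standard library's `statistics.NormalDist.inv_cdf` is assumed as the way to solve P(Z < z_i) = p for z_i.

```python
from statistics import NormalDist


def order_statistic_medians(n):
    """Return the probabilities P(Z < z_i) for i = 1..n from the piecewise formula."""
    if n < 1:
        raise ValueError("need at least one observation")
    last = 0.5 ** (1.0 / n)
    # Start with the "otherwise" branch for every i, then overwrite the two ends.
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - last   # case i = 1
    medians[-1] = last        # case i = n
    return medians


def normal_plot_points(data):
    """Pair each normal order statistic median z_i with the i-th smallest value."""
    ordered = sorted(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf(p) for p in order_statistic_medians(len(ordered))]
    return list(zip(z, ordered))  # (horizontal, vertical) coordinates
```

Calling `normal_plot_points(sample)` returns the coordinates to draw; any plotting tool can then render them as a scatter plot.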

If the data are consistent with a sample from a normal distribution, the points should lie close to a straight line. As a reference, a straight line can be fitted to the points. The further the points deviate from this line, the greater the indication of departure from normality. If the sample has mean 0 and standard deviation 1, a line through the origin with slope 1 can be used. How close to the line the points lie depends on the sample size. For a large sample (more than 100 observations), the points are expected to lie very close to the reference line. Smaller samples show much larger variation but may still be consistent with a normal sample.
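One way to obtain such a reference line is an ordinary least-squares fit of the ordered data on the normal order statistic medians. The sketch below assumes that choice; the text does not prescribe a fitting method, and the names `normal_medians` and `reference_line` are hypothetical.

```python
from statistics import NormalDist, fmean


def normal_medians(n):
    """Normal order statistic medians z_1..z_n (requires n >= 2)."""
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    last = 0.5 ** (1.0 / n)
    probs = [1.0 - last] + [(i - 0.3175) / (n + 0.365) for i in range(2, n)] + [last]
    return [NormalDist().inv_cdf(p) for p in probs]


def reference_line(data):
    """Least-squares (intercept, slope) of the ordered data on the normal medians."""
    y = sorted(data)
    x = normal_medians(len(y))
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

For a sample with mean 0 and standard deviation 1, the fitted intercept and slope should come out close to 0 and 1, matching the line through the origin with slope 1 described above.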

Other distributions

Probability plots for distributions other than the normal are computed in exactly the same way. The normal quantile function (the inverse of the normal cumulative distribution function, denoted G here) is simply replaced by the quantile function of the desired distribution. That is, a probability plot can easily be generated for any distribution for which the quantile function is available.

One advantage of this method of computing probability plots is that the intercept and slope of the fitted line are estimates of the location and scale parameters of the distribution. This matters little for the normal distribution, whose location and scale are estimated by the mean and standard deviation, respectively, but it can be useful for many other distributions.
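As an illustration of both points, the sketch below swaps in a different quantile function and reads location and scale off the fitted line. The exponential distribution is chosen here only as an example, with its standard quantile function -ln(1 - p); the names `uniform_medians`, `probability_plot_fit`, and `exponential_quantile` are hypothetical, and least squares is an assumed fitting method.

```python
import math
from statistics import fmean


def uniform_medians(n):
    """Probabilities from the piecewise formula, before applying any quantile function."""
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    last = 0.5 ** (1.0 / n)
    return [1.0 - last] + [(i - 0.3175) / (n + 0.365) for i in range(2, n)] + [last]


def probability_plot_fit(data, quantile):
    """Fit ordered data against quantile(p_i); return (location, scale) estimates."""
    y = sorted(data)
    x = [quantile(p) for p in uniform_medians(len(y))]
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope  # intercept ~ location, slope ~ scale


def exponential_quantile(p):
    """Quantile function of the standard exponential (location 0, scale 1)."""
    return -math.log1p(-p)
```

Passing `NormalDist().inv_cdf` from the `statistics` module as `quantile` would reproduce the normal case, so the same function covers any distribution whose quantile function can be written down.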

The correlation coefficient of the points on the normal probability plot can be compared to a table of critical values to provide a formal test of the hypothesis that the data come from a normal distribution.
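The coefficient itself is the ordinary Pearson correlation between the ordered data and the normal order statistic medians, and can be computed as in the following sketch. The function names are hypothetical, and the table of critical values is not reproduced here, so the code stops short of the test decision.

```python
import math
from statistics import NormalDist, fmean


def normal_medians(n):
    """Normal order statistic medians z_1..z_n (requires n >= 2)."""
    if n < 2:
        raise ValueError("need at least two observations")
    last = 0.5 ** (1.0 / n)
    probs = [1.0 - last] + [(i - 0.3175) / (n + 0.365) for i in range(2, n)] + [last]
    return [NormalDist().inv_cdf(p) for p in probs]


def normal_plot_correlation(data):
    """Pearson correlation of the points on the normal probability plot."""
    y = sorted(data)
    x = normal_medians(len(y))
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if syy == 0.0:
        raise ValueError("all observations are identical; correlation is undefined")
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)
```

Values near 1 indicate points lying close to a straight line; how near is near enough depends on the sample size, which is what the critical-value table accounts for.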

Examples

This is a sample of size 50 from a normal distribution, plotted as both a histogram and a normal probability plot.

This is a sample of size 50 from a right-skewed distribution, plotted as both a histogram and a normal probability plot.

This is a sample of size 50 from a uniform distribution, plotted as both a histogram and a normal probability plot.

References

 This article incorporates public domain material from websites or documents of the National Institute of Standards and Technology.


Further reading

  • Chambers, John; William Cleveland, Beat Kleiner, and Paul Tukey (1983). Graphical Methods for Data Analysis. Wadsworth. 

Wikimedia Foundation. 2010.
