Probability plot correlation coefficient plot

Many statistical analyses are based on distributional assumptions about the population from which the data have been obtained. However, distributional families can have radically different shapes depending on the value of the shape parameter. Therefore, finding a reasonable choice for the shape parameter is a necessary step in the analysis. In many analyses, finding a good distributional model for the data is the primary focus of the analysis.

The probability plot correlation coefficient (PPCC) plot is a graphical technique for identifying the shape parameter for a distributional family that best describes the data set. This technique is appropriate for families, such as the Weibull, that are defined by a single shape parameter and location and scale parameters, and it is not appropriate for distributions, such as the normal, that are defined only by location and scale parameters.

Definition

The PPCC plot is formed by:
*Vertical axis: Probability plot correlation coefficient;
*Horizontal axis: Value of shape parameter.

That is, for a series of values of the shape parameter, the correlation coefficient is computed for the probability plot associated with a given value of the shape parameter. These correlation coefficients are plotted against their corresponding shape parameters. The maximum correlation coefficient corresponds to the optimal value of the shape parameter. For better precision, two iterations of the PPCC plot can be generated; the first is for finding the right neighborhood and the second is for fine tuning the estimate.
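The two-pass search described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the data are synthetic Weibull values, the plotting positions use Filliben's approximation to the uniform order statistic medians, and the helper names (`uniform_order_medians`, `pearson_r`, `weibull_ppcc`, `best_shape`) are hypothetical, not part of any library.

```python
import math
import random


def uniform_order_medians(n):
    """Filliben's approximation to the medians of the uniform order statistics."""
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def weibull_ppcc(data, shape):
    """Correlation of the Weibull probability plot for one shape value."""
    ordered = sorted(data)
    # Standard Weibull percent point function: Q(p) = (-ln(1 - p)) ** (1 / shape)
    theoretical = [(-math.log(1.0 - p)) ** (1.0 / shape)
                   for p in uniform_order_medians(len(ordered))]
    return pearson_r(theoretical, ordered)


def best_shape(data, low, high, steps):
    """Return the grid value in [low, high] with the largest PPCC."""
    grid = [low + (high - low) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda shape: weibull_ppcc(data, shape))


random.seed(0)
# Synthetic data (an assumption for illustration): scale 1, shape 2.
sample = [random.weibullvariate(1.0, 2.0) for _ in range(500)]

coarse = best_shape(sample, 0.5, 10.0, 20)                 # first pass, step 0.5
fine = best_shape(sample, coarse - 0.4, coarse + 0.4, 33)  # second pass, step 0.025
print(f"coarse: {coarse:.2f}  fine: {fine:.3f}  "
      f"PPCC: {weibull_ppcc(sample, fine):.4f}")
```

Because the correlation coefficient is unchanged by shifting or rescaling the data, the search involves only the shape parameter; location and scale do not need to be known at this stage.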

The PPCC plot is used first to find a good value of the shape parameter. The probability plot for that shape value is then generated, both to estimate the location and scale parameters and to provide a graphical assessment of the adequacy of the distributional fit.
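As a rough illustration of the second step, the location and scale estimates can be read off a least-squares line fitted to the probability plot: the intercept estimates the location and the slope estimates the scale. The sketch below assumes a Weibull shape of 2 has already been chosen, uses synthetic data with location 10 and scale 3, and uses hypothetical helper names (`uniform_order_medians`, `fit_line`).

```python
import math
import random


def uniform_order_medians(n):
    """Filliben's approximation to the medians of the uniform order statistics."""
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


random.seed(1)
shape = 2.0  # assumed to have been selected from a PPCC plot beforehand
# Synthetic data (an assumption for illustration): location 10, scale 3.
sample = [10.0 + random.weibullvariate(3.0, shape) for _ in range(500)]

ordered = sorted(sample)
# Theoretical quantiles of the standard Weibull with the chosen shape.
theoretical = [(-math.log(1.0 - p)) ** (1.0 / shape)
               for p in uniform_order_medians(len(ordered))]

# Ordered data on the vertical axis, theoretical quantiles on the horizontal axis.
scale, location = fit_line(theoretical, ordered)
print(f"location estimate: {location:.2f}  scale estimate: {scale:.2f}")
```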

The PPCC plot answers the following questions:
#What is the best-fit member within a distributional family?
#Does the best-fit member provide a good fit (in terms of generating a probability plot with a high correlation coefficient)?
#Does this distributional family provide a good fit compared to other distributions?
#How sensitive is the choice of the shape parameter?

Comparing distributions

In addition to finding a good estimate of the shape parameter of a given distribution, the PPCC plot can be useful in deciding which distributional family is most appropriate. For example, given a set of reliability data, one might generate PPCC plots for the Weibull, lognormal, gamma, and inverse Gaussian distributions, and possibly others, on a single page. This one page would show the best value of the shape parameter for each distribution and would additionally indicate which of these distributional families provides the best fit (as measured by the maximum probability plot correlation coefficient). That is, if the maximum PPCC value is 0.99 for the Weibull and only 0.94 for the lognormal, then one could reasonably conclude that the Weibull family is the better choice.
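A comparison of this kind might look like the following sketch, which is restricted to the Weibull and lognormal families because their percent point functions are available in closed form from the Python standard library. The data are synthetic, the shape grid is an arbitrary choice, and the helper names (`ppcc`, `weibull_q`, `lognormal_q`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def uniform_order_medians(n):
    """Filliben's approximation to the medians of the uniform order statistics."""
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ppcc(ordered, quantile):
    """Probability plot correlation coefficient for one quantile function."""
    theoretical = [quantile(p) for p in uniform_order_medians(len(ordered))]
    return pearson_r(theoretical, ordered)


def weibull_q(shape):
    """Percent point function of the standard Weibull with the given shape."""
    return lambda p: (-math.log(1.0 - p)) ** (1.0 / shape)


def lognormal_q(sigma):
    """Percent point function of the standard lognormal with shape sigma."""
    inv_cdf = NormalDist().inv_cdf
    return lambda p: math.exp(sigma * inv_cdf(p))


random.seed(2)
# Synthetic "reliability" data (an assumption): Weibull, scale 100, shape 1.5.
sample = sorted(random.weibullvariate(100.0, 1.5) for _ in range(300))

shape_grid = [0.1 * k for k in range(1, 51)]  # 0.1, 0.2, ..., 5.0
families = {"Weibull": weibull_q, "lognormal": lognormal_q}

# For each family keep (maximum PPCC, shape value at which it occurs).
best = {name: max((ppcc(sample, make_q(s)), s) for s in shape_grid)
        for name, make_q in families.items()}

for name, (r, s) in best.items():
    print(f"{name:10s} max PPCC = {r:.4f} at shape = {s:.1f}")
```

The two maxima are often close to each other, so the printed values are better read as one input to the choice of family than as a verdict.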

When comparing distributional models, one should not simply choose the one with the maximum PPCC value. In many cases, several distributional fits provide comparable PPCC values. For example, a lognormal and a Weibull may both fit a given set of reliability data quite well. Typically, one would also consider the complexity of the distribution: a simpler distribution with a marginally smaller PPCC value may be preferred over a more complex one. Likewise, the underlying scientific model may provide theoretical justification for preferring a distribution with a marginally smaller PPCC value. In other cases, one may not need to know whether the distributional model is optimal, only that it is adequate for the purpose at hand. For example, one may be able to use techniques designed for normally distributed data even if other distributions fit the data somewhat better.

Tukey-lambda PPCC plot for symmetric distributions

The Tukey lambda PPCC plot, with shape parameter λ, is particularly useful for symmetric distributions. It indicates whether a distribution is short-tailed or long-tailed, and it can further indicate several common distributions. Specifically,
#λ = −1: distribution is approximately Cauchy
#λ = 0: distribution is exactly logistic
#λ = 0.14: distribution is approximately normal
#λ = 0.5: distribution is ∩-shaped (unimodal, with bounded support)
#λ = 1: distribution is exactly uniform(−1, 1)

If the Tukey lambda PPCC plot gives a maximum value near 0.14, one can reasonably conclude that the normal distribution is a good model for the data. If the maximum value is less than 0.14, a long-tailed distribution such as the double exponential or logistic would be a better choice. If the maximum value is near −1, this implies the selection of a very long-tailed distribution, such as the Cauchy. If the maximum value is greater than 0.14, this implies a short-tailed distribution such as the beta or uniform.
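The behaviour described above can be checked with a small sketch. It relies on the closed-form percent point function of the Tukey lambda distribution, Q(p) = (p^λ − (1 − p)^λ)/λ for λ ≠ 0 and Q(p) = ln(p/(1 − p)) for λ = 0. The samples are synthetic, the grid of λ values from −1 to 1 is an arbitrary choice, and the helper names (`tukey_lambda_quantile`, `best_lambda`) are hypothetical.

```python
import math
import random


def uniform_order_medians(n):
    """Filliben's approximation to the medians of the uniform order statistics."""
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tukey_lambda_quantile(p, lam):
    """Percent point function of the Tukey lambda distribution."""
    if abs(lam) < 1e-12:
        return math.log(p / (1.0 - p))  # logistic limit at lambda = 0
    return (p ** lam - (1.0 - p) ** lam) / lam


def best_lambda(data):
    """Grid value of lambda in [-1, 1] (step 0.01) that maximizes the PPCC."""
    ordered = sorted(data)
    probs = uniform_order_medians(len(ordered))

    def ppcc(lam):
        theoretical = [tukey_lambda_quantile(p, lam) for p in probs]
        return pearson_r(theoretical, ordered)

    grid = [k / 100.0 for k in range(-100, 101)]
    return max(grid, key=ppcc)


random.seed(3)
# Synthetic samples (assumptions for illustration only).
normal_sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
uniform_sample = [random.uniform(-1.0, 1.0) for _ in range(1000)]

lam_normal = best_lambda(normal_sample)
lam_uniform = best_lambda(uniform_sample)
print(f"normal sample:  lambda at maximum PPCC = {lam_normal:.2f}")
print(f"uniform sample: lambda at maximum PPCC = {lam_uniform:.2f}")
```

For the normal sample the maximizing λ would be expected to fall in the general vicinity of 0.14, and for the uniform sample much closer to 1, although the exact values depend on the sample drawn.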

The Tukey-lambda PPCC plot is used to suggest an appropriate distribution. One should follow up with PPCC and probability plots of the appropriate alternatives.

See also

*Probability plot

External links

* [http://www.itl.nist.gov/div898/handbook/eda/section3/ppccplot.htm Probability Plot Correlation Coefficient Plot]

References

*Filliben, J. J. (February 1975). "The Probability Plot Correlation Coefficient Test for Normality". Technometrics 17: 111–117. doi:10.2307/1268008.


Wikimedia Foundation. 2010.
