Pearson distribution

The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics.

History

The Pearson system was originally devised in an effort to model visibly skewed observations. It was well known at the time how to adjust a theoretical model to fit the first two cumulants or moments of observed data: Any probability distribution can be extended straightforwardly to form a location-scale family. Except in pathological cases, a location-scale family can be made to fit the observed mean (first cumulant) and variance (second cumulant) arbitrarily well. However, it was not known how to construct probability distributions in which the skewness (standardized third cumulant) and kurtosis (standardized fourth cumulant) could be adjusted equally freely. This need became apparent when trying to fit known theoretical models to observed data that exhibited skewness. Pearson's examples include survival data, which are usually asymmetric.

In his original paper, Pearson (1895, p. 360) identified four types of distributions (numbered I through IV) in addition to the normal distribution (which was originally known as type V). The classification depended on whether the distributions were supported on a bounded interval, on a half-line, or on the whole real line; and whether they were potentially skewed or necessarily symmetric. A second paper (Pearson 1901) fixed two omissions: it redefined the type V distribution (originally just the normal distribution, but now the inverse-gamma distribution) and introduced the type VI distribution. Together the first two papers cover the five main types of the Pearson system (I, III, VI, V, and IV). In a third paper, Pearson (1916) introduced further special cases and subtypes (VII through XII).

Rhind (1909, pp. 430–432) devised a simple way of visualizing the parameter space of the Pearson system, which was subsequently adopted by Pearson (1916, plate 1 and pp. 430ff., 448ff.). The Pearson types are characterized by two quantities, commonly referred to as β_1 and β_2. The first is the square of the skewness: β_1 = γ_1^2, where γ_1 is the skewness, or third standardized moment. The second is the traditional kurtosis, or fourth standardized moment: β_2 = γ_2 + 3. (Modern treatments define kurtosis γ_2 in terms of cumulants instead of moments, so that for a normal distribution we have γ_2 = 0 and β_2 = 3. Here we follow the historical precedent and use β_2.) The diagram on the right shows which Pearson type a given concrete distribution (identified by a point (β_1, β_2)) belongs to.
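To make the two coordinates concrete, β_1 and β_2 can be estimated from a sample by plugging in central sample moments m_k (the average of (x − x̄)^k). The following Python sketch assumes the simple, biased moment estimators; the function name `pearson_betas` is hypothetical, and small-sample bias corrections would change the values somewhat.

```python
def pearson_betas(data):
    """Return (beta_1, beta_2) estimated from central sample moments.

    beta_1 = m3**2 / m2**3  (square of the skewness)
    beta_2 = m4 / m2**2     (traditional kurtosis; 3 for a normal distribution)
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("data have zero variance")
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


# A 0/1 sample with one success in four draws: its point (beta_1, beta_2)
# is that of a Bernoulli(1/4) distribution, (4/3, 7/3), up to rounding.
print(pearson_betas([0, 0, 0, 1]))
```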

Many of the skewed and/or non-mesokurtic distributions familiar to us today were still unknown in the early 1890s. What is now known as the beta distribution had been used by Thomas Bayes as a posterior distribution of the parameter of a Bernoulli distribution in his 1763 work on inverse probability. The beta distribution gained prominence due to its membership in Pearson's system and was known until the 1940s as the Pearson type I distribution. [Miller, Jeff, et al. (2006-07-09). "Beta distribution". Earliest Known Uses of Some of the Words of Mathematics (http://members.aol.com/jeff570/mathword.html). http://members.aol.com/jeff570/b.html. Accessed December 9, 2006.] (Pearson's type II distribution is a special case of type I, but is usually no longer singled out.) The gamma distribution originated from Pearson's work (Pearson 1893, p. 331; Pearson 1895, pp. 357, 360, 373–376) and was known as the Pearson type III distribution, before acquiring its modern name in the 1930s and 1940s. [Miller, Jeff, et al. (2006-12-07). "Gamma distribution". Earliest Known Uses of Some of the Words of Mathematics (http://members.aol.com/jeff570/mathword.html). http://members.aol.com/jeff570/g.html. Accessed December 9, 2006.] Pearson's 1895 paper introduced the type IV distribution, which contains Student's "t"-distribution as a special case, predating William Gosset's subsequent use by several years. His 1901 paper introduced the inverse-gamma distribution (type V) and the beta prime distribution (type VI).

Definition

A Pearson density "p" is defined to be any valid solution to the differential equation (cf. Pearson 1895, p. 381)

:\frac{p'(x)}{p(x)} + \frac{a + x - \lambda}{b_2 (x - \lambda)^2 + b_1 (x - \lambda) + b_0} = 0, \qquad (1)

with

:b_0 = \frac{4\beta_2 - 3\beta_1}{10\beta_2 - 12\beta_1 - 18}\, \mu_2,

:a = b_1 = \sqrt{\mu_2 \beta_1}\, \frac{\beta_2 + 3}{10\beta_2 - 12\beta_1 - 18},

:b_2 = \frac{2\beta_2 - 3\beta_1 - 6}{10\beta_2 - 12\beta_1 - 18},

where μ_2 is the second central moment (the variance).
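The coefficient formulas translate directly into code. In this hedged Python sketch, the name `pearson_coefficients` is hypothetical, and one assumption goes beyond the formulas above: the factor √(μ_2 β_1) is given the sign of the skewness γ_1, so that the stationary point λ − a lies below the mean for right-skewed data.

```python
import math

def pearson_coefficients(mu2, gamma1, beta2):
    """Return (b0, b1, b2) of equation (1); the constant a equals b1.

    mu2    -- variance (second central moment)
    gamma1 -- signed skewness, so that beta_1 = gamma1**2
    beta2  -- traditional kurtosis (3 for a normal distribution)
    """
    beta1 = gamma1 ** 2
    denom = 10 * beta2 - 12 * beta1 - 18
    if denom == 0:
        raise ValueError("10*beta2 - 12*beta1 - 18 must be non-zero")
    b0 = (4 * beta2 - 3 * beta1) / denom * mu2
    # sqrt(mu2 * beta1), carrying the sign of the skewness (assumption).
    b1 = math.sqrt(mu2) * gamma1 * (beta2 + 3) / denom
    b2 = (2 * beta2 - 3 * beta1 - 6) / denom
    return b0, b1, b2


# Normal distribution with variance 4: beta_1 = 0, beta_2 = 3.
print(pearson_coefficients(4.0, 0.0, 3.0))  # (4.0, 0.0, 0.0)
```

For the normal case b_1 = b_2 = 0, so (1) reduces to p'(x)/p(x) = −(x − λ)/μ_2, whose solution is the normal density with mean λ and variance μ_2.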

According to Ord (1972, p. 2), Pearson devised the underlying form of Equation (1) on the basis of, firstly, the formula for the derivative of the logarithm of the density function of the normal distribution (which gives a linear function) and, secondly, a recurrence relation for values in the probability mass function of the hypergeometric distribution (which yields the linear-divided-by-quadratic structure).

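In Equation (1), the parameter "a" determines a stationary point, and hence under some conditions a mode of the distribution, since

:p'(\lambda - a) = 0

follows directly from the differential equation.

For the derivations below it is convenient to set λ = 0 (the location parameter is reintroduced at the end) and to write the numerator as x - a, so that in what follows "a" stands for the negative of the constant "a" in (1). Since (1) is a first-order linear differential equation with variable coefficients, its solution is straightforward:

:p(x) \propto \exp\left( -\int \frac{x - a}{b_2 x^2 + b_1 x + b_0}\, \mathrm{d}x \right).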

The integral in this solution simplifies considerably when certain special cases of the integrand are considered. Pearson (1895, p. 367) distinguished two main cases, determined by the sign of the discriminant (and hence the number of real roots) of the quadratic function

:f(x) = b_2 x^2 + b_1 x + b_0. \qquad (2)
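The case split can be sketched as a small classifier over the coefficients of (2). This Python function is hypothetical and illustrative only: it uses a tolerance `tol` in place of exact zero tests, and its assignment of a repeated root to type V and of same-signed roots to type VI follows the usual Pearson criterion rather than anything derived in this section.

```python
def pearson_type(b0, b1, b2, tol=1e-12):
    """Classify a Pearson density from f(x) = b2*x**2 + b1*x + b0."""
    if abs(b2) <= tol:
        # No quadratic term: f is linear (type III) or constant (normal).
        return "normal" if abs(b1) <= tol else "III"
    disc = b1 * b1 - 4 * b2 * b0
    if disc < -tol:
        # No real roots: type IV, or its symmetric special case type VII.
        return "VII" if abs(b1) <= tol else "IV"
    if disc <= tol:
        return "V"  # one repeated real root
    # Two distinct real roots; their product is b0 / b2.
    if b0 / b2 < 0:
        # Roots of opposite sign: type I, or symmetric type II when b1 == 0.
        return "II" if abs(b1) <= tol else "I"
    return "VI"  # both roots on the same side of zero


for coeffs in [(4, 0, 0), (2, 1, 0), (1, 0.5, 0.1), (1, 0, 0.1),
               (1, 2, 1), (1, 0.5, -0.1), (1, 0, -0.1), (1, 1, 0.1)]:
    print(coeffs, pearson_type(*coeffs))
```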

Particular types of distribution

Case 1, negative discriminant: The Pearson type IV distribution

If the discriminant of the quadratic function (2) is negative (b_1^2 - 4 b_2 b_0 < 0), it has no real roots. Then define

:y = x + \frac{b_1}{2 b_2} \quad \text{and}

:\alpha = \frac{\sqrt{4 b_2 b_0 - b_1^2}}{2 b_2}.

Observe that α is a well-defined real number and α ≠ 0, because by assumption 4 b_2 b_0 - b_1^2 > 0 and therefore b_2 ≠ 0. Applying these substitutions, the quadratic function (2) is transformed into

:f(x) = b_2 (y^2 + \alpha^2).

The absence of real roots is obvious from this formulation, because alpha^2 is necessarily positive.

We now express the solution to the differential equation (1) as a function of "y":

:p(y) \propto \exp\left( -\frac{1}{b_2} \int \frac{y - \frac{b_1}{2 b_2} - a}{y^2 + \alpha^2}\, \mathrm{d}y \right).

Pearson (1895, p. 362) called this the "trigonometrical case", because the integral

:\int \frac{y - \frac{2 b_2 a + b_1}{2 b_2}}{y^2 + \alpha^2}\, \mathrm{d}y = \frac{1}{2} \ln(y^2 + \alpha^2) - \frac{2 b_2 a + b_1}{2 b_2 \alpha} \arctan\left(\frac{y}{\alpha}\right) + C_0

involves the inverse trigonometric arctan function. Then

:p(y) \propto \exp\left[ -\frac{1}{2 b_2} \ln\left(1 + \frac{y^2}{\alpha^2}\right) - \frac{\ln \alpha}{b_2} + \frac{2 b_2 a + b_1}{2 b_2^2 \alpha} \arctan\left(\frac{y}{\alpha}\right) + C_1 \right]

Finally, let

:m = \frac{1}{2 b_2} \quad \text{and}

:\nu = -\frac{2 b_2 a + b_1}{2 b_2^2 \alpha}.

Applying these substitutions, we obtain the parametric function:

:p(y) \propto \left[1 + \frac{y^2}{\alpha^2}\right]^{-m} \exp\left[-\nu \arctan\left(\frac{y}{\alpha}\right)\right]

This unnormalized density has support on the entire real line. It depends on a scale parameter alpha > 0 and shape parameters m>1/2 and u. One parameter was lost when we chose to find the solution to the differential equation (1) as a function of "y" rather than "x". We therefore reintroduce a fourth parameter, namely the location parameter "λ". We have thus derived the density of the Pearson type IV distribution:

:p(x) = \frac{\left|\frac{\Gamma\left(m + \frac{\nu}{2} i\right)}{\Gamma(m)}\right|^2}{\alpha\, \mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)} \left[1 + \left(\frac{x - \lambda}{\alpha}\right)^2\right]^{-m} \exp\left[-\nu \arctan\left(\frac{x - \lambda}{\alpha}\right)\right].

The normalizing constant involves the complex Gamma function (Γ) and the Beta function (B).
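The normalizing constant can be evaluated without complex arithmetic by using the product identity |Γ(x + iy)/Γ(x)|^2 = ∏_{n≥0} [1 + y^2/(x + n)^2]^{-1}, valid for real x > 0. The Python sketch below truncates that product (with a simple estimate of the tail) and builds the density; the function names and the truncation length are illustrative choices, not a standard library API.

```python
import math

def gamma_ratio_sq(x, y, n_terms=100_000):
    """|Gamma(x + iy) / Gamma(x)|**2 for real x > 0 via a truncated product.

    log of the product = -sum_{n>=0} log(1 + y^2/(x+n)^2); the terms beyond
    n_terms are approximated by the integral estimate y^2 / (x + n_terms - 1/2).
    """
    log_sum = sum(math.log1p((y / (x + n)) ** 2) for n in range(n_terms))
    log_sum += y * y / (x + n_terms - 0.5)
    return math.exp(-log_sum)


def make_pearson4_pdf(m, nu, alpha, lam):
    """Return the normalized Pearson type IV density as a function of x."""
    if m <= 0.5 or alpha <= 0:
        raise ValueError("need m > 1/2 and alpha > 0")
    # Beta(m - 1/2, 1/2) computed through log-gamma values.
    beta_fn = math.exp(math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m))
    norm = gamma_ratio_sq(m, nu / 2) / (alpha * beta_fn)

    def pdf(x):
        z = (x - lam) / alpha
        return norm * (1 + z * z) ** (-m) * math.exp(-nu * math.atan(z))

    return pdf


def total_probability(pdf, lam, alpha, n=20_000):
    """Midpoint rule after x = lam + alpha*tan(t), mapping R onto (-pi/2, pi/2)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h
        total += pdf(lam + alpha * math.tan(t)) * alpha / math.cos(t) ** 2 * h
    return total


pdf = make_pearson4_pdf(m=2.0, nu=1.0, alpha=1.5, lam=0.5)
print(total_probability(pdf, lam=0.5, alpha=1.5))  # close to 1
```

Computing the constant once, when the density function is built, keeps each evaluation of `pdf` cheap; a library routine for the complex log-gamma function would be a natural replacement for the product.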

The Pearson type VII distribution

The shape parameter "ν" of the Pearson type IV distribution controls its skewness. If we fix its value at zero, we obtain a symmetric three-parameter family. This special case is known as the Pearson type VII distribution (cf. Pearson 1916, p. 450). Its density is

:p(x) = \frac{1}{\alpha\, \mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)} \left[1 + \left(\frac{x - \lambda}{\alpha}\right)^2\right]^{-m},

where B is the Beta function.

An alternative parameterization (and slight specialization) of the type VII distribution is obtained by letting

:\alpha = \sigma \sqrt{2m - 3},

which requires m > 3/2. This entails a minor loss of generality but ensures that the variance of the distribution exists and is equal to σ^2. Now the parameter "m" only controls the kurtosis of the distribution. If "m" approaches infinity as "λ" and "σ" are held constant, the normal distribution arises as a limiting case:

:\lim_{m \to \infty} \frac{1}{\sigma \sqrt{2m - 3}\, \mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)} \left[1 + \left(\frac{x - \lambda}{\sigma \sqrt{2m - 3}}\right)^2\right]^{-m}

:\qquad = \frac{1}{\sigma \sqrt{2}\, \Gamma\left(\frac{1}{2}\right)} \times \lim_{m \to \infty} \frac{\Gamma(m)}{\Gamma\left(m - \frac{1}{2}\right) \sqrt{m - \frac{3}{2}}} \times \lim_{m \to \infty} \left[1 + \frac{\left(\frac{x - \lambda}{\sigma}\right)^2}{2m - 3}\right]^{-m}

:\qquad = \frac{1}{\sigma \sqrt{2\pi}} \times 1 \times \exp\left[-\frac{1}{2} \left(\frac{x - \lambda}{\sigma}\right)^2\right]

This is the density of a normal distribution with mean "λ" and standard deviation "σ".
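The limit can also be checked numerically. The Python sketch below (function names are illustrative) evaluates the type VII density with α = σ√(2m − 3) on a grid and reports the largest gap to the normal density; the gap should shrink as m grows. The factor [1 + z^2]^{-m} is evaluated as exp(−m·log1p(z^2)) to stay accurate for large m.

```python
import math

def type7_pdf_sd(x, m, lam, sigma):
    """Type VII density parameterized by sigma, with alpha = sigma*sqrt(2m - 3), m > 3/2."""
    alpha = sigma * math.sqrt(2 * m - 3)
    log_beta = math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m)
    z = (x - lam) / alpha
    return math.exp(-m * math.log1p(z * z) - log_beta) / alpha


def normal_pdf(x, lam, sigma):
    return math.exp(-0.5 * ((x - lam) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def max_gap(m, lam=0.0, sigma=1.0):
    """Largest |type VII - normal| on the grid lam + sigma * (-5, -4.9, ..., 5)."""
    grid = [lam + sigma * k / 10 for k in range(-50, 51)]
    return max(abs(type7_pdf_sd(x, m, lam, sigma) - normal_pdf(x, lam, sigma)) for x in grid)


for m in (5, 50, 5000):
    print(m, max_gap(m))
```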

It is convenient to require that m > 5/2 and to let

:m = \frac{5}{2} + \frac{3}{\gamma_2}.

This is another specialization, and it guarantees that the first four moments of the distribution exist. More specifically, the Pearson type VII distribution parameterized in terms of (λ, σ, γ_2) has a mean of "λ", standard deviation of "σ", skewness of zero, and excess kurtosis of γ_2.
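As a check on this parameterization, the following Python sketch (names are illustrative) maps (λ, σ, γ_2) to (m, α), then recovers the variance and excess kurtosis by numerical integration, using the substitution x = λ + α tan t to map the real line onto a finite interval.

```python
import math

def type7_params(sigma, gamma2):
    """Map standard deviation sigma > 0 and excess kurtosis gamma2 > 0 to (m, alpha)."""
    if sigma <= 0 or gamma2 <= 0:
        raise ValueError("need sigma > 0 and gamma2 > 0")
    m = 2.5 + 3.0 / gamma2
    return m, sigma * math.sqrt(2 * m - 3)


def type7_pdf(x, m, alpha, lam):
    log_beta = math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m)
    z = (x - lam) / alpha
    return math.exp(-m * math.log1p(z * z) - log_beta) / alpha


def central_moment(k, m, alpha, lam, n=20_000):
    """k-th central moment by the midpoint rule after x = lam + alpha*tan(t)."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        dev = alpha * math.tan(t)  # x - lam
        total += dev ** k * type7_pdf(lam + dev, m, alpha, lam) * alpha / math.cos(t) ** 2 * h
    return total


lam, sigma, gamma2 = 1.0, 2.0, 1.0
m, alpha = type7_params(sigma, gamma2)  # m = 5.5
var = central_moment(2, m, alpha, lam)
excess = central_moment(4, m, alpha, lam) / var ** 2 - 3
print(var, excess)  # approximately 4 and 1
```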

Student's "t"-distribution

The Pearson type VII distribution subsumes Student's "t"-distribution, and hence also the Cauchy distribution. Student's "t"-distribution arises as the result of applying the following substitutions to its original parameterization:

:\lambda = 0,

:\alpha = \sqrt{\nu}, \quad \text{and}

:m = \frac{\nu + 1}{2},

where ν > 0. Observe that the constraint m > 1/2 is satisfied. The density of this restricted one-parameter family is

:p(x) = \frac{1}{\sqrt{\nu}\, \mathrm{B}\left(\frac{\nu}{2}, \frac{1}{2}\right)} \left[1 + \frac{x^2}{\nu}\right]^{-\frac{\nu + 1}{2}},

which is easily recognized as the density of Student's "t"-distribution.
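The identification can be confirmed numerically: with λ = 0, α = √ν and m = (ν + 1)/2, the type VII density agrees with the familiar form Γ((ν + 1)/2)/(√(νπ) Γ(ν/2)) (1 + x^2/ν)^{−(ν+1)/2}, because B(ν/2, 1/2) = Γ(ν/2)Γ(1/2)/Γ((ν + 1)/2) and Γ(1/2) = √π. The Python function names below are illustrative.

```python
import math

def type7_pdf(x, m, alpha, lam):
    """Pearson type VII density with shape m > 1/2, scale alpha > 0, location lam."""
    log_beta = math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m)
    z = (x - lam) / alpha
    return math.exp(-m * math.log1p(z * z) - log_beta) / alpha


def student_t_pdf(x, nu):
    """Textbook Student's t density with nu > 0 degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_via_type7(x, nu):
    # Substitutions from the text: lambda = 0, alpha = sqrt(nu), m = (nu + 1)/2.
    return type7_pdf(x, (nu + 1) / 2, math.sqrt(nu), 0.0)


for nu in (1.0, 2.5, 10.0):
    print(nu, t_via_type7(1.3, nu), student_t_pdf(1.3, nu))
```

With ν = 1 both functions give the Cauchy density 1/(π(1 + x^2)).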

Case 2, non-negative discriminant

If the quadratic function (2) has a non-negative discriminant (b_1^2 - 4 b_2 b_0 \geq 0), it has real roots a_1 and a_2 (not necessarily distinct):

:a_1 = \frac{-b_1 - \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2},

:a_2 = \frac{-b_1 + \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2}.

It is convenient to define

:m_1 = \frac{a_1 - a}{b_2 (a_2 - a_1)}, \qquad m_2 = \frac{a - a_2}{b_2 (a_2 - a_1)}, \qquad C_1 = \frac{b_1}{2 b_2}.

In the presence of real roots the quadratic function (2) can be written as

:f(x) = b_2 (x - a_1)(x - a_2),

and the solution to the differential equation is therefore

:p(x) \propto \exp\left( -\frac{1}{b_2} \int \frac{x - a}{(x - a_1)(x - a_2)}\, \mathrm{d}x \right).

Pearson (1895, p. 362) called this the "logarithmic case", because the integral

:\int \frac{x - a}{(x - a_1)(x - a_2)}\, \mathrm{d}x = \frac{(a_1 - a) \ln(x - a_1) - (a_2 - a) \ln(x - a_2)}{a_1 - a_2} + C

involves only the logarithm function, and not the arctan function as in the previous case.

Using the substitution

:\nu = \frac{1}{b_2 (a_1 - a_2)}

we obtain the following solution to the differential equation (1):

:p(x) \propto (x - a_1)^{-\nu (a_1 - a)} (x - a_2)^{\nu (a_2 - a)}.

Since this density is only known up to a hidden constant of proportionality, that constant can be changed and the density written as follows:

:p(x) \propto \left(1 - \frac{x}{a_1}\right)^{-\nu (a_1 - a)} \left(1 - \frac{x}{a_2}\right)^{\nu (a_2 - a)}.
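The logarithmic-case solution can be checked against the differential equation by finite differences. The Python sketch below follows this section's conventions (λ = 0 and numerator x − a) and evaluates log p(x) for x to the right of both roots, where every power has a positive base; the function name and the example coefficients are illustrative.

```python
import math

def log_case_log_density(b0, b1, b2, a):
    """Return (a1, a2, log_p) with log p(x) = -nu(a1-a) ln(x-a1) + nu(a2-a) ln(x-a2).

    Requires b2 != 0 and a positive discriminant; log_p is defined for x > max(a1, a2).
    """
    disc = b1 * b1 - 4 * b2 * b0
    if b2 == 0 or disc <= 0:
        raise ValueError("need b2 != 0 and two distinct real roots")
    a1 = (-b1 - math.sqrt(disc)) / (2 * b2)
    a2 = (-b1 + math.sqrt(disc)) / (2 * b2)
    nu = 1 / (b2 * (a1 - a2))

    def log_p(x):
        return -nu * (a1 - a) * math.log(x - a1) + nu * (a2 - a) * math.log(x - a2)

    return a1, a2, log_p


# f(x) = x^2 + x - 2 = (x + 2)(x - 1), so a1 = -2 and a2 = 1; take a = 0.5.
b0, b1, b2, a = -2.0, 1.0, 1.0, 0.5
a1, a2, log_p = log_case_log_density(b0, b1, b2, a)
x, h = 3.0, 1e-5
lhs = (log_p(x + h) - log_p(x - h)) / (2 * h)  # numerical p'(x)/p(x)
rhs = -(x - a) / (b2 * x * x + b1 * x + b0)    # value required by the ODE
print(a1, a2, lhs, rhs)  # lhs and rhs agree closely (both about -0.25)
```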

The Pearson type I and type II distributions

The Pearson type I distribution (a generalization of the beta distribution) arises when the roots of the quadratic equation (2) are of opposite sign, that is, a_1 < 0 < a_2. Then the solution "p" is supported on the interval (a_1, a_2). Apply the substitution

:x = a_1 + y (a_2 - a_1), \qquad \text{where } 0 < y < 1,

which yields a solution in terms of "y" that is supported on the interval (0, 1):

:p(y) \propto \left(\frac{a_1 - a_2}{a_1}\, y\right)^{(-a_1 + a)\nu} \left(\frac{a_2 - a_1}{a_2}\, (1 - y)\right)^{(a_2 - a)\nu}.

Regrouping constants and parameters, this simplifies to:

:p(y) \propto y^{m_1} (1 - y)^{m_2},

with m_1 and m_2 as defined above. Thus (x - λ - a_1)/(a_2 - a_1) follows a beta(m_1 + 1, m_2 + 1) distribution, with

:\lambda = \mu_1 - (a_2 - a_1) \frac{m_1 + 1}{m_1 + m_2 + 2} - a_1,

where μ_1 is the mean. It turns out that the conditions m_1 > -1 and m_2 > -1 are necessary and sufficient for "p" to be a proper probability density function.
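Collecting the steps of this subsection, the beta exponents can be computed directly from the coefficients. In the Python sketch below, the function name is hypothetical, the conventions are those of the derivation above (λ = 0, numerator x − a), and m_1 = (a − a_1)ν and m_2 = (a_2 − a)ν are read off from the exponents of y and 1 − y.

```python
import math

def type1_beta_shapes(b0, b1, b2, a):
    """Return (a1, a2, m1, m2) such that y = (x - a1)/(a2 - a1) has density
    proportional to y**m1 * (1 - y)**m2 (type I case: real roots of opposite sign)."""
    disc = b1 * b1 - 4 * b2 * b0
    if b2 == 0 or disc <= 0:
        raise ValueError("need b2 != 0 and two distinct real roots")
    a1 = (-b1 - math.sqrt(disc)) / (2 * b2)
    a2 = (-b1 + math.sqrt(disc)) / (2 * b2)
    if a1 * a2 >= 0:
        raise ValueError("roots must have opposite signs for type I")
    nu = 1 / (b2 * (a1 - a2))
    m1 = (a - a1) * nu
    m2 = (a2 - a) * nu
    if m1 <= -1 or m2 <= -1:
        raise ValueError("m1 > -1 and m2 > -1 are needed for a proper density")
    return a1, a2, m1, m2


# f(x) = 1 - x^2 with a = 0 gives p(x) proportional to sqrt(1 - x^2):
# y = (x - a1)/(a2 - a1) then follows beta(3/2, 3/2).
print(type1_beta_shapes(1, 0, -1, 0))  # (1.0, -1.0, 0.5, 0.5)
```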

The Pearson type II distribution

The Pearson type II distribution is a special case of the Pearson type I family restricted to symmetric distributions.

For the Pearson type II curve [Ramsey, Philip H. (1989-09-01). "Critical Values for Spearman's Rank Order Correlation". http://links.jstor.org/sici?sici=0362-9791(198923)14%3A3%3C245%3ACVFSRO%3E2.0.CO%3B2-L. Accessed August 22, 2007.],

:y = y_0 \left(1 - \frac{x^2}{a^2}\right)^m,

where

:x = \frac{\sum d^2}{2} - \frac{n^3 - n}{12},

"d" is the difference between the two ranks of an item, "n" is the number of items, and the ordinate "y" is the frequency of \sum d^2. The Pearson type II curve is used in computing the table of significant correlation coefficients for Spearman's rank correlation coefficient when the number of items in a series is less than 100 (or 30, according to some sources). For longer series, the distribution is closely approximated by Student's t-distribution. For the table of values, the constants in the previous equation are

:m = \frac{5\beta_2 - 9}{2(3 - \beta_2)}, \qquad a^2 = \frac{2 \mu_2 \beta_2}{3 - \beta_2}, \qquad y_0 = \frac{N\, \Gamma(2m + 2)}{a\, 2^{2m+1} \left[\Gamma(m + 1)\right]^2},

where "N" is the total frequency (the area under the curve). The moments of "x" used are

:\mu_2 = (n - 1) \left[\frac{n^2 + n}{12}\right]^2, \qquad \beta_2 = \frac{3(25 n^4 - 13 n^3 - 73 n^2 + 37 n + 72)}{25 n (n + 1)^2 (n - 1)}.
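The constants can be computed for a given n and checked for internal consistency: with the squared [Γ(m + 1)]^2 in y_0, the curve has total area N, and its variance a^2/(2m + 3) equals μ_2. In the Python sketch below, the function names are hypothetical and `total` stands for N; the numerical checks test the formulas against each other and do not reproduce published critical values.

```python
import math

def spearman_type2_constants(n, total=1.0):
    """Return (y0, a, m, mu2) for y = y0 * (1 - x^2/a^2)^m, x = sum(d^2)/2 - (n^3 - n)/12."""
    mu2 = (n - 1) * ((n * n + n) / 12) ** 2
    beta2 = 3 * (25 * n**4 - 13 * n**3 - 73 * n**2 + 37 * n + 72) / (
        25 * n * (n + 1) ** 2 * (n - 1))
    m = (5 * beta2 - 9) / (2 * (3 - beta2))
    a = math.sqrt(2 * mu2 * beta2 / (3 - beta2))
    # y0 = N Gamma(2m+2) / (a 2^(2m+1) [Gamma(m+1)]^2), evaluated in log space.
    log_y0 = (math.log(total) + math.lgamma(2 * m + 2) - math.log(a)
              - (2 * m + 1) * math.log(2) - 2 * math.lgamma(m + 1))
    return math.exp(log_y0), a, m, mu2


def curve_moment(k, y0, a, m, steps=20_000):
    """Integral of x^k * y0 * (1 - x^2/a^2)^m over (-a, a) by the midpoint rule."""
    h = 2 * a / steps
    total = 0.0
    for i in range(steps):
        x = -a + (i + 0.5) * h
        total += x ** k * y0 * (1 - (x / a) ** 2) ** m * h
    return total


y0, a, m, mu2 = spearman_type2_constants(10)
print(m, a, curve_moment(0, y0, a, m), curve_moment(2, y0, a, m), mu2)
```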

The Pearson type III distribution

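:\lambda = \mu_1 + \frac{b_0}{b_1} - (m + 1) b_1, \qquad m = \frac{b_0}{b_1^2} - 1;

then b_0 + b_1 (x - λ) follows a gamma(m + 1, b_1^2) distribution, with shape m + 1 and scale b_1^2. Special cases of the Pearson type III distribution include the gamma distribution and the chi-square distribution.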
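As a consistency check, moments of a gamma distribution should lead back to that distribution. Reading gamma(k, s) as shape k and scale s (an assumption about the notation), a gamma distribution with shape k and scale θ gives b_2 = 0, b_0 = kθ^2 and b_1 = θ, so b_0 + b_1(x − λ) = θx has shape m + 1 = b_0/b_1^2 = k and scale b_1^2 = θ^2. The Python sketch below (function name hypothetical, skewness carried with its sign as in the coefficient formulas) performs this check numerically.

```python
import math

def pearson3_gamma_params(mu2, gamma1, beta2, tol=1e-9):
    """Return (shape, scale) = (m + 1, b1**2) for b0 + b1*(x - lambda) in the type III case."""
    beta1 = gamma1 ** 2
    denom = 10 * beta2 - 12 * beta1 - 18
    b0 = (4 * beta2 - 3 * beta1) / denom * mu2
    b1 = math.sqrt(mu2) * gamma1 * (beta2 + 3) / denom  # sign of the skewness kept
    b2 = (2 * beta2 - 3 * beta1 - 6) / denom
    if abs(b2) > tol or b1 == 0:
        raise ValueError("these moments do not describe a type III distribution")
    return b0 / b1 ** 2, b1 ** 2


# Gamma distribution with shape k = 3 and scale theta = 2:
k, theta = 3.0, 2.0
mu2, gamma1, beta2 = k * theta**2, 2 / math.sqrt(k), 3 + 6 / k
print(pearson3_gamma_params(mu2, gamma1, beta2))  # approximately (3.0, 4.0)
```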

The Pearson type V distribution

:\lambda = \mu_1 - \frac{a - C_1}{1 - 2 b_2};

then x - λ follows an inverse-gamma(1/b_2 - 1, (a - C_1)/b_2) distribution. Special cases of the Pearson type V distribution include the inverse-gamma distribution.

The Pearson type VI distribution

:\lambda = \mu_1 + (a_2 - a_1) \frac{m_2 + 1}{m_2 + m_1 + 2} - a_2;

then (x - λ - a_2)/(a_2 - a_1) follows a beta prime(m_2 + 1, -m_2 - m_1 - 1) distribution. Special cases of the Pearson type VI distribution include the beta prime distribution and the "F"-distribution.

Relation to other distributions

The Pearson family subsumes the following distributions, among others:

* beta distribution (type I)
* beta prime distribution (type VI)
* Cauchy distribution (type IV)
* chi-square distribution (type III)
* continuous uniform distribution (limit of type I)
* exponential distribution (type III)
* gamma distribution (type III)
* "F"-distribution (type VI)
* inverse-chi-square distribution (type V)
* inverse-gamma distribution (type V)
* normal distribution (limit of type I, III, IV, V, or VI)
* Student's "t"-distribution (type IV)

Applications

These models are used in financial markets, given their ability to be parametrised in a way that has intuitive meaning for market traders. A number of models in current use capture the stochastic nature of the volatility of rates, stocks, and other quantities, and this family of distributions may prove to be one of the more important ones.

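In the United States, the log-Pearson type III distribution (the distribution of a variable whose logarithm follows a Pearson type III distribution) is the default distribution for flood frequency analysis.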

Notes

Sources

Primary sources

* Pearson, Karl (1893). "Contributions to the mathematical theory of evolution [abstract]". Proceedings of the Royal Society of London 54: 329–333. doi:10.1098/rspl.1893.0079. http://links.jstor.org/sici?sici=0370-1662%281893%2954%3C329%3ACTTMTO%3E2.0.CO%3B2-D

* Pearson, Karl (1895). "Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material". Philosophical Transactions of the Royal Society of London 186: 343–414. doi:10.1098/rsta.1895.0010. http://links.jstor.org/sici?sici=0264-3820%281895%29186%3C343%3ACTTMTO%3E2.0.CO%3B2-W

* Pearson, Karl (1901). "Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 197: 443–459. doi:10.1098/rsta.1901.0023. http://links.jstor.org/sici?sici=0264-3952%281901%29197%3C443%3AMCTTTO%3E2.0.CO%3B2-S

* Pearson, Karl (1916). "Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 216: 429–457. doi:10.1098/rsta.1916.0009. http://links.jstor.org/sici?sici=0264-3952%281916%29216%3C429%3AMCTTTO%3E2.0.CO%3B2-7

* Rhind, A. (1909, July/October). "Tables to facilitate the computation of the probable errors of the chief constants of skew frequency distributions". Biometrika 7 (1/2): 127–147. http://links.jstor.org/sici?sici=0006-3444%28190907%2F10%297%3A1%2F2%3C127%3ATTFTCO%3E2.0.CO%3B2-R

Secondary sources

* Milton Abramowitz and Irene A. Stegun (1964). "Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables". National Bureau of Standards.

* Eric W. Weisstein et al. "Pearson Type III Distribution". From MathWorld. http://mathworld.wolfram.com/PearsonTypeIIIDistribution.html

References

* Elderton, Sir W. P.; Johnson, N. L. (1969). "Systems of Frequency Curves". Cambridge University Press.
* Ord, J. K. (1972). "Families of Frequency Distributions". Griffin, London.


Wikimedia Foundation. 2010.
