Multivariate Pólya distribution

The multivariate Pólya distribution, named after George Pólya and also called the Dirichlet compound multinomial distribution, is a compound probability distribution: a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and a set of discrete samples is then drawn from the categorical distribution with probability vector p. The compounding corresponds to a Pólya urn scheme. In document classification, for example, the distribution is used to model word counts for different document types.
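
The urn correspondence can be illustrated with a short simulation. The following is a minimal sketch, not a reference implementation: it assumes the standard urn rule in which category k is drawn with probability proportional to its current weight (initially α_k) and that weight is then increased by one. The function name sample_polya_urn and the optional rng argument are hypothetical names chosen for this example.

```python
import random


def sample_polya_urn(alpha, n_draws, rng=None):
    """Draw one count vector from a Pólya urn (illustrative sketch).

    alpha   : sequence of positive numbers, the initial weight of each category
    n_draws : number of draws N
    rng     : optional random.Random instance (hypothetical convenience argument)
    """
    if rng is None:
        rng = random.Random()
    weights = [float(a) for a in alpha]  # current urn contents, one weight per category
    counts = [0] * len(weights)
    for _ in range(n_draws):
        # Pick category k with probability weights[k] / sum(weights).
        k = rng.choices(range(len(weights)), weights=weights)[0]
        counts[k] += 1
        # Reinforcement step: the drawn category becomes more likely next time.
        weights[k] += 1.0
    return counts


if __name__ == "__main__":
    print(sample_polya_urn([0.5, 1.0, 2.0], 20, random.Random(0)))
```

Each call returns a vector of K non-negative counts that sum to N. Small values of α_k make the reinforcement step dominate, so the counts tend to concentrate on a few categories.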

Probability mass function

Suppose N independent draws are made from a categorical distribution with K categories. Let x=(n1,n2,...,nK) denote the vector of counts, where nk is the number of times category k was drawn. If the parameter of the categorical distribution is p=(p1,p2,...,pK), where pk is the probability of drawing category k, then the distribution of the counts, P(x|p), is the multinomial distribution with parameter p. Here, however, p is not fixed; it is itself drawn from a Dirichlet distribution with parameter vector \boldsymbol\alpha=(\alpha_1,\alpha_2,\ldots,\alpha_K). The resulting compound distribution is obtained by integrating out p:

\Pr(\mathbf{x}\mid\boldsymbol{\alpha})=\int_{\mathbf{p}}\Pr(\mathbf{x}\mid \mathbf{p})\Pr(\mathbf{p}\mid\boldsymbol{\alpha})\textrm{d}\mathbf{p}

which results in the following explicit formula:

\Pr(\mathbf{x}\mid\boldsymbol{\alpha})=\frac{N!}{\prod_{k}\left(n_{k}!\right)}\frac{\Gamma\left(A\right)}{\Gamma\left(N+A\right)}\prod_{k}\frac{\Gamma(n_{k}+\alpha_{k})}{\Gamma(\alpha_{k})}

where Γ is the gamma function, with

A=\sum_k \alpha_k \,\text{and}\; N=\sum_k n_k.
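
The explicit formula can be evaluated numerically as follows. This is a minimal sketch using only the Python standard library; the function names polya_log_pmf and polya_pmf are hypothetical. It works with logarithms of the gamma function (math.lgamma) on the assumption that direct evaluation of Γ would overflow for moderately large N or A.

```python
import math


def polya_log_pmf(counts, alpha):
    """Log-probability of a count vector under the multivariate Pólya distribution.

    counts : sequence of non-negative integers (n_1, ..., n_K)
    alpha  : sequence of positive numbers (alpha_1, ..., alpha_K)
    """
    n_total = sum(counts)  # N
    a_total = sum(alpha)   # A
    # log( N! / prod_k n_k! ), using n! = Gamma(n + 1)
    log_p = math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    # log( Gamma(A) / Gamma(N + A) )
    log_p += math.lgamma(a_total) - math.lgamma(n_total + a_total)
    # log( prod_k Gamma(n_k + alpha_k) / Gamma(alpha_k) )
    log_p += sum(math.lgamma(n + a) - math.lgamma(a) for n, a in zip(counts, alpha))
    return log_p


def polya_pmf(counts, alpha):
    """Probability of a count vector (exponential of the log-probability)."""
    return math.exp(polya_log_pmf(counts, alpha))


if __name__ == "__main__":
    print(polya_pmf([2, 2], [1.0, 1.0]))
```

As a sanity check, with K = 2 and α = (1, 1) the formula reduces to 1/(N+1) for every count vector, so all five outcomes for N = 4 have probability 1/5.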

Another form

The probability mass function may be written more compactly in terms of the beta function, as follows:

\Pr(\mathbf{x}\mid\boldsymbol{\alpha})=\frac{N B\left(A,N\right)}{\prod_{k:n_k>0} n_k B\left(\alpha_k,n_k \right)}

where B is the beta function.
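
The agreement between this form and the gamma-function form can be checked term by term. The sketch below assumes N ≥ 1 and uses the identities B(a,b) = Γ(a)Γ(b)/Γ(a+b) and nΓ(n) = n! for a positive integer n:

```latex
N\,B(A,N)=\frac{N\,\Gamma(N)\,\Gamma(A)}{\Gamma(N+A)}=\frac{N!\,\Gamma(A)}{\Gamma(N+A)},
\qquad
n_k\,B(\alpha_k,n_k)=\frac{n_k\,\Gamma(n_k)\,\Gamma(\alpha_k)}{\Gamma(n_k+\alpha_k)}=\frac{n_k!\,\Gamma(\alpha_k)}{\Gamma(n_k+\alpha_k)}\quad(n_k>0)
```

Categories with n_k = 0 contribute a factor of 1 in the gamma-function form, since 0! = 1 and Γ(α_k)/Γ(α_k) = 1, which is why the product here runs only over categories with n_k > 0.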

Related distributions

The one-dimensional version of the multivariate Pólya distribution, that is, the case of K = 2 categories, is known as the beta-binomial distribution.
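
The reduction can be written out explicitly. In this sketch the symbols n, α and β are introduced only for illustration: with K = 2, write the counts as (n, N−n) and the parameter vector as (α, β), so that A = α + β. Substituting into the gamma-function form of the probability mass function gives

```latex
\Pr(n\mid\alpha,\beta)
=\frac{N!}{n!\,(N-n)!}\,
\frac{\Gamma(\alpha+\beta)}{\Gamma(N+\alpha+\beta)}\,
\frac{\Gamma(n+\alpha)\,\Gamma(N-n+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}
=\binom{N}{n}\frac{B(n+\alpha,\,N-n+\beta)}{B(\alpha,\beta)}
```

which is the usual way of writing the beta-binomial probability mass function.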

Uses

The multivariate Pólya distribution is used in automated document classification and clustering, genetics, economics, combat modeling, and quantitative marketing.

Wikimedia Foundation. 2010.
