Multivariate Pólya distribution

The multivariate Pólya distribution, also called the Dirichlet compound multinomial distribution, is a compound probability distribution: a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and a set of discrete samples x is then drawn from the multinomial distribution with probability vector p. The compounding corresponds to a Pólya urn scheme. In document classification, for example, the distribution is used to model word counts for different document types.
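
The two-stage construction and the urn scheme can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation. The function names `sample_dirichlet`, `sample_polya_counts` and `sample_urn_counts` are hypothetical. The urn version assumes the usual reinforcement rule in which each draw adds one unit of weight to the category drawn.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw p ~ Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def sample_polya_counts(alpha, n, rng):
    """Compound construction: p ~ Dirichlet(alpha), then n categorical draws from p."""
    p = sample_dirichlet(alpha, rng)
    counts = [0] * len(alpha)
    for k in rng.choices(range(len(alpha)), weights=p, k=n):
        counts[k] += 1
    return counts

def sample_urn_counts(alpha, n, rng):
    """Urn construction: draw a category in proportion to its current weight,
    then add one unit of weight to that category (assumed reinforcement rule)."""
    weights = list(alpha)
    counts = [0] * len(alpha)
    for _ in range(n):
        k = rng.choices(range(len(alpha)), weights=weights, k=1)[0]
        counts[k] += 1
        weights[k] += 1.0
    return counts

rng = random.Random(0)  # fixed seed so that runs are repeatable
print(sample_polya_counts([0.5, 0.5, 0.5], 20, rng))
print(sample_urn_counts([0.5, 0.5, 0.5], 20, rng))
```

With small values of α both samplers tend to concentrate the counts on a few categories, which is the "burstiness" that makes the distribution attractive for word counts.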

The probability of a vector of counts x given the parameter vector α is obtained by integrating out the parameters p of the multinomial distribution:

$$\textrm{P}(\mathbf{x}\mid\boldsymbol{\alpha})=\int_{\mathbf{p}}\textrm{P}(\mathbf{x}\mid\mathbf{p})\,\textrm{P}(\mathbf{p}\mid\boldsymbol{\alpha})\,\textrm{d}\mathbf{p}$$

which results in the following explicit formula:

$$\textrm{P}(\mathbf{x}\mid\boldsymbol{\alpha})=\frac{\left(\sum_{k}n_{k}\right)!}{\prod_{k}\left(n_{k}!\right)}\,\frac{\Gamma\left(\sum_{k}\alpha_{k}\right)}{\Gamma\left(\sum_{k}\left(n_{k}+\alpha_{k}\right)\right)}\,\prod_{k}\frac{\Gamma(n_{k}+\alpha_{k})}{\Gamma(\alpha_{k})}$$

where $\Gamma$ is the gamma function and $n_{k}$ is the number of times outcome $k$ occurs in x.
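
The explicit formula can be evaluated numerically as follows. This is a sketch, assuming the counts are supplied as a list with one entry per outcome; the name `dcm_log_pmf` is hypothetical. It works in log space with `math.lgamma`, because the gamma function overflows quickly for large counts.

```python
from math import exp, lgamma

def dcm_log_pmf(counts, alpha):
    """Log-probability of a count vector under the explicit formula.

    counts[k] plays the role of n_k and alpha[k] the role of alpha_k.
    """
    n = sum(counts)
    a = sum(alpha)
    # log of (sum_k n_k)! / prod_k (n_k!), using k! = Gamma(k + 1)
    log_coef = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    # log of Gamma(sum_k alpha_k) / Gamma(sum_k (n_k + alpha_k))
    log_norm = lgamma(a) - lgamma(n + a)
    # log of prod_k Gamma(n_k + alpha_k) / Gamma(alpha_k)
    log_prod = sum(lgamma(c + al) - lgamma(al) for c, al in zip(counts, alpha))
    return log_coef + log_norm + log_prod

# Two outcomes, two draws, alpha = (1, 1): list the three possible count vectors.
for counts in ([2, 0], [1, 1], [0, 2]):
    print(counts, exp(dcm_log_pmf(counts, [1.0, 1.0])))
```

For α = (1, 1) and two draws, the formula assigns probability 1/3 to each of the three possible count vectors, which makes a convenient sanity check.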

The two-category case (K = 2) of the multivariate Pólya distribution is known as the beta-binomial model.
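
As a worked check of this special case, setting K = 2 in the explicit formula and writing $n=n_{1}+n_{2}$ recovers the usual beta-binomial probability. The beta function $B(a,b)=\Gamma(a)\Gamma(b)/\Gamma(a+b)$ is introduced here only for the comparison:

```latex
\textrm{P}(n_{1}\mid\alpha_{1},\alpha_{2})
  = \frac{n!}{n_{1}!\,n_{2}!}\,
    \frac{\Gamma(\alpha_{1}+\alpha_{2})}{\Gamma(n+\alpha_{1}+\alpha_{2})}\,
    \frac{\Gamma(n_{1}+\alpha_{1})\,\Gamma(n_{2}+\alpha_{2})}{\Gamma(\alpha_{1})\,\Gamma(\alpha_{2})}
  = \binom{n}{n_{1}}\,
    \frac{B(n_{1}+\alpha_{1},\; n-n_{1}+\alpha_{2})}{B(\alpha_{1},\alpha_{2})}
```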

The multivariate Pólya distribution is used in automated document classification and clustering, genetics, economics, combat modeling, and quantitative marketing.

See also

* Beta-binomial model
* Chinese restaurant process
* Dirichlet process
* Generalized Dirichlet distribution
* George Pólya
* Urn problem

References

* Elkan, C. (2006). Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution. ICML, 289-296. http://www.icml2006.org/icml_documents/camera-ready/037_Clustering_Documents.pdf
* Kvam, P. and Day, D. (2001). The multivariate Polya distribution in combat modeling. Naval Research Logistics, 48, 1-17.
* Madsen, R. E., Kauchak, D. and Elkan, C. (2005). Modeling Word Burstiness Using the Dirichlet Distribution. ICML, 545-552. http://www.cse.ucsd.edu/~dkauchak/kauchak05modeling.pdf
* Minka, T. (2003). Estimating a Dirichlet distribution. Technical report, Microsoft Research. Includes Matlab code for fitting distributions to data. http://research.microsoft.com/~minka/papers/dirichlet/
* Wagner, U. and Taudes, A. (1986). A Multivariate Polya Model of Brand Choice and Purchase Incidence. Marketing Science, 5(3), 219-244.


Wikimedia Foundation. 2010.
