Multinomial test

In statistics, the multinomial test is the test of the null hypothesis that the parameters of a multinomial distribution equal specified values. It is used for categorical data; see Read and Cressie (1988).

We begin with a sample of $N$ items, each of which has been observed to fall into one of $k$ categories. We define $\mathbf{x} = (x_1, x_2, \dots, x_k)$ as the observed numbers of items in each cell, so that $\textstyle \sum_{i=1}^k x_{i} = N$.

Next, we define a vector of parameters $H_0: \boldsymbol{\pi} = (\pi_{1}, \pi_{2}, \dots, \pi_{k})$, where $\textstyle \sum_{i=1}^k \pi_{i} = 1$. These are the parameter values under the null hypothesis.

The exact probability of the observed configuration $\mathbf{x}$ under the null hypothesis is given by

$\Pr(\mathbf{x})_0 = N! \prod_{i=1}^k \frac{\pi_{i}^{x_i}}{x_i!}.$
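The formula above can be evaluated directly. The following is a minimal Python sketch, not a reference implementation; the function name `multinomial_pmf` is chosen here for illustration and does not come from the source.

```python
from math import factorial, prod

def multinomial_pmf(x, pi):
    """Probability of the count vector x under cell probabilities pi.

    Implements N! * prod(pi_i**x_i / x_i!) with an exact integer coefficient.
    """
    n = sum(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)  # each partial quotient is an exact integer
    return coef * prod(p ** xi for p, xi in zip(pi, x))

# Example: N = 5 items in k = 3 equally likely categories.
print(multinomial_pmf((3, 1, 1), (1/3, 1/3, 1/3)))
```

For large $N$ the integer coefficient grows quickly, so working with logarithms may be preferable in practice.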

The significance probability for the test is the probability of occurrence of the data set observed, or of a data set less likely than that observed, if the null hypothesis is true. Using an exact test, this is calculated as

$\Pr(\mathrm{sig})=\sum_{\mathbf{y}: \Pr(\mathbf{y}) \le \Pr(\mathbf{x})_0} \Pr(\mathbf{y})$

where the sum ranges over all outcomes as likely as, or less likely than, that observed. In practice this becomes computationally onerous as $k$ and $N$ increase, because the number of possible outcomes is $\binom{N+k-1}{k-1}$, so exact tests are usually worth using only for small samples. For larger samples, asymptotic approximations are accurate enough and easier to calculate.
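The exact calculation described above can be sketched in Python by enumerating every possible outcome and summing the probabilities of those no more likely than the observed one. The names `compositions` and `exact_multinomial_p`, and the small relative tolerance used to absorb floating-point ties, are assumptions made for this illustration.

```python
from math import factorial, prod

def multinomial_pmf(x, pi):
    """N! * prod(pi_i**x_i / x_i!) for counts x and probabilities pi."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(p ** xi for p, xi in zip(pi, x))

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def exact_multinomial_p(x, pi, rel_tol=1e-9):
    """Sum Pr(y) over all outcomes y with Pr(y) <= Pr(x) under the null."""
    p_obs = multinomial_pmf(x, pi)
    total = 0.0
    for y in compositions(sum(x), len(x)):
        p_y = multinomial_pmf(y, pi)
        # rel_tol (an assumption) treats near-equal probabilities as ties
        if p_y <= p_obs * (1 + rel_tol):
            total += p_y
    return total

# Two items, two equally likely categories, both in the first cell.
print(exact_multinomial_p((2, 0), (0.5, 0.5)))
```

The loop visits every outcome, which is why this approach is only practical for small $N$ and $k$.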

One of these approximations is the likelihood ratio test. We set up an alternative hypothesis under which each value $\pi_i$ is replaced by its maximum likelihood estimate $p_i = x_i / N$. The exact probability of the observed configuration $\mathbf{x}$ under the alternative hypothesis is given by

$\Pr(\mathbf{x})_A = N! \prod_{i=1}^k \frac{p_{i}^{x_i}}{x_i!}.$

The natural logarithm of the ratio between these two probabilities, multiplied by $-2$, is then the statistic for the likelihood ratio test

$-2\ln(LR) = \textstyle -2\sum_{i=1}^k x_{i}\ln(\pi_{i}/p_{i}) .$
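As a minimal sketch, the statistic can be computed in Python as follows. The function name `lr_statistic` is hypothetical, and cells with $x_i = 0$ are skipped on the usual convention that they contribute nothing to the sum (an assumption stated here, not in the source).

```python
from math import log

def lr_statistic(x, pi):
    """-2 ln(LR) = -2 * sum(x_i * ln(pi_i / p_i)), with p_i = x_i / N."""
    n = sum(x)
    # pi_i / p_i is written as pi_i * N / x_i; empty cells are skipped
    return -2.0 * sum(xi * log(p * n / xi) for xi, p in zip(x, pi) if xi > 0)

# Ten items, all in the first of two equally likely categories.
print(lr_statistic((10, 0), (0.5, 0.5)))
```

When the observed proportions equal the null values exactly, every logarithm is zero and so is the statistic.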

If the null hypothesis is true, then as $N$ increases, the distribution of $-2\ln(LR)$ converges to that of chi-squared with $k-1$ degrees of freedom. However, it has long been known (e.g. Lawley 1956) that for finite sample sizes the moments of $-2\ln(LR)$ are greater than those of chi-squared, thus inflating the probability of type I errors (false positives). The difference between the moments of chi-squared and those of the test statistic is a function of $N^{-1}$. Williams (1976) showed that the first moment can be matched as far as $N^{-2}$ if the test statistic is divided by a factor given by

$q_1 = 1+\frac{\sum_{i=1}^k \pi_{i}^{-1}-1}{6N(k-1)}.$

In the special case where the null hypothesis is that all the values $\pi_i$ are equal to $1/k$ (i.e. it stipulates a uniform distribution), this simplifies to

$q_1 = 1+\frac{k+1}{6N}.$
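
The simplification can be checked directly from the general factor: with $\pi_i = 1/k$ for every $i$, each $\pi_i^{-1}$ equals $k$, so $\textstyle \sum_{i=1}^k \pi_{i}^{-1} = k^2$ and

$q_1 = 1+\frac{k^2-1}{6N(k-1)} = 1+\frac{(k-1)(k+1)}{6N(k-1)} = 1+\frac{k+1}{6N}.$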

Subsequently, Smith et al. (1981) derived a dividing factor which matches the first moment as far as $N^{-3}$. For the case of equal values of $\pi_i$, this factor is

$q_2 = 1+\frac{k+1}{6N}+\frac{k^2}{6N^2}.$
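The correction factors above are simple to compute. The sketch below is illustrative only: the function names `williams_q1` and `smith_q2_uniform` are made up for this example, and the corrected statistic is obtained by dividing $-2\ln(LR)$ by the chosen factor.

```python
def williams_q1(pi, n):
    """q1 = 1 + (sum(1/pi_i) - 1) / (6 N (k - 1)) for null probabilities pi."""
    k = len(pi)
    return 1.0 + (sum(1.0 / p for p in pi) - 1.0) / (6.0 * n * (k - 1))

def smith_q2_uniform(k, n):
    """q2 = 1 + (k + 1)/(6N) + k^2/(6N^2), for equal pi_i only."""
    return 1.0 + (k + 1) / (6.0 * n) + k ** 2 / (6.0 * n ** 2)

# Uniform null with k = 4 categories and N = 20 items.
q1 = williams_q1([0.25] * 4, 20)
q2 = smith_q2_uniform(4, 20)
print(q1, q2)  # divide -2 ln(LR) by q1 or q2 to get the corrected statistic
```

For a uniform null, `williams_q1` agrees with the simplified form $1 + (k+1)/(6N)$.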

The null hypothesis can also be tested by using Pearson's chi-squared test

$\chi^2 = \sum_{i=1}^{k} {(x_i - E_i)^2 \over E_i}$

where $E_i = N\pi_i$ is the expected number of cases in category $i$ under the null hypothesis. This statistic also converges to a chi-squared distribution with $k-1$ degrees of freedom when the null hypothesis is true, but it does so from below rather than from above as $-2\ln(LR)$ does, so it may be preferable to the uncorrected version of $-2\ln(LR)$ for small samples.
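
Pearson's statistic can be sketched in Python as below. The function name `pearson_chi2` is hypothetical, and the sketch assumes every $\pi_i$ is strictly positive so that no expected count is zero.

```python
def pearson_chi2(x, pi):
    """Sum of (x_i - E_i)^2 / E_i with E_i = N * pi_i (assumes all pi_i > 0)."""
    n = sum(x)
    return sum((xi - n * p) ** 2 / (n * p) for xi, p in zip(x, pi))

# Ten items, all in the first of two equally likely categories.
print(pearson_chi2((10, 0), (0.5, 0.5)))
```

The result is then referred to a chi-squared distribution with $k-1$ degrees of freedom to obtain an approximate significance probability.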

References

Read, T. R. C. and Cressie, N. A. C. (1988). Goodness-of-fit statistics for discrete multivariate data. New York: Springer-Verlag. ISBN 0-387-96682-X.

Lawley, D. N. (1956). "A General Method of Approximating to the Distribution of Likelihood Ratio Criteria". Biometrika 43: 295–303.
Smith, P. J., Rae, D. S., Manderscheid, R. W. and Silbergeld, S. (1981). "Approximating the Moments and Distribution of the Likelihood Ratio Statistic for Multinomial Goodness of Fit". Journal of the American Statistical Association (American Statistical Association) 76 (375): 737–740. doi:10.2307/2287541. JSTOR 2287541.
Williams, D. A. (1976). "Improved Likelihood Ratio Tests for Complete Contingency Tables". Biometrika 63: 33–37. doi:10.1093/biomet/63.1.33.

Wikimedia Foundation. 2010.
