Hotelling's T-square distribution
In statistics, Hotelling's T-square statistic, [H. Hotelling (1931) "The generalization of Student's ratio", Ann. Math. Statist., Vol. 2, pp. 360–378.] named for Harold Hotelling, is a generalization of Student's t statistic that is used in multivariate hypothesis testing. Hotelling's T-square statistic is defined as

:t^2 = n(\mathbf{x} - \boldsymbol{\mu})' \mathbf{W}^{-1} (\mathbf{x} - \boldsymbol{\mu})

where n is the number of points (see below), \mathbf{x} is a column vector of p elements and \mathbf{W} is a p \times p sample covariance matrix.

If \mathbf{x} \sim N_p(\boldsymbol{\mu}, \mathbf{V}) is a random variable with a multivariate Gaussian distribution and \mathbf{W} \sim W_p(m, \mathbf{V}) (independent of \mathbf{x}) has a Wishart distribution with the same non-singular variance matrix \mathbf{V} and with m = n - 1, then the distribution of t^2 is T^2(p, m), Hotelling's T-square distribution with parameters p and m. It can be shown that

:\frac{m-p+1}{pm} T^2 \sim F_{p,\, m-p+1}

where F is the F-distribution.

Now suppose that
:\mathbf{x}_1, \dots, \mathbf{x}_n

are p×1 column vectors whose entries are real numbers. Let

:\overline{\mathbf{x}} = (\mathbf{x}_1 + \cdots + \mathbf{x}_n)/n

be their mean. Let the p×p positive-definite matrix

:\mathbf{W} = \sum_{i=1}^n (\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})'/(n-1)

be their "sample variance" matrix. (The transpose of any matrix M is denoted above by M′.) Let \boldsymbol{\mu} be some known p×1 column vector (in applications, a hypothesized value of a population mean). Then Hotelling's T-square statistic is

:t^2 = n(\overline{\mathbf{x}} - \boldsymbol{\mu})' \mathbf{W}^{-1} (\overline{\mathbf{x}} - \boldsymbol{\mu}).

Note that t^2 is closely related to the squared Mahalanobis distance. If the \mathbf{x}_i are drawn independently from N_p(\boldsymbol{\mu}, \mathbf{V}), then

:\mathbf{W} \sim W_p(\mathbf{V}, n-1)

and is independent of \overline{\mathbf{x}}, and

:\overline{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \mathbf{V}/n).

This implies that

:t^2 = n(\overline{\mathbf{x}} - \boldsymbol{\mu})' \mathbf{W}^{-1} (\overline{\mathbf{x}} - \boldsymbol{\mu}) \sim T^2(p, n-1).
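The construction above translates directly into code. The following is a minimal sketch of the one-sample test using NumPy and SciPy (the function name `hotelling_one_sample` is illustrative, not a standard API); it converts t^2 to an F statistic via the relation given earlier with m = n - 1:

```python
import numpy as np
from scipy.stats import f

def hotelling_one_sample(X, mu):
    """One-sample Hotelling T^2 test of H0: population mean = mu.

    X  : (n, p) array whose rows are the observations x_1, ..., x_n
    mu : (p,) hypothesized mean vector
    Returns (t2, f_stat, p_value).
    """
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar
    W = Xc.T @ Xc / (n - 1)              # sample covariance matrix
    d = xbar - mu
    t2 = n * d @ np.linalg.solve(W, d)   # n (xbar - mu)' W^{-1} (xbar - mu)
    # with m = n - 1: (m - p + 1) / (p m) T^2 ~ F(p, m - p + 1)
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = f.sf(f_stat, p, n - p)
    return t2, f_stat, p_value
```

For p = 1 the scaling factor is 1 and the procedure reduces exactly to the two-sided one-sample t-test, since t^2 \sim F(1, n-1).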
Hotelling's two-sample T-square statistic
If \mathbf{x}_1, \dots, \mathbf{x}_{n_x} \sim N_p(\boldsymbol{\mu}, \mathbf{V}) and \mathbf{y}_1, \dots, \mathbf{y}_{n_y} \sim N_p(\boldsymbol{\mu}, \mathbf{V}), with the samples independently drawn from two independent multivariate normal distributions with the same mean and covariance, and we define

:\overline{\mathbf{x}} = \frac{1}{n_x}\sum_{i=1}^{n_x} \mathbf{x}_i \qquad \overline{\mathbf{y}} = \frac{1}{n_y}\sum_{i=1}^{n_y} \mathbf{y}_i

as the sample means, and

:\mathbf{W} = \frac{\sum_{i=1}^{n_x}(\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})' + \sum_{i=1}^{n_y}(\mathbf{y}_i - \overline{\mathbf{y}})(\mathbf{y}_i - \overline{\mathbf{y}})'}{n_x + n_y - 2}

as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-square statistic is

:t^2 = \frac{n_x n_y}{n_x + n_y}(\overline{\mathbf{x}} - \overline{\mathbf{y}})' \mathbf{W}^{-1} (\overline{\mathbf{x}} - \overline{\mathbf{y}}) \sim T^2(p, n_x + n_y - 2)
and it can be related to the F-distribution by

:\frac{n_x + n_y - p - 1}{(n_x + n_y - 2)p}\, t^2 \sim F(p,\, n_x + n_y - 1 - p).
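The two-sample statistic and its F conversion can be sketched the same way (again, the function name `hotelling_two_sample` is illustrative, not a standard API):

```python
import numpy as np
from scipy.stats import f

def hotelling_two_sample(X, Y):
    """Two-sample Hotelling T^2 test of H0: equal population mean vectors.

    X : (n_x, p) array, Y : (n_y, p) array; rows are observations.
    Returns (t2, f_stat, p_value).
    """
    nx, p = X.shape
    ny, _ = Y.shape
    xbar, ybar = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - xbar, Y - ybar
    # unbiased pooled covariance estimate
    W = (Xc.T @ Xc + Yc.T @ Yc) / (nx + ny - 2)
    d = xbar - ybar
    t2 = nx * ny / (nx + ny) * d @ np.linalg.solve(W, d)
    f_stat = (nx + ny - p - 1) / ((nx + ny - 2) * p) * t2
    p_value = f.sf(f_stat, p, nx + ny - p - 1)
    return t2, f_stat, p_value
```

For p = 1 this reduces exactly to the two-sided pooled-variance (Student) two-sample t-test, since t^2 \sim F(1, n_x + n_y - 2).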
The non-null distribution of this statistic is the noncentral F-distribution (the ratio of a noncentral chi-square random variable and an independent central chi-square random variable):

:\frac{n_x + n_y - p - 1}{(n_x + n_y - 2)p}\, t^2 \sim F(p,\, n_x + n_y - 1 - p;\, \delta)

with

:\delta = \frac{n_x n_y}{n_x + n_y} \boldsymbol{\nu}' \mathbf{V}^{-1} \boldsymbol{\nu},

where \boldsymbol{\nu} is the difference vector between the population means.

See also
* Student's t-distribution (the univariate equivalent)
* F-distribution (commonly tabulated or available in software libraries, and hence used for testing the T-square statistic using the relationship given above)
* Wilks' lambda distribution (in multivariate statistics, Wilks' lambda is to Hotelling's T^2 as Snedecor's F is to Student's t in univariate statistics)