- Ancillary statistic
In statistics, an ancillary statistic is a statistic whose probability distribution does not depend on which of the probability distributions among those being considered is the distribution of the statistical population from which the data were taken. This concept was introduced by the great statistical geneticist Sir Ronald Fisher.

Example
Suppose $X_1, \ldots, X_n$ are independent and identically distributed, and are normally distributed with expected value $\mu$ and variance 1. Let
$$\overline{X}_n = \frac{X_1 + \cdots + X_n}{n}$$
be the sample mean. The following statistical measures of dispersion of the sample
* Range: $\max(X_1, \ldots, X_n) - \min(X_1, \ldots, X_n)$
* Interquartile range: $Q_3 - Q_1$
* Sample variance: $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (X_i - \overline{X}_n)^2}{n}$
are all ancillary statistics, because their probability distributions do not change as $\mu$ changes: each of them is unchanged when the same constant is added to every observation.

Ancillary complement
Given a statistic $T$ that is not sufficient, an ancillary complement is a statistic $U$ that is ancillary to $T$ and such that $(T, U)$ is sufficient. [Ancillary Statistics: A Review, by M. Ghosh, N. Reid and D.A.S. Fraser, http://www.utstat.toronto.edu/dfraser/documents/237.pdf] Intuitively, an ancillary complement "adds the missing information" (without duplicating any).
This notion is particularly useful if one takes $T$ to be a maximum likelihood estimator, which in general will not be sufficient; one can then ask for an ancillary complement. In this case, Fisher argues that one must condition on an ancillary complement to determine information content: the Fisher information content of $T$ should be taken not from the marginal distribution of $T$, but from the conditional distribution of $T$ given $U$, which answers the question of how much information $T$ adds. This is not possible in general, as no ancillary complement need exist, and if one exists, it need not be unique, nor does a maximum ancillary complement exist.

Example
In baseball, suppose a scout observes a batter in $N$ at-bats. Suppose (unrealistically) that the number $N$ is chosen by some random process that is independent of the batter's ability; say a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number $N$ of at-bats and the number $X$ of hits; together, the pair $(N, X)$ is a sufficient statistic. The observed batting average $X/N$ fails to convey all of the information available in the data because it fails to report the number $N$ of at-bats (e.g., a batting average of .400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a .400 average based on 100 at-bats). The number $N$ of at-bats is an ancillary statistic because
* It is a part of the observable data (it is a "statistic"), and
* Its probability distribution does not depend on the batter's ability, since it was chosen by a random process independent of the batter's ability.

This ancillary statistic is an ancillary complement to the observed batting average $X/N$: the batting average $X/N$ is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but conjoined with $N$, it becomes sufficient.
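
Both examples in this article can be checked numerically. The following Python sketch is an illustration only and is not taken from the cited review. The function names (`dispersion_stats`, `scout`, `at_bat_counts`), the seeds, the sample size of 25, and the stay probability of 1/2 for the scout's coin are all assumptions made for the sketch. It reuses the same random draws under different parameter values, so any dependence of a statistic on the parameter would show up as a difference in the output.

```python
import random
import statistics


def dispersion_stats(mu, n, rng):
    """Draw n values from N(mu, 1); return (range, IQR, variance with divisor n)."""
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    q1, _, q3 = statistics.quantiles(xs, n=4)  # the three quartile cut points
    return max(xs) - min(xs), q3 - q1, statistics.pvariance(xs)


def scout(p_hit, coin, bat, p_stay=0.5):
    """Simulate one scouting session; return (N, X) = (at-bats, hits)."""
    n_at_bats, hits = 0, 0
    while True:
        n_at_bats += 1
        hits += int(bat.random() < p_hit)  # a hit occurs with probability p_hit
        if coin.random() >= p_stay:        # coin toss says the scout leaves
            return n_at_bats, hits


def at_bat_counts(p_hit, trials=10_000, seed=1):
    """Values of N over many sessions; the coin stream never looks at p_hit."""
    coin, bat = random.Random(seed), random.Random(seed + 1)
    return [scout(p_hit, coin, bat)[0] for _ in range(trials)]


if __name__ == "__main__":
    # Same underlying noise, two different means: the three dispersion
    # measures agree up to floating-point rounding.
    print(dispersion_stats(0.0, 25, random.Random(7)))
    print(dispersion_stats(50.0, 25, random.Random(7)))

    # Same coin tosses, two different batting abilities: N is unchanged.
    print(at_bat_counts(0.2) == at_bat_counts(0.4))
    print(statistics.mean(at_bat_counts(0.3)))
```

Separate random streams are used for the coin and for the hits so that the independence assumed in the baseball example is explicit in the code: the list of at-bat counts is determined by the coin stream alone. This coupling shows invariance for a fixed set of draws, which is a stronger and easier-to-check property than equality of distributions.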

See also
* Basu's theorem
Wikimedia Foundation. 2010.