U-statistic

In statistical theory, a U-statistic is an estimator obtained by averaging a basic unbiased estimator over all sub-samples of a given size drawn from the data. One use of the concept is that it allows a minimum-variance unbiased estimator to be derived from essentially any unbiased estimator [Cox & Hinkley (1974), p. 200, p. 258] [Hoeffding (1948), between Eqs. (4.3) and (4.4)], in contexts where no assumption is made about the form of the distribution and where estimation is for a functional (such as the mean or variance) of the unknown distribution. Of even more importance [Sen (1992)] is that the theory of U-statistics provides a single theoretical framework in non-parametric statistics for proving results about the asymptotic normality and the finite-sample variance of a wide range of test statistics and estimators. In addition, the theory has applications to estimators which are not themselves U-statistics.

Suppose that a problem involves independent and identically-distributed random variables and that estimation of a certain parameter is required. Suppose that a simple unbiased estimate can be constructed based on only a few observations: this defines the basic estimator based on a given number of observations. For example, a single observation is itself an unbiased estimate of the mean and a pair of observations can be used to derive an unbiased estimate of the variance. The U-statistic based on this estimator is defined as the average (across all combinatorial selections of the given size from the full set of observations) of the basic estimator applied to the sub-samples.
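The construction just described can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `u_statistic`, `mean_kernel` and `variance_kernel` are hypothetical, and the basic estimator is assumed to be symmetric in its arguments, so that averaging over unordered sub-samples is enough.

```python
from itertools import combinations
from statistics import variance


def u_statistic(data, kernel, r):
    """Average a symmetric kernel of r arguments over all size-r sub-samples."""
    if len(data) < r:
        raise ValueError("need at least r observations")
    subsets = list(combinations(data, r))
    return sum(kernel(*s) for s in subsets) / len(subsets)


def mean_kernel(x):
    # A single observation is an unbiased estimate of the mean.
    return x


def variance_kernel(x1, x2):
    # Half the squared difference of a pair is an unbiased estimate of the variance.
    return (x1 - x2) ** 2 / 2


data = [2.0, 4.0, 4.0, 7.0]  # illustrative values only
u_mean = u_statistic(data, mean_kernel, 1)
u_var = u_statistic(data, variance_kernel, 2)
print(u_mean, sum(data) / len(data))  # the two numbers agree
print(u_var, variance(data))          # the two numbers agree
```

Enumerating every sub-sample costs on the order of n^r kernel evaluations, so this direct approach is only practical for small samples or small r.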

Note that the theory of U-statistics set out by Hoeffding (1948) is not limited to the case of independent and identically-distributed random variables or to scalar random variables [Sen (1992), p. 306].

Sen (1992) provides a review of the paper by Hoeffding (1948), which introduced U-statistics and set out the theory relating to them, and in doing so outlines the importance U-statistics have in statistical theory. Sen says [Sen (1992), p. 307]: "The impact of Hoeffding (1948) is overwhelming at the present time and is very likely to continue in the years to come".

Formal definition

The term U-statistic, due to Hoeffding (1948), is defined as follows. Let $f\colon R^r \to R$ be a real-valued or complex-valued function of $r$ variables. For each $n \ge r$ the associated U-statistic $f_n\colon R^n \to R$ is equal to the average over ordered samples $\varphi(1), \ldots, \varphi(r)$ of size $r$ of the sample values $f(x\varphi)$. In other words, $f_n(x_1, \ldots, x_n) = \operatorname{ave} f(x_{\varphi(1)}, \ldots, x_{\varphi(r)})$, the average being taken over distinct ordered samples of size $r$ taken from $\{1, \ldots, n\}$. Each U-statistic $f_n(x_1, \ldots, x_n)$ is necessarily a symmetric function.

U-statistics are very natural in statistical work, particularly in Hoeffding's context of independent and identically distributed random variables. They also arise naturally in the context of simple random sampling from a finite population, where the defining property is termed 'inheritance on the average'. Fisher's k-statistics and Tukey's polykays are examples of homogeneous polynomial U-statistics (Fisher, 1929; Tukey, 1950). For a simple random sample $\varphi$ of size $n$ taken from a population of size $N$, the U-statistic has the property that the average over sample values $f_n(x\varphi)$ is exactly equal to the population value $f_N(x)$.
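The 'inheritance on the average' property can be checked numerically on a small finite population. The sketch below is illustrative only: the population values, the sample size and the function names are assumptions, and the variance kernel $(x_1 - x_2)^2/2$ is used as the basic estimator. Exact rational arithmetic avoids rounding issues in the comparison.

```python
from fractions import Fraction
from itertools import combinations


def u_statistic(values, kernel, r):
    """Average a symmetric kernel over all size-r subsets of values."""
    subsets = list(combinations(values, r))
    return sum(kernel(*s) for s in subsets) / len(subsets)


def variance_kernel(a, b):
    # Exact half squared difference of two integers.
    return Fraction((a - b) ** 2, 2)


population = [1, 3, 4, 8, 10, 15]  # hypothetical population, N = 6
n = 4                              # size of each simple random sample

# Population value f_N(x).
population_value = u_statistic(population, variance_kernel, 2)

# Sample values f_n over every possible sample of size n, then their average.
sample_values = [
    u_statistic(sample, variance_kernel, 2)
    for sample in combinations(population, n)
]
average_over_samples = sum(sample_values) / len(sample_values)

print(population_value == average_over_samples)
```

The equality holds because every pair of population members appears in the same number of samples, so each kernel value receives equal weight in the overall average.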

Examples

Some examples: If $f(x) = x$, the U-statistic $f_n(x) = \bar{x}_n = (x_1 + \cdots + x_n)/n$ is the sample mean.

If $f(x_1, x_2) = |x_1 - x_2|$, the U-statistic is the mean pairwise deviation $f_n(x_1, \ldots, x_n) = \sum_{i \neq j} |x_i - x_j| / (n(n-1))$, defined for $n \ge 2$.

If $f(x_1, x_2) = (x_1 - x_2)^2/2$, the U-statistic is the sample variance $f_n(x) = \sum (x_i - \bar{x}_n)^2 / (n-1)$ with divisor $n-1$, defined for $n \ge 2$.
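The identity behind the sample-variance example can be verified directly. The derivation below is a worked sketch using only the definitions above; the sum over $i \neq j$ is extended to all pairs $(i, j)$ because the terms with $i = j$ vanish.

$$
\begin{aligned}
\frac{1}{n(n-1)} \sum_{i \neq j} \frac{(x_i - x_j)^2}{2}
&= \frac{1}{2n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 \\
&= \frac{1}{2n(n-1)} \left( 2n \sum_{i=1}^{n} x_i^2 - 2 \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2 \right) \\
&= \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - n \bar{x}_n^2 \right) \\
&= \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2 .
\end{aligned}
$$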

The third k-statistic $k_{3,n}(x) = \sum (x_i - \bar{x}_n)^3 \, n / ((n-1)(n-2))$, the sample skewness defined for $n \ge 3$, is a U-statistic.
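A numerical check of this claim is sketched below. The text does not say which function of three variables generates the third k-statistic; the code assumes the non-symmetric kernel $x_1^3 - 3x_1^2 x_2 + 2x_1 x_2 x_3$, whose expectation for independent and identically distributed variables is the third cumulant, and averages it over ordered samples as in the formal definition. All names and data values are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def u_statistic_ordered(values, kernel, r):
    """Average a kernel over all ordered samples of r distinct positions."""
    samples = list(permutations(values, r))
    return sum(kernel(*s) for s in samples) / len(samples)


def third_cumulant_kernel(a, b, c):
    # Assumed kernel: E[a^3] - 3 E[a^2] E[b] + 2 E[a] E[b] E[c] is the third cumulant.
    return a ** 3 - 3 * a ** 2 * b + 2 * a * b * c


def k3(values):
    """Closed form: n * sum((x - mean)^3) / ((n - 1) * (n - 2))."""
    n = len(values)
    mean = sum(values) / n
    return n * sum((x - mean) ** 3 for x in values) / ((n - 1) * (n - 2))


# Exact rational data so the two expressions can be compared without rounding.
data = [Fraction(v) for v in (0, 1, 1, 4, 9)]

u_value = u_statistic_ordered(data, third_cumulant_kernel, 3)
closed_form = k3(data)
print(u_value == closed_form)
```

Averaging over ordered samples symmetrizes the kernel automatically, which is why a non-symmetric choice of kernel is acceptable here.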

The following case highlights an important point. If $f(x_1, x_2, x_3)$ is the median of three values, $f_n(x_1, \ldots, x_n)$ is not the median of $n$ values. However, it is a minimum-variance unbiased estimate of the expected value of the median of three values. In this application of the theory, the population parameter being estimated is "the expected value of the median of three values", not the median of the population. Similar estimates play a central role where the parameters of a family of probability distributions are being estimated by probability-weighted moments or L-moments.
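The distinction can be seen on a small data set. The following sketch uses hypothetical data and function names; it computes the U-statistic whose kernel is the median of three values and compares it with the ordinary sample median.

```python
from itertools import combinations
from statistics import median


def u_statistic(data, kernel, r):
    """Average a symmetric kernel over all size-r sub-samples."""
    subsets = list(combinations(data, r))
    return sum(kernel(*s) for s in subsets) / len(subsets)


def median_of_three(a, b, c):
    # Middle value of the three arguments.
    return sorted((a, b, c))[1]


data = [1, 2, 3, 50, 100]  # illustrative, deliberately skewed values

u_value = u_statistic(data, median_of_three, 3)  # average of 10 triple medians
sample_median = median(data)                     # ordinary median of all 5 values

print(u_value, sample_median)  # 16.8 versus 3
```

The two numbers differ because the U-statistic is a weighted average of the interior order statistics, whereas the sample median picks out a single central value.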

References

Cox, D.R., Hinkley, D.V. (1974) Theoretical statistics. Chapman and Hall. ISBN 0-412-12420-3

Fisher, R.A. (1929) Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society, 2, 30:199-238.

Hoeffding, W. (1948) A class of statistics with asymptotically normal distributions. Annals of Mathematical Statistics, 19:293-325. (Partially reprinted in: Kotz, S., Johnson, N.L. (1992) Breakthroughs in Statistics, Vol I, pp. 308-334. Springer-Verlag. ISBN 0-387-94037-5)

Lee, A.J. (1990) U-Statistics: Theory and Practice. Marcel Dekker, New York. 320 pp. ISBN 0824782534

Sen, P.K. (1992) Introduction to Hoeffding (1948) A Class of Statistics with Asymptotically Normal Distribution. In: Kotz, S., Johnson, N.L. Breakthroughs in Statistics, Vol I, pp. 299-307. Springer-Verlag. ISBN 0-387-94037-5

Tukey, J.W. (1950) Some Sampling Simplified. J. Amer. Statist. Assoc. 45:501-519.


Wikimedia Foundation. 2010.
