High-dimensional statistics

High-dimensional statistics is the branch of mathematical and applied multivariate statistics that deals with statistical data whose dimension is so large that it is comparable in magnitude to the sample size and may be much greater.

History

The classical Fisher approach to statistics is based on the concept of a fixed population and a fixed model whose parameters can be estimated with arbitrary precision as data accumulate. The main requirement on estimators is consistency, that is, convergence to the unknown true population parameters. Well-known classical statistical procedures provide satisfactory solutions for "p"-dimensional data only for samples of size "n" much greater than "p". In multivariate statistics, however, practitioners often encounter situations in which the programs included in most standard statistical packages prove inefficient and do not guarantee stable results. The existing theory could recommend nothing other than ignoring part of the data in the hope of obtaining a plausible solution.

In 1968 A. N. Kolmogorov proposed a different setting for statistical problems and a different asymptotics, in which the dimension of the variables "p" increases along with the sample size "n" so that the ratio "p"/"n" tends to a constant. It was called “the increasing dimension asymptotics” or “the Kolmogorov asymptotics” (see [1]). This method makes it easy to isolate the principal terms of error probabilities and of standard quality functions for large "p" and "n". On the other hand, the basic concept of traditional statistics changes: interest in the estimation of separate parameters and in consistency is replaced by the maximization of quality functions in the sense of Wald's optimal decision rules.
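The effect of this asymptotic regime can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not part of the cited theory: the data are standard normal with identity covariance and known zero mean, and the helper names `extreme_eigenvalues` and `marchenko_pastur_edges` are hypothetical. With "p"/"n" held at 0.5, the extreme eigenvalues of the sample covariance matrix do not approach the true value 1 as "n" grows; they settle near the Marchenko-Pastur edges (1 ± √y)², where y = "p"/"n".

```python
import numpy as np


def extreme_eigenvalues(n, p, seed=0):
    """Smallest and largest eigenvalue of the sample covariance matrix.

    Assumption for illustration: n observations of p independent
    standard normal variables, so the true covariance is the identity
    and every true eigenvalue equals 1.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))
    s = x.T @ x / n  # sample covariance with known zero mean
    eig = np.linalg.eigvalsh(s)  # eigenvalues in ascending order
    return float(eig[0]), float(eig[-1])


def marchenko_pastur_edges(y):
    """Limiting spectrum edges (1 - sqrt(y))^2, (1 + sqrt(y))^2 for 0 < y <= 1."""
    root = np.sqrt(y)
    return float((1 - root) ** 2), float((1 + root) ** 2)


if __name__ == "__main__":
    y = 0.5  # fixed ratio p/n, an arbitrary choice for the example
    print("limit edges:", marchenko_pastur_edges(y))
    for n in (100, 200, 400, 800):
        p = int(y * n)
        smallest, largest = extreme_eigenvalues(n, p)
        print(f"n={n:4d} p={p:4d} smallest={smallest:.3f} largest={largest:.3f}")
```

In the classical regime ("p" fixed, "n" growing) both printed eigenvalues would move toward 1; here they stay spread out however large "n" becomes, which is why consistency of individual parameter estimates stops being a useful criterion.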

Mathematical Theory

Extensive mathematical investigations resulted in a systematic theory of improved and asymptotically unimprovable versions of multivariate statistical procedures (see references at URL [2]). A special parameter "G", a function of the fourth moments of the variables, was found whose small values produce a number of specifically many-parametric phenomena. For increasing "p" and "n" such that "p"/"n" tends to a constant and "G" → 0, the principal terms of rotation-invariant functionals occurring in statistics prove to depend only on the first two moments of the variables. As "n" and "p" tend to infinity with "p"/"n" → "y" > 0 and "G" → 0, these functionals have vanishing variance and converge to constants that are limit functions of empirical means and variances. As a consequence, stable integral relations arise between functions of parameters and functions of observable variables. They were called “stochastic canonical equations” or “dispersion equations” (see [3]). Using them, one can express the principal parts of standard quality functions of regularized multivariate statistical procedures as functions of observed variables only. This makes it possible to choose better procedures and to find asymptotically unimprovable solutions.
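Both phenomena described above, the vanishing variance of spectral functionals and the existence of a limiting relation, can be checked numerically in the simplest case. The following sketch assumes independent standard normal variables with identity covariance (so the fourth-moment condition on "G" is not exercised) and takes the normalized resolvent trace h = (1/p) tr (S + tI)⁻¹ as the rotation-invariant functional; all function names are hypothetical. For this special case the limit is the positive root of the Marchenko-Pastur equation y·t·h² + (1 − y + t)·h − 1 = 0. It is used here only as a well-known example of such a relation and is not claimed to be the form of the dispersion equations in [3].

```python
import numpy as np


def resolvent_trace(n, p, t, rng):
    """Return (1/p) * trace((S + t*I)^-1) for one simulated sample."""
    x = rng.standard_normal((n, p))
    s = x.T @ x / n  # sample covariance with known zero mean
    eig = np.linalg.eigvalsh(s)
    return float(np.mean(1.0 / (eig + t)))


def limit_value(y, t):
    """Positive root of y*t*h^2 + (1 - y + t)*h - 1 = 0 (identity covariance)."""
    a = 1.0 - y + t
    return float((-a + np.sqrt(a * a + 4.0 * y * t)) / (2.0 * y * t))


def simulate(n, p, t=1.0, reps=50, seed=0):
    """Mean and variance of the functional over independent replications."""
    rng = np.random.default_rng(seed)
    values = np.array([resolvent_trace(n, p, t, rng) for _ in range(reps)])
    return float(values.mean()), float(values.var())


if __name__ == "__main__":
    y, t = 0.5, 1.0  # arbitrary illustrative choices
    print("limit:", round(limit_value(y, t), 4))
    for n in (20, 80, 320):
        mean, var = simulate(n, int(y * n), t)
        print(f"n={n:4d} mean={mean:.4f} variance={var:.2e}")
```

As "n" and "p" grow with their ratio fixed, the printed variance shrinks and the mean approaches the limit, so a quantity computed from a single sample behaves almost like a deterministic function of "y" and "t".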

More Efficient Methods

A number of more efficient “essentially multivariate” statistical procedures were suggested that have clear advantages over traditional consistent ones: they never degenerate, are applicable to observations of any dimension, and are approximately unimprovable for a wide class of populations. This method of statistical investigation was called “essentially multivariate analysis”, and the approach was called “multiparametric statistics” or “high-dimensional statistics”.
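What it means for a traditional procedure to degenerate can be shown in a few lines. The sketch below is a generic illustration, not one of the procedures developed in this theory: the centered sample covariance matrix of n = 20 observations in p = 50 dimensions has rank at most n − 1 = 19 and cannot be inverted, whereas a ridge-regularized version S + tI is invertible for every t > 0. The data model (independent standard normal variables), the ridge parameter `t`, and the function names are assumptions made for the example.

```python
import numpy as np


def sample_covariance(x):
    """Unbiased sample covariance of the rows of x (observations by variables)."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered / (x.shape[0] - 1)


def ridge_inverse(s, t):
    """Inverse of S + t*I; well defined for any t > 0 because S is
    positive semidefinite, so every eigenvalue of S + t*I is at least t."""
    return np.linalg.inv(s + t * np.eye(s.shape[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, t = 20, 50, 0.1  # p > n: more variables than observations
    s = sample_covariance(rng.standard_normal((n, p)))
    print("dimension:", p, "rank of S:", np.linalg.matrix_rank(s))
    print("smallest eigenvalue of S + tI:",
          float(np.linalg.eigvalsh(s + t * np.eye(p)).min()))
    print("ridge inverse shape:", ridge_inverse(s, t).shape)
```

Any method that needs the inverse of S, such as the classical linear discriminant, fails here, while the regularized version remains usable; how to choose `t` well is exactly the kind of question the asymptotic theory above addresses.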

New Areas of Application

Meanwhile, in the last decade, progress in computer technology has brought forward a number of new urgent statistical problems in which the dimension of the observations "p" is so high that it is much larger than "n". In this situation, none of the existing multivariate procedures (including improved ones) provides satisfactory solutions, even under a strong assumption of independence of the variables. Such problems arise in connection with the need to process huge amounts (terabytes) of genetic information, in image analysis, in natural-language text analysis, and in other applications. Theoretical and practical aspects of analyzing high-dimensional data were intensely discussed at a number of seminars and workshops [4–7]. Some remarkable progress was achieved both in the development of methods and in practical applications. Nevertheless, no regular methods yet exist for the efficient treatment of so many variables. This area of statistical investigation acquired the generally accepted name “High-Dimensional Statistics” or “HD-Statistics” (see [4–7] and references at URL [2]).

References

1. S.A.Aivasian, V.M.Buchstaber, I.S.Yenyukov, L.D.Meshalkin. Applied Statistics. Classification and Reduction of Dimensionality. Moscow, 1989 (in Russian).

2. URL [hd-stat.narod.ru]

3. V.L.Girko. Canonical Stochastic Equations, vol. 1,2, Kluwer Academic Publishers, Dordrecht, 2000.

4. Program on High-Dimensional Inference for 2006-2007. SAMSI, USA.

5. Workshop in High-Dimensional Data Analysis, National University of Singapore. February, 2008.

6. Workshops HD-statistics in biology, Isaac Newton Inst. for Math. Sci., Cambridge. 31.03-27.06 2008.

7. Young European Statistics Workshop (YES-2), Eindhoven, Netherlands. June, 2008.

Wikimedia Foundation. 2010.
