Summary statistic

Figure: Box plot of data from the Michelson–Morley experiment, showing several summary statistics.

In descriptive statistics, summary statistics are used to summarize a set of observations in order to communicate as much information as possible as simply as possible. Statisticians commonly try to describe the observations in terms of:

a measure of location, or central tendency, such as the arithmetic mean

a measure of statistical dispersion like the standard deviation

a measure of the shape of the distribution like skewness or kurtosis

if more than one variable is measured, a measure of statistical dependence such as a correlation coefficient

A common collection of order statistics used as summary statistics is the five-number summary, sometimes extended to a seven-number summary, together with the associated box plot.

Entries in an analysis of variance table can also be regarded as summary statistics.^[1]

Contents

1 Example

2 Examples of summary statistics

2.1 Location

2.2 Spread

2.3 Shape

2.4 Percentiles

2.5 Dependence

3 See also

4 References

Example

The following example, using R, shows the standard summary statistics of a random sample of 50 observations drawn from a normal distribution with a mean of 0 and a standard deviation of 1:

> x <- rnorm(n=50, mean=0, sd=1)
> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-1.72700 -0.49650 -0.05157  0.07981  0.67640  2.46700
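For readers without R, the same six quantities can be approximated with Python's standard library. This is a minimal sketch, not a reimplementation of R: the function name `summary` and the seed value are illustrative choices, and the `"inclusive"` quantile method is used on the assumption that it matches R's default quantile rule. Because the sample is random, the printed values will differ from the R output above.

```python
import random
import statistics

def summary(values):
    """Return min, quartiles, median, mean and max, loosely mirroring R's summary()."""
    # method="inclusive" interpolates between order statistics and treats the
    # data as containing the extreme values of the population.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

rng = random.Random(0)  # fixed seed (arbitrary) so the sample is reproducible
x = [rng.gauss(0, 1) for _ in range(50)]  # 50 draws from N(0, 1)

for name, value in summary(x).items():
    print(f"{name:>8}: {value: .5f}")
```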

Examples of summary statistics

Location

Common measures of location, or central tendency, are the arithmetic mean, median, mode, and interquartile mean.
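The four location measures can be compared on a small data set containing one extreme value. The sketch below uses Python's `statistics` module; `interquartile_mean` is a hypothetical helper written for this illustration, and it handles only the simple case where the number of observations is divisible by 4.

```python
import statistics

def interquartile_mean(values):
    """Mean of the middle half of the sorted data.

    Simplified: assumes len(values) is divisible by 4, so exactly one
    quarter of the observations can be dropped from each end.
    """
    data = sorted(values)
    k = len(data) // 4
    return statistics.fmean(data[k:len(data) - k])

data = [1, 2, 2, 3, 4, 7, 9, 100]  # 100 is an outlier

print("mean:              ", statistics.mean(data))
print("median:            ", statistics.median(data))
print("mode:              ", statistics.mode(data))
print("interquartile mean:", interquartile_mean(data))
```

Here the outlier pulls the arithmetic mean up to 16, while the median (3.5), mode (2) and interquartile mean (4.0) stay close to the bulk of the data.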

Spread

Common measures of statistical dispersion are the standard deviation, variance, range, interquartile range, absolute deviation and the distance standard deviation. Measures that assess spread in comparison to the typical size of data values include the coefficient of variation.

The Gini coefficient was originally developed to measure income inequality and is equivalent to one of the L-moments.
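Most of the dispersion measures named in this section can be computed directly, as in the following sketch. It uses the population forms of the variance and standard deviation (dividing by n) to keep the numbers simple; the sample forms are available as `statistics.variance` and `statistics.stdev`. The variable names are illustrative.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)
variance = statistics.pvariance(data)   # population variance (divides by n)
std_dev = statistics.pstdev(data)       # population standard deviation
value_range = max(data) - min(data)

# Interquartile range: distance between the third and first quartiles.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Mean absolute deviation around the mean.
mean_abs_dev = statistics.fmean([abs(v - mean) for v in data])

# Coefficient of variation: spread relative to the typical size of the values.
coef_var = std_dev / mean

print(variance, std_dev, value_range, iqr, mean_abs_dev, coef_var)
```

For this data set the mean is 5, the variance 4, the standard deviation 2, the range 7, the interquartile range 1.5, the mean absolute deviation 1.5 and the coefficient of variation 0.4.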

Shape

Common measures of the shape of a distribution are skewness and kurtosis, while alternatives can be based on L-moments. A different measure is the distance skewness, for which a value of zero implies central symmetry.
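Skewness and kurtosis are usually defined through central moments. The sketch below assumes the simple moment-based (population) definitions, without the small-sample bias corrections that some software applies; the function names are illustrative.

```python
import statistics

def central_moment(values, k):
    """k-th central moment: the average of (x - mean)**k."""
    mean = statistics.fmean(values)
    return statistics.fmean([(v - mean) ** k for v in values])

def skewness(values):
    """Third central moment scaled by the standard deviation cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the variance squared, minus 3
    (so that a normal distribution scores 0)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The symmetric data set has zero skewness, while the long right tail of the second data set gives a positive value.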

Percentiles

A simple summary of a dataset is sometimes given by quoting particular order statistics as approximations to selected percentiles of a distribution.
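One simple way to quote an order statistic as a percentile is the nearest-rank rule, which returns the observation whose rank is the smallest integer not less than p/100 times the sample size. This is only one of several conventions in use, and `percentile_nearest_rank` is a hypothetical helper written for this illustration.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the order statistic at 1-based rank ceil(p/100 * n)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    data = sorted(values)                   # order statistics
    rank = math.ceil(p / 100 * len(data))   # 1-based rank
    return data[rank - 1]

data = [15, 20, 35, 40, 50]
for p in (25, 50, 75, 100):
    print(p, percentile_nearest_rank(data, p))
```

With five observations, the 25th, 50th, 75th and 100th percentiles under this rule are the 2nd, 3rd, 4th and 5th order statistics: 20, 35, 40 and 50.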

Dependence

The most common measure of dependence between paired random variables is the Pearson product-moment correlation coefficient, while a common alternative summary statistic is Spearman's rank correlation coefficient. A distance correlation of zero implies independence.
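The difference between the Pearson and Spearman coefficients shows up on data that are monotonically but not linearly related. The following is a minimal sketch with hand-written helpers (`pearson`, `ranks` and `spearman` are illustrative names); it computes Spearman's coefficient as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotone in x, but not linear

print(pearson(x, y))
print(spearman(x, y))
```

Because y always increases with x, Spearman's coefficient is exactly 1, whereas the Pearson coefficient is about 0.94 because the relationship is not a straight line.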

See also

Descriptive statistics

Sufficient statistic

References

[1] Upton, G., Cook, I. (2006). Oxford Dictionary of Statistics. OUP. ISBN 978-0-19-954145-4.

