Anscombe's quartet

Anscombe's quartet comprises four datasets that have nearly identical simple statistical properties, yet look very different when graphed. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician F. J. Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on the statistical properties of a dataset.

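For all four datasets, the following summary statistics agree to within rounding:

* Mean of x: 9
* Sample variance of x: 11
* Mean of y: 7.50 (to two decimal places)
* Sample variance of y: between 4.12 and 4.13
* Pearson correlation between x and y: 0.816 (to three decimal places)
* Least-squares regression line: y = 3.00 + 0.500x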

When the four datasets are plotted in a two-by-two grid, the differences are obvious. The first one (top left) appears to follow a simple linear relationship with scatter, which is what one would expect of two correlated variables that satisfy the assumption of normality. The second one (top right) is not distributed normally; the relationship between the two variables is obvious but not linear, so the Pearson correlation coefficient is not an appropriate summary. In the third case (bottom left), the linear relationship is perfect except for one outlier, which exerts enough influence to lower the correlation coefficient from 1 to 0.816. Finally, the fourth case (bottom right) shows that a single outlier is enough to produce a high correlation coefficient, even though the remaining points share one x value and so indicate no relationship between the two variables.
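The influence of the single outlier in the third and fourth datasets can be checked numerically. The following is a minimal sketch, assuming the data values as they are commonly reproduced from Anscombe's paper; the helper names `pearson_r` and `drop_point` are hypothetical, and picking the outlier as the largest y (or largest x) value is a shortcut that works for these two datasets, not a general outlier-detection method.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant variable
    return sxy / sqrt(sxx * syy)

def drop_point(x, y, index):
    """Return copies of x and y with the point at `index` removed."""
    return x[:index] + x[index + 1:], y[:index] + y[index + 1:]

# Third dataset (values assumed as commonly reproduced from Anscombe, 1973).
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

# Fourth dataset: every x is 8 except for one point at x = 19.
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

r3_all = pearson_r(x3, y3)
r3_trimmed = pearson_r(*drop_point(x3, y3, y3.index(max(y3))))

r4_all = pearson_r(x4, y4)
r4_trimmed = pearson_r(*drop_point(x4, y4, x4.index(max(x4))))

print("dataset III, all points:     ", r3_all)
print("dataset III, outlier removed:", r3_trimmed)
print("dataset IV, all points:      ", r4_all)
print("dataset IV, outlier removed: ", r4_trimmed)
```

With the outlier removed, the third dataset's correlation rises to very nearly 1, while the fourth dataset's correlation becomes undefined (reported here as `None`) because every remaining x value is the same.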

Edward Tufte uses the quartet on the first page of the first chapter of his book, "The Visual Display of Quantitative Information", to emphasize the importance of looking at one's data before analyzing it.

The datasets are as follows. The "x" values are the same for the first three datasets.
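As a hedged illustration, the sketch below writes the four datasets out in Python and computes their summary statistics. The values are assumed to match those commonly reproduced from Anscombe's 1973 paper; the names `X123`, `QUARTET`, and `summarize` are hypothetical and exist only for this example.

```python
from math import sqrt
from statistics import mean

# Assumed values, as commonly reproduced from Anscombe (1973).
# The first three datasets share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Means, sample variances, Pearson r, and least-squares line for y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": sxx / (n - 1),   # sample variance (divides by n - 1)
        "mean_y": my,
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }

stats = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, s in stats.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four printed rows should be almost indistinguishable, which is the point of the quartet: the summary numbers alone give no hint of how different the datasets look when plotted.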

References

* Anscombe, F. J. (February 1973). [http://links.jstor.org/sici?sici=0003-1305%28197302%2927%3A1%3C17%3AGISA%3E2.0.CO%3B2-J "Graphs in Statistical Analysis"]. American Statistician, 27, 17-21.
* Tufte, Edward R. (2001). "The Visual Display of Quantitative Information", 2nd edition. Cheshire, CT: Graphics Press. ISBN 0961392142

See also

* Exploratory data analysis

External links

* [http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html Department of Physics, University of Toronto]
* [http://www.soi.city.ac.uk/~dcd/ig/s2viscom/lb_datg/l03.htm Department of Computing, City University, London]
* [http://exploringdata.cqu.edu.au/curv_fit.htm Curve fitting, Central Queensland University, Australia]


Wikimedia Foundation. 2010.
