Stemplot

A stemplot (or stem-and-leaf plot), in statistics, is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. Stemplots evolved from Arthur Bowley's work in the early 1900s and are useful tools in exploratory data analysis.

Unlike histograms, stemplots retain the original data to at least two significant digits, and put the data in order, thereby easing the move to order-based inference and non-parametric statistics.

A basic stemplot contains two columns separated by a vertical line. The left column contains the "stems" and the right column contains the "leaves".

Constructing a stemplot

To construct a stemplot, the observations must first be sorted in ascending order. Here is the sorted set of data values that will be used in the following example:

44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

Next, it must be determined what the stems will represent and what the leaves will represent. Typically, the leaf contains the last digit of the number and the stem contains all of the other digits. In the case of very large or very small numbers, the data values may be rounded to a particular place value (such as the hundreds place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stems.

In this example, the leaf represents the ones place and the stem will represent the rest of the number (tens place and higher).
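The split into stem and leaf can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_stem_leaf and its leaf_unit parameter are hypothetical, values are assumed non-negative, and halves are assumed to round up.

```python
def split_stem_leaf(value, leaf_unit=1):
    """Split a non-negative number into (stem, leaf).

    leaf_unit is the place value shown by the leaf (1 for the ones
    place, 100 for the hundreds place). Hypothetical helper.
    """
    # Round to the nearest multiple of leaf_unit (assumption: halves round up).
    scaled = int(value / leaf_unit + 0.5)
    # The last digit is the leaf; every digit to its left forms the stem.
    return divmod(scaled, 10)


print(split_stem_leaf(106))          # stem 10, leaf 6
print(split_stem_leaf(12345, 100))   # rounded to the hundreds place
```

With leaf_unit=1 the value 106 splits into stem 10 and leaf 6, matching the convention used in this example.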

The stemplot is drawn with two columns separated by a vertical line. The stems are listed to the left of the vertical line. It is important that each stem is listed only once and that no numbers are skipped, even if it means that some stems have no leaves. The leaves are listed in increasing order in a row to the right of each stem.

 4 | 4 6 7 9
 5 |
 6 | 3 4 6 8 8
 7 | 2 2 5 6
 8 | 1 4 8
 9 |
10 | 6

key: 5|4=54
leaf unit: 1.0
stem unit: 10.0
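The construction steps above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: the function name stemplot is hypothetical, and it assumes non-negative integers with a leaf unit of 1.

```python
def stemplot(values):
    """Return the rows of a stemplot for non-negative integers (leaf unit 1)."""
    leaves = {}
    # Sorting first guarantees the leaves on each stem end up in increasing order.
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves.setdefault(stem, []).append(leaf)

    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))  # right-align stems of different lengths
    rows = []
    # List every stem from lowest to highest, including stems with no leaves.
    for stem in range(lo, hi + 1):
        row = f"{stem:>{width}} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print("\n".join(stemplot(data)))
```

Iterating over the full range of stems, rather than only the stems that occur in the data, is what keeps the empty stems 5 and 9 in the output.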

For negative numbers, a negative sign is placed in front of the stem, whose magnitude is still the whole-number part of X / 10. Non-integers are rounded. This allows the stem-and-leaf plot to retain its shape, even for more complicated data sets, as in the example below:

-2 | 4
-1 | 2
-0 | 3
 0 | 4 6 6
 1 | 7
 2 | 5
 3 |
 4 |
 5 | 7

With rounding to the nearest unit, this plot represents the data set: -23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8
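The handling of negative and non-integer values can be sketched in Python as well. The function name signed_stemplot is hypothetical, and two choices are assumptions rather than anything the text prescribes: halves are rounded away from zero, and the leaves on a negative stem are listed by increasing digit.

```python
import math


def signed_stemplot(values):
    """Stemplot rows for real numbers rounded to the nearest unit.

    Negative values get their own stems, including a separate "-0" stem.
    """
    neg, pos = {}, {}
    for x in values:
        # Round the magnitude (assumption: halves round away from zero).
        n = math.floor(abs(x) + 0.5)
        stem, leaf = divmod(n, 10)
        # A negative value that rounds to zero is placed on the plain 0 stem.
        side = neg if (x < 0 and n > 0) else pos
        side.setdefault(stem, []).append(leaf)

    labelled = []
    if neg:
        # Run from the most negative stem up to -0 so that no stem is skipped.
        last = 0 if pos else min(neg)
        for stem in range(max(neg), last - 1, -1):
            labelled.append((f"-{stem}", sorted(neg.get(stem, []))))
    if pos:
        first = 0 if neg else min(pos)
        for stem in range(first, max(pos) + 1):
            labelled.append((str(stem), sorted(pos.get(stem, []))))
    if not labelled:
        return []

    width = max(len(label) for label, _ in labelled)
    return [f"{label:>{width}} | {' '.join(map(str, leaves))}".rstrip()
            for label, leaves in labelled]


data = [-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8]
print("\n".join(signed_stemplot(data)))
```

Keeping separate "-0" and "0" stems is what lets -3.4 and 4.43 appear on different rows even though both have a zero in the tens place.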

Usage

Stemplots are useful for displaying the relative density and shape of the data, giving the reader a quick overview of the distribution. They retain most of the raw numerical data, in some cases with perfect integrity. They are also useful for highlighting outliers and finding the mode.

However, stemplots are only useful for moderately sized data sets (around 15 to 150 data points). With very small data sets a stemplot can be of little use, as a reasonable number of data points is required to establish definitive distribution properties; a dot plot may be better suited for such data. With very large data sets, a stemplot becomes very cluttered, since each data point must be represented numerically; a box plot or histogram may become more appropriate as the data size increases.

The ease with which histograms can now be generated on computers has meant that stemplots are less used today than in the 1980s, when they first became widely utilized as a quick method of displaying information graphically by hand.



Wikimedia Foundation. 2010.
