Compositional data

In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information.

This definition, given by John Aitchison (1986) has several consequences:

A compositional data point, or composition for short, can be represented by a positive real vector with as many parts as considered. Sometimes, if the total amount is fixed and known, one component of the vector can be omitted.

As compositions only carry relative information, the only information is given by the ratios between components. Consequently, a composition multiplied by any positive constant contains the same information as the former. Therefore, proportional positive vectors are equivalent when considered as compositions.

As usual in mathematics, equivalent classes are represented by some element of the class, called a representative. Thus, equivalent compositions can be represented by positive vectors whose components add to a given constant $\scriptstyle\kappa$ . The vector operation assigning the constant sum representative is called closure and is denoted by $\scriptstyle\mathcal{C}[\cdot]$ :

$\mathcal{C}[x_1,x_2,\dots,x_D]=\left[\frac{x_1}{\sum_{i=1}^D x_i},\frac{x_2}{\sum_{i=1}^D x_i}, \dots,\frac{x_D}{\sum_{i=1}^D x_i}\right],\$

where D is the number of parts (components) and $[\cdot]$ denotes a row vector.

Compositional data can be represented by constant sum real vectors with positive components, and this vectors span a simplex, defined as

$\mathcal{S}^D=\left\{\mathbf{x}=[x_1,x_2,\dots,x_D]\in\mathbb{R}^D \left| x_i>0,i=1,2,\dots,D; \sum_{i=1}^D x_i=\kappa \right. \right\}. \$

This is the reason why $\scriptstyle\mathcal{S}^D$ is considered to be the sample space of compositional data. The positive constant $\scriptstyle\kappa$ is arbitrary. Frequent values for $\scriptstyle\kappa$ are 1 (per unit), 100 (percent, %), 1000, 10⁶ (ppm), 10⁹ (ppb), ...

In statistics, compositional data is frequently considered to be data in which each data point is an D-tuple of nonnegative numbers whose sum is 1. Typically each of the D components x_i of each data point [x₁, ..., x_D] says what proportion (or "percentage") of a statistical unit falls into the ith category in a list of D categories. Very often ternary plots are used in analysis of compositional data to represent a three part composition.

An alternative nomenclatures for compositional analysis is simplicial analysis, motivated by the concept of simplicial sets.

Remarks on the definition of the simplex:

In mathematical frameworks, the superscript of $\scriptstyle\mathcal{S}^D$ , accounting for the number of parts, is often changed to D − 1, describing the dimension.

The components of the vector are assumed to be positive. However, in some definitions of the simplex, non-negative components are admitted. Here null components are avoided, because ratios between components of which some are zero are meaningless.

Examples

Each data point may correspond to a rock composed of three different minerals; a rock of which 10% is the first mineral, 30% is the second, and the remaining 60% is the third would correspond to the triple [0.1, 0.3, 0.6]; a data set would contain one such triple for each rock in a sample of rocks.

Each data point may correspond to a town; a town in which 35% of the people are Christians, 55% are Muslims, 6% are Jews, and the remaining 4% are others would correspond to the quadruple [0.35, 0.55, 0.06, 0.04]; a data set would correspond to a list of towns.

In chemistry, compositions can be expressed as molar concentrations of each component. As the sum of all concentrations is not determined, the whole composition of D parts is needed and thus expressed as a vector of D molar concentrations. These compositions can be translated into weight per cent multiplying each component by the appropriated constant.

In a survey, the proportions of people positively answering some different items can be expressed as percentages. As the total amount is identified as 100, the compositional vector of D components can be defined using only D − 1 components, assuming that the remaining component is the percentage needed for the whole vector to add to 100.

In probability and statistics, a partition of the sampling space into disjoint events is described by the probabilities assigned to such events. The vector of D probabilities can be considered as a composition of D parts. As they add to one, one probability can be suppressed and the composition is completely determined.

External links

References

J. Aitchison, 1986: The Statistical Analysis of Compositional Data, Chapman & Hall, reprinted in 2003 with additional material by The Blackburn Press

Categories:

Statistical data types

Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

Compositional data for selected solar system objects — ▪ Table Compositional data for selected solar system objects object distance from Sun (AU)* mean density (g/cm3) general composition Sun 1.4 hydrogen, helium Mercury 0.4 5.4 iron, nickel, silicates Venus 0.7 5.2 silicates, iron, nickel Earth 1… … Universalium
solar system — the sun together with all the planets and other bodies that revolve around it. [1695 1705] * * * The Sun, its planets, and the small bodies (see asteroid, Centaur object, comet, Kuiper belt, meteorite, and Oort cloud) interplanetary dust and gas… … Universalium
moon — mooner, n. moonless, adj. /moohn/, n. 1. the earth s natural satellite, orbiting the earth at a mean distance of 238,857 miles (384,393 km) and having a diameter of 2160 miles (3476 km). 2. this body during a particular lunar month, or during a… … Universalium
Moon — /moohn/, n. Sun Myung /sun myung/, born 1920, Korean religious leader: founder of the Unification Church. * * * Sole natural satellite of Earth, which it orbits from west to east at a mean distance of about 238,900 mi (384,400 km). It is less… … Universalium
John Aitchison — (born 1926) is a Scottish statistician. He studied at the Universities of Edinburgh and Cambridge. From 1966 to 1976 he was Titular Professor of Statistics, and Mitchell Lecturer in Statistics at the University of Glasgow. In 1976 he joined the… … Wikipedia
Early modern glass in England — The early modern period in England (c. 1500 1800) brought on a revival in local glass production. Medieval glass had been limited to the small scale production of forest glass for window glass and vessels, predominantly in the WealdKenyon, G.H.,… … Wikipedia
List of mathematics articles (C) — NOTOC C C closed subgroup C minimal theory C normal subgroup C number C semiring C space C symmetry C* algebra C0 semigroup CA group Cabal (set theory) Cabibbo Kobayashi Maskawa matrix Cabinet projection Cable knot Cabri Geometry Cabtaxi number… … Wikipedia
List of statistics topics — Please add any Wikipedia articles related to statistics that are not already on this list.The Related changes link in the margin of this page (below search) leads to a list of the most recent changes to the articles listed below. To see the most… … Wikipedia
Mixture model — See also: Mixture distribution In statistics, a mixture model is a probabilistic model for representing the presence of sub populations within an overall population, without requiring that an observed data set should identify the sub population… … Wikipedia
Takalik Abaj — muestra una ocupación continua a través de casi dos mil años.[1] Esta foto muestra la escalinata de acceso a la Terraza 3, fecha al período preclásico tardío.[ … Wikipedia Español

Academic Dictionaries and Encyclopedias

Compositional data

Examples

External links

References

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Compositional data

Examples

External links

References

Look at other dictionaries:

Share the article and excerpts

Direct link