Count data

In statistics, count data is data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking. The statistical treatment of count data is distinct from that of binary data, in which the observations can take only two values, usually represented by 0 and 1.

Introduction

Statistical analyses involving count data can take several forms, depending on the context in which the data arise:

  • simple counts, such as the number of thunderstorms in a calendar year, observed over several years;
  • categorical data, in which the counts represent the numbers of items falling into each of several categories.

Categorical counts are treated separately because different methodologies apply; the remainder of this article concerns simple counts.

Analysing simple count data alone

The Poisson, binomial and negative binomial distributions are commonly used to model count data when the counts are treated as random variables.
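As a rough illustration of how these three distributions differ, the sketch below computes the mean and variance of each directly from its probability mass function. The parameter values are arbitrary assumptions, chosen so that all three distributions have mean 3, and the negative binomial is parameterised here as the number of failures before the r-th success, which is only one of several conventions in use. The function names are illustrative and not taken from any library.

```python
import math

def poisson_pmf(k, lam):
    # Computed in log space so that large k does not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def binomial_pmf(k, n, p):
    # Probability of k successes in n independent trials.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def neg_binomial_pmf(k, r, p):
    # Probability of k failures before the r-th success (integer r assumed).
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k

def mean_and_variance(pmf, upper):
    """Sum over k = 0..upper; upper must cover essentially all the probability."""
    ks = range(upper + 1)
    mean = sum(k * pmf(k) for k in ks)
    var = sum((k - mean) ** 2 * pmf(k) for k in ks)
    return mean, var

# Assumed parameters: each distribution has mean 3.
poisson_stats = mean_and_variance(lambda k: poisson_pmf(k, 3.0), 200)
binomial_stats = mean_and_variance(lambda k: binomial_pmf(k, 10, 0.3), 10)
neg_binomial_stats = mean_and_variance(lambda k: neg_binomial_pmf(k, 3, 0.5), 200)

for name, (m, v) in [
    ("Poisson", poisson_stats),
    ("binomial", binomial_stats),
    ("negative binomial", neg_binomial_stats),
]:
    print(f"{name}: mean={m:.3f}, variance={v:.3f}")
```

With these parameters the Poisson variance equals the mean (3), the binomial variance is smaller (2.1) and the negative binomial variance is larger (6). Comparing the sample variance with the sample mean is therefore a common first check when choosing among the three.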

Graphical examination of count data may be aided by data transformations chosen to stabilise the sample variance. In particular, the square root transformation might be used when the data can be approximated by a Poisson distribution (although other transformations, such as the Anscombe transform, have modestly improved properties), while an inverse sine (arcsine) transformation is available when a binomial distribution is preferred.
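The variance-stabilising effect can be checked numerically. The sketch below, which assumes Poisson-distributed counts and uses illustrative function names, computes the variance of a transformed Poisson variable by summing over the probability mass function rather than by simulation. It compares the raw counts, the square root, and the Anscombe transform 2·sqrt(x + 3/8) for a few arbitrarily chosen means.

```python
import math

def poisson_pmf(k, lam):
    # Computed in log space so that large k does not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def variance_after_transform(lam, transform):
    """Variance of transform(X) for X ~ Poisson(lam), summed over the pmf."""
    upper = int(lam + 12 * math.sqrt(lam) + 20)  # far enough into the tail
    ks = range(upper + 1)
    probs = [poisson_pmf(k, lam) for k in ks]
    mean = sum(p * transform(k) for k, p in zip(ks, probs))
    return sum(p * (transform(k) - mean) ** 2 for k, p in zip(ks, probs))

def identity(k):
    return float(k)

def square_root(k):
    return math.sqrt(k)

def anscombe(k):
    return 2.0 * math.sqrt(k + 0.375)

# Assumed means, chosen only for illustration.
for lam in (5.0, 20.0, 100.0):
    raw = variance_after_transform(lam, identity)
    root = variance_after_transform(lam, square_root)
    ans = variance_after_transform(lam, anscombe)
    print(f"mean={lam:6.1f}  raw={raw:8.3f}  sqrt={root:.4f}  anscombe={ans:.4f}")
```

The variance of the raw counts grows with the mean, whereas the variance of the square-rooted counts stays close to 1/4 and that of the Anscombe-transformed counts stays close to 1 once the mean is moderately large. For small means both approximations are less accurate.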

Relating count data to other variables

Here the count data would be treated as a dependent variable. Statistical methods such as least squares and analysis of variance are designed to deal with continuous dependent variables. These can be adapted to deal with count data by using data transformations such as the square root transformation, but such methods have several drawbacks: they are approximate at best, and they estimate parameters that are often hard to interpret.

The Poisson distribution can form the basis for some analyses of count data, in which case Poisson regression may be used. This is a special case of the class of generalized linear models, which also contains models based on the binomial distribution (binomial regression, logistic regression) and on the negative binomial distribution. These alternatives may be used where the assumptions of the Poisson model are violated, in particular when the range of count values is limited (binomial) or when overdispersion is present (negative binomial).
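To make the idea of Poisson regression concrete, the following is a minimal sketch that fits a model with a log link and a single predictor, log(mu) = b0 + b1·x, by Newton-Raphson iteration on the Poisson log-likelihood. The function name `fit_poisson_regression` and the example counts are hypothetical. The sketch assumes the predictor values are not all equal and that at least one count is positive. In practice a statistics package would normally be used instead.

```python
import math

def fit_poisson_regression(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson (hypothetical helper)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information: the 2x2 negative Hessian.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * d = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up counts that double with each unit increase in x.
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 8, 16]
b0, b1 = fit_poisson_regression(x, y)
print(f"intercept={b0:.4f}, slope={b1:.4f}, rate ratio={math.exp(b1):.4f}")
```

Because of the log link, the exponentiated slope is interpreted as the multiplicative change in the expected count per unit increase in the predictor. For the made-up data above it is close to 2.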

Wikimedia Foundation. 2010.
