Pseudocount

A pseudocount is a count added to the observed data so that the probability a model assigns to an event that is known to be possible becomes small rather than exactly zero.

In any observed data set or sample, a possible event may fail to occur, especially when the event has low probability or the data set is small. Its observed frequency is then 0, which naively implies a probability of 0. This is an oversimplification and is often unhelpful, particularly in probability-based machine learning techniques such as artificial neural networks and hidden Markov models. Artificially adjusting the probabilities of rare (but not impossible) events so that they are not exactly zero avoids this zero-frequency problem.

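The simplest approach is to add 1 to the observed count of every event, including those whose count is zero. An event observed n times out of N observations, with k possible events in total, is then assigned the probability (n + 1) / (N + k). This is sometimes called "Laplace's rule" (more formally, Laplace's rule of succession).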
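
The add-one approach can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `add_one_probabilities` and the sample data are hypothetical, and the set of possible outcomes is assumed to be known in advance.

```python
from collections import Counter
from fractions import Fraction


def add_one_probabilities(observations, outcomes):
    """Estimate outcome probabilities with a pseudocount of 1 per outcome.

    `outcomes` lists every event assumed to be possible, including
    events that never appear in `observations`.
    """
    counts = Counter(observations)
    # Each outcome contributes its observed count plus one pseudocount.
    total = sum(counts[o] for o in outcomes) + len(outcomes)
    return {o: Fraction(counts[o] + 1, total) for o in outcomes}


# Hypothetical sample: "c" is possible but was never observed.
probs = add_one_probabilities(["a", "a", "b"], ["a", "b", "c"])
for outcome, p in probs.items():
    print(outcome, p)
```

The unobserved outcome "c" receives a small nonzero probability, and the estimates still sum to 1 because the pseudocounts are included in the denominator.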

A more complex approach is to estimate the probabilities of the events from other factors and adjust the counts accordingly.
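
The text does not say which "other factors" are meant. One possible reading, shown here only as a sketch, is to spread a fixed total pseudocount across the events in proportion to a background estimate of their probabilities. The function name `prior_weighted_probabilities`, the `weight` parameter, and the background values are all assumptions made for illustration.

```python
def prior_weighted_probabilities(counts, prior, weight=1.0):
    """Add weight * prior[o] as the pseudocount for each outcome o.

    `prior` is an assumed background distribution over all possible
    outcomes (values summing to 1); `weight` is the total pseudocount.
    """
    observed = sum(counts.get(o, 0) for o in prior)
    total = observed + weight * sum(prior.values())
    return {o: (counts.get(o, 0) + weight * prior[o]) / total for o in prior}


# Hypothetical observed counts and background estimate.
counts = {"a": 2, "b": 1}
prior = {"a": 0.5, "b": 0.3, "c": 0.2}
probs = prior_weighted_probabilities(counts, prior, weight=2.0)
for outcome, p in probs.items():
    print(outcome, round(p, 3))
```

With little data the estimates stay close to the background distribution, and as the observed counts grow the pseudocounts matter less. Setting every background value equal and `weight` to the number of outcomes gives back the add-one rule.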

See also

* Principle of indifference
* Prior probability
* Offset
* Substitution matrix
* n-gram

External links

* [http://www.soe.ucsc.edu/research/compbio/html_format_papers/tr-95-11/node8.html Pseudocounts]
** [http://www.soe.ucsc.edu/research/compbio/html_format_papers/tr-95-11/node30.html Bayesian interpretation of pseudocount regularizers]


Wikimedia Foundation. 2010.
