Rule of succession
In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. The formula is still used, particularly to estimate underlying probabilities for events which have not been observed to occur at all in (finite) sample data. Assigning such events a zero probability would contravene Cromwell's rule, and is not justified by the evidence.

Statement of the rule of succession
Suppose "p" is uniformly distributed on the interval [0, 1] . Suppose "X"1, ..., "X""n"+1 are conditionally independent
random variable s given the value of "p", and conditional on "p" are Bernoulli-distributed with expected value "p", i.e., each has probability "p" of being equal to 1 and probability 1 − "p" of being equal to 0. Then:
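As a concrete illustration (a minimal sketch; the function name and the example numbers are our own, not part of the article), the estimate ("s" + 1)/("n" + 2) is straightforward to compute:

    def rule_of_succession(successes: int, trials: int) -> float:
        """Laplace's rule of succession: estimated probability that the
        next trial succeeds, after observing `successes` in `trials`."""
        return (successes + 1) / (trials + 2)

    # Sunrise problem: after n straight sunrises, the estimate is (n+1)/(n+2).
    print(rule_of_succession(10_000, 10_000))  # 0.9998... close to, but below, 1
    # An event never observed still gets a small nonzero probability:
    print(rule_of_succession(0, 10))           # 1/12 ≈ 0.0833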
Mathematical details
The proportion "p" is treated as a uniformly distributed random variable. (Some who take an extreme Bayesian approach to applied probability insist that the word "random" should be banished altogether from probability theory, on the grounds of examples like this one. This proportion is not random, but uncertain. We assign a probability distribution to "p" to express our uncertainty, not to attribute randomness to "p".)
Let "X""i" be the number of "successes" on the "i"th trial, with probability "p" of success on each trial. Thus each "X" is 0 or 1; each "X" has a
Bernoulli distribution . Suppose these "X"s are conditionally independent given "p".Bayes' theorem says that in order to get the conditional probability distribution of "p" given the data "X""i", "i" = 1, ..., "n", one multiplies the "prior" (i.e., marginal) probability measure assigned to "p" by thelikelihood function :
where "s" = "x"1 + ... + "x""n" is the number of "successes" and "n" is of course the number of trials, and then normalizes, to get the "posterior" (i.e., conditional on the data) probability distribution of "p". (We are using capital "X" to denote a random variable and lower-case "x" either as the dummy in the definition of a function or as the data actually observed.)
The prior probability density function is equal to 1 for 0 < "p" < 1 and equal to 0 for "p" < 0 or "p" > 1. To get the normalizing constant, we find

$$\int_0^1 p^s (1-p)^{n-s}\,dp = \frac{s!\,(n-s)!}{(n+1)!}$$

(see beta function for more on integrals of this form). The posterior probability density function is therefore

$$f(p \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{(n+1)!}{s!\,(n-s)!}\, p^s (1-p)^{n-s}.$$
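The closed form of the normalizing integral can be verified symbolically for concrete counts (a sketch using sympy; the particular values "s" = 7, "n" = 10 are an arbitrary choice of ours):

    import sympy as sp
    from math import factorial

    p = sp.symbols('p')
    s, n = 7, 10  # example counts
    # The integral of p^s (1-p)^(n-s) over [0, 1] should equal s!(n-s)!/(n+1)!
    integral = sp.integrate(p**s * (1 - p)**(n - s), (p, 0, 1))
    closed_form = sp.Rational(factorial(s) * factorial(n - s), factorial(n + 1))
    assert integral == closed_form  # both are 1/1320 here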
The posterior is a beta distribution with expected value

$$E(p \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{s+1}{n+2}.$$
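Equivalently, the posterior is Beta("s" + 1, "n" − "s" + 1), whose mean matches ("s" + 1)/("n" + 2); a quick numerical check (a sketch; scipy and the counts are our choices):

    from scipy.stats import beta

    s, n = 7, 10                        # example counts
    posterior = beta(s + 1, n - s + 1)  # Beta(8, 4)
    print(posterior.mean())             # 0.6666...
    print((s + 1) / (n + 2))            # 0.6666... — the rule of succession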
Since the conditional probability of tomorrow's sunrise, given the value of "p", is just "p", the law of total probability tells us that the probability of tomorrow's sunrise is just the expected value of "p". Since all of this is conditional on the observed data "X""i" for "i" = 1, ..., "n", we have

$$P(X_{n+1} = 1 \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{s+1}{n+2}.$$
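The whole argument can also be checked by simulation (a sketch; numpy and every name below are our illustration, not part of the article): draw "p" uniformly, generate "n" + 1 Bernoulli("p") trials per run, and among runs whose first "n" trials contain exactly "s" successes, measure how often trial "n" + 1 succeeds.

    import numpy as np

    rng = np.random.default_rng(0)
    n, s, runs = 10, 7, 2_000_000

    p = rng.uniform(size=runs)                       # p ~ Uniform(0, 1)
    trials = rng.random((runs, n + 1)) < p[:, None]  # n+1 Bernoulli(p) trials per run
    match = trials[:, :n].sum(axis=1) == s           # condition on s successes in first n
    print(trials[match, n].mean())                   # ≈ (s+1)/(n+2) = 8/12 ≈ 0.667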
See also
* Pseudocount