Sampling equiprobably with dice

Sampling equiprobably with dice

An illustrative example of how to compute with the probabilities associated to dice, as well as the analysis of algorithms, is the following problem. Suppose you have a group of 19 individuals and an ordinary six-sided die. Your task is to use that die to select one of these individuals equiproabably, i.e. so that all individuals are equally likely, having probability 1/19 of being selected, and using as few rolls of the die as possible. This is an example of the more general problem of using dice to sample a range equiprobably, where the range has no common factor with the number of faces of the die, which is in turn equivalent to one of the key problems of the algorithmics of random number generators. This is because with today's hardware, even if a random number generator samples a physical source, it will need to load the sample into a "discrete" register consisting of a finite number of bits. Hence the question arises of how to sample equiprobably from a range not being a power of two using a random number generator that returns some fixed number of random bits.

The main result is that the expected number E [T] of rolls when an "n"-sided die is used to select equiprobably from a range "m" where n < m and (n,m) = 1 and the discrete random variable "T" represents the number of rolls, obeys the bound:E [T] le lfloor log_n m floor + 1 +frac{m-1}{m} frac{n}{n-1}.This article presents the basic concepts and a basic analysis of the problem. For a much more sophisticated treatment of the case n=2, consult the article by H. Prodinger, who also discusses the problem in terms of leader election on a network. (The reference is here).

Building the algorithm

First, we return to the case of an ordinary die and nineteen individuals. A moment's thought reveals that a bounded number of rolls of the die, say "k", will never suffice, as there is no way to distribute the values from one to nineteen among the 6^k possible outcomes so that all values are equally likely. We note, however, that if 6^k is large, and we use q , mod ,19, where qin [0, 6^k) is the outcome, to choose the individual, the probabilities will be very close to frac{1}{19}, being perturbed only slightly by the remainder r = 6^k ,mod ,19. This suggests that we roll the die until the number 6^k of possible outcomes is at least nineteen, and take q , mod ,19, if q < 6^k - r., If q ge 6^k - r, however, we will need to roll the die again.

Recall that we require as few rolls of the die as possible. Therefore we should make use of the r discarded outcomes that trigger another set of rolls, so as to not lose any valuable random bits. Each of these discarded outcomes may combine with any one of the outcomes of the next set of rolls, forming 6r possible pairs. This finally suggests the following procedure. Introduce the variable c_k, where "k" is the number of rolls, and set c_0 = 1. This is the sequence of remainders "r". Similarly, introduce the variable q_k and set q_0 = 0. This is the index into the remainder interval if applicable. At every step, roll the die, obtaining a value "q" between zero and five, i.e. modulo six. Let c_k = 6 c_{k-1} ,mod, 19 and p = 6 q_{k-1} + q., If p < 6 c_{k-1} - c_k,, take p , mod ,19 and halt, otherwise, set q_k = p - (6 c_{k-1} - c_k), and repeat.

This algorithm is actually quite simple. Roll the die. If the value obtained is below the remainder interval, take the value modulo nineteen. If not, use six times the remainder interval as the new available range. Roll again and combine with the previous index into the remainder interval to obtain an index into the new range, and repeat.

Analysis

This procedure may be analysed in various ways. The simplest is to fix an individual and compute the probability of his/her being chosen. If this turns out to be frac{1}{19}, then the algorithm is indeed correct. With this in mind we introduce the probability p_k of the individual being chosen after "k" rolls, and make use of the fact that the sequence of remainders must repeat with a period that divides varphi(19) = 18, according to Fermat's little theorem.

We find that:p_1 = frac{0}{6} + frac{6}{6} p_2, quadp_2 = frac{1}{36} + frac{17}{36} p_3, quadp_3 = frac{5}{102} + frac{7}{102} p_4

:p_4 = frac{2}{42} + frac{4}{42} p_5, quadp_5 = frac{1}{24} + frac{5}{24} p_6, quadp_6 = frac{1}{30} + frac{11}{30} p_7

:p_7 = frac{3}{66} + frac{9}{66} p_8, quadp_8 = frac{2}{54} + frac{16}{54} p_9, quadp_9 = frac{5}{96} + frac{1}{96} p_1.

This yields p_1 = frac{1}{19}, so the algorithm is correct. In fact we have:p_1 = p_2 = p_3 = ldots = p_9 = frac{1}{19}quad mbox{because} quadfrac{1}{19} = frac{1}{19} x + (1-x) frac{1}{19}.

We introduce t_k, the expected number of rolls after "k" rolls, in order to verify that the algorithm fulfills the requirement of needing few rolls. Using the same probabilities as above, we find that:t_1 = frac{0}{6} + frac{6}{6} (1 + t_2), quadt_2 = frac{19}{36} + frac{17}{36} (1 + t_3), quadt_3 = frac{95}{102} + frac{7}{102} (1+ t_4)

:t_4 = frac{38}{42} + frac{4}{42} (1 + t_5), quadt_5 = frac{19}{24} + frac{5}{24} (1 + t_6), quadt_6 = frac{19}{30} + frac{11}{30} (1 + t_7)

:t_7 = frac{57}{66} + frac{9}{66} (1 + t_8), quadt_8 = frac{38}{54} + frac{16}{54} (1 + t_9), quadt_9 = frac{95}{96} + frac{1}{96} (1 + t_1).

This yields t_1 = frac{25281276}{10077695} approx 2.508636747, so we need on average two and a half rolls of a six-sided die to select one of nineteen individuals equiprobably.

The same result may be obtained by introducing the probability generating function f(x) given by:f(x) = frac{19}{6^2} x^2 + frac{95}{6^3} x^3 + frac{38}{6^4} x^4 + frac{19}{6^5} x^5 + frac{19}{6^6} x^6 + frac{57}{6^7} x^7 + frac{38}{6^8} x^8 + frac{95}{6^9} x^9 + frac{1}{6^9} x^9 f(x)and using the fact that: t_1 = left. frac{d}{dx} f(x) ight|_{x=1}= frac{25281276}{10077695}.The correctness of this equivalent approach will be shown in the next section.

Analysis of the general case

The analysis of the general case, i.e. the expected number of rolls where an "n"-sided die is used to select equiprobably from a range "m" where n < m and (n, m)=1 is very instructive, and serves to illustrate the techniques used in the manipulation of discrete random variables.

Here it is useful to introduce an infinite tree, consisting of internal nodes and of outdegree "n", where each level of the tree represents the n^k possible outcomes after "k" rolls, the root being the initial state (zero rolls). We may think of the levels being divided from left to right into three zones, a blue zone, a green one, and a red one. The blue nodes represent outcomes that correspond to a halt after fewer than "k" rolls, the green ones, to a halt at the current level (exactly "k" rolls) and the red ones, to the remainder case (another roll is required). All children of the blue nodes are blue, as are children of the green nodes. There are c_k red nodes, and of their children, c_{k+1} = n c_k , mod , m are in red in turn, while n c_k - c_{k+1}, are green. The number of green nodes is always a multiple of "m" and a fraction of 1/m of them correspond to a particular individual or value, thus showing by inspection that the algorithm samples the range equiprobably.

The three colors correspond to three sequences, (a_k), (b_k), and (c_k), where a_k/n^k is the probability of having halted after fewer than "k" rolls, b_k/n^k is the probability of halting after exactly "k" rolls, and c_k/n^k is the probability of needing at least k+1 rolls. These sequences obey the following recurrences::a_{k+1} = n a_k + n b_k, quad b_{k+1} = n c_k - c_{k+1}, quad mbox{and} quadc_{k+1} = n c_k, mod , mas well as:a_k + b_k + c_k = ,n^k.

Let "T" be the random variable giving the number of rolls.The fact that c_k/n^k is the probability of needing at least k+1 rolls may also be derived from the conditional probabilities of needing another roll at any time.:P [T ge k+1] =frac{c_k}{b_k+c_k}frac{c_{k-1{b_{k-1}+c_{k-1cdotsfrac{c_1}{c_1+b_1}.But b_k + c_k = , n c_{k-1}, so this is:frac{c_k}{n c_{k-1frac{c_{k-1{n c_{k-2cdotsfrac{c_1}{n c_0} = frac{c_k}{n^k}.

Using the definition of the expected value, we find that:E [T] = sum_{kge 1} k P [T=k] = sum_{k ge 1} P [Tge k] = sum_{kge 0} frac{c_k}{n^k}.

Recalling the definition of c_k, this becomes

:egin{align}E [T] & {} = sum_{kge 0} frac{n^k , mod , m}{n^k} \& {} = sum_{n^k < m} 1 + sum_{n^k > m} frac{n^k , mod , m}{n^k} \& {} = lfloor log_n m floor + 1 + frac{1}{n^{lfloor log_n m floor + 1 sum_{kge 0} frac{n^{k + lfloor log_n m floor + 1} , mod , m}{n^k} \& {} le lfloor log_n m floor + 1 + frac{1}{n^{lfloor log_n m floor + 1sum_{kge 0} frac{m-1}{n^k} \& {} le lfloor log_n m floor + 1 + frac{1}{n^{log_n m -1 + 1(m-1) frac{1}{1-1/n} \& {} = lfloor log_n m floor + 1 +frac{m-1}{m} frac{n}{n-1}.end{align}

Hence we need about two more rolls than lfloor log_n m floor to select equiprobably from "m" individuals using an "n"-sided die.

External links

* Quim Testar, Antonio González, Marko Riedel, et al. Aleae iactae sunt, newsgroup es.ciencia.matematicas, In Spanish.
* Quim Testar, Antonio González, Marko Riedel, et al. Acotando el numero de tiradas de dado esperadas, newsgroup es.ciencia.matematicas, In Spanish.

References

* H. Prodinger, [http://math.sun.ac.za/~prodinger/postscriptfiles/loser.ps "How to select a loser"] , Discrete Mathematics, 120 (1993) 149-159.
* Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. "Introduction to Algorithms", Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7.


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • List of mathematics articles (S) — NOTOC S S duality S matrix S plane S transform S unit S.O.S. Mathematics SA subgroup Saccheri quadrilateral Sacks spiral Sacred geometry Saddle node bifurcation Saddle point Saddle surface Sadleirian Professor of Pure Mathematics Safe prime Safe… …   Wikipedia

  • List of statistics topics — Please add any Wikipedia articles related to statistics that are not already on this list.The Related changes link in the margin of this page (below search) leads to a list of the most recent changes to the articles listed below. To see the most… …   Wikipedia

  • Statistical randomness — A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal die roll, or the digits of π exhibit statistical randomness.Statistical randomness does not …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”