- Sampling equiprobably with dice
An illustrative example of how to compute with the probabilities associated with dice, and of the analysis of algorithms, is the following problem. Suppose you have a group of 19 individuals and an ordinary six-sided die. Your task is to use that die to select one of these individuals equiprobably, i.e. so that all individuals are equally likely, each having probability $\frac{1}{19}$ of being selected, using as few rolls of the die as possible.

This is an instance of the more general problem of using dice to sample a range equiprobably, where the size of the range has no common factor with the number of faces of the die. That problem is in turn equivalent to one of the key problems in the algorithmics of random number generators. With today's hardware, even a random number generator that samples a physical source has to load the sample into a "discrete" register consisting of a finite number of bits. Hence the question arises of how to sample equiprobably from a range whose size is not a power of two, using a random number generator that returns some fixed number of random bits.

The main result is that when an $n$-sided die is used to select equiprobably from a range of size $m$, where $n < m$ and $(n,m) = 1$, and the discrete random variable $T$ represents the number of rolls, the expected number of rolls $E[T]$ obeys the bound

$$E[T] \le \lfloor \log_n m \rfloor + 1 + \frac{m-1}{m} \frac{n}{n-1}.$$

This article presents the basic concepts and a basic analysis of the problem. For a much more sophisticated treatment of the case $n=2$, consult the article by H. Prodinger listed under References, who also discusses the problem in terms of leader election on a network.

**Building the algorithm**

First, we return to the case of an ordinary die and nineteen individuals. A moment's thought reveals that a bounded number of rolls of the die, say $k$, will never suffice: 19 does not divide $6^k$, so there is no way to distribute the values from one to nineteen among the $6^k$ possible outcomes so that all values are equally likely. We note, however, that if $6^k$ is large, and we use $q \bmod 19$, where $q \in [0, 6^k)$ is the outcome, to choose the individual, the probabilities will be very close to $\frac{1}{19}$, being perturbed only slightly by the remainder $r = 6^k \bmod 19$. This suggests that we roll the die until the number $6^k$ of possible outcomes is at least nineteen, and take $q \bmod 19$ if $q < 6^k - r$. If $q \ge 6^k - r$, however, we will need to roll the die again.
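The simple scheme just described, in which rejected outcomes are thrown away, can be sketched as follows. This is only an illustration under stated assumptions: the names `block_size`, `naive_select` and `naive_expected_rolls` are invented for the example, die faces are numbered from 0 to $n-1$, and Python's `random` module stands in for the die.

```python
import random
from fractions import Fraction

def block_size(m, n):
    """Smallest k with n**k >= m, together with n**k."""
    k, total = 0, 1
    while total < m:
        total *= n
        k += 1
    return k, total

def naive_select(m=19, n=6, rng=random):
    """Return (value in [0, m), rolls used); rejected outcomes are thrown away."""
    k, total = block_size(m, n)
    r = total % m                      # r = n**k mod m rejected outcomes
    rolls = 0
    while True:
        q = 0
        for _ in range(k):             # read k rolls as one base-n number q
            q = q * n + rng.randrange(n)
        rolls += k
        if q < total - r:              # the accepted part is a multiple of m
            return q % m, rolls

def naive_expected_rolls(m=19, n=6):
    """k rolls per attempt, each accepted with probability (n**k - r) / n**k."""
    k, total = block_size(m, n)
    return Fraction(k * total, total - total % m)

print(naive_expected_rolls())          # 72/19, roughly 3.79 rolls
```

With two rolls per attempt and an acceptance probability of $\frac{19}{36}$, this version needs $\frac{72}{19}$ rolls on average, which leaves room for improvement.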

Recall that we require as few rolls of the die as possible. Therefore we should make use of the $r$ discarded outcomes that trigger another roll, so as not to lose any valuable random bits. Each of these discarded outcomes may combine with any one of the six outcomes of the next roll, forming $6r$ possible pairs. This finally suggests the following procedure. Introduce the variable $c_k$, where $k$ is the number of rolls, and set $c_0 = 1$. This is the sequence of remainders $r$. Similarly, introduce the variable $q_k$ and set $q_0 = 0$. This is the index into the remainder interval, if applicable. At every step, roll the die, obtaining a value $q$ between zero and five, i.e. modulo six. Let $c_k = 6 c_{k-1} \bmod 19$ and $p = 6 q_{k-1} + q$. If $p < 6 c_{k-1} - c_k$, take $p \bmod 19$ and halt; otherwise, set $q_k = p - (6 c_{k-1} - c_k)$ and repeat.

This algorithm is actually quite simple. Roll the die. If the value obtained lies below the remainder interval, take the value modulo nineteen and stop. If not, the new available range has six times the size of the remainder interval: roll again, combine the roll with the previous index into the remainder interval to obtain an index into the new range, and repeat.
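The procedure can be written down in a few lines. The sketch below is one possible reading of it, not a reference implementation: the names `select` and `halting_counts` and the optional `roll` callback are assumptions introduced for the example, and faces are numbered from 0 to $n-1$. The second function enumerates every sequence of four rolls to check by brute force that all values are selected equally often.

```python
import itertools
import random
from collections import Counter

def select(m=19, n=6, roll=None):
    """Return (value in [0, m), number of rolls), reusing rejected outcomes."""
    if roll is None:
        roll = lambda: random.randrange(n)   # die with faces 0 .. n-1
    c = 1          # c_{k-1}: size of the current remainder interval, c_0 = 1
    q = 0          # q_{k-1}: index into that interval, q_0 = 0
    rolls = 0
    while True:
        rolls += 1
        p = n * q + roll()         # index into the new range of n * c outcomes
        c_next = (n * c) % m       # c_k
        accepted = n * c - c_next  # n * c_{k-1} - c_k, a multiple of m
        if p < accepted:
            return p % m, rolls
        q = p - accepted           # q_k: index into the new remainder interval
        c = c_next

def halting_counts(m=19, n=6, depth=4):
    """Tally the selected value over all n**depth equally likely roll sequences.

    Sequences that have not halted after `depth` rolls are left out.
    """
    counts = Counter()
    for seq in itertools.product(range(n), repeat=depth):
        faces = iter(seq)
        try:
            value, _ = select(m, n, roll=lambda: next(faces))
        except StopIteration:      # ran out of rolls: still in the remainder
            continue
        counts[value] += 1
    return counts

print(sorted(set(halting_counts().values())))   # [68]: every value equally often
```

Of the $6^4 = 1296$ sequences, $c_4 = 4$ have not halted, and the remaining $1292 = 19 \cdot 68$ are shared evenly among the nineteen values.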

**Analysis**

This procedure may be analysed in various ways. The simplest is to fix an individual and compute the probability of that individual being chosen. If this turns out to be $\frac{1}{19}$, then the algorithm is indeed correct. With this in mind we introduce the probability $p_k$ that the individual is eventually chosen, given that the procedure has reached roll $k$, and make use of the fact that the sequence of remainders must repeat with a period that divides $\varphi(19) = 18$, according to Fermat's little theorem. Here the period is nine, and we find that

$$p_1 = \frac{0}{6} + \frac{6}{6} p_2, \quad p_2 = \frac{1}{36} + \frac{17}{36} p_3, \quad p_3 = \frac{5}{102} + \frac{7}{102} p_4,$$

$$p_4 = \frac{2}{42} + \frac{4}{42} p_5, \quad p_5 = \frac{1}{24} + \frac{5}{24} p_6, \quad p_6 = \frac{1}{30} + \frac{11}{30} p_7,$$

$$p_7 = \frac{3}{66} + \frac{9}{66} p_8, \quad p_8 = \frac{2}{54} + \frac{16}{54} p_9, \quad p_9 = \frac{5}{96} + \frac{1}{96} p_1.$$

This yields $p_1 = \frac{1}{19}$, so the algorithm is correct. Each equation has the form $p_k = \frac{1}{19} x + (1-x) p_{k+1}$, where $x$ is the probability of halting at roll $k$, so in fact we have

$$p_1 = p_2 = p_3 = \ldots = p_9 = \frac{1}{19} \quad \mbox{because} \quad \frac{1}{19} = \frac{1}{19} x + (1-x) \frac{1}{19}.$$
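The nine equations can also be solved mechanically with exact rational arithmetic. The following sketch chains them into a single equation $p_1 = a + b\,p_1$ and solves it; the function name `selection_probability` is an assumption made for the example, and the loop relies on $(n, m) = 1$ so that the remainders eventually return to $c_0 = 1$.

```python
from fractions import Fraction

def selection_probability(m=19, n=6):
    """Solve p_1 = a + b * p_1, obtained by chaining one period of equations."""
    a, b = Fraction(0), Fraction(1)    # invariant: p_1 = a + b * p_k
    c = 1                              # c_{k-1}, starting from c_0 = 1
    while True:
        c_next = (n * c) % m
        # Outcomes of roll k that select the fixed individual, out of n * c.
        halt_share = Fraction((n * c - c_next) // m, n * c)
        # Outcomes that fall into the remainder and force another roll.
        cont_share = Fraction(c_next, n * c)
        a, b = a + b * halt_share, b * cont_share
        c = c_next
        if c == 1:                     # remainders repeat, so p_k is p_1 again
            break
    return a / (1 - b)

print(selection_probability())         # 1/19
```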

We introduce $t_k$, the expected number of rolls still required (counting roll $k$ itself) given that the procedure has reached roll $k$, in order to verify that the algorithm fulfills the requirement of needing few rolls. Using the same probabilities as above, we find that

$$t_1 = \frac{0}{6} + \frac{6}{6} (1 + t_2), \quad t_2 = \frac{19}{36} + \frac{17}{36} (1 + t_3), \quad t_3 = \frac{95}{102} + \frac{7}{102} (1 + t_4),$$

$$t_4 = \frac{38}{42} + \frac{4}{42} (1 + t_5), \quad t_5 = \frac{19}{24} + \frac{5}{24} (1 + t_6), \quad t_6 = \frac{19}{30} + \frac{11}{30} (1 + t_7),$$

$$t_7 = \frac{57}{66} + \frac{9}{66} (1 + t_8), \quad t_8 = \frac{38}{54} + \frac{16}{54} (1 + t_9), \quad t_9 = \frac{95}{96} + \frac{1}{96} (1 + t_1).$$

This yields $t_1 = \frac{25281276}{10077695} \approx 2.508636747$, so we need on average about two and a half rolls of a six-sided die to select one of nineteen individuals equiprobably.
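The value of $t_1$ can be reproduced exactly by substituting the equations into one another. In the sketch below each equation is first simplified to $t_k = 1 + x\,t_{k+1}$, where $x$ is the probability of needing another roll; the name `expected_rolls` is an assumption made for the example.

```python
from fractions import Fraction

def expected_rolls(m=19, n=6):
    """Solve t_1 = a + b * t_1 by chaining one period of the t_k equations."""
    a, b = Fraction(0), Fraction(1)    # invariant: t_1 = a + b * t_k
    c = 1                              # c_{k-1}, starting from c_0 = 1
    while True:
        c_next = (n * c) % m
        cont_share = Fraction(c_next, n * c)
        # t_k = (1 - x) * 1 + x * (1 + t_{k+1}) = 1 + x * t_{k+1}
        a, b = a + b, b * cont_share
        c = c_next
        if c == 1:                     # back at c_0, so t_k is t_1 again
            break
    return a / (1 - b)

t1 = expected_rolls()
print(t1, float(t1))                   # 25281276/10077695, about 2.5086
```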

The same result may be obtained by introducing the probability generating function $f(x)$ given by

$$f(x) = \frac{19}{6^2} x^2 + \frac{95}{6^3} x^3 + \frac{38}{6^4} x^4 + \frac{19}{6^5} x^5 + \frac{19}{6^6} x^6 + \frac{57}{6^7} x^7 + \frac{38}{6^8} x^8 + \frac{95}{6^9} x^9 + \frac{1}{6^9} x^9 f(x)$$

and using the fact that

$$t_1 = \left. \frac{d}{dx} f(x) \right|_{x=1} = \frac{25281276}{10077695}.$$

The correctness of this equivalent approach will be shown in the next section.

**Analysis of the general case**

The analysis of the general case, i.e. of the expected number of rolls when an $n$-sided die is used to select equiprobably from a range of size $m$, where $n < m$ and $(n, m) = 1$, is very instructive, and serves to illustrate the techniques used in the manipulation of discrete random variables.

Here it is useful to introduce an infinite tree, all of whose nodes are internal nodes of outdegree $n$, where each level of the tree represents the $n^k$ possible outcomes after $k$ rolls, the root being the initial state (zero rolls). We may think of the levels as being divided from left to right into three zones: a blue zone, a green one, and a red one. The blue nodes represent outcomes that correspond to a halt after fewer than $k$ rolls, the green ones to a halt at the current level (exactly $k$ rolls), and the red ones to the remainder case (another roll is required). All children of the blue nodes are blue, as are children of the green nodes. There are $c_k$ red nodes at level $k$, and of their $n c_k$ children, $c_{k+1} = n c_k \bmod m$ are red in turn, while $n c_k - c_{k+1}$ are green. The number of green nodes is always a multiple of $m$, and a fraction of $1/m$ of them corresponds to a particular individual or value, thus showing by inspection that the algorithm samples the range equiprobably.

The three colors correspond to three sequences, $(a_k)$, $(b_k)$, and $(c_k)$, where $a_k/n^k$ is the probability of having halted after fewer than $k$ rolls, $b_k/n^k$ is the probability of halting after exactly $k$ rolls, and $c_k/n^k$ is the probability of needing at least $k+1$ rolls. These sequences obey the recurrences

$$a_{k+1} = n a_k + n b_k, \quad b_{k+1} = n c_k - c_{k+1}, \quad \mbox{and} \quad c_{k+1} = n c_k \bmod m$$

as well as

$$a_k + b_k + c_k = n^k.$$
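The three sequences are easy to tabulate. The sketch below assumes the starting values $a_0 = b_0 = 0$ and $c_0 = 1$ (the root is the only node at level zero, and it is red), which follow from $a_0 + b_0 + c_0 = 1$ but are not spelled out in the text; the name `zone_counts` is likewise invented for the example.

```python
def zone_counts(m, n, levels):
    """Return [(a_k, b_k, c_k)] for k = 0 .. levels (blue, green, red counts)."""
    a, b, c = 0, 0, 1                  # level 0: the root alone, and it is red
    rows = []
    for _ in range(levels + 1):
        rows.append((a, b, c))
        c_next = (n * c) % m
        a, b, c = n * a + n * b, n * c - c_next, c_next
    return rows

rows = zone_counts(19, 6, 9)
for k, (a, b, c) in enumerate(rows):
    assert a + b + c == 6 ** k         # every node has exactly one colour
    assert b % 19 == 0                 # green nodes split evenly among 19 values
print([b for _, b, _ in rows])         # [0, 0, 19, 95, 38, 19, 19, 57, 38, 95]
```

For $m = 19$ and $n = 6$, the green counts $b_2, \ldots, b_9$ are exactly the numerators of the generating function $f(x)$ of the previous section.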

Let $T$ be the random variable giving the number of rolls. The fact that $c_k/n^k$ is the probability of needing at least $k+1$ rolls may also be derived from the conditional probabilities of needing another roll at any time:

$$P[T \ge k+1] = \frac{c_k}{b_k+c_k} \frac{c_{k-1}}{b_{k-1}+c_{k-1}} \cdots \frac{c_1}{c_1+b_1}.$$

But $b_k + c_k = n c_{k-1}$, so this is

$$\frac{c_k}{n c_{k-1}} \frac{c_{k-1}}{n c_{k-2}} \cdots \frac{c_1}{n c_0} = \frac{c_k}{n^k}.$$
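The telescoping product can be checked numerically with exact fractions. This is a small sanity check rather than a proof; the name `survival_by_product` is an assumption made for the example.

```python
from fractions import Fraction

def survival_by_product(m, n, k):
    """Return (P[T >= k+1], c_k), multiplying c_j / (b_j + c_j) for j = 1 .. k."""
    prob, c = Fraction(1), 1           # empty product and c_0 = 1
    for _ in range(k):
        c_next = (n * c) % m
        b_next = n * c - c_next
        prob *= Fraction(c_next, b_next + c_next)   # denominator equals n * c
        c = c_next
    return prob, c

for k in range(10):
    prob, c_k = survival_by_product(19, 6, k)
    assert prob == Fraction(c_k, 6 ** k)            # matches c_k / n**k
```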

Using the definition of the expected value, we find that

$$E[T] = \sum_{k \ge 1} k P[T=k] = \sum_{k \ge 1} P[T \ge k] = \sum_{k \ge 0} \frac{c_k}{n^k}.$$

Recalling that $c_k = n^k \bmod m$, this becomes

$$\begin{aligned}
E[T] &= \sum_{k \ge 0} \frac{n^k \bmod m}{n^k} \\
&= \sum_{n^k < m} 1 + \sum_{n^k > m} \frac{n^k \bmod m}{n^k} \\
&= \lfloor \log_n m \rfloor + 1 + \frac{1}{n^{\lfloor \log_n m \rfloor + 1}} \sum_{k \ge 0} \frac{n^{k + \lfloor \log_n m \rfloor + 1} \bmod m}{n^k} \\
&\le \lfloor \log_n m \rfloor + 1 + \frac{1}{n^{\lfloor \log_n m \rfloor + 1}} \sum_{k \ge 0} \frac{m-1}{n^k} \\
&\le \lfloor \log_n m \rfloor + 1 + \frac{1}{n^{\log_n m - 1 + 1}} (m-1) \frac{1}{1-1/n} \\
&= \lfloor \log_n m \rfloor + 1 + \frac{m-1}{m} \frac{n}{n-1}.
\end{aligned}$$

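Hence the expected number of rolls needed to select equiprobably from $m$ individuals using an $n$-sided die exceeds $\lfloor \log_n m \rfloor$ by less than $1 + \frac{n}{n-1}$, i.e. by about two rolls when the die has many faces and by less than three when $n = 2$.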
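The bound can be compared with the exact expectation for small cases. The sketch below sums the series $\sum_{k \ge 0} c_k / n^k$ exactly by exploiting the periodicity of $c_k$; the names `expected_rolls_series` and `bound` are assumptions made for the example, and both functions presuppose $2 \le n < m$ with $(n, m) = 1$.

```python
from fractions import Fraction
from math import gcd

def expected_rolls_series(m, n):
    """Exact sum of c_k / n**k, using the periodicity of c_k = n**k mod m."""
    total, c, k = Fraction(0), 1, 0
    while True:
        total += Fraction(c, n ** k)
        c, k = (n * c) % m, k + 1
        if c == 1:                     # one full period of length k is summed
            break
    # Each later period repeats the first one scaled by 1 / n**k.
    return total / (1 - Fraction(1, n ** k))

def bound(m, n):
    """floor(log_n m) + 1 + (m - 1)/m * n/(n - 1), in integer arithmetic."""
    floor_log = 0
    while n ** (floor_log + 1) <= m:
        floor_log += 1
    return floor_log + 1 + Fraction(m - 1, m) * Fraction(n, n - 1)

print(float(expected_rolls_series(19, 6)), float(bound(19, 6)))  # about 2.5086 and 3.1368

# The bound holds for every coprime pair tried here.
assert all(expected_rolls_series(m, n) <= bound(m, n)
           for m in range(3, 41) for n in range(2, m) if gcd(n, m) == 1)
```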

**External links**

* Quim Testar, Antonio González, Marko Riedel, et al. *Aleae iactae sunt*, newsgroup es.ciencia.matematicas. In Spanish.
* Quim Testar, Antonio González, Marko Riedel, et al. *Acotando el numero de tiradas de dado esperadas*, newsgroup es.ciencia.matematicas. In Spanish.

**References**

* H. Prodinger, [http://math.sun.ac.za/~prodinger/postscriptfiles/loser.ps "How to select a loser"], Discrete Mathematics, 120 (1993) 149-159.
* Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. *Introduction to Algorithms*, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7.
