Fisher-Yates shuffle

Fisher-Yates shuffle

The Fisher-Yates shuffle, named after Ronald Fisher and Frank Yates, also known as the Knuth shuffle, after Donald Knuth, is an algorithm for generating a random permutation of a finite set—in plain terms, for randomly shuffling the set. A variant of the Fisher-Yates shuffle, known as Sattolo's algorithm, may be used to generate cyclic permutations instead. Properly implemented, the Fisher-Yates shuffle is unbiased, so that every permutation is equally likely. The modern version of the algorithm is also rather efficient, requiring only time proportional to the number of items being shuffled and no additional storage space.

The basic process of Fisher-Yates shuffling is similar to randomly picking numbered tickets out of a hat, or cards from a deck, one after another until there are no more left. What the specific algorithm provides is a way of doing this numerically in an efficient and rigorous manner that, properly done, guarantees an unbiased result.

Fisher and Yates' original method

The Fisher-Yates shuffle, in its original form, was described in 1938 by Ronald A. Fisher and Frank Yates in their book "Statistical tables for biological, agricultural and medical research".cite book
author = Fisher, R.A.; Yates, F.
title = Statistical tables for biological, agricultural and medical research
origyear = 1938
edition = 3rd ed.
year = 1948
pages = pp. 26–27
publisher = Oliver & Boyd
location = London
oclc = 14222135
(note: 6th edition, ISBN 0-02-844720-4, is [http://digital.library.adelaide.edu.au/coll/special/fisher/stat_tab.pdf available on the web] , but gives a different shuffling algorithm by C. R. Rao) ] (Later editions describe a somewhat different method attributed to C. R. Rao.) Their method was designed to be implemented using pencil and paper, with a precomputed table of random numbers as the source of randomness. The basic method given for generating a random permutation of the numbers 1–"N" goes as follows:

# Write down the numbers from one to "N".
# Pick a random number "k" between one and the number of unstruck numbers remaining (inclusive).
# Counting from the low end, strike out the "k"th number not yet struck out, and write it down elsewhere.
# Repeat from step 2 until all the numbers have been struck out.
# The sequence of numbers written down in step 3 is now a random permutation of the original numbers.

Provided that the random numbers picked in step 2 above are truly random and unbiased, so will the resulting permutation be. Fisher and Yates took care to describe how to obtain such random numbers in any desired range from the supplied tables in a manner which avoids any bias. They also suggested the possibility of using a simpler method — picking random numbers from one to "N" and discarding any duplicates—to generate the first half of the permutation, and only applying the more complex algorithm to the remaining half, where picking a duplicate number would otherwise become frustratingly common.

The modern algorithm

The modern version of the Fisher-Yates shuffle, designed for computer use, was introduced by Richard Durstenfeld in 1964 in "Communications of the ACM" volume 7, issue 7, as "Algorithm 235: Random permutation",cite journal
title = Algorithm 235: Random permutation
first = Richard
last = Durstenfeld
journal = Communications of the ACM
issn = 0001-0782
volume = 7
issue = 7
year = 1964
month = July
pages = 420
url = http://doi.acm.org/10.1145/364520.364540
doi = 10.1145/364520.364540
] and was popularized by Donald E. Knuth in volume 2 of his book "The Art of Computer Programming" as "Algorithm P".cite book
title = The Art of Computer Programming volume 2: Seminumerical algorithms
first = Donald E.
last = Knuth
pages = 124–125
year = 1969
publisher = Addison-Wesley
location = Reading, MA
oclc = 85975465
] Neither Durstenfeld nor Knuth, in the first edition of his book, acknowledged the earlier work of Fisher and Yates in any way, and may not have been aware of it. Subsequent editions of "The Art of Computer Programming" do, however, mention Fisher and Yates' contribution.cite book
title = The Art of Computer Programming vol. 2
last = Knuth
origyear = 1969
edition = 3rd ed.
year = 1998
pages = 145–146
isbn = 0-201-89684-2
oclc = 38207978
]

The algorithm described by Durstenfeld differs from that given by Fisher and Yates in a small but significant way. Whereas a naive computer implementation of Fisher and Yates' method would spend needless time counting the remaining numbers in step 3 above, Durstenfeld's solution is to move the "struck" numbers to the end of the list by swapping them with the last unstruck number at each iteration. This reduces the algorithm's time complexity to "O"("n"), compared to "O"("n"2) for the naive implementation.cite web
first = Paul E.
last = Black
work = Dictionary of Algorithms and Data Structures
title = Fisher-Yates shuffle
publisher = National Institute of Standards and Technology
url = http://www.nist.gov/dads/HTML/fisherYatesShuffle.html
date = 2005-12-19
accessdate = 2007-08-09
] The algorithm thus becomes, for a set of N elements:

# Let "A"1 := 1, "A"2 := 2 and so on up to "A""N" := "N", and let "n" := "N".
# Pick a random number "k" between 1 and "n" inclusive.
# If "k" ≠ "n", swap the values of "A""k" and "A""n".
# Decrease "n" by one.
# Repeat from step 2 until "n" is less than 2.

The Fisher-Yates shuffle, as implemented by Durstenfeld, is an "in-place shuffle". That is, given a preinitialized array, it shuffles the elements of the array in place, rather than producing a shuffled copy of the array. This can be an advantage if the array to be shuffled is large. An example implementation of Durstenfeld's algorithm in Java (with 0-based arrays) is:

public static void shuffle (int [] array) { Random rng = new Random(); // i.e., java.util.Random. int n = array.length; // The number of items left to shuffle (loop invariant). while (n > 1) { int k = rng.nextInt(n); // 0 <= k < n. n--; // n is now the last pertinent index; int temp = array [n] ; // swap array [n] with array [k] (does nothing if k = n). array [n] = array [k] ; array [k] = temp; } }

The implementation above relies on Random.nextInt("int") providing sufficiently random and unbiased results; see below for potential problems if this is not the case.

Examples

Pencil-and-paper method

As an example, we'll permute the numbers from 1 to 8 using Fisher and Yates' original method. We'll start by writing the numbers out on a piece of scratch paper:

Now we pick the next random number from 1 to 6, and then from 1 to 5, and so on, always repeating the strike-out process as above:

The next random number we roll from 1 to 7, and turns out to be 2. Thus, we swap the 2nd and 7th numbers and move on:

The next random number we roll is from 1 to 6, and just happens to be 6, which means we leave the 6th number in the list (which, after the swap above, is now number 8) in place and just move to the next step. Again, we proceed the same way until the permutation is complete:

At this point there's nothing more that can be done, so the resulting permutation is 7 5 4 3 1 8 2 6.

Variants

Sattolo's algorithm

A very similar algorithm was published in 1986 by Sandra Sattolo for generating uniformly distributed cyclic permutations.cite conference
last = Wilson
first = Mark C.
date = 2004-06-21
title = Overview of Sattolo's Algorithm
url = http://algo.inria.fr/seminars/summary/Wilson2004b.pdf
conference = Algorithms Seminar 2002–2004
conferenceurl = http://algo.inria.fr/seminars/allyears.html
editor = F. Chyzak (ed.)
others = summary by Éric Fusy.
booktitle = INRIA Research Report
volume = 5542
pages = 105–108
id = ISSN|0249-6399
] The only difference between Durstenfeld's and Sattolo's algorithms is that in the latter, in step 2 above, the random number "k" is chosen from the range between 1 and "n"−1 (rather than between 1 and "n") inclusive. To turn the Java example above into an example of Sattolo's algorithm, simply replace rng.nextInt(n) with rng.nextInt(n-1) in the code. This simple change modifies the algorithm so that the resulting permutation always consists of a single cycle.

In fact, as described below, it's quite easy to "accidentally" implement Sattolo's algorithm when the ordinary Fisher-Yates shuffle is intended. This will bias the results by causing the permutations to be picked from the smaller set of ("N"−1)! cyclic permutations instead of the full set of all "N"! possible permutations.

The fact that Sattolo's algorithm in fact produces a cyclic permutation, and that it produces each such permutation with equal probability, may not be immediately obvious. The former can be shown inductively: Assume that, before step 2 of the modified algorithm, the permutation has "n" distinct cycles, each containing exactly one member "A""i" for which "i" ≤ "n". This is clearly true at the start, when "A""i" = "i" for all 1 ≤ "i" ≤ "N", and "n" = "N". Given the assumption, for any randomly chosen "k" < "n", "A""n" and "A""k" must belong to distinct cycles, and thus swapping their values in step 3 will merge those cycles, reducing the number of distinct cycles by one. This merged cycle will have two members ("A""n" and "A""k") with indices less than or equal to "n", but will lose one of them when "n" is correspondingly decreased by one in step 4, and thus the assumption given above will continue to hold. Eventually, of course, "n", and thus the number of cycles, will decrease down to one, at which point the algorithm will terminate.

As for the equal probability of the permutations, it suffices to observe that the modified algorithm involves ("N"−1)! distinct possible sequences of swaps, each of which clearly produces a different permutation, and each of which occurs—assuming the random number source is unbiased—with equal probability. Since ("N"−1)!, the number of distinct permutations the algorithm can produce, is also known to be exactly the total number of cyclic permutations of "N" elements, it is clear that the algorithm must be able to produce them all.

Comparison with other shuffling algorithms

The Fisher-Yates shuffle is quite efficient; indeed, its asymptotic time and space complexity are optimal. Combined with a high-quality unbiased random number source, it is also guaranteed to produce unbiased results. Compared to some other solutions, it also has the advantage that, if only part of the resulting permutation is needed, it can be stopped halfway through, or even stopped and restarted repeatedly, generating the permutation incrementally as needed.

In high-level programming languages with a fast built-in sorting algorithm, an alternative method, where each element of the set to be shuffled is assigned a random number and the set is then sorted according to these numbers, may be faster in practice, despite having worse asymptotic time complexity ("O"("n" log "n") vs. "O"("n")). Like the Fisher-Yates shuffle, this method will also produce unbiased results if correctly implemented, and may be more tolerant of certain kinds of bias in the random numbers. However, care must be taken to ensure that the assigned random numbers are never duplicated, since sorting algorithms in general won't order elements randomly in case of a tie.

A variant of the above method that has seen some use in languages that support sorting with user-specified comparison functions is to shuffle a list by sorting it with a comparison function that returns random values. However, "this does not always work": with a number of commonly used sorting algorithms, the results end up biased due to internal asymmetries in the sorting implementation. [cite web
work = require ‘brain’
title = A simple shuffle that proved not so simple after all
url = http://szeryf.wordpress.com/2007/06/19/a-simple-shuffle-that-proved-not-so-simple-after-all/
date = 2007-06-19
accessdate = 2007-08-09
]

Potential sources of bias

Care must be taken when implementing the Fisher-Yates shuffle, both in the implementation of the algorithm itself and in the generation of the random numbers it is built on, otherwise the results may show detectable bias. A number of common sources of bias have been listed below.

Implementation errors

A common error when implementing the Fisher-Yates shuffle is to pick the random numbers from the wrong range. The resulting algorithm may appear to work, but will produce biased results. For example, a common off-by-one error would be moving the line n-- "before" the line k = rng.nextInt(n) in the Java example above, so that "k" always strictly less than the index it will be swapped with. (In Java, Random.nextInt("int") returns a random non-negative integer "less than" its argument.cite web
work = Java 2 Platform SE v1.4.2 documentation
title = java.util.Random.nextInt(int)
url = http://java.sun.com/j2se/1.4.2/docs/api/java/util/Random.html#nextInt(int)
accessdate = 2007-08-09
] ) This turns the Fisher-Yates shuffle into Sattolo's algorithm, which only ever produces cyclic permutations: in particular, it is easy to see that, with this modification, the last element of the array can never end up in its original position.

Similarly, always selecting "k" from the entire range of valid array indexes on "every" iteration ("i.e." using k = rng.nextInt(array.length) in the Java example above) also produces a result which is biased, albeit less obviously so. This can be seen from the fact that doing so yields "N""N" distinct possible sequences of swaps, whereas there are only "N"! possible permutations of an "N"-element array. Since "N""N" can never be evenly divisible by "N"! (as the latter is divisible by "N"−1, which shares no prime factors with "N"), some permutations must be produced by more of the "N""N" sequences of swaps than others.

Modulo bias

Doing a Fisher-Yates shuffle involves picking uniformly distributed random integers from various ranges. Most random number generators, however—whether true or pseudorandom—will only directly provide numbers in some fixed range, such as, say, from 0 to 232−1. A simple and commonly used way to force such numbers into a desired smaller range is to apply the modulo operator; that is, to divide them by the size of the range and take the remainder. However, the need, in a Fisher-Yates shuffle, to generate random numbers in every range from 0–1 to 0–"N" pretty much guarantees that some of these ranges will not evenly divide the natural range of the random number generator. Thus, the remainders will not always be evenly distributed and, worse yet, the bias will be systematically in favor of small remainders.

For example, assume that your random number source gives numbers from 0 to 99 (as was the case for Fisher and Yates' original tables), and that you wish to obtain an unbiased random number from 0 to 15. If you simply divide the numbers by 16 and take the remainder, you'll find that the numbers 0–3 occur about 17% more often than others. This is because 16 does not evenly divide 100: the largest multiple of 16 less than or equal to 100 is 6×16 = 96, and it is the numbers in the incomplete range 96–99 that cause the bias. The simplest way to fix the problem is to discard those numbers before taking the remainder and to keep trying again until a number in the suitable range comes up. While in principle this could, in the worst case, take forever, in practice the expected number of retries will always be less than one.

A related problem occurs with implementations that first generate a random floating-point number—usually in the range [0,1)—and then multiply it by the size of the desired range and round down. The problem here is that random floating-point numbers, however carefully generated, always have only finite precision. This means that there are only a finite number of possible floating point values in any given range, and if the range is divided into a number of segments that doesn't divide this number evenly, some segments will end up with more possible values than others. While the resulting bias will not show the same systematic downward trend as in the previous case, it will still be there.

Limited PRNG state space

An additional problem occurs when the Fisher-Yates shuffle is used with a pseudorandom number generator: as the sequence of numbers output by such a generator is entirely determined by its internal state at the start of a sequence, a shuffle driven by such a generator cannot possibly produce more distinct permutations than the generator has distinct possible states. Even when the number of possible states exceeds the number of permutations, the irregular nature of the mapping from sequences of numbers to permutations means that some permutations will occur more often than others. Thus, to minimize bias, the number of states of the PRNG should exceed the number of permutations by at least several orders of magnitude.

For example, the built-in pseudorandom number generator provided by many programming languages and/or libraries may often have only 32 bits of internal state, which means it can only produce 232 different sequences of numbers. If such a generator is used to shuffle a deck of 52 playing cards, it can only ever produce a vanishingly small fraction of the 52! ≈ 2225.6 possible permutations. It's impossible for a generator with less than 226 bits of internal state to produce all the possible permutations of a 52-card deck, and for a (reasonably) unbiased shuffle, the generator must have "at least" about 250 bits of state.

A further problem occurs when a simple linear congruential PRNG is used with the divide-and-take-remainder method of range reduction described above. The problem here is that the low-order bits of a linear congruential PRNG are less random than the high-order ones: the low "n" bits of the generator themselves have a period of at most 2"n". When the divisor is a power of two, taking the remainder essentially means throwing away the high-order bits, such that one ends up with a significantly less random value.

Also, of course, no pseudorandom number generator can produce more distinct sequences than there are distinct seed values it may be initialized with. Thus, it doesn't matter much if a generator has 1024 bits of internal state if it is only ever initialized with a 32-bit seed.

References


Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

  • Ronald Fisher — R. A. Fisher Born 17 February 1890(1890 02 17) East Finchley, London …   Wikipedia

  • Shuffling — Shuffle redirects here. For other uses, see Shuffle (disambiguation). Shuffling is a procedure used to randomize a deck of playing cards to provide an element of chance in card games. Shuffling is often followed by a cut, to help ensure that the… …   Wikipedia

  • Permutation aléatoire — Une permutation aléatoire de taille N est une permutation prise de manière uniforme dans l ensemble des permutations de taille N. Par exemple pour N=5, nous pouvons obtenir (15423) ou encore (34125). Les permutations aléatoires peuvent être mises …   Wikipédia en Français

  • List of mathematics articles (F) — NOTOC F F₄ F algebra F coalgebra F distribution F divergence Fσ set F space F test F theory F. and M. Riesz theorem F1 Score Faà di Bruno s formula Face (geometry) Face configuration Face diagonal Facet (mathematics) Facetting… …   Wikipedia

  • Permutation — For other uses, see Permutation (disambiguation). The 6 permutations of 3 balls In mathematics, the notion of permutation is used with several slightly different meanings, all related to the act of permuting (rearranging) objects or values.… …   Wikipedia

  • Steinhaus-Johnson-Trotter algorithm — The Steinhaus Johnson Trotter algorithm or Johnson Trotter algorithm is an algorithm which generates permutations by transposing elements.AlgorithmThe algorithm is setup with the idea that only one set of neighbors needs to swap positions and… …   Wikipedia

  • List of permutation topics — This is a list of topics on mathematical permutations.*Alternating group *Alternating permutation *Bijection *Circular shift *Combination *Cycle index *Cycle notation *Cyclic order *Cyclic permutation *Derangement *Even and odd permutations… …   Wikipedia

  • performing arts — arts or skills that require public performance, as acting, singing, or dancing. [1945 50] * * * ▪ 2009 Introduction Music Classical.       The last vestiges of the Cold War seemed to thaw for a moment on Feb. 26, 2008, when the unfamiliar strains …   Universalium

  • List of Alpha Phi Alpha brothers — The list of Alpha Phi Alpha brothers (commonly referred to as Alphas cite web |url=http://www.union.arizona.edu/csil/greek/chapters/view.php?id=39 |title=Arizona Student Unions |work=Fraternity and sorority programs |publisher=University of… …   Wikipedia

  • literature — /lit euhr euh cheuhr, choor , li treuh /, n. 1. writings in which expression and form, in connection with ideas of permanent and universal interest, are characteristic or essential features, as poetry, novels, history, biography, and essays. 2.… …   Universalium

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”