- Birthday problem
In
probability theory , the birthday problem, [This is not aparadox in the sense of leading to alogic al contradiction, but is called a paradox because the mathematical truth contradicts naïve intuition: most people estimate that the chance is much lower than 50%.] pertains to theprobability that in a set ofrandom ly chosen people some pair of them will have the samebirthday . In a group of 23 (or more) randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, tending toward 100% as the pool of people grows. [ Note that birthdays are not evenly distributed throughout the year; not only does February 29 occur less than a quarter as often as any other day, but birth rates vary for the other 365 days.] The mathematics behind this problem leads to a well-known cryptographic attack called thebirthday attack ."'
Understanding the problem
The birthday problem asks whether "any" of the 23 people have a matching birthday with "any" of the others — not one in particular. (See "Same birthday as you" below for an analysis of this much less surprising alternative problem.)
In a list of 23 people, comparing the birthday of the first person on the list to the others allows 22 chances for a matching birthday, but comparing every person to all of the others allows 253 distinct chances: in a group of 23 people there are 23×22/2 = 253 pairs. The probability that two people chosen from the entire population at random have the same birthday is 1/365. Although the pairings in a group of 23 people are not statistically equivalent to 253 pairs chosen independently, the birthday paradox becomes less surprising if a group is thought of in terms of the number of possible pairs, rather than the number of individuals.
It is easier to figure the probability that the birthdays will be different, such as: with one person they have 365 opportunities to have a different birthday. The second person only has 364 possibilities to have a different birthday than the first person. The third person has 363 days, and so on. Thus when the group reaches 366 a clash is inevitable — all the days will have been used up, save for leap years.
Calculating the probability
To compute the approximate probability that in a room of "n" people, at least two have the same birthday, we disregard variations in the distribution, such as
leap year s,twin s, seasonal or weekday variations, and assume that the 365 possible birthdays are equally likely. Real-life birthday distributions are not uniform since not all dates are equally likely. [In particular, many children are born in the summer, especially the months of August and September (for the northern hemisphere) [http://scienceworld.wolfram.com/astronomy/LeapDay.html] , and in the U.S. it has been noted that many children are conceived around the holidays ofChristmas andNew Year's Day ; and, in environments like classrooms where many people share a birth year, it becomes relevant that due to the way hospitals work, where C-sections and induced labor are not generally scheduled on the weekend, more children are born on Mondays and Tuesdays than on weekends. Both of these factors tend to increase the chance of identical birth dates, since a denser subset has more possible pairs (in the extreme case when everyone was born on three days, there would obviously be many identical birthdays). The birthday problem for such non-constant birthday probabilities was tackled byMurray Klamkin in 1967. A formal proof that the probability of two matching birthdays is least for a uniform distribution of birthdays was given by D. Bloom (1973)]It is easier to first calculate the probability "p"("n") that all "n" birthdays are "different". If "n" > 365, by the
pigeonhole principle this probability is 0. On the other hand, if "n" ≤ 365, it is:ar p(n) = 1 imes left(1-frac{1}{365} ight) imes left(1-frac{2}{365} ight) cdots left(1-frac{n-1}{365} ight) = { 365 imes 364 cdots (365-n+1) over 365^n } = { 365! over 365^n (365-n)!}
because the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), etc.
The event of at least two of the "n" persons having the same birthday is complementary to all "n" birthdays being different. Therefore, its probability "p"("n") is
:p(n) = 1 - ar p(n) .
This probability surpasses 1/2 for "n" = 23 (with value about 50.7%). The following table shows the probability for some other values of "n" (This table ignores the existence of leap years, as described above):
Thus in a group of just seven random people, it is more likely than not that two of them will have a birthday within a week of each other.M. Abramson and W. O. J. Moser (1970) "More Birthday Surprises",
American Mathematical Monthly 77, 856–858]Collision counting
The probability that the "k"th integer randomly chosen from [1, "d"] will repeat at least one previous choice equals "q"("k" − 1; "d") above. The expected total number of times a selection will repeat a previous selection as "n" such integers are chosen equals
:sum_{k=1}^n q(k-1;d) = n - d + d left (frac {d-1} {d} ight )^n.
Average number of people
In an alternative formulation of the birthday problem, one asks the "average" number of people required to find a pair with the same birthday. The problem is relevant to several hashing algorithms analyzed by
Donald Knuth in his monumental work "The Art of Computer Programming ". It may be shownD. E. Knuth; "The Art of Computer Programming . Vol. 3, Sorting and Searching" (Addison-Wesley, Reading, Massachusetts, 1973)] P. Flajolet, P. J. Grabner, P. Kirschenhofer, H. Prodinger (1995), "On Ramanujan's Q-Function", Journal of Computational and Applied Mathematics 58, 103–116] that if one samples uniformly, with replacement, from a population of size M, the number of trials required for the first repeated sampling of "some" individual hasexpected value scriptstyleoverline{n},=,1+Q(M), where: Q(M)=sum_{k=1}^{M} frac{M!}{(M-k)! M^k}.
The function
: Q(M)= 1 + frac{M-1}{M} + frac{(M-1)(M-2)}{M^2} + cdots + frac{(M-1)(M-2) cdots 1}{M^{M-1
has been studied by
Srinivasa Ramanujan and hasasymptotic expansion :: Q(M)simsqrt{frac{pi M}{2-frac{1}{3}+frac{1}{12}sqrt{frac{pi}{2n-frac{4}{135n}+ldots.
With "M" = 365 days in a year, the average number of people required to find a pair with the same birthday is scriptstyleoverline{n},=,1+Q(M)approx 24.61658, slightly more than the number required for a 50% chance. In the best case, two people will suffice; at worst, the maximum possible number of "M" + 1 = 366 people is needed; but on
average , only 25 people are required.An "informal" demonstration of the problem can be made from the
List of Prime Ministers of Australia , in whichPaul Keating , the 24th Prime Minister, is the first to share a birthday with another on the list.Partition problem
A related problem is the
partition problem , a variant of theknapsack problem from computer science. Some weights are put on a balance; each weight is an integer number of grams randomly chosen between one gram and one million grams (one metric ton). The question is whether you can usually (that is, with probability close to 1) transfer the weights between the left and right arms to balance the scale. (In case the sum of all the weights is an odd number of grams, a discrepancy of one gram is allowed.) If there are only two or three weights, the answer is very clearly no; although there are some combinations which work, the majority of randomly selected combinations of three weights do not. If there are very many weights, the answer is clearly yes. The question is, how many are just sufficient? That is, what is the number of weights such that it is equally likely for it to be possible to balance them as impossible?Some people's intuition is that the answer is above 100,000. Most people's intuition is that it is in the thousands or tens of thousands, while others feel it should at least be in the hundreds. The correct answer is approximately 23.
The reason is that the correct comparison is to the number of partitions of the weights into left and right. There are 2"N"−1 different partitions for "N" weights, and the left sum minus the right sum can be thought of as a new random quantity for each partition. The distribution of the sum of weights is approximately Gaussian, with a peak at 1,000,000 "N" and width scriptstyle 1,000,000sqrt{N}, so that when 2"N"−1 is approximately equal to scriptstyle 1,000,000sqrt{N} the transition occurs. 223−1 is about 4 million, while the width of the distribution is only 5 million. [C. Borgs, J. Chayes, and B. Pittel (2001) "Phase Transition and Finite Size Scaling in the Integer Partition Problem", Random Structures and Algorithms 19(3–4), 247–288.]
Notes
References
*E. H. McKinney (1966) "Generalized Birthday Problem",
American Mathematical Monthly 73, 385–387.
*M. Klamkin and D. Newman (1967) "Extensions of the Birthday Surprise", Journal of Combinatorial Theory 3, 279–282.
*M. Abramson and W. O. J. Moser (1970) "More Birthday Surprises",American Mathematical Monthly 77, 856–858
*D. Bloom (1973) "A Birthday Problem",American Mathematical Monthly 80, 1141–1142.
*Shirky, Clay "Here Comes Everybody: The Power of Organizing Without Organizations", (2008.) New York. 25–27.External links
* http://www.efgh.com/math/birthday.htm
* http://planetmath.org/encyclopedia/BirthdayProblem.html
*
* [http://www.nestel.net/maple/bd/bd.html Maple vs. birthday paradox]
* [http://www.damninteresting.com/?p=402 A humorous article explaining the paradox]
* [http://www.excelexchange.com/Birthday%20Problem.xls The Birthday Problem Spreadsheet]
* [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_BirthdayExperiment SOCR EduMaterials Activities BirthdayExperiment]
* [http://jeff.aaron.ca/cgi-bin/birthday The Birthday Paradox: Odds Calculator]
Wikimedia Foundation. 2010.