The other night I fell into a Wikipedia hole, prompted by xkcd's excellent "What If?" post "Twitter Timeline Height." Two great finds:
The German Tank Problem
i.e. "the problem of estimating the maximum of a discrete uniform distribution from sampling without replacement." From xkcd:
Allied troops faced a version of this problem in World War II. German tank parts had serial numbers, many of which were sequential (1, 2 ... N). Suppose they captured a random tank. If they determined it was Tank #27, then they can be sure that the Germans had made at least 27 tanks. It also told them there probably weren't millions of tanks; if there were, they would have been unlikely to get a two-digit serial number.
Wikipedia describes the discrepancy, in WWII, between serial number analysis and what traditional intelligence believed about German tank production:
According to conventional Allied intelligence estimates the Germans were producing around 1,400 tanks a month between June 1940 and September 1942. Applying the formula below to the serial numbers of captured German tanks, (both serviceable and destroyed) the number was calculated to be 246 a month. After the war, captured German production figures from the ministry of Albert Speer showed the actual number to be 245.
In case that's not clear: Spies (who were presumably fooled by counterintelligence) were saying that Germany was producing about 1,400 tanks per month, but serial number analysis said that it was closer to 246. The actual number was 245.
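The Wikipedia quote mentions "the formula below" without showing it. As a rough sketch, assuming it refers to the standard frequentist estimator for this problem, the estimate is m + m/k - 1, where m is the largest serial number seen and k is the number of tanks captured. The function names and the simulated numbers here (a true total of 245 and 10 captured tanks) are illustrative assumptions, not figures from the wartime analysis:

```python
import random


def estimate_total(serials):
    """Estimate the total count N from observed serial numbers.

    Uses the standard minimum-variance unbiased estimator m + m/k - 1,
    where m is the largest serial seen and k is the sample size.
    """
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


def average_estimate(true_total, sample_size, trials=20_000, seed=0):
    """Average the estimator over many simulated captures.

    Each trial draws `sample_size` distinct serials from 1..true_total
    (sampling without replacement), as in the problem statement.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        captured = rng.sample(range(1, true_total + 1), sample_size)
        total += estimate_total(captured)
    return total / trials


# A single captured tank numbered 27, as in the xkcd example.
print(estimate_total([27]))              # 53.0

# A small hypothetical sample of four serial numbers.
print(estimate_total([19, 40, 42, 60]))  # 74.0

# Hypothetical simulation: the average should land close to 245.
print(average_estimate(245, 10))
```

With only Tank #27 in hand, the estimate is 53, which lines up with the intuition that there probably weren't millions of tanks. A single capture gives a very noisy estimate, though; the simulation only shows that the estimator is right on average.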
The Two Envelopes Problem
From Wikipedia:
You have two indistinguishable envelopes that each contain money. One contains twice as much as the other. You may pick one envelope and keep the money it contains. You pick at random, but before you open the envelope, you are offered the chance to take the other envelope instead.
It can be argued that it is to your advantage to swap envelopes by showing that your expected return on swapping exceeds the sum in your envelope. This leads to the paradoxical conclusion that it is beneficial to continue to swap envelopes indefinitely.
- A partial explanation: There's a 50-50 split here: either you've got the smaller envelope or the larger one. Let's say there's $20 in your envelope, so the other one holds either $10 or $40:
The probability of either of these scenarios is one half, since there is a 50% chance that I initially happened to select the larger envelope and a 50% chance that I initially happened to select the smaller envelope. The expected value calculation for how much money is in the other envelope would be the amount in the first scenario times the probability of the first scenario plus the amount in the second scenario times the probability of the second scenario, which is $10 * 1/2 + $40 * 1/2. The result of this calculation is that the expected value of money in the other envelope is $25. Since this is greater than my selected envelope, it would appear to my advantage to always switch envelopes.
- This is totally weird. There are a few purported solutions on the Wikipedia page, most of which elude me.
- The takeaway: I really need to get into some Bayesian probability theory. Stat.
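For what it's worth, a quick simulation at least shows that the switching argument can't be right as stated. This is only a sketch under a loud assumption: the smaller amount is drawn uniformly from $1 to $100, which is my own hypothetical choice and not part of the puzzle. The function names are made up for illustration too. It doesn't explain where the argument goes wrong, only that keeping and switching come out the same on average:

```python
import random


def naive_expected_other(mine):
    """The calculation from the quoted argument: half or double, 50-50."""
    return 0.5 * (mine / 2) + 0.5 * (2 * mine)


def simulate(trials=100_000, seed=0):
    """Return the average payout for always keeping vs. always switching."""
    rng = random.Random(seed)
    keep_total = 0
    switch_total = 0
    for _ in range(trials):
        # Assumption: the smaller amount is uniform on 1..100 dollars.
        small = rng.randint(1, 100)
        envelopes = [small, 2 * small]
        rng.shuffle(envelopes)          # pick one of the two at random
        mine, other = envelopes
        keep_total += mine
        switch_total += other
    return keep_total / trials, switch_total / trials


print(naive_expected_other(20))  # 25.0, the figure from the quote

keep_avg, switch_avg = simulate()
print(keep_avg, switch_avg)      # the two averages come out nearly equal
```

The naive calculation says switching from $20 is worth $25, yet over many rounds neither strategy beats the other. So the flaw has to be in treating "half" and "double" as equally likely for every amount you might be holding.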