## March 27, 2006

### Back from Portland

Great times in Portland. I can now add to my store of anecdotes:

Oregon: amused a stranger in an elevator (and horrified my companions) by arguing that since my room was on the third floor, it was reasonable to expect that the hotel was on average only six floors tall

and I have to say I don't see what's so bad about that argument. Except that my prior probability that the hotel was six floors tall should've been pretty low. And that I had additional evidence for the height of the hotel, having been in the elevator. Well, OK.

I'd also like to change my Texas anecdote to:

Texas: unloaded a truck with a friend who happened to show up wearing the same shirt

Richard Mason has been tracking the progress of the states meme.

And, of course, the APA was a great time, with lots of good philosophy, which I will probably not blog about because I've come back to lots of work.

Posted by Matt Weiner at March 27, 2006 08:56 AM

This kind of scares me.

Posted by: joe o at March 27, 2006 03:43 PM

"Doomsday" is a fun word. I think it is one of my favorite words. Doomsday argument. Doomsday Book. Doomsday weapon. Doomsday equation. This is a good book by the way.

Posted by: Richard Mason at March 28, 2006 11:17 PM

What the hell does it mean to say that a particular hotel is *on average* six stories tall? I would prefer my hotels to be a fixed number of stories, thank you very much.

Posted by: Leo at March 29, 2006 12:25 AM

(2n - 1) is a better estimate than (2n). You should have concluded that the hotel was five floors tall!

Posted by: Richard Mason at March 29, 2006 09:45 AM

(2n - 1) is a better estimate

This can be shown by considering that if a person on each floor of the building is queried and each person on the nth floor guesses (2n - 1), then their average guess will be correct.

It follows, with inexorable logic, that you should always assume your hotel has an odd number of floors.
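A minimal sketch of that averaging check, assuming one person is queried on each floor and floors are numbered 1 through N. The function name `average_guess` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction

def average_guess(num_floors: int) -> Fraction:
    """Average of the guesses (2n - 1) made by one person on each floor n = 1..num_floors."""
    guesses = [2 * n - 1 for n in range(1, num_floors + 1)]
    return Fraction(sum(guesses), num_floors)

# The average guess equals the true height for every building size tried,
# even though each individual guess is an odd number.
for height in (1, 2, 6, 20, 101):
    assert average_guess(height) == height
```

The check works because the first N odd numbers sum to N squared, so their average is N.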

Posted by: Richard Mason at March 29, 2006 11:23 AM

No rooms on the first floor, though. (Like I thought of that answer in advance. Also, I think it yields that I should have said 2n-2.)

Posted by: Matt Weiner at March 29, 2006 01:56 PM

Leo, I think I have to claim that "on average" is indissoluble from "reasonable to expect" there. Anyway, it becomes increasingly clear why everyone was laughing.

Richard, I thought that link would go here, which is a good book, though I prefer this.

Posted by: Matt Weiner at March 29, 2006 02:09 PM

I have read the first but not the second.

Posted by: Richard Mason at March 29, 2006 02:52 PM

A propos of not much (Connie Willis), did you know that Stanisław Lem has died?

Posted by: Matt's mom at March 29, 2006 03:45 PM

Hey math geniuses,

It sounds like you are assuming a uniform prior probability dn=constant for the number of floors n in the hotel. I doubt this is correct (insofar as a choice of prior probability can be "correct" or not, to a Bayesian). The number of floors is bounded positive, which to many Bayesians would suggest a prior like dn/n (even though that is an improper prior). Although I'm not sure if this changes given that the number of floors is usually an integer, which eliminates a certain type of scaling argument for the dx/x prior.

If you saw "Being John Malkovich," you know that while a prior assumption that the number of floors is an integer is plausible, it is not always correct. This is the type of thing a good experimental designer needs to keep in mind.

Posted by: Ben at March 30, 2006 10:37 AM

If you want the probability to add up to 1 don't you need to make it 1/2^n or something like that? I will disregard your dn, which I suspect involves some kind of icky continuous distribution; if you want to fight about that you'll have to listen to my explanation of why Being John Malkovich's plot doesn't make sense, when you think about it. (Note also my remark about the priors in the post.)

I'm actually thinking of using a similar argument in re priors to prove that the Doomsday Argument isn't so worrisome, but I've been too fuzzy-headed to actually work out the math.

Posted by: Matt Weiner at March 30, 2006 03:58 PM

You can't normalize the probability density dP = dx/x on positive values because its integral from 0 to infinity (or 1 to infinity) is itself infinite. This is what is meant by an improper prior. By dn/n I meant just the same thing but a discrete version, so I guess it should be P(n) proportional to 1/n.

It may seem that improper priors are destined to give you trouble, but actually once you try calculating a posterior probability the normalization often cancels out. For an example, see http://en.wikipedia.org/wiki/Prior_distribution. The dx/x prior is often used for something that ought to be scale-free; Wikipedia gives the usual example of a length, where you know it's positive but the form of the probability shouldn't depend on the units.
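A rough numerical sketch of why the normalization stops being a problem, under assumptions that are mine rather than the Wikipedia page's: the discrete prior is proportional to 1/n, the observed floor is 3, and the chance of landing on any given floor of an n-floor hotel is 1/n. The helper names are hypothetical.

```python
F = 3  # observed floor (assumed for illustration)

def prior(n: int) -> float:
    """Improper prior weight, proportional to 1/n."""
    return 1.0 / n

def likelihood(n: int) -> float:
    """Chance of being on floor F in a hotel with n floors."""
    return 1.0 / n if n >= F else 0.0

def unnormalized_posterior(n: int) -> float:
    return prior(n) * likelihood(n)

def partial_sum(term, start: int, stop: int) -> float:
    """Sum term(n) for n = start..stop inclusive."""
    return sum(term(n) for n in range(start, stop + 1))

# How much the running totals still grow between n = 1,000 and n = 100,000.
prior_growth = partial_sum(prior, 1, 100_000) - partial_sum(prior, 1, 1_000)
posterior_growth = (partial_sum(unnormalized_posterior, 1, 100_000)
                    - partial_sum(unnormalized_posterior, 1, 1_000))
print(prior_growth, posterior_growth)
```

The prior's running total keeps climbing (it grows like a logarithm), while the posterior's has essentially stopped, so the posterior can be normalized even though the prior cannot.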

I read, or skimmed, that Doomsday Argument page and it seemed to continually evade the problem of priors and then magically wave its hand at the end and say changing the priors wouldn't help. If you read it very carefully there is a place in the argument quoted at the bottom where the author sets up a calculation and then says: Well we assume that Pr(i) = 0 for i greater than some n, to avoid infinities. Isn't that basically assuming the answer?

A lot of the Doomsday Argument page reduced the problem to various bafflegab analogies where the choice is between two alternatives (few/many, 100 balls/urn vs 1 million, etc). I doubt that these analogies are a good way to address the problem. Another way of looking at it is that the entire argument is an argument about how to normalize probabilities or expectations. For example: if there are 10 or 11 balls in the urn (or number of people that have ever lived) that difference of 1 is significant. But if there are 1 million, versus 1 million and 1, the difference of 1 is not really significant. You could say that in an iteration of this experiment, though P(10) is greater than P(1000000), the "many" case is the sum of P(1000000), P(1000001), P(1000002) and so on.

In fact, a sensible way to think about this is logarithmically: to talk about the probabilities per decade (order of magnitude), i.e. P(10-99), P(100-999), P(1000-9999) and so on. This is often how people think when using "few" and "many" in colloquial terms. The improper prior dx/x is exactly the prior which assigns equal probability to each decade.
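The equal-mass-per-decade claim can be checked numerically. This sketch uses the discrete weights 1/n as a stand-in for dx/x, which is an approximation: in the continuous case the integral of dx/x over any decade is exactly ln 10. The function name `decade_mass` is hypothetical.

```python
import math

def decade_mass(k: int) -> float:
    """Unnormalized mass that weights 1/n put on n = 10**k .. 10**(k+1) - 1."""
    return sum(1.0 / n for n in range(10**k, 10**(k + 1)))

# Decades 10-99, 100-999, 1000-9999, 10000-99999.
masses = [decade_mass(k) for k in range(1, 5)]
for k, mass in zip(range(1, 5), masses):
    print(f"decade starting at 10^{k}: {mass:.4f} (ln 10 = {math.log(10):.4f})")
```

Each decade gets close to the same weight, and the match with ln 10 tightens as the decades get larger.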

Improper priors are mildly controversial, but not really. I think Rich Gott understands this, but I'm not sure that Nick Bostrom does from reading his page.

Posted by: Ben at March 30, 2006 04:59 PM

I could do an entire blog on the states meme. And I just might.

Posted by: teofilo at March 30, 2006 08:55 PM

Ben: It sounds like you are assuming a uniform prior probability dn=constant for the number of floors n in the hotel.

No, I don't think so. At least, that is not required for (2n-1) to be an unbiased estimate of building height, which will be correct on average, regardless of what power law distribution building heights actually do or don't follow.

Then again, "edge" is an unbiased estimate of the result of a coin toss.

I am assuming that the number of floors is a positive integer.

Posted by: Richard Mason at March 31, 2006 01:12 PM

Richard, I don't know how to work power laws in this context, but it seems to me that if the prior probability of the hotel being around 20 stories high was a lot higher than the prior probability of its being 5 stories high, then my expected value for the height of the hotel should have been higher than 5 even given that my room is on the 3rd floor. I think that's what Ben meant by the uniform prior. Or am I messing something up with power law distributions? Pls show work.

Posted by: Matt Weiner at March 31, 2006 05:23 PM

Let's begin by observing that the DA will often give wrong guesses. If the building code requires that all hotels have an even number of floors, then a guess of (2n - 1) floors will never, ever be right. But at least it will still have zero bias. If you repeat the experiment over and over, the total number of floors guessed too low and the total number of floors guessed too high will balance.

Because this is true for the experiment repeated over and over in an individual hotel, no matter how high, it is also true for the experiment repeated over and over in any collection of hotels.

Now, if you have prior knowledge about the heights of hotels, then certainly you can make a better guess about the height of the hotel you're in. You can't improve the bias, but you can make a guess which is better by some other metric (e.g., likelihood of being correct).

For example, suppose you know for a fact that 99% of hotels have 20 floors and only 1% have 5 floors. Accordingly, you always guess (even when you are on the 3rd floor) that your hotel has 20 floors. This guess is probably exactly right (but note that this guess has non-zero bias, since when it is wrong, it is always wrong by fifteen floors to the high side).

Ben is likely correct that a 1/n distribution of hotel heights is more plausible than a uniform distribution of hotel heights. I am making the limited point that the zero bias of the DA does not assume a uniform distribution of hotel heights. It makes no assumption at all about the distribution of hotel heights, which some might say was its charm.
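The 99%/1% example can be worked through exactly. This is a sketch under the assumptions already in play (a guest is equally likely to be on any floor 1..N of their hotel); the names `hotels` and `evaluate` are hypothetical.

```python
from fractions import Fraction

# Assumed population from the example: 99% of hotels have 20 floors, 1% have 5.
hotels = {20: Fraction(99, 100), 5: Fraction(1, 100)}

def evaluate(estimator):
    """Return (bias, probability of an exactly correct guess) for an estimator
    of hotel height, with the guest's floor uniform on 1..N in each hotel."""
    bias = Fraction(0)
    p_correct = Fraction(0)
    for height, weight in hotels.items():
        for floor in range(1, height + 1):
            p = weight / height          # chance of this hotel and this floor
            guess = estimator(floor)
            bias += p * (guess - height)
            if guess == height:
                p_correct += p
    return bias, p_correct

da_bias, da_correct = evaluate(lambda floor: 2 * floor - 1)   # the (2n - 1) guess
fixed_bias, fixed_correct = evaluate(lambda floor: 20)        # always guess 20

print("2n-1:      bias", da_bias, " P(correct)", da_correct)
print("always 20: bias", fixed_bias, " P(correct)", fixed_correct)
```

The (2n - 1) guess has zero bias but is almost never exactly right here, while always guessing 20 is right 99% of the time at the cost of a small positive bias.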

Posted by: Richard Mason at April 1, 2006 08:51 AM

I believe I was wrong about 2n-1 and uniform priors. Richard has demonstrated that it is from a basically frequentist argument. Frequentist statements about probability are about convergence in the limit of large samples, or large ensembles of reality. So if a hotel has 1..N floors uniformly occupied, and we poll people on their floor numbers F, eventually we'll converge to an average floor number F_mean = (N+1)/2. That's okay. Richard pointed out that the estimator N_est = 2F-1 is unbiased in the large sample limit, but it also has the strange property that if you only measure one F, N_est is always odd.

I don't want to get into a holy war about Bayesian and frequentist methods, but two issues with this method are that there is no way to factor in your prior beliefs about the model, and that you have to do a lot more work to estimate confidence limits. That is, suppose I measure one F. My estimate for the number of floors is N_est = 2F-1. But is that the mean? Expectation value? What is the probability distribution of N? There's certainly no guarantee that it is normally distributed!

A Bayesian approach lets you calculate this easily. One aphorism is that it calculates the probability of the model given the data, while frequentism usually calculates the probability of the data given the model. Bayes's theorem, a fairly basic statement about conditional probabilities, is:

P(A_i|B) = [ P(B|A_i) * P(A_i) ] / [ sum_j P(B|A_j)P(A_j) ]

If that looks like gibberish, look at the Wikipedia page and the formula under improper priors.

Call A_i the model where the hotel has i floors, and B the data (in this case the measurement that you are on floor F). P(A_i) is your prior probability for A_i. For example you could assume a uniform prior between 1 and 100 floors, or a prior that falls as 1/i up to 100 floors, or so on. The probability P(B|A_j) is the probability that you get the data B=F given the model A_j. In this case the probability P(B|A_j) is 0 if F>j and 1/j if F<=j. (Uniform probability among the j floors; zero chance of being on the 6th floor of a 5 story hotel.)

This is enough information to solve for the posterior distribution P(A_i|B), the probability distribution of models with i floors given the data B (that you are on the F'th floor). You need to assume a prior P(A_j) and then it's all just sums. I'll give an example in the next comment.
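A minimal sketch of those sums, assuming the prior is stored as a dictionary from floor count i to P(A_i). The function name `posterior` and the choice of a uniform prior on 1..100 with F = 3 are illustrative only.

```python
def posterior(prior, floor):
    """Posterior over the number of floors, given a prior {i: P(A_i)} and an
    observed floor F, using P(B|A_j) = 1/j if F <= j and 0 otherwise."""
    unnormalized = {j: (p / j if floor <= j else 0.0) for j, p in prior.items()}
    total = sum(unnormalized.values())   # the sum in the denominator of Bayes's theorem
    return {j: u / total for j, u in unnormalized.items()}

# Uniform prior between 1 and 100 floors, observed on floor 3.
uniform_prior = {i: 1 / 100 for i in range(1, 101)}
post = posterior(uniform_prior, floor=3)

print(post[2], post[3], post[6])   # 2 floors is impossible; 3 is the most likely
```

Because the likelihood falls as 1/j, the posterior under a uniform prior peaks at the observed floor and declines from there.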

Posted by: Ben at April 1, 2006 02:15 PM

As an example, we can test the effect of different priors - essentially our preconceptions about hotels. Here is a plot showing four prior probability distributions for the number of floors: 1. uniform between 1 and 100; 2. 1/N between 1 and 100 and 0 above 100; 3. 1/N between 1 and 1000 and 0 above 1000 (allows for a small probability of very tall hotels); 4. Gaussian with mean 20 and dispersion 10, and left-truncated at 1. I don't think I can do inline images so I'll put links:

Prior probabilities (beliefs) for the hotel N floors problem

Matt, please feel free to download the plots and host them on your webpage, as there is no guarantee they'll stay forever on mine.

Now, if you land on floor F, I can do the calculation described above via Bayes's Theorem. Here are the posterior probability distributions if you were on floor 5:

Posterior probabilities if you're on floor 5

And here are the posterior probability distributions if you were on floor 30:

Posterior probabilities if you're on floor 30

The primary effect here is a sort of convolution of the prior with the probability of observation, that cuts off below F and falls as 1/j; this is why even for the uniform prior (black line), the posterior falls for large N.

Given these probability distributions you can start to answer questions about the model. Because the distributions are highly non-Gaussian they don't always behave in a typical way. The modal or most likely value, the median value, and the mean or expectation value can be significantly different. For the distributions with long tails to high N the mean is greater than the median which is greater than the mode.
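The mode, median, and mean can be computed directly from the posteriors. This sketch rebuilds the four priors on a grid of 1 to 1000 floors; the grid limit, the reading of "dispersion" as a standard deviation, and all function names are my assumptions, not taken from the linked plots.

```python
import math

MAX_FLOORS = 1000                      # grid of candidate heights (assumed)
floors = range(1, MAX_FLOORS + 1)

def normalize(weights):
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

priors = {
    "uniform 1-100": normalize({n: 1.0 if n <= 100 else 0.0 for n in floors}),
    "1/N up to 100": normalize({n: 1.0 / n if n <= 100 else 0.0 for n in floors}),
    "1/N up to 1000": normalize({n: 1.0 / n for n in floors}),
    "Gaussian(20, 10)": normalize({n: math.exp(-0.5 * ((n - 20) / 10) ** 2) for n in floors}),
}

def posterior(prior, floor):
    """Likelihood is 1/n for hotels with at least `floor` floors, else 0."""
    return normalize({n: (p / n if n >= floor else 0.0) for n, p in prior.items()})

def summarize(dist):
    """Return (mode, median, mean) of a discrete distribution {n: probability}."""
    mode = max(dist, key=dist.get)
    mean = sum(n * p for n, p in dist.items())
    median = max(dist)                 # fallback, replaced in the loop below
    cumulative = 0.0
    for n in sorted(dist):
        cumulative += dist[n]
        if cumulative >= 0.5:
            median = n
            break
    return mode, median, mean

summaries = {name: summarize(posterior(prior, floor=5)) for name, prior in priors.items()}
for name, (mode, median, mean) in summaries.items():
    print(f"{name}: mode={mode}, median={median}, mean={mean:.1f}")
```

For the long-tailed cases the printed values should come out ordered as mode, then median, then mean.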

If you take a second measurement - you meet someone in the elevator and discover they are on floor 8 - you can repeat the calculation but using the posterior probability as the new input prior. If you get enough data, the model is well-constrained enough that it ceases to matter what your ingoing prior was.
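A sketch of that sequential updating, assuming independent guests each equally likely to be on any floor. The list of observed floors is made up for illustration, and `update` and `total_variation` are hypothetical helper names.

```python
def update(prior, floor):
    """One Bayesian update: likelihood 1/n for n >= floor, 0 otherwise."""
    weights = {n: (p / n if n >= floor else 0.0) for n, p in prior.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

def total_variation(p, q):
    """Half the summed absolute difference between two distributions."""
    return 0.5 * sum(abs(p[n] - q[n]) for n in p)

heights = range(1, 101)
uniform = {n: 1 / 100 for n in heights}
norm = sum(1 / n for n in heights)
one_over_n = {n: (1 / n) / norm for n in heights}

# The posterior after floor 5 becomes the prior for the floor-8 observation.
after_two = update(update(uniform, 5), 8)

# Disagreement between the two priors after a single observation...
tv_one = total_variation(update(uniform, 8), update(one_over_n, 8))

# ...and after many (hypothetical) observations, none above floor 8.
observations = [5, 8, 3, 7, 1, 6, 2, 8, 4, 5, 7, 3] * 3
post_a, post_b = uniform, one_over_n
for f in observations:
    post_a = update(post_a, f)
    post_b = update(post_b, f)
tv_many = total_variation(post_a, post_b)

print(tv_one, tv_many)
```

With these made-up observations the two posteriors end up nearly identical, both concentrated on 8 floors, which is the sense in which the ingoing prior stops mattering.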

Posted by: Ben at April 1, 2006 02:39 PM