John Tierney today writes about Richard Gott’s Copernican principle. He has a little more on his blog, along with some useful discussion from Bradley Monton. The principle in question says that you should treat the time of your observation of some event as being a random point in its duration. Slightly more formally, quoting Gott via a paper Monton wrote with Brian Kierland,

Assuming that whatever we are measuring can be observed only in the interval between times t_{begin} and t_{end}, if there is nothing special about t_{now}, we expect t_{now} to be located randomly in this interval.

As Monton and Kierland note, we can use this to argue that the probability that

t_{future} is between *a* and *b* times t_{past}

is 1/(*a* + 1) – 1/(*b* + 1), where t_{past} is the past duration of the event in question, and t_{future} is its future duration. Most discussion of this has focussed on the case where *a* = 1/39 and *b* = 39, which yields probability 0.95. But I think the more interesting, or at least easiest to interpret, case is where *a* = 0 and *b* = 1. In this case we get the result that the probability of the entity in question lasting longer into the future than its current life-span is 1/2.
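For concreteness, the formula is easy to compute; here is a quick sketch in Python (the function name is mine, and note that the much-discussed 95% case corresponds to *a* = 1/39, *b* = 39):

```python
from fractions import Fraction

def gott_probability(a, b):
    """Gott/Monton-Kierland: Pr(t_future is between a*t_past and
    b*t_past) = 1/(a + 1) - 1/(b + 1)."""
    return Fraction(1) / (a + 1) - Fraction(1) / (b + 1)

# a = 0, b = 1: even odds the event outlasts its current age
print(gott_probability(0, 1))                    # 1/2

# the famous 95% interval takes a = 1/39 and b = 39
print(gott_probability(Fraction(1, 39), 39))     # 19/20
```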

As a rule I tend to be very hostile to these attempts to get precise probabilities from very little data. I have a short argument against Gott’s Copernican formula below. (Against the general version, not for any particular values of *a* and *b*.) But first I want to try a little mockery. I’d like to know if anyone would like to take any of the following bets.

Wikipedia’s History of the Internet dates the founding of the World Wide Web to around the early 1990s, so it is 15 or so years old. Gott’s formula would say that it is less than 50/50 that it will survive until around 2025. I’ll take that bet if anyone is offering.

The iPhone has been around for about 3 weeks at this time of writing. Again, Gott’s formula would suggest that it is 50/50 that it will last for more than 3 weeks from now. Again, I’ll take that bet!

Finally, it has been about 100 years since there were over 4,000,000 people on the Australian continent. I’m unlikely to be around long enough to see whether there still will be more than 4,000,000 in 100 years’ time, but I’m a lot more than 50/50 confident that there will be. I will most likely be around in 10 years to see whether there are more than 4,000,000 people there in 11 years’ time. Gott’s formula says that the probability of that is around 0.9. I’m a little more optimistic than that, to say the least.
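The probabilities behind these bets all come from the survival form of the formula, Pr(still ongoing z time-units from now) = x/(x + z), where x is the observed age; a sketch (the function name is mine):

```python
from fractions import Fraction

def still_going(age, z):
    """Gott's survival form: Pr(an event of observed age `age` is
    still ongoing z time-units from now) = age / (age + z)."""
    return Fraction(age, age + z)

print(still_going(15, 15))          # WWW: 1/2 for another 15 years
print(still_going(3, 3))            # iPhone: 1/2 for another 3 weeks
print(float(still_going(100, 11)))  # Australia: ~0.90 for another 11 years
```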

Anyway, enough mockery, here is the argument. Consider any two plays, A and B, that have been running for x and y weeks respectively, with x > y. And consider the following three events.

E1 = Play A is running

E2 = Play B is running

E3 = Plays A and B are both running

Note that E3 has been ongoing for y weeks, just like E2. The Copernican formula tells us that at some time z in the future, the probabilities of these three events are

Pr(E1 at z) = x / (x + z)

Pr(E2 at z) = y / (y + z)

Pr(E3 at z) = y / (y + z)

Now let’s try and work out the conditional probability that A will still be running at z, given that B is running at z. That is, Pr(E1 at z | E2 at z). It is

Pr(E1 at z & E2 at z) / Pr(E2 at z)

= Pr(E3 at z) / Pr(E2 at z)

= (y / (y + z)) / (y / (y + z))

= 1

So using the Copernican formula, we can deduce that the conditional probability of A still running at z given that B is still running at z is 1. And that’s given only the information that z is in the future, and that A has been running at least as long as B. That is, to say the least, an absurd result. So I’m sure there is something deeply mistaken with the Copernican formula.
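The derivation can be checked mechanically with arbitrary numbers (the variable names are mine; any x > y > 0 and z > 0 give the same answer):

```python
from fractions import Fraction

def still_going(age, z):
    # Gott's survival probability for an event of observed age `age`
    return Fraction(age, age + z)

x, y, z = 10, 4, 7            # plays A and B have run x and y weeks, x > y

p_e2 = still_going(y, z)      # Play B still running at z
p_e3 = still_going(y, z)      # both plays running: this event also has age y

# Pr(E1 | E2) = Pr(E1 & E2) / Pr(E2) = Pr(E3) / Pr(E2)
print(p_e3 / p_e2)            # 1
```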


jim 07.17.07 at 3:35 pm

You’re ignoring “if there’s nothing special about t_{now}.” In each case you mock, and in the case you’re arguing, there is something special about t_{now}.

Brian Weatherson 07.17.07 at 3:42 pm

If I know something special about each of the three cases I mention, then I know something special in the salient sense about pretty much everything. So the formula is useless.

But I have no idea how I am supposed to know anything special in the generic case at the end. All I know is that the event E3 has been running for as long as the event E2, which follows from their definitions. Is knowing the definition of an event ‘special information’? In that case the formula really can’t be applied. I can’t apply it to show running times if I know what it is for a show to be running. This makes the formula utterly trivial. In that case it might not be false, but it is even more useless than I thought.

Kieran Healy 07.17.07 at 3:42 pm

This is at one end of the argumentative spectrum, where the opposite pole is defined by, e.g., Nick Bostrom’s ideas about Bayesian inference to the end of the world, right?

lemuel pitkin 07.17.07 at 3:43 pm

The interesting applications of this type of reasoning come with the expected survival time of the human species, no? For instance, a natural extension is to suppose that you are a randomly chosen individual from the set of all human beings who will ever live, from which you can (if you like) draw some fairly strong conclusions about the expected number of human beings in the future.

There’s some interesting overlap with Fermi’s paradox here as well.

Brian Weatherson 07.17.07 at 3:45 pm

I mean, I’ll sort of grant that this might be a special moment in the history of the iPhone. But I don’t at all see how that can be true in the case of the Australian continent having a population of more than 4,000,000. It has been happening for a while now – over 100 years in fact. Gott seems to want to apply the formula in cases that are much more like special cases than this.

Brian Weatherson 07.17.07 at 3:47 pm

Actually there is a connection between Bostrom and Gott’s reasoning. In both cases they rely pretty heavily on a principle of random selection, and draw very strong conclusions from it. Arguably some forms of Bostrom’s arguments rely on something like a Copernican principle as well. But all these arguments seem completely absurd to me – you can’t get important information, like the probability of extinction, from as little information as they provide.

lemuel pitkin 07.17.07 at 3:56 pm

So Brian, what do *you* think is the probability that climate change or some other catastrophe will reduce the population of Australia below 4 million within the next 100 years?

dsquared 07.17.07 at 4:14 pm

the slip is clearly from “randomly” to “a random draw from a uniform distribution”, because this is the point at which Monton appears to have circumvented the normal licensing procedure for taking expectations.

In fact actually I don’t think that you *can* take a random draw from a uniform distribution without “knowing something special” in the sense Brian means in 2, because this would be equivalent to saying that you could pick out a specific point on a continuous line without any identifying characteristics at all. I can feel myself shading into my general war against the Axiom of Choice here though.

Brian Weatherson 07.17.07 at 4:23 pm

Over the next 100 years I don’t have a firm probability. I’m sure it is more than 0.5. But I think the thing in cases like this is that we shouldn’t have numerical probabilities.

Probability is a guide to life. My state of ignorance about the future of the climate shouldn’t be taken by anyone (including me) as a precise guide to much of anything. Hence it shouldn’t determine a precise probability. But I think I know enough to be confident that a rich few (million) can survive and adapt to whatever we’re faced with. I’d be stunned if life in Australia any time in this century is as tough as it was at the start of the last century.

Andrew Edwards 07.17.07 at 4:28 pm

Isn’t the issue with the Australia example that it is not a stable process, but a process showing exponential growth over time?

I always understood the “when will the world end” questions as based on a stable process which had a constant P(end of the world) on any given day. Or the Berlin wall classic example as having P(Berlin Wall coming down) that was the same on each given day (or week, or whatever observation period seems sensible).

It strikes me that the question “will the population grow or shrink in the next decade?”, which does not describe a stable process (rather it describes roughly exponential growth), is a very different question, probabilistically, than, say, “will the weather be cloudy a month from now?”, which describes a random but stable process (i.e. the chances of cloud on a given day are roughly the same for any given day).

Does that make any sense to the people who understand stats better than I?

P.D. 07.17.07 at 4:29 pm

to #8: Continua and Choice are red herrings here, because we could just as easily specify that t-now is of finite length and then break up the interval between t-past and t-future into finite parts. If we treat t-now as the result of a uniform probability distribution on those parts, we are off to the races.

Walt 07.17.07 at 4:32 pm

That’s it, dsquared. You’ve gone too far this time. Against the Axiom of Choice? You’re going down.

"Q" the Enchanter 07.17.07 at 4:50 pm

I’ve never heard of the Copernican principle, so take this with a grain of salt. But it seems to me the principle would be meant to apply only in cases where we don’t have relevant independent probabilistic information (from genealogy, limiting market conditions, etc.) about the likely duration of E. Or?

lemuel pitkin 07.17.07 at 4:53 pm

“Probability is a guide to life. My state of ignorance about the future of the climate shouldn’t be taken by anyone (including me) as a precise guide to much of anything. Hence it shouldn’t determine a precise probability.”

The whole non-ergodicity/fundamental uncertainty thing, right. I’m sure this is correct … but I’d like a better explanation of exactly why.

Brian Weatherson 07.17.07 at 4:53 pm

“Q”: It certainly applies at most in those cases. I’d say we need something *considerably* stronger in fact. We don’t just need absence of independent info, we need information that the current moment is (whatever this could mean in context) randomly drawn from a uniform distribution. And that’s not something we’re allowed to assume by default, because in general it can’t possibly be true. (That’s what my two plays example is meant to show.)

Bloix 07.17.07 at 5:48 pm

This fellow’s thesis was the subject of a New Yorker article several years ago, which took it very seriously. It was ridiculous then and it is ridiculous now. As Brian shows, it gives absurd results. But more importantly, it NEVER has any useful predictive force.

Imagine a phenomenon whose life span was unknown: the Berlin Wall, say. We begin making monthly observations of the Wall from the time it was erected in August 1961. We make our calculation as to its life expectancy, which each month becomes longer as the Wall’s lifespan lengthens. On November 1, 1989, we make our observation and predict with 95% confidence that the Wall’s likely lifespan is between 8.1 months and 1053 years. On December 1 we return and – the Wall is down! Where did we go wrong?

Well, it is true that if we look at our data, we find that, except for the first three and the last seven months, our prediction did accurately bracket the life of the Wall. This is by definition. So Gott would say that his method worked. But it tells us nothing that we need to know. The Wall could have come down in any given month, or still be standing, and the calculations would be proven accurate to within 95% overall while always being incorrect as of the month prior to the fall of the Wall. So the method will always mislead us about changes in future events. If we rely on it, we will always be surprised when an event ends.
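That the 95% interval covers by construction, whatever the Wall actually does, can be illustrated with a quick simulation (a sketch; the function name is mine, and the ~28.25-year figure is the Wall’s actual lifespan, though any value gives the same coverage):

```python
import random

def covered(lifetime, rng):
    """Observe at a uniform random moment of the event's life; does
    Gott's 95% interval for the remaining duration,
    [t_past/39, 39*t_past], contain the true remaining duration?"""
    t_past = rng.uniform(0, lifetime)
    t_future = lifetime - t_past
    return t_past / 39 <= t_future <= 39 * t_past

rng = random.Random(0)
trials = 100_000
hits = sum(covered(28.25, rng) for _ in range(trials))
print(hits / trials)   # ≈ 0.95, for any choice of lifetime
```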

Note, by the way, that Gott games the system with his use of examples. Broadway shows, restaurants, and heads of government in parliamentary systems share a common characteristic: they are very unstable early in their existences, but gain stability for a long middle period. So if you apply his method to them, they appear to prove it. But if you apply his method to things that have a different lifetime trajectory – human beings, say, or American presidencies – they will not fit his model nearly as well.

Neel Krishnaswami 07.17.07 at 5:55 pm

12: No, it’s good. It’s only a short step to constructive mathematics for dsquared now!

I’m sure Brian Weatherson will help push; he wrote a really nice paper on intuitionistic probability theory. (I’d be very interested in seeing these ideas extended to decision theory, and how they differ from traditional decision theory, as a matter of fact.)

Barry 07.17.07 at 6:00 pm

“Note, by the way, that Gott games the system with his use of examples. Broadway shows, restaurants, and heads of government in parliamentary systems share a common characteristic: they are very unstable early in their existences, but gain stability for a long middle period. So if you apply his method to them, they appear to prove it. But if you apply his method to things that have a different lifetime trajectory – human beings, say, or American presidencies – they will not fit his model nearly as well.”

Posted by Bloix

This sounds like an exponential distribution for the survival time (‘gain stability for a long middle period’). For the exponential distribution, an interesting characteristic is memorylessness – increasing observed survival times don’t affect the expected future survival time.
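Barry’s memorylessness point is easy to verify numerically (a sketch, not from the thread: among exponential lifetimes that survive past some age, the remaining life has the same distribution as a fresh lifetime):

```python
import random

# Exponential lifetimes with mean 10; condition on survival past age 5.
rng = random.Random(1)
mean, a = 10.0, 5.0

lifetimes = [rng.expovariate(1 / mean) for _ in range(200_000)]
remaining = [t - a for t in lifetimes if t > a]

# Memorylessness: the mean *remaining* life equals the original mean.
print(sum(remaining) / len(remaining))   # ≈ 10
```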

Walt 07.17.07 at 6:11 pm

Hitler was in favor of constructive mathematics. Advantage: me.

leederick 07.17.07 at 6:28 pm

“So using the Copernican formula, we can deduce that the conditional probability of A still running at z given that B is still running at z is 1… So I’m sure there is something deeply mistaken with the Copernican formula.”

No there isn’t. The formula assumes that your position in time is a random sample during the lifespan of the object you observe.

The *only* situation in your counter example where this is true is when E1, E2 & E3 all start and end at the same time. In which case the conditional probability of A still running at z given that B is still running at z is 1 for obvious reasons. It’s a tautology – your application assumes they all start and end at the same time so it’s not really a surprise that A’s always running given B is.

engels 07.17.07 at 6:46 pm

Ah yes, but it’s a slippery slope. Axiom of Choice —> Tychonoff’s Theorem —> COMMUNISM!

Brett Bellmore 07.17.07 at 6:51 pm

It’s a valid, though not particularly useful, technique for deriving *some* kind of estimate in cases where you know next to nothing about the situation. The problem is that people keep insisting on applying it to situations where the real problem is not a lack of information, but a lack of any good idea how to apply it.

For instance, we have an absolutely enormous collection of information which is, presumably, relevant to the lifetime of the human race. Biology, geology, stellar mechanics, rates of technological progress, practically everything we know has some relevance.

We just don’t know how to do the calculation. That doesn’t mean the Copernican principle is applicable.

Brian Weatherson 07.17.07 at 7:56 pm

Leederick,

It’s true that you should only be able to derive something like my result in cases where we take it as given that E1 and E2 end at the same time. But that’s not what’s on the table. The principle is meant to give us probabilities of the duration of things of unknown length. If we only apply it when we know how long the event is, well then it will give plausible results, but won’t tell us anything we don’t already know.

gus 07.17.07 at 8:13 pm

I haven’t thought too much about it, but if the running of A and B are independent events, then Prob(E3) = Prob(E1) Prob(E2). I’m not sure what the mistake is in your calculation, maybe you cannot assume that the time of observation is a random variable and then use the same time for both A and B.

John Quiggin 07.17.07 at 8:38 pm

As Brian’s example illustrates, the Copernican principle is (in essence) a rediscovery of the principle of insufficient reason, and shares the same defects.

Dan Simon 07.17.07 at 9:14 pm

Suppose you’ve been imprisoned by the king and told that you will be executed some time in the future, and that the day of your execution will come as a surprise to you. The king knows you’re a good statistician, so you conclude that you’ll be executed on a day when your likelihood of dying that day is less than 5 percent. On what day will you be executed?

Neveu 07.17.07 at 10:11 pm

Matt Kuzma 07.17.07 at 10:12 pm

Whenever you make statistical calculations without knowing the underlying mechanics of the thing you’re studying, you’ll get screwy results like these. I could sample random wavelengths of light coming from the sun, assume they must follow a bell curve and conclude that our sun will run out of energy in 10 years. As it turns out, assuming Gaussian distributions in black-body radiation leads to catastrophic (and wrong) conclusions.

Likewise, the Copernican principle applies to measurements of the event, not to the fact of its existence. To assume that the existence of everything is a stochastic process that flips from true to false with some fixed probability is, as you demonstrate, ridiculous. It also throws out a lot of information about the event in question.

gus 07.17.07 at 10:13 pm

After looking at the Monton-Kierland paper, it appears that the Copernican formula can be derived by using a specific form of a prior probability function for the duration of a given process. But if E3 is the conjunction of E1 and E2, it cannot have the same prior probability as either E1 or E2.

The point, I think, is that one should only apply the formula for an event of which nothing else is known. It can be applied to E1 and E2 separately, but not to their combination.

leederick 07.17.07 at 11:39 pm

“It’s true that you should only be able to derive something like my result in cases where we take it as given that E1 and E2 end at the same time.”

The problem is more fundamental. If we observe Plays A and B are both running – and we know Play A started before Play B – then we know our observation isn’t a random sample from within the life span of Play A. Because we can’t have observed the duration of time when Play A was running but Play B wasn’t. So we’ve no business using the formula to calculate Pr(E1 at z). The formula does not apply to E1.

“The principle is meant to give us probabilities of the duration of things of unknown length. If we only apply it when we know how long the event is, well then it will give plausible results, but won’t tell us anything we don’t already know.”

I don’t want to sound like I’m being too picky about something you took three seconds to type into a comment box, but I think that phrasing misses an important distinction. Your claim is different to Gott’s. There’s a difference between: (A) knowing how long an event is, and (B) knowing that you’ve a random sample from within the lifespan of the event.

B is significantly different from A. You can know you’ve chosen a random raffle ticket without knowing how many tickets are in the raffle. In this different context you can use Gott’s maths to infer the number of tickets. All we need to know is that we’ve randomly sampled them.

Obviously, that kicks the debate back to arguing for or against the validity of the random sampling assumption. I think John’s wrong that Gott’s suggestion is just the principle of insufficient reason. I think Gott would argue that in some cases you can have positive reason to believe that your observation is a random sample – from knowledge of the nature of what you’re observing and the circumstance of the observation – and this justifies applying the formula.

John Quiggin 07.18.07 at 6:45 am

“I think Gott would argue that in some cases you can have positive reason to believe that your observation is a random sample – from knowledge of the nature of what you’re observing and the circumstance of the observation – and this justifies applying the formula.”

Similarly, there are cases when you can be confident that the partition to which you are applying the principle of insufficient reason is symmetric and hence the principle is valid. The problem is to specify the criterion.

gus 07.18.07 at 7:03 am

leederick :

I also thought at first that the time of observation could not be random for both A and B, but this is not so, as long as the time was not chosen to be in the lifetime of A and B – it just happened to be so. Otherwise, whenever I observed an event E, someone else could argue that the time was not random, simply by observing another event that started after E. Imagine that the observations of E1 and E2 were done by two different persons, each one not knowing anything about the other play. Then the first person, observing E1, would apply the Copernican formula; whether he can apply it or not cannot depend on whether some other person is observing another event.

Besides, even if you leave out E1, you are still left with the problem that Pr(E2 at z)=Pr(E3 at z).

J Thomas 07.18.07 at 7:45 am

Gus got it right. Your example is bad because you start by assuming that E1 and E2 are independent uniform distributions with unknown interval, and then you assume that E3 is a uniform distribution with unknown interval that’s independent of E1 and E2. But E3 does depend on E1 and E2. If E1 and E2 are independent (they might not be – there could be a recession and nobody has ticket money, there could be a terrorist threat directed at shows, etc.) then Pr(E3) = Pr(E1)·Pr(E2). This is not a uniform distribution.

Your mockeries are similarly bad. WWW is a communication protocol. If it started when 2 users used it, and it will end when the last 2 users quit, that isn’t nearly uniform. When there are 4 million users, the chance they’ll all quit in a short period is much less than when there are 4 users. Similarly with Australians and iPhones.

But I agree with your criticism of the method. When you don’t have much information to work with, you get predictions that aren’t very useful. The main thing they might possibly be good for is as a baseline for additional work. “This is our Bayesian prior. Now when we add *this* information, how does that change it?”

Very often the assumption of uniformity is a bad assumption. If you assume there’s a single interval and you’re trying to find the length of that interval in the absence of other information, then it makes sense. But if you’re sampling from a distribution of intervals that’s exponentially distributed, then you’re more likely to sample from close to the beginning.

Say you have 1,000 Japanese cars, and each of them has a lifetime of precisely 150,000 miles, but their odometers are broken. You have reason to think that their mileage is all the same within 3,000 miles, and they’re uniformly distributed within that interval. Then you run them all the same distance. How long will they last? You can get a pretty good estimate when the first one quits. You can expect the last one to quit within 3,000 miles or so. Contrived? Yes. To make it useful you need a sample size larger than 1. But you need a fixed interval or the distribution of interval sizes will matter.

When you argue from ignorance you have to be careful about your assumptions about just what it is you’re ignorant of.

Brian Weatherson 07.18.07 at 2:24 pm

If it’s wrong, for reasons of excessive knowledge, to apply the principle to E3, then I can’t see why it would be OK to apply the principle to E1 and E2. After all, we know something about E1 and E2 as well – namely that they are entailed by E3.

In any case, I certainly wasn’t *assuming* that the events were independent, or that either distribution was uniform. I was just applying the principle given; a principle I think is utterly crazy. The principle says that in the absence of information to the contrary, we can assume we are dealing with uniform distributions here.

Now either knowing the logical relations between E1, E2 and E3 counts as information to the contrary, or it doesn’t. If it does, then we can never apply the principle, because we always know that two events collectively entail, and individually are entailed by, their conjunction. So the principle is vacuous. If it doesn’t, then the principle leads to an absurd result. So we get the conclusion that it seems to me we always get with these principles saying we can derive something from ignorance, either vacuity or absurdity.

abb1 07.18.07 at 4:22 pm

“…the probability of the entity in question lasting longer into the future than its current life-span is 1/2.”

I don’t remember ever learning about this principle, but what it sounds like is simply this: if something we don’t know much about has been going on for a while, chances are it’s going to go on for a while longer.

And if that’s what it’s saying, then perhaps the phrase I quoted needs to be modified to say “the probability … is *at least* 1/2”. Which means that the guy who suggested this principle likes all the same odds you do.

leederick 07.18.07 at 6:08 pm

The mistake is so simple I don’t know how we’ve managed to make such a meal of it.

Gott’s method is to make a frequentist statement about t_{future} based on the long run properties of a random sample.

What was the first thing any of us were taught about frequentist statistics? You determine what hypothesis you’re going to test, then you look at your data. The example supposes we’ve looked at the data, seen that both plays are still running, then decided to do the calculations for E1, E2 and E3.

This means the results for E1 & E2 are wrong. We wouldn’t be worrying about this example had Play B not yet started, or the plays not overlapped, or had Play A ended. So the probability statements about E1 & E2 can’t be justified based on the long run properties of random samples that Gott uses to construct his formula. Our running these tests has been influenced by our looking at the data.

I think E3 is the only circumstance in your example where it’s sound to apply the principle. You can’t apply it to E1 or E2.

gus 07.19.07 at 10:10 am

# 34:

I don’t know why I am taking it upon myself to defend this principle. I am in no way committed to it, but nor does it seem to me so crazy. We don’t derive anything more than we put in, that is, the assumption of the a priori distribution (I don’t know why some call it a uniform distribution; the assumption is that the a priori probability that the total duration is T is proportional to 1/T). In most cases the assumption is inappropriate, because most events we want to consider are very complex and arise from thousands of different factors, so even if the assumption holds for each of the factors, it will be very far off for the total event. One would be better off using a distribution that is more stable, like a Gaussian; but then one needs to put in more knowledge, namely the variance of the distribution. If one is unwilling to put in any information whatsoever, then the 1/T law is the simplest guess, and it may even be the unique possibility. It may not be very useful, I totally agree, but its application shouldn’t lead to contradictions.

What does this mean? In your example, the relationship between E1 and E3 is by no means symmetrical. E3 implies E1; the other way around is not true. So it is not a contradiction that you could use the formula for E1 but not for E3.

J Thomas 07.19.07 at 11:37 am

I didn’t like the counterexamples, but I don’t like the original reasoning either. I’ll repeat it first in case I misunderstood.

Choose something that has a distinct beginning and a distinct end. Note the beginning. Then at some random time between the beginning and end, note that it has not ended yet. Since we assume the time was random, we can suppose that it’s a sample from a uniform distribution. There’s 1/2 chance it is in the first half and 1/2 chance it’s in the back half, so that’s a 50% chance the end will come in less time than has already passed, and a 50% chance it will last longer than it’s already lasted. By the same reasoning there’s only a 5% chance that it will end in less than 1/19 the time that’s already passed, and a 5% chance its total life will be 20 times or more what it’s already survived.

This bothers me somehow. If it’s right I should be able to get the same result with a different method. So — If the interval is some unknown N, and I assume nothing about N except that it’s larger than my observation A, what is the maximum entropy distribution? N is in the interval (A, infinity) and I’ve assumed nothing else about it, so I must suppose that every value in that interval is equally likely. So any value I choose for the mean is too small. I get a different result.

Let me try again. In a finite universe there can be a lot more material things that last a short time than that last a long time. The longer something lasts the longer it takes up mass and space that something else could use. There’s only room for so many long-lived stars. There’s room for a lot of short-lived tritium atoms. So maybe I shouldn’t assume that every lifetime is just as likely.

If we make the assumption that there is some unknown mean duration M, and nothing else, then the maximum entropy distribution is an exponential. And if we remove all the cases where x

J Thomas 07.19.07 at 11:40 am

Oops! Remove all the cases where x is less than A then we still get an exponential with mean A+M and median A+M ln 2. A gives no information about M in this case.

It should be possible to work backward from Gott’s conclusions to see what assumption he makes about the distribution of N. I haven’t done that.
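The shifted-exponential claim in the previous comment is easy to check numerically (a sketch, not from the thread: conditioning an exponential with mean M on survival past A just shifts it, giving mean A + M and median A + M·ln 2):

```python
import math
import random

rng = random.Random(2)
M, A = 10.0, 4.0   # exponential mean, and the survival threshold

# Keep only lifetimes that survive past A, sorted for the median.
survivors = sorted(t for t in (rng.expovariate(1 / M) for _ in range(300_000))
                   if t > A)

mean_obs = sum(survivors) / len(survivors)
median_obs = survivors[len(survivors) // 2]
print(mean_obs, A + M)                    # ≈ 14 in both cases
print(median_obs, A + M * math.log(2))    # ≈ 10.93 in both cases
```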

gus 07.19.07 at 12:01 pm

J Thomas: it is worked out in the paper of Monton-Kierland cited in the original post: Gott’s conclusion follows if the assumption on the distribution is that Pr(N) is proportional to 1/N.

You are correct in saying that without this assumption, you can’t conclude anything.
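The 1/N prior does reproduce Gott’s numbers: with prior p(N) ∝ 1/N and a uniform-observation likelihood 1/N (for N > t_past), the posterior on the total duration N is ∝ 1/N², whose tail Pr(N > c) = t_past/c gives exactly 1/(a+1) – 1/(b+1). A sketch (the function name is mine):

```python
def gott_from_prior(t_past, a, b):
    """Posterior on total duration N given observed age t_past, with
    prior p(N) ∝ 1/N and uniform-observation likelihood 1/N: it is
    ∝ 1/N^2 on N > t_past, so Pr(N > c) = t_past / c.  Then
    Pr(a*t_past < t_future < b*t_past) = Pr((a+1)*t_past < N < (b+1)*t_past)."""
    def tail(c):                  # Pr(N > c) under the 1/N^2 posterior
        return t_past / c
    return tail((a + 1) * t_past) - tail((b + 1) * t_past)

print(gott_from_prior(15.0, 0, 1))       # 0.5
print(gott_from_prior(15.0, 1/39, 39))   # ≈ 0.95
```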

Tom Bozzo 07.19.07 at 5:32 pm

Re #39 and #41. Maybe a clarification — N is fixed but unknown. In fact, it’s vitally important that N is fixed, and the observation time is random (and uniformly distributed over [0,N]).

Re #38 (and #34), what seems to me to be going on (and I’d welcome correction if I’m wrong, ’cause it’s been bugging me) is that applied to E3, Gott’s method throws away any info on ‘x’, *and* implies joint distribution assumptions (i.e., dependence or perfect correlation) re A and B that aren’t absurd as such but which are facially inconsistent with the supposed lack of information on the processes.

As for trying to make hay from the direction of the method’s probabilities for E1 and E2, to decide on how to make use of Pr(E1 at z) and Pr(E2 at z), it’s necessary to make some assumption on the joint distribution, and a “reasonable” assumption probably won’t lead to Gott’s Pr(E3 at z).
