The question of disciplinary boundaries seems to be coming up a lot lately, and Brian’s post on Gott’s Copernican principle provides yet another instance. Gott, an astrophysicist, is interested in the question of whether you can infer the future duration of a process from its present age, and this issue seems to have received some discussion in philosophy journals.

It may be beneath the notice of these lofty souls, but statisticians and social scientists have actually spent a fair bit of time worrying about this question of survival analysis (also called duration analysis). For example, my labour economist colleagues at ANU were very interested in the question of how to infer the length of unemployment spells, based on observations of how long currently unemployed people had actually been unemployed. The same question arises in all sorts of contexts (crime and recidivism, working life of equipment, individual life expectancy and so on). Often, the data available is a set of incomplete durations, and you need to work out the implied survival pattern.

Given a suitably large sample (for example, the set of observations of Broadway plays, claimed as a successful application of Gott’s principle) this is a tricky technical problem, and requires some assumptions about entry rates, but raises no fundamental logical difficulties. The problem is to find a distribution that fits the data reasonably well and estimate its parameters. I don’t imagine anyone doing serious work in this field would be much impressed by Gott’s apparent belief that imposing a uniform distribution for each observation is a good way to go.
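For concreteness, here is a minimal sketch of the kind of estimation involved, assuming (purely for illustration) an exponential duration model and right-censored spells; the data and function name are hypothetical:

```python
def exp_rate_mle(durations, still_ongoing):
    """MLE of an exponential hazard rate from right-censored spells.

    durations: observed spell lengths so far.
    still_ongoing: True where the spell is incomplete (censored).
    For the exponential model the MLE is
    (number of completed spells) / (total observed time).
    """
    completed = sum(1 for c in still_ongoing if not c)
    total_time = sum(durations)
    return completed / total_time

# Hypothetical unemployment spells in months; True = still unemployed.
durations = [3.0, 7.0, 2.0, 12.0, 5.0]
ongoing = [False, True, False, True, False]
rate = exp_rate_mle(durations, ongoing)
mean_spell = 1 / rate  # implied mean spell length under the model
```

Note that the incomplete spells still contribute their observed time to the denominator, which is exactly how censored durations inform the fit.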

Of course, social scientists tend not to like working with a sample size of one, so the Copernicans have a bit more room to move in unique cases. Still, if you are willing to assume a functional form for your probability distribution, and there’s only one free parameter, you can calculate a maximum likelihood estimate from one data point. The arbitrary choice you make determines the confidence interval.
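To illustrate the point (a sketch, with a hypothetical observation and an assumed exponential form): with one complete duration x, the MLE of the exponential rate is 1/x, and the width of the resulting interval comes entirely from the assumed distribution, not from the data:

```python
import math

x = 8.0            # single observed duration (hypothetical units)
lam_hat = 1.0 / x  # MLE of the exponential rate from one data point

# If X ~ Exponential(lam), then lam * X ~ Exponential(1), so an exact 95%
# interval for the mean duration 1/lam follows from Exponential(1) quantiles:
mean_lo = x / (-math.log(0.025))  # about x / 3.69
mean_hi = x / (-math.log(0.975))  # about x / 0.025 -- enormous

# The interval spans two orders of magnitude: the functional-form
# assumption, not the single observation, is doing all the work.
```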

In Bayesian terms, picking the ML estimator is (broadly speaking) the equivalent of assuming a diffuse prior. The big problem in the Copernican approach is this assumption, which is, in effect, that you have no relevant information at all, except for your single sample observation. If the problem is of any interest at all, this assumption is almost certain to be wrong. Take the example of the likely duration of the space program. We can, at the very least, observe that NASA and its competitors have missions scheduled for years ahead, which makes very short durations much more unlikely than those derived from a uniform distribution (Brian’s examples also made this point).

The real lesson from Bayesian inference is that, with little or no sample data, even limited prior information will have a big influence on the posterior distribution. That is, if you are dealing with the kinds of cases Gott is talking about, you’re better off thinking about the problem than relying on an almost valueless statistical inference.
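The influence of the prior can be made concrete with a conjugate sketch (all numbers hypothetical): put a Gamma(a, b) prior on an exponential rate; after one observed duration x the posterior is Gamma(a + 1, b + x), and with a single data point the prior’s pseudo-data easily dominate:

```python
x = 8.0  # the single observed duration (hypothetical)

def posterior_mean_rate(a, b, x):
    """Posterior mean of an exponential rate under a Gamma(a, b) prior
    after observing one complete duration x: Gamma(a + 1, b + x)."""
    return (a + 1) / (b + x)

# Near-diffuse prior: posterior mean rate is essentially the MLE 1/x.
diffuse = posterior_mean_rate(0.001, 0.001, x)

# Even mildly informative prior knowledge (say, "missions are scheduled
# years ahead", encoded here as prior mean rate 0.05 with weight b = 20)
# pulls the estimated rate down to nearly half the diffuse answer.
informed = posterior_mean_rate(1.0, 20.0, x)
```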

{ 26 comments }

leederick 07.19.07 at 7:55 am

“It may be beneath the notice of these lofty souls, but statisticians and social scientists have actually spent a fair bit of time worrying about this question of survival analysis (also called duration analysis).”

Is survival analysis what Gott’s doing? I thought in survival analysis you modelled time of death as a random variable. Gott doesn’t. He views the lifespan as deterministic – it has a fixed start point and a fixed but unknown end point – and views our observation as random.

John Quiggin 07.19.07 at 8:51 am

If you’re going to adopt a brand-new model that gives strange conclusions, shouldn’t you at least explain why you don’t do what everyone else who’s looked at this kind of problem has done? How does this differ from Ross McKitrick (an economist) deciding that there is no such thing as global temperature and inventing his own temperature scale to prove it?

Katherine 07.19.07 at 9:26 am

Well, the philosophers might be looking at it from a different perspective to the social scientists and statisticians, so I wouldn’t take it personally.

After all, the philosophers will tell you that you can’t guarantee the rising of the sun tomorrow merely on the evidence of previous days, so it’s unlikely they’d be willing to say, from a philosophical point of view, that someone will be unemployed tomorrow because they were unemployed for X days previously.

abb1 07.19.07 at 9:38 am

Well, one philosopher said this many years ago:

Barry 07.19.07 at 11:30 am

Yeah, but he never got tenure.

bill in turkey 07.19.07 at 11:33 am

Or at least, he hasn’t got tenure *yet*

Barry 07.19.07 at 1:14 pm

No, you can’t get in that way – almost everybody on Earth hasn’t gotten tenure *yet*.

Timothy Burke 07.19.07 at 1:52 pm

Almost everybody on Earth needs to publish more.

marcel 07.19.07 at 2:01 pm

Or perish more. (re #8)

Michael Mouse 07.19.07 at 3:56 pm

Like Zhou Enlai, I say it’s too early to tell.

rb 07.19.07 at 5:38 pm

“In Bayesian terms, picking the ML estimator is (broadly speaking) the equivalent of assuming a diffuse prior. The big problem in the Copernican approach is this assumption, which is, in effect, that you have no relevant information at all, except for your single sample observation.”

I would disagree here. The ‘diffuse’ or flat prior is in fact a statement of knowledge; it does not express pure ignorance mathematically.

(There is a semantic counter-argument here based on the Bayesian notion of probability as ‘personal uncertainty’, but that’s another conversation.)

So the flat prior is the flaw, but it’s not a flaw because ignorance is unlikely (we may stipulate that total ignorance is unlikely), but rather because total ignorance about a random variable has no probabilistic expression.

Brian Weatherson 07.19.07 at 6:19 pm

“So the flat prior is the flaw, but it’s not a flaw because ignorance is unlikely (we may stipulate that total ignorance is unlikely), but rather because total ignorance about a random variable has no probabilistic expression.”

I think this is the heart of the matter, and totally correct. It’s fun (in a geeky way) to try and prove this fact using the various paradoxes of indifference. That’s especially true when, as in my prior post, you have to make up a new paradox. But it should have been clear all along.

Saying the probability of something is 0.5 is not the same as saying you have no idea whether it will happen; it is saying that there are reasons to take its happening to be as likely as its not happening. That’s not something that an ignorant person can do.

rb 07.19.07 at 6:57 pm

I see that in comments on Tierney’s blog, Steve Goodman posts an LTE he had published in Nature lo these many years ago, making the same point more eloquently and placing it in appropriate historical context.

This diverting discussion aside, it is interesting to think about the ramifications of the fallacy of indifference for Bayesian reasoning in general. To me, the flat prior seems the statistical equivalent of ‘fair and balanced.’ Even-handedness for its own sake cannot solve the problem of ignorance.

leederick 07.19.07 at 8:41 pm

“The big problem in the Copernican approach is this assumption, which is, in effect, that you have no relevant information at all, except for your single sample observation.”

It’s a bit harsh to dump that all on the Copernicans.

They may take a grab sample and pretend they’ve a random sample from an object’s lifespan. But you’re not telling me economists have never taken a grab sample of unemployed people and pretended they’ve got a random sample from the population of all unemployed people, and have no other relevant information at all.

Everyone does it. It may be a sin, but it’s not unique to their approach.

rb 07.19.07 at 9:09 pm

“They may take a grab sample and pretend they’ve a random sample from an object’s lifespan. But you’re not telling me economists have never taken a grab sample of unemployed people and pretended they’ve got a random sample from the population of all unemployed people, and have no other relevant information at all.”

Actually the problem isn’t lack of randomness.

The issue is that the Copernican method assumes that the distribution from whence their “sample” (a single datum) has been drawn is known. This misspecifies their avowed state (ignorance) by substituting a statement of very specific knowledge – namely, that all observation times are equally likely.

By contrast, economists studying unemployment presumably get a bunch of data points, and then, you know, plot the empirical density and stuff like that. (Of course this says nothing about whether their data constitutes a random – or more importantly, a representative – sample of the population they are trying to study, but that is beside the point.)

abb1 07.19.07 at 9:21 pm

Dr. Goodman’s comment says:

Yeah, it seems like a stupid prediction; obviously the probability of a head on the next toss is 1/2.

Or is it? Finding a head on the first toss certainly gives you some information: you now know that this is not a fake coin with two tails (though it still could be a fake coin with two heads, right?) So, it seems to me that the probability of a head on the next toss is now greater than 1/2, and it’s not the case of total ignorance anymore.

Now, suppose we keep tossing the coin and we find a head every time, say 1000 times in a row. Now we are pretty sure this is a trick coin, right? Finding a head on the next toss seems more and more likely, no?

And isn’t it the same thing with a show: if you know it’s been running for years, it does tell you something about the show, about the quality of it, so this is, again, not the case of total ignorance.

rb 07.19.07 at 11:21 pm

“so this is, again, not the case of total ignorance.”

But the Copernicans say they are in a state of total ignorance. They are endeavoring to represent a state of total ignorance with a uniform probability distribution, which is manifestly not a mathematical statement of ignorance.

The point about the coin is that, if you don’t know whether it’s a fair coin or not, it’s not appropriate to START with the assumption that it IS a fair coin (in fact, there are many MORE ways it could be unfair than be fair!).

The 2/3 probability of heads following one head is predicated on assuming a probability of heads of 1/2 in the first place (which the Copernicans mistakenly believe is equivalent to assuming no knowledge at all).

On the contrary, assuming 50% chance of heads to begin with is to make a very strong statement of knowledge. But: I could just as soon assume the probability of tails is 95%, in which case after seeing one heads and applying Bayes theorem, the posterior probability of heads would NOT be greater than 50%.

So in fact it’s the Copernican approach, and not mine, that is hamstrung by a preconceived notion of a fair coin.
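rb’s arithmetic can be checked with the standard Beta–Bernoulli posterior predictive (a sketch; the 95%-tails prior is encoded, purely for illustration, as Beta(1, 19)):

```python
def next_head_prob(heads, tosses, a=1.0, b=1.0):
    """Posterior predictive P(next toss is heads) under a Beta(a, b)
    prior: (a + heads) / (a + b + tosses)."""
    return (a + heads) / (a + b + tosses)

# Laplace's rule of succession: uniform Beta(1, 1) prior, one head seen.
p_uniform = next_head_prob(1, 1)  # 2/3

# A prior with mean P(heads) = 0.05, i.e. leaning 95% toward tails:
# after one head the predictive probability is still well below 1/2.
p_tails_biased = next_head_prob(1, 1, a=1.0, b=19.0)  # 2/21
```

The 2/3 figure is a consequence of the uniform prior, not of ignorance; swap the prior and the same single observation yields a very different answer.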

John Quiggin 07.20.07 at 12:18 am

I agree that the Copernicans are invoking the Principle of Indifference/Insufficient Reason (I mentioned this in the comments thread on Brian’s post). And it’s also true that it’s difficult (or maybe impossible) to represent complete ignorance in a Bayesian formulation.

But this is getting away from my main point which is that it’s silly to assume complete ignorance about problems where we do in fact have some relevant information.

leederick 07.20.07 at 1:17 am

Guys – Have you forgotten what a Copernican is?

Copernicans do not think they are in a state of total ignorance. They’re not trying to represent this ignorance by presenting a uniform probability distribution as a mathematical statement of ignorance.

Copernicans believe they do not occupy a specially favoured position. That’s not a statement of ignorance. It rules out any non-uniform distribution as a representation of their position, because such a distribution would mean they would be likely to find themselves in a special position.

They’re not trying to use a uniform distribution on the basis of ignorance, they’re trying to use it after making a knowledge claim which rules out every other distribution.

rb 07.20.07 at 3:18 am

“But this is getting away from my main point which is that it’s silly to assume complete ignorance about problems where we do in fact have some relevant information.”

Agreed, of course. Didn’t mean to hijack the discussion.

But now that I have …

“It rules out any non-uniform distribution as a representation of their position, because such a distribution would mean they would be likely to find themselves in a special position.”

This strikes me as nonsensical. Why should the uniform prior be any less “special” than any other? Perhaps we need a metric of “specialness,” but certainly the universe of distributions that assign uneven probabilities to points in the parameter space seems a lot less “special” than the one, unique situation that doesn’t do so.

rb 07.20.07 at 3:28 am

Or another way of saying it is that the statement “there is ‘nothing special’ about any of the possible points in time at which an event may happen” does not have the same implications as the statement “all event times are equally likely.” The latter defines a unique prior probability function over the possible event times, the former does not, and equating the two situations is an error.

John Quiggin 07.20.07 at 3:51 am

OK, since we’re off track, can I point out that there are well-developed multiple priors models that aim to address the problems discussed here. My current work is focused on how to think about problems where we can’t conceive of all relevant states of nature.

abb1 07.20.07 at 6:59 am

All right, this morning I feel that the conundrum here has something to do with the difference between the physical world (e.g. astronomy, sociology) and a purely abstract universe of logicians and statisticians.

In the real physical world, when you know that something has been going on for a long time, you assume that there is a good reason for it and thus it’s likely to keep going for a while: if you’ve been unemployed for 6 months, something must be wrong with you and you’re likely to stay unemployed at least for a while longer.

In a purely abstract world, OTOH, suppose we have a board with horizontal intervals of different lengths drawn randomly on it. You throw a dart and you hit one of the intervals. They tell you that it’s 10 centimeters from the left boundary to your dart – what can you say about the whole length? Why, absolutely nothing, of course.

abb1 07.20.07 at 7:04 am

Wait, clearly you can say something about the whole length (greater than or equal to 10), but you can’t say anything about the length to the right of your dart.

Alex 07.20.07 at 12:34 pm

Abb1, I think you’ve put the dart diametrically in the centre of a circle roughly 60cms across.

The problem with all this stuff is that it’s not science, or anything much in terms of intellectual rigour – it just looks-and-feels like it. All they have discovered is that in the absence of data your answers depend purely on your assumptions.

It is to science as rugby is to ancient Greek warfare; toned down from the real thing to an enjoyable row.

PS, we could have finished the job much earlier by invoking one of the standard rules for detecting pseudoscience; if someone compares themselves to Copernicus or Galileo, they are probably full of bullshit.

J Thomas 07.22.07 at 4:16 pm

Look at Abb1’s example with Stat101 eyes.

If you had three darts and they measure 10 cm, 9 cm, and 14 cm, what can you say about the length of the board? Your best estimate for the length is 22 cm, and with 3 samples you can get a vague estimate of the variance in your results that gives you a small sense of how reliable it is.

If you had two darts and they measure 10 cm and 14 cm, then your best estimate for the length is 24 cm and with 2 samples you get an even vaguer estimate for the variance.

With one dart at 10 cm your best estimate for the length is 20 cm. You get no estimate for the variance at all, but this is still your best estimate.
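The doubling-the-mean estimator above can be sanity-checked by simulation (a sketch with a hypothetical board length N; for darts uniform on [0, N], the expected value of twice the sample mean is exactly N):

```python
import random

random.seed(1)
N = 30.0  # true board length (hypothetical)

# Average the doubling-the-mean estimate over many 3-dart experiments.
trials = 50_000
estimates = [2 * sum(random.uniform(0, N) for _ in range(3)) / 3
             for _ in range(trials)]
avg_estimate = sum(estimates) / trials
# avg_estimate lands very close to N: the estimator is unbiased,
# though with a single dart its variance is huge.
```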

Now, let’s ignore the simple-minded idea that N is fixed but unknown and suppose it comes from some distribution. And what we know about this distribution is that summed over the whole distribution, if you have one sample M that is not a success, the probability of success by 2M is 1/2. With fixed M, that gives us a Pareto distribution.

P(N ≤ n) = 1 − (M/n)^k, for n ≥ M and k = 1

When the things that we want to estimate durations for happen to fit this distribution, then the Copernican estimate ought to be the best.

As to how often we will be sampling N from this distribution, perhaps someone else may speak.

(Disclaimer: I haven’t done this carefully enough to fully back it, but I’m not likely to check it carefully at this point and I’d rather post it than forget it.)
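The Pareto reading of Gott’s rule can be checked numerically (a sketch): given a current age M, the rule P(N > n) = M/n for n ≥ M is a Pareto distribution with shape k = 1 and scale M, whose median is 2M:

```python
import random

random.seed(0)
M = 10.0  # current age of the process (hypothetical)

# Inverse-transform sampling from the Pareto with CDF 1 - M/n (n >= M):
# if U ~ Uniform(0, 1], then M / U has exactly this distribution.
samples = [M / (1 - random.random()) for _ in range(100_000)]
frac_ended_by_2M = sum(1 for n in samples if n <= 2 * M) / len(samples)
# frac_ended_by_2M comes out near 0.5: half the implied lifespans end
# by twice the current age, which is Gott's 50% rule.
```

One might add that this k = 1 Pareto has infinite mean, which is part of why a point prediction drawn from it is so uninformative.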

Comments on this entry are closed.