False positives

by Chris Bertram on September 27, 2003

Via the “very interesting blog of Dr Anthony Cox”:http://www.blacktriangle.org/ , I see that Gerd Gigerenzer has “a paper on risk”:http://bmj.bmjjournals.com/cgi/content/full/327/7417/741 in the British Medical Journal. Doctors, it seems, are alarmingly ignorant about statistics:

bq. The science fiction writer H G Wells predicted that in modern technological societies statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. How far have we got, a hundred or so years later? A glance at the literature shows a shocking lack of statistical understanding of the outcomes of modern technologies, from standard screening tests for HIV infection to DNA evidence. For instance, doctors with an average of 14 years of professional experience were asked to imagine using the Haemoccult test to screen for colorectal cancer. The prevalence of cancer was 0.3%, the sensitivity of the test was 50%, and the false positive rate was 3%. The doctors were asked: what is the probability that someone who tests positive actually has colorectal cancer? The correct answer is about 5%. However, the doctors’ answers ranged from 1% to 99%, with about half of them estimating the probability as 50% (the sensitivity) or 47% (sensitivity minus false positive rate). If patients knew about this degree of variability and statistical innumeracy they would be justly alarmed.
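For anyone who wants to check that arithmetic, here is a minimal sketch in Python (the variable names are mine, not Gigerenzer's):

    # P(cancer | positive test) from the figures quoted above:
    # prevalence 0.3%, sensitivity 50%, false positive rate 3%.
    prevalence = 0.003
    sensitivity = 0.5
    false_positive_rate = 0.03

    true_positives = prevalence * sensitivity                   # 0.0015
    false_positives = (1 - prevalence) * false_positive_rate    # ~0.0299

    ppv = true_positives / (true_positives + false_positives)
    print(f"P(cancer | positive) = {ppv:.1%}")                  # 4.8%, i.e. "about 5%"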

{ 29 comments }

1

Keith M Ellis 09.27.03 at 1:26 pm

“…prevalence of cancer was 0.3%…”

Ah, but how do we know that the prevalence of cancer is 0.3%?

This reminds me of the dialog between Brad DeLong and Eugene Volokh about whether law professors understand basic probability or not.

Here’s the link:

http://www.j-bradford-delong.net/movable_type/2003_archives/001430.html

The example was of the bank robber, a witness reporting that he got into a black, not yellow, taxi; a dispatcher following up with a report of how often this witness (okay, this is weird) gets the color of a taxi correct (80% of the time); and a supervisor who then intercedes with the information that 90% of all taxis in the city are yellow.

I had a lively discussion with a friend about this. I really, really want to agree with DeLong’s answer, but I stumble over the fact that the reasoning involved in it leads to a slippery-slope objection of “why stop with the ratio of black to yellow taxis? Why not ask about the ratio of non-taxis to taxis? Pedestrians to cars?” Ad infinitum.

2

Jonathan Goldberg 09.27.03 at 2:56 pm

Keith:

Stop injecting so much reality into a hypothetical. The idea of the test is to see whether doctors (lawyers) can arrive at the correct answer in a nice alternate world in which the voice of God has told us what the prevalence of cancer (the ratio of black to yellow taxis) is. The theory is that if the subjects can’t handle this alternate world, their results in the messy real one will not be any better, and probably worse. It seems like a good theory to me.

3

Zizka 09.27.03 at 5:25 pm

The specific problem could easily be solved with a one-line explanation entitled “Significance of Haemoccult results” on the Haemoccult box, but that would be very bad marketing.

4

Ian 09.27.03 at 5:51 pm

I seem to remember the BMA published a book on risk about 20 years ago. At the time I thought it was aimed at the general public…

5

Evan Allen 09.27.03 at 5:55 pm

One problem is the use of the term “False positive rate”.

Many people incorrectly read it as a proportion of the positive tests, rather than as a rate across the tested population.

For the example above, a 3% false positive rate means that 3% of the disease-free people tested will falsely test positive, regardless of the prevalence of the disease in that population.

What people often think when they see a false positive rate is that, of the total number of positive tests, only 3% are falsely positive.

Confusing terminology leads to confusion.

Simply changing terms from false positive and false negative to something else won’t solve the problem. The numbers themselves must be reported legitimately.

The same thing goes for absolute versus relative risk.

Relative risks are always larger numbers than absolute risks, so you are hard-pressed to find any major medical study that reports absolute risks. Yet the absolute risk is the real key to determining the value of a given intervention or test for a given patient.
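A hypothetical illustration of the gap, with invented numbers (a sketch, not data from any study):

    # Suppose a drug cuts the 5-year risk of an event from 2% to 1%.
    risk_without_drug = 0.02
    risk_with_drug = 0.01

    absolute_risk_reduction = risk_without_drug - risk_with_drug           # 0.01: one point
    relative_risk_reduction = absolute_risk_reduction / risk_without_drug  # 0.5: "50% lower risk"
    number_needed_to_treat = 1 / absolute_risk_reduction                   # 100 patients per event avoided

    print(f"ARR {absolute_risk_reduction:.1%}, RRR {relative_risk_reduction:.0%}, "
          f"NNT {number_needed_to_treat:.0f}")

“Risk halved” and “one patient in a hundred helped” describe the same trial result.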

6

jam 09.27.03 at 6:52 pm

You know what the prevalence of colo-rectal cancer is from biopsy diagnoses before the introduction of the test. Introducing the test will not have changed the underlying behaviour of the system.

My guess is that doctors don’t even know what the terms mean. That’s the only way one can account for estimates equalling the sensitivity or the sensitivity minus the false positive rate. If I don’t know what they mean and I still want to answer the question, my logic would look something like: it’s clear you can’t multiply or divide percentages, because then you wouldn’t end up with a percentage. So I’m going to add or subtract them. False looks like it ought to be subtracted. Voila! This is the sort of thing you’re taught to do in SAT prep classes. In order to become a doctor, you have to take (and do well on) a lot of standardized tests. Whether this is a good or bad thing is another question.

7

adoherty 09.27.03 at 7:17 pm

Can anyone set out the math for this numerically challenged reader? and tell me what “sensitivity” means? Thanks!

8

Keith M Ellis 09.27.03 at 7:36 pm

“Stop injecting so much reality into a hypothetical.”

Isn’t that a political slogan?

But….point taken.

9

Andy Peters 09.27.03 at 8:28 pm

adoherty:

It goes like this:

If you take the Haemoccult test, you have a 3% probability of getting a positive result if you don’t have cancer (the false positive rate). You also (assuming the people taking the test are a random draw from the population) have a 0.3% chance of having cancer — but if you do, you only have a 50% chance of getting a positive result (the sensitivity).

So, of the people who take the test, 3% + (0.3% * 50%), or 3.15%, will get positive results. Of that 3.15%, 0.15/3.15, or 4.8%, will actually have cancer.

10

Andy Peters 09.27.03 at 8:33 pm

Note also that, expanding on my parenthetical above, if there’s a screening process before people are given the Haemoccult test (i.e., if the test is only given to people who have already been identified as possibly having cancer), then the probability that a positive test result is a real positive is higher.

11

Anthony 09.27.03 at 8:37 pm

Thanks for the link, Chris. I feel obliged to mention that I have not yet obtained my PhD. Still working on it part-time.

On the subject of sensitivity, a test can have four outcomes:

If you actually have a disease:

positive (true positive)
negative (false negative)

If you don’t have the disease:

positive (false positive)
negative (true negative)

So to take breast cancer as an example, the sensitivity of mammography is the proportion of women who test positive among those who have breast cancer.

Gigerenzer uses mammography as an example in his book, Reckoning with Risk. It does not seem to be in print in the US, hence the link to Amazon.co.uk.

The mammography example he uses is as follows.

You have the following information about a population of 40 to 50 year old women:

The probability that one of these women has breast cancer is 0.8 percent. If a woman has breast cancer, the probability is 90 percent that she will have a positive mammogram. If a woman does not have breast cancer, the probability is 7 percent that she will still have a positive mammogram. Imagine a woman who has a positive test. What is the probability that she actually has breast cancer?

When expressed like this, most physicians in tests gave a probability of breast cancer in the region of 70-85%. The actual figure is 9 percent.

Gigerenzer then presented the data in terms of natural frequencies, rather than probabilities.

Eight out of every 1000 women will have breast cancer. Of these 8 women with breast cancer 7 will have a positive mammogram. Of the remaining 992 women who don’t have breast cancer, some 70 will still have a positive mammogram. Imagine a sample of women who have positive mammograms in screening. How many of these women actually have cancer?

Presented like this, the majority of physicians got close to the correct answer (e.g. only 7 of the 77 with a positive test have cancer, i.e. 9%).
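The same conversion can be sketched in a few lines of Python (rounding gives 69 rather than the “some 70” false positives above, but the answer is still about 9%):

    # Gigerenzer's mammography example as natural frequencies, per 1000 women.
    population = 1000
    with_cancer = round(population * 0.008)                     # 8
    true_positives = round(with_cancer * 0.9)                   # 7 (7.2 unrounded)
    false_positives = round((population - with_cancer) * 0.07)  # 69

    ppv = true_positives / (true_positives + false_positives)
    print(f"{true_positives} of {true_positives + false_positives} positives have cancer: {ppv:.0%}")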

Gigerenzer’s book should be required reading for everybody these days.

12

Chris 09.27.03 at 8:37 pm

For adoherty:

“Sensitivity” is the probability that if a person has the disease, the test will pick it up.

Take a population of 20,000

If the prevalence is 0.3%, 60 will be suffering from colorectal cancer, but the test will only pick up half of those = 30.

With the false positive rate at 3%, of the 19,940 people who don’t have the disease 598.2 people will nevertheless get a positive test.

We can divide the population according to 4 possible outcomes (some rounding here):

                 Disease present    Disease absent

Positive test        30                  598
Negative test        30                 19342

Of those who tested positive 30+598=628

only 30 have the disease

30/628 = 0.048 = 4.8%

13

Matt McIrvin 09.27.03 at 10:53 pm

Another way to get it is with Bayes’ Theorem about conditional probability.

The prevalence number means that the prior probability of somebody having cancer, given no information about the test, is 0.003.

Now, the prior probability of a positive test result is

(0.003 * 0.5 + 0.997 * 0.03)

which is the probability of a true positive plus the probability of a false positive.

Given new evidence, in this case a positive result, Bayes’ Theorem says that the new probability that the patient has cancer is calculated from the prior by multiplying by the probability of a positive result if the patient does have cancer (0.5), then dividing by the prior probability of a positive result in all cases. That is

P = 0.003 * (0.5) / (0.003 * 0.5 + 0.997 * 0.03) = 0.048

or 4.8%.

This is equivalent to the other methods above, but it’s the way that I remember it, because Bayes’ Theorem was always written in the old Particle Physics Handbook. It’s a handy formula to know.
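As a sketch (the function name and argument names are mine):

    # Bayes' theorem as stated above: posterior = prior * P(E|H) / P(E),
    # where P(E) = prior * P(E|H) + (1 - prior) * P(E|not-H).
    def posterior(prior, p_e_given_h, p_e_given_not_h):
        p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
        return prior * p_e_given_h / p_e

    print(posterior(0.003, 0.5, 0.03))   # Haemoccult: ~0.048
    print(posterior(0.008, 0.9, 0.07))   # mammography: ~0.094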

14

PG 09.28.03 at 3:47 am

Doctors can’t figure out where their political interests lie; why should they be able to do math?

15

Anarch 09.28.03 at 6:54 am

Does it say how long the doctors were given to answer the question? I’ve been under the weather today, but even as a math grad student I wasn’t able to answer that without some kind of back-of-an-envelope calculation.

Mind you, those doctors *should* be able to estimate that figure off the top of their heads, but the results would be even more damning if they were given a significant amount of time to think about it.

16

Harry 09.28.03 at 2:44 pm

This is interesting, but not at all surprising, at least if you’ve tried to get useful information about probabilities from doctors. My two personal experiences — discussing the side-effects of a life-long every-day medication; and trying to get advice about whether to schedule a second Caesarian (for my spouse, obviously, who is small, and cursed with having unnaturally large babies) — started out with me believing they were evasive, but left me believing that they didn’t understand the questions I was asking. Why expect people who have no training in a highly technical area to know anything about it?

17

Keith M Ellis 09.28.03 at 3:10 pm

“Why expect people who have no training in a highly technical area to know anything about it?” Well, the, uh, irony is that so much of medicine is all about statistics and probabilities. Really, physicians should have a very strong grasp of these concepts as well as a strong technical facility.

18

Harry 09.28.03 at 5:06 pm

Sorry to have written so unguardedly, Keith — I was being ironic myself. It was what I came to think after these extremely frustrating experiences. The point is that neither medical schools nor the HMOs that employ doctors (or the NHS for that matter) think of doctors as needing to understand these technicalities. I’m not defending them, just observing.

19

Bill 09.28.03 at 6:12 pm

You can also use an “odds” method, where the odds of having the disease equals the probability of having the disease divided by the probability of not having the disease.

Then, the odds of having the disease after seeing the positive test result equals

(odds before test)*(true positive rate/false positive rate)

This isn’t particularly obvious, but you can prove it using the formulas given above, and it quickly gives an answer.
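A quick sketch with the numbers from this thread (variable names mine):

    # Posterior odds = prior odds * likelihood ratio.
    prior = 0.003
    prior_odds = prior / (1 - prior)                  # ~0.00301
    likelihood_ratio = 0.5 / 0.03                     # sensitivity / false positive rate
    posterior_odds = prior_odds * likelihood_ratio    # ~0.0502

    posterior_probability = posterior_odds / (1 + posterior_odds)
    print(f"{posterior_probability:.1%}")             # 4.8%, agreeing with the other methods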

20

pathos 09.28.03 at 8:01 pm

The true philosophical issue, though, is this: what if the person has the disease, and the test is not sensitive enough to detect it, but it nonetheless gives a “false positive” for unrelated reasons that correctly diagnoses the cancer?

Was the test accurate or not?

21

dsquared 09.28.03 at 8:48 pm

I think I’ll stand up for the doctors here, and also make the same Hayekian comment I make every time one of these stories comes round which purports to prove that professionals don’t know anything about their field of expertise. The maths of the matter was stated succinctly above:

So, of the people who take the test, 3% + (0.3% * 50%), or 3.15%, will get positive results. Of that 3.15%, 0.15/3.15, or 4.8%, will actually have cancer.

But note that this is only true if you’re going to take “the people who take the test” as referring to “the population as defined by frequentist probability theory”[1]. In the mind of a doctor, this phrase is much more likely to have the referent “the kind of people who you test for colorectal cancer”. This means that the assumption of non-informative priors made above isn’t valid. I’d say that the doctors are using a rule of thumb (or working from tacit knowledge of colorectal cancer), and that you’d need to do a lot more testing on whether it was a valid rule of thumb before you started calling them ignorant.

On the other hand, back in the days when I was taking an interest in the debate over the harmfulness of the MMR vaccine, I heard enough outright ludicrous statistical arguments coming from members of the medical profession that the underlying accusation is certainly not without grounds.

[1] Of course, a mathematical purist looking at this example would excoriate the lot of us for our ignorance and point out that the only correct answer to the question “what is the probability that someone who tests positive actually has colorectal cancer?” is “1 if they have colorectal cancer and zero if they don’t”, because probability isn’t defined over single events in conventional probability theory.

22

Matt McIrvin 09.29.03 at 5:26 am

When I described this question to my wife, she said the same thing dsquared did: if the doctor is testing the person for the cancer in the first place, then the doctor probably saw some additional indication that the patient might have cancer. So a good prior really ought to take that evidence into account as well; it would therefore be higher than the prevalence in the general population. It’s possible that the doctors were thinking about that when they answered the question, instead of making the assumption that the test is being given to a random person with no independent indications of cancer.
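A rough sketch of how much the prior matters: re-run the Haemoccult calculation with larger effective prevalences (the alternative values are made up purely for illustration):

    def ppv(prevalence, sensitivity=0.5, false_positive_rate=0.03):
        tp = prevalence * sensitivity
        fp = (1 - prevalence) * false_positive_rate
        return tp / (tp + fp)

    for prev in (0.003, 0.03, 0.10, 0.30):
        print(f"prevalence {prev:.1%} -> P(cancer | positive) = {ppv(prev):.0%}")
    # 0.3% -> 5%, 3% -> 34%, 10% -> 65%, 30% -> 88%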

23

Ian 09.29.03 at 10:58 am

So how does this argument affect the probabilities when applied to screening tests such as mammograms?

Also – if a doctor is offering an estimate based on some intuitive judgement about ‘priors’, surely to do so they need to have some statistical data about the population, if they believe the sample isn’t drawn from a truly random population?

24

dsquared 09.29.03 at 2:42 pm

Call me a master of the bleedin’ obvious, but the question here wasn’t asked about mammograms. It was asked in a particular way, with particular framing effects (“imagine giving someone the Haemoccult test”). Notoriously, trained statisticians get statistics problems wrong if you set the problem up with the right frame.

And secondly:

Also – if a doctor is offering an estimate based on some intuitive judgement about ‘priors’, surely to do so they need to have some statistical data about the population, if they believe the sample isn’t drawn from a truly random population?

Not at all, IMO. This is a point of controversy in the Bayesian/Classical statistics debate, and it’s one that interests me greatly. A prior can be informative without being based on statistical data about the population (indeed, a Bayesian purist would argue that if it’s based on statistical data, it isn’t a prior). The relationship of informative Bayesian priors to Hayekian tacit knowledge is a fertile ground for research …

25

Martin 09.29.03 at 4:53 pm

I strongly suspect that communication between my wife and me and doctors was negatively affected in an important circumstance by the tendency of many doctors not to think in statistical terms. In the 1980s, my pregnant wife had an amniocentesis and the relevant doctor initially reported that one of the fetus’s chromosomes appeared short. After further examination, and comparison with our chromosomes, the doctors told us that they had concluded that things were OK. Upon our child’s birth, it turned out that the relevant chromosome was, in fact, missing a portion of genetic material, with very serious developmental consequences. Looking back on things, I think that, in the genetic counselling sessions, my wife and I implicitly were expecting information in some (necessarily rough) statistical form — “You have about an x% chance of a problem.” — and were prepared to make a decision about having an abortion based on this sort of information. The doctors’ instinct, however, seems to have been to reduce the available information to a binary OK/not-OK form. (My wife thinks that the doctors were tilted toward the binary way of presenting the data because, consciously or unconsciously, they did not want to encourage an abortion unless they were sure there was a problem. If so, this had the effect of placing a lot of the relevant moral, social, etc. decision process in the doctors’ hands instead of ours.)

At the time, I developed a hypothesis, based on no real evidence, that doctors (often) are bad at statistics because in many situations the practical decision they make will be the same for a wide range of statistical results. For example, suppose a test shows a probability of some medical condition of fifteen percent. If the condition is significant, the doctor will have to take action (more tests, precautionary therapy, etc.) while taking into account the possibility that the condition does not exist (e.g., keep an eye out for signs that the patient’s symptoms are caused by something else). However, if the true probability of the condition were 85%, it seems to me that the doctor’s practical decisions would be much the same — take action to deal with the condition while also allowing for the possibility that the condition does not exist. I hypothesize that frequent exposure to such situations trains doctors to ignore the fine points of statistics.

On the other hand, our pediatrician frequently refers to statistical results from the literature when advising us on medical treatment and tests for our son. However, he was recommended to us by a sociology professor, so there’s some selection bias right there.

26

Keith M Ellis 09.29.03 at 8:50 pm

Daniel,

“…is ‘1 if they have colorectal cancer and zero if they don’t’, because probability isn’t defined over single events in conventional probability theory.”

Yes, but isn’t it the case that since this is a hypothetical, we’re not talking about a single event?

This objection comes up occasionally from people who write to me about my Monty Hall Problem page.

27

dsquared 09.30.03 at 6:52 am

The question is phrased so as to refer to the probability of a single event, surely, and I don’t think that phrasing it in the counterfactual changes the reference sufficiently (I will defer to Mr Weatherson on this one if he disagrees).

In the context of a Monty Hall page, the reply would not be so much that hypotheticals aren’t single events, but that the structure of the question (“Would you switch?”) invites a reply phrased in terms of probability-as-degree-of-belief rather than classical probability. The question “Do you do better by switching?” is explicitly a question about the expectation of a process rather than the probability of an individual event.

28

Brian Weatherson 09.30.03 at 5:21 pm

I agree with dsquared entirely on this one.

On the question of priors, the ‘orthodox’ answer to these questions relies on treating the person taking the test to be a randomly selected member of the population. That’s, to say the least, unsupportable in theory and unreasonable in practice.

On the question of probability, I’m enough of a Bayesian to think that we can talk about a probability here even if it is a single case, although it just means something like reasonable degree of belief. (I don’t think dsquared means to disagree that _Bayesians_ can talk that way.)

If you’re not a Bayesian, it requires an odd interpretation of the terms involved as (something like) generics rather than (something like) referring expressions, so that those terms denote a class over which the probability can be defined.

Having said all that, I’m still worried the doctors were off by _so_ much. Bayesian/Hayekian/Keynesian considerations can account for some movement from 5% (in particular upwards movement) but not, I’d have thought, all that we see. It would be interesting to know the proportion of people who _actually_ take the test and test positive who have the disease. If that’s closer to 5% than 50%, the doctors have some explaining to do.

29

Chris 09.30.03 at 5:54 pm

D-squared and Brian: the testing Gigerenzer discusses is routine screening on the lines of breast cancer screening. So those tested aren’t pre-selected because of some symptomatic indication that they might be at risk. Assuming that the prevalence is given for all those in, say, the age group selected for testing (rather than the general population), and that all this was explained to the doctors beforehand, it looks implausible that we can let them off the hook in the way you suggest; they should, indeed, go for the “orthodox” answer.

Comments on this entry are closed.