Comments on: Polls and Margins

By: claxton6

claxton6 — Sun, 24 Aug 2003 18:10:41 +0000

>I, and many other people, many of whom live in Vermont, have Call-Intercept or Call-Blocking or some such feature that systematically skews who is called in phone surveys.

My experience with telephone surveys is largely in Nevada, which may be a little different from Vermont, but we only saw a very small number of households with Call-Intercept or Call-Blocking, and even among those households it was possible to get through to a household member.

Of course, I think that presumes that you have a live person doing the calling, since you have to state who’s calling. I think political polls do this, rather than automated dialling like telemarketers, but I don’t know for sure.

By: bigring55t

bigring55t — Fri, 22 Aug 2003 03:35:35 +0000

Actually, despite all the math, the solution lies in the realm of psychology. Kos works for the Dean campaign, so the careful wording is simply a troll prophylactic (borrowed from Atrios) meant to head off pointless accusations of unfairness.

By: kokomo

kokomo — Wed, 20 Aug 2003 20:52:17 +0000

Given the data, there is a small chance that Dean and Kerry are tied, but a reasonable interpretation is that Dean is in the lead. This is what Kos communicated. The problem is an interesting one, but Kos’ statement is not a proper subject for the discussion.

By: pathos

pathos — Wed, 20 Aug 2003 20:24:31 +0000

I am surprised people are still doing phone polls.

I, and many other people, many of whom live in Vermont, have Call-Intercept or Call-Blocking or some such feature that systematically skews who is called in phone surveys. I am guessing that the more right-wing you are, the more likely you are to block/screen your calls.

This is a new phenomenon, but it explains why the Republicans did so well in 2002, despite all polls showing that it would be much closer.

I no longer put any faith in polls conducted over the telephone. Might as well be an internet poll.

By: Thomas Dent

Thomas Dent — Wed, 20 Aug 2003 18:56:17 +0000

What Tim said. Maybe Kos got into the habit of thinking that
the MOE means subtracting from one guy and adding to the
other from looking at two-horse races. If we can assume that
the distribution of ‘undecided’s is narrowly peaked and
their number is uncorrelated with either of the two candidates
then going to the MOE +4 for Kennedy means -4 for Nixon and
vice versa.

This is a rather tricky point since what one should be talking
about strictly is a probability distribution over the entire
multidimensional space of possible results adding up to 100%.
Inevitably it doesn’t always make sense when you try to
summarize it in a single MOE. Truman vs. Dewey vs. Thurmond was
probably a case where quoting a single MOE would be misleading
if you wanted to find the likelihood of the actual numbers being
off by a certain number of points.

And then you have the problem of Clark (1 percent) – with the
MOE being +-4, this should mean that there is a large probability
of Clark’s actual percentage being negative! This piece of
nonsense comes about because MOE assumes that the distributions
are Gaussian, but they can’t be because the Gaussian extends
from minus infinity to plus infinity whereas the percentage
result is strictly between 0 and 100.
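
A minimal sketch of the Clark problem, assuming a sample size of 600 (the figure quoted elsewhere in this thread) and a 95% level. It compares the quoted +-4 applied directly to Clark's 1% with two per-candidate intervals. The Wilson score interval is a swapped-in alternative that does not rely on the Gaussian shape near 0; it is not something the poll itself reports.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Gaussian (Wald) interval using the candidate's own share."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval; its bounds always stay inside [0, 1]."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n = 600        # assumption: sample size taken from elsewhere in the thread
clark = 0.01   # Clark's reported share

# The headline MOE applied blindly to a 1% result dips below zero.
quoted = (clark - 0.04, clark + 0.04)
wald = wald_interval(clark, n)
wilson = wilson_interval(clark, n)

for name, (lo, hi) in [("quoted +-4", quoted), ("Wald", wald), ("Wilson", wilson)]:
    print(f"{name:>10}: {100 * lo:6.2f}% to {100 * hi:6.2f}%")
```

The headline MOE is roughly the worst case, which occurs for a share near 50%. A candidate at 1% has a much narrower interval, which is part of why a single MOE for the whole poll can mislead.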

And then you have the fact that MOE represents only the statistical
random error, and you still have to contend with systematic biases,
for example Dean supporters being more likely to agree to answer the
poll because of a peculiar character trait that they are more likely
to possess…

If another poll with different methods comes out with similar numbers
it will be much more clear that Dean has a lead.

By: Tim Lambert

Tim Lambert — Wed, 20 Aug 2003 17:54:54 +0000

I don’t think you can work out the answer unless you know to what extent Dean and Kerry are competing for the same supporters.

If total support for Dean and Kerry is fixed at 49% so that any increase for Dean is matched by a decrease for Kerry, then the 95% confidence interval for the difference is +/- 8% so that a 7% difference is not significant.

On the other hand if they are not competing for the same voters (so that half the people will never vote for Kerry and the other half will never vote for Dean) then changes are independent and the 95% confidence interval for the difference is +/- 4*sqrt(2) = +/- 5.7% and the difference is significant.

Reality is going to be in between these two cases, so the answer is “it depends”.
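
The two cases above are the endpoints of one formula. Here is a rough sketch that treats the +/- 4% as the same MOE for both candidates and lets the correlation `rho` between the two estimates vary. The function name and the "in between" value are illustrative assumptions. That value uses the correlation a simple random sample would imply for shares of 28% and 21%, and a real poll need not behave that way.

```python
import math

def moe_of_difference(moe_a, moe_b, rho):
    """MOE of (A - B) when the two estimates have correlation rho."""
    return math.sqrt(moe_a**2 + moe_b**2 - 2 * rho * moe_a * moe_b)

moe = 4.0            # quoted margin of error, in points
gap = 28.0 - 21.0    # Dean minus Kerry, in points

# Case 1: fixed total support, so every Dean gain is a Kerry loss (rho = -1).
zero_sum = moe_of_difference(moe, moe, rho=-1.0)

# Case 2: no shared supporters, so the errors are independent (rho = 0).
independent = moe_of_difference(moe, moe, rho=0.0)

# One possible in-between value: the correlation between two category
# shares in a simple random sample (an assumption, not a poll property).
p_dean, p_kerry = 0.28, 0.21
rho_srs = -math.sqrt(p_dean * p_kerry / ((1 - p_dean) * (1 - p_kerry)))
in_between = moe_of_difference(moe, moe, rho=rho_srs)

for label, m in [("zero-sum", zero_sum), ("independent", independent),
                 ("simple random sample", in_between)]:
    verdict = "significant" if gap > m else "not significant"
    print(f"{label:>22}: +/- {m:.2f} points, 7-point gap is {verdict}")
```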

By: Jeff Johnson

Jeff Johnson — Wed, 20 Aug 2003 15:48:34 +0000

Ooops, my explanation of confidence level was misleading. It’s not necessarily true that exactly 5 out of every 100 polls will be inaccurate at 95% confidence. That’s only in the limit.

By: Jeff Johnson

Jeff Johnson — Wed, 20 Aug 2003 15:05:15 +0000

The end of my post disappeared. It seems that the blogger doesn’t like the less-than sign. Anyway, I meant to say that there’s an 85% probability that d is between 1% and 13%.

By: Jeff Johnson

Jeff Johnson — Wed, 20 Aug 2003 15:00:33 +0000

I found a z-table and did a few calculations. Suppose we take the margin of error for Dean and Kerry’s poll numbers to be +/-3% instead of 4%. Since Dean got 28% and Kerry 21%, the difference here d=7%. The margin of error for the difference would now be +/-6%. Given our new margin of error and a sample size of 600, the confidence level would be about 85% instead of 95%.
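
The z-table lookup can be reproduced roughly as follows. This is a sketch, assuming the quoted MOE is the worst-case one (a share of 50%) for a simple random sample of 600. The variable names are only illustrative.

```python
import math
from statistics import NormalDist

n = 600                                   # sample size used in the comment
se_worst = math.sqrt(0.5 * 0.5 / n)       # standard error at a 50% share

def confidence_for_moe(moe):
    """Two-sided confidence level implied by a given margin of error."""
    z = moe / se_worst
    return 2 * NormalDist().cdf(z) - 1

conf_4 = confidence_for_moe(0.04)         # the poll's quoted +/- 4%
conf_3 = confidence_for_moe(0.03)         # the narrower +/- 3% tried here

d = 0.28 - 0.21                           # Dean minus Kerry
moe_diff = 2 * 0.03                       # doubled, as for dependent shares
interval = (d - moe_diff, d + moe_diff)

print(f"+/-4% corresponds to about {100 * conf_4:.0f}% confidence")
print(f"+/-3% corresponds to about {100 * conf_3:.0f}% confidence")
print(f"interval for d: {100 * interval[0]:.0f}% to {100 * interval[1]:.0f}%")
```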

Thus, we might say that there’s an 85% probability that 1%

By: Jeff Johnson

Jeff Johnson — Wed, 20 Aug 2003 14:09:34 +0000

The probability of any particular poll result is very low, given any assumption. This is not how you want to think of the results.

Suppose that the confidence level for the poll is 95%, which is fairly standard and seems to be compatible with the sample size and margin of error. Now, in response to J. Michael Neal, when you’re estimating the difference between two dependent variables, such as Dean’s and Kerry’s support, the margin of error for the difference is twice the margin of error for the individual variables, so a statistical tie would be within the confidence interval, because the margin of error for the difference would be +/- 8%.

What a 95% confidence level means is that if you did 100 polls with the same sample size, 95 of the polls would give results within the margin of error of the actual number in the target population. 5 of the polls, however, would give results which are not within the margin of error of the actual number in the target population. In other words, 5% of the time the polls are going to be dead wrong, even given the margin of error.

Thus, as I think Amit Dubey was suggesting, in order to calculate the probability that Dean is not leading Kerry, you have to take into account, among other things, the possibility that the actual numbers are, for example, Kerry 75% and Dean 3%.

By: Doug Turnbull

Doug Turnbull — Wed, 20 Aug 2003 13:49:36 +0000

Agree with the last post that you need to integrate your likelihood function. Plus, in some cases the likelihood function doesn’t sum to 100% (not sure if this is such a case), so you’d want to do the simulation for each possible result and then normalize to that value, which is a lot of work.

The other thing that I wonder about is whether your simulation would give you a margin of error of 4%, or whether your assumptions give you a smaller margin than that–it’s possible there are other systemic errors in the polling that increase the error margin above a true random sample.

Trying another tack, using the 4% figure and assuming it’s a sigma value (don’t know how they define it), and assuming statistical independence of the Dean and Kerry numbers (certainly not true), then you get a 1/6 probability that Dean’s numbers are 24% or below, and a 1/6 chance that Kerry’s numbers are 25% or above. So you have a roughly 1/36 chance that both are true.
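
A quick check of that back-of-the-envelope figure, under the same stated assumptions: the 4% is treated as one sigma of a Gaussian, and the two numbers are treated as independent even though they are not.

```python
from statistics import NormalDist

sigma = 4.0                                # assumption: MOE read as one sigma
dean = NormalDist(mu=28.0, sigma=sigma)    # centered on Dean's reported 28%
kerry = NormalDist(mu=21.0, sigma=sigma)   # centered on Kerry's reported 21%

p_dean_low = dean.cdf(24.0)                # Dean really at 24% or below
p_kerry_high = 1 - kerry.cdf(25.0)         # Kerry really at 25% or above
p_both = p_dean_low * p_kerry_high         # independence assumed

print(f"Dean <= 24%:  {p_dean_low:.3f}  (1/6 is {1 / 6:.3f})")
print(f"Kerry >= 25%: {p_kerry_high:.3f}")
print(f"both:         {p_both:.3f}  (1/36 is {1 / 36:.3f})")
```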

Anyway, I agree with your underlying point that most people take margins of error and assume that they mean any number from the measured value +/- the MOE is equally likely, which is not how statistics or measurements work. It always bugs me when people bring out the “statistically tied” verbiage, or some such, since it’s just not true.

By: Amit Dubey

Amit Dubey — Wed, 20 Aug 2003 13:27:42 +0000

Hi,

You should not do this using a simulation. The probability you got was too low because you also have to simulate all other combinations of them being tied, or Kerry beating Dean, then take the integral. (This is the last step you were missing).

What you want to do is to set up a decision rule testing if one mean really is bigger than the other, and then test the hypotheses. Most introductory social science statistics texts should cover this.

By: J. Michael Neal

J. Michael Neal — Wed, 20 Aug 2003 06:08:36 +0000

Then I believe that you have exceeded my statistical competence. I’ll get back to it when I have my degree in a couple of years.

By: Brian Weatherson

Brian Weatherson — Wed, 20 Aug 2003 02:23:08 +0000

If I ran the simulation correctly, it should have taken into account the fact that it’s more probable that Kerry’s vote is under-reported conditional on Dean’s vote being over-reported. Indeed, if I just multiply the probabilities of Dean getting as high as 28 by that of Kerry getting as low as 21 (all conditional on them both really being at 24.5), the result is under 0.1%.
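
For anyone who wants to check the arithmetic without a simulation, here is a rough exact version. It assumes a simple random sample of 600 (the size quoted elsewhere in the thread, not something stated in this comment) with both candidates truly at 24.5%. It computes the naive product of the two tail probabilities and then the joint probability, which accounts for the dependence between the two counts.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

n = 600            # assumption: sample size from elsewhere in the thread
p_tied = 0.245     # both candidates really at 24.5%
dean_obs = 168     # 28% of 600
kerry_obs = 126    # 21% of 600

# Marginal tails: Dean at least as high as 28, Kerry at least as low as 21.
p_dean_high = sum(binom_pmf(k, n, p_tied) for k in range(dean_obs, n + 1))
p_kerry_low = sum(binom_pmf(k, n, p_tied) for k in range(0, kerry_obs + 1))
product = p_dean_high * p_kerry_low

# Joint probability: given Dean's count d, Kerry's count is binomial over
# the remaining n - d respondents with a rescaled probability.
p_kerry_given = p_tied / (1 - p_tied)
joint = 0.0
for d in range(dean_obs, n + 1):
    rest = n - d
    kerry_tail = sum(binom_pmf(k, rest, p_kerry_given)
                     for k in range(0, min(kerry_obs, rest) + 1))
    joint += binom_pmf(d, n, p_tied) * kerry_tail

print(f"naive product of tails: {100 * product:.3f}%")
print(f"joint probability:      {100 * joint:.3f}%")
```

The joint figure comes out larger than the naive product because an over-count for Dean leaves fewer respondents for everyone else, which makes a low Kerry count more likely.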

I agree entirely that this isn’t very meaningful 6 months out. I’m just interested in the theoretical question because it’s one that arises fairly frequently, and this looked to be a pretty extreme case.

By: J. Michael Neal

J. Michael Neal — Wed, 20 Aug 2003 01:53:11 +0000

Kokomo,

No, I don’t think that Dean and Kerry being tied actually is within the 95% confidence interval. Either Dean being at 24% *or* Kerry being at 25% is, but not both. This is a case where the very sloppy layman’s use of “margin of error” is incorrect.