Horse Races and Odds

by Brian on March 13, 2008

As Daniel notes, we don’t normally do horse race stuff here. And this is week-old horse race stuff. But I thought there was some interesting stuff in the SurveyUSA 50 state polls on Clinton vs McCain and Obama vs McCain. The biggest thing was that they show up an interesting fallacy about probabilistic reasoning that, although pretty obvious when stated baldly, is also pretty hard to avoid in practice.

Those polls suggest that if we just look state by state at which candidate is likely to win, we see Obama and Clinton both narrowly ahead of McCain, with the differences between their performances well within any margin of error. That seems right, though by that measure I’d put Clinton a little ahead, and they put Obama ahead.

But the polls also suggest that if we look at two more important measures, Obama is (according to just this poll) a much stronger candidate. He has a higher expected electoral vote and, more importantly, a much higher win probability. Darryl at Hominid Views produced one model that suggests this, though I suspect his numbers make both Obama and Clinton look more likely to win than they really are. So below I detail a model that I think is a little more realistic. (It’s still a very stylised model, and I’d be interested in knowing from people who do this kind of modelling well what changes might be made to make it better.)

I’m only interested here in modelling what the SurveyUSA poll tells us. So even when it throws up antecedently improbable results (Obama up in North Dakota! Obama losing in New Jersey!) I’m going to take this data at face value.

The model I’m using takes McCain’s percentage lead in a given state to be a random variable whose probability distribution is given by a normal distribution with the mean being his lead in the SurveyUSA poll, and standard deviation 10. That gives the following expected electoral vote totals.

  • Obama 299 – McCain 239
  • Clinton 279 – McCain 259
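To make the expected-vote calculation concrete, here is a minimal sketch under the model described above. The three states and their numbers are invented purely for illustration (they are not the SurveyUSA figures), and the function names are hypothetical. Each state contributes its electoral votes weighted by the probability that McCain's normally distributed lead falls below zero.

```python
from statistics import NormalDist

SD = 10.0  # spread of McCain's lead in percentage points, as in the post

# Hypothetical toy states: name -> (electoral votes, McCain's polled lead).
# These numbers are invented for illustration; they are not the SurveyUSA data.
TOY_STATES = {
    "A": (20, -12.0),  # Democrat ahead by 12
    "B": (10, 0.0),    # tied
    "C": (15, 8.0),    # McCain ahead by 8
}


def dem_win_probability(mccain_lead, sd=SD):
    """Chance that McCain's lead, modelled as Normal(mccain_lead, sd), ends below zero."""
    return NormalDist(mu=mccain_lead, sigma=sd).cdf(0.0)


def expected_dem_votes(states, sd=SD):
    """Each state's electoral votes weighted by the Democrat's win probability."""
    return sum(ev * dem_win_probability(lead, sd) for ev, lead in states.values())


print(round(expected_dem_votes(TOY_STATES), 1))
```

Note that a state trailing by 8 still contributes a meaningful fraction of its votes to the expectation, which is why expected totals can differ from a simple count of states where a candidate leads.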

Obama’s lead is three times Clinton’s. I then ran a Monte Carlo simulation where in each round each state’s McCain lead was calculated independently as a random draw from that distribution. (Possibly it would have made more sense to not have these be completely independent.) In those simulations, Obama beat McCain 78% of the time, and Clinton beat McCain 63% of the time. I ran 10,000 simulations, which is plenty to remove sampling error, though obviously not modelling error.
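A simulation of this kind can be sketched roughly as follows. This is an illustrative reconstruction, not the code actually used for the numbers above: the toy states are invented, the function name is hypothetical, and "winning" here means taking a majority of the toy electoral votes rather than 270 of 538.

```python
import random

# Hypothetical toy states: name -> (electoral votes, McCain's polled lead).
# Invented for illustration; not the SurveyUSA data.
TOY_STATES = {
    "A": (20, -12.0),
    "B": (10, 0.0),
    "C": (15, 8.0),
}


def simulate_win_share(states, sd=10.0, rounds=10_000, seed=0):
    """Fraction of simulated rounds in which the Democrat wins a majority of votes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = sum(ev for ev, _ in states.values())
    wins = 0
    for _ in range(rounds):
        # Each state's lead is drawn independently, as described in the post.
        dem_votes = sum(
            ev for ev, lead in states.values() if rng.gauss(lead, sd) < 0
        )
        if 2 * dem_votes > total:
            wins += 1
    return wins / rounds


print(simulate_win_share(TOY_STATES))
```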

Obama’s big advantage is that he locks down more Democratic leaning states, and competes in Republican leaning states. So if we think state-by-state, Clinton looks to be as electable as Obama. But it’s not likely that everything that’s likely to happen will happen. It is very likely that there will be surprises. And if Obama’s the candidate, those surprises are more likely to be happy surprises for Democrats. With Clinton, they are more likely to be unpleasant surprises.

Update: I realise I ended this post without saying clearly what the fallacy I was referring to at the top was. It’s the fallacy of inferring from the fact that each of a bunch of things is likely to be the case that it is likely that they’ll all be the case. As I say in the last paragraph, that isn’t generally right. Given enough events, it’s likely that some of them will turn out in unlikely ways. That’s generally important to remember, even if one thinks that the best ways to model this insight are somewhat spuriously precise.
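A small worked example shows how quickly this bites. It assumes the events are independent and equally likely, which is a simplification chosen for illustration; the function names are hypothetical.

```python
def prob_all_occur(p, n):
    """Probability that n independent events, each with probability p, all occur."""
    return p ** n


def prob_at_least_one_surprise(p, n):
    """Probability that at least one of the n likely events fails to occur."""
    return 1.0 - prob_all_occur(p, n)


# Twenty events that are each 90% likely: all twenty happen only about 12% of the time.
print(round(prob_all_occur(0.9, 20), 3))
print(round(prob_at_least_one_surprise(0.9, 20), 3))
```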



Seth Finkelstein 03.13.08 at 1:04 am

Regarding: “And if Obama’s the candidate, those surprises are more likely to be happy surprises for Democrats. With Clinton, they are more likely to be unpleasant surprises.”

This strikes me as a very debatable assumption – essentially it’s reversed the “vetted” argument. Clinton essentially says everything bad that can come out about her, has come out. Whereas Obama is still very much an unknown quantity nationally.

I continue to believe that the fact that Obama has never faced a really dirty campaign is a major liability (no, the primaries don’t count, not given the amount of mud that will be thrown at him if he is the Democratic candidate).


Brian 03.13.08 at 1:08 am

Perhaps that’s right, although I think Obama’s ability to improve the polling numbers in every state where the candidates both come to town tells against it.

But I should have been more precise. I meant that some states that we (quite rationally) don’t expect to be competitive will be, or even will fall to a candidate we’re currently thinking of as out of the race. In Obama v McCain, those states will probably go to Obama. In Clinton v McCain, those states will probably go to McCain.


Richard Cownie 03.13.08 at 1:44 am

Nice work. And now that computers are so cheap and fast, I tend to think that you can bypass a lot of the conventional statistical theory and just simulate a whole lot of stuff, with interesting results like this.

But the current poll numbers probably don’t mean a whole lot. The big argument against Clinton is that the Republican machine this year is in disarray, with anemic fundraising, a fractured coalition, and a nominee who’s unpopular with the base: and the one thing that seems likely to motivate the Republican base is to have Hillary on the ticket.

Obama, on the other hand, seems to have a tone and rhetoric which is not particularly threatening to the Republican base. He’s more liberal than his image; HRC is less liberal than her image.
The risk in choosing Obama is that maybe some segments of the electorate aren’t yet ready to vote for a (half-)black candidate. The Ohio exit polls seemed to show a little of that. But I think we can’t allow ourselves to be paralyzed by that fear.

I’d also bet that the famously hot-tempered McCain, who already has a personal feud with Obama from their dealings on senate ethics reform, will come across badly in the debates.


Jessica Polito 03.13.08 at 2:03 am

Why did you pick a standard deviation of 10? That seems extremely large. As I’m sure you know, the typical standard deviation from basic statistics, given a sample of 600 people, would be more like 2 percentage points. Certainly there’s more error in polls than that; is your SD supposed to reflect potential changes in people’s opinions between now and election day?
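For reference, the "about 2 percentage points" figure follows from the textbook formula for the standard error of a sample proportion. A quick sketch, assuming simple random sampling and a candidate near 50% (the function name is hypothetical):

```python
import math


def share_standard_error(n, p=0.5):
    """Standard error, in percentage points, of one candidate's share in a sample of n."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# A sample of 600 with support near 50% gives roughly 2 points.
print(round(share_standard_error(600), 2))
```

In a strict two-way race with no undecideds, the lead is twice the share minus 100, so the standard error of the lead is twice this figure.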


Brian 03.13.08 at 2:37 am

Yep, the choice of 10 was, at least in part, to allow for variation between now and election day. (That’s why I think I probably shouldn’t have run the 50 states as independent samples, since the movement between now and election day is not likely to be independent.)
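One simple way to drop the independence assumption, sketched here only as an illustration, is to split each state's movement into a shared national swing plus state-specific noise. The correlation parameter `rho`, the function name, and the toy states are all assumptions introduced for this example; the two components are scaled so that each state's overall standard deviation stays at 10.

```python
import math
import random


def simulate_with_national_swing(states, sd=10.0, rho=0.5, rounds=10_000, seed=0):
    """Democrat's win share when every state shares one national swing per round.

    rho is a hypothetical share of each state's variance that is common to all states.
    """
    rng = random.Random(seed)
    national_sd = sd * math.sqrt(rho)       # shared component
    local_sd = sd * math.sqrt(1.0 - rho)    # state-specific component
    total = sum(ev for ev, _ in states.values())
    wins = 0
    for _ in range(rounds):
        swing = rng.gauss(0.0, national_sd)  # one draw applied to every state
        dem_votes = sum(
            ev for ev, lead in states.values()
            if lead + swing + rng.gauss(0.0, local_sd) < 0
        )
        if 2 * dem_votes > total:
            wins += 1
    return wins / rounds


# Invented toy states: name -> (electoral votes, McCain's polled lead).
toy_states = {"A": (20, -12.0), "B": (10, 0.0), "C": (15, 8.0)}
print(simulate_with_national_swing(toy_states))
```

Setting `rho=0` recovers fully independent states, while `rho=1` moves every state in lockstep.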

I also wanted to allow for modelling error in the original survey. The sampling error in the poll is likely to be about 2%, but that’s probably the smallest error in polling of this kind.

Also, and this is a little less scientific, using 10 produced win probabilities for the different states that felt plausible. I tried with 5, and it felt much too settled. As it is with 10, there are way more states in the 95%+ win probability range for different candidates than I think is really the case, but that’s a problem I think with any inference from a single poll.


Seth Finkelstein 03.13.08 at 3:34 am

I’m dubious about “In Obama v McCain, those states will probably go to Obama.” That probably does have an answer that could be projected, but I don’t know if current polls are the way to go. I speculate that when the Republican attack-machine really gets going, an inexperienced African-American (literally) who is campaigning anti-war, will make a very motivational target.


Scott Hughes 03.13.08 at 4:38 am

It seems their general electability is so close that, if someone wants to vote, just vote for who you would prefer out of Obama and Clinton. It’s worth the gamble that you are voting for someone who may seem to have slightly less of a chance in a general election.


dsquared 03.13.08 at 7:45 am

Brian – Stuart Thiel (who runs the Dr Pollkatz site) reckons, credibly, that there is a lot of information in the correlation matrix of statewise polls (ie for example that the North Dakota poll should have a Bayesian shrink in the direction of the South Dakota poll and so on). I think there might be a big gain in accuracy from dropping the assumption of 50 independent variables.


dsquared 03.13.08 at 7:48 am


glenn 03.13.08 at 11:13 am

This makes a lot of sense and coincides with a general belief among centrists and liberals that it would be (much) easier for Obama to beat McCain than it would be for Hillary, since many of the arguments she uses to try to establish her qualifications (experience, vetted-ness, toughness) are actually much stronger arguments for McCain. Also, since they both were for the war (and still seem to be), and Obama was clearly against the war, it’d be much harder for Hillary than for Obama to draw contrasts on the subjects that are currently important to many Americans.

Obama stands out and is unique. He is an attractive (figuratively) candidate for many because he not only represents newness, but is the only credible alternative (to either of his opponents) to changing the status quo. Clinton and McCain represent the status quo.


Rich B. 03.13.08 at 2:01 pm

“I ran 10,000 simulations, which is plenty to remove sampling error, though obviously not modelling error.”

Ah, if only there were a number of simulations you could run that would remove modelling error!


James Wimberley 03.13.08 at 3:24 pm

Rich in #11:
With enough computing power, can’t you simulate across variations in model structure?
Think of the multiverse as a problem in theology and a solution in theodicy. God is trying to get the Creation right just once, to have free intelligent creatures to talk to that don’t screw around. Each universe is defined by different laws of nature. On this hypothesis we just happen to live in one of the failures.


Jeff 03.13.08 at 3:40 pm

Independence bad.

Further, you might as well choose a distribution for the random variable that models your beliefs about reality better than the normal; you don’t need normal’s nice analytical properties if you simulate.
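As an illustration of that point, a heavier-tailed alternative such as a Student's t distribution can be swapped in for the normal draw. This is only a sketch: the choice of t, the degrees of freedom, and the function name are assumptions made for the example. The draw is built from the standard construction of a t variate as a normal divided by the square root of a scaled chi-square.

```python
import math
import random


def student_t_lead(rng, polled_lead, scale=10.0, df=5):
    """Draw McCain's lead from a Student's t centred on the polled lead.

    Note that scale is not the standard deviation: a t variate with df degrees of
    freedom has variance df / (df - 2), so the spread is wider than scale alone.
    """
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) as Gamma(shape=df/2, scale=2)
    return polled_lead + scale * z / math.sqrt(chi2 / df)


rng = random.Random(0)
print([round(student_t_lead(rng, 3.0), 1) for _ in range(5)])
```

Compared with the normal, this gives large surprises in individual states somewhat more weight.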


David W. 03.13.08 at 5:28 pm

seth, Hillary Clinton has never had to run a tough campaign herself, having cruised to two easy wins in NY.

Nor am I sure that having pro-war Hillary running against pro-war McCain is all that good a bet.


Seth Finkelstein 03.13.08 at 9:21 pm

david w – True, but I think dealing with the Republican attack-machine for eight years of being First Lady does indeed count as significant campaign-relevant experience.

It is not at all clear which Democratic candidate will fare better. But a lot of even middling technically sophisticated analysis I’ve seen seems to me very flawed by projecting the current state of Obama-love into the indefinite future. He’s gotten a gushing, very uncritical, press-ride for a few months, and that’s not going to continue forever. Moreover, I’m no expert on politics, but the idea I see a lot that the Republicans don’t know how to handle him because gosh darn it, he’s just so shiny and new, seems to me to be ludicrous.


D Jagannathan 03.13.08 at 9:51 pm

Aristotle, quoting the poet Agathon, has a nice bit about this fallacy in the Rhetoric (1402a7-15):

Among the rhetoricians, a persuasive argument comes up on the basis of what is not simply probable, but only probable in a certain way. This is not true on the whole, as Agathon says,

“Perhaps someone might say that this is probable:
Many improbable things happen to people.”

That’s because what’s improbable happens, so that what’s improbable is probable [in a certain way]. But it is not simply probable.


Thrasymachus 03.13.08 at 11:44 pm

“It is not at all clear which Democratic candidate will fare better.”

Um… I think it is. Virtually every poll shows Obama performing better than Clinton in a contest with McCain. Most Republicans regard Obama as the more dangerous opponent. They would much prefer to run against Clinton, who unites and galvanizes the Republican base far better than any other Democrat, and who also has relatively little appeal to independents.

“But a lot of even middling technically sophisticated analysis I’ve seen seems to me very flawed by projecting the current state of Obama-love into the indefinite future.”

Maybe. And maybe not. None of us can predict what will happen between now and November. There is, however, no good reason to assume that Obama’s ratings can only go down while Clinton’s cannot drop any further.

If you have any evidence to the contrary please provide it. Much of the commentary from Clinton’s supporters seems to consist of wishful thinking unsupported by any evidence.


David W. 03.14.08 at 12:21 am

seth, my feeling is that while Hillary Clinton doesn’t have as much downside potential as Obama does, she doesn’t have as much of an upside potential either. In short I think both can win but I prefer taking a slightly greater risk with Obama because I think the chance for a bigger electoral payoff is worth it. I know in Minnesota where I work that the GOP Senate incumbent Norm Coleman would have a much tougher race with Obama at the top of the ticket because Obama will attract more independents to the polls and they’ll be more kindly disposed to vote for someone like Al Franken than Coleman because back in 1998 they voted for Jesse Ventura over Coleman for Minnesota governor. And that scenario also plays out, albeit somewhat differently, in other states as well.


Seth Finkelstein 03.14.08 at 2:35 am

Regarding – “There is, however, no good reason to assume that Obama’s ratings can only go down while Clinton’s cannot drop any further.”

The problem is that Obama has never been through the wringer in the way that Clinton has. Taking their current polling as an indicator of their FUTURE appeal without even considering this difference seems to me bad modelling.

Just look at today’s story about Obama’s pastor, Rev. Jeremiah Wright. That’s a taste of things to come.
Oh, here’s another: “One of Obama’s Earmarks Went to Hospital That Employs Michelle Obama”

You’re going to see the evidence very frequently.

david w, I’m not sure how good such a “coattails” effect will be if Obama ends up covered in mud.


mullaghman 03.15.08 at 3:09 pm

Where Theory meets die Praxis

Obama with 299 electoral college votes and Clinton with 279 is simply the 2004 outcome (Dems with 252 votes) plus Florida (27 ec votes) and Ohio (20). We were projecting those outcomes as “what ifs” on election night 2004 on beer napkins without the rigors of a Monte Carlo simulation. And to say that Obama’s lead is 3X Clinton’s is nonsensical: electoral college votes are not discrete, they come bundled. Obama’s lead in the model’s outcome is simply “one state.”

The electoral college imposes a harsh reality and can distort national data:

(1) As impressive as Obama has been, can there be any doubt that his standing in national polls and resulting competitiveness vs McCain has been significantly enhanced by his “wins” in Democratic primaries in states which are quite peripheral to the needs of Democrats in the electoral college?

(2) With an Obama nomination, the Democrats will undoubtedly attract a disproportionate share of first-time voters (younger and older) but this will occur mostly in states already “safe” for Democrats; the vast majority of Republican electoral votes are impervious to cohort changes in voter turnout. The “turnout” effect attributed to Obama, and his attractiveness to independents/non-voters on a national level are quite different to electoral college impacts.

(3) the ticket will matter: McCain will undoubtedly choose a VP on his “right;” Obama will choose a centrist; and Clinton can pick a VP on her left. Advantage to Clinton who can move to the center and pre-empt space McCain needs to be competitive in. This is why party professionals want the “dream team” ticket of “Clinton/Obama.”

As arcane as it is, the EC works in mysterious ways.

