This Year’s Model

by Daniel on April 25, 2005

A week late and a dollar short, I am now ready to unveil my election forecasting model. Gosh, what fun it was to make; there really is no substitute for wading in and making a model if you want to learn about a dataset. I am not uploading it here, because the bloody thing is a 1MB Excel file and I suspect that the bandwidth consequences of this for CT would cost me my stripes. However, if you email me at daniel dot davies at gmail dot com, I’ll send it across to you if you like. Or alternatively, you can download the data yourself from Martin Baxter’s site and produce your own version, which may be safer from my trademark incomprehensible spreadsheet design and boneheaded calculation errors. Below the fold: the recipe for my model, the forecasts themselves, and a bit of psephological analysis which suggests why I think that the Liberal Democrats are really, really badly screwed.

First the recipe. Basically, I am calling this the Allocated Regional Swing Estimate approach (as in, “here’s a forecast that I just pulled out of my ARSE”). The idea is to try and do a bit better than the Uniform National Swing model by at least trying to take into account the information contained in the constituency-by-constituency swings. Unfortunately, I have no real guarantee that the method I am using to do this doesn’t introduce errors which are larger than the variance in the original data, but there you go. The idea is to break down the national swing into 650-odd constituency-specific swings (in the spirit of estimating beta factors, for any stock market types to whom that means anything). This approach is rather scuppered by the fact that I have precisely three usable data points for each constituency: the 1992, 1997 and 2001 elections. This is basically because I am too lazy to do it properly, but also because the effect I’m trying to capture is the big change in election behaviour in the Blair era, so I’m not sure how useful pre-1992 data would be; I doubt the underlying distribution would be stable anyway. If something’s not worth doing, it’s not worth doing properly, that’s what I say.

Anyway, what you do is the following.

1. Take the votes won nationally and in each constituency for each election by Labour, Conservative and LibDem parties and convert them to percentages as if the seat was a three-horse race. This is a scandalous oversimplification, justified on the basis “bloody Nationalist parties, who gives a fuck about them?”

2. Calculate the changes in these percentage shares from 1992 to 1997 and 1997 to 2001. You now have a 2x3x647 panel dataset, containing two observations of the change in each of the three parties’ share (which I rather annoyingly call the “swing”, even though it’s just the crude change in a share rather than a swing from one party to the other) for each of 646 constituencies and nationwide.

3. Now, for each party in each constituency in each year, divide the “local” swing by the national change in vote share to get a “local” swing multiplier. Average over the two periods, and then “shrink” them by a third of the difference between the local swing multiplier and 1 (there is decent Bayesian justification for this rather ad-hoc-looking procedure and it improves the results somewhat). You now have a list of 646 “swing multipliers” which take you from a national change in a party’s share to a change in its local share in every constituency.

4. Bob’s yer uncle. Now you can plug in your favourite poll data, convert it to a three-horse race, subtract the 2001 election percentages from it to get a set of three “swings” and multiply them by the multipliers to get changes in percentage shares for every constituency. Assume that the turnout is unchanged and so is the nationalist and minority vote and you get vote numbers for every constituency (this is a pretty unforgivable oversimplification but most UNS models also make it), which lets you forecast the winner on a constituency-by-constituency basis.
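For anyone who would rather not wait for the spreadsheet, here is a minimal sketch of the four steps for a single seat. It is not the actual Excel model: the function names (`three_horse`, `share_changes`, `swing_multipliers`, `forecast_seat`) and every number in it are made up for illustration, it does not renormalise the projected shares, and it assumes (as a guess at step 4) that projected shares are turned back into votes using the 2001 three-party total while the minority vote is held fixed.

```python
PARTIES = ("CON", "LAB", "LD")


def three_horse(votes):
    """Step 1: convert raw votes into percentage shares of the three-party total."""
    total = sum(votes[p] for p in PARTIES)
    return {p: 100.0 * votes[p] / total for p in PARTIES}


def share_changes(before, after):
    """Step 2: crude change in each party's share (what the post calls a "swing")."""
    return {p: after[p] - before[p] for p in PARTIES}


def swing_multipliers(local_changes, national_changes, shrink=1.0 / 3.0):
    """Step 3: average the local/national ratio over the periods, then pull each
    multiplier a third of the way back towards 1. A national change of exactly
    zero would make the ratio undefined (ZeroDivisionError)."""
    multipliers = {}
    for p in PARTIES:
        ratios = [loc[p] / nat[p] for loc, nat in zip(local_changes, national_changes)]
        avg = sum(ratios) / len(ratios)
        multipliers[p] = avg - shrink * (avg - 1.0)
    return multipliers


def forecast_seat(shares_2001, multipliers, national_swing, three_party_votes, other_votes):
    """Step 4: local share = 2001 share + multiplier * national swing; turn the
    shares back into votes at unchanged turnout, with the minority vote held fixed."""
    projected = {p: shares_2001[p] + multipliers[p] * national_swing[p] for p in PARTIES}
    votes = {p: projected[p] / 100.0 * three_party_votes for p in PARTIES}
    votes["OTHER"] = other_votes
    return max(votes, key=votes.get), projected


# Made-up national and single-seat vote counts, purely for illustration.
national = {
    1992: three_horse({"CON": 400, "LAB": 350, "LD": 250}),
    1997: three_horse({"CON": 320, "LAB": 440, "LD": 240}),
    2001: three_horse({"CON": 330, "LAB": 420, "LD": 250}),
}
seat = {
    1992: three_horse({"CON": 5000, "LAB": 2000, "LD": 3000}),
    1997: three_horse({"CON": 3800, "LAB": 2200, "LD": 4000}),
    2001: three_horse({"CON": 3600, "LAB": 2000, "LD": 4400}),
}
periods = [(1992, 1997), (1997, 2001)]
mult = swing_multipliers(
    [share_changes(seat[a], seat[b]) for a, b in periods],
    [share_changes(national[a], national[b]) for a, b in periods],
)

# A hypothetical poll, already summing to 100 across the three parties.
poll = three_horse({"CON": 30, "LAB": 39, "LD": 31})
swing = share_changes(national[2001], poll)
winner, projected = forecast_seat(seat[2001], mult, swing, three_party_votes=10000, other_votes=500)
print(winner, {p: round(s, 2) for p, s in projected.items()})
```

With these toy numbers the LibDem multiplier comes out at about -1.67, so a six-point rise in the national LibDem share cuts their local share from 44 to 34 and the Conservatives take the seat: a small-scale version of the perverse behaviour the post goes on to describe. Setting every multiplier to 1 turns step 4 back into a plain uniform swing.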

The advantage of this system is that it produces two-way gains and losses, like a real election – you can have seats going Labour to LibDem and seats going the other way. The disadvantages are many; it doesn’t seem to do much worse in backtesting than Martin Baxter’s UNS model for the aggregate numbers, but it seems to forecast more changes of control than actually happen (I think that this is probably due to my cavalier treatment of the minority parties; Baxter is vastly more painstaking and scrupulous). And its forecasting of the Liberal Democrats is really rather weird.

But first, the actual forecasts, I hear you demand! OK, on the basis of Martin Baxter’s predictions for the popular vote (33% CON, 39% LAB, 21% LIB), I get the following results:
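The three-horse conversion in step 1 is doing real work here: Baxter’s shares sum to 93%, so before anything else they get scaled up to 100%. A minimal sketch of that rescaling (the function name is made up for illustration):

```python
def three_horse_shares(con, lab, ld):
    """Rescale GB poll shares so the three main parties sum to 100%."""
    total = con + lab + ld
    return tuple(round(100.0 * x / total, 2) for x in (con, lab, ld))


print(three_horse_shares(33, 39, 21))  # Baxter's popular-vote figures -> (35.48, 41.94, 22.58)
```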

Conservatives: 186 seats, versus 165 in 2001 and Baxter UNS prediction of 175
Labour: 404 seats, versus 403 in 2001 and Baxter UNS prediction of 389
LibDems: 40 seats, versus 51 in 2001 and Baxter UNS prediction of 54

This would be a thumping win for Tony Blair, although it is clear that my failure to address the minority party issue is having serious effects here; I have major parties taking 630 seats between them versus Baxter’s 618. Heroically, to adjust for this (and to address my prejudices about poll overstatement), I just knock 3% off the Labour vote. That gives me CON 209, LAB 385, LD 36! Although I note that this hasn’t solved the problem of overstating major party seats, which I will clearly have to do ad hoc closer to the election. I would just like to emphasise at this point that this is by no means my actual election forecast; that will be made on May the first, based on this model, but with a certain amount of subjective adjustment which I will explain in another post on that date. For the moment, we’re just playing with the model.

Clearly and qualitatively, though, the ARSE model is forecasting a complete nightmare for the LibDems. And note that my model has a very strange feature here; knocking 3% off the Labour score in the popular vote increases the LibDem share in the three-horse race which drives the projections. So this model has the perverse result that, over quite large extents of the possible vote share space, the LibDems do worse the more votes they get! (In fact, the best the Libs can do in my model is if Labour gets 1.5x the Tory vote and the LibDem vote goes to zero! Martin Baxter worries about small technical percentage errors! I’m an idiot! Hahahaha!) What’s going on here?
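The arithmetic behind that is easy to check. Cutting Labour from 39 to 36 shrinks the three-party total from 93 to 90, so the LibDems’ 21 rises from about 22.58% to about 23.33% of it, and their national “swing” grows by about three-quarters of a point. In any seat with a negative LibDem multiplier, that extra swing is a loss. The multiplier of -2 below is a hypothetical value chosen for illustration, not one from the spreadsheet.

```python
def ld_three_horse_share(con, lab, ld):
    """LibDem share of the three-party vote, in percent."""
    return 100.0 * ld / (con + lab + ld)


before = ld_three_horse_share(33, 39, 21)  # Baxter's figures
after = ld_three_horse_share(33, 36, 21)   # same, with 3 points knocked off Labour
extra_swing = after - before               # the cut alone adds this to the LibDem "swing"

# Hypothetical seat with a negative LibDem multiplier.
hypothetical_multiplier = -2.0
local_effect = hypothetical_multiplier * extra_swing  # change in the local LibDem share
print(round(before, 2), round(after, 2), round(local_effect, 2))
```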

Well, they do say that financial models reflect the personality of their author. Martin Baxter (by the way, his book on Financial Calculus is one of the best ways to ease yourself into stochastic calculus if you really have to) produced one that is punctilious and thoughtful. I produced one that is entirely experimental, atrociously inaccurate, but which may possibly have a tiny grain of common sense at its kernel. I think I can explain this.

The “problem” (if it is indeed a problem and not a feature of the data which I am capturing) is that it turns out that the LibDems have a lot of seats where they are either the incumbent or the challenger which have negative local multipliers on their vote share. These are in turn driven by the fact that the LibDem vote share nationally fell by 0.6% in 1997 on 1992, but they won seats like Sheffield Hallam by increasing their vote share from 31% to 52%. This gives a multiplier for that year of –38.42, which dominates the average and is not really calmed down all that much by my Bayesian shrinkage.
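A quick check of that number. With the rounded figures quoted (31% to 52% locally, a 0.6-point fall nationally) the single-year multiplier is -35, so the -38.42 presumably comes from the unrounded three-party shares. Applying the one-third shrinkage from step 3 to -38.42 on its own only gets it to about -25.28; the real recipe averages it with the 1997-2001 multiplier before shrinking, so this is an illustration rather than the spreadsheet’s figure. The function names are made up.

```python
def swing_multiplier(share_before, share_after, national_change):
    """Local change in share divided by the national change, for one period."""
    return (share_after - share_before) / national_change


def shrink_towards_one(multiplier, fraction=1.0 / 3.0):
    """Move a multiplier `fraction` of the way back towards 1 (step 3 of the recipe)."""
    return multiplier - fraction * (multiplier - 1.0)


# Rounded Sheffield Hallam figures from the text: 31% -> 52%,
# against a national LibDem fall of 0.6 points.
rounded_multiplier = swing_multiplier(31, 52, -0.6)

# Shrinking the quoted single-year value on its own.
shrunk = shrink_towards_one(-38.42)
print(round(rounded_multiplier, 2), round(shrunk, 2))
```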

But is this a problem with the model, or a problem with the LibDems? I’m not so sure that it’s the model that’s at fault here. The issue is this: everyone is talking about “tactical voting” and “unwind” as issues that might affect the Labour Party. But it looks to me as if the LibDems have just as much, if not more, to fear. The problem here is that the LibDems’ existing 51 seats include a number of places like Sheffield Hallam, where they are basically squatting in what are intrinsically Conservative constituencies. And they are not holding these seats with Labour votes; they’re holding them with Tory votes. Of the 21-point increase in LibDem support in Sheffield Hallam at the 1997 election, only 5% came from Labour.

What I think has happened here is that the Tory party of 1997 and 2001 was just purely and simply not electable. Although as Chris has shown, the Conservative party’s “base” are people with political views that don’t match up to the rest of the polity, they also have a lot of core voters who aren’t swayed by gimmicky “five days to save the pound” or asylum seeker scares, but who are simply part of the substantial bloc of people in the country who, tribally or otherwise, don’t like socialists. Their natural home is the Tory Party; if they want to punish it for going off the rails they’ll swing to LibDem but not to the enemy – my guess would be that “Vote Blair, Get Brown” would have played a lot better in Sheffield Hallam or Tatton than anyone else seems to think. As soon as a remotely sensible and electable Conservative Party shows up, the LibDems are screwed in Sheffield Hallam and Tatton, and the vagaries of the first-past-the-post system mean that as far as I can see, they’ve got the devil’s own job picking up anything else to replace them. The issue of whether Howard’s Tories are a remotely electable Conservative Party is one that I am not yet sure of; the less they bang on about immigration, the more they look like one, so much will depend on the next two weeks’ campaigning. But in general, I find myself thinking that Poor Old Charlie Kennedy has been handed a hospital pass by the electoral system; he has taken the helm at what might be the high-water mark of LibDem electoral representation forever, with a bunch of seats that he holds mainly through eight-year-old protest votes, and on this preliminary analysis (which I repeat is not my official forecast, still less investment advice for you lot) his party’s seat market on Tradesports looks like a serious Sell at 65.



Hektor Bim 04.25.05 at 4:33 pm

Ignoring the nationalist parties seems like a very weird thing to do – it makes forecasts for Scotland, Wales, and especially Northern Ireland totally useless.

I don’t know if I buy this anyway – there is no shift from Labour to Lib Dem at all? So the protest vote angle won’t work out at all?


Daniel 04.25.05 at 4:44 pm

Protest voting doesn’t really fit into a uniform swing model (which this one is, in disguise). Martin Baxter deals with it via an ad hoc adjustment which is reasonably easy to do in his model but more difficult in mine. I’m working on it at the moment. In general, though, I’m not really seeing many Lab/Lib marginals where any Lab-to-Lib switch isn’t outweighed by the unwind of Tory switching (if it happens).

I quite agree on the Nashies, but it really was too much trouble. I may chuck in another ad hoc fix.


Walt Pohl 04.25.05 at 7:09 pm

Did you adjust the model to better fit the acronym?


Penta 04.25.05 at 7:41 pm

Daniel: I’m going to have to agree w/ Hektor here.

If you ignore the nationalist parties, then NI is in play. It won’t be, however; in all likelihood, NI will be split between DUP and Sinn Fein, with UUP and SDLP being splattered.

(I don’t know enough about Scotland or Wales to comment, so won’t.)


jim 04.25.05 at 8:48 pm

I don’t see why you do the conversion to a three-horse vote share. Why not simply come up with a national-to-local actual vote share multiplier? Then the seats where one minus the total of the three parties’ vote shares is greater than the largest party’s vote share can be ascribed to “other”. It looks like you’re doing more work to get a less accurate model.
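The “other” rule in that suggestion is simple enough to sketch. This is only an illustration of the comment’s idea with made-up shares and a made-up function name; it is not part of the ARSE model.

```python
def winner_with_other(shares):
    """Pick a seat winner from raw (not three-horse) vote shares, given as
    fractions for CON, LAB and LD. Everything left over is lumped together
    as "OTHER" and wins if it beats the leading major party."""
    other = 1.0 - sum(shares.values())
    leader = max(shares, key=shares.get)
    return "OTHER" if other > shares[leader] else leader


print(winner_with_other({"CON": 0.40, "LAB": 0.35, "LD": 0.20}))  # CON
print(winner_with_other({"CON": 0.20, "LAB": 0.30, "LD": 0.15}))  # OTHER
```

One design caveat: the residual may be split across several parties, so treating it as a single bloc overstates “other” wins. In the second example, 35% shared between two parties at 20% and 15% would not actually beat Labour’s 30%.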


Michael Mouse 04.26.05 at 8:20 am

the LibDems have a lot of seats […] which have negative local multipliers on their vote share

I think this points up a fundamental problem with your model. I can conceive of circumstances in which an increase in a party’s national vote share actually decreases its support in a particular constituency in some causal way, but they’re pretty bizarre.

Mind you, that sort of intuition has led me up the garden path before so I could be wrong here. (E.g. I still feel strongly that using imaginary numbers to solve real-world differential equations is Bad and Wrong, but the bridges stay up despite my qualms.)

I think what’s actually going on is the well-attested tactical vote/targeted seat campaigning effect, which gave the non-linear result for the LibDems last time round, and which might give entertainingly non-linear results this time.

In the spirit of the LazyWeb, I’d like to see modelling that takes on the tactical vote issue explicitly. I imagine some fiendishly clever feedback mechanism that magnifies the expected results (poll data?) in a constituency into a propensity for anti-Tory tactical voters to go one way or ‘tother.

I think this time round we can expect anti-Labour tactical voters in a way we’ve not really seen before, and hence about whom we have insufficient data so cannot make robust predictions. That’s never stopped people before, though!


Kieran Healy 04.26.05 at 9:12 am

Sounds like the LibDems may become a victim of Simpson’s Paradox. Though I haven’t read closely enough to know if that’s right.


Urinated State of America 04.26.05 at 11:34 am

“If you ignore the nationalist parties, then NI is in play. It won’t be, however; in all likelihood, NI will be split between DUP and Sinn Fein, with UUP and SDLP being splattered.”

But the NI parties have close to fuck-all influence on the larger British politic, with the one exception being when Gerry Fitt was stupid enough to vote yes on the no-confidence motion against Jim Callaghan in 1979, thus helping to usher in Thatcher (another six months, and Labour might have recovered in polls sufficiently to win an election). And Plaid are competitive in only 2-3 seats.

So, with the exception of the Scottish Nats, I can’t see why nats would improve Daniel’s ARSE.


dsquared 04.26.05 at 12:13 pm

I think that the convention among us electoral forecasters is that you forecast Great Britain and assume that there will be 15 Ulster Unionists of one stripe or another who group with the Tories. Certainly, nothing I do to the main party vote shares has any seats changing hands in Northern Ireland, so I don’t think I have that precise problem.

Mr. Mouse: remember that this is not a causal model; I’m describing a statistical relation here, which (I think) has the interpretation given above. There are plenty of positive swings to LD which have them gaining seats; the split 33C, 37L, 25LD which I am looking at on my screen right now has them picking up 12 seats. They still lose Sheffield Hallam even in this scenario though, because in my model it is basically a Tory seat.


Chris Lightfoot 04.26.05 at 2:50 pm

Daniel — is it possible for you to plot up the regions of different party control on a triangular homogeneous-coordinates diagram (i.e., like this one)? Alternatively if you chuck me a copy of the spreadsheet I can probably manage it, though since you seem to have a love-love relationship with Excel it would probably be easier for you to do so.
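The triangle being asked for is usually called a ternary plot: each CON/LAB/LD split is a point in an equilateral triangle, and colouring each point by whichever party the model gives most seats would show the regions of control. Below is a minimal sketch of just the coordinate mapping and a grid of splits to evaluate. The corner placement is an arbitrary choice, and the seat projection and the plotting itself (Excel macro or otherwise) are left out.

```python
import math


def ternary_xy(con, lab, ld):
    """Map three vote shares (any scale) to a point in an equilateral triangle
    with CON at (0, 0), LAB at (1, 0) and LD at (0.5, sqrt(3)/2)."""
    total = con + lab + ld
    lab, ld = lab / total, ld / total
    return lab + ld / 2.0, ld * math.sqrt(3) / 2.0


def simplex_grid(step=1):
    """Whole-number (CON, LAB, LD) splits summing to 100: CON and LAB move in
    `step` increments and LD takes the remainder."""
    for con in range(0, 101, step):
        for lab in range(0, 101 - con, step):
            yield con, lab, 100 - con - lab


# Each split would be fed to the seat model and coloured by the winner.
points = [(split, ternary_xy(*split)) for split in simplex_grid(10)]
print(len(points), ternary_xy(1, 1, 1))
```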


Chris Williams 04.27.05 at 3:59 pm

I always thought that the only reason the Lib Dems won Hallam in 97 was that I was stood outside a polling station trying to persuade everyone to vote Labour. More fool me.


dsquared 04.28.05 at 8:16 am

I am going to do my own triangular plot. I am going to do it in Excel, with a macro.


Chris Lightfoot 04.28.05 at 7:13 pm

Fantastic! That’s one Excel spreadsheet[*] I’ll be glad not to see.

[*] among many


bluemax 04.29.05 at 8:57 am

Can you use betting market data rather than poll data? Betfair users don’t lie about their intentions. Converting the current prices and weight of money indicators into percentage votes might be enlightening (I’d do it myself, but I’m busy canvassing…). The results of your model might then inform some gambling in an unholy howl of feedback.
