Merkel and the Markets

by John Q on September 22, 2005

Thinking about the German election outcome, it struck me that this would be an ideal test for betting markets. I’ve always thought that, if there’s a bias in such markets, it would be towards the right, so the toughest test for them would be predicting a left-wing upset win like this one (I’m calling it a win on the basis that left parties got a majority of the vote, not making a prediction about what government might emerge). A quick Google reveals that there is such a market, called Wahlstreet, but my German isn’t good enough to deal with their site, which has lots of graphs bouncing around without an obvious control. Hopefully someone will be able to help me.

Anyway, if there’s a contract allowing a bet on the share of votes for the three left parties (SPD, Greens, Left party) and if, two weeks in advance, that market was predicting a vote share of more than 50 per cent (as actually happened), I’ll concede that the case for the superiority of betting markets over polls has been established, at least as a reasonable presumption. [I didn’t follow the polls closely but I had the impression that most of them were predicting a CDU/CSU win until the last days of the campaign].




Daniel 09.22.05 at 5:46 am

I think the polls were predicting a CDU win in the sense that they were predicting that the SPD wouldn’t be able to form a government, but I don’t think they were predicting any absolute majorities.


TH 09.22.05 at 5:56 am

Well, from what I could see (and the site is a lot less than informative), the betting markets don’t have even a shred of an advantage. They predicted 40% for the CDU/CSU party-combination, which follows the polls but not the election results.

In short: forget it.


Matthew 09.22.05 at 6:07 am

This shows the poll evolution. From just eyeballing it, a case can be made that the polls weren’t wrong, and that there was a very strong late surge.


John Quiggin 09.22.05 at 6:10 am

As I understand it, nearly all the votes go to parties that can be classified as either left (SPD, Green, Left) or right (CDU/CSU, FDP) so one or the other must get a majority of votes, or there must be a dead heat. From the graph linked by Matthew, it looks as though the polls were calling a dead heat for the last two weeks, so a win for the markets would require a clear prediction of a combined left majority of votes.

Of course, a working parliamentary majority is more problematic because both SPD and CDU have ruled out working with the Left, at least so far.


Alex 09.22.05 at 6:20 am

They went bust on the CDU – at the end of trading they still had them over 40% (the polls had begun to reflect the drop).


a 09.22.05 at 6:39 am

“I’ll concede that the case for the superiority of betting markets over polls has been established..” From *one* data point? This is madness (or sociology?), not science.


John Quiggin 09.22.05 at 6:54 am

Einstein’s theory of relativity was widely accepted on the basis of a single data point (perihelion shift of Mercury).

All you need is that a theory’s prediction should be both correct and highly improbable if the theory is false. Then apply Bayes’ theorem.
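
In code, the update looks something like this (the numbers are purely illustrative assumptions, not estimates for this election):

```python
# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
# H = "markets outpredict polls"; E = "market correctly called a surprising result"
prior = 0.5            # even prior odds on the hypothesis (assumed)
p_e_given_h = 0.8      # chance of the correct call if the theory is true (assumed)
p_e_given_not_h = 0.1  # chance of the correct call if the theory is false (assumed)

# Total probability of the evidence, then the posterior
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
posterior = p_e_given_h * prior / p_e
print(round(posterior, 3))  # 0.889
```

The point being that a single prediction that is both correct and improbable under the rival hypothesis can shift belief substantially.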


nik 09.22.05 at 7:29 am

All you need is that a theory’s prediction should be both correct and highly improbable if the theory is false. Then apply Bayes’ theorem.

Is your prediction (polls=wrong, markets=right) improbable?

The coin in my pocket has a 50% chance of predicting whether the left or the right would win. Given that the polls had botched the call, I’m going to be vindicated 50% of the time. If I had tossed a coin and got it right, you wouldn’t conclude the superiority of cleromancy over polls had been established.

As well as saying little about the frequency of (polls=wrong, market=right), your test says nothing about the frequency of (polls=right, market=wrong): which is of some importance.


siron 09.22.05 at 7:33 am

I read somewhere the claim that Wahlstreet’s prediction was slightly better, but haven’t verified it so far.

Here is the best overview of the other polls:


abb1 09.22.05 at 7:36 am

Aren’t most of the market bets based on polls anyway? If not on polls, then what – gut feeling?


nik 09.22.05 at 7:40 am


I think the (theoretical) justification of markets is that they’re all based upon private information as well as public polls. This includes private polling for political parties; knowledge that something damaging about one of the candidates is going to come out; that snow on the day will encourage one lot of supporters to stay inside; right down to your own voting intention.


abb1 09.22.05 at 7:59 am

Nik, I think media polls (at least in the US) do try to factor in some (probably most) of these variables when they define a ‘likely voter’. The number (and effect) of people knowing that secret damaging piece of information just has to be negligible.


siron 09.22.05 at 8:04 am


Parkett schlägt Umfragen (“The trading floor beats the polls”)

Have not tried to verify their claim. Here is an overview of the polls:


nik 09.22.05 at 8:17 am


I’m not suggesting anything about the reality of whether markets actually use private information well – I haven’t a clue about this – I’m just saying that that’s the justification given for them. If markets aren’t any better than polls, there’s a good case that private information can’t be used to improve on polls. If markets do better than polls, then there’s got to be a reason for this. If markets do worse than polls, it says something worrying about capitalism.

I’m sure most of the bets are based on polls, but the idea is that those who know a little more because of useful private information (which could just be the feeling in their area) will be able to profit from them, and help improve predictions.


abb1 09.22.05 at 8:45 am

My impression is that this is not about any private information, but rather about the mystical ‘wisdom of the crowds’, i.e. the composite gut feeling of a large number of individuals.


Seth Finkelstein 09.22.05 at 9:48 am

“Prediction markets” can sometimes do better than “polls”, because the typical media presentation of “polls” is often “most hyped, unusual, controversial storyline we can get out of this data”, while “prediction markets” is “what people willing to put up money think of this data”. There are obvious cases where the two diverge meaningfully. But it’s not a “wisdom of crowds”. Just different noisy variables.


a 09.22.05 at 10:15 am

“Einstein’s theory of relativity was widely accepted on the basis of a single data point (perihelion shift of Mercury).

All you need is that a theory’s prediction should be both correct and highly improbable if the theory is false. Then apply Bayes’ theorem.”

This is complete nonsense.

There are many possible data points to look at concerning electronic markets – lots of events that they “predict” and which have poll data. Yet, among all those points, you choose one and say that will have “established” the superiority of markets over polls.

Incidentally Einstein’s theories weren’t confirmed by one data point, and to say so shows a gross ignorance of the history of science.


abb1 09.22.05 at 12:15 pm

Seth, a pre-election poll is simply a prediction: party A – x%, party B – y%, based on opinion survey results adjusted by some algorithm.

I do see how prediction market might diverge, but I don’t necessarily see how it can diverge meaningfully. For example: last year the IEM market had Bush consistently above 50%; my gut was telling me that Bush couldn’t have more than 5%; I bet against Bush and lost 200 bucks; end of story. Was it meaningful? Not as far as I am concerned.


gr 09.22.05 at 1:32 pm

According to the article linked in nr. 14, Wahlstreet beat all the professional pollsters’ predictions, but not by a dramatic margin. The Wahlstreet prediction, moreover, got it wrong in just the same way as the polling institutes. It didn’t foresee the dramatic drop of the conservative party (CDU) in the actual result and it also didn’t predict that the three parties of the left would together gain a share of more than 50% of the vote. According to Wahlstreet, the CDU/FDP combination was at 49% of the popular vote and the combination of the three parties of the left slightly weaker.

In political terms the betting market expected the same result as the one that was predicted by the polling institutes and the pundits, i.e. a CDU/FDP share of the vote large enough for an absolute majority of seats in parliament.


Seth Finkelstein 09.22.05 at 2:44 pm

abb1: What I meant was that there are often several ways of reading the data to get a “prediction” – adjust for likely voters? what to do with the undecided? any statistical sampling bias? Etc.

A given data set can yield more than one prediction.

Media reports often select for the reading of data which makes for the best-selling story, for sensationalism. A “market” may select for analyst consensus (or maybe not – but it’s possible). So comparing “analyst consensus” vs “hype story” will yield some victories of informed guesses against the silliest of hype. And that’s basically the driving factor in the mysticism.


Brackdurf 09.22.05 at 2:49 pm

Have to briefly concur with “a” here: by my memory of the history of General Relativity, the precession of Mercury’s orbit was already known well before Einstein developed the theory, and was cited as a pre-existing piece of evidence, much as the Michelson-Morley experiment was for Special Relativity. Furthermore, the experiment that supposedly did clinch it after the fact, the Rutherford measurement of the deflection of starlight by the sun’s gravity, was subsequently shown to be within the margin of error of the apparatus, and thus was ambiguous in its support of General Relativity. So I think you need another paradigm case for your one-piece-of-evidence-is-enough view. I doubt such a case exists.


John Quiggin 09.22.05 at 3:08 pm

Brackdurf, you’re quite right that deflection of starlight and not precession of Mercury was crucial in the sense I wanted.

As regards the margin of error, this is the point at which Bayesian and classical inference typically part company. From a Bayesian viewpoint, the question is best posed “in the light of the available evidence, which theory is more probable”, not “can we reject the null hypothesis”.

As a matter of historical fact, Einstein’s theory was widely accepted on the basis of the single data point of starlight deflection, and those who accepted it were right to do so.


Dan Goodman 09.22.05 at 4:58 pm

English-language betting on German politics can be found here; the same site’s betting on the 2008 US Presidential election includes some bets I consider rather odd.


Tracy W 09.22.05 at 5:05 pm

It is a reasonably well-established principle that the average of forecasts is better than the best forecast, assuming that the forecasts are unbiased.

As a quick example, imagine two forecasters forecast the unemployment rate next year. One forecasts it will be 6.0%, the other that it will be 4.5%. It turns out to be 5.0%. The average forecast of 5.25% has an error of 0.25%, which is lower than the best forecast’s error of 0.5%.

This concept has been tested in real-life situations, where, unlike my constructed example, forecasts can all be statistically biased (I am not necessarily talking about ideological bias, statistical bias could arise from other neutral causes, e.g. a mistake by the statistics department so everyone’s plugging data into their models which was wrong in the first place). If you don’t know which way the forecasts will be biased ahead of time, it makes sense to go ahead and average. See for a reference.

This is a way in which markets can be better than the best professional forecaster. Participants do not need private information, they just need a wide range of different biases and a way of combining their forecasts (which can be a simple average or a weighted average such as provided by how much money you can bet on it) to improve the final forecast.
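
The unemployment example above, sketched numerically (using the figures from the comment):

```python
forecasts = [6.0, 4.5]  # the two forecasters' unemployment predictions (%)
actual = 5.0            # the outcome

average = sum(forecasts) / len(forecasts)        # 5.25
errors = [abs(f - actual) for f in forecasts]    # individual errors: 1.0 and 0.5
avg_error = abs(average - actual)                # error of the averaged forecast

print(avg_error, min(errors))  # 0.25 0.5
```

The averaged forecast’s error (0.25) beats even the best individual forecast’s error (0.5), because the individual errors lie on opposite sides of the outcome and partly cancel.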


Seth Finkelstein 09.22.05 at 6:00 pm

“It is a reasonably well-established principle that the average of forecasts is better than the best forecast …”

I disbelieve.

I am thoroughly unconvinced of the practice of putting someone’s head in an oven and feet in the freezer and saying the average personal climate is more comfortable than a typical day (and of course better than the individual parts!)


Tracy W 09.22.05 at 10:22 pm

Seth (no 26). You’re working backwards. You’ve got the outcome and then you’re making the forecasts.

I agree this is a far better method when you can get it to work. And sometimes you can. I’ve heard that a London trader once made a fortune by having a system of carrier pigeons that reported the result of the battle of Waterloo to him before anyone else in London knew about it. The brother in the movie 50 First Dates provides another example.

However, there still remains a percentage of cases where one must make a prediction about the future without actually knowing what it is. This may be a small percentage, but this is the point where we must resort to methods inferior to plugging in the right answer, and it is when averaging forecasts comes into its own.

As for your person with their feet in the freezer and their head in the oven, assuming that the oven and freezer are at standard “on” temperatures, their average temperature is probably over 90 degrees Celsius, which is much more uncomfortable than the normal range of 0 to 40 degrees Celsius. The evaluation of the utility provided by a world state is a separate issue from the accuracy of the forecast.


Brackdurf 09.23.05 at 12:53 am

Returning to General Relativity re John Quiggin’s reply in 23, it was certainly the case that many people came to accept GR’s truth based on Rutherford’s experiment. But as with SR, I think many, like Einstein, believed it purely on theoretical grounds, so that those “convinced” by the starlight experiment were largely a) already convinced on theoretical grounds, or b) didn’t know enough about the math of GR to go against the huge publicity of the Rutherford experiment. I haven’t read about anyone who had deep theoretical doubts but was convinced by the experiment, though that’s quite possible, since Rutherford exaggerated the results (unintentionally, I think) enough to make them convincing.

I don’t know much about Bayesian probability, but in most scientific revolutions I’ve looked at, it has been the theory at least as much as the experiment that has convinced the relevant scientists. An experiment that equally supported GR and the much simpler Newtonian physics should be (and usually is) taken to support the simpler, null hypothesis (if only because, time and again, the simpler theory more often turns out to be correct, all else being equal). Only really beautiful new theories, like SR or GR, seem to be theoretically compelling enough to grab the mantle of the null hypothesis from the previous theory, such that ambiguous evidence could be taken to support both equally. But I can’t think of many other cases where this has happened, and GR’s immediate success was due (even among scientists) in large part to Einstein’s existing reputation. So I think the impact of the Rutherford experiment was quite possibly a mistake, and one that science actually makes relatively infrequently. In any case, the wisdom-of-crowds arguments are not remotely in the same league as GR in terms of theoretical appeal, and thus bear a much higher burden of proof.

Thinking a bit more about Bayes…based on the little statistics work I’ve done, I think the force of the null hypothesis depends a lot on where you happen to be. That is, for Rutherford, he sets out to test GR, and gets a result whose error bars overlap both the GR and Newtonian (N) predictions; chances are, were N correct, he could just as likely have gone on the other side of the N prediction, with no overlap of the GR number. So the fact that he did get a datum that overlapped both is, as a one-time-thing, slightly in favor of GR, all else being equal. But from my point of view, as anyone other than Rutherford at the time, it sure wouldn’t have made the papers on the other side of the N value. With the fame of Einstein, etc, there’s a huge selection effect. Who knows how many experiments produced N-leaning results before Rutherford came along? This may seem like an irrelevantly realistic issue, but I think it is an important part of Bayesian reasoning, which really is talking about the decisions one makes with a single instance. Rutherford might be right, if he had to bet his life one way or the other right then, to bet on GR, but no one else would be right to do that. Certainly I’m going to be equally skeptical of any data that emerges supporting the betting markets, though I suppose John Quiggin is in Rutherford’s position in that he proposed the test in question, and thus might be willing to bet his life (if forced) on the pro-market side if this particular market did an ambiguous bit better than the polls (since it could have done worse).

For me, since there’s a decent chance of the market accidentally landing on the correct prediction, and a high chance of that accident getting publicized, I’ll have to wait for a) lots of evidence, and b) a convincing theoretical defense of the superiority of markets. Perhaps one piece of evidence could do it if, say, the market predicted something with a poll-based probability less than .0001, but since elections are bounded quantities and polls are almost never off by that much, it is hard to imagine that scenario happening.


John Quiggin 09.23.05 at 1:08 am

Brackdurf, I think the crucial experiment was by Eddington, but maybe you have a different one in mind.

On the general approach, what I state in the post is a willingness to prefer the hypothesis “markets outpredict polls” over its contrary on the balance of probabilities. So I don’t require anything like 0.001 probabilities.

The German outcome was sufficiently surprising that, had the markets predicted it clearly, and well in advance of the polls, I would have shifted my prior beliefs. As it is, they didn’t and I haven’t.


nik 09.23.05 at 2:59 am

“On the general approach, what I state in the post is a willingness to prefer the hypothesis “markets outpredict polls” over its contrary on the balance of probabilities.”


I just want to stress this point.

You knew when you made the prediction that the polls had gotten it wrong. Markets would either get it right (and beat the polls) or wrong (and equal them). But observing markets getting it right doesn’t support the “markets outpredict polls” hypothesis – it just shows that markets and polls give different results. You have to show that polls don’t beat markets with a greater frequency; your test says nothing about this.

There are four possible outcomes of interest:

(1) Polls Right, Markets Right; draw
(2) Polls Right, Markets Wrong; supports polls
(3) Polls Wrong, Markets Wrong; draw
(4) Polls Wrong, Markets Right; supports markets

You only consider (3) & (4). So long as Polls and Markets aren’t identical, markets will sometimes get it right when polls get it wrong – but that doesn’t show the “superiority of markets”; it just shows polls and markets aren’t identical.

Polls can also get it right when markets get it wrong – (2). The outcomes that support polls are unobservable. Your test is set up to count the hits but not the misses. Hence, my coin example. Tossing a coin will out-guess polls some of the time, but it isn’t superior to them because polls are going to outguess a coin more often. Your test, however, can’t establish this because you’re not looking for (2).
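
The coin argument can be checked with a small simulation (the accuracy figures are assumed for illustration, not measured): give polls, say, 80% accuracy and a coin 50%, then count all four cells instead of conditioning on the polls missing.

```python
import random

random.seed(0)
trials = 100_000
poll_acc, coin_acc = 0.8, 0.5  # assumed accuracy rates, purely illustrative

cells = {"both_right": 0, "polls_only": 0, "coin_only": 0, "both_wrong": 0}
for _ in range(trials):
    polls_right = random.random() < poll_acc
    coin_right = random.random() < coin_acc
    if polls_right and coin_right:
        cells["both_right"] += 1
    elif polls_right:
        cells["polls_only"] += 1
    elif coin_right:
        cells["coin_only"] += 1
    else:
        cells["both_wrong"] += 1

# Conditioning only on "polls wrong" makes the coin look respectable (~50% hit rate)...
print(cells["coin_only"] / (cells["coin_only"] + cells["both_wrong"]))
# ...but counting all four cells shows the polls winning far more head-to-head contests.
print(cells["polls_only"], cells["coin_only"])
```

Outcome (4) alone looks impressive for the coin; only the full table reveals that outcome (2) dwarfs it.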


Brackdurf 09.23.05 at 4:36 am

Yes, definitely Eddington, Rutherford was the nucleus–but generous of you to allow that Rutherford might have done his own light-bending experiment!

Anyway, when you next return to the subject of market prediction, I’d like to hear more about how you choose which hypothesis to prefer “on the balance of probabilities,” and also what will count as a “sufficiently surprising” prediction. It seems good to lay these things out as carefully as possible ahead of time, to minimize selection effects when good test situations arise in the future.


Seth Finkelstein 09.23.05 at 5:51 am

tracy (#27) – There’s a class of problems where, in general, many people will give answers all over the map – both very high and very low. I think it’s a poor trick to take something where low and high will cancel each other out, and then proclaim that averaging gives a better answer than the individual estimates. In a sense, that only works because you know the behavior of the samples beforehand with regard to the true value. But if you *don’t* know that behavior, guessing that it averages out, is nothing more than hoping the Central Limit Theorem of statistics is favoring you there. Which is again unimpressive in my view.

Basically, there’s a complicated question-begging. Sometimes errors average out, and sometimes they don’t, but if you don’t know which is the case in any particular problem, there isn’t much justification for blindly averaging.


duane 09.23.05 at 6:00 am

Google seems to think their in-house prediction market provides useful business intelligence. They don’t provide that much information, and it isn’t open to the public, but they do have some interesting graphs.


Seth Finkelstein 09.23.05 at 7:08 am

Ah, “in-house” prediction markets for corporations are a special amusement of mine.

Do you know why they work? Hint: It’s not because of any mystical wisdom-of-crowds.

It’s for a very simple reason:


I’m a programmer by trade. Any programmer can tell you a tale where they can give an accurate estimate of project time, and management doesn’t want to hear it. In fact, the accurate estimate will likely be punished. They want to hear an inaccurate estimate, usually called “aggressive”.

But provide a setting where the people who are right have a reward, and not a punishment – surprise, the information has a tendency to converge to the correct answer.

It’s a shame one needs market folderol. But I suppose if management can understand it, and the engineers profit, it’s OK. But don’t think it works because of any wisdom-of-crowds. It works in fact because of wisdom-of-experts.


duane 09.23.05 at 8:30 am

I’m a programmer too, and I pretty much agree with everything you say. At the moment there simply isn’t enough information to evaluate how well the google market is doing. It is unclear from the limited information they give whether the events they are predicting are things that the market participants are directly involved in, although I guess they are. It is also unclear how well the market predictions compare to other estimates, such as the project plan (for product launch) or polls (for elections). We’ll just have to wait and see where they go with it, I guess.


Tracy W 09.24.05 at 11:32 pm

Seth (#32) – the whole point of averaging is that the high and low values cancel out (including the high and low values that are quite close to the true value).

If it is a one-off forecast, of course, then one does not have any particular reason to believe that the whole set of forecasts will be biased above or below the actual value. If, however, forecasters have some error-correcting mechanism going on (and a reason to try to reduce their errors), then the forecasts shouldn’t be systematically biased one way or the other. We’re not talking about random guesses here; the forecasters are trying to get their values right.

Of course, at times the consensus forecast can have errors of the same direction for several periods running, sometimes that’s because people mistakenly identify a trend as a blip or vice-versa, and sometimes for the same reason that you sometimes get four heads in a row if you toss a coin.

And the other problem is – what’s the alternative? Picking the best model is a good idea if the errors are systematically biased, but you don’t always know what the best model is ahead of time.

I’ve constructed a forecasting model for “Other Persons” tax according to this method, and was able to come up with a model that averaged the results of multiple rough-models that got the direction of change right for both in-sample and out-of-sample data. Okay, from memory, I only had 13 data points in total, but the previous models couldn’t get the direction of change right for their in-sample data. At times the averaged forecast gave a higher error than one of the sub-models, but over 13 periods the averaged forecast errors were lower than the forecast errors of the best of the sub-models.

Comments on this entry are closed.