I have a new piece up on The Long And Short, suggesting that the “Evidence Based Policy Making” movement ought to be really very worried about the reproducibility crisis in the psychological and social sciences. In summary, the issue is that most of the problems those sciences are dealing with are highly likely to be present in policy research too, meaning that the evidence base for education reform, development economics, welfare and many other policy areas is equally likely to be packed with fragile and non-replicable results. I do suggest a solution for this problem (or rather, I endorse Andrew Gelman’s solution), but point out that it is likely to be expensive and time-consuming and to mean that evidence-based approaches are going to be a lot slower and deliver a lot less in the way of whizzy new policy ideas than people might have hoped.
I got quite a bit of pushback. Responding here to a few points made:
“It’s surely better than nothing”. The idea here is that the fragile and non-reproducible evidence base we are likely to have at the moment in a number of policy areas is still good enough to make decisions with. I don’t see how anyone can say this with any degree of confidence at all. The point about non-reproducible evidence is that it doesn’t constitute a valid test of the underlying true model.
“Should we go back to anecdote and political prejudice then?”. I think this objection is also ill-formed, something which is easiest to see if you think about it in Bayesian terms. The weight that you can put on fragile evidence is low, because you know that its statistical significance has been overstated by an unknown amount. In which case, you would only regard the evidence as shifting your view if your original prior was very weak. So perhaps in entirely new policy areas it might make sense to create policy on the basis of a compromised evidence base. When you have a status quo which seems to be broadly working (rather than in crisis, a case which I’ll come to below), then it seems to me unlikely that non-reproducible evidence ought to convince you that a big improvement is possible. Note also that weak evidence might not even be directionally correct; it’s certainly possible that there is material evidence in the literature in favour of policies which might make things worse.
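The Bayesian point can be made concrete with a toy two-hypothesis update. All the numbers below (the likelihood ratios, the priors) are illustrative assumptions of mine, not from any actual study: the point is only that once you discount fragile evidence for overstated significance, it barely moves a prior unless that prior was weak to begin with.

```python
# Toy Bayesian update: how much should fragile evidence shift a prior?
# All numbers here are illustrative assumptions.

def posterior(prior, likelihood_ratio):
    """Posterior probability that 'the reform helps' after seeing the evidence.

    likelihood_ratio = P(evidence | reform helps) / P(evidence | it doesn't).
    """
    odds = prior / (1 - prior)
    post_odds = odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# A study reports p < 0.05, taken at face value as a likelihood ratio of, say, 10.
# If we suspect the significance is overstated by an unknown amount,
# we discount that ratio heavily.
discounted_lr = 1.5      # assumed honest value after replication worries

weak_prior = 0.5         # genuinely new policy area: no strong view either way
status_quo_prior = 0.2   # status quo broadly working: reform unlikely to help

print(posterior(weak_prior, discounted_lr))       # 0.5 -> 0.6
print(posterior(status_quo_prior, discounted_lr)) # 0.2 -> roughly 0.27
```

With a weak prior the discounted evidence shifts you a little; against a reasonable prior that the status quo is broadly working, it shifts you hardly at all — which is the sense in which “better than nothing” evidence may not license a policy change.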
“We’ve got to do something”. Well, do we? And equally importantly, do we have to do something right now, rather than waiting quite a long time to get some reproducible evidence? I’ve written at length several times in the past about the regrettable tendency of policymakers and their advisors to underestimate a number of costs: the physical deadweight cost of reorganisation, the stress placed on any organisation by radical change, and the option value of waiting. A lot of my scepticism about evidence-based policymaking is driven by a strong belief in evidence-based not-policymaking.
Finally, I’d note that the use of a published social sciences evidence base is not at all necessarily inconsistent with making policy based on political prejudice and anecdote. In my experience, a lot of the Leading British Evidence-Based Community seem to combine a deep sense of paranoia about policymakers wanting to ignore evidence and go back to prejudice and anecdote, with an equally deep naïveté about the same policymakers cherry-picking from the evidence offered, a problem which would still exist even if the evidence itself were robust. Since they are in large part advising people like Michael Gove, I think this is a bit of a blind spot.
In summary, my view then is that what we need is genuine, robust-evidence-based policy making, and (therefore) a lot less of it. What we’re likely to get is policy making based on a biased selection from an already weak evidence base, combined with a structural attempt to delegitimise any protest or critique of that policymaking as Luddite and anti-scientific. People need to be worrying about this.
There’s a temptation to read about all the problems with p-values and presume that this is all a problem of frequentist statistics and that we wouldn’t have a reproducibility problem if everyone converted to Bayesianism. Unlikely, IMO. The underlying problem here is methodological, not mathematical. There are some interesting issues which are created by the arbitrary choice of 5% as a significance level, but the general issue of institutional incentives to overstate statistical significance would be there whatever framework you use.
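The framework-independence of the problem shows up in a toy simulation of the “significance filter”: if only results clearing some reporting threshold get published, the published effect sizes overstate the truth — and that would be just as true of any Bayesian reporting threshold. The true effect, the noise level, and the threshold below are all assumed for illustration.

```python
# Toy simulation of the significance filter: when only results clearing a
# threshold are reported, reported effect sizes overstate the true effect.
# Parameters are illustrative assumptions.
import random

random.seed(0)
TRUE_EFFECT = 0.2   # the small real effect every study is estimating
SE = 0.2            # standard error of each study's estimate
THRESHOLD = 1.96    # |estimate / SE| must exceed this to count as "significant"

published = []
for _ in range(100_000):
    estimate = random.gauss(TRUE_EFFECT, SE)  # one noisy study
    if abs(estimate / SE) > THRESHOLD:        # only significant results survive
        published.append(estimate)

avg_published = sum(published) / len(published)
print(f"true effect: {TRUE_EFFECT}, average published effect: {avg_published:.2f}")
```

The average published effect comes out well over double the true one, because conditioning on significance selects the studies whose noise happened to point upwards. Swapping the z-threshold for a Bayes-factor threshold changes the arithmetic, not the selection effect.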
Historical note: I didn’t. I think I had a point here and meant to fill it in, but went to sleep instead and forgot it. Presumably it was to do with the fact that the perception of a policy area as being “in crisis” in the first place, and therefore requiring immediate action, is itself a political decision and subject to huge amounts of cherry-picking. The decision of what you need to gather evidence on is one which itself ought to be evidence-based; we can note that we have huge amounts of evidence-based suggestions for education policy but only John Quiggin appears to be regularly making an attempt at evidence-based intelligence policy.