Reasoning and open data

by Victoria Stodden on June 26, 2012

It’s hard to argue with increased government transparency and accountability. Who wouldn’t welcome a bulwark against opportunist backroom deals and increased incentives for rulemakers to think carefully about policy decisions? However, the link between these goals and open data isn’t obvious and depends on what is being made available, and how it is being made available. I argue that what’s actually useful is the reasoning process that underlies decision making, of which the data are just one part.

A very real “open data” movement is occurring right now in the computational sciences largely independently of open government data. Scientists are just as affected as anyone else, or perhaps more so, by digital technology: they are making use of new computational tools and answering questions that weren’t possible before our societal obsession with data collection. Not all computational scientists use the term “open data” to describe this movement, and rightly so. The complexity of scientific research in the digital age is rendering the traditional communication mechanism – the published scientific article – woefully inadequate. In order to make verifiable claims, scientists are finding they need to communicate more information to explain their work than can be included in the traditional published article, in particular the precise computational steps that were taken to generate the results. Efforts to do this are emerging at the grassroots level in myriad fields, from biostatistics to geophysics to applied math to medical imaging [1] [2].

Accessing the data is a key part of verifiability of scientific results, but it isn’t enough. Scientists have the burden of convincing a skeptical audience they have done everything possible to root error out of their research, and so they need to explain (and justify) the steps from data collection to final results. This is nothing new, as reproducibility has been part of the scientific method since the 1660s, but now the reproducible research movement advocates sharing the computer code as well as the data that generated the results in the published article. Run My Code, for example, is a new effort to facilitate broad scientific understanding that I am involved in.

An interesting question is whether any of the structure developed for scientific knowledge dissemination can be carried over to the political case and further the aims of understanding how decisions came about, i.e. transparency and accountability. There appears to be at least some superficial concordance between the scientific and governance goals: communicating in complete detail how outcomes were reached, and the deep importance of broad community buy-in. If we follow standards for scientific communication, provenance and justification become paramount. In the political context this might mean providing explanations for how new rules were arrived at, including the evidence used in reasoning. For this perhaps idealistic goal, open data is clearly a crucial element.

There is a nice link between the movement toward reproducible computational science and the Obama administration’s push for evidence based policy when that evidence is scientific in nature. For these types of government decisions, the reproducible scientific research movement can supply the data and reasoning behind the supporting scientific findings, such as those from climate science or public health, and fill in this part of the chain of reasoning. Evidence based policy evaluation by governments is another area where there seems to be a tighter link between the methods used in communicating scientific findings and the explanation behind government policy funding decisions – convincing skeptics of your evidence through transparency in the reasoning process [3]. Here, scientific practice could provide a useful framework to help guide principles of government communication [4].

Scientific publication is, in some sense, “linearizing a nonlinear process.” Interest lies in understanding the steps necessary to replicate the results, not in following all the steps that actually took place during the discovery process. How the political decision making process gets “linearized,” if it even does, is an open question. We’d like to know every consequential influence on the outcome, without cataloguing the inconsequential ones. The fact that we don’t have enough government data to implement this vision today doesn’t bother me, as long as two conditions hold: such data exist and are making their way into the public sphere with a rapidity eclipsing that of Mickey Mouse entering the public domain, and decision makers can construct a narrative that accurately captures how data were used in the political process. Data release seems to be steadily occurring, but a greater emphasis on communicating the reasoning made from the data could move the discussion toward transparency in government decision making.

[1] See e.g. “Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science”
[2] Scientists for Reproducible Research – Google Group:
[3] See
[4] The Obama administration is inserting itself directly (and usefully) in the Reproducible Research movement in science. See



bemused 06.26.12 at 3:15 pm

It would be helpful if your website had at least one working example.


Henry 06.26.12 at 3:22 pm

Try looking under ‘companion websites …’


Cahokia 06.26.12 at 4:45 pm

The Bush administration seemed to try to model an answer to open policy data. No data: no email, only phone chats.


gordon 06.27.12 at 12:02 am

Does this imply that there is, in principle, a purely mechanical process that can lead (or should lead) from information to decision? Could one, in principle, write a Governing Programme, which a Government Computer could run and thereby provide a completely transparent route from information to decision?


Peter T 06.27.12 at 1:40 am

“It’s hard to argue with increased government transparency and accountability”

People don’t want access to the processes just so they can study them. They want access so they can contest the outcome. Often this is a good thing. Taken too far, it’s a bad thing, as it leads to endless review (which is sometimes the outcome sought), and to a requirement for minute documentation of process (which is often an impediment to good decision-making). And often the case for it rests on the mistaken assumption that politics is easy where it’s not corrupt because everyone is reasonable, no?


gavinf 06.27.12 at 3:12 am

I’ve worked for many years in government employment policy and program development. “Evidence-based policy” has been the mantra for a while, but in the attempts to implement this approach a number of problems keep coming up. One is that you need to invest in longitudinal research and put that in place for years before you get the statistically reliable results you need. You end up having to justify doing this time and again in each department and program, and with the returns far in the future it is a difficult argument to sustain when as a program manager your job is to meet the current year’s targets. It can be and is being done, but in the meantime it is all about getting input from stakeholders, then bunkering down with that and a piece of paper telling you how much money you can spend and working out what is the best you can do for the money.

Another is that sophisticated governments already have lots of rich administrative program data available, but they are usually very reluctant to allow researchers anywhere near it for fear that the results could be used by the opposition to criticise it in comparison to their version of the program or their alternate policy solution. Most politicians want evidence not to be more equitable and efficient, but in order to have a stiffer weapon to bash their ideological opponents.

Thirdly, many social programs only have a small number of basic measurements, outcomes, unit cost etc. It can be difficult to argue that more extensive measurement will produce sufficient improvement to justify the cost of implementation. Also, project based programs that have been designed for flexibility won’t have the same baseline for measurement.


wilfred 06.27.12 at 3:19 am

There is a difference between information and scientific data, no? If a person finds himself on the No Fly list, say, there is simply no way to find out why, i.e. the government will not divulge any of the reasons, all of which have everything to do with databases. Thus:

“Data release seems to be steadily occurring, but a greater emphasis on communicating the reasoning made from the data could move the discussion toward transparency in government decision making.”

Given the Obama administration’s targeting of whistleblowers, prevention of leaks, etc., wouldn’t it be wise to remember that good intentions matter more than anything else? A fact is a fact, a datum a datum. It’s not just what you do with them, but the why – that’s what should be transparent.


Salient 06.27.12 at 8:43 pm

Wow, the Run My Code website’s quite a nice fledgling project–easy direct access to the MATLAB code, an interface that let me load sample data to test the code, an FAQ where the coder can answer questions specifically about the code. It’s like open source for academic journals with a user interface.

Quite a lot of that kind of code is available on researchers’ web pages alongside their preprints, at least in my little subniche of math, but even when that’s the case a centralized hub like this site has a handful of advantages. I like that it’s possible to browse code or search a topic, just to get a general sense of implementation techniques. The ability to test the code without having to load Octave and check dependencies or whatever is fantastic. If the FAQ takes off as a well-used feature, it’ll be useful to read through what kinds of questions other more experienced researchers are asking about the computations (the same way I trawl through MathOverflow posts on a given topic to get a sense of what questions others are asking about it).

There’s probably no hope for accurate reasoning information from politicians up for election, but it does seem promising to extend this kind of site from academic papers to various government agencies’ reports, their regulation decisions, and/or their justifications for granting or withholding a license. I’ve always hoped for a website that would allow a user to enter various parameters and then generate the unemployment numbers produced by various metrics, with code I could read through to see how seasonal adjustments are implemented, etc. It seems like that’s the same ‘user interface attached to underlying code’ idea.

Anyway, the site is a really cool expansion of what it means to satisfactorily report one’s methodology, and it would be nice to see this kind of implementation extended into the public sector, or at least some subsectors where the concordance is strongest (come to think of it, it’d be especially useful for circumstances in which some of the data itself is confidential).


Systematician 06.27.12 at 9:19 pm

I suggest having a ready-to-hand range of options to avoid discussions stuck on an all-or-nothing model of information release. Some orthogonal dimensions and ranges:

* Time delay — e.g., immediate v. short v. long delay (with different implications for national security, personal embarrassment, etc.)

* Breadth of access — e.g., a well-indexed public website v. on request v. oversight committee v. subpoena-like request (with the results presented in open or closed proceedings).

* Comprehensiveness of data — e.g., select formal records v. all written records v. 24/7 audio and video recordings.

Entering discussions with a well-known spectrum of options in the background would position the current, too-frequent default — no disclosure, ever, of anything, to anyone — as an extreme option.


tomslee 06.28.12 at 1:32 am

I like the phrase “evidence based policy when that evidence is scientific in nature” as it puts the focus on something achievable.

Some governments (e.g. the current Canadian one) will move actively away from evidence-based policy even when that evidence is scientific, but I do think there is a point to demanding a certain standard of behaviour of governments, and that the standard can be raised (otherwise, e.g., freedom of information acts would never get passed). Some governments can be shamed into committing themselves to behave in certain ways (and doing so by law), and openness around scientific data driven decisions seems like a good place to start.


Alex 06.28.12 at 9:17 am

Well, the biggest problem with evidence based policy is that politics isn’t an evidence-based activity, so it is constantly compromised and subjected to policy-based evidence.


Scott Martens 06.28.12 at 9:42 am

“evidence based policy when that evidence is scientific in nature”

How is this different from “we take the consensus of qualified scientists at their word, when things fall within their bailiwick”? I don’t actually think that’s an entirely wrong-headed approach, but let’s call it what it is: a technocratic approach to certain aspects of policy. Consider the objection that would be raised if I’d said “we take the consensus of qualified economists at their word, when things fall within their bailiwick”. People would object pretty strongly to that, including, I’m pretty sure, some economists on CT.

Is it really credible to say “science is different from economics” as a justification? Especially when sometimes it isn’t? And considering that large numbers of people have no notion how physical science and economics differ in methodology and claims to truth (including a few physical scientists and probably some economists), how do you get the public to make that distinction? Millions of Americans think global warming is a self-serving conspiracy of liberal university professors to get themselves grants. Considering the way economics programs are funded, and assuming one knows nothing about how science is conducted, is that really so hard to believe?


gordon 06.29.12 at 11:01 pm

Scott Martens at 12 (and maybe others) might be interested in a post by Prof Thoma on economics as science (or not) here:

Comments on this entry are closed.