by Steven Berlin Johnson on June 27, 2012

Sometime in the early 1840s, a British doctor and statistician named William Farr took control of the Weekly Returns Of Births And Deaths, a publication of the Registrar General’s office where Farr worked. Variants of the Weekly Returns had been published by the state for at least two centuries before Farr took over, but for most of that time the Returns recorded only the name of the newly born or newly dead, and the parish where they resided. But Farr was what we would now call an Open Data advocate, and over time he greatly expanded the information disseminated through the Weekly Returns. By the mid-1850s, the Returns tracked age, cause of death, occupation—even the elevation of the dead’s primary residence. (Farr believed that people living at higher altitudes had healthier lives.) Inspired by a debate with one of his contemporaries, the Soho doctor John Snow, Farr even added information on the deceased’s regular source of drinking water.

I knew nothing about William Farr, or indeed the Weekly Returns, until I sat down to research my book The Ghost Map, which tells the story of John Snow’s brilliant solution to the riddle of cholera, as it emerged in the middle of a devastating outbreak in the summer of 1854. Snow is rightly famous for creating a map of the outbreak that helped convince authorities of his waterborne theory of cholera’s origins. But it turned out that Snow was greatly assisted by the data that Farr had accumulated in the Weekly Returns. Indeed, it is an open question whether Snow would have been able to make his case to the authorities—and thus likely save hundreds of thousands of lives around the world—without the additional information he drew from Farr’s dataset.

Farr’s rationale for releasing that data is very much in sync with the argument for open data today. No, it was not a solution in and of itself, and its successes were unpredictable (and often indirect, as in the case of Snow and cholera.) But Farr recognized—as I think many of us have come to understand in a contemporary context—that a much larger network of minds existed outside of government, outside the public health authorities, minds that might perceive patterns in the data that escaped the eyes of the authorities. John Snow happened to have one of those minds: a classic 19th-century amateur intellectual, pursuing the great mystery of cholera as a hobby while he kept his day job as a local doctor and anesthesiologist. Most of the official public health establishment had ignored his ideas about cholera, but Farr’s data helped Snow change their minds in the end.

When I look at most of the Open Government initiatives today, I can’t help but see them as a kind of search probe for all the John Snows out there across the country, unaffiliated with government, but willing and able to solve some small piece of the puzzle. This is good news for at least three reasons. The first is simple enough: we will have better ideas inside of government, and a sharper understanding of the problems that confront us, if more people are focused on those problems, even if that focus comes in their spare time, on someone else’s payroll. This is a core principle in Henry and Cosma’s notion of cognitive democracy; with the right tools, when we expand the density and diversity of minds engaged in solving problems, we get better solutions.

The second benefit is almost as direct. By creating platforms that encourage the John Snows of the world, we make more John Snows. In other words, we expand the ranks of the semi-pros and the hobbyists, the people who spend some part of their lives trying to improve their government, beyond just voting every couple of years. The line that divides the politicians/bureaucrats from ordinary citizens becomes more porous. And as the class of part-time participants widens, it attracts more people who have other, equally useful, talents to share. The end result is more engagement, more civic participation, and an increased awareness of the services that states provide, and the challenges they face.

Open data has an additional benefit that is worth mentioning, given “the future of news” debates of recent years. I’ve argued elsewhere that there is a great deal to be optimistic about in terms of long-term journalistic trends, even if part of that story is the slow demise of Newspapers As We Know Them. But I think skeptics like Paul Starr are right to be worried about the fate of the investigative journalist, the city hall reporter who unearths corruption (or, on occasion, showcases civic achievements). Open data can subtly help us avoid this bleak scenario—not by paying for investigative journalism directly, but rather by making it cheaper. When public data is actually public, the investigative side of being an investigative journalist gets a lot easier, or at least it gets more easily crowdsourced by a large group of amateurs and hobbyists who want to help out. Yes, information abundance meant that the newspapers lost their local advertising monopolies to Craigslist and Groupon, but it also means that the crucial data they used to have to unearth by hanging around City Hall for months is now available to anyone with a Web browser or an API key. We may well have fewer investigative journalists on the payroll of newspapers, but if we play our Open Data cards right, we might well end up with more investigations.



Scott Martens 06.28.12 at 6:22 am

Matt, sounds like a serious abuse of statistics. But yeah, there is a risk that with more data comes more spurious correlations, and picking and choosing from them scores political points. One is always tempted to say “you can’t have more data til you show enough sense and humility to use it wisely” but that’s got a history of not working all that well.


JRHulls 06.28.12 at 6:02 pm

What is really interesting is the connection between Farr, Florence Nightingale and Babbage, who was head of the statistics branch of the British Association for the Advancement of Science, now the British Science Association. There’s a good review here:

However, there is an equally significant case in the US. In 1899, the bubonic plague, or Black Death, spread from Hong Kong to San Francisco. Marilyn Chase’s excellent 2003 book, The Barbary Plague: The Black Death in Victorian San Francisco, documents the episode well and tells a tale of protection of business interests, failure to understand science and statistics, withholding of government funds and the escape of the bubonic plague to the ground squirrels of the East Bay hills. It was only the heroic research and statistical efforts of Rupert Blue, a federal health official, that stopped it being a far worse disaster for California. However, he was thwarted in his efforts to stop the spread of the disease beyond San Francisco. From the East Bay, the disease eventually spread all over the American west, where it still occasionally emerges in the rodent population.

Relevant to today’s Supreme Court ruling, Blue was one of the early advocates of universal health care.


piglet 06.28.12 at 6:57 pm

FromArseToElbow 07.01.12 at 6:20 pm

We should be a little cautious about the push for open data. While the principle is obviously sound, and practically democratic, there are those for whom “making it public” ultimately equates to “making it private”.

For example, a lot of the clamour for the opening of NHS data in the UK comes from cheerleaders for Big Pharma. The NHS has a very valuable store of clinical data, built up over decades of public investment, that pharma companies would like to get their hands on. The current dynamics of the pharma industry mean that primary R&D is becoming more problematic. “Enclosing the commons” is an attractive alternative.

For a good example of how the cause of open data can be bastardised (and Karl Popper misrepresented) see this article in the Guardian by a regular Tory stooge:
David Cameron’s science lesson


ajay 07.02.12 at 11:20 am

there are those for whom “making it public” ultimately equates to “making it private”.

Like who? The example you give of the NHS data is not an example of making data private, nor is it similar to enclosing the commons. Have you another example that might be closer to the mark?


FromArseToElbow 07.02.12 at 4:08 pm

@ajay, if you read the article linked to you will notice that the presentation to David Cameron was organised by McKinseys. They are not a pro bono organisation committed to open data as a public good. They are a management consultancy with extensive involvement in privatisation in the UK. To quote from the article (written by a sympathiser, it’s worth noting) …

“Pharma companies used to work with thousands of researchers on staff but now they are sacking those employees and instead working with universities whose research companies can spin off and make their own products … They also hope to reduce the cost of clinical testing for rare diseases by opening up NHS data – people anonymously offering up the study of their genomes for the good of wider society”.

Clearly what is being suggested here is not that NHS data should be made freely available to amateurs to try and develop new drugs in their garden sheds, but that pharma companies (via business-friendly universities, in some cases) should have privileged access. The aim is to lever public investment (in both the NHS and higher education) for private profit.


ajay 07.02.12 at 4:20 pm

Thanks, I did read the article and I know what McKinsey is. (I don’t know who wrote the Guardian article, though; it appears not to be bylined.) But I am still not seeing any evidence for your claim that the pharma companies will have privileged access over non-commercial users such as, say, a university researcher. As I understand it, the data is not at present “public” – it is held by the NHS. If these changes happen, it will be available to commercial-sector research. It will not stop being available to the NHS or anyone else to whom it is currently available, which is what your “enclosing the commons” analogy suggests. Your phrase “privileged access” also suggests that this will be access that other entities will not have, and I’d like to see your evidence for this.

Is your position that the private sector should instead be spending its R&D budgets duplicating the data gathering efforts of the NHS? Why?


FromArseToElbow 07.02.12 at 6:01 pm

@ajay, the article was written by Allegra Stratton, who subsequently left The Guardian to become political editor for BBC Newsnight.

NHS data is currently public in the sense that it is a public good. UK taxpayers have paid for it and the benefit it brings is made available to everyone (“free at the point of use”). Your definition of public seems to be synonymous with “public domain”. My original comment concerned the way that the democratic cry of “open it up to the public” could be used as cover to privately profit from already public goods.

Pharma will have “privileged access” because (in my opinion) such data is unlikely to be wholly opened-up to the public domain for fear of irresponsible use. I suspect that access to NHS clinical data will be practically limited to academic and commercial labs. Increasingly the former are dependent on the pharma sector for funding, as the Guardian article pointed out.

While it is true that this movement does not preclude the NHS from using its own clinical data for research, the growing pressure on public expenditure and the undoubted economies of scale that global pharma enjoys will (in my opinion) lead to further reliance on outsourcing for clinical developments (this trend is already well-established). In other words, the public sector will be a source of data but will be expected to look to the private sector for clinical applications.

Let us imagine for a moment that this is a fair exchange: the NHS provides the data and gets discounted drugs (developed using that data) in return. What the NHS (and thus the public taxpayer) won’t get is a share in the profits of those drugs sold outside the UK. It is in this sense that the commons have been enclosed.

