Open Data Journalism

by Matthew Yglesias on June 28, 2012

In the practical community of professional journalists writing about political events, the term “open data” is hardly ever in circulation. And yet, to those who are doing the best work it’s an invaluable tool. David Simon succeeded in turning the idea that information age journalists need to learn to “do more with less” into a national joke, but the underlying concept makes perfect sense. The very same information technology revolution that’s undermined the business models of traditional newspapers has done an enormous amount to increase the productivity of working journalists. Open data is an enormous part of that.

Especially for those of us who want to do informed commentary on economic issues, the FRED database and associated tools that the Federal Reserve Bank of St. Louis has compiled are invaluable. Its companion set, ALFRED, which lets you compare different iterations of the same data series as agencies revise their estimates, is, if anything, even more amazing. For example, it took me about fifteen minutes to throw together a chart comparing current GDP estimates for the critical Q3 2007–Q3 2009 period to those available to policymakers in 2009. Debates about the adequacy of the policy response to the recession should be informed by the reality that the economic shrinkage began a full quarter earlier than was contemporaneously known and that the decline during the winter of 2008-2009 was much more severe than people realized.

This basic National Income and Product Account data has always in some sense been available, but the internet and the determination of the FRED team have made it much more available than ever before. And it makes a difference, as FRED outputs are a regular feature on my blog, on Joe Weisenthal’s policy writing at Business Insider, on Ezra Klein’s Wonkblog, on Paul Krugman’s blog for the _New York Times,_ and wherever else on the internet serious economic policy discussion is taking place.

In debates on the value of open data, some put what I think is undue weight on a distinction between commercial and civic activity that the case of journalism tends to undermine. The New York, clearly, is a commercial enterprise that’s also primarily controlled by a founding family that sees it as serving some civic functions. Krugman, personally, is paid for his work but it beggars belief to imagine that he’s driven by purely pecuniary motivations. And journalists of all kinds are dependent, on one level or another, on non-compensated contributions from quoted sources, experts used for background, or freely available data sources. A civic-minded person might want to write for or be quoted in a commercial publication precisely because the engine of commerce is a powerful motive to widely disseminate information.

The fundamental issue is that as the marginal cost of transmitting information falls ever closer to zero, two things happen simultaneously. One is that it becomes increasingly difficult to internalize the value of information-production because the facts (or “facts”) once unleashed into the world tend to spread beyond the control of the producer. The second is that for that very same reason, information becomes more socially valuable. Governments are ideally situated to serve as producers of these goods. In the U.S. debate this is widely acknowledged in the special case of basic scientific research, where there’s a strong bien pensant consensus that subsidies are socially and economically valuable. But the issue has nothing in particular to do with science. As the marginal cost of information distribution falls, market systems increasingly fail to produce it at an optimal level. Governments should step in wherever it seems feasible to do so. The push for “open data” is best viewed as, like scientific research, a particular case of this general principle.

Whether this will actually lead to better politics in the end has more than a little to do with the question of to what extent political decisions are actually driven by information. I’m somewhat skeptical on this score that they are. But even if they aren’t, all you need to believe is that some important decisions of some sort are driven by information to conclude that more production and more open dissemination of data of all kinds is of enormous pote



William Timberman 06.28.12 at 4:11 pm

Information is power. Disinformation is power. Power is power. Making a sane and beneficial synthesis of those three statements is the challenge we face. Open data has changed the battlefield, but not the war. As things stand, I don’t see that we’re making much progress, but then it’s always easier to see what doesn’t work the way it used to, and the ruin of hopes based on it working that way, than it is to see new opportunities. This sequence of posts is the sort of thing that’ll help sort out what we know and what we don’t, and what we can/should do about it. For that, I’m grateful, especially considering that we’ve got the best and the brightest here sharing their perspectives.


Scott Martens 06.28.12 at 5:23 pm

I can see your point about raising the productivity of journalists, although I do have to ask how one can measure such a thing. Background info in ordinary newspapers has become a lot more competent since my youth, I suspect because journalists check Wikipedia when they aren’t sure about something. And yeah, access to key time-series data is pretty important.

But… I look at the recent NYT exposé of New Jersey halfway houses. I doubt the database checks were the biggest time consumer for those journalists. A lot of working journalists (sorry Matt, you very especially included) are working as information filters. Yeah, I can know anything I want to thanks to the ‘Net. But I can’t pay attention to everything, so I read a few papers and blogs and when I care, I look into it deeper. That’s a kind of journalism, but it’s not the only kind.

In linguistics, we have this debate (“shouting match” is closer to the truth, but we’re all now too tired to keep shouting at people not listening) between two kinds of linguistics: introspective and corpus-driven. Introspective methods (parodied as “armchair linguistics”) involve linguists sitting and thinking and most of the time interrogating their own linguistic knowledge, then making theories. Corpus linguists acquire a large body of language data – a corpus – then run all kinds of statistical tests and machine learning techniques against it, and write up their conclusions and theories based on that. The introspective linguists claim that the corpus linguists are being idiots – there is nothing to be learned from data, because far more informative examples can come from your own imagination. Corpus linguists claim introspectionists are just making stuff up without any basis in reality. Neither group is engaged with actual speakers and users of language, so sociolinguists, field linguists, anthropologists, applied linguists and related tradespeople tend to roll their eyes at the lot. This is part of why well-informed language workers tend to hate linguists.

I think of pundits as akin to introspectionists, and ‘Net-driven data wonks as corpus linguists. I miss the kinds of reporters who went places, talked to people, and came back to tell me things I couldn’t ever have known some other way. I don’t see them a lot. Maybe they were never that commonplace, but I know there used to be some. I don’t want open data to make it cheaper to just replace them with data-grinders who sit in front of a browser all day.


Data Tutashkhia 06.28.12 at 9:22 pm

Krugman practices advocacy journalism. A vast majority of advocacy journalists work for plutocracy. These new fancy ways of massaging data are giving them better opportunities to produce tendentious and misleading arguments.


mpowell 06.28.12 at 9:48 pm

Data @3: Have to disagree here. Those people don’t need data to do their job. They just make stuff up to tell their story when need be. Someone who wants to actually understand what is going on, however, needs solid information. The benefit is asymmetric.


Data Tutashkhia 06.28.12 at 9:58 pm


straightwood 06.28.12 at 10:15 pm

Betrayal by elites is now a truism, and the journalistic elites are no exception. It is amusing to contemplate the eagerness of elite institutions to embrace the truth as delivered by Open Everything. The vicious treatment of Julian Assange by the New York Times speaks volumes about what we can expect from such organizations in their approach to dismantling institutional secrecy.


John Quiggin 06.29.12 at 12:31 am

To quote Mark Twain “Truth is mighty and will prevail. There is nothing the matter with this, except that it ain’t so.”

But I prefer mpowell’s view. In the short run, those with power don’t need truth, those who want to challenge it do. And, in the long run, ruling elites that become disconnected from the truth eventually fail.


Antti Nannimus 06.29.12 at 1:54 am


Data is mind stuff, so beware. Information is even worse.

Have a nice day!


Witt 06.29.12 at 2:58 am

“The very same information technology revolution that’s undermined the business models of traditional newspapers has done an enormous amount to increase the productivity of working journalists.”

As others have noted, this is a rather narrow definition of productivity. While federal datasets are indeed useful (and this would be a good time for anyone who values the American Community Survey data to speak to their Congressperson about Census funding), it is not the existence of datasets, labor-saving though they may be, that makes journalists better or more productive.

Indeed, I would go so far as to hypothesize that journalism is like housework — when a labor-saving device is invented, the standards of cleanliness achievement rise simultaneously, so the homemaker/journalist ends up investing the same amount of time to achieve the same level of respect/admiration.

(Also, I see that the lack of copyediting has followed Yglesias from other haunts. In the fourth paragraph, there is an entirely missing word — “The New York____.”)


Walt 06.29.12 at 9:01 am

But we’re not talking about housework. We’re talking about presenting truthful information. If journalists work just as hard, but the quality of information goes up, then that’s a gigantic victory for the public.


Alex 06.29.12 at 10:29 am

FRED is, indeed, wonderful.


Harold 06.29.12 at 5:35 pm

The response of journalists to yesterday’s Supreme Court decision shows that the human brain is still limited in its ability to process data, no matter how abundant. In short, those journalists don’t read and don’t know how to read. An education in the humanities (history, ethical philosophy, poetry, grammar, and rhetoric) is supposed to address that issue.


Witt 06.30.12 at 1:23 am

“If journalists work just as hard, but the quality of information goes up, then that’s a gigantic victory for the public.”

Right, but it’s not necessarily the case that access to better data sets, or even time invested in writing articles based on better data sets, is going to result in higher quality of information.

Journalists can put a lot of time into making fancy charts and miss the important point, or can use data to obscure the point rather than to make it (either unintentionally or intentionally).

I feel as though I’m not saying this clearly, so apologies if it’s coming across as cross. I’m trying to get at the idea that simply having a labor-saving device (big data sets) does not necessarily result in better journalism. For example, the standards of what employers consider “good” can go up — which may or may not have anything to do with what is actually good. We’ve all seen gorgeous PowerPoints that were in the end surprisingly contentless.

Here’s an example: No Child Left Behind has resulted in an enormous surge of data, some of it available to journalists. There are certainly increased expectations of journalists by their bosses, that they will create and use data in new and interesting ways.

And yet, in the major urban school district near me, which has been in perpetual and national-headline-grabbing crisis for years now, no journalist has taken the rudimentary step of creating a timeline showing the size of the district’s supposed deficit juxtaposed with contemporaneous quotes from officials’ public statements.

I don’t want to hammer at this particular example too hard — for one thing, there is a nonprofit news entity covering the public schools here that does truly amazing work with data — but the above is the kind of thing that makes me reflexively skeptical of claims like the OP’s apparent contention that more open data = better stories. Maybe, maybe not.


Harold 06.30.12 at 2:05 am

When you have the Gates foundation paying people to lie their heads off about data, it is no longer a public good.


Freshly Squeezed Cynic 06.30.12 at 7:55 pm

I’m just dashing off to work, so I can’t reply to this as fully as I would like to, but it seems like a pretty overoptimistic and facile analysis of the journalism industry as it currently exists.

Productivity in journalism has skyrocketed in the past few decades, it’s true, but that’s not so much to do with the availability of data. It’s simply that each journalist has been required to produce a significantly larger amount of content; Nick Davies notes that the number of pages (and as such, space for content) in most newspapers has increased massively while staff has been slashed. (Trust and family-owned papers like the NYT and the Guardian are somewhat protected from these pressures, but not entirely.) If you’re doing more stories in the same period of time, there’s no guarantee that the quality of the stories will be improved, especially if you have less time to go out on the street, facilitate contacts, talk to people, get different points of view, check things up etc. You’re going to rely more and more on stuff coming in to you than following stories up yourself that might go nowhere. A lot of latitude has disappeared.

The other question is, when you have all this data, what do you do with it? Or more importantly, how do you deal with it? The vast amount of data available necessitates filtering that content lest you be overwhelmed, and all too often journalists, having to fill that story space, turn to helpfully produced information from PR, which has a rather obvious agenda, as well as relying far more often on official sources (which are instantly credible, even if they’re factually incorrect). More data does not mean good data; we saw just how these failings can interact in the news media’s coverage of the run-up to the Iraq War.

FRED and TheyWorkForYou are excellent tools, but David Simon is right.
