Cooperation and Corruption

by Clay Shirky on July 2, 2012

tl;dr The Open Data movement is good at improving service, but bad at
rooting out corruption

Tom Slee has done us a favor by “kicking”:http://whimsley.typepad.com/whimsley/2012/05/why-the-open-data-movement-is-a-joke.html “off”:http://whimsley.typepad.com/whimsley/2012/05/open-data-movement-redux-tribes-and-contradictions.html a conversation about the values, goals, and coherence of the Open Data movement. I share his sense that the movement has been a disappointment to date. However, as my principles differ from his, my sense of disappointment, and of what to do about it, differ as well.

Before I get to that, I want to position myself relative to Slee’s three summary assertions about the Open Data movement. (The points are Slee’s; the reactions mine.)

1. It’s not a movement in a political or cultural sense of the word.

I think Slee has this one wrong. In particular, two of his rationales — the Open Data movement has no political goals, and what goals it does have are too variable to cohere — seem to me to be willful attempts to deny use of a word he likes to a movement he doesn’t. Slee commits a sin he accuses the Open Data people of, namely over-focusing on technology and under-focusing on the political aspects of the work. The people improving bus schedules and the people uncovering graft may differ in their aims, but they share core values: they want to reconfigure the relationship between government and citizens concerning what the government knows and citizens don’t. This is an inherently political goal.

2. It’s doing nothing for transparency and accountability in government.

This is trivially wrong — it is plainly doing _something_, as Slee later notes — but his formulation in the body of the essay is more interesting: the net effect of transparency and accountability could end up being negative. I’ll agree with this assertion (though for somewhat different reasons than Slee), and spend the bulk of the essay on it. (I’ll also concentrate on the US case; I admire the work of Canadian participants in the Open Data movement, especially “David Eaves”:http://eaves.ca, but I don’t know enough about the Harper Government to make my comments useful.)

3. It’s co-opting the language of progressive change in pursuit of a small-government-focused subsidy for industry.

This is partly true, in that the Open Data movement does not strongly distinguish between for-profit and non-profit use. Slee’s use of ‘co-opting’ reflects his disapproval of commercial re-use; people who approve of the private sector creating new services with government data would use different language. Slee clearly regards the Commercial Service Delivery quadrant of his map as Mordor; this is where I disagree with him most strongly.

I’ve gotten extraordinary value out of commercial services like Google Maps and Weather Underground, value I don’t think the government could deliver as well. Furthermore, open access to this data limits pre-open-data monopolies of the sort enjoyed by AccuWeather or Westlaw, an improvement pursued most aggressively by Carl Malamud, our Living National Treasure of open data since before the movement had a name. I’ll adopt “the observation made by Tom Lee”:http://sunlightfoundation.com/blog/2012/05/02/defending-the-big-tent-open-data-inclusivity-and-activism/ (not Slee) as my own: “I think it’s flatly wrong to consider private actors’ interest in public data to be uniformly problematic.”

With that out of the way, I’ll say that for me, the Open Data movement has been a net disappointment. In the middle of the last decade, I attended a meeting of the then-nascent movement. We gathered in a loft filled with techies and journalists and good government people, all looking for common ground. It was like a tent revival, so infectious was the excitement. The job at hand, or so it then seemed, was to fit every government database with an API (that magical acronym!), whereupon bus schedules would appear on our phones and corrupt politicians would be driven from office.

We got the APIs. We got the bus schedules. The politicians, however, have yet to lose much sleep over open data.

There are several possible explanations for this. Here are some I am explicitly rejecting: I don’t believe corruption in the US is rare. I don’t believe it’s expertly hidden; “Bethany McLean”:http://money.cnn.com/2006/01/13/news/companies/enronoriginal_fortune/index.htm helped doom Enron by reading their financial statements carefully. I don’t believe action is impossible; when ProPublica and the LA Times “exposed the incompetence of the California nursing board”:http://www.propublica.org/series/nurses, the Governor fired that entire board the next day.

Instead, I believe the broad failure of the Open Data movement to root out much corruption is tied to organizational failure, or rather failures. So, following Slee, I’ll offer three observations of my own.

1. The institutions that are good with data tend to be bad at story telling.

People don’t consume facts. They consume stories. People who understand the importance of data are generally the people most enthusiastic about interpreting it; as a result, we systematically overestimate how general citizen interest in data actually is. (The best expression of this gap remains Tom Steinberg’s “Asking the wrong question about Data.gov”:http://steiny.typepad.com/premise/2011/04/asking-the-wrong-question-about-datagov.html.)

People choose proxies for understanding complex issues, not because they are lazy, but because “we can’t not”:http://www.amazon.com/How-You-Know-Economics-Knowledge/dp/0691137552. They look, for example, for assertions that global climate change is or is not real, rather than searching out temperature charts or maps of sea level. For civil liberties activists and data journalists to have even a fraction of the effect they intend, they will have to set aside the fantasy that telling the truth is enough. They will have to get good at telling true stories, or get good at partnering with organizations that are good at telling those stories.

Which brings me to the second failure.

2. Institutions that are good at story telling tend to be bad with data.

News organizations are paid to tell true stories. Unfortunately, much of this story telling is uninformed by the kind of numeracy that would be required to take advantage of even the simplest open data.

One tiny but illustrative example — newspaper articles often feature statements about income distribution like these:

bq. “Sunnyvale”:http://www.mercurynews.com/bay-area-news/ci_20736434/sunnyvale-top-ten-american-city-raising-kids “boasts an average family income of $123,647.”

bq. “Two companies opening warehouses”:http://readingeagle.com/article.aspx?id=386674 “are expected to employ 1,000 workers, with an average employee’s annual salary of $37,000.”

You would not guess, reading these articles, that a sizable majority of families in Sunnyvale make less than $123,000 a year, or that the salaries of most workers in the new Bethel, PA warehouses will be less than $37,000.

Journalists routinely underreport degrees of financial inequality, because they routinely treat averages as if they represented something the normal participant in the system would see. (It’s like that old joke: Bill Gates walks into a bar and everyone inside becomes a millionaire, on average.) Newspaper style guides, to their credit, clearly define what averages are supposed to mean; editors, to their detriment (and ours), simply do not enforce the correct use of even this basic mathematical concept.
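The arithmetic behind the bar joke is easy to check. Here is a minimal sketch in Python; the patrons' net worths and the newcomer's fortune are invented round numbers, not real data, and the names `patrons`, `newcomer`, and `summarize` exist only for this illustration. It compares the mean (what most articles call "the average") with the median (what the person in the middle actually has).

```python
from statistics import mean, median

# Hypothetical net worths, in dollars, for nine bar patrons (invented for illustration).
patrons = [20_000, 35_000, 40_000, 50_000, 60_000, 75_000, 90_000, 110_000, 150_000]

# Hypothetical round figure for one extremely wealthy newcomer.
newcomer = 50_000_000_000


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


before = summarize(patrons)
after = summarize(patrons + [newcomer])

print("before: mean=%.0f median=%.0f" % before)
print("after:  mean=%.0f median=%.0f" % after)
```

With these made-up figures, the mean jumps from tens of thousands of dollars to billions when the newcomer arrives, while the median barely moves. That gap is the one a reader cannot see when an article reports only an "average" income.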

The Open Data movement often puts forward visions of sophisticated, interactive uses of data that provide citizens with valuable insights. This does sometimes happen, as with Dollars for Docs or the visualization of gay rights state by state. But these are rare cases; as appealing as it is to imagine a press corps that exposes new truths by interpreting new data, the normal case is that they do not even correctly express existing truths with existing data.

3. Transparency is often mere translucency.

If all that were going on was a cultural misfit between people who understand data and people who understand narrative, we could improve things with a few kumbaya meetings. The third great obstacle, though, is that powerful actors do not want transparency. I agree with Slee’s “observation”:http://whimsley.typepad.com/whimsley/2012/05/why-the-open-data-movement-is-a-joke.html that “A government can simultaneously be the most secretive…in recent memory and be welcomed into the club of ‘open government’.” Slee talks about the decision by the Canadian government to “abandon StatsCan”:http://www.cbc.ca/news/canada/story/2010/07/21/statistics-canada-quits.html, a decision similar to Republican attempts to “reduce the effectiveness of our census”:http://www.nytimes.com/2012/05/20/sunday-review/the-debate-over-the-american-community-survey.html. The problem is far broader, however.

As Wendy Wagner put it in her 2010 paper “Administrative Law, Filter Failure, and Information Capture”:http://scholarship.law.duke.edu/cgi/viewcontent.cgi?article=1463&context=dlj:

bq. [E]very successful reform movement has its unintended consequences. What few administrative architects anticipated from the new commitment to “sunlight” was that a dense cloud of detailed, technical, and voluminous information would move in to obscure the benefits of transparency.

When we focus on how much data is made available, we create a world where powerful actors can live up to their nominal commitment to openness, while in practice reducing the utility of that data, by making data hard to understand or use, making it inconsistent over time, or producing high volumes of low-quality data while holding back low volumes of high-quality data. In extreme cases, as with StatsCan, a government can decide not to know certain things, rather than be forced to share that knowledge with citizens.

Now if this were just a technical issue, where laws needed to be written with cleaner specifications, solving these problems might be easy. But the deep problem is this — service delivery involves shared effort between public and private actors, while transparency must be oppositional to be valuable.

The distinction between service and transparency is a distinction between partnership and opposition. For me, this tension, far more than the commercial vs. non-commercial split that so exercises Slee, is the largest problem embedded in the current form and biases of the movement. I’m afraid the success of service delivery, wonderful as it is, has convinced many governments that they can make citizens happy by sharing useful data, while preserving secrecy in the very areas where political discipline matters most. The House Appropriations Committee has recently “proposed cutting off bulk access”:http://campaign2012.washingtonexaminer.com/blogs/beltway-confidential/hill-may-freeze-thomas-digital-past/572706 to legislative data. If this proposal succeeds, then the Federal Government will release more data in 2013 than in 2012 _and_ the actions of our elected representatives will become even harder to oversee.

The likeliest scenario for the service/transparency coalition is that “Lee’s distinction”:http://sunlightfoundation.com/blog/2012/05/02/defending-the-big-tent-open-data-inclusivity-and-activism/ between open-as-in-data.gov and open-as-in-FOIA remains unresolved, with the transparency movement being relegated to second-class citizenship in the Open Data movement. Another scenario is that the movements split — “Code for America”:http://codeforamerica.org/, “See, Click, Fix”:http://seeclickfix.com/, and all the other groups trying to make government data more useful will come to define themselves less around the access to data and more around patterns of use and re-use. This could be salutary for the transparency movement, as the necessarily oppositional character of their efforts would become clearer, though they would also become harder to pursue, in part because of that clarity.

The third possibility, though (and for me, the best argument for the Open Data movement as a movement) is that the two halves of the movement make common cause. This would entail the service people saying to politicians “In order to take more credit for making the public’s life better, you also need to be more transparent about your own behavior.” This probably won’t happen — it will be hard for service-oriented groups to apply this kind of pressure without having it backfire — but it would create a better bundle than we have today.

Long after the last pothole has been seen, clicked, and fixed, I think the legacy of the Open Data movement is going to be assessed on its ability to limit powerful actors. I’m afraid that that legacy will be minimal, in part because I’m afraid the transparency people have brought a knife to a gun fight. It’s possible for private actors to make common cause with elected politicians and career civil servants around snow removal, but useful transparency will always require harder tactics, tactics especially including ways of using open data to tell stories that enrage the public.

{ 15 comments }

1

William Timberman 07.02.12 at 4:12 pm

Yes, we can get bus schedules on our iPhone, but no, we can’t find out what the military-industrial complex is up to in Honduras, or Diego Garcia, nor can we find out who’s on the no-fly list, or why — we just have to go to the airport and hope for the best.

This isn’t a paradox, or a conundrum, it’s just business-as-usual. The metaphor of war-on-this-or-that is overused these days, but if you have to fight one to find out what’s going on, imagine what you’ll have to suffer to do anything about it.

2

Wonks Anonymous 07.02.12 at 5:24 pm

The Enron case doesn’t seem generalizable to other cases of corruption. They were a for-profit company that wasn’t making as much profit as they let on, when speculators get wise they can short the stock and creditors can stop lending money. If they were genuinely profitable but as the result of corruption, they could continue to stay in business.
By the way, your Bethany Mclean link is broken.

3

aepxc 07.02.12 at 6:31 pm

Any sufficiently complex system will always suffer obscurity through complexity. Either it will have the data about it summarised (offering scope to manipulate and conceal undesired facts), or it will produce so much data that most people will not know where to look.

Facing such constraints, it might be useful to take a page out of the playbook of the security services and their anti-terrorism activities. Rather than trying to figure out how to bolster the FOIA, ‘tension’ approach (which is largely the equivalent of the visa application question that asks if you are a terrorist), simply push for the release of more and more data. With a sufficiently big data set, one can then search for patterns to flag for further (more traditional) investigation. The patterns could either be established heuristics drawn from previously observed malfeasance, or they may simply be set up to look for any possible outliers or idiosyncrasies (since, almost certainly, a person engaged in malfeasance or criminality is doing something that no one else is).

Such pattern seeking mechanisms are all Big Data methods – something that we have not really ever had much practice with until now. Thus, it is not the Open Data movement that has been underwhelming, it is that some have under-appreciated the complexity (and, therefore, the development time) of the ecosystem necessary to make Open Data work. The same way that private companies can profoundly (and unilaterally) violate privacy if they hoover up enough ostensibly anonymous and inconsequential personal data from a large enough variety of sources, so too there will eventually be a greatly diminished scope for government secrecy if governments are pushed to release enough seemingly innocuous data about their activities.

4

Matt 07.02.12 at 8:26 pm

The Open Data movement as described in Wikipedia, and in this seminar, appears not to intersect the older idea of Open Source Intelligence, but it should. I don’t know if there’s a post upcoming about this, but surely Wikileaks, Cryptome, Carey Sublette, Chuck Hansen, the FAS Project on Government Secrecy, Arms Control Wonk, and other oddball Openers of Data deserve recognition even if they don’t attend the same seminars.

Tom Slee previously observed:

Open data advocates commonly address privacy issues by reference to personally identifiable information, but there is no clear dividing line between data that identifies individuals and data that doesn’t. It is well known that the right way to think of privacy when it comes to data made available in a “release and forget” manner (which open data is by definition) is in terms of information entropy or, to be less jargony, in a twenty-questions kind of way. Each question reveals a little more about the subject; no one question tells us what we need to know, but by successive filtering we arrive at the only possible answer.

Happily, this cuts both ways. Chuck Hansen compiled tidbits of data available under FOIA, combined different versions of documents, and applied logical reasoning to uncover nuclear secrets and embarrassments that the FBI thought must have come from stolen classified information.

One secret of Hansen’s success was to obtain various versions of the same document from separate official sources, and compare them. As different security officers censored the papers, some blocked out words that others left untouched. Thus Hansen could compile the most detailed public version.

Even if the Open Data movement as described in this seminar generally takes a less critical, oppositional role toward governments and their secrets, large data sets combined with cross-analysis and reasoning may nonetheless make governments more transparent in ways that they did not intend. It will probably be different analysts making the connections than the first cooperatively-minded developers who thought partnership instead of oversight when they talked different departments into opening data.

When I read The Transparent Society in 1999 I found it thought-provoking but a bit too dire about the capability of governments, corporations, and rich individuals to always violate the privacy of the median citizen. Surely we could update the privacy laws already on the books and craft new ones as necessary to deal with further developments. After the Hysteria on Terror got into full swing, and it became clear that every law was breakable in the name of National Security, I reconsidered. There is no privacy to be begged from the powerful, so it’s time to stop being considerate and granting the benefit of the doubt to the secrets of financially and politically powerful people and institutions. I’ve seen many comments along the lines of “the Wikileaks cables dump didn’t really fix injustices, it just embarrassed the government.” But with the current and historical behavior of the US government in foreign and military affairs, I’m going to call it an intrinsic good when humiliation weakens its power to exercise influence overseas. Open Data advocates, please continue to play nice long enough to get governments to open up, but rest assured that some of us are going to use those fruits in an adversarial, critical, and frankly uncooperative manner — because at this point, poetic justice seems like the only justice on offer.

5

William Timberman 07.02.12 at 11:31 pm

Matt @ 4

Your picture of the future is what I call neuromancy, after William Gibson, who was among the first to conjure with the implications of the kind of data-accessibility that computers and the Internet make possible.

The traditional first defense of the custodians of the status quo is to hide the data that they consider sensitive. In some cases this will be the government, in others it’ll be the tobacco or nuclear industries, or what have you. Once upon a time I was doing some research for my local town council on the effectiveness of copper-smelting waste remediation of the type being proposed by the owner of a 100-year old slag heap on the edge of our town. I found a number of EPA reports on similar projects, which determined that the effectiveness of such projects was questionable, and a number of reports by mining companies, which pronounced them to be totally successful in all cases.

When the council member I was doing the research for looked up my references a month later, the EPA reports had disappeared — from the public Internet, anyway. This was probably a coincidence, as all this took place during the supposedly national security-related withdrawal of documents from public scrutiny which the Bush Administration had announced in the aftermath of 9/11, but it seems indicative of what we may expect as private, politically motivated mining of government data becomes more widespread, and more effective.

If hiding the data doesn’t work, the next step is to publish disinformation of one kind or another. Given that we’ve already been inundated in a deluge of global warming denialism, books by Bob Woodward on presidential angst, reports by Judith Miller on the imminence of mushroom clouds, and benign interpretations of all sorts of the WikiLeaks cable releases, I don’t think any specific examples of this sort of propaganda assault are necessary to make my point.

Finally, if hiding stuff and making stuff up doesn’t work, it becomes more tempting to shoot the messenger. What’s happened to Julian Assange, or to Thomas Drake or Bradley Manning was meant to send a message. What isn’t clear is whether or not the message that was sent is exactly the message that was received. As I read it, the message was this: What you are doing may or may not be politically necessary, but it’s become dangerous. Act accordingly. That may not mean to Gibson’s data cowboys what the government thinks it means.

The result, I suspect, will be a long-running guerrilla war of the kind foreseen by Gibson, and visible already in the U.S. and Europe, and in more extreme forms in China and the Middle East. It may not be visible at all times to all people, but what everyone will be able to see is the destabilized information space which results, as well as all sorts of localized collateral damage, from the replacement of a universally trusted Walter Cronkite with pairs of polar opposites like Rush Limbaugh and Rachel Maddow, to the forced resignation of Shirley Sherrod, or to the suicide of a middle-school student bullied on Facebook.

Bus schedules, indeed.

6

ponce 07.03.12 at 12:31 am

“Any sufficiently complex system will always suffer obscurity through complexity. Either it will have the data about it summarised (offering scope to manipulate and conceal undesired facts), or it will produce so much data that most people will not know where to look.”

@3 For example, me here two months ago:

“In a few months, after discovering squat, the multi-billion dollar Large Hadron Collider will be shut down for two (read four) years for some very expensive “upgrades.”

The charlatans running it, faced with a worsening European financial picture, are scrambling to come up with a hackneyed cliff hanger “We just about found that Higgs boson, honest!” story to secure the massive funding they need.

Expect a flurry of “Is that it? Sure looks like it to me, and I’m an expert!” stories in the gullible press in the coming months.”

https://crookedtimber.org/2012/05/04/the-chronicle-has-some-splaining-to-do/#comment-413333

And the AP today:

“The focus of the excitement is the Higgs boson, a subatomic particle long sought by physicists.

Researchers at the European Organization for Nuclear Research, or CERN, say that they have compiled vast amounts of data that show the footprint and shadow of the particle, even though it has never actually been glimpsed.”

The value of a theory can be judged by how accurate its predictions are…

7

bianca steele 07.03.12 at 1:15 am

I was interested in the comment to Tom Lee’s post linked above, about the risk that Open Data will reproduce the pathologies of Open Source. My somewhat distant (but direct) observation has been that there is some bitter politics in parts of the open source movement (“politics” in the sense of the interpersonal jockeying that distracts from important things and that no one really likes). It’s difficult from outside to tell what the problem is, though. Is it a repeated personality conflict that’s congealed into bitterness, maybe between groups that like data and groups that like stories, as Clay Shirky might suggest? Is it a conflict between progressives and free-market aficionados, as Tom Slee might suggest? Is it something else? I have no idea.

8

bianca steele 07.03.12 at 1:15 am

By “direct” I mean this is my own observation, not (as “distant” might mean) relayed to me by another person.

9

aepxc 07.03.12 at 11:41 am

@bianca steele, as long as a group is sufficiently diverse (which is generally a good thing) and as long as human beings continue to be inclined to be more aware of the complexities behind their own reasoning than they are of the complexities behind the reasoning of others (one of the most fundamental and widely-spread cognitive biases), there will always be bitter politics. The successful movement, then, will be one designed to work despite such tensions, rather than one that tries to eradicate them via consensus (impossible) or suppression (undesirable).

The problem with FOSS (as with all utopians) is their conclusion that they are so wonderful, loveable, and enlightened that anyone who disagrees with them can only be stupid or evil. Hence you see each faction convinced that the other is counterrevolutionary. With reference to Life of Brian, it’s the Judean People’s Front versus the People’s Front of Judea. The more one believes that one is changing the world (as opposed to creating new methods for dealing with it), the more distraught one will be at any instance of “splitters”. Open Data to me seems, thankfully, more practically-minded.

10

Metatone 07.03.12 at 3:40 pm

I’m not particularly filled with any more optimism than anyone else about being able to force access to data that records wrong doing.

However, I have to take issue with Shirky’s first 2 of 3 thoughts, because I don’t think they accurately represent the organisational failures that we actually see.

Simply, it’s not about a gap between storytellers and data crunchers. As Shirky notes, that could be solved by bringing them together. However, while those in power have an interest in opacity, I don’t think that’s what’s holding us back right now. Rather, there’s a basic lack of investment in data crunching, which is compounded by a lack of investment in sense-making. aepxc @ comment 3 touches on this. The power of “Big Data” was never really that all of a sudden, you open the spreadsheet and instantly solve the problem. Rather it’s about pattern-matching and inferences that lead to more clues that eventually lead to the big story. Sounds quite a lot like investigative journalism.

And that’s the key point, yes, there is a shortage of investigative journalists with data skills, but much more crucially, there’s just a huge current lack of investment in investigative journalism.

We have datasets that can tell us quite a lot already. Talented amateurs often come up with things of interest on their blogs. (The Yorkshire Ranter comes to mind as one such talent.) The big problem is, no-one is hiring him away from his day job and paying for the computing infrastructure that would let him use the new job time in a really effective way.

If we had a new, working model of how investigative journalism gets paid for, then we might find our rulers casting new obstacles in our way. But we might be able to use democracy to challenge those obstacles. However, since we don’t have any effective model to pay for investigation, we’re not even at the races…

11

bianca steele 07.04.12 at 1:53 pm

@9
Saying “there’s always politics” isn’t very helpful in a specific situation and doesn’t say anything about whether that situation is average, better than average, or worse than average. It comes across as condescending. If there’s a specific problem that can be identified as contributing to the conflict (which seemed to me to spread outward from the open source community as people got caught up in it), and we think the problem is one that ought to be fixed, saying “there’s always politics” seems worse than unhelpful. With due respect to Tom Slee, Clay Shirky, and the other participants discussing this issue, trying to power through it by assuming the conflict is trivial may be counterproductive. Sorry if that is obvious.

And not the right thread, but I keep thinking “everything bad is good for you” and thus it’s probably “good for you” to have bad or insufficient data because it makes you smarter.

12

bianca steele 07.04.12 at 2:07 pm

@Metatone
I can’t quite put my finger on it, but something about the assumption that there’s a Thing called “Investigative Journalism” that exists, independent of brick-and-mortar institutions, and that society needs in order to function so there are always people who perform this function (though of course funding methods for the flesh-and-blood ij’ers vary from decade to decade), is bugging me.

What’s the solution you’re proposing (if you’re proposing something)? For people with data skills to get politicized and put their skills to use? For people who already think like investigative journalists to learn to wield data skills?

13

Harold 07.04.12 at 2:16 pm

I. F. Stone used to say that you could learn a lot from reading the papers.

14

Metatone 07.05.12 at 4:04 pm

@bianca steele

I have no proposal.
I don’t mean that “IJ” exists independent of anything – rather I wanted to contest the idea that skills were the issue.

The Guardian has at least 3 good journalists I know of who are good with data and produce interesting pieces. But The Guardian is not doing great financially. If they were doing great it isn’t that hard to think of people for them to hire to do more with data and journalism.

My conclusion from that is that the problem isn’t data skills or story skills and it also isn’t that data isn’t a useful approach. The problem is that we haven’t got a model for how to fund data work. That’s what we need to fix.

In the meantime? Without money, the best we can do is bring together some of the data people and the story people ad-hoc.

15

Metatone 07.05.12 at 4:06 pm

Oops… that wasn’t very clear.
I mean that skills are there and can be hired, but there is no model for funding.
I also believe that useful things could be done if we could find the funding (contra Shirky.)
